Semantic Foggy Scene Understanding with Synthetic Data
Christos Sakaridis, Dengxin Dai, and Luc Van Gool
International Journal of Computer Vision (IJCV), 2018
[PDF]   [Final PDF: view-only]   [BibTeX]   [arXiv]  

The first PDF above is a post-peer-review, pre-copyedit version of the article published in International Journal of Computer Vision. The final authenticated version is available online at or in a view-only mode through the "Final PDF" above.


Foggy Datasets

We present two distinct datasets for semantic understanding of foggy scenes: Foggy Cityscapes and Foggy Driving.

Foggy Cityscapes derives from the Cityscapes dataset and constitutes a collection of synthetic foggy images generated with our proposed fog simulation that automatically inherit the semantic annotations of their real, clear counterparts. Due to licensing issues, the main dataset modality of foggy images is only available for download at the Cityscapes website. This is also the case for semantic annotations as well as other modalities which are shared by Foggy Cityscapes and Cityscapes. On the contrary, the auxiliary Foggy Cityscapes modalities of denoised depth maps and transmittance maps are available at this website in the following packages:

Transmittance maps (8-bit) for foggy scenes in train, val, and test sets
15000 images (5000 images x 3 fog densities)
MD5 checksum

Transmittance maps (8-bit) for foggy scenes in trainextra set
19997 images
MD5 checksum

Depth maps (denoised and complete) for train, val, and test sets
5000 files (MATLAB MAT-files)
Download credentials    MD5 checksum

Depth maps (denoised and complete) for trainextra set
19997 files (MATLAB MAT-files)
Download credentials    MD5 checksum

Foggy Driving is a collection of 101 real-world foggy road scenes with annotations for semantic segmentation and object detection, used as a benchmark for the domain of foggy weather. We provide dense, pixel-level semantic annotations of these images for the 19 evaluation classes of Cityscapes. Bounding box annotations for objects belonging to 8 of the above classes that correspond to humans or vehicles are also available.

Foggy Cityscapes

We develop a fog simulation pipeline for real outdoor scenes and apply it to the complete set of 25000 images in the Cityscapes dataset to obtain Foggy Cityscapes. We also define a refined set of 550 training+validation Cityscapes images out of the original 3475 ones, which yield high-quality synthetic fog. The resulting collection of 550 foggy images is termed Foggy Cityscapes-refined.

We provide three different versions of Foggy Cityscapes for the 5000 training+validation+testing Cityscapes images, each characterized by a constant attenuation coefficient which determines the fog density and the visibility range. The values of the attenuation coefficient are 0.005, 0.01 and 0.02m-1 and correspond to visibility ranges of 600, 300 and 150m respectively. For the 20000 extra training Cityscapes images, we provide a single version with attenuation coefficient of 0.01m-1. Examples of Foggy Cityscapes scenes for varying fog density are shown below.

Clear weather 600m visibility 300m visibility 150m visibility

Foggy Driving

Foggy Driving consists of 101 color images depicting real-world foggy driving scenes. 51 of these images were captured with a cell phone camera in foggy conditions at various areas of Zurich, and the rest 50 images were collected from the web. The maximum image resolution in the dataset is 960x1280 pixels.

Foggy Driving features pixel-level semantic annotations for the set of 19 classes that are used for evaluation in Cityscapes. Individual instances of the 8 classes from the above set which correspond to humans or vehicles are labeled separately, which affords bounding box annotations for these classes. In total, Foggy Driving contains more than 500 annotated vehicles and almost 300 annotated humans. Given its moderate scale, this dataset is meant for evaluation purposes and we recommend against using its annotations to train semantic segmentation or object detection models. Example images along with their semantic annotations as well as overall annotation statistics are presented below.


We show that our partially synthetic Foggy Cityscapes dataset can be used per se for successfully adapting modern convolutional neural network (CNN) models to the condition of fog. Our experiments on semantic segmentation and object detection evidence that fine-tuning "clear-weather" Cityscapes-based models on Foggy Cityscapes improves their performance significantly on the real foggy scenes of Foggy Driving.

We present two learning approaches in this context, one using standard supervised learning and another using semi-supervised learning. In the former, supervised approach, we use Foggy Cityscapes-refined with its fine ground-truth annotations inherited from Cityscapes to fine-tune the clear-weather model to fog. In the latter, novel semi-supervised approach, we augment the training set used for fine-tuning with Foggy Cityscapes-coarse comprising 20000 synthetic foggy trainextra Cityscapes images, where the missing fine annotations for these images are substituted with the predictions of the clear-weather model for the clear-weather counterparts of the images. In the following, we present selected experiments and results from our article.

Semantic Segmentation

We experiment with the state-of-the-art RefineNet model. We compare the mean intersection over union (IoU) performance of our adaptation approaches on Foggy Driving against the original, clear-weather RefineNet model trained on Cityscapes. The mean IoU scores on Foggy Driving are 44.3% for the clear-weather model, 46.3% for the model fine-tuned with the supervised approach, and 49.7% for the model fine-tuned with the semi-supervised approach. Qualitative results on Foggy Driving for the clear-weather RefineNet model and our semi-supervised adaptation approach are shown below.

Foggy image Ground truth Clear-weather RefineNet Ours

Object Detection

We experiment with the modern Fast R-CNN model, using multiscale combinatorial grouping for object proposals. We first train a model on the original, clear-weather Cityscapes dataset which serves as our baseline. We then fine-tune this model with our supervised adaptation approach on Foggy Cityscapes-refined and compare the average precision (AP) performance of the two models on Foggy Driving. AP for car is 30.5% for the clear-weather model and 35.3% for the fine-tuned model, while mean AP over the 8 classes with distinct instances is 11.1% for the clear-weather model and 11.7% for the fine-tuned model. Qualitative car detection results on Foggy Driving for the clear-weather Fast R-CNN model and our supervised adaptation approach are depicted below.

Ground truth Clear-weather Fast R-CNN Ours

Pretrained Models

Semantic Segmentation

We provide the central pretrained models for both semantic segmentation networks which are used in the experiments of our article, i.e. the dilated convolutions network (DCN) and RefineNet.

For more details on how to test or further train these models, please refer to the original implementations of the two networks.

Object Detection

We also provide the central pretrained models for the object detection network Fast R-CNN we use in our experiments, as well as code for testing these models on Foggy Driving.


The source code for our fog simulation pipeline is available on GitHub.


Please cite our publication if you use our datasets, models or code in your work.

Moreover, in case you use the Foggy Cityscapes dataset, please cite additionally the Cityscapes publication, and in case you use our fog simulation code, please cite additionally both the Cityscapes publication and the SLIC superpixels publication.