Waterloo Building Dataset: A large-scale very-high-spatial-resolution image dataset for building rooftop extraction

Abstracts of the International Cartographic Association, 3, 2021. 30th International Cartographic Conference (ICC 2021), 14–18 December 2021, Florence, Italy. https://doi.org/10.5194/ica-abs-3-105-2021 | © Author(s) 2021. CC BY 4.0 License.

Figure 1. An example of extraction results (panels: Example, FCN-8s, U-Net, DeepLab v3+, Fast SCNN). In the first row, True Positive (blue), False Negative (green) and False Positive (red) predictions are superimposed onto the VHSR input image. In the second row, white and black depict building (positive) and non-building (negative) pixels.

References
[1] M. Chen. Building Detection from Very High Resolution Remotely Sensed Imagery Using Deep Neural Networks. Master's thesis, University of Waterloo, Canada, 2019.
[2] K. Rastogi, P. Bodani, and S. A. Sharma. Automated Building Footprint extraction from Very High-Resolution Imagery using Deep Learning Techniques. Geocarto International, pp. 1-14, 2020.
[3] J. Franke, M. Gebreslasie, I. Bauwens, J. Deleu, and F. Siegert. Earth observation in support of malaria control and epidemiology: MALAREO monitoring approaches. Geospatial health, 2015.
[4] J. Thomas, A. Kareem, and K. W. Bowyer. Automated poststorm damage classification of low-rise building roofing systems using high-resolution aerial imagery. IEEE Transactions on Geoscience and Remote Sensing, 52(7), pp. 3851-3861, 2013.
[5] L. Sahar, S. Muthukumar, and S. P. French. Using aerial imagery and GIS in automated building footprint extraction and shape recognition for earthquake risk assessment of urban inventories. IEEE Transactions on Geoscience and Remote Sensing, 48(9), pp. 3511-3520, 2010.
[6] Q. Wen, K. Jiang, W. Wang, Q. Liu, Q. Guo, L. Li, and P. Wang. Automated building extraction from google earth images under complex backgrounds based on deep instance segmentation network. Sensors, 19(2), p. 333, 2019.

¹ We would like to acknowledge Qiutong Yu, Kun Zhao, Junbo Wang, Yan Liu, Hasti Andon Petrosians, Zhehan Zhang, Siyu Li, in our Geospatial Sensing and Data Intelligence (GSDI) lab at University of Waterloo, for their contributions in the annotation work.


Introduction
As a key element of urban areas, buildings are an important indicator for urban change detection [1]. Building rooftops or footprints (outlines along the exterior walls of buildings) are also essential for other urban applications, such as urban planning and management, cadastral management, urban geo-database updating and smart city construction [2]. Beyond these urban-specific applications, building datasets are also essential for population estimation and for natural hazard and damage estimation. By combining building footprints with other building information, such as the number of storeys, population and population density can be estimated efficiently, which is necessary for epidemic or pandemic control [3]. Furthermore, building maps are of paramount importance for natural hazard management and damage estimation [4]. Accurate building map data are required to estimate earthquake damage and assess the associated risks [5]. In addition, to assess losses from typhoons, floods and other natural disasters, building maps must be obtained quickly and reliably after a disaster [6]. For these applications, remote sensing-based methods, especially those based on aerial images, have proven effective.
Automated extraction of building rooftops from very-high-spatial-resolution (VHSR) images has long been a challenging task in remote sensing. Conventional pixel-based and object-based image analysis methods are often ineffective because they require expertise in feature engineering or feature collection. The development of deep learning techniques has revolutionized automated building rooftop extraction [1]. However, deep learning techniques are known to be data intensive: developing new algorithms requires a large number of high-quality images with pixel-level labels.
In this paper, we construct the Waterloo Building Dataset (WBD), which covers the Kitchener-Waterloo region in Ontario, Canada. The main contributions of this paper are two-fold.
(1) We release a manually edited, large-scale VHSR aerial image dataset for building rooftop extraction.
(2) We perform an extensive comparative study that benchmarks existing methods and provides a baseline for the development of new methods.

Waterloo Building Dataset
The dataset covers 205.83 km² and includes both urban and rural areas with buildings of different shapes, heights and colours. To construct the dataset, we obtained VHSR aerial images of the study region, acquired in 2014, from the Geospatial Centre of the University of Waterloo. The images are authorized for release by the Regional Municipality of Waterloo. They were collected with a Vexcel UltraCam-D camera. The North American Datum of 1983 (NAD 83) was used as the geographic coordinate system and the Universal Transverse Mercator (UTM) Zone 17N was used as the projection system.
The images are 8350 pixels by 8350 pixels, have a spatial resolution of 12 cm, and contain 3 bands (RGB). After filtering out the duplicated images at the common boundary of Kitchener and Waterloo, we are left with 242 images. For these images, a total of 14 experts¹ worked together for about half a year to generate labels and refine the boundaries of buildings. Considering the memory and computation requirements of deep learning models with respect to image size, the images and labels were further clipped into 69938 pairs of 512×512 patches.
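The patch count can be sanity-checked with a little arithmetic. The text does not say how the patches were clipped, so the sketch below assumes a hypothetical scheme in which each image is covered by evenly spaced, slightly overlapping 512×512 patches. The function name `patch_origins` and the spacing rule are illustrative assumptions, not the authors' procedure. Under that assumption each axis needs 17 patches, and 242 × 17² reproduces the reported 69938 pairs.

```python
import math

IMAGE_SIZE = 8350   # pixels per side (from the text)
PATCH_SIZE = 512    # pixels per side (from the text)
NUM_IMAGES = 242    # images left after de-duplication (from the text)


def patch_origins(image_size, patch_size):
    """Return top-left offsets along one axis that cover the whole axis.

    Assumption (not stated in the source): patches are spread evenly and
    allowed to overlap slightly so that the last patch ends at the border.
    """
    count = math.ceil(image_size / patch_size)
    if count == 1:
        return [0]
    last_origin = image_size - patch_size
    return [round(i * last_origin / (count - 1)) for i in range(count)]


origins = patch_origins(IMAGE_SIZE, PATCH_SIZE)
patches_per_image = len(origins) ** 2          # same layout on both axes
total_patches = NUM_IMAGES * patches_per_image

print(len(origins), patches_per_image, total_patches)  # 17 289 69938
```

A non-overlapping grid would yield only 16 full patches per axis (242 × 256 = 61952), so the reported count suggests that border regions were retained through overlap or padding; which of the two was used is not stated.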

Methods and metrics
An extensive comparative study was performed to benchmark existing methods in order to evaluate the quality of our dataset. For this work, we selected four semantic segmentation methods, namely FCN-8s, U-Net, DeepLab v3+ and Fast SCNN, and seven evaluation metrics, namely accuracy, IoU, mIoU, precision, recall, F1-score and frames per second (FPS).
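The abstract does not give formulas for these metrics, so the following is a minimal sketch using their standard definitions for binary (building vs. non-building) masks. The helper names are hypothetical, and treating mIoU as the mean of the building and non-building IoU is an assumption. FPS measures inference speed rather than mask quality and is therefore left out.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(tp, fp, fn, tn):
    """Standard pixel-level metrics; building is the positive class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    iou_building = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "iou": iou_building,
        # Assumption: mIoU averages the two classes with equal weight.
        "miou": (iou_building + iou_background) / 2,
    }


# Toy 2x4 masks (1 = building, 0 = non-building), made up for illustration.
pred = [[1, 1, 0, 0], [1, 0, 0, 0]]
truth = [[1, 0, 0, 0], [1, 1, 0, 0]]
counts = confusion_counts(pred, truth)
metrics = segmentation_metrics(*counts)
print(counts, metrics["accuracy"], metrics["iou"])  # (2, 1, 1, 4) 0.75 0.5
```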

Results and discussion
An example of extraction results is provided in Figure 1.
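The colour coding described in the Figure 1 caption (True Positive in blue, False Negative in green, False Positive in red) can be reproduced with a simple per-pixel rule. The sketch below is only an illustration: the function name `overlay_errors` is hypothetical, and it paints opaque colours over the input pixels, whereas the figure may use blending.

```python
BLUE = (0, 0, 255)    # True Positive: predicted building, labelled building
GREEN = (0, 255, 0)   # False Negative: missed building pixel
RED = (255, 0, 0)     # False Positive: predicted building, labelled background


def overlay_errors(image, pred, truth):
    """Return a copy of an RGB image with TP/FN/FP pixels recoloured.

    image: nested lists of (R, G, B) tuples; pred and truth: binary masks
    of the same shape. True Negative pixels keep their original colour.
    """
    result = []
    for image_row, pred_row, truth_row in zip(image, pred, truth):
        row = []
        for pixel, p, t in zip(image_row, pred_row, truth_row):
            if p and t:
                row.append(BLUE)
            elif t:
                row.append(GREEN)
            elif p:
                row.append(RED)
            else:
                row.append(pixel)
        result.append(row)
    return result


# Toy 1x4 example with a uniform grey image, made up for illustration.
grey = (128, 128, 128)
overlay = overlay_errors([[grey] * 4], [[1, 1, 0, 0]], [[1, 0, 1, 0]])
print(overlay)
```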