shaped footprint could be reconstructed. Also, the approach utilizes roof points in determining the complete shape of the buildings and therefore resolves problems, as mentioned in (Shahzad and Zhu, 2015a), related to the visibility of the façades mainly pointing towards the azimuth direction of the SAR sensor. However, a few points still need to be addressed. For instance, the reconstruction accuracy is restricted by the small number of available points and by data gaps in the TomoSAR point cloud. This could be improved by incorporating data from other viewing angles, by adding more constraints such as parallelism, or by using model-based approaches built on a library of low-level feature sets. Also, we have compared our results to the OSM data, which is regularly updated but not yet fully complete. Therefore, a more accurate ground truth would be needed for assessing the exact performance of the approach. Nevertheless, this paper presents the first demonstration of automatic reconstruction of 2-D/3-D building footprints from this class of data. Moreover, the developed methods are not strictly applicable to TomoSAR point clouds only but also work on unstructured 3-D point clouds generated by a different sensor with a similar configuration (i.e., oblique geometry), at both low and high point densities. In the future, we will explore the potential of extending the algorithm towards the automatic reconstruction of complete watertight prismatic (or polyhedral) 3-D/4-D building models from space.
(or use an already existing) digital terrain model (DTM) by filtering techniques, e.g., morphological filtering (Sithole, 2004), gradient analysis (Vosselman, 2000), or iterative densification of a triangular irregular network structure (Sohn, 2002), and then use the DTM to extract non-ground points (Rottensteiner, 2002) from the rasterized point cloud data. Nadir-looking LiDAR points essentially give a digital surface model (DSM). Subtracting the DTM from the DSM provides a normalized DSM (nDSM), which represents the height variation of non-ground points. Building points are then extracted by exploiting geometrical features such as deviations from the surface model, local height measures, roughness, and slope variations. Methods based on building boundary tracing from the nDSM (Gross, 2005) or directly from point clouds (Sampath, 2007; Rottensteiner, 2002) have also been employed for building detection. With them, finer building boundaries are determined by regularization of the coarsely traced boundaries. All points that lie inside the boundary regions are considered building points. Building points can also be extracted by explicitly labeling every point in the data set. For labeling purposes, features in a local neighborhood such as height, eigenvalue, and plane features can be determined and used in conjunction with supervised (Mallet, 2011), semi-supervised (Sampath, 2010), and unsupervised (Dorninger, 2008) classification techniques.
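As a concrete illustration of the nDSM step described above, the following minimal sketch derives a normalized DSM and a coarse non-ground mask from rasterized height arrays; the 2.5 m threshold is an assumed value, not one taken from the cited works:

```python
import numpy as np

def extract_building_mask(dsm, dtm, height_threshold=2.5):
    """Derive a normalized DSM and a coarse non-ground (building candidate) mask.

    dsm, dtm: 2-D arrays of equal shape (rasterized surface/terrain heights).
    height_threshold: minimum height above ground, in metres (assumed value).
    """
    ndsm = dsm - dtm                    # nDSM = DSM - DTM
    mask = ndsm > height_threshold      # cells well above the terrain
    return ndsm, mask

# toy example: flat terrain at 100 m with one 10 m high "building"
dtm = np.full((4, 4), 100.0)
dsm = dtm.copy()
dsm[1:3, 1:3] += 10.0
ndsm, mask = extract_building_mask(dsm, dtm)
print(int(mask.sum()))  # 4 cells exceed the threshold
```

In practice the mask would still be refined with the roughness and slope features mentioned above before points are accepted as building points.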
Typical data sources for reconstructing 3-D building models include optical images (airborne or spaceborne), airborne LiDAR point clouds, terrestrial LiDAR point clouds, and close-range images. In addition to them, recent advances in very high resolution synthetic aperture radar imaging, together with its key attributes such as self-illumination and all-weather capability, have also attracted the attention of many remote sensing analysts in characterizing urban objects such as buildings. However, SAR projects a 3-D scene onto two native coordinates, i.e., “range” and “azimuth”. In order to fully localize a point in 3-D, advanced interferometric SAR (InSAR) techniques are required that process stack(s) of complex-valued SAR images to retrieve the lost third dimension (i.e., the “elevation” coordinate). Among other InSAR methods, SAR tomography (TomoSAR) is the ultimate way of 3-D SAR imaging. By exploiting stack(s) of SAR images taken from slightly different positions, it builds up a synthetic aperture in elevation that enables the retrieval of the precise 3-D position of dominant scatterers within one azimuth-range SAR image pixel. TomoSAR processing of very high resolution data of urban areas provided by modern satellites (e.g., TerraSAR-X, TanDEM-X
This paper presents a novel workflow for data-driven building reconstruction from Light Detection and Ranging (LiDAR) point clouds. The method comprises building extraction, a detailed roof segmentation using region growing with adaptive thresholds, segment boundary creation, and a structural 3D building reconstruction approach using adaptive 2.5D Dual Contouring. First, a 2D grid is overlain on the segmented point cloud. Second, in each grid cell 3D vertices of the building model are estimated from the corresponding LiDAR points. Then, the number of 3D vertices is reduced in a quad-tree collapsing procedure, and the remaining vertices are connected according to their adjacency in the grid. Roof segments are represented by a Triangular Irregular Network (TIN) and are connected to each other by common vertices or, at height discrepancies, by vertical walls. The resulting 3D building models show a very high accuracy and level of detail, including roof superstructures such as dormers. The workflow is tested and evaluated for two data sets, using the evaluation method and test data of the “ISPRS Test Project on Urban Classification and 3D Building Reconstruction” (Rottensteiner et al., 2012). Results show that the proposed method is comparable with state-of-the-art approaches, and outperforms them regarding undersegmentation and completeness of the scene reconstruction.
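The per-cell vertex estimation in the dual contouring step can be illustrated with the classic quadratic error function (QEF), which finds the point minimizing the summed squared distances to a set of sampled planes. This is only a simplified sketch; the adaptive 2.5D variant used in the workflow additionally separates vertex heights per roof segment:

```python
import numpy as np

def qef_vertex(normals, points):
    """Classic dual-contouring QEF: find x minimizing
    sum_i (n_i . (x - p_i))^2 for plane samples (p_i, n_i).

    Solved as the least-squares system A x = b with rows n_i and
    b_i = n_i . p_i.
    """
    A = np.asarray(normals, dtype=float)
    b = np.einsum("ij,ij->i", A, np.asarray(points, dtype=float))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# three orthogonal planes meeting at (1, 2, 3): the QEF recovers the corner
normals = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
points = [(1, 0, 0), (0, 2, 0), (0, 0, 3)]
print(qef_vertex(normals, points))  # ≈ [1. 2. 3.]
```

Collapsing such per-cell vertices in a quad-tree then reduces the model complexity, as described above.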
LiDAR points of buildings are extracted from the scene using previously available 2D building boundary polygons. Nearby points from terrain and vegetation are removed using filtering procedures. For roof segmentation, a robust region growing technique is developed. A unique feature of the segmentation method is the growing of triangles of a Triangulated Irregular Network (TIN) instead of LiDAR points. This minimizes the gaps between segments, because LiDAR points at segment intersections can be assigned multiple segment labels. Additionally, robust adaptive thresholds are introduced as region growing criteria. These enable the region growing procedure to stop at weak edges, while also segmenting non-planar roof segments. Results show that the proposed segmentation outperforms other methods concerning undersegmentation, and that it recognizes even weak edges. Evaluation and an extensive analysis of the input parameters’ effects on the results have shown that the segmentation is very robust against LiDAR point cloud characteristics and segment shape. Segment boundaries are created by collapsing the convex hull of segment points. Point density variations in across-track and along-track directions are considered in the collapsing procedure. For building modeling, the 2.5D dual contouring approach of Zhou and Neumann is adapted to model complex roofs. After overlaying a 2D grid on the segmented point cloud, vertices of the 3D building model are estimated for each grid cell by minimizing a Quadratic Error Function (QEF). Each QEF minimization results in a hyperpoint, which consists of one or more vertices of the building model at the same x-y coordinates. This 2.5D characteristic enables the connection of building vertices at step edges with vertical walls. In contrast to Zhou and Neumann, the proposed method
world coordinates enable the generation of high quality TomoSAR point clouds, containing not only the 3D positions of the scatterer locations but also estimates of seasonal/temporal deformation, which are very attractive for generating 4-D city models from space. However, there are some special considerations associated with these point clouds that are worth mentioning: 1) TomoSAR point clouds deliver moderate 3D positioning accuracy on the order of 1 m; 2) the small number of images and the limited orbit spread render the location error of TomoSAR points highly anisotropic, with an elevation error typically one or two orders of magnitude higher than in range and azimuth; 3) due to the coherent imaging nature, temporally incoherent objects such as trees cannot be reconstructed from multipass spaceborne SAR image stacks; and 4) TomoSAR point clouds possess a much higher density of points on the building façades due to the side-looking SAR geometry, enabling systematic reconstruction of building footprints via façade point analysis. As depicted over smaller and larger areas in  and , façade reconstruction turns out to be an appropriate first step to detect and reconstruct building shape from these point clouds when dense points on the façade are available. Especially when data from multiple views, e.g., from both ascending and descending orbits, are available, the
In this paper, we present an approach that allows automatic (parametric) reconstruction of building shapes in 2-D/3-D using TomoSAR point clouds. These point clouds are generated by processing radar image stacks via an advanced interferometric technique called SAR tomography. The proposed approach reconstructs the building outline by exploiting both the available roof and façade information. Roof points are extracted by employing a surface-normals-based region growing procedure via selected seed points, while the extraction of façade points is based on thresholding the point scatterer density (SD) estimated by a robust M-estimator. Spatial clustering is then applied to the extracted roof points such that each roof cluster represents an individual building. Extracted façade points are reconstructed and afterwards incorporated into the segmented roof clusters to reconstruct the complete building shape. Initial building footprints are derived by employing the alpha shapes method and are later regularized. Finally, rectilinear constraints are added to yield geometrically more plausible building shapes. The proposed approach is illustrated and validated by examples using TomoSAR point clouds generated from a stack of TerraSAR-X high-resolution spotlight images from the ascending orbit only, covering two different test areas: one containing relatively small buildings in densely populated regions and the other containing moderately sized buildings in the city of Las Vegas.
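The alpha-shape step used for the initial footprints can be sketched as follows, assuming SciPy is available; the choice of alpha and the subsequent regularization and rectilinear constraints are outside this sketch:

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(points_2d, alpha):
    """Boundary edges of the 2-D alpha shape of a point set.

    Keeps Delaunay triangles whose circumradius is below 1/alpha and
    returns the edges belonging to exactly one kept triangle, i.e. the
    footprint boundary.
    """
    pts = np.asarray(points_2d, dtype=float)
    tri = Delaunay(pts)
    edge_count = {}
    for ia, ib, ic in tri.simplices:
        a, b, c = pts[ia], pts[ib], pts[ic]
        la = np.linalg.norm(b - c)
        lb = np.linalg.norm(a - c)
        lc = np.linalg.norm(a - b)
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (b[1] - a[1]) * (c[0] - a[0]))
        if area == 0:
            continue
        if la * lb * lc / (4.0 * area) < 1.0 / alpha:  # circumradius test
            for e in ((ia, ib), (ib, ic), (ia, ic)):
                e = tuple(sorted(map(int, e)))
                edge_count[e] = edge_count.get(e, 0) + 1
    # edges used by exactly one kept triangle form the boundary
    return [e for e, n in edge_count.items() if n == 1]

# unit square with a centre point: the boundary consists of the four sides
edges = alpha_shape_edges([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)],
                          alpha=1.6)
print(len(edges))
```

A smaller alpha keeps fewer triangles and therefore carves deeper concavities into the footprint.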
priors are used for simulation. The desired building parameters are progressively achieved by minimizing the difference between simulated data and real data. These aforementioned studies mainly focus on isolated buildings, or buildings with simple shapes, such as rectangles or L-shapes. In recent years, deep neural networks have become increasingly popular and have shown success in many applications, including classification and segmentation in remote sensing images. The authors in  detect building areas using a fully convolutional network (FCN) with labeled data generated from TomoSAR data. However, the positioning accuracy of the used TomoSAR point cloud is not sufficient for tasks such as building reconstruction and building height estimation. Thus, it remains a problem to estimate building heights in large-scale urban areas using a small number of SAR images.
During the last decades, several approaches for the reconstruction of 3D building models have been developed. Starting in the 1980s with manual and semi-automatic reconstruction methods of 3D building models from aerial images, the degree of automation has increased in recent years so that they became applicable to various areas. Some typical applications and examples are shown in section 1.1. Especially since the 1990s, when airborne light detection and ranging (LiDAR) technology became widely available, approaches for (semi-)automatic building reconstruction of large urban areas turned out to be of particular interest. Only in recent years have some large cities built detailed 3D city models. Although much effort has been put into the development of a fully automatic reconstruction strategy in order to overcome the high costs of semi-automatic reconstructions, no solution proposed so far meets all requirements (e.g., in terms of completeness, correctness, and accuracy). The reasons for this are manifold, as discussed in section 1.2. Some of them are manageable, for example, either by using modern sensors which provide denser and more accurate point clouds than before or by incorporating additional data sources such as high-resolution images. However, there is quite a big demand for 3D building models in areas where such modern sensors or additional data sources are not available. Therefore, in this thesis a new fully automatic reconstruction approach of semantic 3D building models for low- and high-density airborne laser scanning (ALS) data of large urban areas is presented and discussed. Additionally, it is shown how automatically derived building knowledge can be used to enhance existing building reconstruction approaches. The specific research objectives are outlined in section 1.3, which also includes an overview of the proposed reconstruction workflows and the contribution of this thesis.
In order to have lean workflows with good performance, some general assumptions on the buildings to be reconstructed are imposed and explained in section 1.4. The introduction ends with an outline of this thesis in section 1.5.
The L-shape is the most common building outline appearing in TomoSAR point clouds. For each building segment, one L-shape or none is detected. The detection is achieved by finding two interconnected line segments using the gray-scale Hough transform. Each pixel in the Hough transform matrix represents a line. In our algorithm, the pixel with the highest amplitude is extracted first, representing the first line of the L-shape. The second is the pixel with the highest amplitude whose angular distance from the first exceeds a certain angle. To reject irregular L-shapes, constraints have to be put on this angle, the minimum pixel amplitude, and the minimum length of the line segments. For example, Figure 5 shows the L-shape detection result for the test area near the Berlin central station. The detected L-shapes are overlaid on the binary façade image. The end points of the L-shapes are marked with red and yellow dots.
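The two-peak picking logic described above can be sketched with a plain Hough accumulator; the parameter names and default thresholds here are illustrative, not the values used for Figure 5:

```python
import numpy as np

def detect_l_shape(binary_img, n_theta=180, min_angle_deg=30, min_votes=2):
    """Pick the two strongest Hough-space peaks whose orientations differ
    by at least `min_angle_deg` degrees (sketch of the L-shape detection;
    the amplitude and segment-length constraints are simplified).
    """
    ys, xs = np.nonzero(binary_img)
    thetas = np.deg2rad(np.arange(n_theta))          # one bin per degree
    diag = int(np.ceil(np.hypot(*binary_img.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    rhos = np.outer(xs, np.cos(thetas)) + np.outer(ys, np.sin(thetas))
    for row in np.round(rhos).astype(int) + diag:    # vote per point
        acc[row, np.arange(n_theta)] += 1
    # first line: global maximum of the accumulator
    r1, t1 = np.unravel_index(np.argmax(acc), acc.shape)
    # second line: strongest peak at sufficient angular distance
    dtheta = np.abs(np.arange(n_theta) - t1)
    dtheta = np.minimum(dtheta, n_theta - dtheta)    # angles wrap at 180°
    masked = np.where(dtheta[None, :] >= min_angle_deg, acc, 0)
    r2, t2 = np.unravel_index(np.argmax(masked), masked.shape)
    if masked[r2, t2] < min_votes:
        return None                                  # reject: no valid L-shape
    return (r1 - diag, t1), (r2 - diag, t2)          # (rho, theta in degrees)

# two perpendicular segments meeting at (5, 5) form an L
img = np.zeros((20, 20), dtype=bool)
img[5, 0:10] = True   # horizontal segment
img[0:10, 5] = True   # vertical segment
res = detect_l_shape(img)
```

Recovering the end points of the L-shape would additionally require tracing the pixels along each detected line.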
errors caused by the incompleteness of data arising, for instance, from missed faces due to insufficient points, occlusion, or vegetation clutter. It has been argued that, with the increased point density of modern ALS, the data-driven approach yields more accurate and robust results than the model-driven approach (Oude Elberink, 2008). A number of reconstruction methods based on either model-driven or data-driven approaches using ALS data have been presented in the literature. When reviewing the relevant literature, it is observed that the major problem is the efficient manipulation of the topology and roof primitives. Recent literature shows that roof topology graphs (RTG) are widely used in both data-driven and model-driven approaches, especially for the efficient manipulation of topology and roof primitives (e.g. Verma et al., 2006; Milde et al., 2008; Oude Elberink and Vosselman, 2009), and in many cases accurate results have been obtained. However, many unsolved problems need to be addressed within the processing chain of building reconstruction. This will be discussed in detail under Section 1.2.
Given an initial hypothesis of the building boundary, refinement with well-designed constraints plays a vital role in improving the accuracy of building footprints. The constraints usually come from the 3D features embedded in DSMs or point clouds as well as from the 2D features in images. In the case of oblique UAV images, 3D features such as lines [5] and planes usually have low geometric accuracy due to the change of the viewing directions in oblique images [6]. Apart from that, image features can also be employed as effective constraints. Traditional methods employed color features [7] in early stages, which are vulnerable to shadows and illumination. Some methods extracted building boundaries by detecting 2D lines [8] or corners [9]. However, the detected edges or corners have uncertain semantic meanings and can therefore only be used as weak evidence. In contrast, pixel-wise semantic image segmentation provides an effective solution to this problem. Various handcrafted features have been proposed in traditional machine-learning-based classification tasks. For instance, 2D image features, 2.5D topographic features, and 3D geometric features are integrated in [10] for supervised classification using an SVM classifier. With the rapid development of deep neural networks, deep-learning-based segmentation methods have demonstrated conspicuous advantages in yielding reliable and robust semantic segmentation compared to traditional machine-learning segmentation methods. Deconvolution networks were first applied in [11] for building extraction from remote sensing images and demonstrated promising segmentation accuracy.
In this research, we evaluated the potential of LOD1 3D reconstruction using remote-sensing-derived geodata and volunteered geographic information (VGI). For this purpose, we used heights derived from sources provided for global mapping, such as those produced through the TanDEM-X mission. We implemented two DEM fusion experiments to improve the quality of TanDEM-X in urban areas. The first fuses the TanDEM-X and Cartosat-1 DEMs using corresponding weight maps generated through a supervised ANN-based pipeline. In the second experiment, multiple TanDEM-X raw DEMs are fused by variational models. The results confirm the quality improvement of TanDEM-X after DEM fusion. In another experiment, heights were derived from an archived TerraSAR-X and WorldView-2 image pair through a stereogrammetry framework. The output was a sparse point cloud with a promising accuracy. Since building outlines, an essential requirement for 3D reconstruction, cannot be accurately recognized in those height sources, we employed outlines provided by OSM. It was also shown that the primary outlines are not perfect and should be modified and updated for an accurate reconstruction. The final results demonstrate the possibility of prismatic building model generation (at the LOD1 level) over a wide area from easily accessible, remote-sensing-derived geodata. Author Contributions: Conceptualization, Hossein Bagheri, Michael Schmitt, and Xiaoxiang Zhu; Methodology, Hossein Bagheri; Software, Hossein Bagheri; Data Curation and Investigation, Hossein Bagheri; Writing, Hossein Bagheri; Review and Editing, Michael Schmitt and Xiaoxiang Zhu; Supervision, Michael Schmitt and Xiaoxiang Zhu; Project Administration, Michael Schmitt; Funding Acquisition, Hossein Bagheri, Michael Schmitt; Resources, Xiaoxiang Zhu.
Different platforms and sensors are used to derive 3D models of urban scenes. 3D reconstructions from satellite and aerial images are used to derive sparse models, mainly showing ground and roof surfaces of entire cities. In contrast to such sparse models, 3D reconstructions from UAV or ground images are much denser and show building facades and street furniture such as traffic signs and garbage bins. Furthermore, point clouds may also be acquired with LiDAR sensors. Point clouds do not only differ in their viewpoints, but also in their scales and point densities. Consequently, the fusion of such heterogeneous point clouds is highly challenging. Regarding urban scenes, another challenge is the occurrence of only a few parallel planes, which makes it difficult to find the correct rotation parameters. We discuss the limitations of the general fusion methodology, based on an initial alignment step followed by a local coregistration using ICP, and present strategies to overcome them.
The analysis of individual trees in remote sensing data until now has mainly focused on the exploitation of aerial imagery or LiDAR point clouds. In this framework, many studies have been published about the detection and localization of individual trees (Pollock, 1996, Wulder et al., 2000, Leckie et al., 2005, Chen et al., 2006, Chang et al., 2013) as well as the delineation of their tree crowns (Culvenor, 2002, Pouliot et al., 2002, Erikson, 2003, Koch et al., 2006, Jing et al., 2012). In contrast to that, the analysis of forested areas on the single-tree level by means of synthetic aperture radar (SAR) remote sensing has not yet met the interest of the community, although modern sensors have reached sub-meter resolutions down to the decimeter range in recent years. However, recently it has been shown that it is possible to generate almost fully layover- and shadow-free point clouds by means of airborne single-pass SAR tomography using millimeterwave sensors (Schmitt and Stilla, 2014). Therefore, in analogy to the approaches based on 3D LiDAR point clouds, in this paper an unsupervised segmentation of the TomoSAR point cloud aiming at the reconstruction of individual trees is proposed. While, for example, the studies of (Morsdorf et al., 2004) or (Gupta et al., 2010) suggest employing k-means clustering for tree segmentation in LiDAR point clouds, in the presented work the unsupervised mean-shift clustering algorithm is used. This way, the need to know the number of expected clusters and to initialize their centers a priori is avoided and a fully automatic procedure is enabled (Comaniciu and Meer, 2002), which has already been proven for the reconstruction of buildings in TomoSAR point clouds (Shahzad and Zhu, 2015). After clustering, rotational ellipsoids are used to model the individual segments in order to approximate the tree crown shapes. From these ellipsoids the tree positions, heights, and crown diameters can be extracted.
This tree reconstruction strategy is evaluated using a 3D TomoSAR point cloud, which was generated from airborne millimeterwave InSAR data acquired from multiple aspect angles.
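The mean-shift clustering at the core of the segmentation can be sketched with a flat kernel as follows; the bandwidth value is an assumption, and the crown-ellipsoid fitting that follows it is not shown:

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50, merge_tol=1e-2):
    """Flat-kernel mean shift: every point iteratively moves to the mean
    of its neighbours within `bandwidth`; converged points are merged
    into modes. No cluster count or centre initialization is needed a
    priori (sketch only; the bandwidth is an assumed parameter).
    """
    pts = np.asarray(points, dtype=float)
    modes = pts.copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            nb = pts[np.linalg.norm(pts - m, axis=1) < bandwidth]
            modes[i] = nb.mean(axis=0)
    # merge modes that converged to (almost) the same location
    labels = -np.ones(len(pts), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol * bandwidth:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

# two well-separated "crowns": mean shift finds two modes automatically
pts = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
centers, labels = mean_shift(pts, bandwidth=1.0)
print(len(centers))  # 2
```

In the tree case the clustering would run on the 3-D scatterer coordinates, with the bandwidth chosen relative to the expected crown diameter.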
In this chapter a new method is proposed for generating 3D building models on different levels of detail (LOD). The proposed workflow is presented in Figure 6.1. The 3D models on different LOD follow the standard definition of the City Geography Markup Language (CityGML) described in (Kolbe et al., 2005). CityGML defines five LOD for multi-scale modeling: LOD0 – regional model consisting of the 2.5D Digital Terrain Model (DTM), LOD1 – building block model without roof structures, LOD2 – building model including roof structures, LOD3 – building model including architectural details, LOD4 – building model including the interior. Algorithms for producing the first three levels of detail are explained in this chapter. According to the above categorization, the first LOD corresponds to the digital terrain model (DTM). The non-ground regions are filtered using geodesic reconstruction to produce the DTM from the LiDAR DSM (Arefi and Hahn, 2005; Arefi et al., 2007b). LOD1 consists of a 3D representation of buildings using prismatic models, i.e., the building roof is approximated by a horizontal plane. Two techniques are implemented for the approximation of the detected building outline: hierarchical fitting of Minimum Bounding Rectangles, and RANSAC-based straight line fitting and merging (Arefi et al., 2007a). For the third level of detail (LOD2), a projection-based approach is proposed, resulting in a building model with roof structures. The algorithm is fast because 2D data are analyzed instead of 3D data, i.e., lines are extracted rather than planes. The algorithm begins with extracting the building ridge lines, which are thought to represent building parts. According to the location and orientation of each ridge line, one parametric model is generated. The models of the building parts are merged to form the overall building model.
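The Minimum Bounding Rectangle fitting mentioned for LOD1 relies on the classical minimum-area bounding rectangle. A single-rectangle sketch (without the hierarchical decomposition or the RANSAC alternative) follows; it uses the standard result that the optimal rectangle shares an orientation with a convex hull edge:

```python
import numpy as np

def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, points))
    def build(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    return np.array(build(pts)[:-1] + build(reversed(pts))[:-1])

def min_bounding_rect(points):
    """Minimum-area bounding rectangle via hull-edge orientations."""
    hull = convex_hull(points)
    best_area, best_corners = np.inf, None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        theta = np.arctan2(edge[1], edge[0])
        c, s = np.cos(-theta), np.sin(-theta)
        R = np.array([[c, -s], [s, c]])        # rotation by -theta
        rot = hull @ R.T                       # hull in edge-aligned frame
        (x0, y0), (x1, y1) = rot.min(axis=0), rot.max(axis=0)
        area = (x1 - x0) * (y1 - y0)
        if area < best_area:
            corners = np.array([[x0, y0], [x1, y0],
                                [x1, y1], [x0, y1]]) @ R  # rotate back
            best_area, best_corners = area, corners
    return best_area, best_corners

# a 4 x 2 rectangle rotated by 30 degrees: the MBR area is still 8
t = np.deg2rad(30)
Rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
rect = np.array([[0, 0], [4, 0], [4, 2], [0, 2]], dtype=float) @ Rot.T
area, corners = min_bounding_rect(rect)
print(area)
```

The hierarchical variant would recursively fit such rectangles to the residual parts of the outline not covered by the first rectangle.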
The façade consists of planes, intersection lines (ridges), edges (façade boundaries), and the corresponding vertices. These features are reconstructed in this section. Most reconstruction approaches follow the strategy of fitting planes to the segmented points and using some distance metric to identify adjacent segments. Instead of fitting planes to the segmented points, we adopt a different strategy: the surfaces are first classified as flat or curved by performing a slope analysis inside each segmented cluster, i.e., flat surfaces possess a constant slope while curved surfaces show a gradually varying slope; the façade footprint is then estimated using the weighted least squares method.
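The weighted least squares footprint estimation can be illustrated with a simple weighted line fit; the actual weighting scheme applied to the façade points is not specified above, so the weights here are a placeholder assumption:

```python
import numpy as np

def wls_line(x, y, w):
    """Weighted least-squares fit of the line y = a*x + b.

    Solves the normal equations (A^T W A) c = A^T W y, where W holds the
    per-point weights (illustrative stand-in for the footprint step).
    """
    W = np.diag(np.asarray(w, dtype=float))
    A = np.column_stack([x, np.ones_like(x)])
    a, b = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return a, b

# three points on y = 2x + 1 plus one outlier that is down-weighted to zero
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 100.0])
w = np.array([1.0, 1.0, 1.0, 0.0])
print(wls_line(x, y, w))  # → (2.0, 1.0)
```

In practice, the weights could be derived from the scatterer quality of each point, so that unreliable façade points contribute less to the footprint.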
Modern spaceborne SAR sensors such as TerraSAR-X and COSMO-SkyMed can deliver meter-resolution data that fits well to the inherent spatial scales of buildings. This very high resolution (VHR) data is therefore particularly suited for detailed urban mapping. In particular, using stacked VHR SAR images, advanced multi-pass interferometric techniques such as tomographic SAR inversion (TomoSAR) allow retrieving not only the 3D geometrical shape but also the undergoing temporal motion of individual buildings and urban infrastructures. The resulting 4D point clouds have a point (scatterer) density that is comparable to LiDAR. For example, experiments using TerraSAR-X high-resolution spotlight data stacks show that the scatterer density retrieved using TomoSAR is on the order of 1 million pts/km². Object reconstruction from these high quality TomoSAR point clouds can greatly support the reconstruction of dynamic city models that could potentially be used to monitor and visualize the dynamics of urban infrastructure at a very high level of detail. Motivated by this, we presented the very first results of façade reconstruction from single-view (ascending stack) and multi-view (fused ascending and
In this paper, we present an approach for the detection and reconstruction of building facades from these unstructured TomoSAR point clouds. It consists of three main steps: facade detection and extraction, segmentation, and reconstruction. First, the facade region is extracted by analyzing the density of the points projected onto the ground plane; the extracted facade points are then clustered into segments corresponding to individual facades by means of orientation analysis, and the surface (flat or curved) model parameters of the segmented building facades are further estimated. Furthermore, we refine the elevation estimate of each raw TomoSAR point by using its more accurate azimuth and range coordinates and the corresponding reconstructed surface model of the facade. The proposed approach is illustrated and validated by examples using TomoSAR point clouds generated from stacks of TerraSAR-X high-resolution spotlight images from two viewing angles, i.e., both ascending and descending orbits.
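The first step, extracting facade candidates from the density of ground-plane projections, can be sketched as follows; the cell size and density threshold are assumed values, not the ones used in the paper:

```python
import numpy as np

def facade_candidate_cells(points_xyz, cell=1.0, min_count=5):
    """Project 3-D points onto the x-y ground plane, count points per
    grid cell, and threshold the density. Cells with many vertically
    stacked points are facade candidates (cell size / threshold are
    assumed parameters).
    """
    xy = np.asarray(points_xyz, dtype=float)[:, :2]
    ij = np.floor(xy / cell).astype(int)
    ij -= ij.min(axis=0)                       # shift indices to start at 0
    counts = np.zeros(ij.max(axis=0) + 1, dtype=int)
    np.add.at(counts, (ij[:, 0], ij[:, 1]), 1)  # unbuffered accumulation
    return counts >= min_count

# ten points stacked at one x-y location (a facade) plus two scattered points
pts = [(0.5, 0.5, float(z)) for z in range(10)] + [(2.5, 0.5, 0.0),
                                                   (4.5, 2.5, 0.0)]
mask = facade_candidate_cells(pts, cell=1.0, min_count=5)
print(int(mask.sum()))  # 1
```

The subsequent orientation analysis would then operate only on the points falling into the masked cells.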
Spaceborne meter-resolution SAR data, together with multi-pass InSAR techniques including persistent scatterer interferometry (PSI) and tomographic SAR inversion (TomoSAR), allow us to reconstruct the shape and undergoing motion of individual buildings and urban infrastructures. TomoSAR in particular offers tremendous improvement in the detailed reconstruction and monitoring of urban areas, especially man-made infrastructures. The rich scatterer information retrieved from multiple incidence angles by TomoSAR enables us to generate 4D point clouds of the illuminated area with a point density comparable to LiDAR. These point clouds can potentially be used for building façade reconstruction in urban environments from space, with a few considerations: 1) the side-looking SAR geometry enables TomoSAR point clouds to possess rich façade information; 2) temporally incoherent objects, e.g. trees, cannot be reconstructed from multi-pass spaceborne SAR image stacks; 3) the TomoSAR point clouds have a moderate 3D positioning accuracy on the order of 1 m, while (airborne) LiDAR typically provides accuracy on the order of 0.1 m.