Three-dimensional building models are important in various applications such as disaster management and urban planning. In this paper, a method based on the fusion of LiDAR point cloud and aerial image data sources is proposed. First, using a 2D map, the point set relevant to each building is separated from the overall LiDAR point cloud. In the next step, the mean shift clustering algorithm is applied to the points of each building in the feature space. The segmentation stage ends with the separation of parallel and coplanar segments. Then, using the adjacency matrix, adjacent segments are intersected and inner vertices are determined. In the image space, the area of each building is cropped and the mean shift algorithm is applied to it. The lines of the roof's outline edges are then extracted by the Hough transform, and the points obtained from the intersection of these lines are transformed to the ground space. Finally, the reconstruction is performed by integrating the structural points of the intersected adjacent facets with the points transformed from the image space. To evaluate the efficiency of the proposed method, buildings with different shapes and levels of complexity were selected and the resulting 3D models were evaluated. The results show credible performance of the method for different buildings.
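The feature-space clustering step described above can be illustrated with a minimal flat-kernel mean-shift implementation. This is a generic sketch only: the bandwidth value and the two-dimensional toy features are assumptions, not the paper's actual feature space.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=50):
    """Shift every point toward the mean of its flat-kernel neighbourhood;
    points whose shifted positions converge to the same mode form one cluster."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        for i, p in enumerate(modes):
            dist = np.linalg.norm(points - p, axis=1)
            neighbours = points[dist < bandwidth]
            modes[i] = neighbours.mean(axis=0)
    # group points whose modes coincide (within half the bandwidth)
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

Two well-separated point groups then receive two distinct labels, which is the behaviour the segmentation stage relies on.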
The approach starts with small point clouds, each containing one building at a time, extracted from the laser scanner data set by applying a pre-segmentation scheme. The laser scanner point cloud of each building is analysed separately. A 2.5D Delaunay triangle mesh structure (TIN) is calculated from the laser scanner point cloud. For each triangle, the orientation parameters in space (orientation, slope and perpendicular distance to the barycentre of the laser scanner point cloud) are determined and mapped into a parameter space. As buildings are composed of planar features (primitives), triangles representing these features should group in parameter space. A cluster analysis technique is utilised to find and outline these groups/clusters. The clusters found in parameter space represent planar objects in object space. Grouping adjacent triangles in object space – which represent points in parameter space – enables the interpolation of planes from the ALS points that form the triangles. For each cluster point group, a plane in object space is interpolated. All planes derived from the data set are intersected with their appropriate neighbours. From this, a roof topology is established, which describes the shape of the roof and ensures that each plane has knowledge of its direct adjacent neighbours. Walls are added to the intersected roof planes, and the virtual 3D building model is written to a file in VRML (Virtual Reality Modeling Language).
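The per-cluster plane interpolation can be illustrated with a least-squares plane fit via SVD. This is a generic sketch; the cited approach's exact interpolation scheme may differ.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a 3D point set: returns (centroid, unit normal).
    The normal is the right singular vector of the centred points that
    corresponds to the smallest singular value."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```

For points sampled from z = 2x + 3y, the recovered normal is parallel to (2, 3, -1), up to sign.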
of the building footprint, i.e., part of the building roof region could not be reconstructed due to unavailability of points. The presence of type-II facades implicitly validates this plausible phenomenon, and therefore, fusion of refined polygons by fully incorporating the reconstructed facades (of type II only) results in improved overall accuracy of reconstruction. Doing the same for type-I facades, on the other hand, may affect the footprint polygon in the presence of facades belonging to inner building structures. Thus, only the orientation of type-I facades is essentially incorporated by the proposed procedure (steps 8 and 9 in Table II). In addition, steps 12–15 in Table II also pose a condition C1 for type-I facades such that they do not take part in the fusion process if the change in area of the polygon after incorporating the particular facade is greater than a certain fraction a_f (fixed to 0.15 in this work) of the previous polygonal area. Thus, using condition C1 together with the method of type-I facade fusion, it is ensured that facades belonging to the inner structures of the building do not interfere during the fusion procedure; in other words, only facades that are exterior and define the building outlines are utilized.
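Condition C1 amounts to a simple relative-area test on the footprint polygon before and after incorporating a facade. A minimal sketch (function names are illustrative, not from the paper):

```python
def polygon_area(poly):
    """Shoelace area of a closed polygon given as [(x, y), ...]."""
    a = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def passes_c1(poly_before, poly_after, a_f=0.15):
    """Condition C1: accept the facade only if the fused polygon's area
    changed by at most the fraction a_f of the previous polygonal area."""
    a0, a1 = polygon_area(poly_before), polygon_area(poly_after)
    return abs(a1 - a0) <= a_f * a0
```

A facade that doubles the polygon's area would thus be rejected, while a small boundary refinement is accepted.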
Fig. 5 and Fig. 6 show normal building change examples with DSMs of high accuracy. The sizes of these two test regions are 450×700 m² and 1000×400 m², respectively. In Fig. 5, some seasonal changes are visible. The generated DSMs are displayed in Fig. 5c and 5d. The second test region (Fig. 6) shows much larger buildings, and these buildings are well separated from each other. The third test region consists of two images with a size of 160×340 pixels. This region is characterised by small buildings (Fig. 7). It has to be mentioned that the largest building, with a dark-coloured roof, does not have the correct height in the first DSM, as shown in Fig. 7c. This test region was especially selected to prove the robustness of our fusion models. The image size of the fourth test region is 1600×1600 pixels, corresponding to 640,000 m². It mainly contains large buildings with complex roof shapes. From 2005 to 2010, besides newly constructed buildings, there are also rebuilt/demolished buildings. In particular, many roofs have been renovated with another material. Without height information, it is very difficult to separate the newly constructed buildings from other kinds of changes.
Different platforms and sensors are used to derive 3D models of urban scenes. 3D reconstruction from satellite and aerial images is used to derive sparse models mainly showing ground and roof surfaces of entire cities. In contrast to such sparse models, 3D reconstructions from UAV or ground images are much denser and show building facades and street furniture such as traffic signs and garbage bins. Furthermore, point clouds may also be acquired with LiDAR sensors. The resulting point clouds differ not only in their viewpoints, but also in their scales and point densities. Consequently, the fusion of such heterogeneous point clouds is highly challenging. Regarding urban scenes, another challenge is the occurrence of only a few parallel planes, which makes it difficult to find the correct rotation parameters. We discuss the limitations of the general fusion methodology based on an initial alignment step followed by a local coregistration using ICP, and present strategies to overcome them.
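The local coregistration mentioned above rests on rigid alignment. A single Kabsch/SVD alignment step, assuming known point correspondences, can be sketched as follows; a full ICP loop would re-estimate correspondences (e.g. by nearest neighbours) before each such step:

```python
import numpy as np

def kabsch_align(src, dst):
    """One rigid-alignment step: find rotation R and translation t that
    minimize ||R @ src_i + t - dst_i|| over corresponding point pairs."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    h = (src - cs).T @ (dst - cd)            # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = cd - r @ cs
    return r, t
```

With exact correspondences this recovers the rigid transform in one step; the difficulty discussed in the text arises precisely when correspondences and rotation are ambiguous, e.g. with few non-parallel planes.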
Many approaches have been reported over the last decades. Sophisticated classification algorithms, e.g., support vector machines (SVM) and random forests (RF), data modeling methods, e.g., hierarchical models, and graphical models such as conditional random fields (CRF), are well studied. Overviews are given in (Schindler, 2012) and (Vosselman, 2013). (Guo et al., 2011) present an urban scene classification on airborne LiDAR and multispectral imagery studying the relevance of different features of multi-source data. An RF classifier is employed for feature evaluation. (Niemeyer et al., 2013) propose a contextual classification of airborne LiDAR point clouds. An RF classifier is integrated into a CRF model and multi-scale features are employed. Recent work includes (Schmidt et al., 2014), in which full waveform LiDAR is used to classify a mixed area of land and water body. Again, a framework combining RF and CRF is employed for classification and feature analysis. (Hoberg et al., 2015) present a multi-scale classification of satellite imagery also based on a CRF model and extend the latter to multi-temporal classification. Concerning the use of more detailed 3D geometry, (Zhang et al., 2014) present roof type classification based on aerial LiDAR point clouds.
3D building reconstruction from point clouds is an active research topic in remote sensing, photogrammetry and computer vision. Most prior research has been done on 3D building reconstruction from LiDAR data, i.e., high-resolution and dense data. The interest of this work is 3D building reconstruction from Digital Surface Models (DSM) obtained by stereo matching of spaceborne satellite imagery, which covers larger areas than LiDAR datasets in one data acquisition step and can also be used for remote regions. The challenging problem is the noise of this data, caused by the low resolution and by matching errors. In this paper, a combined top-down and bottom-up method is developed to find building roof models which exhibit the optimum fit to the point clouds of the DSM. In the bottom-up step of this hybrid method, the building mask and roof components such as ridge lines are extracted. In addition, to reduce the computational complexity and the search space, roofs are classified into pitched and flat roofs. Ridge lines are utilized to estimate the parameters of roof primitives from a building library, such as width, length, position and orientation. Thereafter, a top-down approach based on Markov Chain Monte Carlo and simulated annealing is applied to optimize the roof parameters in an iterative manner by stochastic sampling, minimizing the average Euclidean distance between point cloud and model surface as the fitness function. Experiments are performed on two areas of the city of Munich which include three roof types (hipped, gable and flat roofs). The results show the efficiency of this method even for this type of noisy dataset.
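The stochastic optimization step can be illustrated with a simulated-annealing sketch for a single roof parameter, using the average distance between point heights and a flat-roof model as the fitness function. This is a toy example under strong simplifications; the described method optimizes full roof primitives, not one scalar parameter:

```python
import math
import random

def mean_abs_residual(heights, h):
    """Fitness: average distance between point heights and a flat-roof plane h."""
    return sum(abs(z - h) for z in heights) / len(heights)

def anneal_flat_roof(heights, t0=1.0, cooling=0.97, steps=300, seed=0):
    """Simulated annealing: perturb the roof height stochastically and
    accept worse states with probability exp(-dE / T) while T cools."""
    rng = random.Random(seed)
    h = sum(heights) / len(heights)     # initialise from the data
    e = mean_abs_residual(heights, h)
    h_best, e_best = h, e
    t = t0
    for _ in range(steps):
        h_new = h + rng.gauss(0.0, 0.3)
        e_new = mean_abs_residual(heights, h_new)
        if e_new < e or rng.random() < math.exp((e - e_new) / t):
            h, e = h_new, e_new
            if e < e_best:
                h_best, e_best = h, e
        t *= cooling
    return h_best, e_best
```

The acceptance rule lets the sampler escape local minima early on, while cooling makes the search increasingly greedy, mirroring the MCMC/annealing scheme described in the text.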
As the results summarized in Section 4 show, the DSen2-CR network is generally capable of removing clouds from Sentinel-2 imagery. This is not limited to a purely visual RGB representation of the declouded input image, but includes the reconstruction of the whole pixel spectrum with an average normalized RMSE between 2% and 20%, depending on the band. It should be noted, however, that the worst reconstruction results are achieved for the 60 m bands, which are not meant to observe the surface of the Earth, but rather the atmosphere: B10, which shows the worst normalized RMSE values, is dedicated to measuring cirrus clouds at a short-wave infrared wavelength; B9 is dedicated to measuring water vapor, and B1 is supposed to deliver information about coastal aerosols (cf. Fig. 11). Since the auxiliary SAR image uses a C-band signal with a much longer wavelength, it is not affected by those atmospheric parameters at all and just provides information about the geometrical structure of the Earth's surface. This, of course, distorts the reconstruction of the atmosphere-related Sentinel-2 bands, as can be seen in Fig. 11. However, most classical Earth observation tasks that benefit from a cloud-removal pre-processing step do not employ those bands anyway and restrict their analyses to the 10 m and 20 m bands, which provide actual measurements of the Earth's surface. Thus, the inclusion of the auxiliary SAR image can definitely be deemed helpful, which is also confirmed by the numerical results listed in Table 1 and the qualitative examples shown in Fig. 6: the overall best result with respect to pure numbers is achieved when the classic loss T and SAR-optical data fusion are used. The new cloud-adaptive loss CARL, however, leads to much better retention of the original input and introduces fewer image translation artifacts, which are usually caused by training on images with a temporal offset.
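The per-band normalized RMSE figures quoted above can in principle be computed as follows. Note that normalizing by the target band's value range is an assumption of this sketch; the paper may use a different normalization convention:

```python
import numpy as np

def normalized_rmse(pred, target):
    """Per-band RMSE normalized by the target band's value range.
    pred/target: arrays of shape (bands, height, width)."""
    err = np.sqrt(((pred - target) ** 2).mean(axis=(1, 2)))
    rng = target.max(axis=(1, 2)) - target.min(axis=(1, 2))
    return err / rng
```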
In summary, the combination of SAR-optical data fusion and the cloud-adaptive loss CARL provides the results that generalize best to different situations, and also provides reliable cloud removal both for rather thick clouds and for vegetated areas which exhibit phenological changes. In the worst case, i.e. when the scene is comprised of complex patterns and the cloud cover is optically very thick, the network fails to provide a detailed and fully accurate reconstruction (cf. the urban example in Fig. 6). It has to be stressed again, however, that the dataset used for training the DSen2-CR model is globally sampled, which means that the network needs to learn a highly complex mapping from SAR to optical imagery for virtually every existing land cover type. By restricting the dataset or fine-tuning the model to a specific region or land cover type, it is expected that the SAR-to-optical translation results would improve significantly.
Currently, we are living in the "golden era of Earth observation", characterized by an abundance of airborne and spaceborne sensors providing a large variety of remote sensing data. In this situation, every sensor type possesses different peculiarities, designed for specific tasks. One prominent example is the German interferometric SAR mission TanDEM-X, whose task is the generation of a global digital elevation model (Krieger et al., 2007). In order to do so, highly coherent InSAR image pairs acquired by the two satellites of this mission are needed for every region of interest. The same holds for optical stereo sensors such as RapidEye, which additionally require cloudless weather and daylight during image acquisition (Tyc et al., 2005). Eventually, this means that there is a huge amount of data in the archives whose potential possibly remains largely unused, because information can currently only be extracted within those narrowly defined mission-specific configurations. If, e.g., a second coherent SAR acquisition is not (yet) available, or if one image of an optical stereo pair is obstructed by severe cloud coverage, the mission goal – topography reconstruction – currently cannot be fulfilled.
Fig. 1: Different levels of building models in CityGML

Figure 1 displays different levels of detail as defined in the CityGML standard. As shown in this figure, the lowest level of detail (LOD) considered here is LOD1, which simply models buildings as boxes without considering any details. Thus, for LOD1 only the outlines of buildings along with height information are required. Since the output of SAR-optical stereogrammetry is a relatively noisy and sparse point cloud, the inclusion of auxiliary data is advised: we thus chose to extract the building footprints from the OpenStreetMap building layer, while the height information is derived from the output of SAR-optical stereogrammetry. This is done by assigning the median height of all points found within a building outline to the footprint and thus extruding the footprint along the height axis.
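The median-height extrusion can be sketched as a point-in-polygon test followed by a median. This is a minimal illustration; production code would typically rely on a geometry library rather than a hand-written ray-casting test:

```python
import numpy as np

def point_in_polygon(x, y, poly):
    """Ray-casting test for a 2D point against a polygon [(x, y), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def footprint_height(points, footprint):
    """Median height of all points inside the footprint, i.e. the value
    used to extrude the outline into a LOD1 box."""
    zs = [z for x, y, z in points if point_in_polygon(x, y, footprint)]
    return float(np.median(zs)) if zs else None
```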
In this paper we have presented an automatic (parametric) approach for façade reconstruction using TomoSAR point clouds. It consists of three main steps: façade extraction, segmentation and reconstruction. In our experiments, we rely on the assumption of having a high number of scatterers on the building façades. In most cases, the assumption is valid because of the existence of strong corner reflectors, e.g. window frames, on the building façades. However, there are exceptional cases: 1) the façade structure is smooth, i.e., only very few scatterers can be detected on the façades; 2) the building is low. In these cases, SD might not be the optimum choice. Alternatively, we can use other scatterer characteristics such as intensity (brightness) and SNR for extraction and reconstruction purposes. In the future, we will also concentrate on object-based fusion of TomoSAR point clouds, building roof reconstruction and automatic object reconstruction for large areas.
To put the previous experiment into perspective and further evaluate the factors benefiting the robust reconstruction of cloud-covered information, we conduct an ablation study. Specifically, we investigate the effectiveness of the novel cloud detection mechanism explained in Section II-A and the local cloud-sensitive loss introduced in Section II-C. For this purpose, we retrain the model ours-0, as described in Section II, but omit the cloud-sensitive terms by fixing the values of all pixels in the cloud probability masks m to 1.0. The effect of this is that the ablated model is no longer encouraged to minimize the changes to areas free of cloud coverage, thus potentially resulting in unneeded changes. As additional baselines, we evaluate the goodness of simply using the S1 observations (VV- or VH-polarized), as well as cloud-covered S2 images, as predictions and comparing them against their cloud-free counterparts. Table III reports the declouding performance of the baseline models and our models (0% and 100% paired data from Table II). Our network trained with 100% paired data performs best in terms of precision and F1 score. The raw S1 and S2 observations perform relatively poorly, except for the cloudy optical images scoring high on image diversity due to random cloud coverage. While it may be useful to consider the raw data as baselines, it is necessary to keep in mind that modalities such as SAR may be at a disadvantage when directly compared against the cloud-free optical target images.
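One plausible form of such a mask-weighted, cloud-sensitive loss is sketched below. The exact formulation is the one given in the cited sections; this sketch only illustrates why fixing the mask m to 1.0 removes the identity-preserving term, which is the essence of the ablation:

```python
import numpy as np

def cloud_sensitive_loss(pred, target, cloudy_input, m):
    """Mask-weighted L1 (assumed form): cloudy pixels (m ~ 1) are pulled
    toward the cloud-free target, clear pixels (m ~ 0) toward the original
    input, discouraging unneeded changes. With m fixed to all ones, the
    identity-preserving term vanishes, reproducing the ablated model."""
    return float(np.mean(m * np.abs(pred - target)
                         + (1.0 - m) * np.abs(pred - cloudy_input)))
```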
In contrast to model-driven approaches, data-driven approaches with high-density point clouds perform well in complex scenarios by recognizing adjacent planar faces and their relations (e.g. ridges and step edges) to achieve topologically and geometrically correct 3D building models, as explained in Section 2.3. The main drawback of data-driven approaches, however, is their sensitivity to the incompleteness of data arising from occlusion, data gaps or vegetation clutter. Similarly, the intersection of best-fitted planes, for instance four planes, may create more than one point, which yields extra short edges leading to an erroneous topology. A thorough discussion of the problems relevant to building reconstruction schemes, including effects due to data and scene complexities, is given in Oude Elberink (2008). In the reconstruction, defining hypotheses about the building shape is the major difficulty, as it has to be automated. Assumptions that can be defined based on the data or on general knowledge about the scene, of course, help in this regard. Sometimes, constraints may conflict with the actual scene: for instance, in the regularization of roof outlines, orthogonality and parallelism constraints may conflict with the fact that the main direction of the building does not always follow the assumed direction. Regarding topology representation, some approaches based on roof topology graphs (RTG) have been introduced (Sampath and Shan, 2010). They may be capable of handling more complicated scenes with a higher level of topology preservation. Some methods relying on RTG also use pre-defined primitive shapes and, consequently, library graphs or target graphs (Verma et al., 2006). Although this would be a solution for data gaps, it would be questionable when the targets are unable to represent a scene. Accordingly, the data-driven methods based on segmented roof planes can be categorized into two groups: RTG-based and non-RTG-based methods.
This paper presents a novel workflow for data-driven building reconstruction from Light Detection and Ranging (LiDAR) point clouds. The method comprises building extraction, a detailed roof segmentation using region growing with adaptive thresholds, segment boundary creation, and a structural 3D building reconstruction approach using adaptive 2.5D Dual Contouring. First, a 2D grid is overlain on the segmented point cloud. Second, in each grid cell 3D vertices of the building model are estimated from the corresponding LiDAR points. Then, the number of 3D vertices is reduced in a quad-tree collapsing procedure, and the remaining vertices are connected according to their adjacency in the grid. Roof segments are represented by a Triangular Irregular Network (TIN) and are connected to each other by common vertices or – at height discrepancies – by vertical walls. The resulting 3D building models show a very high accuracy and level of detail, including roof superstructures such as dormers. The workflow is tested and evaluated on two data sets, using the evaluation method and test data of the "ISPRS Test Project on Urban Classification and 3D Building Reconstruction" (Rottensteiner et al., 2012). Results show that the proposed method is comparable with state-of-the-art approaches, and outperforms them regarding undersegmentation and completeness of the scene reconstruction.
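The first two steps, overlaying a 2D grid and estimating one model vertex per occupied cell, can be sketched as follows. Using the cell's mean point as the vertex estimate is an assumption of this sketch; the actual estimator in the workflow may differ:

```python
import numpy as np

def grid_vertices(points, cell_size):
    """Overlay a 2D grid on the segmented point cloud (x, y, z rows) and
    estimate one 3D model vertex per occupied cell (here: the cell mean)."""
    keys = np.floor(points[:, :2] / cell_size).astype(int)
    cells = {}
    for k, p in zip(map(tuple, keys), points):
        cells.setdefault(k, []).append(p)
    return {k: np.mean(v, axis=0) for k, v in cells.items()}
```

The subsequent quad-tree collapse would then merge neighbouring cells whose vertices are redundant, before connecting the survivors by grid adjacency.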
In this chapter a new method is proposed for generating 3D building models at different levels of detail (LOD). The proposed workflow is presented in Figure 6.1. The 3D models at different LOD follow the standard definition of the City Geography Markup Language (CityGML) described in (Kolbe et al., 2005). CityGML defines five LOD for multi-scale modeling: LOD0 – regional model consisting of the 2.5D Digital Terrain Model (DTM); LOD1 – building block model without roof structures; LOD2 – building model including roof structures; LOD3 – building model including architectural details; LOD4 – building model including the interior. Algorithms for producing the first three levels are explained in this chapter. According to the above categorization, the first LOD corresponds to the digital terrain model (DTM). The non-ground regions are filtered using geodesic reconstruction to produce the DTM from the LIDAR DSM (Arefi and Hahn, 2005; Arefi et al., 2007b). LOD1 consists of a 3D representation of buildings using prismatic models, i.e., the building roof is approximated by a horizontal plane. Two techniques are implemented for the approximation of the detected building outline: hierarchical fitting of Minimum Bounding Rectangles, and RANSAC-based straight line fitting and merging (Arefi et al., 2007a). For the third level of detail (LOD2), a projection-based approach is proposed, resulting in a building model with roof structures. The algorithm is fast because 2D data are analyzed instead of 3D data, i.e., lines are extracted rather than planes. The algorithm begins with extracting the building ridge lines, each thought to represent a building part. According to the location and orientation of each ridge line, one parametric model is generated. The models of the building parts are merged to form the overall building model.
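The Minimum Bounding Rectangle fitting used for the outline approximation can be approximated with a brute-force orientation sweep: rotate the outline points through candidate angles and keep the axis-aligned box with the smallest area. This is a sketch, not the hierarchical procedure of the cited work:

```python
import numpy as np

def min_bounding_rect_area(points, n_angles=180):
    """Brute-force MBR sketch: sweep orientations in [0, 90 deg) and return
    the smallest axis-aligned bounding-box area over all rotations."""
    best = np.inf
    for a in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(a), np.sin(a)
        r = points @ np.array([[c, -s], [s, c]])   # rotate candidate frame
        extent = r.max(axis=0) - r.min(axis=0)
        best = min(best, extent[0] * extent[1])
    return best
```

An exact MBR would use rotating calipers on the convex hull, but the sweep suffices to illustrate the idea.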
Reconstruction: The coarse outline of an individual roof segment is then reconstructed using the alpha shapes algorithm. Due to the varying and lower point density of TomoSAR points, alpha shapes however only define a coarse outline of an individual building, which is usually rough and therefore needs to be refined/smoothed (or generalized). To this end, taking into account the average roof polygon complexity (APC), a regularization scheme based on either model fitting (i.e., minimum bounding ellipse/rectangle) or a quadtree is adopted to simplify the roof polygons obtained around each segmented (or distinct) roof segment. The simplified roof polygons are then tested for zig-zag line removal using the Visvalingam–Whyatt algorithm. Finally, a height is associated with each regularized roof segment to obtain the 3-D prismatic model of individual buildings. The proposed approach is illustrated and validated on scenes containing two large buildings in the city of Las Vegas, using TomoSAR point clouds generated from a stack of 25 images with the Tomo-GENESIS software developed at the German Aerospace Center (DLR).
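The zig-zag removal step builds on the Visvalingam–Whyatt idea of discarding vertices that span small "effective" triangles. A minimal implementation of that idea (the area threshold is an assumed parameter):

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by 2D points a, b, c."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(line, min_area):
    """Iteratively remove the interior vertex spanning the smallest
    effective triangle until every remaining triangle exceeds min_area."""
    pts = list(line)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        i_min = min(range(len(areas)), key=areas.__getitem__)
        if areas[i_min] >= min_area:
            break
        del pts[i_min + 1]
    return pts
```

A near-collinear zig-zag vertex is dropped first, while genuine corners survive.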
In a next step, the topology between the planes has to be recovered. For that reason, the line segments are used again. All segments are features with a unique ID. When a plane is fitted to a significant cluster of segments, all respective IDs are linked to this particular plane. A segment which two planes have in common represents a link between those planes. A visualization of these linking segments can be found in Figure 3. In terms of topology, a building can now be described by planes and edges. With the help of the planes' geometrical information, namely the normal vectors, and the edges' topological information, nodes are reconstructed by intersecting three adjacent planes. If more than three planes belong to a node, the point coordinate is recovered by intersecting triples of planes several times and using the mean of the resulting intersection points. If two adjacent planes do not have another plane in common, it may be the case that a 4-node is needed to connect four adjacent planes, which occurs twice on the synthetic building. Therefore, the search is extended in such a way that two adjacent planes are associated with two other adjacent planes if they share common edge relations. The mean coordinate of all possible intersection points is the 4-node. The reconstructed building basically looks like Figure 2 (bottom). More details on the differences of the results can be found in the next section.
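Node reconstruction by intersecting three adjacent planes, and averaging over plane triples for higher-order nodes, can be sketched as follows (plane representation n · x = d is an assumption of this sketch):

```python
import numpy as np
from itertools import combinations

def intersect_three_planes(planes):
    """Each plane is (normal, d) with n . x = d; the common node is the
    solution of the 3x3 linear system stacking the three normals."""
    n = np.array([p[0] for p in planes], dtype=float)
    d = np.array([p[1] for p in planes], dtype=float)
    return np.linalg.solve(n, d)

def node_from_planes(planes):
    """For more than three planes, intersect all triples and average the
    resulting points, as done for the 4-nodes in the text."""
    pts = [intersect_three_planes(t) for t in combinations(planes, 3)]
    return np.mean(pts, axis=0)
```

Near-parallel normals make the system ill-conditioned, which is exactly the situation that produces the extra short edges discussed for intersecting best-fitted planes.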
Furthermore, for LOD1 reconstruction, we will consider two scenarios. The first is to model buildings based on the original footprint layers provided by OSM. The second is to update these building outlines in a pre-processing step. This updating has proved to be helpful because OSM building footprints often consist of several intra-blocks with different heights. As displayed in Figure 1, a building consisting of two blocks, each with a different height level, may appear as one integrated building outline in OSM; thus, only one height value could be assigned to it in a simple LOD1 reconstruction process, while the outline should actually be split into two separate outlines. The result would be that heights that actually lie in two separate clusters are erroneously substituted by their median value, located somewhere in the middle. While this ultimately leads to a significant height bias, modifying the outlines appropriately optimizes the final reconstruction. In this paper, this building modification is performed semi-automatically: candidate outlines are detected by clustering heights. The number of clusters determines the number of height levels and implies potential separate building blocks. This is then verified by visual comparison with open satellite imagery such as that provided by Google Earth. Finally, the individual, newly separated building blocks are reconstructed by assigning separate median height values.
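The height clustering used to detect candidate split outlines can be as simple as gap-based 1D clustering; the gap threshold below is an assumed parameter, not taken from the paper:

```python
def cluster_heights(heights, gap=1.0):
    """Split sorted heights wherever consecutive values differ by more
    than `gap`; each group is one candidate building-block height level."""
    hs = sorted(heights)
    clusters, current = [], [hs[0]]
    for h in hs[1:]:
        if h - current[-1] > gap:
            clusters.append(current)
            current = [h]
        else:
            current.append(h)
    clusters.append(current)
    return clusters
```

Two resulting clusters indicate two height levels, i.e. a footprint that is a candidate for being split into two blocks, each then receiving its own median height.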
Presegmentation typically classifies the LiDAR point cloud into buildings, terrain, and vegetation (including other non-terrain objects and clutter). Presegmentation can be performed in one step or sequentially, by first separating elevated points from those on the ground and then removing vegetation from the remaining data, or vice versa. A popular ground filtering method is to set a height threshold on a Digital Terrain Model (DTM), which can be produced e.g. by morphological filter operations [Morgan and Tempfli, 2000, Zhang et al., 2006, Ameri and Fritsch, 2000]. Other approaches identify planar LiDAR points and create connected components of the latter, assuming the largest connected component to be ground [Verma et al., 2006]. Connected components can also be used for vegetation filtering, assuming connected components of small size [Verma et al., 2006] or of low planarity [Sampath and Shan, 2010] to be vegetation. The sequential process can be inverted; e.g. Sun and Salvaggio first classify vegetation with a graph-cuts method, and then use Euclidean clustering to identify buildings. A one-step scene classification can be achieved e.g. by graph-cut optimization. Lafarge and Mallet define expectation values for buildings, vegetation, ground and clutter by combining different covariance-based measures and height information in an energy optimization term. Dorninger and Pfeifer extract all planar regions of the scene using a region growing segmentation in feature space and group the extracted points into buildings with a mean-shift algorithm. Alternatively, building point clouds can be directly extracted from 2D building footprints, which are available beforehand [Rau and Lin, 2011], or which are provided interactively by user inputs [You et al., 2003].