Abstract—Modern spaceborne synthetic aperture radar (SAR) sensors, such as TerraSAR-X/TanDEM-X and COSMO-SkyMed, can deliver very high resolution (VHR) data beyond the inherent spatial scales of buildings. Processing these VHR data with advanced interferometric techniques, such as SAR tomography (TomoSAR), allows for the generation of four-dimensional point clouds that contain not only the 3-D positions of the scatterers but also estimates of seasonal/temporal deformation on the scale of centimeters or even millimeters, making them very attractive for generating dynamic city models from space. Motivated by these opportunities, the authors earlier proposed approaches that demonstrated first attempts toward the reconstruction of building facades from this class of data. These approaches work well when a high density of facade points exists, and the full shape of the building can be reconstructed if data are available from multiple views, e.g., from both ascending and descending orbits. However, there are cases when no or only few facade points are available. This usually happens for lower buildings and renders the detection of facade points/regions very challenging. Moreover, problems related to the visibility of facades mainly facing the azimuth direction (i.e., facades oriented orthogonally to the flight direction) can also hinder derivation of the complete structure of individual buildings. These problems motivated us to reconstruct the full 2-D/3-D shapes of buildings by exploiting roof points. In this paper, we present a novel, fully data-driven framework for the automatic (parametric) reconstruction of 2-D/3-D building shapes (or footprints) from unstructured TomoSAR point clouds generated from one viewing angle only.
The proposed approach is illustrated and validated by examples using TomoSAR point clouds generated from TerraSAR-X high-resolution spotlight data stacks acquired from ascending orbits over two different test areas: one containing simple, moderate-sized buildings in Las Vegas, USA, and the other containing relatively complex building structures in Berlin, Germany.
With data provided by modern meter-resolution SAR sensors and advanced multi-pass interferometric techniques such as tomographic SAR inversion (TomoSAR), it is now possible to generate 4-D (space-time) point clouds of the illuminated area with a point density of approximately 1 million points/km². Owing to the side-looking geometry, these point clouds exhibit a much higher density of points on building façades, in contrast to the nadir-looking LiDAR geometry typically used for object reconstruction. Moreover, temporally incoherent objects such as trees cannot be reconstructed from multi-pass spaceborne SAR image stacks, and the point clouds provide moderate 3-D positioning accuracy on the order of 1 m, compared to around 0.1 m for airborne LiDAR systems. Despite these special considerations, object reconstruction from these high-quality point clouds can greatly support the generation of dynamic city models that could potentially be used to monitor and visualize the dynamics of urban infrastructure in very high level of detail. Motivated by these opportunities, earlier approaches have been proposed to reconstruct building façades from this class of data. For example, experimental results provided in (Zhu, 2014) and (Shahzad, 2014) over smaller and larger areas demonstrate that façade reconstruction is an appropriate first step to detect and reconstruct building shape when dense points on the façade are available. In particular, when data from multiple views, e.g., from both ascending and descending orbits, are available, the full shape of buildings can be reconstructed using extracted façade points. However, there are cases when no or only few façade points are available.
Recent advances in very high resolution synthetic aperture radar (SAR) imagery and its key attributes, self-illumination and all-weather capability, have attracted the attention of many remote sensing analysts to the characterization of urban environments. Various techniques have been developed that make use of SAR imagery for building detection and reconstruction. Complex building shapes surrounded by roads and other structures make building detection a challenging problem. One possible solution is to discriminate buildings from other objects using building height and width measurements extracted from SAR imagery. The key issue is then building height retrieval, for which various methods have been developed, including sound electromagnetic models, layover or shadow analysis, and simulation-based methods. An approach particularly suited for the detection and extraction of large buildings, based on information acquired from interferometric SAR (InSAR) data, has also been proposed, and stochastic model-based and low-level feature-based approaches have been presented for extracting and reconstructing buildings from a single SAR intensity image. Wang et al. presented an approach for building extraction from high-resolution single-aspect polarimetric SAR data. Since structures in urban areas are densely packed, the appearance of one particular building depends on the viewing angle of the sensor. Using a single-view SAR image, it is difficult to detect buildings that have no orientation component in the sensor's azimuth direction. To overcome this limit, multiview SAR acquisitions are required: approaches for estimating building dimensions from multiview SAR images have been presented, and Bolter and Leberl as well as Thiele et al. proposed methods for building reconstruction based on multiview InSAR data.
Building reconstruction in the context of stereoscopic (radargrammetric) and multiview polarimetric SAR acquisitions has also been investigated.
[Matei et al., 2008]. Sun and Salvaggio create segment boundaries by overlaying a 2D grid on their segmented point cloud: each grid edge connecting an empty and an occupied grid cell is chosen as a border edge. Very similarly, Zhou and Neumann define boundaries by tracing the LiDAR points closest to those edges. Rottensteiner defines separation boundary lines between adjacent segments from the Delaunay triangulation: differently segmented points connected by triangulation edges are boundary points, and the corresponding Voronoi edges form the boundary. Dorninger and Pfeifer, Kada and Wichmann, and Sampath and Shan use a modified convex hull approach called alpha shapes, in which each next boundary vertex is determined only from the local neighborhood of the previous vertex. If the local neighborhood is determined by a fixed radius, alpha shapes produce satisfactory results only if the point density is regular. Therefore, Sampath and Shan define the neighborhood with a rectangle whose extents and orientation depend on the along-track and across-track LiDAR sampling characteristics. [Wang and Shan, 2009] identify unconnected boundary points by creating the convex hull of each point's local neighborhood: if the point is a vertex of this convex hull, it is chosen as a building boundary vertex. Lafarge and Mallet determine each boundary point based on its distance to the line fitted through its neighborhood.
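The local-convex-hull boundary test attributed to [Wang and Shan, 2009] above can be sketched in a few lines. The following is a minimal, self-contained illustration (not the authors' implementation); the neighbourhood size `k` is an assumed demo parameter:

```python
# Sketch of the local-convex-hull boundary test: a point is flagged as a
# boundary vertex if it lies on the convex hull of its own k-nearest
# neighbourhood (itself included).
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def boundary_points(points, k=8):
    """Flag each point that is a vertex of its k-NN convex hull."""
    out = []
    for p in points:
        neigh = sorted(points, key=lambda q: math.dist(p, q))[:k + 1]
        if p in convex_hull(neigh):
            out.append(p)
    return out
```

On a regular grid this marks the outer ring of points while leaving interior points unflagged; brute-force neighbour search keeps the sketch short, whereas a real implementation would use a spatial index.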
With improved sensor resolution and advanced multi-pass interferometric techniques such as tomographic SAR inversion (TomoSAR), it is now possible to reconstruct both the shape and the motion of urban infrastructure. These sophisticated techniques not only open up new possibilities to monitor and visualize the dynamics of urban infrastructure in very high level of detail but also allow us to take a step further towards the generation of 4D (space-time) or even higher-dimensional dynamic city models that can potentially incorporate temporal (motion) behaviour along with the 3D information. Motivated by these opportunities, this paper presents a post-processing approach that systematically allows automatic reconstruction of building façades from 4D point clouds generated by tomographic SAR processing, with a particular focus on robust reconstruction of large areas. The approach is modular and starts by extracting façade points via a point-density estimation procedure based on a directional window approach. Segmentation of façades into individual segments is then carried out using an unsupervised clustering procedure combining density-based clustering and the mean-shift algorithm. Subsequently, the points of each façade segment are identified as belonging to a flat or curved surface, and general first- and second-order polynomials are used to model the façade geometry. Finally, the intersection points of adjacent façades, describing the vertex points, are determined to complete the reconstruction process. The proposed approach is illustrated and validated by examples using TomoSAR point clouds over the city of Las Vegas generated from a stack of TerraSAR-X high-resolution spotlight images.
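As a rough illustration of the density-based clustering stage described above (not the paper's exact procedure, and omitting the mean-shift step), a minimal DBSCAN-style sketch in plain Python could look as follows; `eps` and `min_pts` are illustrative values, not the parameters used in the paper:

```python
# Minimal DBSCAN-style density clustering: core points (>= min_pts
# neighbours within eps) seed clusters that grow through other core
# points; low-density points become noise or border points.
import math

def dbscan(points, eps=1.5, min_pts=3):
    """Return a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)          # None = unvisited
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:         # expand only through core points
                queue.extend(nj)
    return labels
```

The brute-force neighbour query makes this quadratic in the number of points; production implementations use a k-d tree or grid index for the range queries.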
This paper presents a novel workflow for data-driven building reconstruction from Light Detection and Ranging (LiDAR) point clouds. The method comprises building extraction, detailed roof segmentation using region growing with adaptive thresholds, segment boundary creation, and a structural 3D building reconstruction approach using adaptive 2.5D Dual Contouring. First, a 2D grid is overlain on the segmented point cloud. Second, in each grid cell, 3D vertices of the building model are estimated from the corresponding LiDAR points. Then, the number of 3D vertices is reduced in a quad-tree collapsing procedure, and the remaining vertices are connected according to their adjacency in the grid. Roof segments are represented by a Triangular Irregular Network (TIN) and are connected to each other by common vertices or, at height discrepancies, by vertical walls. The resulting 3D building models show very high accuracy and level of detail, including roof superstructures such as dormers. The workflow is tested and evaluated on two data sets, using the evaluation method and test data of the “ISPRS Test Project on Urban Classification and 3D Building Reconstruction” (Rottensteiner et al., 2012). Results show that the proposed method is comparable with state-of-the-art approaches and outperforms them regarding undersegmentation and completeness of the scene reconstruction.
algorithms, 3D objects are reconstructed by surface fitting in the segmented building regions. Numerous methods are employed for building roof segmentation and reconstruction, such as unsupervised clustering approaches, region growing algorithms, and graph-based matching techniques. These techniques, however, cannot be directly applied to TomoSAR point clouds due to the different object contents illuminated by the side-looking SAR. In this paper, we propose an approach for building façade detection and reconstruction from TomoSAR point clouds. The proposed approach is illustrated and validated by examples using TomoSAR point clouds generated from a stack of 25 TerraSAR-X high-resolution spotlight images. Our test building is the Bellagio hotel in Las Vegas. Fig. 1(b) shows the TerraSAR-X mean intensity map of the area of interest, while Fig. 1(a) is the corresponding optical image. Fig. 1(c) gives an overview of the input TomoSAR point clouds in UTM coordinates.
world coordinates enable the generation of high-quality TomoSAR point clouds, containing not only the 3D positions of the scatterers but also estimates of seasonal/temporal deformation, that are very attractive for generating 4-D city models from space. However, there are some special considerations associated with these point clouds that are worth mentioning: 1) TomoSAR point clouds deliver moderate 3D positioning accuracy on the order of 1 m; 2) the small number of images and the limited orbit spread render the location error of TomoSAR points highly anisotropic, with an elevation error typically one or two orders of magnitude higher than in range and azimuth; 3) due to the coherent imaging nature, temporally incoherent objects such as trees cannot be reconstructed from multipass spaceborne SAR image stacks; and 4) TomoSAR point clouds possess a much higher density of points on building façades due to the side-looking SAR geometry, enabling systematic reconstruction of building footprints via façade point analysis. As demonstrated over both smaller and larger areas, façade reconstruction turns out to be an appropriate first step to detect and reconstruct building shape from these point clouds when dense points on the façade are available. Especially when data from multiple views, e.g., from both ascending and descending orbits, are available, the full shape of the buildings can be reconstructed using the extracted façade points.
In recent years, with the introduction of LiDAR systems and the use of 3D geo-referenced data, several methods have been proposed for 3D building modelling and reconstruction. The algorithms comprise methods that employ only the LiDAR point cloud for model generation, while others use additional data sources such as aerial or satellite imagery. (Ma, 2006) proposed a methodology for 3D building model reconstruction based on the integration of aerial photographs and LiDAR data. The methodology comprises two elements: the first reconstructs 3D building models from LiDAR data, yielding rough building models; the second refines the rough models with information derived from aerial photographs. Cheng et al. proposed an approach integrating aerial imagery and LiDAR data to reconstruct 3D building models (Cheng, Gong, Li, & Liu, 2011). In this approach, an algorithm for determining the principal orientations of a building was introduced, 3D boundary segments were then determined by incorporating LiDAR data and the 2D segments extracted from images, and a strategy including automatic recovery of lost boundaries was finally used for 3D building model reconstruction. The focus of that study was to improve the quality of building boundaries, not building roofs. (Satari, 2012) proposed a multi-resolution hybrid approach for the reconstruction of building models from LiDAR data. The detection of the main roof planes is obtained
The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing (Shapiro and Stockman). It estimates the parameters of a shape from its points. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm computing the Hough transform. It can be used to detect lines, circles, and other primitive shapes whose parametric equation is known. In principle, it works by mapping every point in the data to a manifold in the parameter space; this manifold describes all possible variants of the parametrized primitive. Simplifying the parametrization or limiting the parameter space speeds up the algorithm. This is especially true for 3D shape detection: detecting a plane via the plane equation ax+by+cz+d=0, for example, requires a 3D Hough space, which quickly becomes expensive in memory and runtime, since all possible planes through every transformed point cloud need to be examined. A plane can also be fitted based on normalized normal vectors using only two of the Euler angles and the distance from the origin, α, β and d. The third Euler angle is not needed, since the rotation about the normal axis is redundant (Hulik et al.).
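The two-angle-plus-distance parametrization (α, β, d) mentioned above can be illustrated with a brute-force voting sketch. This is a minimal demonstration, not the cited implementation; the normal parametrization via spherical angles and the bin counts and step sizes are illustrative choices:

```python
# Hough voting for planes n·p = d, with the unit normal n parametrised by
# two angles: n = (cos a * sin b, sin a * sin b, cos b). Every point votes,
# for each candidate (a, b), for the distance bin containing d = n·p.
import numpy as np

def hough_planes(points, n_angle=18, d_step=0.5, d_max=20.0):
    alphas = np.linspace(0, np.pi, n_angle, endpoint=False)
    betas = np.linspace(0, np.pi, n_angle, endpoint=False)
    n_d = int(2 * d_max / d_step)
    acc = np.zeros((n_angle, n_angle, n_d), dtype=int)
    for p in np.asarray(points, float):
        for ia, a in enumerate(alphas):
            for ib, b in enumerate(betas):
                n = np.array([np.cos(a) * np.sin(b),
                              np.sin(a) * np.sin(b),
                              np.cos(b)])
                d = n @ p
                idx = int((d + d_max) / d_step)
                if 0 <= idx < n_d:
                    acc[ia, ib, idx] += 1
    # strongest accumulator cell -> dominant plane (lower bin edge for d)
    ia, ib, idx = np.unravel_index(acc.argmax(), acc.shape)
    return alphas[ia], betas[ib], idx * d_step - d_max
```

For points on a horizontal plane z = 2, the winning cell has β = 0 (normal pointing along z) and d = 2. The triple loop makes the memory/runtime cost of 3D Hough spaces noted above tangible.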
Moreover, the reconstructed façades could remain either incomplete or break into more than one segment for the following reasons: 1) higher building structures nearby can partly (or fully) occlude the façades of lower buildings; 2) due to the geometrical shape, only very few points are available at some parts of the building façades. The computed vertex points are therefore first categorized into two types: the first type consists of vertices computed from the intersection of two adjacent façades, while the second type consists of the remaining vertices representing “open” endpoints. The reconstructed façades are later refined by inserting additional segments between the broken regions and extending those façades that remain incomplete, by statistically analyzing and matching the local height distribution of the nearest open-endpoint vertices.
Reconstruction: The coarse outline of an individual roof segment is then reconstructed using the alpha shapes algorithm. Due to the varying and lower point density of TomoSAR points, however, alpha shapes define only a coarse outline of an individual building, which is usually rough and therefore needs to be refined/smoothed (or generalized). To this end, taking into account the average roof polygon complexity (APC), a regularization scheme based on either model fitting (i.e., minimum bounding ellipse/rectangle) or a quadtree is adopted to simplify the roof polygons obtained around each segmented (or distinct) roof segment. The simplified roof polygons are then tested for zig-zag line removal using the Visvalingam–Whyatt algorithm. Finally, a height is associated with each regularized roof segment to obtain the 3-D prismatic model of individual buildings. The proposed approach is illustrated and validated over scenes containing two large buildings in the city of Las Vegas using TomoSAR point clouds generated from a stack of 25 images with the Tomo-GENESIS software developed at the German Aerospace Center (DLR).
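The Visvalingam–Whyatt zig-zag removal referred to above repeatedly drops the vertex whose triangle with its two neighbours has the smallest area. A compact sketch of that idea follows; the area threshold is an illustrative choice, not the paper's value:

```python
# Visvalingam-Whyatt simplification: the "effective area" of an interior
# vertex is the area of the triangle it forms with its neighbours; the
# least significant vertex is removed until all areas exceed min_area.
def triangle_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(polyline, min_area=0.5):
    pts = list(polyline)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        i_min = min(range(len(areas)), key=areas.__getitem__)
        if areas[i_min] >= min_area:
            break
        del pts[i_min + 1]       # drop the least-significant interior vertex
    return pts
```

A shallow zig-zag such as `[(0, 0), (1, 0.01), (2, 0), (3, 0.01), (4, 0)]` collapses to its two endpoints with `min_area=0.1`, which is exactly the smoothing effect wanted for the rough alpha-shape outlines.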
shaped footprint could be reconstructed. Also, the approach utilizes roof points to determine the complete shape of the buildings and therefore resolves problems, as mentioned in (Shahzad and Zhu, 2015a), related to the visibility of façades mainly pointing towards the azimuth direction of the SAR sensor. However, a few points still need to be addressed. For instance, the reconstruction accuracy is restricted by the small number of available points and by data gaps in the TomoSAR point cloud. This could be improved by incorporating data from other viewing angles and/or adding more constraints such as parallelism, or by using a model-based approach built on a library of low-level feature sets. Also, we have compared our results to OSM data, which is regularly updated but not yet fully complete; a more accurate ground truth would therefore be needed to assess the exact performance of the approach. Nevertheless, this paper presents the first demonstration of automatic reconstruction of 2-D/3-D building footprints from this class of data. Moreover, the developed methods are not restricted to TomoSAR point clouds but are also applicable to unstructured 3-D point clouds generated by other sensors with a similar configuration (i.e., oblique geometry), with both low and high point densities. In the future, we will explore the potential of extending the algorithm towards the generation of automatically reconstructed, complete, watertight prismatic (or polyhedral) 3-D/4-D building models from space.
A considerable number of studies addresses DSM-assisted building footprint extraction. Some of them fuse data from different sensors, chiefly multispectral images and LiDAR-derived DSMs (Matikainen et al., 2010; Hermosilla et al., 2011; Grigillo and Kanjir, 2012). Those studies, however, rely on different data sources, which may imply difficulties concerning the availability and temporal coincidence of the data. Exploiting the potential of a single platform could yield a solution. Hence, a number of authors directly extracted footprint shapes from LiDAR point clouds (Wang et al., 2006; Zhang et al., 2006; Arefi et al., 2008) or nadir RGB imagery (Shorter and Kasparis, 2009). Stereo images also provide possibilities for detecting building geometries using optical and height information derived from the same data source (Arefi and Reinartz, 2013; Tian et al., 2014) or solely height information (Weidner, 1997). Photogrammetric techniques have been used to extract three-dimensional line segments for 3D model generation (Zebedin et al., 2008). These line segments, however, require a good perspective coverage of the scene in order to be useful for building extraction. We therefore studied the potential and limitations of extracting line segments from the individual images used for DSM generation and subsequently verifying them on the basis of the DSM.
Spaceborne meter-resolution SAR data, together with multi-pass InSAR techniques including persistent scatterer interferometry (PSI) and tomographic SAR inversion (TomoSAR), allow us to reconstruct the shape and ongoing motion of individual buildings and urban infrastructure. TomoSAR in particular offers a tremendous improvement in detailed reconstruction and monitoring of urban areas, especially man-made infrastructure. The rich scatterer information retrieved from multiple incidence angles by TomoSAR enables us to generate 4D point clouds of the illuminated area with a point density comparable to LiDAR. These point clouds can potentially be used for building façade reconstruction in urban environments from space, with a few considerations: 1) the side-looking SAR geometry enables TomoSAR point clouds to possess rich façade information; 2) temporally incoherent objects, e.g., trees, cannot be reconstructed from multi-pass spaceborne SAR image stacks; 3) the TomoSAR point clouds have a moderate 3D positioning accuracy on the order of 1 m, while (airborne) LiDAR typically provides accuracy on the order of 0.1 m.
Note that large separation numbers n produce large configuration factors, which can lead to erroneous measurements. As a consequence, an enlargement of the dipole lengths for Schlumberger, pole-dipole and dipole-dipole configurations seems reasonable, denoted with ”:”. Furthermore, it is recommended to shift the electrodes along the profile by the dipole length ”:” instead of the electrode distance ”.”. Thus, fewer data points are gained; however, this can be compensated by applying a larger maximum separation factor. The data points are assigned to a representative x-location, e.g., the midpoint, and the separation factor n. Since large n are associated with increased investigation depths, it seems reasonable to plot the data in a pseudo-section with n increasing downwards. Remember that a pseudo-section is just a graphical representation of the measured data. Nevertheless, experienced geophysicists are able to form a concept of the earth’s structure without inverting the data.
In contrast to model-free exploration methods that focus on autonomy and real-time capability in unknown environments, model-based path planning algorithms rely on an available proxy model of the environment and focus on estimating a subsequent optimal path that maximizes the coverage and accuracy of the object globally [10, 11, 35–38]. In contrast to active modeling, these explore-and-exploit methods do not receive any feedback from the acquired images during the exploitation flight, which places high demands on the heuristics used for generating the refinement path. The global optimization of coverage and accuracy, on the other hand, usually leads to greater completeness and smoother trajectories compared to model-free methods. Recent work has proposed to extend this procedure by iteratively refining the model over several subsequent flights, taking into account the remaining model uncertainty between flights [38, 39]. Furthermore, the execution of the optimized path is easy and fast for any kind of UAV, which simply navigates along the optimized waypoints. The prior model can either be based on an existing map with height information [36] or is generated by photogrammetric reconstruction from a preceding manual flight at a safe altitude or via standard flight planning methods (e.g., regular grids or circular trajectories) [10, 11], and is usually expressed as a set of discrete 3D points in a voxel space [10, 11, 37, 40] or as volumetric surfaces, such as triangulated meshes [35, 36, 38, 41]. In order to define appropriate views for the optimized trajectory, camera viewpoint hypotheses are either regularly sampled in the free 3D airspace [10, 37], resulting in 3D camera graphs, or sparsely sampled in a 2D view manifold [38] or in skeleton sets [42] around the object.
Subsequently, an optimization is defined in order to find a connected subset of these viewpoint hypotheses that defines a suitable path through the camera graph. Alternatively, the locations of the regularly sampled viewpoint candidates can be continuously refined during the optimization [11]. As a means of assessing the suitability of camera viewpoints for the reconstruction, hand-crafted heuristics are usually defined considering the necessities for a successful SfM and MVS workflow. These include multi-view requirements [35, 37, 40], ground resolution [35, 41], 3D uncertainty [43] and the coverage of the object [10, 11, 37]. Instead of using hand-crafted heuristics, several works have used machine learning methods to learn heuristics that allow predicting the confidence in the output of an MVS pipeline without executing it [27, 43, 44].
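As a toy illustration of the viewpoint-subset selection described above, stripped down to the coverage criterion only (real planners additionally weigh path cost, multi-view and resolution constraints), a greedy set-cover sketch over hypothetical view IDs and surface-element IDs:

```python
# Greedy coverage-driven view selection: repeatedly pick the candidate
# viewpoint that covers the most still-uncovered surface elements.
def greedy_view_selection(view_coverage, n_targets):
    """view_coverage: dict view_id -> set of covered surface-element ids."""
    uncovered = set(range(n_targets))
    chosen = []
    while uncovered:
        best = max(view_coverage,
                   key=lambda v: len(view_coverage[v] & uncovered))
        gain = view_coverage[best] & uncovered
        if not gain:
            break                 # remaining elements are not visible at all
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Greedy set cover is a standard approximation for this NP-hard subset problem; the optimizers cited in the text replace the plain coverage gain with richer per-view utility terms and add a path-connectivity constraint over the camera graph.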
to efficiently search these representations in a queried image. Typically, an image is transformed to the corresponding feature space, i.e., it is processed into a new image represented by the defined features, and occurrences with little deviation from the object's representation, with respect to a certain metric, are picked as matches. Concerning the first subtask, works from the past two decades such as the Scale-Invariant Feature Transform (SIFT) (Lowe, 1999), Speeded-Up Robust Features (Bay et al., 2006) or Features from Accelerated Segment Test (Rosten and Drummond, 2006) considered the detection and description of highly distinctive and transformation-robust feature points to represent objects. Representative works like Histogram of Oriented Gradients (Dalal and Triggs, 2005), Local Binary Pattern (LBP) (Ahonen et al., 2006) or Region Covariance (Tuzel et al., 2006) focused on describing feature information from the local area of and around the objects. In recent years, advanced feature descriptor variants like RootSIFT Principal Component Analysis (RootSIFT-PCA) (Bursuc et al., 2015), Domain-Size Pooling SIFT (Dong and Soatto, 2015) or Rotation Invariant Co-occurrence among adjacent LBPs (X. Qi et al., 2014) were also researched to further enhance matching performance and robustness. For the second subtask, Cascades (Viola and Jones, 2001), Efficient Subwindow Search (Lampert et al., 2008) and Selective Search (Sande et al., 2011) are key contributions to reducing the object search space, e.g., by introducing a compressed image description or by finding a sub-search-space that maximizes a given score function.
Andreas Schmitt & Thomas Vögtle, Karlsruhe
Keywords: Terrestrial laser scanner, Point clouds, Automatic surface extraction, 3D reconstruction
Summary: Terrestrial laser scanning has become a standard method for fast and accurate acquisition of 3D objects. While data capture has attained a high level of development, the analysis of point clouds is still characterised by a remarkable amount of manual interaction. In this article an advanced generic approach for the extraction of surface primitives is presented. In a first step the 3D measurement domain is subdivided into volume elements (voxels) and the centre of gravity of the interior laser points is calculated for each voxel as a representative geometric position. Normal vectors are determined for each voxel by means of all possible combinations of two vectors to the 26 neighbouring barycentres. If the local surrounding contains plane surface parts, a number of these normal vectors have similar directions. These vectors are aggregated (mean direction) and the number of involved vectors (NOV) is stored. For a planar surrounding a clear majority will be obtained. A region growing algorithm extracts plane surfaces by merging adjacent voxels if their main normal directions are similar (homogeneity criterion). If two majorities can be observed it is an edge point; for three main directions a corner point can be assumed. These topological points will be stored as a base for the subsequent 3D modelling process. First experiences with synthetic and real-world data of buildings have shown the suitability of this advanced approach and its robustness concerning noise, surface roughness and outliers. A disadvantage may be a certain generalisation effect.
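The normal-vector voting described in the summary can be sketched as follows. The voxel bookkeeping is reduced to a plain list of neighbouring barycentres, and the sign convention and agreement threshold are illustrative assumptions, not the authors' parameters:

```python
# Cross products of vector pairs from a voxel barycentre to its
# neighbouring barycentres each propose a surface normal; in a planar
# surrounding most proposals agree, and the count of agreeing proposals
# plays the role of the NOV value mentioned in the text.
import numpy as np

def voxel_normal(centre, neighbours, agree_cos=0.95):
    centre = np.asarray(centre, float)
    vecs = [np.asarray(n, float) - centre for n in neighbours]
    props = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            n = np.cross(vecs[i], vecs[j])
            norm = np.linalg.norm(n)
            if norm < 1e-12:
                continue              # (anti)parallel pair, no normal
            n /= norm
            if n[2] < 0:              # crude sign resolution (demo only)
                n = -n
            props.append(n)
    props = np.array(props)
    mean = props.mean(axis=0)
    mean /= np.linalg.norm(mean)
    votes = int((props @ mean > agree_cos).sum())
    return mean, votes
```

For an edge voxel the proposals split into two clusters instead of one majority, which is exactly the criterion the article uses to distinguish plane, edge and corner points; a full implementation would cluster the proposals rather than average them.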
angles are inherently the most prominent features in the slope histogram, the first and last 10° are cut off, so that the range is limited to 10° to 80°. Within the histograms for aspect and slope, a polynomial function of degree 2 is fitted to the peaks. A delta of approximately 90° between two identified aspect angles indicates a good outcome, since most façades are built orthogonally. The maxima are taken as the correct orientations for façade and roof planes. The subsequent process of plane fitting is reduced to two dimensions, since only the position of the planes remains unknown. In the case of aspect planes, namely façades, this means that lines sweep through the nadir view at the given aspect angles, trying to find significant clusters of line segments. In the case of slope planes, namely roofs, the viewing angle is fixed to a horizontal view at the given aspect angles. Again, lines sweep through the view at the given slope angles in order to find significant clusters of line segments. The fixation of those sweeping lines at clusters marks the position of planes at the previously found orientation angles (see last row of Table 1).
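The degree-2 peak fit described above can be sketched as follows; the bin width and the three-bin fit window are illustrative assumptions, not the parameters of the original method:

```python
# Refine a histogram peak with a parabola: fit y = a*x^2 + b*x + c to the
# highest bin and its two neighbours, then take the parabola vertex as the
# sub-bin estimate of the dominant aspect/slope angle.
import numpy as np

def refined_peak(angles_deg, bin_width=5.0, lo=10.0, hi=80.0):
    edges = np.arange(lo, hi + bin_width, bin_width)   # 10..80 degree range
    hist, _ = np.histogram(angles_deg, bins=edges)
    k = int(hist.argmax())
    k = min(max(k, 1), len(hist) - 2)                  # keep a full 3-bin window
    centres = (edges[:-1] + edges[1:]) / 2.0
    x = centres[k - 1:k + 2]
    y = hist[k - 1:k + 2]
    a, b, _c = np.polyfit(x, y, 2)                     # degree-2 fit
    return -b / (2.0 * a)                              # vertex of the parabola
```

The vertex of the fitted parabola lands between bin centres, which is what makes the subsequent line sweeping robust to the histogram's discretisation.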