A novel method for creating detailed building models with complex roof shapes from LiDAR point clouds is proposed in this paper. The 2.5D Dual Contouring method of Zhou and Neumann (2010) is used and adapted so that step edges and intersection edges can be created between roof segments. A main contribution of this work is the modification and weighting of the Quadratic Error Function (QEF) for modeling step edges and intersection edges. The modeling depends on the step edge probabilities of local height layers. A prerequisite for adaptive 2.5D Dual Contouring is a roof segmentation technique which stops at smooth edges. The applied robust TIN-based region growing reliably stops at smooth edges. Consequently, undersegmentation is significantly reduced. The resulting building models show a very high fit to the input LiDAR points. Each roof segment is represented by a triangulation, so non-planar roof shapes can also be modelled. Subsequent model regularization is recommended, because buildings are represented by a large number of vertices. Errors in reconstruction result mostly from wrong or missing connections of the vertices. Thus, the way the vertices are connected to the building model should be made more robust. Wrong connections could be avoided by checking the consistency of the model with the building footprint. Under the assumption that building edges are mostly orthogonal or parallel to the main build-
During the last decades, several approaches for the reconstruction of 3D building models have been developed. Starting in the 1980s with manual and semi-automatic reconstruction of 3D building models from aerial images, the degree of automation has increased in recent years so that these methods became applicable to various areas. Some typical applications and examples are shown in section 1.1. Especially since the 1990s, when airborne light detection and ranging (LiDAR) technology became widely available, approaches for (semi-)automatic building reconstruction of large urban areas turned out to be of particular interest. Only in recent years have some large cities built detailed 3D city models. Although much effort has been put into the development of a fully automatic reconstruction strategy in order to overcome the high costs of semi-automatic reconstruction, no solution proposed so far meets all requirements (e.g., in terms of completeness, correctness, and accuracy). The reasons for this are manifold, as discussed in section 1.2. Some of them are manageable, for example by using modern sensors which provide denser and more accurate point clouds than before, or by incorporating additional data sources such as high-resolution images. However, there is considerable demand for 3D building models in areas where such modern sensors or additional data sources are not available. Therefore, in this thesis a new fully automatic reconstruction approach for semantic 3D building models from low- and high-density airborne laser scanning (ALS) data of large urban areas is presented and discussed. Additionally, it is shown how automatically derived building knowledge can be used to enhance existing building reconstruction approaches. The specific research objectives are outlined in section 1.3, which includes an overview of the proposed reconstruction workflows and the contribution of this thesis.
In order to have lean workflows with good performance, some general assumptions on the buildings to be reconstructed are imposed and explained in section 1.4. The introduction ends with an outline of this thesis in section 1.5.
The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing (Shapiro and Stockman). It estimates the parameters of a shape from its points. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform. It can be used to detect lines, circles, and other primitive shapes if their parametric equation is known. In principle, it works by mapping every point in the data to a manifold in the parameter space. This manifold describes all possible variants of the parametrized primitive. Simplifying the parametrization or limiting the parameter space speeds up the algorithm. This is especially true for 3D shape detection: for example, detecting a plane using the plane equation ax+by+cz+d=0 requires a 3D Hough space, which quickly consumes large amounts of memory and computation time, since all possible planes through every transformed point cloud need to be examined. A plane can also be fitted based on normalized normal vectors using only two of the Euler angles and the distance from the origin, α, β and d. There is no need for the third Euler angle, since a rotation about the plane's normal vector is redundant (Hulik et al.).
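The voting procedure can be made concrete with a minimal 2D line-detection sketch (an illustrative toy, not taken from the cited works): each point (x, y) votes for every line satisfying ρ = x·cos θ + y·sin θ, and the best line appears as the accumulator maximum.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=100):
    """Vote in (theta, rho) space: each point (x, y) votes for every line
    satisfying rho = x*cos(theta) + y*sin(theta)."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    max_rho = np.hypot(*np.abs(points).max(axis=0)) + 1e-9
    rho_edges = np.linspace(-max_rho, max_rho, n_rho + 1)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        # one sinusoid per point: bin rho for every sampled theta
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.clip(np.digitize(rhos, rho_edges) - 1, 0, n_rho - 1)
        acc[np.arange(n_theta), bins] += 1
    ti, ri = np.unravel_index(acc.argmax(), acc.shape)
    rho = 0.5 * (rho_edges[ri] + rho_edges[ri + 1])
    return thetas[ti], rho, acc

# 21 noiseless points on the line y = x + 0.5 (theta = 3*pi/4, rho ~ 0.354)
points = np.array([[t, t + 0.5] for t in np.linspace(-1.0, 1.0, 21)])
theta, rho, acc = hough_lines(points)
```

All 21 points vote into the same (θ, ρ) cell, so the accumulator maximum equals the number of collinear points; the discretization of ρ into bins is exactly the memory/accuracy trade-off the paragraph describes for higher-dimensional parameter spaces.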
Space-borne meter-resolution SAR data, together with multi-pass InSAR techniques including persistent scatterer interferometry (PSI) and tomographic SAR inversion (TomoSAR), allow us to reconstruct the shape and undergoing motion of individual buildings and urban infrastructures. TomoSAR in particular offers a tremendous improvement in the detailed reconstruction and monitoring of urban areas, especially man-made infrastructures. The rich scatterer information retrieved from multiple incidence angles by TomoSAR enables us to generate 4D point clouds of the illuminated area with a point density comparable to LiDAR. These point clouds can potentially be used for building façade reconstruction in urban environments from space, with a few considerations: 1) the side-looking SAR geometry enables TomoSAR point clouds to possess rich façade information; 2) temporally incoherent objects, e.g. trees, cannot be reconstructed from multi-pass space-borne SAR image stacks; 3) TomoSAR point clouds have a moderate 3D positioning accuracy on the order of 1 m, while (airborne) LiDAR typically provides accuracy on the order of 0.1 m.
Modern spaceborne SAR sensors such as TerraSAR-X and COSMO-SkyMed can deliver meter-resolution data that fits well to the inherent spatial scales of buildings. This very high resolution (VHR) data is therefore particularly suited for detailed urban mapping. In particular, using stacked VHR SAR images, advanced multi-pass interferometric techniques such as tomographic SAR inversion (TomoSAR) make it possible to retrieve not only the 3D geometrical shape but also the undergoing temporal motion of individual buildings and urban infrastructures. The resulting 4D point clouds have a point (scatterer) density that is comparable to LiDAR. For example, experiments using TerraSAR-X high-resolution spotlight data stacks show that the scatterer density retrieved using TomoSAR is on the order of 1 million pts/km². Object reconstruction from these high-quality TomoSAR point clouds can greatly support the reconstruction of dynamic city models that could potentially be used to monitor and visualize the dynamics of urban infrastructure in very high level of detail. Motivated by this, we presented very first results of façade reconstruction from single-view (ascending stack) and multi-view (fused ascending and
Automatic generation of 3D building models is an essential prerequisite in a wide variety of applications such as tourism, urban planning and automatic navigation. Although many approaches for building detection and reconstruction from 3D point clouds and high-resolution aerial images have been reported over the last decades, fully 3D building reconstruction is still a challenging issue due to the complexity of urban scenes. There are basically two strategies for building roof reconstruction: bottom-up/data-driven and top-down/model-driven methods. The bottom-up methods (e.g. region growing (Rottensteiner and Briese, 2003), Hough transform (Vosselman and Dijkman, 2001), RANSAC (Tarsha-Kurdi et al., 2008)) extract roof planes and other geometrical information from the point clouds. For roof reconstruction, the corresponding planes are assembled and vertices, ridges and eaves are determined (Sohn and Huang, 2008). Sampath and Shan (2010) used a bottom-up approach to segment the LiDAR points into planar and non-planar points using the eigenvalues of the covariance matrix in a small neighborhood. Then, the normal vectors of the planar points are clustered by fuzzy k-means clustering. Afterwards, an adjacency matrix is considered to obtain the breaklines and roof vertices of the corresponding planes. This method is used for the reconstruction of moderately complex buildings. Rottensteiner et al. (2005) present an algorithm to delineate building roof boundaries from LiDAR data with a high level of detail. In this method, roof planes are initially extracted
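The normal-estimation and clustering steps of such a pipeline can be illustrated in miniature: estimate a per-point normal from the covariance eigenvectors of a small neighborhood, then cluster the normals. This sketch substitutes plain k-means for fuzzy k-means and uses synthetic gable-roof data; it is an illustrative assumption, not the cited implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_plane(x0, x1, slope, n=200):
    """Sample one sloped planar roof patch z = slope * x (plus slight noise)."""
    x = rng.uniform(x0, x1, n)
    y = rng.uniform(0.0, 10.0, n)
    z = slope * x + rng.normal(0.0, 0.01, n)
    return np.column_stack([x, y, z])

def estimate_normals(pts, k=10):
    """Per-point normal: eigenvector of the smallest eigenvalue of the
    covariance of the k nearest neighbours, oriented upward."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    nbrs = np.argsort(d, axis=1)[:, :k]
    normals = np.empty_like(pts)
    for i, nb in enumerate(nbrs):
        q = pts[nb] - pts[nb].mean(axis=0)
        _, vecs = np.linalg.eigh(q.T @ q)   # eigenvalues in ascending order
        n = vecs[:, 0]
        normals[i] = n if n[2] >= 0 else -n
    return normals

def kmeans2(X, iters=25):
    """Plain two-class k-means with farthest-point initialization."""
    c = np.stack([X[0], X[np.linalg.norm(X - X[0], axis=1).argmax()]])
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        c = np.stack([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
    return labels

# synthetic gable roof: two planes, kept apart so neighbourhoods never mix
pts = np.vstack([make_plane(1.0, 5.0, 0.5), make_plane(-5.0, -1.0, -0.5)])
labels = kmeans2(estimate_normals(pts))
```

The two roof faces have normals roughly (∓0.45, 0, 0.89), so the clustering separates them cleanly; real data additionally needs the planarity test and the adjacency analysis described above.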
Presegmentation typically classifies the point cloud into building points and other points, mainly terrain and vegetation in LiDAR point clouds. If 2D building footprints are available beforehand, building point clouds can be directly extracted [Rau and Lin, 2011]. A popular approach is ground filtering, in which a Digital Terrain Model (DTM) is produced by morphological filter operations [Morgan and Tempfli, 2000] [Zhang et al., 2003] [Pingel et al., 2013], and a height threshold is then set on the DTM. Another approach is to fit planes to the point cloud and cluster the points; the largest cluster is assumed to be ground [Verma et al., 2006]. [Lafarge and Mallet, 2012] defined expectation values for buildings, vegetation, ground and clutter by combining different covariance-based measures and height information using energy optimization. [Dorninger and Pfeifer, 2008] extracted all planar regions of the scene using a region-growing method in feature space and grouped the extracted points into buildings with a mean-shift algorithm.
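The morphological ground-filtering idea can be sketched with a grey-scale opening on a rasterized DSM. This is a toy illustration with an assumed 1 m grid and a single synthetic building; the cited methods are considerably more elaborate (e.g. progressive window sizes and slope-dependent thresholds).

```python
import numpy as np
from scipy.ndimage import grey_opening

# synthetic 1 m DSM: gently sloping terrain plus one 10 m-high building
dsm = np.fromfunction(lambda r, c: 0.02 * c, (60, 60))
dsm[20:30, 20:35] += 10.0

# opening with a window larger than the building removes it, leaving a DTM
dtm = grey_opening(dsm, size=(21, 21))

# height threshold on the normalized DSM separates object points from ground
building_mask = (dsm - dtm) > 2.0
```

The opening (erosion followed by dilation) removes elevated structures narrower than the 21 × 21 window while preserving the gentle terrain slope, so the normalized DSM is near zero on the ground and about 10 m on the building.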
Matei et al., 2008]. Sun and Salvaggio create segment boundaries by overlaying a 2D grid on their segmented point cloud: each grid edge connecting an empty and an occupied grid cell is chosen as a border edge. Very similarly, Zhou and Neumann define boundaries by tracing the closest LiDAR points to those edges. Rottensteiner defines separation boundary lines between adjacent segments from the Delaunay triangulation: differently segmented points connected by triangulation edges are boundary points, and the corresponding Voronoi edges form the boundary. Dorninger and Pfeifer, Kada and Wichmann, and Sampath and Shan use a modified convex hull approach called alpha shapes, in which each next boundary vertex is determined only from the local neighborhood of the previous vertex. If the local neighborhood is determined by a fixed radius, alpha shapes produce satisfactory results only if the point density is regular. Therefore, Sampath and Shan define the neighborhood with a rectangle whose extents and orientation depend on the along-track and across-track LiDAR sampling characteristics. [Wang and Shan, 2009] identify unconnected boundary points by creating the convex hull of each point's local neighborhood. If the point is a vertex of this convex hull, it is chosen as a building boundary vertex. Lafarge and Mallet determine each boundary point based on its distance to the line fitted through its neighborhood.
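The local convex hull test attributed above to [Wang and Shan, 2009] can be sketched as follows (a minimal 2D illustration under assumed parameters; note that on a perfectly regular grid, points lying collinearly along a hull edge are not reported as hull vertices, so only corners and irregularly placed boundary points are flagged reliably, whereas real LiDAR points are irregular):

```python
import numpy as np
from scipy.spatial import ConvexHull, cKDTree

def boundary_points(pts, k=12):
    """Flag a point as boundary if it is a vertex of the convex hull of its
    own k-nearest-neighbour set (the point itself is index 0 of that set)."""
    tree = cKDTree(pts)
    flags = np.zeros(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        _, nb = tree.query(p, k=k)
        flags[i] = 0 in ConvexHull(pts[nb]).vertices
    return flags

# 10 x 10 unit grid: corners are clearly extreme, inner points clearly not
grid = np.stack(np.meshgrid(np.arange(10.0), np.arange(10.0)), axis=-1).reshape(-1, 2)
flags = boundary_points(grid)
```

Every interior point is strictly inside the hull of its own 3 × 3 neighborhood and is therefore never flagged, while corner points are extreme in both coordinates and always are.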
In this chapter a new method is proposed for generating 3D building models on different levels of detail (LOD). The proposed workflow is presented in Figure 6.1. The 3D models on different LOD follow the standard definition of the City Geography Markup Language (CityGML) described in (Kolbe et al., 2005). CityGML defines five LOD for multi-scale modeling: LOD0 – regional model consisting of the 2.5D Digital Terrain Model (DTM), LOD1 – building block model without roof structures, LOD2 – building model including roof structures, LOD3 – building model including architectural details, LOD4 – building model including the interior. Algorithms for producing the first three levels of the LOD are explained in this chapter. According to the above categorization, the first LOD corresponds to the digital terrain model (DTM). The non-ground regions are filtered using geodesic reconstruction to produce the DTM from the LiDAR DSM (Arefi and Hahn, 2005; Arefi et al., 2007b). The LOD1 consists of a 3D representation of buildings using prismatic models, i.e., the building roof is approximated by a horizontal plane. Two techniques are implemented for the approximation of the detected building outline: hierarchical fitting of Minimum Bounding Rectangles, and RANSAC-based straight line fitting and merging (Arefi et al., 2007a). For the third level of detail (LOD2), a projection-based approach is proposed, resulting in a building model with roof structures. The algorithm is fast, because 2D data are analyzed instead of 3D data, i.e., lines are extracted rather than planes. The algorithm begins with extracting the building ridge lines, each thought to represent a building part. According to the location and orientation of each ridge line, one parametric model is generated. The models of the building parts are merged to form the overall building model.
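A single Minimum Bounding Rectangle fit, the building block of the hierarchical fitting mentioned above, can be sketched via the classic observation that a minimum-area bounding rectangle is aligned with some edge of the point set's convex hull (an assumed simplification; the hierarchical and RANSAC-based variants of Arefi et al. are not reproduced here):

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_bounding_rect(pts):
    """Try every convex-hull edge direction; the minimum-area bounding
    rectangle is aligned with one of them."""
    hull = pts[ConvexHull(pts).vertices]
    edges = np.diff(np.vstack([hull, hull[:1]]), axis=0)
    best_area, best_angle = np.inf, 0.0
    for dx, dy in edges:
        t = np.arctan2(dy, dx)
        c, s = np.cos(t), np.sin(t)
        rot = np.array([[c, s], [-s, c]])  # rotate by -t: edge becomes axis-aligned
        q = hull @ rot.T
        extent = q.max(axis=0) - q.min(axis=0)
        area = extent[0] * extent[1]
        if area < best_area:
            best_area, best_angle = area, t
    return best_area, best_angle

# detected outline points of an axis-aligned 4 m x 2 m building footprint
footprint = np.array([[0, 0], [4, 0], [4, 2], [0, 2], [2, 1.0], [1, 0.5]])
area, angle = min_bounding_rect(footprint)
```

For the axis-aligned sample footprint the minimum area is 4 × 2 = 8 m² and the recovered orientation is a multiple of 90°.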
Different platforms and sensors are used to derive 3D models of urban scenes. 3D reconstruction from satellite and aerial images is used to derive sparse models mainly showing ground and roof surfaces of entire cities. In contrast to such sparse models, 3D reconstructions from UAV or ground images are much denser and show building facades and street furniture such as traffic signs and garbage bins. Furthermore, point clouds may also be acquired with LiDAR sensors. Point clouds do not only differ in their viewpoints, but also in their scales and point densities. Consequently, the fusion of such heterogeneous point clouds is highly challenging. Regarding urban scenes, another challenge is the occurrence of only a few parallel planes, which makes it difficult to find the correct rotation parameters. We discuss the limitations of the general fusion methodology based on an initial alignment step followed by a local coregistration using ICP and present strategies to overcome them.
Development of automatic methods for the reconstruction of buildings and other urban objects from synthetic aperture radar (SAR) images is of great practical interest for many remote sensing applications due to their independence from solar illumination and all-weather capability. In addition, very high resolution (VHR) SAR images acquired from spaceborne sensors are capable of monitoring a greater spatial area at significantly reduced costs. These benefits have motivated many researchers, and therefore several methods have been developed that use SAR imagery for the detection and reconstruction of man-made objects, in particular buildings. For instance, (Quartulli, 2004) and (Ferro, 2009) present approaches for building reconstruction based on single-aspect SAR images. However, the use of single SAR images poses greater challenges, especially in dense urban areas where buildings located closely together result in the occlusion of smaller buildings by higher ones (Wegner, 2009). To resolve this, interferometric SAR (InSAR) acquisitions are used, which implies imaging the area of interest more than once with different viewing configurations. (Gamba, 2000) proposed an approach that uses such an InSAR configuration to detect and extract buildings based on a modified machine vision approach. (Thiele, 2007) also presented a model-based approach that employed orthogonal InSAR images to detect and reconstruct building footprints. An automatic approach based on modeling building objects as cuboids using multi-aspect polarimetric SAR images is presented in (Xu, 2007). (Sportouche, 2011) and (Wegner,
with various strategies. Early works such as MV3D and AVOD first used off-the-shelf 2D feature extractors to capture the feature maps from the images and the multi-view representations of the point clouds (e.g., bird's eye view (BEV) and front view), which are then typically fused together by a sum or a concatenation operation. A Region Proposal Network (RPN) is then applied to the fused feature maps to generate 3D bounding box proposals, followed by a refinement network for the final 3D bounding box prediction. The advantages of this method are that mature 2D object detector and 2D feature extractor technologies are available to be applied to the multi-view representations of the point clouds. Furthermore, the features from different sensors can interact over the stacked layers, as these features are normally obtained from similar or even the same neural networks. Liang et al. utilize the continuous convolution method to fuse the feature maps of the images and BEVs. Specifically, this approach proposes a continuous fusion layer that aggregates each pixel feature in the image feature maps with the features of the neighbouring points in the BEV feature maps to learn a fused local region, which makes it possible to extract sufficiently discriminative features for 3D object detection.
In this paper, we present a method for regularizing noisy 3D reconstructions which is especially well suited for scenes containing planar structures like buildings. At horizontal structures, the input model is divided into slices, and for each slice an inside/outside labeling is computed. With the outlines of each slice labeling, we create an irregularly shaped volumetric cell decomposition of the whole scene. Then, an optimized inside/outside labeling of these cells is computed by solving an energy minimization problem. For the cell labeling optimization we introduce a novel smoothness term, where lines in the images are used to improve the regularization result. We show that our approach can take arbitrary dense meshed point clouds as input and delivers well-regularized building models, which can be textured afterwards.
In the past, maps have usually been built using sonar sensors and laser range finders that are able to measure the distance of nearby objects in a single plane only. Due to this two-dimensional characteristic, maps created using these sensors are also two-dimensional. Recently, different sensors like stereo cameras, time-of-flight cameras, or 3D lasers, which are able to obtain three-dimensional information about the local surroundings, have gained more importance. Consequently, 3D representations are necessary to take full advantage of these sensors. In previous works, multi-level surface maps, regular voxel representations, and octrees [6, 7] have been proposed as a 3D alternative to 2D occupancy grid maps. However, the common disadvantage of these existing approaches is that they partition the environment into regular cells with a fixed resolution. The choice of this cell resolution is crucial. Today's navigational tasks, like inch-perfect navigation in narrow environments, require precise maps with a high resolution. Unfortunately, an increased resolution results in a heavy rise of the memory consumption and the computational costs, especially when using three-dimensional maps. On the other hand, a too coarse resolution may lead to inconsistencies and to maps with insufficient precision.
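The memory argument can be made concrete with a toy octree that subdivides only along the descent paths of inserted points (a from-scratch sketch, not one of the cited map frameworks):

```python
class Octree:
    """Minimal octree over the cube [0, size)^3: nodes are created only along
    the descent path of each inserted point, unlike a fixed-resolution grid."""

    def __init__(self, size=8.0, max_depth=3):
        self.size, self.max_depth = size, max_depth
        self.root = {}
        self.nodes = 1  # count the root

    def insert(self, p):
        node, half, origin = self.root, self.size / 2.0, [0.0, 0.0, 0.0]
        for _ in range(self.max_depth):
            # octant index from the three half-space tests
            octant = sum(1 << i for i in range(3) if p[i] >= origin[i] + half)
            if octant not in node:
                node[octant] = {}
                self.nodes += 1
            for i in range(3):
                if p[i] >= origin[i] + half:
                    origin[i] += half
            node, half = node[octant], half / 2.0

tree = Octree()                 # 8 m cube, 1 m leaves at depth 3
tree.insert((1.0, 1.0, 1.0))
tree.insert((7.0, 7.0, 7.0))
tree.insert((1.5, 1.5, 1.5))    # shares the full path of the first point
# 7 nodes total versus 8**3 = 512 cells for a dense 1 m voxel grid
```

Two occupied leaves at 1 m resolution cost seven nodes here, while a dense grid of the same cube at the same resolution always costs 512 cells, which is exactly the fixed-resolution drawback discussed above.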
ence. Many approaches have been reported over the last decades. Sophisticated classification algorithms, e.g., support vector machines (SVM) and random forests (RF), data modeling methods, e.g., hierarchical models, and graphical models such as conditional random fields (CRF), are well studied. Overviews are given in (Schindler, 2012) and (Vosselman, 2013). (Guo et al., 2011) present an urban scene classification on airborne LiDAR and multispectral imagery, studying the relevance of different features of multi-source data. An RF classifier is employed for feature evaluation. (Niemeyer et al., 2013) propose a contextual classification of airborne LiDAR point clouds. An RF classifier is integrated into a CRF model and multi-scale features are employed. Recent work includes (Schmidt et al., 2014), in which full-waveform LiDAR is used to classify a mixed area of land and water body. Again, a framework combining RF and CRF is employed for classification and feature analysis. (Hoberg et al., 2015) present a multi-scale classification of satellite imagery also based on a CRF model and extend the latter to multi-temporal classification. Concerning the use of more detailed 3D geometry, (Zhang et al., 2014) present roof type classification based on aerial LiDAR point clouds.
Design/methodology/approach – To meet these interests, we intend to set up a Wiki resource as a structured repository. The content will be based on (a) interactive workshops held at conferences to collect and structure knowledge assets on visual knowledge, involving experts from different domains, and (b) a student seminar starting in early 2017, designated to describe some typical research designs and to amend related methods and theories in the Wiki resource based on Wikipedia articles. A content structuring principle for the Wiki resource follows the guidelines of Wikimedia, as well as plans for the results to be fed back into Wikipedia. Originality/value – While Wiki approaches are frequently used in the context of the visual humanities, these resources are primarily created by experts. Furthermore, Wiki-based approaches related to visualization are often focused on a certain disciplinary context, for example art history. A unique aspect of the described setting is to build a Wiki on digital 3D reconstruction including expertise from different knowledge domains, i.e. on perception and cognition, didactics, information sciences, as well as computing and the visual humanities. Moreover, the combination of student work and assessments by experts also provides novel insights for educational research.
Therefore, the first and most important part of generating 3D models of building parts containing tilted roof structures is extracting ridge lines. Arefi (2009) proposed an algorithm to extract the ridge location from high-resolution airborne LiDAR data using morphological geodesic reconstruction (Gonzalez and Woods, 2008). Due to the lower quality of a DEM created from WorldView stereo images compared to LiDAR data, a method relying only on height data does not produce appropriate ridge pixels. In this paper, a method integrating orthorectified image and DEM information is applied for high-quality ridge line extraction (cf. Figure 1). The procedure to extract all the ridge lines corresponding to a building with tilted roofs begins with feature extraction. For this purpose, three feature descriptors are extracted from the DEM and ortho image as follows (cf. Figure 2):
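The geodesic-reconstruction idea behind ridge extraction can be sketched as an h-dome transform on a synthetic gabled-roof DSM. This is an illustrative reimplementation via iterated geodesic dilation with assumed parameter values, not the cited algorithm.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def reconstruct_by_dilation(seed, mask):
    """Greyscale geodesic reconstruction: repeatedly dilate the seed and clip
    it under the mask until a fixed point is reached."""
    rec = seed.copy()
    while True:
        nxt = np.minimum(grey_dilation(rec, size=(3, 3)), mask)
        if np.array_equal(nxt, rec):
            return rec
        rec = nxt

# synthetic DSM of a gabled roof: the 5 m ridge runs along row 10
profile = 5.0 - 0.4 * np.abs(np.arange(21) - 10.0)
dsm = np.repeat(profile[:, None], 30, axis=1)

# h-dome transform: subtract h, reconstruct under the original DSM, and keep
# the residual; only local height maxima (here the ridge line) survive
h = 0.3
dome = dsm - reconstruct_by_dilation(dsm - h, dsm)
ridge = dome > 0.2
```

The reconstruction refills the lowered DSM everywhere except at its regional maxima, so the residual is positive exactly along the ridge row, which is the property the ridge-pixel extraction relies on.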
trained deep naturalness regularizer, which provides a solution to the problem of blurry mean outputs. In our approach, we also avoid this by training an autoencoder, which we use to compress the TSDF volumes. Tatarchenko et al. use an octree generative network to reconstruct objects and scenes. However, this relies on the assumption that the coarse prediction steps can always find even small details, which is often not justified. Therefore, we use a block-wise compression to benefit both from a high resolution and an efficient representation. The 3D-EPN approach introduced by Dai et al. can predict object shapes based on sparse input data. Park et al. showed an interesting approach where, instead of reconstructing a volume, they reconstruct an SDF value for given points. This, however, struggles to generalize to complete scenes because of the missing spatial link between the input image and the output. Matryoshka Networks fuse multiple nested depth maps into a single volume, but the same struggle of generalization to full scenes appears.
This paper studies a photogrammetric procedure performed by a miniature unmanned rotorcraft, a quadrocopter. As a solution for building up a 3D model of building facades, a UAV can be a real asset, as it can reach difficult corners of the building and take repeated captures. The high maneuverability of the quadrocopter permits us to take diagonal pictures containing information about the depth of the building, which is a very critical factor while generating the 3D model. The aim of the paper is to obtain the best quality of the captured 2D pictures in order to superpose them and generate a 3D model of the building. Therefore, an essential part of the paper concentrates on the capturing-angle techniques. The generated trajectory takes into consideration the deviation factor of the ultrasonic altitude sensor and the GPS module. Tracking the desired coordinates allows us to automate the photogrammetric procedure and to use manpower wisely in generating the 3D model.
6.2. VISUAL FEEDBACK FOR MANUAL SCANNING
In this work, a classical rendering of triangles is used, as the streaming surface reconstruction generates triangle meshes and no further conversion of the data is required. Moreover, even older graphics hardware supports the acceleration of triangle sets. The triangles and edges of the mesh are loaded to the GPU and stored as a so-called display list. This display list is updated every time the model changes. However, not every change of a vertex, edge, or triangle requires an update of the list. Typically, the model is updated after a complete range image has been processed. For single-stripe systems with a high measurement frequency (e.g. the 3DMo-LSP), an update of the visualized model is performed after the integration of 5–10 new range images. This simple rendering concept is sufficient for smaller models, e.g. busts or technical parts, since recent graphics adapters can render models with more than 200,000 triangles in real time. Other techniques, such as point rendering or level-of-detail extensions that use e.g. the octree structure, are possible, but are not discussed in this thesis.