A cheap and fast way to generate building models is to obtain 3D information from image sequences. Typically, 3D reconstruction pipelines consisting of Structure-from-Motion (SfM) followed by Multi-View Stereo (MVS) and meshing are used for small- and large-scale reconstructions. With the increasing research on image-based indoor modeling in the recent past, an integration of indoor and outdoor models of the same building is consequently the next step. For instance, Fig. 1 shows the reconstruction of our computer lab, which should be connected to the outdoor façade of the building. When trying to fit a model of the building interior into an existing outdoor model, there are typically no visual correspondences for an alignment using tie points. Therefore, manual work is needed, for example using CAD models or floor plans. An automated method that provides the true, or at least the most probable, location in the outdoor model promises to reduce human interaction.
When focusing on the second specific objective (second part of the thesis), its core aspect was to integrate image data in order to improve the planimetric and topological accuracies of the reconstructed models. This objective was also achieved, while contributing several innovative aspects to the scientific community. It has already been shown that object-space line segments can be derived by matching image-based line segments in projective geometry through the intersection of viewing-ray planes. In this study, scene constraints were incorporated into the matching process to minimize matching ambiguities. Three well-defined evidences were determined with respect to the scene, i.e., the roof models reconstructed from point clouds: the gradient of a roof outline, the distance of a point to the plane, and the symmetry between two gutters belonging to opposite roof pairs. False correspondences representing edges on the footprint, beneath the roof outline, or even somewhere on the wall were avoided using the first two constraints. Having identified the rough symmetries, ambiguities especially relevant to oblique roofs were further avoided. Like many other studies, this experiment also had to deal with incompleteness issues such as gaps. In addition, some erroneous derivations, such as deviated edges for eave lines, were found. The effects of these defects were mitigated by predicting the most probable boundary edges for such cases. In this regard, known structural arrangements of roof models and specially defined convergence priors were applied. Although some false positive and false negative narrow regions remained in some roof outlines, the process produced complete refined roof boundaries for each roof model. The evaluation results showed an increased planimetric accuracy (0.55 m) for the refined building models.
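The viewing-ray plane intersection mentioned above can be illustrated with a minimal sketch (not the thesis implementation): each image line segment together with its camera center spans a plane, and two matched planes intersect in the object-space 3D line. All names and tolerances here are assumptions for illustration.

```python
import numpy as np

def viewing_plane(cam_center, p1, p2):
    """Plane spanned by the camera center and two 3D points on the
    viewing rays through the end-points of an image line segment.
    Returns (n, d) for the plane n.x + d = 0."""
    n = np.cross(p1 - cam_center, p2 - cam_center)
    n = n / np.linalg.norm(n)
    d = -n.dot(cam_center)
    return n, d

def intersect_planes(n1, d1, n2, d2):
    """3D line (point, unit direction) as the intersection of two planes;
    returns None for (nearly) parallel planes."""
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < 1e-9:
        return None
    # One point on the line: it lies on both planes and, for uniqueness,
    # on the plane through the origin orthogonal to the line direction.
    A = np.vstack([n1, n2, direction])
    b = np.array([-d1, -d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)
```

In a real pipeline the two planes would come from a matched pair of 2D line segments in different views; here they are supplied directly.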
This is the highest planimetric accuracy compared with the results of the other methods submitted to the ISPRS project; as such, it is a considerable achievement. In addition, the correctness at the per-object level and the topological accuracy of the refined models reached 100%. These statistics show that the refinement strategies increased both the planimetric and topological accuracies considerably.
In (Rottensteiner and Briese, 2002), the DTM and nDSM are generated using the robust interpolation method proposed in (Kraus and Pfeifer, 1998). The initial building segments are produced by thresholding pixels higher than 3.5 m and connected component labeling. After morphological refinement, texture analysis is employed to separate vegetation from the initial building segments. For this purpose, the point features proposed by (Fuchs, 1998) are used: the first derivative of the DSM is calculated using a 9 × 9 kernel, and for each initial building segment the number of "point-like" pixels is counted. Regions having more than 50% "point-like" pixels are classified as vegetation. Vegetation connected to buildings cannot be properly classified this way; therefore, those regions are separated using morphological processing and evaluated again. Vögtle and Steinle (2003) employ fuzzy logic to classify laser pixels into buildings, vegetation, and terrain. First, an approach named "convex-concave hull" (von Hansen and Voegtle, 1999) is employed to generate a DTM from the laser data. The normalized DSM, obtained by subtracting the DTM from the laser data, is used for the segmentation of 3D objects. The gradients on the segment border, the difference between first- and last-pulse heights, the shape, and height textures are the input data for the classification. The gradient values on the boundary of a segment are used to separate non-ground objects, i.e., buildings and vegetation, from the ground pixels. The difference between first and last pulse is employed to discriminate buildings from vegetation. The shape parameter is defined by the "parallelism of long segment contour lines", evaluated by the deviation of line directions under the assumption that building borders are relatively long and parallel to each other.
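The thresholding and texture-based vegetation test described for (Rottensteiner and Briese, 2002) can be sketched roughly as follows. This is an interpretation of the text, not the original implementation: the "point-like" detector of (Fuchs, 1998) is assumed to be precomputed as a boolean mask, and `scipy.ndimage.label` stands in for the connected component labeling.

```python
import numpy as np
from scipy import ndimage

def classify_segments(ndsm, pointlike_mask, height_thr=3.5, veg_frac=0.5):
    """Split an nDSM into initial building / vegetation segments using
    the thresholds quoted in the text: pixels above `height_thr` metres
    form segments; segments whose share of 'point-like' texture pixels
    exceeds `veg_frac` are labeled vegetation."""
    labels, n = ndimage.label(ndsm > height_thr)   # connected components
    building, vegetation = [], []
    for i in range(1, n + 1):
        seg = labels == i
        frac = pointlike_mask[seg].mean()          # share of point-like pixels
        (vegetation if frac > veg_frac else building).append(i)
    return labels, building, vegetation
```

The subsequent morphological separation of vegetation attached to buildings is omitted here.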
In contrast to model-driven strategies, data-driven methods model buildings without a specific model library. The roof planes are estimated by segmenting the point cloud into parts using methods such as edge-based segmentation (Jiang and Bunke, 1994), region growing (Rottensteiner, 2006), random sample consensus (RANSAC) (Tarsha-Kurdi et al., 2008), clustering (Shan and Sampath, 2008), or 3D Hough transforms (Vosselman et al., 2001). (Sohn et al., 2009) proposed a method that reconstructs buildings from both height and spectral images, creating prismatic building models based on a binary space partitioning tree (BSP-tree). (Zhou and Neumann, 2012) organize planar roof segments and roof boundary segments with "global regularities", considering orientation and placement similarities between planar elements in building structures. Despite the independence of data-driven strategies from a predefined model library, common limitations are irregular or incomplete roof parts due to data noise. As a result, some works explore the possibility of integrating both strategies for better building modeling. For example, (Verma et al., 2006) introduce a workflow for constructing 3D geometric models of complex buildings incorporating a) the segmentation of roof and terrain points, b) roof topology inference by means of a roof-topology graph, and c) parametric roof composition. (Partovi et al., 2013) propose to first detect the ridge lines and analyze the height points along and perpendicular to the ridge-line directions with the help of both an orthorectified image and the DSM. Then, lines are fitted to the ridge points using RANSAC and refined later by matching them or closing gaps in between. Finally, the roof model is selected based on the reconstructed ridge lines.
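As an illustration of the RANSAC option among the segmentation methods listed above, a minimal single-plane fit might look like the following sketch (it is not any of the cited implementations; iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_plane(points, n_iter=200, thr=0.05, rng=None):
    """Fit one roof plane to an (N, 3) point cloud with RANSAC:
    repeatedly sample 3 points, span a plane, and keep the plane
    with the most points within `thr` of it."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-12:
            continue                        # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)    # point-to-plane distances
        inliers = dist < thr
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

In a full data-driven pipeline this step would be applied repeatedly, removing inliers each time, to peel off one roof plane after another.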
In order to create 3D models without the need for explicit line matching, Jain et al. generate all possible hypothetical straight 3D line segments by triangulating all detected straight 2D line segments from different views [JKTS10]. They then keep the one whose back-projection onto the gradient images of neighboring views has the highest score, assuming that line features correspond to high-gradient areas in images. Built upon the same principle while applying the epipolar constraint on the end-points of line segments, Hofer et al. generate fewer hypothetical 3D line segments and thus increase performance significantly while still producing accurate results [HWB13]. However, both approaches are hardly applicable to infinite line reconstruction, where the detected 2D lines in different views do not exactly correspond to the same part of a 3D line.
Building detection from aerial and satellite images has been a major research issue for decades and is of great interest, since it plays a key role in building model generation, map updating, urban planning, and reconstruction (Davydova et al., 2016). Various methods have been developed, and different data sources, such as aerial images, digital surface/elevation models, LiDAR data, multi-spectral images, and synthetic aperture radar images, have been used for building detection. In this section we briefly review relevant building detection methods from the literature. Decades ago, the initial endeavors in building detection relied on grouping low-level image features such as edge/line segments and/or corners to form building hypotheses (Ok, 2013). For instance, a generic model of building shapes was adopted in (Huertas and Nevatia, 1988), and shadows cast by buildings were used to confirm building hypotheses and to estimate their height. A computational technique for utilizing the relationship between shadows and man-made structures to aid the automatic extraction of man-made structures from aerial imagery is described in (Irvin and McKeown, 1989). An approach to perceptual grouping for detecting and describing 3D objects in complex images was proposed in (Mohan and Nevatia, 1989) and was illustrated by applying it to the task of detecting and describing complex buildings in aerial images. The vertical and horizontal lines identified using image orientation information and vanishing point calculation were used in (McGlone and Shufelt, 1994) to constrain the set of possible building hypotheses; vertical lines are extracted at corners to estimate structure height and permit the generation of three-dimensional building models from monocular views.
Because performance evaluation had been neglected in building detection research, a comprehensive comparative analysis of four building extraction systems was presented in (Shufelt, 1999), which concluded that none of the developed systems were capable of handling all of
Abstract—Efficient and fully automatic building outline extraction and simplification methods are in high demand for 3D model reconstruction tasks. In spite of the efforts put into developing such methods, the results of recently proposed methods are still not satisfactory, especially for satellite images, due to object complexity and the presence of noise. To deal with this problem, in this article we propose a new approach which detects rough building boundaries (a building mask) from Digital Surface Model (DSM) data and then refines the resulting mask by classifying the geometrical features of the high-spatial-resolution panchromatic satellite image. The refined mask represents finer details of the building outlines, which are close to the original building edges. These outlines are then simplified in a parameterization phase, where a tracing algorithm detects the building boundary points from the refined masks and a set of line segments is fitted to them. After that, for each building, the existing main orientations are determined based on the lengths and arc lengths of the building's line segments. Our method is able to determine the multiple main orientations of complex buildings. Through a regularization process, the line segments are then aligned and adjusted according to the building's main orientations. Finally, the adjusted line segments are intersected and connected to each other in order to form a polygon representing the building's outline. Experimental results demonstrate that the computed building outlines are highly accurate and simple, even for large and complex buildings with inner yards.
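The determination of main orientations from fitted line segments could, in its simplest form, be sketched as a length-weighted angle histogram. The bin width and the share threshold below are illustrative assumptions, not the parameters of the article:

```python
import numpy as np

def main_orientations(segments, bin_deg=5, min_share=0.2):
    """Estimate a building's dominant edge directions from line segments
    given as rows (x1, y1, x2, y2): angles (mod 180 deg) are histogrammed,
    weighted by segment length; bins holding at least `min_share` of the
    total length are reported as main orientations (bin centres, deg)."""
    segs = np.asarray(segments, dtype=float)
    d = segs[:, 2:] - segs[:, :2]
    lengths = np.hypot(d[:, 0], d[:, 1])
    angles = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180.0
    bins = np.arange(0, 180 + bin_deg, bin_deg)
    hist, _ = np.histogram(angles, bins=bins, weights=lengths)
    keep = hist >= min_share * lengths.sum()
    return [bins[i] + bin_deg / 2 for i in np.nonzero(keep)[0]]
```

Because the histogram is weighted by length, a building with two long wings at different angles yields multiple main orientations, as described in the abstract.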
Roof structure reconstruction is a crucial step towards the automatic extraction of 3D models from DSM data. Having produced a building footprint in the previous steps, we can generate a 3D point cloud by masking the initial nDSM. As complex building roofs often consist of multiple planar surfaces, we first segmented the image into simpler rectangular features based on the general outline of the building. Kada and McKinley (2009) used the term cell decomposition for a similar approach, referring to Foley et al. (1990). The segmentation of the building into cells is useful as it prevents plane fitting through non-contiguous parts of the building. Segmenting the image based on every variation of the outline is, however, impractical as it creates a large number of very small segments (Fig. 4, left). Instead, we kept only those lines that feature the longest building edge within a lateral buffer of 15 pixels to each side (Fig. 4, right). It is conceivable at this point to apply a further generalization step to the footprint following the cell outline, which would make the computation of a generalized building block model with roof (LOD2; Arefi et al. 2008) less complex. This is, however, not implemented here.
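The first step above, turning the masked nDSM into a 3D point cloud, is straightforward; a minimal sketch (with a hypothetical ground sampling distance parameter `gsd`, not a value from the paper) might read:

```python
import numpy as np

def roof_points(ndsm, footprint_mask, gsd=0.5):
    """Convert nDSM pixels inside a building footprint into an (N, 3)
    point cloud: x, y in metres via the ground sampling distance,
    z taken directly as the nDSM height."""
    rows, cols = np.nonzero(footprint_mask)
    return np.column_stack([cols * gsd, rows * gsd, ndsm[rows, cols]])
```

The resulting cloud is what the per-cell plane fitting then operates on.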
In contrast to Local Network Alignment, Global Network Alignment aims to find a correspondence between all nodes of the input networks. A set of methods that follows this definition of Network Alignment has been proposed. To align two networks, IsoRank uses similarities of the neighborhoods and sequences of nodes and represents the problem as an eigenvalue problem. Network Alignment has also been reformulated as a graph matching problem and approximated via relaxations over the set of doubly stochastic matrices (methods PATH and GA). GRAAL is one of the first approaches that follow the "seed-and-extend" strategy applied to Global Network Alignment. H-GRAAL uses the Hungarian algorithm to find the best mapping in a constructed bipartite graph. Note that a heuristic utilizing a bipartite-graph representation of the input networks and the Hungarian algorithm was also proposed to approximate the cost in the related GED problem. PISwap [74, 75] is based on a local optimization heuristic: it first identifies an optimal alignment based on protein sequence similarity and refines it by propagating topology information. NATALIE 2.0 models the alignment as a generalization of the quadratic assignment problem and represents it as an integer linear program that is approached with the help of Lagrangian relaxation. MI-GRAAL uses the seed-and-extend strategy together with the Hungarian algorithm to construct weighted bipartite graphs; an alignment is built by solving the maximum-weight matching on these graphs. C-GRAAL also exploits the seed-and-extend strategy and iteratively aligns common neighbors of already aligned nodes. Along with introducing spectral signatures to measure topological similarity between subgraphs, GHOST combines a seed-and-extend global alignment phase, where neighborhoods are matched by computing an approximate solution to the quadratic assignment problem, with a local search procedure.
An even more sophisticated method, SPINAL, first builds a coarse-grained alignment and then improves it using seed-and-extend with local refinements. A very fast and efficient method called NETAL uses a greedy approach to build an alignment from scoring matrices, iteratively updating them. Table 7.1 provides a short summary of the methods.
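The bipartite-matching step shared by methods such as H-GRAAL and MI-GRAAL can be illustrated with SciPy's Hungarian-algorithm implementation. The similarity matrix below is a toy example, not data from any of the cited methods:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy node-similarity matrix between two small networks
# (rows: nodes of network A, columns: nodes of network B).
S = np.array([[0.9, 0.1, 0.2],
              [0.2, 0.8, 0.3],
              [0.1, 0.4, 0.7]])

# The Hungarian algorithm finds the one-to-one mapping that
# maximises the total similarity.
rows, cols = linear_sum_assignment(S, maximize=True)
mapping = dict(zip(rows, cols))   # node i in A -> mapping[i] in B
total = S[rows, cols].sum()
```

In the GRAAL family the entries of such a matrix would combine topological signatures (e.g., graphlet degree vectors) with sequence similarity rather than being given directly.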
Human beings are exposed to numerous volatile organic compounds (VOCs) in both outdoor and indoor air environments. More than half of the world's population now lives in cities with significant airborne pollution, and exposure to outside air can have serious consequences for human health. However, indoor sources of chemicals are also important, particularly since people spend much of their lives (93% for the average American) in enclosed spaces such as buildings and vehicles. Furthermore, as architects strive to improve the energy efficiency of buildings (e.g., passive houses), the internal recirculation of air becomes key for heat conservation, and hence indoor air quality becomes an important issue. Known sources of indoor pollutants include building materials,[59, 90] carpeting, furnishings, and products used or stored indoors such as paints[23, 86] and cleaning products. Commonly reported indoor air pollutants include gases such as carbon monoxide, sulphur dioxide, nitrogen dioxide, and ozone; microbial debris; selected VOCs; and particulate matter.[11, 147] It has been noted that even when the emitted contaminants are present below threshold limit values, they may contribute to a significant time-weighted exposure. Humans too are a potent, yet often overlooked, source of chemicals in the indoor air environment. Several hundred VOCs have been reported emanating from human breath, saliva, skin, blood, milk, urine, and faeces. The major endogenous compounds emitted in human breath are acetone (1.2–1880 ppb), isoprene (12–580 ppb), ethanol (13–1000 ppb), and methanol (160–2000 ppb). However, many other exogenous species may be taken up (by inhalation and dermal uptake, or on textile fabrics) in polluted outdoor environments such as roadsides and subsequently re-emitted indoors, thereby effectively being imported into more confined domestic spaces.
In this study, average VOC emission rates have been determined from a large number of people (8300) under real-world conditions so as to include both endogenous and exogenous species. The aim is to provide a representative dataset of typical city-dwelling human emission rates that can be used by architects, indoor air quality specialists, and medical researchers. Groups of people (50–238 at a time) were measured in a cinema, which served as a convenient enclosed space that was ventilated at a known rate while the audience remained seated. By characterising the human emission rates of VOCs and CO2 in the real world we may put other indoor
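Under a steady-state, well-mixed assumption, a per-person emission rate follows from a simple mass balance over the ventilated space. The sketch below uses purely illustrative numbers and variable names, not measurements or methodology details from the study:

```python
def per_person_emission_rate(c_supply, c_exhaust, flow_m3_h, n_people):
    """Steady-state, well-mixed mass balance: the per-person emission
    rate E (ug/h) needed to sustain the measured concentration rise
    from supply to exhaust air (ug/m^3) at a known ventilation flow
    Q (m^3/h) with N occupants:
        E = Q * (C_exhaust - C_supply) / N
    """
    return flow_m3_h * (c_exhaust - c_supply) / n_people
```

For example, a 5 µg/m³ concentration rise at 10 000 m³/h ventilation with 100 occupants corresponds to 500 µg/h per person.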
This paper studies a photogrammetric procedure performed by a miniature unmanned rotorcraft, a quadrocopter. As a solution for building up 3D models of building façades, a UAV can be a real asset, as it can reach difficult corners of the building and take repeated captures. The high maneuverability of the quadrocopter permits us to take diagonal pictures containing information about the depth of the building, which is a very critical factor when generating the 3D model. The aim of the paper is to obtain the best quality of captured 2D pictures in order to superpose them to generate a 3D model of the building. Therefore, an essential part of the paper concentrates on the capturing angle techniques. The generated trajectory takes into consideration the deviation factor of the ultrasonic altitude sensor and the GPS module. Tracking the desired coordinates allows us to automate the photogrammetric procedure and use manpower wisely to generate the 3D model.
1. INTRODUCTION AND RELATED WORK In mobile robotics, map building is an important prerequisite for different navigational tasks. It provides a model of the environment that is essential for collision avoidance, path planning, and localization. In the past, a variety of different techniques for representing the environment of a robot have emerged. They can be roughly classified into metric and topological maps. Topological maps [1, 2, 3] represent the robot's environment in a graph-like structure, where the nodes correspond to distinctive places, situations, or landmarks. Two nodes are connected by an edge if there exists a path for traveling between the places. Topological maps are therefore sparse and compact representations that are suitable for highly efficient path planning and localization in large-scale environments.
aggregation concept for boxes that defines which boxes belong to the same feature (building, storey, room, ...). This aggregation can be represented by additional constraints. Depending on those constraints, it can be determined whether a side of a box is a wall or just a separation between boxes of the same object; in that case, it is labeled as 'invisible'. Fig. 7 gives an example: we start with three boxes forming an L-shaped building with a protrusion. By splitting boxes (application of a modified rule R1), a polygonal footprint is generated. Room 1, for example, is generated by splitting Box 1 twice in different directions. The dotted lines indicate that different boxes belong to the same feature (room). The detailed design of rules for generating complex prism structures will be the topic of a subsequent paper. A further extension of the approach to extruded polygons with arbitrary angles can be achieved by additional rules which modify walls locally by applying rotations; the (local) topology has to be updated consistently.
In this work, we have presented a novel map matching technique for stereo-vision-based submaps. We apply our previous work on local obstacle maps as one of multiple filtering steps in order to obtain robust keypoints with discriminative geometric features for the matching process. We evaluated the localization accuracy of our novel submap matching pipeline within our SLAM framework. To this end, we performed experiments in three different scenarios, demonstrating its ability to achieve drift-free and accurate localization in previously unknown indoor, outdoor, and mixed environments. In addition, we compared our novel approach to a 3D RBPF SLAM developed in previous work, showing a significant improvement in 2D localization accuracy. Furthermore, our approach generates high-resolution 3D maps (3 cm voxel size) of the environment, containing both a full point cloud for visualization and post-processing as well as the obstacle classification, which can directly be used for path planning. For future work, we plan to approach multi-robot scenarios that involve varying viewpoints as well as different sensors for the individual robots. We have shown that our novel approach to map matching already yields robust results w.r.t. changes in viewpoint and lighting conditions, for example in the mixed scenario. Another challenge for future research is the merging of submaps once a good relative transformation estimate between them has been found. This is necessary to keep computational and memory requirements within a limited workspace independent of the runtime.
During the last decades, several approaches for the reconstruction of 3D building models have been developed. Starting in the 1980s with manual and semi-automatic reconstruction of 3D building models from aerial images, the degree of automation has increased in recent years so that these methods became applicable to various areas. Some typical applications and examples are shown in section 1.1. Especially since the 1990s, when airborne light detection and ranging (LiDAR) technology became widely available, approaches for (semi-)automatic building reconstruction of large urban areas turned out to be of particular interest. Only in recent years have some large cities built detailed 3D city models. Although much effort has been put into the development of a fully automatic reconstruction strategy in order to overcome the high costs of semi-automatic reconstruction, no solution proposed so far meets all requirements (e.g., in terms of completeness, correctness, and accuracy). The reasons for this are manifold, as discussed in section 1.2. Some of them are manageable, for example, either by using modern sensors which provide denser and more accurate point clouds than before or by incorporating additional data sources such as high-resolution images. However, there is considerable demand for 3D building models in areas where such modern sensors or additional data sources are not available. Therefore, in this thesis a new fully automatic reconstruction approach for semantic 3D building models from low- and high-density airborne laser scanning (ALS) data of large urban areas is presented and discussed. Additionally, it is shown how automatically derived building knowledge can be used to enhance existing building reconstruction approaches. The specific research objectives are outlined in section 1.3, which includes an overview of the proposed reconstruction workflows and the contributions of this thesis.
In order to have lean workflows with good performance, some general assumptions on the buildings to be reconstructed are imposed and explained in section 1.4. The introduction ends with an outline of this thesis in section 1.5.
Antwerp University; 3 Department of Geography, Toronto University
Background and objectives
Until recently, human exposure to persistent organic pollutants (POPs) had been widely considered to occur almost exclusively via the diet. While this appears true for dioxins, it may not be the case for those POPs with indoor use patterns. This paper summarises our research on the causes, levels, and human exposure implications of indoor contamination by polychlorinated biphenyls (PCBs) and a number of brominated flame retardants (BFRs), in particular polybrominated diphenyl ethers (PBDEs) and hexabromocyclododecanes (HBCDs). Human health concerns have been raised with respect to each of these chemical classes, creating a need for exposure assessment. While the manufacture and new use of PCBs ceased in the UK in the late 1970s, an unknown but likely substantial proportion of the ~40 000 t of UK usage remains in use in applications such as window sealants. BFR usage is also substantial, with the (recently banned) Penta-BDE commercial formulations deployed widely in polyurethane foam used in furniture. In addition, substantial manufacture continues of the Deca-BDE product for flame-proofing of textiles and high-impact polystyrene (HIPS) housings for electronic goods such as TVs, as well as of HBCD for use in building insulation and HIPS. In addition to direct exposure via inhalation and ingestion of indoor air and dust, there are concerns that such indoor contamination has implications for future dietary exposure, either via emissions from indoor environments during the use of treated materials or as a result of their disposal at end-of-life (Harrad and Diamond, 2006). Given the vast quantities of materials treated with BFRs at per-cent levels that will require disposal, this constitutes an issue with substantial implications for sustainable chemicals management.
and are obtained easily through crowd sourcing. On the contrary, books, from which the videos are adapted, are large texts that describe the events (characters, scenes, and interactions) in rich detail. Unlike transcripts, the potential of these text sources needs to be unlocked by first aligning the text units with the video. We propose similarity metrics to bridge the gap between the text and video modalities. Using them, we align plot synopsis sentences with individual video shots, and book chapters with video scenes. To this end, we develop several alignment models that attempt to maximize joint similarity while respecting story progression constraints. We test these approaches on two sets of videos for both plots and books and obtain promising alignment performance. The alignment gives rise to applications such as describing video clips using plot sentences or book paragraphs, story-based video retrieval using plots as intermediaries, and even the ability to predict whether a scene from the video adaptation was present in the original book.
Shortly after the release of the first Kinect sensor, the KinectFusion method was presented in (Newcombe et al., 2011). It is based on dense depth images from time-of-flight or structured-light sensors, captured at high frame rates. The distinguishing feature of KinectFusion is its very good representation of surfaces (Figure 1). This is achieved by integrating every depth image into a global voxel structure. Each voxel stores a weight and the distance to the nearest surface. The exact position of the surface can then be determined by raycasting, using the zero crossing along the ray (voxels behind the surface receive negative distances). The sensor motion between two frames is likewise estimated with an ICP variant; however, the comparison is made not only against the previous frame but always against the complete model built so far. For relatively small scenes (1–3 m edge length), around which the sensor can be moved, drift is thereby almost entirely avoided. Thanks to highly optimized algorithms executed massively in parallel on graphics processors (GPUs), this demanding processing can run at up to 30 frames per second. The resulting map can consist of up to 512³ voxels (depending on the available graphics memory). However, if tracking is lost due to overly fast motion or an obstacle filling the image, there is no way to perform a global relocalization, so the reconstruction can only be continued if the sensor is brought back sufficiently close to the last successful measurement pose.
In the lane marking extraction process, the σ value for Gaussian smoothing is set to 1.8 to slightly suppress the noise in the images. A length threshold is applied to the extracted lines to reduce false positives. Given that a dashed lane-line is around 6 m long, which corresponds to 62–87 pixels in image space (when parallel to one of the coordinate axes or in the 45° direction), extracted lines shorter than 65 pixels are rejected. The length of the sliding window should be decided based on the expected curvature of the targeted line and the robustness of the reconstruction model. For continuous lane markings on motorways, the sliding window length was fixed to 16 m (or up to 24 m for the last segment) as a compromise between optimization robustness and the systematic errors arising from the straight-line approximation of the curvature. For dashed lane-lines (6 m long), a sliding window is as long as its targeted approximating line, i.e., 6 m. The step size of the sliding window was set to half of the sliding window length. The adopted lens distortion model is not shape-preserving, i.e., a 3D straight line is no longer straight in the image. However, in every independent reconstruction process only a short 16 m line segment is reconstructed, for which the lens distortion correction along the segment is no larger than one pixel. Thus the bending of a line segment in an image caused by lens distortion is ignored in this work, but it could easily be removed by rectifying the images (i.e., computing undistorted images).
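The sliding-window scheme described above (16 m windows advanced by half a window, with a final window of up to 24 m) can be sketched as follows. This is an interpretation of the text, not the authors' code:

```python
def sliding_windows(total_len, win=16.0, max_last=24.0):
    """Split a lane-line of `total_len` metres into overlapping windows:
    regular windows of `win` metres advanced by half a window, except
    that the final window is allowed to grow to `max_last` metres."""
    windows, s = [], 0.0
    while total_len - s > max_last:
        windows.append((s, s + win))
        s += win / 2
    windows.append((s, total_len))   # last (possibly longer) window
    return windows
```

For a 40 m marking this yields windows (0, 16), (8, 24) and a final 24 m window (16, 40); a 6 m dashed line produces a single window covering the whole dash, matching the rule in the text.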