
Contents lists available at ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

Positioning and perception in LIDAR point clouds

Csaba Benedek, Andras Majdik, Balazs Nagy, Zoltan Rozsa, Tamas Sziranyi

Machine Perception Research Laboratory (MPLab), Institute for Computer Science and Control (SZTAKI), Eötvös Loránd Research Network (ELKH), Kende u. 13-17, H-1111 Budapest, Hungary

* Corresponding author. E-mail address: sziranyi.tamas@sztaki.hu (T. Sziranyi).

Article history: Available online 3 August 2021

Keywords: Lidar; Object detection; SLAM; Change detection; Navigation

Abstract

In the last decade, Light Detection and Ranging (LIDAR) became a leading technology of detailed and reliable 3D environment perception. This paper gives an overview of the wide applicability of LIDAR sensors from the perspective of signal processing for autonomous driving, including dynamic and static scene analysis, mapping, and situation awareness, functions which point significantly beyond the role of a safe obstacle detector, the sole typical function of LIDARs in the pioneer years of driverless vehicles. The paper focuses on a wide range of LIDAR data analysis applications of the last decade, and in addition to presenting a state-of-the-art survey, the article also summarizes some issues and expected directions of development in this field, and the future perspectives of LIDAR systems and intelligent LIDAR based information processing.

© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). https://doi.org/10.1016/j.dsp.2021.103193

1. Introduction

This paper gives an overview of the rich applicability of LIDAR sensors from the perspective of signal processing for autonomous driving, including dynamic and static scene analysis. We focus on a wide range of LIDAR data analysis applications, giving first-hand experience about the state of the art and the challenges of a new depth mapping device category.

1.1. Motivation and significance

In recent decades, remarkable progress has been made in sensor development for environment analysis, which greatly influences the scientific progress in the fields of object detection and classification, scene segmentation, and understanding. Light Detection and Ranging (LIDAR) sensors became one of the most widely used sensing technologies in various applications of geo-data analysis, including perception, mapping and localization.

LIDAR is an active remote sensing technology that uses electromagnetic waves in the optical range to detect an object (target), determines the distance between the target and the sensor (range), and measures further physical properties of the target surface such as scattering and reflection [1]. The sensor calculates the distance of the target objects from the echo time of the emitted and the detected laser beam, where the beam propagates at the speed of light. The result of the measurement is a highly accurate 3D point cloud, where the coordinates of the points are given in a local or global coordinate system depending on the type of the LIDAR system and the application area.
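As a quick illustration of the time-of-flight principle (our own sketch, not code from the paper), the one-way range follows from halving the round-trip travel time of the laser pulse:

```python
def lidar_range(echo_time_s: float, c: float = 299_792_458.0) -> float:
    """Range from the round-trip time of flight: the pulse travels to the
    target and back, so the one-way distance is c * t / 2."""
    return c * echo_time_s / 2.0

# A 400 ns echo corresponds to a target at roughly 60 m:
print(lidar_range(400e-9))  # ~59.96
```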

LIDAR scanners can be mounted either on static terrestrial stations or on ground based and aerial moving vehicles. By using terrestrial LIDAR sensors, high density point clouds and notably accurate and largely detailed 3D models can be created, properties that are required in architectural and engineering applications. Mobile laser scanning (MLS) allows quick surveys of the road network and environment; furthermore, it can contribute to the localization and control of mobile robots and autonomous vehicles.

This paper addresses the main aspects of the broad application area of mobile LIDAR sensors in autonomous driving related fields. LIDARs have special roles in autonomous driving and transportation and special vehicle based intelligent control systems, as they are used in parallel with camera systems. Novel high-resolution units can be built into the car's body, and they provide an important accessory to on-board safety. Although LIDARs are currently the most expensive pieces of the on-board sensor systems, the prices are going down quickly, while the application areas are rapidly expanding. The authors, working in the Machine Perception Research Laboratory of SZTAKI, Hungary, have focused on a wide range of LIDAR data analysis applications for several years, thus in addition to the presentation of a state-of-the-art survey, this article also summarizes their first-hand experiences in the field.

Today the LIDAR itself is still most frequently considered as the sensor of safety, since its usage is mainly limited to reliable free space verification. However, as we will demonstrate in this study, the potential of this technology goes far beyond simple obstacle detection, since the development of LIDAR technology in terms of temporal and spatial resolution and noise elimination led us to more sophisticated 3D measurements for various real-time perception, navigation and mapping problems.

Fig. 1. Data sample of a Velodyne HDL-64 RMB Lidar.

Next, we introduce the reader to the latest exciting results and their background in real-life LIDAR applications.

1.2. Outline of the paper

First, we show the diversity of laser scanner devices used to acquire a point cloud of the 3D environment. Next, we present an overview of a wide range of LIDAR-based application modules built on each other, which implement various functions, including object perception, classification, mapping and localization. We also discuss the opportunities in challenging situations such as extreme weather conditions or the availability of low-range one-beam (plane) sensors only. Finally, we show the on-the-fly calibration of the LIDAR and camera system.

2. LIDAR sensors and resources

2.1. LIDAR sensor types

LIDAR equipment offers a versatile application and operational richness: static/mobile, 360°/wide angle/narrow scan, equidistant scanning resolution/special beam patterns, single echo/multiple echoes. We will see that LIDARs can be used in any area of imaging the world.

Rotating Multi-beam (RMB) Lidar systems provide a 360° field of view of the scene, with a vertical resolution equal to the number of laser channels, while the horizontal angle resolution depends on the speed of rotation. Although RMB Lidars can produce high frame-rate point cloud videos enabling dynamic event analysis in the 3D space, the measurements have a low spatial density, which quickly decreases as a function of the distance from the sensor, and the point clouds may exhibit particular patterns typical to the sensor characteristics (see Fig. 1). In special cases, only one or a few beams are available.
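The following back-of-the-envelope sketch (our own illustration; the sensor parameters are assumed, not quoted from the paper) shows how the horizontal angular step follows from the rotation speed, and how quickly the number of returns on a target drops with distance:

```python
import math

def horizontal_resolution_deg(rotation_hz: float, firings_per_s: float) -> float:
    """Horizontal angular step of an RMB lidar: 360 degrees divided by
    the number of firings during one revolution."""
    return 360.0 * rotation_hz / firings_per_s

def returns_on_target(width_m, height_m, dist_m, h_res_deg, v_res_deg):
    """Rough count of returns on a planar target facing the sensor."""
    w_deg = math.degrees(2 * math.atan(width_m / (2 * dist_m)))
    h_deg = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    return (w_deg / h_res_deg) * (h_deg / v_res_deg)

# Pedestrian-sized target (0.6 m x 1.8 m), assumed 0.17 deg horizontal
# and 0.42 deg vertical resolution: the count falls roughly with 1/d^2.
for d in (10.0, 25.0, 50.0):
    print(d, round(returns_on_target(0.6, 1.8, d, 0.17, 0.42)))
```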

Mobile laser scanning (MLS) platforms equipped with time synchronized Lidar sensors and navigation units can rapidly provide very dense and accurate static point clouds from large environments, where the 3D spatial measurements are accurately registered to a geo-referenced global coordinate system (Fig. 2). These point clouds may act as a basis for detailed and up-to-date 3D High Definition (HD) maps of the cities, which can be utilized by self-driving vehicles for navigation.

Fig. 2. Data of a Riegl VMX-450 Mobile Laser Scanning (MLS) system.

Another recently emerging technology is the Doppler LIDAR (first mentioned in [2] for wind measurements): e.g., Blackmore (https://blackmoreinc.com) has introduced a LIDAR for autonomous driving with velocity or rotation speed data output. Very recent models, such as the Livox sensors, use advanced non-repetitive scanning patterns to deliver highly accurate details. These scanning patterns provide relatively high point density even in a short period of time, and the density builds up further as the scanning duration increases. The currently available models can achieve the same or greater point density as conventional 32-line RMB LIDAR sensors.

2.2. LIDAR resources

Numerous autonomous driving datasets with LIDAR data have been released in recent years. The most important ones are listed in Table 1 with their purpose. As we can observe, for typical benchmark problems, such as object detection, tracking or Simultaneous Localization and Mapping (SLAM), one can choose between various public resources, corresponding to different sensor characteristics and scenario types. A main challenge in the future, however, will be the timely extension of the available benchmark datasets with reliable measurement and ground truth information, following the appearance of newer and newer LIDAR sensor technologies.

3. LIDAR based object perception

Object perception and recognition is a central objective in LIDAR based 3D point cloud processing. Though several 3D object detection and classification approaches can be found in the literature, due to the large differences in the data characteristics obtained by different LIDAR sensors, object perception methods are still strongly sensor dependent, making it very challenging to adapt them between different types of LIDAR data.

Since LIDAR sensors provide very accurate 3D geometric information, the localization and shape recognition of the objects can be more intuitive compared to 2D image processing. However, beyond the different sensor data characteristics, several challenges occur in automatic LIDAR-based object detection and classification, such as the sparsity of the data, variable point density, and non-uniform sampling; in addition, in cluttered scenes objects often occlude each other, causing partially extracted object blobs in the measurements.



Table 1. LIDAR datasets with different purposes.

| Name of the dataset | Lane detection | Object detection/tracking | Segmentation | Localization and mapping |
|---|---|---|---|---|
| Stanford Track [3] | | X | | |
| Ford [4] | | | | X |
| KITTI [5] | X | X | X | X |
| Málaga urban [6] | | | | X |
| Oxford RobotCar [7] | | | | X |
| Apolloscape [8] | X | X | X | X |
| KAIST Urban dataset [9] | | | | X |
| KAIST Multispectral [10] | | X | | |
| Multivehicle Stereo Event [11] | | | | X |
| UTBM RoboCar [12] | | | | X |
| Unsupervised Llamas (Bosch) [13] | X | | | |
| PandaSet* | | X | X | |
| BLVD [14] | | X | | |
| H3D (Honda) [15] | | X | | |
| Lyft level 5 [16] | | X | | |
| NuScenes [17] | | X | X | |
| Waymo Open [18] | | X | | |
| Argoverse [19] | | X | | X |
| SZTAKI-Velo64Road [20] | | X | | |
| SZTAKI CityMLS [21] | | | X | X |

* https://scale.com/open-datasets/pandaset


Based on the object perception literature, we can define two main groups: traditional geometry based methods and deep learning based approaches. To handle the expensive calculations between huge amounts of 3D points, geometry based methods usually adopt some space partitioning technique such as Kd-trees, Octrees [22,23] or 3D voxels [24], or 2D grid based methods [25]. Some approaches apply different region growing techniques over tree-based structures to obtain coherent objects. The authors of [26] present an Octree based occupancy grid representation to model the dynamic environment surrounding the vehicle and to detect moving objects based on inconsistencies between scans.

In general, building and maintaining a tree-based structure is expensive, so usually some kind of 3D voxel or 2D grid approach is applied for streaming data. In [25] the authors propose a fast segmentation of point clouds into objects, which is accomplished by a standard connected component algorithm in a 2D occupancy grid, and object classification is done on the raw point cloud segments with 3D shape descriptors and an SVM classifier. Different voxel grid structures are also widely used to complete various scene understanding tasks, including segmentation, detection and recognition [24]. The data is stored here in cubic voxels for efficient retrieval of the 3D points. Among geometry based 2D grid approaches, [20] implements a pipeline of a geometry based ground separation step, an efficient object extraction based on a two-layer grid structure, and a deep learning based object classification which represents the extracted objects in the range image domain.
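A minimal sketch of the grid-based object extraction idea (our own toy version, with an assumed height threshold standing in for the geometric ground separation):

```python
import numpy as np
from scipy import ndimage

def extract_objects(points, cell=0.2):
    """Rasterize non-ground points onto a 2D occupancy grid and label
    4-connected components as object candidates, in the spirit of [25]."""
    pts = points[points[:, 2] > 0.3]            # crude ground removal (assumed)
    ij = np.floor(pts[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                        # shift cell indices to zero
    grid = np.zeros(ij.max(axis=0) + 1, dtype=bool)
    grid[ij[:, 0], ij[:, 1]] = True
    labels, n_objects = ndimage.label(grid)     # connected components
    return pts, labels[ij[:, 0], ij[:, 1]], n_objects
```

Each labeled segment would then be passed on to a shape descriptor and classifier, as described above.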

Other recent techniques focus on deep learning based object detection and classification in 3D point clouds. VoxelNet [28] is able to predict accurate bounding boxes utilizing discriminative feature learning. PointPillars [27] (Fig. 3) is a state-of-the-art real-time object detection method, which can predict object candidates from multiple classes, together with their 3D oriented bounding boxes and class confidence values.
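To make the pillar idea concrete, the fragment below groups points into vertical columns on the x-y plane, which is the first stage of PointPillars-style processing (a simplified sketch with assumed cell size and point cap; the learned PointNet encoder that follows is omitted):

```python
import numpy as np

def pillarize(points, cell=0.16, max_pts=32):
    """Bucket points into vertical x-y pillars; a learned encoder would
    turn each pillar's point set into a fixed-size feature vector."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    pillars = {}
    for p, key in zip(points, map(tuple, ij)):
        bucket = pillars.setdefault(key, [])
        if len(bucket) < max_pts:               # cap the points per pillar
            bucket.append(p)
    return {k: np.stack(v) for k, v in pillars.items()}
```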

4. The limits of usage: low-resolution LIDAR perception and extreme circumstances

The previous section (Sect. 3) showed that Lidar pattern evaluation can result in semantic interpretation; now we will see that even very limited (diluted) information from LIDAR scans can be used for accurate perception. In this section, the limitations of LIDAR sensors, the resulting challenges, and current solutions are discussed. Besides developing high-end 3D LIDAR sensors, it is also worth investigating the capability of sensors of lower or extremely low resolution (equipped with a few or even only one laser channel), both for cost-efficiency and for increased robustness. As one of the main effects of extreme circumstances (e.g., harsh weather) is information loss, installing more than one planar or few-layer LIDAR in task-based optimized positions [29] may be a better alternative (from some points of view) than using a single high-resolution one. The limited information content of scanners with low vertical resolution makes a high-level semantic interpretation of the data more challenging, but makes real-time execution of the algorithms easier. Naturally, machine vision in this subfield has gone through rapid development in recent years as well.

Fig. 3. LIDAR object detection results with the deep learning based PointPillars approach [27]. Red boxes show detected vehicles, blue boxes pedestrians. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

4.1. Vision with LIDARs of extremely low resolutions

2D range scanners and their applications have a relatively long history in robotics [30]. Automated Guided Vehicles (AGVs) have been using these sensors for decades for safety and navigation purposes. Today, there are products available on the market with extremely high horizontal angular resolution, high scanning frequency, and a safety guarantee from the manufacturer. Also, fully developed, real-time scan matching and Simultaneous Localization and Mapping algorithms [31] are available in industrial and market products based on only 2D laser scanner data, making localization and mapping one of their primary application areas.

Fig. 4. Examples of a planar LIDAR sensor and a point cloud acquired by a planar LIDAR.
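As a small illustration of the data such pipelines consume (our own sketch; the field of view and range limits are assumed), a planar scan is just an array of ranges over known beam angles, which converts directly to 2D points in the sensor frame:

```python
import numpy as np

def scan_to_points(ranges, fov_deg=270.0, max_range=30.0):
    """Convert one planar laser scan to Cartesian points in the sensor
    frame, dropping invalid (out-of-range) returns; scan matching and
    2D SLAM operate on exactly this kind of data."""
    ranges = np.asarray(ranges)
    angles = np.linspace(-np.radians(fov_deg) / 2,
                         np.radians(fov_deg) / 2, len(ranges))
    valid = (ranges > 0) & (ranges < max_range)
    return np.stack([ranges[valid] * np.cos(angles[valid]),
                     ranges[valid] * np.sin(angles[valid])], axis=1)
```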

Besides navigation, we would like to extract as much information about the environment from the available data as possible. Point cloud processing algorithms can directly utilize the point clouds of these 2D scanners, so there are various solutions for object detection and recognition from this type of data. One can use handcrafted general features [32], data specific features [33], image descriptors [34] or neural networks [35]. Applying LIDARs with a (still very) narrow vertical field of view [36] makes the information content richer and semantic interpretation easier.

We can distinguish two different reasons for dealing with low (vertical) resolution LIDAR point clouds. The first case is when the hardware limits the resolution (the number of LIDAR layers), because we measure with a planar LIDAR or one with 4-8 layers. The second case is when our LIDAR has a sufficient number of layers (16 and above), but its usage scenario limits the acquired point cloud's vertical resolution. A typical example of this kind of scenario is a highway: we would like to look far (because of the high speed and straight road sections), but distant objects will occupy only a very few layers of our (high vertical resolution) LIDAR measurement. We will show what kinds of solutions have been developed recently for these two particular cases of low-resolution LIDAR perception.

4.1.1. Hardware limited low-resolution perception

As mentioned earlier, planar and narrow vertical field LIDARs (a 2D LIDAR sensor and a frame acquired by it in a tilted position are illustrated in Fig. 4; images from SICK, https://www.sick.com) are frequently used in logistic transportation systems (on AGVs), not just for navigation but also for specific purposes (e.g., overhang detection, see Fig. 5). The sensors' specific positions, the speed of the transport vehicle (about 1 m/s), and the presence of a positioning sensor [37] in the vehicle (for navigation purposes) make it feasible to use the 2D sensor data for 3D reconstruction. This results in very special partial point cloud data, incrementally giving more and more information about the objects. We proposed a solution to deal with this specific kind of LIDAR data in [38].

The proposed method's main idea for dealing with partial clouds is to compare statistics of local structures. Its steps are summarized below. First, a local surface definition around each point is needed; we measure the saliency of each point by the 3D Harris operator [39]. Next, to determine a repeatable number of keypoints, a local scale is assigned to the significant points, and a local surface descriptor characterizes the keypoints. After that, we define local patterns as graphs of keypoints. In the last step, the frequency of local patterns is compared.
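A simplified stand-in for the keypoint saliency step (our own sketch; this uses neighborhood PCA as a surface-variation measure rather than the exact 3D Harris operator of [39]):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_variation(points, radius=0.3):
    """Per-point saliency from the eigenvalues of the local covariance:
    low on flat surfaces, high at corners and edges, so high values mark
    keypoint candidates."""
    tree = cKDTree(points)
    saliency = np.zeros(len(points))
    for i, p in enumerate(points):
        nbrs = points[tree.query_ball_point(p, radius)]
        if len(nbrs) < 5:                       # too few neighbors: skip
            continue
        w = np.linalg.eigvalsh(np.cov((nbrs - nbrs.mean(0)).T))
        saliency[i] = w[0] / (w.sum() + 1e-12)  # smallest-eigenvalue ratio
    return saliency
```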


Fig. 5. Tilted sensor installation for overhang detection. Photo source: SICK - Efficient solutions for material transport vehicles in factory and logistics automation.

Besides our own measurements, we used an MLS database for real-life testing. (These types of point clouds are acquired similarly as described above.) On this database, we measured 73.3% classification accuracy for 5 classes when only about 20% of the 3D object was visible, and 80.0% accuracy at 30% visibility, which results in a usable and safe incremental prediction for early decisions. For more details, see [38].

4.1.2. Scenario limited low resolution perception

It follows from the LIDAR measurement principle that the density of the acquired point cloud decreases with distance from the sensor. This results in the phenomenon that even when measuring with a high-resolution sensor, distant objects are not observable at sufficient resolution. In the case of a sensor with a lower number of channels, this happens in the near field too. (The case when we perceive an object in only one layer is also not rare.)
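The effect is easy to quantify (our own sketch; the vertical resolution value is an assumed, HDL-64-like figure): the number of layers hitting an object shrinks roughly linearly with distance.

```python
import math

def layers_on_object(height_m, dist_m, v_res_deg):
    """Approximate number of LIDAR layers intersecting an object of the
    given height at the given distance."""
    subtended = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    return max(1, int(subtended / v_res_deg))

# A 1.5 m tall vehicle at 41 m with ~0.42 deg vertical resolution spans
# only about 4 layers, in line with the KITTI experiment quoted below:
print(layers_on_object(1.5, 41.0, 0.42))  # -> 4
```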

From this type of data, local surface information is not extractable, so we cannot expect methods based on it, designed for 2.5D point clouds, to work. To solve this problem, we relied on methods designed for 2D point clouds and extended them to utilize all the available information [40]; we proposed a method to classify objects with this point cloud characteristic, constructed from the steps below. First, we generate a shape descriptor for object segments using low-frequency components, to be robust against angular resolution drop. Then, we extract statistical measures of geometries coding the 3D location of the (approximately) 2D curve. After that, we group tracklets (tracks up to 5 frames) of segments (if any are available). The next step is classifying the segments (or tracklets of segments) with a CNN (Convolutional Neural Network). Finally, an object-level decision is made by maximum likelihood aggregation of the segment class probabilities (if more than one segment is available from an object).
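The final aggregation step admits a one-liner illustration (our own sketch): assuming the segments are independent, the object-level class maximizes the product of the per-segment probabilities, i.e., the sum of their logarithms.

```python
import numpy as np

def object_class(segment_probs):
    """Maximum likelihood aggregation of per-segment class probability
    vectors into a single object-level decision."""
    log_p = np.log(np.asarray(segment_probs) + 1e-12).sum(axis=0)
    return int(np.argmax(log_p))

# Three segments of one object over 3 classes -> class 0 wins:
print(object_class([[0.5, 0.3, 0.2], [0.4, 0.5, 0.1], [0.6, 0.2, 0.2]]))
```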

With the proposed method, we reached 96.6% classification accuracy (and even better when we could perceive and track an object segment for more than one frame) for 6 categories on those point clouds of the KITTI [5] database where objects were present in at most 4 LIDAR layers (at 41 m average distance from the sensor). These point clouds cannot be handled with conventional methods (and so are ignored in most cases). An illustration of object classification with the proposed method in a typical scenario (a highway observed with a relatively narrow vertical field LIDAR) is shown in Fig. 6. For further details and experimental proofs, see [40].


Fig. 6. Classification of objects observed in at most 4 layers. Color map: red - vehicle, blue - guardrail, green - ground.

4.2. Robustness in harsh weather conditions

Harsh weather conditions pose challenging problems: LIDARs have decreased performance in snow, rain or fog. This kind of limitation has to be addressed by semantic-based methods or physically modeled filters.

Recent research targets hardware [41] and software [42] developments to eliminate this effect. To avoid the problem above, alternative devices can be used, noisy measurements have to be filtered [43], and incomplete data has to be completed. Researchers have just started developing the first stages of the solution, recognizing the given weather conditions [44] and examining the influence of different ones [45]. To support this pursuit, datasets recorded in adverse weather have been released lately [46].

5. LIDAR based localization and mapping

The capability of recognizing patterns in LIDAR point clouds led to high precision odometry techniques in SLAM and similar methods. Next, we will briefly summarize the state-of-the-art algorithms and current trends in LIDAR-based ego-motion estimation, 3D mapping, and localization.

5.1. Visual odometry using LIDARs

Recently, several visual odometry algorithms were proposed to compute the motion of a vehicle in real time using only the continuously streaming data gathered by the LIDAR sensor. Such LIDAR-only odometry methods eliminate the need for any other supplementary sensor, e.g., an Inertial Measurement Unit (IMU), wheel encoders, or a satellite based Global Positioning System (GPS).

One of the best performing algorithms in terms of translational and rotational errors on the KITTI [5] dataset is the LOAM [47] algorithm, which estimates the six DoF (Degrees of Freedom) displacement of the vehicle on short trajectories with very low drift in scenes with high-density feature points and available reference ground planes. The algorithm can process the measurements robustly for different LIDAR sensors with varying point cloud densities. However, in the case of long trajectories, since the drift is continuously accumulated, a significant error can build up in the position estimation.
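The drift mechanism is visible in how odometry poses are composed (our own sketch): each relative six-DoF estimate is multiplied onto the running pose, so even a small systematic per-step error compounds over a long trajectory.

```python
import numpy as np

def chain_poses(increments):
    """Compose per-frame 4x4 homogeneous transforms into a trajectory;
    without loop closure, per-step errors accumulate without bound."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for T in increments:
        pose = pose @ T
        trajectory.append(pose.copy())
    return trajectory
```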

5.2. Simultaneous localization and mapping with LIDARs

In order to correct the accumulated error in the odometry backend, loop-closure situations can be detected by place recognition algorithms whenever the vehicle returns to previously visited places in the navigation area. In the case of Simultaneous Localization and Mapping (SLAM) algorithms, it is assumed that the vehicle explores the given environment for the first time, and therefore there is no a priori map to localize itself against. Recently, the SegMap [48] algorithm was proposed to extract and match LIDAR segments in 3D point clouds. SegMap computes a data-driven compact descriptor to extract distinctive and meaningful features from point cloud segments in order to identify loop-closure situations along the trajectory.

In order to increase the robustness and precision of the localization algorithm in feature-poor environments, a framework was proposed in LIO-SAM [49] to tightly couple LIDAR and inertial measurements obtained from an IMU. The proposed architecture also allows the integration of GPS measurements in case these are available. Further on, by adapting the factor graph optimization framework, the LIDAR Inertial Sub-system (LIS) was fused with a traditional monocular-based Visual Inertial Sub-system (VIS) to create a Lidar-Visual-Inertial (LVI-SAM) localization and mapping system [50]. In contrast to these methods, next we will show the outcomes of a LIDAR-only odometry and localization method for urban environments where a target map exists to localize within.

5.3. LIDAR-only odometry and localization in 3D point cloud maps

Accurate 3D city models and high-definition maps are becoming increasingly available with recent mapping technology advancements. In addition, in many real-world applications, maps are available to localize against. Therefore, these should be utilized to correct the accumulated drift along the vehicle's trajectory whenever a geometrically similar location is detected between the online 3D point cloud and the offline map.

In [51] we proposed LOL, a LIDAR-only Odometry and Localization algorithm that integrates the advantages of the LOAM [47] odometry and the SegMap [48] algorithm. In the odometry backend, the LOAM algorithm estimates the six DoF odometry in real time based only on the continuously streaming point cloud data from a Velodyne LIDAR sensor. In a scene with high-density feature points and available reference ground planes, the algorithm computes the displacement of the vehicle on short trajectories with very low drift using only the consecutive Lidar measurements. The algorithm can process the measurements robustly for different Velodyne sensors with varying point cloud densities. On the other hand, in the case of long trajectories, since the drift is continuously accumulated, a significant error can build up in the estimation, which needs to be canceled by a localization method whenever a correct match is detected between the online Lidar stream and the offline reference map. Therefore, for the localization frontend, we integrated the SegMap method, which is a state-of-the-art algorithm for the extraction and matching of 3D point cloud segments.

We also included some additional improvements in the acceptance of correct matches by applying further geometrical constraints complementing the feature similarity ones. Namely, once a good match is detected between the online measurements and the target map, we only search for similar 3D Lidar segments (with relaxed similarity constraints) in the neighborhood of the current location defined by the location uncertainty. In addition, we only use the shift between the centroids of the target map and the online source segments as a prior, and we refine the final transformation by applying a fine-grained ICP matching between the two point clouds.
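For the last step, a minimal point-to-point ICP looks as follows (our own sketch, not the exact implementation of [51]): nearest-neighbor correspondences alternate with a closed-form (SVD/Kabsch) rigid transform estimate.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(source, target, iters=20):
    """Refine an initial alignment: match each source point to its
    nearest target point, solve the optimal rigid transform in closed
    form, apply it, and repeat."""
    src = source.copy()
    tree = cKDTree(target)
    R_acc, t_acc = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)                # nearest-neighbor matches
        tgt = target[idx]
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        H = (src - mu_s).T @ (tgt - mu_t)       # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                      # best rotation (Kabsch)
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
    return R_acc, t_acc
```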


Fig. 7. Results of the LOL localization algorithm with respect to the ground truth map on KITTI [5] datasets of various lengths: LOL algorithm (green line) and LOAM trajectory (red line), with respect to the ground truth point cloud map.

We tested the proposed algorithm on several KITTI [5] datasets, cf. Fig. 7, and found a considerable improvement in terms of precision without a significant increase in computational cost.

5.4. Localization in dense LIDAR maps

LIDAR measurements can also be utilized for accurate self-localization of self-driving vehicles (SDVs) in high resolution 3D point cloud maps of the environment. A solution provided in [52] can robustly register the sparse RMB Lidar point clouds of the SDVs to the dense Mobile Laser Scanning (MLS) point cloud data, starting from a GPS based initial position estimate of the vehicle. The main steps of the method are robust object extraction and transformation estimation based on multiple keypoints extracted from the objects, with additional semantic information derived from the previously segmented MLS based map.

6. Semantic segmentation of MLS point clouds

Dense MLS point clouds can act as a basis for detailed and up-to-date 3D High Definition (HD) maps of the cities, which can be utilized by self-driving vehicles for navigation, or by city authorities for road network management and surveillance, architecture, or urban planning. All of these applications require semantic labeling of the data (Fig. 8). While the high speed of point cloud acquisition is a clear advantage of MLS, due to the huge data size yielded by each daily mission, applying efficient automated data filtering and interpretation algorithms on the processing side is crucially needed, steps which still pose a number of key challenges.

Taking the raw MLS measurements, one of the critical issues is the phantom effect caused by independent object motions [21]. Due to the sequential nature of the environment scanning process, scene objects moving concurrently with the MLS platform (such as passing vehicles and walking pedestrians) appear as phantom-like, long-drawn, distorted structures in the resulting point clouds [53]. It is also necessary to recognize and mark all movable scene elements such as pedestrians, parking vehicles [54] or trams in the MLS scene. On the one hand, they are not part of the reference background model, thus these regions must be eliminated from the HD maps. On the other hand, the presence of these objects may indicate locations of sidewalks, parking places, etc. Column-shaped objects, such as poles, traffic sign bars [55], and tree trunks are usually good landmark points for navigation. Finally, vegetation areas (bushes, tree foliage) should also be specifically labeled [56]: since they are dynamically changing over the whole year, object level change detection algorithms should not take them into account.

While a number of various approaches have already been proposed for general point cloud scene classification, they do not focus on all practical challenges of the above introduced workflow of 3D map generation from raw MLS data. In particular, only a few related works have discussed the problem of phantom removal. Point-level and statistical feature based methods such as [57] and [58] examine the local density of a point neighborhood, but as noted in [59], they do not take into account higher level structural information, limiting the detection rate of phantoms. The task is significantly facilitated if the scanning position (e.g., by tripod based scanning [60]) or a relative time stamp (e.g., using a rotating multi-beam Lidar [61]) can be assigned to the individual points or point cloud frames, which enables the exploitation of multi-temporal feature comparison. However, in the case of our examined MLS point clouds, no such information is available, and all points are represented in the same global coordinate system.

Several techniques extract object blob candidates by geometric scene segmentation [55,20]; the blobs are then classified using shape descriptors or deep neural networks [20]. Although this process can be notably fast, the main bottleneck of the approach is that it largely depends on the quality of the object detection step.

Alternative methods implement a voxel level segmentation of the scene, where a regular 3D voxel grid is fit to the point cloud, and the voxels are classified into various semantic categories such as roads, vehicles, pole-like objects, etc. [56,62,63]. Here a critical issue is feature selection for classification, which has a wide bibliography. Handcrafted features are efficiently applied by a maximum-margin learning approach for indoor object recognition in [64]. Covariance, point density, and structural appearance information is adopted in [65] by a random forest classifier to segment MLS data with varying density. However, as the number and complexity of the recognizable classes increase, finding the best feature set by hand induces challenges.

Deep learning techniques have been widely used for point cloud scene classification in recent years, following either global or local (window based) approaches. Global approaches consider information from the complete 3D scene for the classification of the individual voxels, thus the main challenge is to keep the time and memory requirements tractable in large scenes. The OctNet method implements a new complex data structure for efficient 3D scene representation, which enables the utilization of deep and high resolution 3D convolutional networks [66]. From a practical point of view, for OctNet's training data annotation, operators should fully label complete point cloud scenes, which might be an expensive process.

Sliding window based techniques are usually computationally cheaper, as they move a 3D box over the scene, using locally available information for the classification of each point cloud segment. Vote3Deep [62] assumes a fixed-size object bounding box for each class to be recognized, which might be less efficient if the possible size range of certain objects is wide. A CNN based voxel classification method has recently been proposed in [63], which uses purely local features, coded in a 3D occupancy grid as the input of the network. Nevertheless, the authors did not demonstrate the performance in the presence of strong phantom effects, which require accurate local density modeling [58,59].
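The occupancy-grid network input mentioned above can be sketched in a few lines (our own illustration; the volume size and voxel resolution are assumed):

```python
import numpy as np

def occupancy_volume(points, center, size=32, voxel=0.1):
    """Boolean size^3 occupancy volume around a query location, the
    typical local input of voxel-wise CNN classifiers such as [63]."""
    idx = np.floor((points - center) / voxel + size / 2).astype(int)
    keep = np.all((idx >= 0) & (idx < size), axis=1)
    vol = np.zeros((size, size, size), dtype=np.float32)
    vol[tuple(idx[keep].T)] = 1.0
    return vol
```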

The multi-view technique [67] projects the point cloud from several (twelve) different viewpoints onto 2D planes, and trains 2D CNN models for the classification. Finally, the obtained labels are backprojected to the 3D point cloud. This approach presents high quality results on synthetic datasets and on point clouds from factory environments, where, due to careful scanning, complete 3D point cloud models of the scene objects are available. Application to MLS data containing partially scanned objects is also possible, but the advantages over competing approaches are reduced there [67].

PointNet++ [68] introduces a hierarchical neural network for point set classification. The method takes random samples within a given radius of the examined point, so it does not exploit density features. Results are demonstrated on synthetic and indoor data samples, with dense and accurate spatial data and RGB color information.

Fig. 8. Labeling result of the proposed 3D CNN based scene segmentation approach (test data provided by Budapest Közút Zrt.)

The Similarity Group Proposal Network (SGPN) [69] uses PointNet++ as a backbone feature extractor, and presents a performance improvement by adding several extra layers to the top of the network structure. However, as noted by the authors, SGPN cannot process large scenes on the order of 10^5 or more points [69], due to using a similarity matrix whose size scales quadratically as the number of points increases. This property is disadvantageous for MLS data processing, where a typical scene may contain over 10^7 points.
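The quadratic blow-up is easy to verify (our own arithmetic): a dense N x N float32 similarity matrix alone exceeds any practical memory budget at MLS scales.

```python
# Memory of a dense N x N float32 similarity matrix:
for n in (1e5, 1e7):
    print(f"N = {n:.0e}: {4 * n * n / 2**30:,.0f} GiB")
# N = 1e+05:     37 GiB   -> already prohibitive
# N = 1e+07: 372,529 GiB  -> impossible for a typical MLS scene
```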

The Sparse Lattice Network (SPLATNet3D) [70] is a recent technique which is able to deal with large point cloud scenes efficiently by using a Bilateral Convolution Layer (BCL). SPLATNet3D [70] projects the extracted features onto a lattice structure, and applies sparse convolution operations. Similarly to voxel based approaches, the lattice structure implements a discrete scene representation, where one should address under- and oversegmentation problems depending on the lattice scales.

The C2CNN technique introduced in [21] is based on a two-channel 3D convolutional neural network (CNN), and is specifically improved to segment MLS point clouds into nine different semantic classes, which can be used for high definition city map generation. The main purpose of semantic point labeling is to provide a detailed and reliable background map for self-driving vehicles (SDVs), which indicates the roads and various landmark objects for navigation and decision support of SDVs. This approach considers several practical aspects of raw MLS sensor data processing, including the presence of diverse urban objects, varying point density, and the strong measurement noise of phantom effects caused by objects moving concurrently with the scanning platform. We evaluated the proposed approach on a manually annotated new MLS benchmark set, and compared our solution to three general reference techniques proposed for semantic point cloud segmentation.

A numerical comparison between many of the above mentioned methods is shown in Table 2, using the SZTAKI CityMLS Benchmark Set [21] (http://mplab.sztaki.hu/geocomp/SZTAKI-CityMLS-DB.html).

7. Change detection using onboard Lidar and MLS maps

For self-driving car navigation and environment perception, change detection between the instantly sensed RMB Lidar measurements and the MLS based reference environment model appears as a crucial task, which poses a number of key challenges. Particularly, there is a significant difference in the quality and the density characteristics of the instant 3D (i3D) and MLS point clouds, due to a trade-off between the temporal and spatial resolution of the available 3D sensors.

In recent years various techniques have been published for change detection in point clouds; however, the majority of the approaches rely on dense terrestrial laser scanning (TLS) data recorded from static tripod platforms [71,72]. As explained in [71], classification based on the calculation of point-to-point distances may be useful for homogeneous TLS and MLS data, where changes can be detected directly in 3D. However, the point-to-point distance is very sensitive to varying point density, causing degradation in our addressed i3D/MLS cross-platform scenario. Instead, [71] follows a ray tracing and occupancy map based approach with estimated normals for efficient occlusion detection, and point-to-triangle distances for more robust calculation of the changes. Here the Delaunay triangulation step may be a critical point, especially in noisy and cluttered segments of the MLS point cloud, which are unavoidably present in a city-scale project. [72] uses a nearest neighbor search across segments of scans: for every point of a segment they perform a fixed radius search of 15 cm in the reference cloud. If, for a certain percentage of segment points, no neighboring points can be found for at least one segment-to-cloud comparison, the object is labeled there as a moving entity. A method for change detection between MLS point clouds and 2D terrestrial images is discussed in [73]. An approach dealing with purely RMB Lidar measurements is presented in [74], which uses a ray tracing approach with nearest neighbor search. A voxel based occupancy technique is applied in [75], where the authors focus on detecting changes in point clouds captured with different MLS systems. However, the differences in the data quality of the inputs are less significant than in our discussed case.
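A compact sketch of the fixed-radius test described for [72] (our own toy version; the 15 cm radius is from the text, while the 50% decision threshold is an assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_is_moving(segment_pts, reference_cloud,
                      radius=0.15, missing_ratio=0.5):
    """Label a segment as a moving entity when a large fraction of its
    points have no reference-map neighbor within the search radius."""
    tree = cKDTree(reference_cloud)
    n_missing = sum(
        len(tree.query_ball_point(p, radius)) == 0 for p in segment_pts)
    return n_missing / len(segment_pts) > missing_ratio
```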

In [76] the authors proved that change detection can be accelerated if only keyframes are compared to the map or to previous frames. Here, keyframes are the ones that contain changes with high probability. [76] proposed a solution to find these keyframes by exploiting the mapping residuals. The authors demonstrated the performance of the proposed method in real-life experiments with an AGV equipped with a 2D LIDAR sensor.

Table 2. Quantitative comparison of various point cloud segmentation techniques [63], [67], [68], [70] and [21] on the SZTAKI CityMLS benchmark set. Voxel level Precision (Pr), Recall (Rc) and F-rates (F-r) are given in percent (overall values weighted with class significance).

| Class | OG-CNN [63] Pr/Rc/F-r | Multi-view [67] Pr/Rc/F-r | PointNet++ [68] Pr/Rc/F-r | SPLATNet [70] Pr/Rc/F-r | C2CNN [21] Pr/Rc/F-r |
|---|---|---|---|---|---|
| Phantom | 85.3/34.7/49.3 | 76.5/45.3/56.9 | 82.3/76.5/79.3 | 83.4/78.2/80.7 | 84.3/85.9/85.1 |
| Pedestrian | 61.2/82.4/70.2 | 57.2/66.8/61.6 | 86.1/81.2/83.6 | 80.4/78.6/79.5 | 85.2/85.3/85.2 |
| Car | 56.4/89.5/69.2 | 60.2/73.3/66.1 | 80.6/92.7/86.2 | 81.1/89.4/85.0 | 86.4/88.7/87.5 |
| Vegetation | 72.4/83.4/77.5 | 71.7/78.4/74.9 | 91.4/89.7/90.5 | 86.4/87.3/86.8 | 98.2/95.5/96.8 |
| Column | 88.6/74.3/80.8 | 83.4/76.8/80.0 | 83.4/93.6/88.2 | 84.1/89.2/86.6 | 86.5/89.2/87.8 |
| Tram/Bus | 91.4/81.6/86.2 | 85.7/83.2/84.4 | 83.1/89.7/86.3 | 79.3/82.1/80.7 | 89.5/96.9/93.0 |
| Furniture | 72.1/82.4/76.9 | 57.2/89.3/69.7 | 84.8/82.9/83.8 | 82.6/81.3/81.9 | 88.8/78.8/83.5 |
| Overall | 76.9/74.2/75.5 | 72.5/73.4/72.9 | 85.6/87.5/86.5 | 82.5/83.7/83.0 | 90.4/90.2/90.3 |


In [77] the authors introduced a new technique for change detection in urban environments based on the comparison of 3D point clouds with significantly different density characteristics. This approach extracts moving objects and environmental changes from sparse and inhomogeneous instant 3D (i3D) measurements, using as a reference background model the dense and regular point clouds captured by mobile laser scanning (MLS) systems (see Fig. 9). The introduced workflow consists of consecutive steps of point cloud classification, cross-modal measurement registration, Markov Random Field (MRF) based change extraction in the range image domain, and label backprojection to 3D. Experimental evaluation has been conducted in four different urban scenes, and the advantage of the proposed change detection step is demonstrated against a reference voxel based approach.

Fig. 9. Top: Detected changes at a tram stop in Kálvin tér, Budapest using [77]. Red, blue and green points represent background objects, foreground objects and ground regions, respectively. Bottom: MLS laser scan of the tram stop.
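Since the change mask of [77] is computed in the range image domain, a spherical projection of the sparse cloud is the natural first step; a minimal sketch follows (the resolutions and vertical field of view are assumed values, not taken from the paper):

```python
import numpy as np

def to_range_image(points, h_res=0.2, v_res=0.4, v_min=-25.0, v_max=3.0):
    """Project a 3D point cloud to a 2D range image: pixel = (elevation,
    azimuth) bin, value = distance of the nearest return in that bin."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(y, x))                # azimuth in [-180, 180)
    el = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))
    u = ((az + 180.0) / h_res).astype(int) % int(360 / h_res)
    v = ((v_max - el) / v_res).astype(int)
    H, W = int((v_max - v_min) / v_res), int(360 / h_res)
    img = np.full((H, W), np.inf)
    keep = (v >= 0) & (v < H)
    np.minimum.at(img, (v[keep], u[keep]), r[keep])  # keep closest return
    return img
```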

8. Camera-Lidar calibration

Nowadays, state-of-the-art autonomous systems rely on a wide range of sensors for environment perception, such as optical cameras, radars and Lidars; therefore, efficient sensor fusion is a highly focused research topic in the fields of self-driving vehicles and robotics. Though the resolution and the operation speed of these sensors have significantly improved in recent years, and their prices have become affordable in mass production, their measurements have highly diverse characteristics, which makes the efficient exploitation of the multimodal data challenging. While real time Lidars, such as Velodyne's rotating multi-beam (RMB) sensors, provide accurate 3D geometric information with relatively low vertical resolution, optical cameras capture high resolution and high quality image sequences, enabling the perception of low level details of the scene. A common problem with optical cameras is that extreme lighting conditions (such as darkness or strong sunlight) largely influence the captured image data, while Lidars are able to provide reliable information much less dependent on external illumination and weather conditions. On the other hand, by simultaneous utilization of Lidar and camera sensors, accurate depth with detailed texture and color information can be obtained in parallel from the scenes.

Accurate Lidar and camera calibration is an essential step to implement robust data fusion; thus, related issues are extensively studied in the literature [78-80]. Existing calibration techniques can be grouped based on a variety of aspects [78]: based on the level of user interaction, they can be semi- or fully automatic; methodologically, we can distinguish target-based and target-less approaches; and in terms of operational requirements, offline and online approaches can be defined.
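What the extrinsic (R, t) and intrinsic (K) parameters buy us is the ability to map every Lidar point onto the image, which underlies all Lidar-camera fusion; a minimal sketch (our own illustration) is:

```python
import numpy as np

def project_to_image(points_lidar, R, t, K):
    """Project Lidar points into camera pixels using the extrinsic
    rotation R, translation t, and the 3x3 camera matrix K."""
    pts_cam = points_lidar @ R.T + t        # Lidar frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]    # keep points in front of the camera
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]         # perspective division -> pixels
```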

As their main characteristic, target-based methods use special calibration targets such as 3D boxes [79], checkerboard patterns [81], a simple printed circle [82], or a unique polygonal planar board [83] during the calibration process. At the level of user interaction, we can subdivide target-based methods into semi-automatic and fully-automatic techniques. Semi-automatic methods may consist of many manual steps, such as moving the calibration patterns to different positions, manually localizing the target objects both in the Lidar and in the camera frames, and adjusting the parameters of the calibration algorithms. Though semi-automatic methods may yield very accurate calibration, these approaches are very time consuming, and the calibration results highly depend on the skills of the operators. Moreover, even a well calibrated system may periodically need re-calibration due to artifacts caused by vibration and sensor deformation effects.

Fully-automatic target-based methods attempt to automatically detect previously defined target objects, then they extract and match features without user intervention: Velas et al. [84] detect circular holes on planar targets, Park et al. [83] calibrate Lidar and camera by using white homogeneous target objects, Geiger et al. [81] use corner detectors on multiple checkerboards, and Rodriguez et al. [85] detect ellipse patterns automatically. Though the mentioned approaches do not need operator interactions, they still rely on the presence of calibration targets, which often should be arranged in complex setups (i.e., [81] uses 12 checkerboards). Furthermore, during the calibration both the platform and the targets must be motionless.

On the contrary, target-less approaches rely on features extracted from the observed scene without using any calibration objects. Some of these methods use motion-based [86-88] information to calibrate the Lidar and camera, while alternative techniques [78,89] attempt to minimize the calibration errors using only static features.

Among motion-based approaches, Huang and Stachniss [87] improve the accuracy of extrinsic calibration by estimating the motion errors, Shiu and Ahmad [86] approximate the relative motion parameters between the consecutive frames, and Shi et al. [90] calculate the sensor motion by jointly minimizing the projection error between the Lidar and the camera residuals. These methods first estimate the trajectories of the camera and Lidar sensors either by visual odometry and scan matching techniques, or by exploiting IMU and GNSS measurements. Thereafter they match the recorded camera and Lidar measurement sequences, assuming that the sensors are rigidly mounted on the platform. However, the accuracy of these techniques strongly depends on the performance of the trajectory estimation, which may suffer from visually featureless regions, low resolution scans [91], a lack of hardware trigger based synchronization between the camera and the Lidar [90], or urban scenes without sufficient GPS coverage.

We continue the discussion with single frame target-less and feature-based methods. Moghadam et al. [89] attempt to detect correspondences by extracting lines both from the 3D Lidar point cloud and the 2D image data. While this method proved to be efficient in indoor environments, it requires a large number of line correspondences, a condition that often cannot be satisfied in outdoor scenes. A mutual information based approach has been introduced in [92] to calibrate different range sensors with cameras. Pandey et al. [78] attempt to maximize the mutual information using the camera's grayscale pixel intensities and the Lidar reflectivity values. Based on Lidar reflectivity values and grayscale images, Napier et al. [93] minimize the correlation error between the Lidar and the camera frames. Scaramuzza et al. [94] introduce a new data representation called the Bearing Angle (BA) image, which is generated from the Lidar's range measurements. Using conventional image processing operations, the method searches for correspondences between the BA and the camera image. As a limitation, target-less feature-based methods require a reasonable initial transformation estimate between the different sensors' measurements [90], and mutual information based matching is sensitive to inhomogeneous point cloud inputs and illumination artifacts, which are frequently occurring problems when using RMB Lidars [78].

In [95], the authors proposed a new fully automatic and target-less extrinsic calibration approach between a camera and a rotating multi-beam (RMB) Lidar mounted on a moving car. This technique adopts a structure from motion (SfM) method to generate 3D point clouds from the camera data, which can be matched to the Lidar point clouds; thus, the extrinsic calibration problem is addressed as a registration task in the 3D domain (see Fig. 10). The method consists of two main steps: an object level matching algorithm performing a coarse alignment of the camera and Lidar data, and a fine alignment step that implements a control point based point level registration refinement. The superiority of the method is that it relies on only the raw camera and Lidar sensor streams, without using any external Global Navigation Satellite System (GNSS) or Inertial Measurement Unit (IMU) sensors. Moreover, it is able to automatically calculate the extrinsic calibration parameters between the Lidar and camera sensors on the fly, which means we only have to mount the sensors on the top of the vehicle and start driving in a typical urban environment.

Fig. 10. Workflow of the on-the-fly Lidar-camera registration technique [95].

Note that there exist a few end-to-end deep learning based camera and Lidar calibration methods [80,96] in the literature, which can automatically estimate the calibration parameters within a bounded parameter range based on a sufficiently large training dataset. However, the trained models cannot be applied to arbitrary configurations, and re-training is often more resource intensive than applying a conventional calibration approach. In addition, failure case analysis and analytical estimation of the limits of operation are highly challenging for black box deep learning approaches.

9. Conclusion and future directions

When LIDARs first appeared in autonomous driving systems, they were mainly part of the supporting development toolkit. For a long time, it was seriously argued that LIDAR was not needed as a traffic control sensor, since (i) it cannot see through bad weather, (ii) its opto-mechanics are vulnerable, and (iii) its price is relatively high. However, today we can get high quality but much cheaper LIDAR sensors with more robust opto-mechanical solutions, bad weather problems can be partly eliminated, and so on.
