
Contents lists available at ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

Positioning and perception in LIDAR point clouds

Csaba Benedek, Andras Majdik, Balazs Nagy, Zoltan Rozsa, Tamas Sziranyi

Machine Perception Research Laboratory (MPLab), Institute for Computer Science and Control (SZTAKI), Eötvös Loránd Research Network (ELKH), Kende u. 13-17, H-1111 Budapest, Hungary

* Corresponding author. E-mail address: sziranyi.tamas@sztaki.hu (T. Sziranyi).

Article history: Available online 3 August 2021

Keywords: Lidar; Object detection; SLAM; Change detection; Navigation

Abstract

In the last decade, Light Detection and Ranging (LIDAR) became a leading technology of detailed and reliable 3D environment perception. This paper gives an overview of the wide applicability of LIDAR sensors from the perspective of signal processing for autonomous driving, including dynamic and static scene analysis, mapping, and situation awareness, functions which point significantly beyond the role of a safe obstacle detector, the sole typical function of LIDARs in the pioneer years of driverless vehicles. The paper focuses on a wide range of LIDAR data analysis applications of the last decade, and in addition to presenting a state-of-the-art survey, the article also summarizes some issues and expected directions of development in this field, and the future perspectives of LIDAR systems and intelligent LIDAR based information processing.

© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). https://doi.org/10.1016/j.dsp.2021.103193

1. Introduction

This paper gives an overview of the rich applicability of LIDAR sensors from the perspective of signal processing for autonomous driving, including dynamic and static scene analysis. We focus on a wide range of LIDAR data analysis applications, giving first-hand experience about the state of the art and the challenges of a new depth mapping device category.

1.1. Motivation and significance

In recent decades, remarkable progress has been made in sensor development for environment analysis, which greatly influences the scientific progress in the fields of object detection and classification, scene segmentation, and understanding. Light Detection and Ranging (LIDAR) sensors became one of the most widely used sensing technologies in various applications of geo-data analysis, including perception, mapping and localization.

LIDAR is an active remote sensing technology that uses electromagnetic waves in the optical range to detect an object (target), determines the distance between the target and the sensor (range), and measures further physical properties of the target surface such as scattering and reflection [1]. The sensor calculates the distance of the target objects from the echo time of the emitted and the detected laser beam, where the beam propagates at the speed of light. The result of the measurement is a highly accurate 3D point cloud, where the coordinates of the points are given in a local or global coordinate system depending on the type of the LIDAR system and the application area.
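As a quick illustration of the time-of-flight principle (our own sketch, not code from the paper), the one-way range follows from halving the round-trip travel time of the laser pulse:

```python
def lidar_range(echo_time_s: float, c: float = 299_792_458.0) -> float:
    """Range from the round-trip time of flight: the pulse travels to the
    target and back, so the one-way distance is c * t / 2."""
    return c * echo_time_s / 2.0

# A 400 ns echo corresponds to a target at roughly 60 m:
print(lidar_range(400e-9))  # ~59.96
```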

LIDAR scanners can be mounted either on static terrestrial stations or on ground based and aerial moving vehicles. By using terrestrial LIDAR sensors, high density point clouds and notably accurate and largely detailed 3D models can be created, properties that are required in architectural and engineering applications. Mobile laser scanning (MLS) allows quick surveys of the road network and environment; furthermore, it can contribute to the localization and control of mobile robots and autonomous vehicles.

This paper addresses the main aspects of the broad application area of mobile LIDAR sensors in autonomous driving related fields. LIDARs have special roles in autonomous driving and transportation and special vehicle based intelligent control systems, as they are used in parallel with camera systems. Novel high-resolution units can be built into the car's body, and they provide an important accessory to on-board safety. Although LIDARs are currently the most expensive pieces of the on-board sensor systems, the prices are going down quickly, while the application areas are rapidly expanding. The authors, working in the Machine Perception Research Laboratory of SZTAKI, Hungary, have focused on a wide range of LIDAR data analysis applications for several years, thus in addition to the presentation of a state-of-the-art survey, this article also summarizes their first-hand experiences in the field.

Today the LIDAR itself is still most frequently considered as the sensor of safety, since its usage is mainly limited to reliable free space verification. However, as we will demonstrate in this study, the potential of this technology goes far beyond simple obstacle detection, since the development of LIDAR technology in terms of temporal and spatial resolution and noise elimination led us to more sophisticated 3D measurements for various real-time perception, navigation and mapping problems.

Fig. 1. Data sample of a Velodyne HDL-64 RMB Lidar.

Next, we introduce the reader to the latest exciting results and their background in real-life LIDAR applications.

1.2. Outline of the paper

First, we show the diversity of laser scanner devices used to acquire a point cloud of the 3D environment. Next, we present an overview of a wide range of LIDAR-based application modules built on each other, which implement various functions, including object perception, classification, mapping and localization. We also discuss the opportunities in challenging situations such as extreme weather conditions or the availability of low-range one-beam (plane) sensors only. Finally, we show the on-the-fly calibration of the LIDAR and camera system.

2. LIDAR sensors and resources

2.1. LIDAR sensor types

LIDAR equipment offers a versatile application and operational richness: static/mobile, 360°/wide angle/narrow scan, equidistant scanning resolution/special beam patterns, single echo/multiple echoes. We will see that LIDARs can be used in any area of imaging the world.

Rotating Multi-beam (RMB) Lidar systems provide a 360° field of view of the scene, with a vertical resolution equal to the number of laser channels, while the horizontal angle resolution depends on the speed of rotation. Although RMB Lidars can produce high frame-rate point cloud videos enabling dynamic event analysis in the 3D space, the measurements have a low spatial density, which quickly decreases as a function of the distance from the sensor, and the point clouds may exhibit particular patterns typical to the sensor characteristics (see Fig. 1). In special cases, only one or a few beams are available.
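The following back-of-the-envelope sketch (our own illustration; the sensor parameters are assumed, not quoted from the paper) shows how the horizontal angular step follows from the rotation speed, and how quickly the number of returns on a target drops with distance:

```python
import math

def horizontal_resolution_deg(rotation_hz: float, firings_per_s: float) -> float:
    """Horizontal angular step of an RMB lidar: 360 degrees divided by
    the number of firings during one revolution."""
    return 360.0 * rotation_hz / firings_per_s

def returns_on_target(width_m, height_m, dist_m, h_res_deg, v_res_deg):
    """Rough count of returns on a planar target facing the sensor."""
    w_deg = math.degrees(2 * math.atan(width_m / (2 * dist_m)))
    h_deg = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    return (w_deg / h_res_deg) * (h_deg / v_res_deg)

# Pedestrian-sized target (0.6 m x 1.8 m), assumed 0.17 deg horizontal
# and 0.42 deg vertical resolution: the count falls roughly with 1/d^2.
for d in (10.0, 25.0, 50.0):
    print(d, round(returns_on_target(0.6, 1.8, d, 0.17, 0.42)))
```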

Mobile laser scanning (MLS) platforms equipped with time synchronized Lidar sensors and navigation units can rapidly provide very dense and accurate static point clouds from large environments, where the 3D spatial measurements are accurately registered to a geo-referenced global coordinate system (Fig. 2). These point clouds may act as a basis for detailed and up-to-date 3D High Definition (HD) maps of the cities, which can be utilized by self-driving vehicles for navigation.

Fig. 2. Data of a Riegl VMX-450 Mobile Laser Scanning (MLS) system.

Another recently emerging technology is the Doppler LIDAR (first mentioned in [2] for wind measurements): e.g., Blackmore (https://blackmoreinc.com) has introduced a LIDAR for autonomous driving with velocity or rotation speed data output. Very recent models, such as the Livox sensors, use advanced non-repetitive scanning patterns to deliver highly accurate details. These scanning patterns provide relatively high point density even in a short period of time, and the density builds up further as the scanning duration increases. The currently available models can achieve the same or greater point density as conventional 32-line RMB LIDAR sensors.

2.2. LIDAR resources

Numerous autonomous driving datasets with LIDAR data have been released in recent years. The most important ones are listed in Table 1 with their purpose. As we can observe, for typical benchmark problems, such as object detection, tracking or Simultaneous Localization and Mapping (SLAM), one can choose between various public resources, corresponding to different sensor characteristics and scenario types. A main challenge in the future, however, will be the timely extension of the available benchmark datasets with reliable measurement and ground truth information, following the appearance of newer and newer LIDAR sensor technologies.

3. LIDAR based object perception

Object perception and recognition is a central objective in LIDAR based 3D point cloud processing. Though several 3D object detection and classification approaches can be found in the literature, due to the large differences in the data characteristics obtained by different LIDAR sensors, object perception methods are still strongly sensor dependent, making it very challenging to adapt them between different types of LIDAR data.

Since LIDAR sensors provide very accurate 3D geometric information, the localization and shape recognition of the objects can be more intuitive compared to 2D image processing. However, beyond the different sensor data characteristics, several challenges occur in automatic LIDAR-based object detection and classification, such as the sparsity of the data, variable point density, and non-uniform sampling; in addition, in cluttered scenes objects often occlude each other, causing partially extracted object blobs in the measurements.



Table 1. LIDAR datasets with different purposes.

| Name of the dataset | Lane detection | Object detection/tracking | Segmentation | Localization and mapping |
|---|---|---|---|---|
| Stanford Track [3] | | X | | |
| Ford [4] | | | | X |
| KITTI [5] | X | X | X | X |
| Málaga urban [6] | | | | X |
| Oxford RobotCar [7] | | | | X |
| Apolloscape [8] | X | X | X | X |
| KAIST Urban dataset [9] | | | | X |
| KAIST Multispectral [10] | | X | | |
| Multivehicle Stereo Event [11] | | | | X |
| UTBM RoboCar [12] | | | | X |
| Unsupervised Llamas (Bosch) [13] | X | | | |
| PandaSet* | | X | X | |
| BLVD [14] | | X | | |
| H3D (Honda) [15] | | X | | |
| Lyft level 5 [16] | | X | | |
| NuScenes [17] | | X | X | |
| Waymo Open [18] | | X | | |
| Argoverse [19] | | X | | X |
| SZTAKI-Velo64Road [20] | | X | | |
| SZTAKI CityMLS [21] | | | X | X |

* https://scale.com/open-datasets/pandaset


Based on the object perception literature, we can define two main groups: traditional geometry based methods and deep learning based approaches. To handle the expensive calculations between huge amounts of 3D points, geometry based methods usually adopt some space partitioning technique such as Kd-trees, Octrees [22,23] or 3D voxels [24], or 2D grid based methods [25]. Some approaches apply different region growing techniques over tree-based structures to obtain coherent objects. The authors of [26] present an Octree based occupancy grid representation to model the dynamic environment surrounding the vehicle and to detect moving objects based on inconsistencies between scans.

In general, building and maintaining a tree-based structure is expensive, so usually some kind of 3D voxel or 2D grid approach is applied for streaming data. In [25] the authors propose a fast segmentation of point clouds into objects, which is accomplished by a standard connected component algorithm in a 2D occupancy grid, and object classification is done on the raw point cloud segments with 3D shape descriptors and an SVM classifier. Different voxel grid structures are also widely used to complete various scene understanding tasks, including segmentation, detection and recognition [24]. The data is stored here in cubic voxels for efficient retrieval of the 3D points. Among geometry based 2D grid approaches, [20] implements a pipeline of a geometry based ground separation step, an efficient object extraction based on a two-layer grid structure, and a deep learning based object classification which represents the extracted objects in the range image domain.
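A minimal sketch of the grid-based object extraction idea (our own toy version, with an assumed height threshold standing in for the geometric ground separation):

```python
import numpy as np
from scipy import ndimage

def extract_objects(points, cell=0.2):
    """Rasterize non-ground points onto a 2D occupancy grid and label
    4-connected components as object candidates, in the spirit of [25]."""
    pts = points[points[:, 2] > 0.3]            # crude ground removal (assumed)
    ij = np.floor(pts[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                        # shift cell indices to zero
    grid = np.zeros(ij.max(axis=0) + 1, dtype=bool)
    grid[ij[:, 0], ij[:, 1]] = True
    labels, n_objects = ndimage.label(grid)     # connected components
    return pts, labels[ij[:, 0], ij[:, 1]], n_objects
```

Each labeled segment would then be passed on to a shape descriptor and classifier, as described above.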

Other recent techniques focus on deep learning based object detection and classification in 3D point clouds. VoxelNet [28] is able to predict accurate bounding boxes utilizing discriminative feature learning. PointPillars [27] (Fig. 3) is a state-of-the-art real-time object detection method, which can predict object candidates from multiple classes, together with their 3D oriented bounding boxes and class confidence values.
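To make the pillar idea concrete, the fragment below groups points into vertical columns on the x-y plane, which is the first stage of PointPillars-style processing (a simplified sketch with assumed cell size and point cap; the learned PointNet encoder that follows is omitted):

```python
import numpy as np

def pillarize(points, cell=0.16, max_pts=32):
    """Bucket points into vertical x-y pillars; a learned encoder would
    turn each pillar's point set into a fixed-size feature vector."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    pillars = {}
    for p, key in zip(points, map(tuple, ij)):
        bucket = pillars.setdefault(key, [])
        if len(bucket) < max_pts:               # cap the points per pillar
            bucket.append(p)
    return {k: np.stack(v) for k, v in pillars.items()}
```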

4. The limits of usage: low-resolution LIDAR perception and extreme circumstances

The previous section (Sect. 3) showed that Lidar pattern evaluation can result in semantic interpretation; now we will see that even very limited (diluted) information from LIDAR scans can be used for accurate perception. In this section, the limitations of LIDAR sensors, the resulting challenges, and current solutions are discussed. Besides developing high-end 3D LIDAR sensors, it is also worth investigating the capability of sensors of lower or extremely low resolution (equipped with a few or even only one laser channel), both for cost-efficiency and for increased robustness. As one of the main effects of extreme circumstances (e.g., harsh weather) is information loss, installing more than one planar or few-layer LIDAR in task-based optimized positions [29] may be a better alternative (from some points of view) than using a single high-resolution one. The limited information content of scanners with low vertical resolution makes a high-level semantic interpretation of the data more challenging, but makes real-time execution of the algorithms easier. Naturally, machine vision in this subfield has gone through rapid development in recent years as well.

Fig. 3. LIDAR object detection results with the deep learning based PointPillars approach [27]. Red boxes show detected vehicles, blue boxes pedestrians. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

4.1. Vision with LIDARs of extremely low resolutions

2D range scanners and their applications have a relatively long history in robotics [30]. Automated Guided Vehicles (AGVs) have been using these sensors for decades for safety and navigation purposes. Today, there are products available on the market with extremely high horizontal angular resolution, high scanning frequency, and a safety guarantee from the manufacturer. Also, fully developed, real-time scan matching and Simultaneous Localization and Mapping algorithms [31] are available in industrial and market products based on only 2D laser scanner data, making localization and mapping one of their primary application areas.

Fig. 4. Examples of a planar LIDAR sensor and a point cloud acquired by a planar LIDAR.
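As a small illustration of the data such pipelines consume (our own sketch; the field of view and range limits are assumed), a planar scan is just an array of ranges over known beam angles, which converts directly to 2D points in the sensor frame:

```python
import numpy as np

def scan_to_points(ranges, fov_deg=270.0, max_range=30.0):
    """Convert one planar laser scan to Cartesian points in the sensor
    frame, dropping invalid (out-of-range) returns; scan matching and
    2D SLAM operate on exactly this kind of data."""
    ranges = np.asarray(ranges)
    angles = np.linspace(-np.radians(fov_deg) / 2,
                         np.radians(fov_deg) / 2, len(ranges))
    valid = (ranges > 0) & (ranges < max_range)
    return np.stack([ranges[valid] * np.cos(angles[valid]),
                     ranges[valid] * np.sin(angles[valid])], axis=1)
```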

Besides navigation, we would like to extract as much information about the environment from the available data as possible. Point cloud processing algorithms can directly utilize the point clouds of these 2D scanners, so there are various solutions for object detection and recognition from this type of data. One can use handcrafted general features [32], data specific features [33], image descriptors [34] or neural networks [35]. Applying LIDARs with a (still very) narrow vertical field of view [36] makes the information content richer and semantic interpretation easier.

We can distinguish two different reasons for dealing with low (vertical) resolution LIDAR point clouds. The first case is when the hardware limits the resolution (the number of LIDAR layers), because we measure with a planar LIDAR or one with 4-8 layers. The second case is when our LIDAR has a sufficient number of layers (16 and above), but its usage scenario limits the acquired point cloud's vertical resolution. A typical example of this kind of scenario is a highway: we would like to look far (because of the high speed and straight road sections), but distant objects will occupy only a very few layers of our (high vertical resolution) LIDAR measurement. We will show what kinds of solutions have been developed recently for these two particular cases of low-resolution LIDAR perception.

4.1.1. Hardware limited low-resolution perception

As mentioned earlier, planar and narrow vertical field LIDARs (a 2D LIDAR sensor and a frame acquired by it in a tilted position are illustrated in Fig. 4; images from SICK, https://www.sick.com) are frequently used in logistic transportation systems (on AGVs), not just for navigation but also for specific purposes (e.g., overhang detection, see Fig. 5). The sensors' specific positions, the speed of the transport vehicle (about 1 m/s), and the presence of a positioning sensor [37] in the vehicle (for navigation purposes) make it feasible to use the 2D sensor data for 3D reconstruction. This results in very special partial point cloud data, incrementally giving more and more information about the objects. We proposed a solution to deal with this specific kind of LIDAR data in [38].

The proposed method's main idea for dealing with partial clouds is to compare statistics of local structures. Its steps are summarized below. First, a local surface definition around each point is needed; we measure the saliency of each point by the 3D Harris operator [39]. Next, to determine a repeatable number of keypoints, a local scale is assigned to the significant points, and a local surface descriptor characterizes the keypoints. After that, we define local patterns as graphs of keypoints. In the last step, the frequency of local patterns is compared.
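A simplified stand-in for the keypoint saliency step (our own sketch; this uses neighborhood PCA as a surface-variation measure rather than the exact 3D Harris operator of [39]):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_variation(points, radius=0.3):
    """Per-point saliency from the eigenvalues of the local covariance:
    low on flat surfaces, high at corners and edges, so high values mark
    keypoint candidates."""
    tree = cKDTree(points)
    saliency = np.zeros(len(points))
    for i, p in enumerate(points):
        nbrs = points[tree.query_ball_point(p, radius)]
        if len(nbrs) < 5:                       # too few neighbors: skip
            continue
        w = np.linalg.eigvalsh(np.cov((nbrs - nbrs.mean(0)).T))
        saliency[i] = w[0] / (w.sum() + 1e-12)  # smallest-eigenvalue ratio
    return saliency
```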


Fig. 5. Tilted sensor installation for overhang detection. Photo source: SICK - Efficient solutions for material transport vehicles in factory and logistics automation.

Besides our own measurements, we used an MLS database for real-life testing. (These types of point clouds are acquired similarly as described above.) On this database, we measured 73.3% classification accuracy for 5 classes when only about 20% of the 3D object was visible, and 80.0% accuracy at 30% visibility, which results in a usable and safe incremental prediction for early decisions. For more details, see [38].

4.1.2. Scenario limited low resolution perception

It follows from the LIDAR measurement principle that the density of the acquired point cloud decreases with distance from the sensor. This results in the phenomenon that even when measuring with a high-resolution sensor, distant objects are not observable at sufficient resolution. In the case of a sensor with a lower number of channels, this happens in the near field too. (The case when we perceive an object in only one layer is also not rare.)
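The effect is easy to quantify (our own sketch; the vertical resolution value is an assumed, HDL-64-like figure): the number of layers hitting an object shrinks roughly linearly with distance.

```python
import math

def layers_on_object(height_m, dist_m, v_res_deg):
    """Approximate number of LIDAR layers intersecting an object of the
    given height at the given distance."""
    subtended = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    return max(1, int(subtended / v_res_deg))

# A 1.5 m tall vehicle at 41 m with ~0.42 deg vertical resolution spans
# only about 4 layers, in line with the KITTI experiment quoted below:
print(layers_on_object(1.5, 41.0, 0.42))  # -> 4
```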

From this type of data, local surface information is not extractable, so we cannot expect methods based on it, designed for 2.5D point clouds, to work. To solve this problem, we relied on methods designed for 2D point clouds and extended them to utilize all the available information [40]; we proposed a method to classify objects with this point cloud characteristic, constructed from the steps below. First, we generate a shape descriptor for object segments using low-frequency components, to be robust against angular resolution drop. Then, we extract statistical measures of geometries coding the 3D location of the (approximately) 2D curve. After that, we group tracklets (tracks up to 5 frames) of segments (if any are available). The next step is classifying the segments (or tracklets of segments) with a CNN (Convolutional Neural Network). Finally, an object-level decision is made by maximum likelihood aggregation of the segment class probabilities (if more than one segment is available from an object).
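The final aggregation step admits a one-liner illustration (our own sketch): assuming the segments are independent, the object-level class maximizes the product of the per-segment probabilities, i.e., the sum of their logarithms.

```python
import numpy as np

def object_class(segment_probs):
    """Maximum likelihood aggregation of per-segment class probability
    vectors into a single object-level decision."""
    log_p = np.log(np.asarray(segment_probs) + 1e-12).sum(axis=0)
    return int(np.argmax(log_p))

# Three segments of one object over 3 classes -> class 0 wins:
print(object_class([[0.5, 0.3, 0.2], [0.4, 0.5, 0.1], [0.6, 0.2, 0.2]]))
```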

With the proposed method, we reached 96.6% classification accuracy (and even better when we could perceive and track an object segment for more than one frame) for 6 categories on those point clouds of the KITTI [5] database where objects were present in at most 4 LIDAR layers (at 41 m average distance from the sensor). These point clouds cannot be handled with conventional methods (and so are ignored in most cases). An illustration of object classification with the proposed method in a typical scenario (a highway observed with a relatively narrow vertical field LIDAR) is shown in Fig. 6. For further details and experimental proofs, see [40].


Fig. 6. Classification of objects observed in at most 4 layers. Color map: red - vehicle, blue - guardrail, green - ground.

4.2. Robustness in harsh weather conditions

Harsh weather conditions pose challenging problems: LIDARs have decreased performance in snow, rain or fog. This kind of limitation has to be addressed by semantic-based methods or physically modeled filters.

Recent research targets hardware [41] and software [42] developments to eliminate this effect. To avoid the problem above, alternative devices can be used, noisy measurements have to be filtered [43], and incomplete data has to be completed. Researchers have just started developing the first stages of the solution, recognizing the given weather conditions [44] and examining the influence of different ones [45]. To support this pursuit, datasets recorded in adverse weather have been released lately [46].

5. LIDAR based localization and mapping

The capability of recognizing patterns in LIDAR point clouds led to high precision odometry techniques in SLAM and similar methods. Next, we will briefly summarize the state-of-the-art algorithms and current trends in LIDAR-based ego-motion estimation, 3D mapping, and localization.

5.1. Visual odometry using LIDARs

Recently, several visual odometry algorithms were proposed to compute the motion of a vehicle in real time using only the continuously streaming data gathered by the LIDAR sensor. Such LIDAR-only odometry methods eliminate the need for any other supplementary sensor, e.g., an Inertial Measurement Unit (IMU), wheel encoders, or a satellite based Global Positioning System (GPS).

One of the best performing algorithms in terms of translational and rotational errors on the KITTI [5] dataset is the LOAM [47] algorithm, which estimates the six DoF (Degrees of Freedom) displacement of the vehicle on short trajectories with very low drift in scenes with high-density feature points and available reference ground planes. The algorithm can process the measurements robustly for different LIDAR sensors with varying point cloud densities. However, in the case of long trajectories, since the drift is continuously accumulated, a significant error can build up in the position estimation.
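The drift mechanism is visible in how odometry poses are composed (our own sketch): each relative six-DoF estimate is multiplied onto the running pose, so even a small systematic per-step error compounds over a long trajectory.

```python
import numpy as np

def chain_poses(increments):
    """Compose per-frame 4x4 homogeneous transforms into a trajectory;
    without loop closure, per-step errors accumulate without bound."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for T in increments:
        pose = pose @ T
        trajectory.append(pose.copy())
    return trajectory
```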

5.2. Simultaneous localization and mapping with LIDARs

In order to correct the accumulated error in the odometry backend, loop-closure situations can be detected by place recognition algorithms whenever the vehicle returns to previously visited places in the navigation area. In the case of Simultaneous Localization and Mapping (SLAM) algorithms, it is assumed that the vehicle explores the given environment for the first time, and therefore there is no a priori map to localize itself against. Recently, the SegMap [48] algorithm was proposed to extract and match LIDAR segments in 3D point clouds. SegMap computes a data-driven compact descriptor to extract distinctive and meaningful features from point cloud segments in order to identify loop-closure situations along the trajectory.

In order to increase the robustness and precision of the localization algorithm in feature-poor environments, a framework was proposed in LIO-SAM [49] to tightly couple LIDAR and inertial measurements obtained from an IMU. The proposed architecture also allows the integration of GPS measurements in case these are available. Further on, by adapting the factor graph optimization framework, the LIDAR Inertial Sub-system (LIS) was fused with a traditional monocular-based Visual Inertial Sub-system (VIS) to create a Lidar-Visual-Inertial (LVI-SAM) localization and mapping system [50]. In contrast to these methods, next we will show the outcomes of a LIDAR-only odometry and localization method for urban environments where a target map exists to localize within.

5.3. LIDAR-only odometry and localization in 3D point cloud maps

Accurate 3D city models and high-definition maps are becoming increasingly available with recent mapping technology advancements. In addition, in many real-world applications, maps are available to localize against. Therefore, these should be utilized to correct the accumulated drift along the vehicle's trajectory whenever a geometrically similar location is detected between the online 3D point cloud and the offline map.

In [51] we proposed LOL, a LIDAR-only Odometry and Localization algorithm that integrates the advantages of the LOAM [47] odometry and the SegMap [48] algorithm. In the odometry backend, the LOAM algorithm estimates the six DoF odometry in real time based only on the continuously streaming point cloud data from a Velodyne LIDAR sensor. In a scene with high-density feature points and available reference ground planes, the algorithm computes the displacement of the vehicle on short trajectories with very low drift using only the consecutive Lidar measurements. The algorithm can process the measurements robustly for different Velodyne sensors with varying point cloud densities. On the other hand, in the case of long trajectories, since the drift is continuously accumulated, a significant error can build up in the estimation, which needs to be canceled by a localization method whenever a correct match is detected between the online Lidar stream and the offline reference map. Therefore, for the localization frontend, we integrated the SegMap method, which is a state-of-the-art algorithm for the extraction and matching of 3D point cloud segments.

We also included some additional improvements in the acceptance of correct matches by applying further geometrical constraints complementing the feature similarity ones. Namely, once a good match is detected between the online measurements and the target map, we only search for similar 3D Lidar segments (with relaxed similarity constraints) in the neighborhood of the current location defined by the location uncertainty. In addition, we only use the shift between the centroids of the target map and the online source segments as a prior, and we refine the final transformation by applying a fine-grained ICP matching between the two point clouds.
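For the last step, a minimal point-to-point ICP looks as follows (our own sketch, not the exact implementation of [51]): nearest-neighbor correspondences alternate with a closed-form (SVD/Kabsch) rigid transform estimate.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(source, target, iters=20):
    """Refine an initial alignment: match each source point to its
    nearest target point, solve the optimal rigid transform in closed
    form, apply it, and repeat."""
    src = source.copy()
    tree = cKDTree(target)
    R_acc, t_acc = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)                # nearest-neighbor matches
        tgt = target[idx]
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        H = (src - mu_s).T @ (tgt - mu_t)       # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                      # best rotation (Kabsch)
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
    return R_acc, t_acc
```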


Fig. 7. Results of the LOL localization algorithm with respect to the ground truth map on KITTI [5] datasets of various lengths: LOL algorithm (green line) and LOAM trajectory (red line), with respect to the ground truth point cloud map.

We tested the proposed algorithm on several KITTI [5] datasets, cf. Fig. 7, and found a considerable improvement in terms of precision without a significant increase in computational cost.

5.4. Localization in dense LIDAR maps

LIDAR measurements can also be utilized for accurate self-localization of self-driving vehicles (SDVs) in high resolution 3D point cloud maps of the environment. A solution provided in [52] can robustly register the sparse RMB Lidar point clouds of the SDVs to the dense Mobile Laser Scanning (MLS) point cloud data, starting from a GPS based initial position estimate of the vehicle. The main steps of the method are robust object extraction and transformation estimation based on multiple keypoints extracted from the objects, with additional semantic information derived from the previously segmented MLS based map.

6. Semantic segmentation of MLS point clouds

Dense MLS point clouds can act as a basis for detailed and up-to-date 3D High Definition (HD) maps of the cities, which can be utilized by self-driving vehicles for navigation, or by city authorities for road network management and surveillance, architecture, or urban planning. All of these applications require semantic labeling of the data (Fig. 8). While the high speed of point cloud acquisition is a clear advantage of MLS, due to the huge data size yielded by each daily mission, applying efficient automated data filtering and interpretation algorithms on the processing side is crucially needed, steps which still pose a number of key challenges.

Taking the raw MLS measurements, one of the critical issues is the phantom effect caused by independent object motions [21]. Due to the sequential nature of the environment scanning process, scene objects moving concurrently with the MLS platform (such as passing vehicles and walking pedestrians) appear as phantom-like, long-drawn, distorted structures in the resulting point clouds [53]. It is also necessary to recognize and mark all movable scene elements such as pedestrians, parking vehicles [54] or trams in the MLS scene. On the one hand, they are not part of the reference background model, thus these regions must be eliminated from the HD maps. On the other hand, the presence of these objects may indicate locations of sidewalks, parking places, etc. Column-shaped objects, such as poles, traffic sign bars [55], and tree trunks are usually good landmark points for navigation. Finally, vegetation areas (bushes, tree foliage) should also be specifically labeled [56]: since they are dynamically changing over the whole year, object level change detection algorithms should not take them into account.

While a number of various approaches have already been proposed for general point cloud scene classification, they do not focus on all practical challenges of the above introduced workflow of 3D map generation from raw MLS data. In particular, only a few related works have discussed the problem of phantom removal. Point-level and statistical feature based methods such as [57] and [58] examine the local density of a point neighborhood, but as noted in [59], they do not take into account higher level structural information, limiting the detection rate of phantoms. The task is significantly facilitated if the scanning position (e.g., by tripod based scanning [60]) or a relative time stamp (e.g., using a rotating multi-beam Lidar [61]) can be assigned to the individual points or point cloud frames, which enables the exploitation of multi-temporal feature comparison. However, in the case of our examined MLS point clouds, no such information is available, and all points are represented in the same global coordinate system.

Several techniques extract object blob candidates by geometric scene segmentation [55,20]; the blobs are then classified using shape descriptors or deep neural networks [20]. Although this process can be notably fast, the main bottleneck of the approach is that it largely depends on the quality of the object detection step.

Alternative methods implement a voxel level segmentation of the scene, where a regular 3D voxel grid is fit to the point cloud, and the voxels are classified into various semantic categories such as roads, vehicles, pole-like objects, etc. [56,62,63]. Here a critical issue is feature selection for classification, which has a wide bibliography. Handcrafted features are efficiently applied by a maximum-margin learning approach for indoor object recognition in [64]. Covariance, point density, and structural appearance information is adopted in [65] by a random forest classifier to segment MLS data with varying density. However, as the number and complexity of the recognizable classes increase, finding the best feature set by hand induces challenges.

Deep learning techniques have been widely used for point cloud scene classification in recent years, following either global or local (window based) approaches. Global approaches consider information from the complete 3D scene for the classification of the individual voxels, thus the main challenge is to keep the time and memory requirements tractable in large scenes. The OctNet method implements a new complex data structure for efficient 3D scene representation, which enables the utilization of deep and high resolution 3D convolutional networks [66]. From a practical point of view, for OctNet's training data annotation, operators should fully label complete point cloud scenes, which might be an expensive process.

Sliding window based techniques are usually computationally cheaper, as they move a 3D box over the scene, using locally available information for the classification of each point cloud segment. Vote3Deep [62] assumes a fixed-size object bounding box for each class to be recognized, which might be less efficient if the possible size range of certain objects is wide. A CNN based voxel classification method has recently been proposed in [63], which uses purely local features, coded in a 3D occupancy grid as the input of the network. Nevertheless, the authors did not demonstrate the performance in the presence of strong phantom effects, which require accurate local density modeling [58,59].
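The occupancy-grid network input mentioned above can be sketched in a few lines (our own illustration; the volume size and voxel resolution are assumed):

```python
import numpy as np

def occupancy_volume(points, center, size=32, voxel=0.1):
    """Boolean size^3 occupancy volume around a query location, the
    typical local input of voxel-wise CNN classifiers such as [63]."""
    idx = np.floor((points - center) / voxel + size / 2).astype(int)
    keep = np.all((idx >= 0) & (idx < size), axis=1)
    vol = np.zeros((size, size, size), dtype=np.float32)
    vol[tuple(idx[keep].T)] = 1.0
    return vol
```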

The multi-view technique [67] projects the point cloud from several (twelve) different viewpoints onto 2D planes, and trains 2D CNN models for the classification. Finally, the obtained labels are backprojected to the 3D point cloud. This approach presents high quality results on synthetic datasets and on point clouds from factory environments, where, due to careful scanning, complete 3D point cloud models of the scene objects are available. Application to MLS data containing partially scanned objects is also possible, but the advantages over competing approaches are reduced there [67].

PointNet++ [68] introduces a hierarchical neural network for point set classification. The method takes random samples within a given radius of the examined point, so it does not exploit density features. Results are demonstrated on synthetic and indoor data samples, with dense and accurate spatial data and RGB color information.

Fig. 8. Labeling result of the proposed 3D CNN based scene segmentation approach (test data provided by Budapest Közút Zrt.)

The Similarity Group Proposal Network (SGPN) [69] uses PointNet++ as a backbone feature extractor, and presents a performance improvement by adding several extra layers to the top of the network structure. However, as noted by the authors, SGPN cannot process large scenes on the order of 10^5 or more points [69], due to using a similarity matrix whose size scales quadratically as the number of points increases. This property is disadvantageous for MLS data processing, where a typical scene may contain over 10^7 points.
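The quadratic blow-up is easy to verify (our own arithmetic): a dense N x N float32 similarity matrix alone exceeds any practical memory budget at MLS scales.

```python
# Memory of a dense N x N float32 similarity matrix:
for n in (1e5, 1e7):
    print(f"N = {n:.0e}: {4 * n * n / 2**30:,.0f} GiB")
# N = 1e+05:     37 GiB   -> already prohibitive
# N = 1e+07: 372,529 GiB  -> impossible for a typical MLS scene
```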

The Sparse Lattice Network (SPLATNet3D) [70] is a recent technique which is able to deal with large point cloud scenes efficiently by using a Bilateral Convolution Layer (BCL). SPLATNet3D [70] projects the extracted features onto a lattice structure, and applies sparse convolution operations. Similarly to voxel based approaches, the lattice structure implements a discrete scene representation, where one should address under- and oversegmentation problems depending on the lattice scales.

The C2CNN technique introduced in [21] is based on a two-channel 3D convolutional neural network (CNN), and is specifically improved to segment MLS point clouds into nine different semantic classes, which can be used for high definition city map generation. The main purpose of semantic point labeling is to provide a detailed and reliable background map for self-driving vehicles (SDVs), which indicates the roads and various landmark objects for navigation and decision support of SDVs. This approach considers several practical aspects of raw MLS sensor data processing, including the presence of diverse urban objects, varying point density, and the strong measurement noise of phantom effects caused by objects moving concurrently with the scanning platform. We evaluated the proposed approach on a manually annotated new MLS benchmark set, and compared our solution to three general reference techniques proposed for semantic point cloud segmentation.

A numerical comparison between many of the above mentioned methods is shown in Table 2, using the SZTAKI CityMLS Benchmark Set [21] (http://mplab.sztaki.hu/geocomp/SZTAKI-CityMLS-DB.html).

7. Change detection using onboard Lidar and MLS maps

For self-driving car navigation and environment perception, change detection between the instantly sensed RMB Lidar measurements and the MLS based reference environment model appears as a crucial task, which poses a number of key challenges. Particularly, there is a significant difference in the quality and the density characteristics of the instant 3D (i3D) and MLS point clouds, due to a trade-off between the temporal and spatial resolution of the available 3D sensors.

In recent years various techniques have been published for change detection in point clouds; however, the majority of the approaches rely on dense terrestrial laser scanning (TLS) data recorded from static tripod platforms [71,72]. As explained in [71], classification based on the calculation of point-to-point distances may be useful for homogeneous TLS and MLS data, where changes can be detected directly in 3D. However, the point-to-point distance is very sensitive to varying point density, causing degradation in our addressed i3D/MLS cross-platform scenario. Instead, [71] follows a ray tracing and occupancy map based approach with estimated normals for efficient occlusion detection, and point-to-triangle distances for more robust calculation of the changes. Here the Delaunay triangulation step may be a critical point, especially in noisy and cluttered segments of the MLS point cloud, which are unavoidably present in a city-scale project. [72] uses a nearest neighbor search across segments of scans: for every point of a segment they perform a fixed radius search of 15 cm in the reference cloud. If, for a certain percentage of segment points, no neighboring points can be found for at least one segment-to-cloud comparison, the object is labeled there as a moving entity. A method for change detection between MLS point clouds and 2D terrestrial images is discussed in [73]. An approach dealing with purely RMB Lidar measurements is presented in [74], which uses a ray tracing approach with nearest neighbor search. A voxel based occupancy technique is applied in [75], where the authors focus on detecting changes in point clouds captured with different MLS systems. However, the differences in the data quality of the inputs are less significant than in our discussed case.
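A compact sketch of the fixed-radius test described for [72] (our own toy version; the 15 cm radius is from the text, while the 50% decision threshold is an assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_is_moving(segment_pts, reference_cloud,
                      radius=0.15, missing_ratio=0.5):
    """Label a segment as a moving entity when a large fraction of its
    points have no reference-map neighbor within the search radius."""
    tree = cKDTree(reference_cloud)
    n_missing = sum(
        len(tree.query_ball_point(p, radius)) == 0 for p in segment_pts)
    return n_missing / len(segment_pts) > missing_ratio
```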

In [76] the authors proved that change detection can be accelerated if only keyframes are compared to the map or to previous frames. Here, keyframes are the ones that contain changes with high probability. [76] proposed a solution to find these keyframes by exploiting the mapping residuals. The authors demonstrated the performance of the proposed method in real-life experiments with an AGV equipped with a 2D LIDAR sensor.

Table 2. Quantitative comparison of various point cloud segmentation techniques [63], [67], [68], [70] and [21] on the SZTAKI CityMLS benchmark set. Voxel level Precision (Pr), Recall (Rc) and F-rates (F-r) are given in percent (overall values weighted with class significance).

| Class | OG-CNN [63] Pr/Rc/F-r | Multi-view [67] Pr/Rc/F-r | PointNet++ [68] Pr/Rc/F-r | SPLATNet [70] Pr/Rc/F-r | C2CNN [21] Pr/Rc/F-r |
|---|---|---|---|---|---|
| Phantom | 85.3/34.7/49.3 | 76.5/45.3/56.9 | 82.3/76.5/79.3 | 83.4/78.2/80.7 | 84.3/85.9/85.1 |
| Pedestrian | 61.2/82.4/70.2 | 57.2/66.8/61.6 | 86.1/81.2/83.6 | 80.4/78.6/79.5 | 85.2/85.3/85.2 |
| Car | 56.4/89.5/69.2 | 60.2/73.3/66.1 | 80.6/92.7/86.2 | 81.1/89.4/85.0 | 86.4/88.7/87.5 |
| Vegetation | 72.4/83.4/77.5 | 71.7/78.4/74.9 | 91.4/89.7/90.5 | 86.4/87.3/86.8 | 98.2/95.5/96.8 |
| Column | 88.6/74.3/80.8 | 83.4/76.8/80.0 | 83.4/93.6/88.2 | 84.1/89.2/86.6 | 86.5/89.2/87.8 |
| Tram/Bus | 91.4/81.6/86.2 | 85.7/83.2/84.4 | 83.1/89.7/86.3 | 79.3/82.1/80.7 | 89.5/96.9/93.0 |
| Furniture | 72.1/82.4/76.9 | 57.2/89.3/69.7 | 84.8/82.9/83.8 | 82.6/81.3/81.9 | 88.8/78.8/83.5 |
| Overall | 76.9/74.2/75.5 | 72.5/73.4/72.9 | 85.6/87.5/86.5 | 82.5/83.7/83.0 | 90.4/90.2/90.3 |


In [77] the authors introduced a new technique for change detection in urban environments based on the comparison of 3D point clouds with significantly different density characteristics. This approach extracts moving objects and environmental changes from sparse and inhomogeneous instant 3D (i3D) measurements, using as a reference background model the dense and regular point clouds captured by mobile laser scanning (MLS) systems (see Fig. 9). The introduced workflow consists of consecutive steps of point cloud classification, cross-modal measurement registration, Markov Random Field (MRF) based change extraction in the range image domain, and label backprojection to 3D. Experimental evaluation has been conducted in four different urban scenes, and the advantage of the proposed change detection step is demonstrated against a reference voxel based approach.

Fig. 9. Top: Detected changes at a tram stop in Kálvin tér, Budapest using [77]. Red, blue and green points represent background objects, foreground objects and ground regions, respectively. Bottom: MLS laser scan of the tram stop.
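Since the change mask of [77] is computed in the range image domain, a spherical projection of the sparse cloud is the natural first step; a minimal sketch follows (the resolutions and vertical field of view are assumed values, not taken from the paper):

```python
import numpy as np

def to_range_image(points, h_res=0.2, v_res=0.4, v_min=-25.0, v_max=3.0):
    """Project a 3D point cloud to a 2D range image: pixel = (elevation,
    azimuth) bin, value = distance of the nearest return in that bin."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(y, x))                # azimuth in [-180, 180)
    el = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))
    u = ((az + 180.0) / h_res).astype(int) % int(360 / h_res)
    v = ((v_max - el) / v_res).astype(int)
    H, W = int((v_max - v_min) / v_res), int(360 / h_res)
    img = np.full((H, W), np.inf)
    keep = (v >= 0) & (v < H)
    np.minimum.at(img, (v[keep], u[keep]), r[keep])  # keep closest return
    return img
```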

8. Camera-Lidar calibration

Nowadays, state-of-the-art autonomous systems rely on a wide range of sensors for environment perception, such as optical cameras, radars and Lidars; therefore, efficient sensor fusion is a highly focused research topic in the fields of self-driving vehicles and robotics. Though the resolution and the operation speed of these sensors have significantly improved in recent years, and their prices have become affordable in mass production, their measurements have highly diverse characteristics, which makes the efficient exploitation of the multimodal data challenging. While real time Lidars, such as Velodyne's rotating multi-beam (RMB) sensors, provide accurate 3D geometric information with relatively low vertical resolution, optical cameras capture high resolution and high quality image sequences, enabling the perception of low level details of the scene. A common problem with optical cameras is that extreme lighting conditions (such as darkness or strong sunlight) largely influence the captured image data, while Lidars are able to provide reliable information much less dependent on external illumination and weather conditions. On the other hand, by simultaneous utilization of Lidar and camera sensors, accurate depth with detailed texture and color information can be obtained in parallel from the scenes.

Accurate Lidar and camera calibration is an essential step to implement robust data fusion; thus, related issues are extensively studied in the literature [78-80]. Existing calibration techniques can be grouped based on a variety of aspects [78]: based on the level of user interaction, they can be semi- or fully automatic; methodologically, we can distinguish target-based and target-less approaches; and in terms of operational requirements, offline and online approaches can be defined.
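What the extrinsic (R, t) and intrinsic (K) parameters buy us is the ability to map every Lidar point onto the image, which underlies all Lidar-camera fusion; a minimal sketch (our own illustration) is:

```python
import numpy as np

def project_to_image(points_lidar, R, t, K):
    """Project Lidar points into camera pixels using the extrinsic
    rotation R, translation t, and the 3x3 camera matrix K."""
    pts_cam = points_lidar @ R.T + t        # Lidar frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]    # keep points in front of the camera
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]         # perspective division -> pixels
```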

As their main characteristic, target-based methods use special calibration targets such as 3D boxes [79], checkerboard patterns [81], a simple printed circle [82], or a unique polygonal planar board [83] during the calibration process. At the level of user interaction, we can subdivide target-based methods into semi-automatic and fully-automatic techniques. Semi-automatic methods may consist of many manual steps, such as moving the calibration patterns to different positions, manually localizing the target objects both in the Lidar and in the camera frames, and adjusting the parameters of the calibration algorithms. Though semi-automatic methods may yield very accurate calibration, these approaches are very time consuming, and the calibration results highly depend on the skills of the operators. Moreover, even a well calibrated system may periodically need re-calibration due to artifacts caused by vibration and sensor deformation effects.

Fully-automatic target-based methods attempt to automatically detect previously defined target objects, then they extract and match features without user intervention: Velas et al. [84] detect circular holes on planar targets, Park et al. [83] calibrate Lidar and camera by using white homogeneous target objects, Geiger et al. [81] use corner detectors on multiple checkerboards, and Rodriguez et al. [85] detect ellipse patterns automatically. Though the mentioned approaches do not need operator interactions, they still rely on the presence of calibration targets, which often should be arranged in complex setups (i.e., [81] uses 12 checkerboards). Furthermore, during the calibration both the platform and the targets must be motionless.

On the contrary, target-less approaches rely on features extracted from the observed scene without using any calibration objects. Some of these methods use motion-based [86-88] information to calibrate the Lidar and camera, while alternative techniques [78,89] attempt to minimize the calibration errors using only static features.

Among motion-based approaches, Huang and Stachniss [87] improve the accuracy of extrinsic calibration by estimating the motion errors, Shiu and Ahmad [86] approximate the relative motion parameters between the consecutive frames, and Shi et al. [90] calculate the sensor motion by jointly minimizing the projection error between the Lidar and the camera residuals. These methods first estimate the trajectories of the camera and Lidar sensors either by visual odometry and scan matching techniques, or by exploiting IMU and GNSS measurements. Thereafter they match the recorded camera and Lidar measurement sequences, assuming that the sensors are rigidly mounted on the platform. However, the accuracy of these techniques strongly depends on the performance of the trajectory estimation, which may suffer from visually featureless regions, low resolution scans [91], a lack of hardware trigger based synchronization between the camera and the Lidar [90], or urban scenes without sufficient GPS coverage.

We continue the discussion with single frame target-less and feature-based methods. Moghadam et al. [89] attempt to detect correspondences by extracting lines both from the 3D Lidar point cloud and the 2D image data. While this method proved to be efficient in indoor environments, it requires a large number of line correspondences, a condition that often cannot be satisfied in outdoor scenes. A mutual information based approach has been introduced in [92] to calibrate different range sensors with cameras. Pandey et al. [78] attempt to maximize the mutual information using the camera's grayscale pixel intensities and the Lidar reflectivity values. Based on Lidar reflectivity values and grayscale images, Napier et al. [93] minimize the correlation error between the Lidar and the camera frames. Scaramuzza et al. [94] introduce a new data representation called the Bearing Angle (BA) image, which is generated from the Lidar's range measurements. Using conventional image processing operations, the method searches for correspondences between the BA and the camera image. As a limitation, target-less feature-based methods require a reasonable initial transformation estimate between the different sensors' measurements [90], and mutual information based matching is sensitive to inhomogeneous point cloud inputs and illumination artifacts, which are frequently occurring problems when using RMB Lidars [78].

In [95], the authors proposed a new fully automatic and target-less extrinsic calibration approach between a camera and a rotating multi-beam (RMB) Lidar mounted on a moving car. This technique adopts a structure from motion (SfM) method to generate 3D point clouds from the camera data, which can be matched to the Lidar point clouds; thus, the extrinsic calibration problem is addressed as a registration task in the 3D domain (see Fig. 10). The method consists of two main steps: an object level matching algorithm performing a coarse alignment of the camera and Lidar data, and a fine alignment step that implements a control point based point level registration refinement. The superiority of the method is that it relies on only the raw camera and Lidar sensor streams, without using any external Global Navigation Satellite System (GNSS) or Inertial Measurement Unit (IMU) sensors. Moreover, it is able to automatically calculate the extrinsic calibration parameters between the Lidar and camera sensors on the fly, which means we only have to mount the sensors on the top of the vehicle and start driving in a typical urban environment.

Fig. 10. Workflow of the on-the-fly Lidar-camera registration technique [95].

Note that there exist a few end-to-end deep learning based camera and Lidar calibration methods [80,96] in the literature, which can automatically estimate the calibration parameters within a bounded parameter range based on a sufficiently large training dataset. However, the trained models cannot be applied to arbitrary configurations, and re-training is often more resource intensive than applying a conventional calibration approach. In addition, failure case analysis and analytical estimation of the limits of operation are highly challenging for black box deep learning approaches.

9. Conclusion and future directions

When LIDARs first appeared in autonomous driving systems, they were mainly part of the supporting development toolkit. For a long time, it was seriously argued that LIDAR was not needed as a traffic control sensor, since (i) it cannot see through bad weather, (ii) its opto-mechanics are vulnerable, and (iii) its price is relatively high. However, today we can get high quality but much cheaper LIDAR sensors with more robust opto-mechanical solutions, bad weather problems can be partly eliminated, and so on.
