Mathematical analysis of satellite images
Part 2
István László*
István Fekete**
Summer School in Mathematics
Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary, June 6 - 10, 2016
* Institute of Geodesy, Cartography and Remote Sensing http://www.fomi.hu
** Eötvös Loránd University, Faculty of Informatics http://www.elte.hu
Contents
Mathematical analysis of satellite images
2
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
•Part 1: An overview of remote sensing
• 1. Introduction: the evolution of remote sensing
• 2. Raw material: acquisition and pre-processing
• 3. Evaluation: only visual approach?
• 4. Practical applications
• 5. Education and university collaboration
•Part 2: Advanced methodology
• 1. Numerical evaluation of satellite images
• 2. The whole process of evaluation
• 3. Segment-based thematic classification
• 4. Data fusion
• 5. Object-based image analysis (OBIA)
II. 1. Numerical evaluation of satellite images
Mathematical analysis of satellite images
3
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The thematic evaluation of remote sensing images
2003. 03. 29., Landsat 7 ETM+
2003. 07. 27., Landsat 5 TM
A part of a crop map
Őszi búza Tavaszi árpa Őszi árpa Kukorica Silókukorica Napraforgó Cukorrépa Lucerna Vízfelszínek
Nem mezőgazdasági területek Egyéb szántóföldi növények
Mathematical analysis of satellite images
4
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Intensity space (spectral space, feature space)
• Multispectral images can be thought of as one or several matrices.
• Rows and columns of a matrix correspond to stripes of surface.
• Pixel coordinates in the intensity space are determined by the intensity levels belonging to different spectral bands.
Intensity values corresponding to different land cover types usually overlap in the intensity space.
The basic task of image processing
and the elementary solutions of classification
Pixels belonging to categories
The intensity vectors of categories show a typical probability distribution
in certain parts of intensity space
Mathematical analysis of satellite images
7
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Nearest neighbour method Hypercube (box) method
The basic task of image processing
and the elementary solutions of classification
Mathematical analysis of satellite images
8
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Starting from the probability distribution, we can draw the isolines of identical probabilities. This is the maximum-likelihood
classification method.
The basic task of image processing
and the elementary solutions of classification
Mathematical analysis of satellite images
9
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Statistical thematic
classification
The mathematical formalisation of statistical thematic classification
The task is to assign a target category (class) to each pixel.
(Target categories: e.g. ω1 – wheat, ω2 – maize, …, ω15 – settlement)
• Set of target categories: ωk (k[1..KF]), KF is the number of categories
• The distribution (density function) of pixel vectors (x) within each class (ωk):
p(x|ωk) (k[1..KF])
• The preliminarily known ratio of classes within the whole image (“a priori probability”): p(ωk) (k[1..KF])
• The value of loss caused by misclassification:
λ(ωk, ωl) or λkl (k,l[1..KF])
(Classified into ωk, but belongs to ωl)
Goal: to find and optimal and widely
usable procedure within these conditions!
Solutions - example:
Class#1 – girls, Class#2 – boys,
B – decision threshold,
y – unknown person, classified as girl (ω1), z – unknown person, classified as boy (ω2)
Classification based on a quantity (height)
Hatched areas: erroneous classification pE(1|2): classified as ω1, belongs to ω2,
pE(2|1): classified as ω2, belongs to ω1
The maximum likelihood
and Bayesian classification
Categorisation on the basis of a discriminant function:
An x vector is classified into class ωk, for which gk(x) is maximal.
Maximum-likelihood method:
gk(x) = p(x|ωk) p(ωk), that is:
if p(x|ωk) p(ωk) p(x|ωl) p(ωl) for all l[1..KF], x vectort is classified into class ωk.
Maximisation of p(x|ωk)p(ωk) is equivalent to maximisation if p(ωk| x) (for a given x), because:
p(ωk| x) p(x) = p(x ωk) = p(x|ωk) p(ωk) (conditional probability), therefore p(ωk| x) = p(x ωk) / p(x) = p(x|ωk) p(ωk) / p(x),
and p(x) is independent of the class (k).
Maximum-likelihood causes the least number of errors (misclassifications).
The minimisation of loss: Bayesian classification.
Loss caused by the classification of a vector x to class k:
KF
l
kl l
x k p x
L
1
) (
)
( k [1..KF]
Maximum-likelihood classification is a special case of
Bayesian classification: identical, symmetrical choice of loss values.
(„Reverse” identity matrix: 0 in main diagonal, otherwise 1.)
• Probability density function of classes with B spectral bands:
Properties of class ωk : k – mean, Σk – covariance matrix,
|Σk| – its determinant.
Classification methods are not so sensitive to p(ωk) prior probabilities.
Instead of the general maximum-likelihood formula –
p(ωk)* p(x|ωk) p(ωl)* p(x|ωl) – one can use its logarithm:
)]
( ) 2(
exp[ 1 )
2 ( )
(x k B/2 k 1/2 x k T k1 x k
p
) (
) 2(
ln 1 2 ) 1 ( ln )
( k k k T k1 k
k x p x x
g
] ..
1 [ KF k
] ..
1 [ KF k
The approximation by normal distribution
Spectral classes
Spectral classes, clusters in the intensity space
Within a thematic class, the natural groups of elements are called clusters.
Possible relations:
• One spectral class corresponds to one thematic class (rare)
• Several spectral classes build up one thematic class (most frequent)
• One spectral class intersects with several thematic classes (this is the source of classification errors!)
• A spectral band does not belong to any of the thematic classes.
The ISODATA clustering method
• Aim: to find the pixels close to each other, that is, to minimise the distance between pixels and cluster centres.
SSE=Sum of Squared Error Steps:
1. Choice of initial cluster centres
2. Each pixel is assigned to the closest cluster centre 3. The calculation of new cluster centres
4. The examination of changes; if needed 2.
5. Clusters are ready.
The maximum number of iteration can also be limited.
] ..
1 [
)
( 2
f k
K
k x C
x
kSSE
ISODATA clustering – an example
Accuracy assessment of
thematic classification
The measurement of overlapping and distance of spectral classes, the prediction of classification errors
Overlap, that is, expected error depends on:
- distance, - deviation.
Clusters in themselves do not overlap!
Every pixel belongs uniquely to one cluster.
But if they are approximated by a probability distribution, some overlap may occur.
The aim of distance measures is the estimation of correct classification..
Distance functions are based on overlap:
(p: density function; integration over the whole intensity space!) Divergence:
It is not limited, but classification accuracy has an upper bound.
Transformed divergence:
Bounded (<=2), saturates in the function of distance.
• Jeffries-Matusita (JM) distance:
x x d
p x x p
p x
p l
k D
l k x
l
k ( )
) ln (
] ) ( ) (
[ ) ,
(
8 )}
) , exp( (
1 {
* 2 ) ,
( D k l
l k
DT
x d x
p x
p l
k
J l
x
k
]2
) (
) (
[ )
,
(
Area sampling
Training and test areas
• Spectral properties of classes are calculated from area samples.
• Surface elements of target categories may spectrally differ from each other.
• These are representative if they cover all the environmental factors.
• About 2-5% of the whole area is used as reference (ground truth).
• Reference areas are split into two parts: training and test areas
Accuracy assessment of thematic classification
• Confusion matrix:
- Rows: reference classes (real classes) - Columns: classification result
- Elements out of the main diagonal: incorrect classification
• Error map: erroneous and correct classifications are marked
with different shades. It often shows the shortcomings in the
preparation of thematic classification.
• An example of confusion matrix and accuracy measures
II. 2. The whole process of evaluation
Mathematical analysis of satellite images
29
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
L1. The definition of goal
L2. The analysis of task, the creation of model and plan L3. The choice of images
L4. Geometric correction
L5. General statistical examination of images L6. The division of the whole area
L7. Acquisition of reference data
L8. The spectral analysis of training areas
L9. Correspondence between spectral and thematic classes L10. The determination of spectral properties of classes L11. Thematic classification of training area
L12. The examination of extensibility of classification L13. The classification of the whole region
L14. Visualization of results in image, map and tabular format
L1. The definition of goal
- What would we like to show? – Thematic categories
- How accurately?
- Good result at every pixel
- Summarized area of thematic categories
Őszi búza Tavaszi árpa Őszi árpa Kukorica Silókukorica Napraforgó Cukorrépa
Lucerna Vízfelszínek
Nem mezőgazdasági területek Egyéb szántóföldi növények
L2. The analysis of task, the creation of model and plan
The matching between physical properties and quantities measured by remote sensing
Example: the comparison of well developed and drought-hit vegetation
L3. The choice of images -
Types, quantity- Date
- Archive or programmed
Example: an application may require the simultaneous use of images taken by different sensors at different dates.
L4. Geometric correction
Generally: pre-processing Geometric correction:
In Hungarian applications: EOV
(Hungarian Unified Map Projection System)
L5. General statistical examination of images - General orientation
- Visual enhancement
- Histogram evaluation
L6. The division of the whole area (strata or regions)
- Steps L7..L13 are executed for each strata!
- The choice of reference areas within strata
Example: a potential subdivision of Hungary to strata according to weather, soil, crop production conditions and administrative boundaries
L7. Acquisition of reference data
Training Test
L8. The spectral analysis of training areas - Clustering (e.g. Isodata)
- The analysis of clusters in spectral space (feature space)
Cluster#1: cultivated areas
Cluster#2: “remaining” areas (transition zones, settlements) Cluster#3: soil
L9. Correspondence between spectral and thematic classes
L10. The determination of spectral properties of classes
We collect all the input data of statistical classification.
- The set of spectral (sub)classes, the parameters of distributions (mean vector, covariance matrix)
- “A priori” probabilities
- In the case of Bayesian classification: loss function We assess, estimate the expected error.
- The measurement of the distance between subclasses, classes:
accuracy measures
- If not appropriate, go back to L8
L11. Thematic classification of training area
In optimal case, every pixel of the training area is classified into the category determined by training data.
It cannot be considered as an independent sample.
If the error rate is high even in this step, we have to repeat clustering with different parameters (L8).
L12. The examination of extensibility of classification - The classification of test reference data
- If not appropriate: back to L6 or L8
L13. The classification of the whole stratum (region)
Őszi búza Tavaszi árpa Őszi árpa Kukorica Silókukorica Napraforgó Cukorrépa
Lucerna Vízfelszínek
Nem mezőgazdasági területek Egyéb szántóföldi növények
L14. Visualization of results in image, map and tabular format - The assembly of regions (strata)
- The calculation of area from thematic map
- Left: drought map for Hungary
- Right: total crop areas per county and crop map for Hungary
II. 3. Segment-based thematic classification
Mathematical analysis of satellite images
46
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Segment-based classification
Segment-based classification: the utilisation of spatial information.
A segment is a set of spectrally similar, contiguous pixels.
Its advantage is the preservation of homogeneity stemming from the nature.
Its disadvantage appears in the assignment of pixels at the border of different land cover categories
improvement: pixel-based revision
Mathematical analysis of satellite images
47
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Some implemented methods of image segmentation
Merge-based (bottom-up) methods:
• A - Sequential linking
• B - Best merge
• C - Graph-based merge method Cut-based (top-down) methods:
• D - Minimum mean cut
• E - Minimum ratio cut
• F - Normalised cut
Mathematical analysis of satellite images
48
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The method proceeds sequentially (row-wise):
Examination of homogeneity of the current cell (of 2 by 2 pixels), based on deviance
If the cell is homogeneous, it can be linked to an already existing segment, if some conditions are met
– either directly, or through “bridging cells”
almost arbitrary segment boundaries may be formed
If linking is not possible, we start a new segment with the current cell.
A. Sequential linking
m
i
i
x x x
A
1
)2
(
n
i
i
y y y
A
1
)2
( A Ax Ay
m
i
i
x x z
B
1
)2
(
n
i
i
y y z
B
1
)2
( B Bx By
1 2
/ )
)(
/
(A B mn C 2
2 / 1
2 1 1
)) /(
(
) / (
) /
( C
n m A
n A m
A
n m
n y
m
x
Two segments can be contracted if the following inequalities are true for all bands (for a given C
1and C
2):
Contraction of segments: based on ANOVA criteria.
Let x be a sample with m elements, y a sample with n elements, z a distribution resulted by the contraction of them.
A. Sequential linking
Start: every pixel is an individual segment.
Iterative step: the merging of the two nearest, neighbouring segments.
Stop criterion: an appropriate number of segments is reached or any pair of neighbouring segments are too far from each other
Distance between segments: pl. divergence, Bhattacharya, Jeffries- Matusita or average quadratic distance.
Graph representation: a node contracting procedure starting from the grid graph of pixels, where nodes represent segments.
Efficient implementation by advanced data structures.
B. Best merge
C-F: Graph-based segmentation methods
• Image segmentation can be regarded as a graph theory problem, as an image can be easily represented by a grid graph.
• Nodes can be pixels or their intensity values.
• The cost (weight) of edges represent the relation (similarity or difference) between neighbouring pixels (nodes). For example, the linear (Euclidean) difference, or Gaussian weight function:
• In this case, segments are connected subgraphs.
2
))2
( ) ( (
) ,
(
v I u I
e v
u
C-F: Graph-based segmentation methods
Graph representation: nodes correspond to segments; kiindulás a pixelek rácsgráfjából. Az élek súlya: az összekötött pixelek távolsága
Algorithm: the iterative merging of neighbouring segments in a way that does not significantly increase heterogeneity.
The heterogeneity of a segment: the maximum edge weight in the minimum spanning tree if the respective subgraph.
Merging criterion:
Implementation: edges are examined in increasing order by weight effective merging is not necessary.
C. Graph-based merge method
( ) / | |, ( ) / | | ( )min het Si k Si het S j k Sj het Si S j
D. Minimum mean cut
• The cost of a cut (G=(V,E), the cut yields subgraphs A and B)
• To calculate minimum mean weight, the cost of cut is divided by the number of edges within cut:
• Therefore the searching for minimum cuts in an undirected graph is NP-hard problem, but there is a way to apply a polynomial
algorithm after appropriate transformations.
1
A,v B,(u,v) E u
Cut(A,B) Mcut(A,B)=
) ,
(
A,v B,(u,v) E u
v
u
Cut(A,B)=
The segment maps after three segmentation methods
Sequential linking
Best merge Graph-based
merge
The steps of classification procedure
1. Segmentation: Pixels are grouped into
segments, using their spectral and spatial properties. Result: segment map
2. Clustering: unsupervised procedure, no
preliminary information (reference data) is used.
Clusters: compact groups in intensity space that characterize land cover categories. Result:
cluster map
Mathematical analysis of satellite images
57
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The steps of classification procedure
• Training phase: Correspondence between clusters and reference areas is examined, clusters are labeled.
• Classification phase: The pixels/segments are classified into land cover categories.
Result: class map
• Accuracy assessment Result: confusion matrix
Mathematical analysis of satellite images
58
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The result of classification after three segmentation methods
Sequential linking
Best merge Graph-based
merge
An illustration of Best Merge method
satellite image (May) satellite image (June) satellite image (August)
segment map (Best Merge)
cluster map (Best Merge)
classification result (Best Merge)
II. 4. Data fusion
Mathematical analysis of satellite images
61
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Data integration, data fusion
• „Data fusion is a formal framework in which are expressed means and tools for the alliance of data originating from different sources.
It aims at obtaining information of greater quality.” (L. Wald)
• Within remote sensing
– generally: the integration of spatial, spectral and temporal dimensions,
– particularly: the fusion of images having different spatial resolution and spectral characteristics („resolution merge”).
• Other data sources (for example):
– GIS coverages (based on field measurements and on delineations on maps or remote sensing images)
– Digital Terrain Model (DTM), Digital Surface Model (DSM)
• Less formal ways of data integration
Rapid Field Visit (RFV)
Classical field inspections are carried out by an integrated hardware-software GIS solution containing a submeter precision GPS unit.
The integration of preliminary field measurements into control process
FÖMI must delineate parcels „among preliminary measurements” during remote sensing control.
Important topic: the consistent use of different kinds of measurements
In 2011, almost 2 complete
WorldView2-coverage for a site:
„Goldie oldie”:
the data fusion of Landsat 7 ETM+
Panchromatic (15 m) Multispectral (25 m, RGB 453)
Modified IHS Transformation
Resolution Merge / Multiplicative
Hyperspherical Color Space
High-pass Filter (own model)
Principal Component
Analysis 66
2009-05-20, VHR 2009-06-19, SPOT4 2009-07-24, SPOT5 2009-08-03, SPOT4
GAEC, 1st standard - the checking of prevention of soil erosion:
DEM+satellite images
(DEM: steep areas, satellite images: traces of actual erosion) GAEC, 1st standard - the checking of prevention of soil erosion:
DEM+GIS
(DEMsteep parcels, CAPIparcels with row crops, intersection: problem!)
Waterlog in the spring may lead to the appearance of weeds in the summer.
Example: non-compliance to GAEC, 7th standard
(Agricultural areas must be kept under appropriate weed control)
II. 5. Object-based image analysis (OBIA)
Mathematical analysis of satellite images
69
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
2. Object-based image analysis: a case study
• Applications:
– Identification of ineligible land
– The recognition of built infrastructure in rural area
• Object-based (OBIA) solution
• The software environment of application:
– Definiens / eCognition software suite – Proprietary segmentation algorithms – Classification: neighborhood information – Commands are organized into Rule Sets
– Raster input images: color infrared ortho-photos (< 1m)
Mathematical analysis of satellite images
70
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The description of task
• Operation and updating of Land Parcel Identification System (LPIS)
• An important property of areas is whether they are subsidized (eligible for agricultural payments).
• Goal: to automatically delineate scattered trees and bushes appearing on pastures.
• Segmentation: not only an option, but a necessity with VHR images. Pixels usually cannot be interpreted individually.
Mathematical analysis of satellite images
71
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Defining non-eligible parts
of pastures with scattered woods and shrubs
It is a requisite to separate non-eligible parts from eligible areas on wooded pastures.
-Relatively big area is concerned
- Time and labour
consuming by manual photo- interpretation
Automatisation is needed
Mathematical analysis of satellite images
72
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Defining borders of wooded pastures
Input data:
• Infrared orthorectified airborne images (40 cm pixel size)
- Aim: to follow the border of land cover features with segmentation
- Because of the big resource demand of segmentation, Tiles & Stitching method was used, i.e.
to split images to small parts, process them, and stitch together the results of processing.
Mathematical analysis of satellite images
73
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The steps of segmentation
1. Quadtree-based segmentation („top-down”) Simple method to obtain initial segments.
Computational load of subsequent steps is reduced.
2. Multiresolution segmentation („bottom-up”) Pairwise region merging technique.
Criterion: spectral and shape heterogeneity.
3. Spectral difference segmentation („bottom-up”) Refines the result of Multiresolution segmentation.
Contracts spectrally similar, neighboring segments.
4. Contrast split segmentation („top-down”)
Partitions segments into lighter and darker areas, if necessary.
Mathematical analysis of satellite images
74
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
Classification
• Goal: to differentiate trees and bushes from pasture (but not between tree species)
– Spectral properties: tree species, vegetation density, illumination – Spatial properties: textural information
• Texture
– Inhomogeneity
– Grey Level Co-occurence Matrix (GLCM) – Entropy: disorder, i.e. randomness of texture
– Homogeneity: Closeness of elements of GLCM to diagonal
Mathematical analysis of satellite images
75
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
It is needed to define several subclasses based on spectral and textural measures.
By merging different subclasses we reach the final classes.
Defining borders of wooded pastures
After an adequate segmentation, the classification made by using the most significant spectral and textural features is compared with the Hungarian LPIS with GIS methods. The delineation can be made more precise by using orthorectified images with better spatial resolution (to reach finer texture) or by using time series.
Mathematical analysis of satellite images
76
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The steps of delineation of tree groups
Ortho-photo
Contrast split segmentation
Classification
Multiresolution segmentation
Mathematical analysis of satellite images
77
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The recognition of built
infrastructure in urban areas
• The aim of project: the examination of expansion and re-structuring of cities
• Beside the spectral information content of ortho-photos, auxiliary information is needed (elevation/LIDAR).
• Segmentation:
1. Quadtree-based segmentation 2. Multiresolution segmentation
3. Spectral difference segmentation (two runs)
• Classification:
– Difficulty: to correctly make a distinction between buildings and linear elements (roads and railroads)
– Solution: density, the measure of elongation
Mathematical analysis of satellite images
78
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The recognition of built infrastructure in urban areas
Good segmentation result: roads and roofs
The result of segmentation in rural, garden suburban area:
Mathematical analysis of satellite images
79
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016
The recognition of built infrastructure in urban areas
•An error of density feature - long block building with gray flat roof and car-like chimneys
•Solution: using Digital Surface Model
A segment that contains the roof, but is spectrally similar to roads
Mathematical analysis of satellite images
80
Summer School in Mathematics, ELTE, Budapest, 06-10 June 2016