
Basic concepts and notations

1.3 Information fusion

The results of the dissertation relating to information fusion are based on the consideration that merging the outputs of member components may lead to applications of higher accuracy. In our case, the member components are primarily image processing algorithms whose outputs are merged for object detection purposes. Thus, besides introducing the corresponding general formalism, we also specialize our notation to this scenario.

As a classic formulation [56], let D be a set (ensemble) of classifiers (voters) D1, D2, …, Dn, Di : Λ ⊆ Rm → RM≥0 (i = 1, …, n), and let Ω = {ω1, ω2, …, ωM} be a finite set of class labels.

The classifier Di assigns the support values Di(χ) = (di,1(χ), …, di,M(χ)) to a feature vector χ ∈ Λ, describing the degree to which, in the opinion of the classifier, χ should be labeled by ω1, …, ωM, respectively. Then, in a fusion-based scenario, the final class label for χ is determined by applying some rule to the individual labels supported by the classifiers D1, …, Dn. Namely, as a general formulation, for each j (j = 1, …, M) a discriminator function gj(χ) is calculated as

gj(χ) = F(d1,j(χ), …, dn,j(χ)),   (1.1)

where F is a combination function. According to the selection of the support values di,j(χ) and the combination function F, we can set up several decision rules and derive different ensemble classifiers, such as the classic algebraic ones, e.g., taking F as the minimum, maximum, average, or product of the supports.
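As an illustration, the following minimal Python sketch evaluates the discriminator functions gj(χ) under these algebraic combination rules and selects the label with maximal support; the support values are hypothetical, chosen only for demonstration:

```python
import numpy as np

# Hypothetical support matrix: row i holds D_i(x) = (d_i,1(x), ..., d_i,M(x)),
# the supports of classifier D_i for the M = 3 class labels.
supports = np.array([
    [0.7, 0.2, 0.1],   # D_1(x)
    [0.5, 0.4, 0.1],   # D_2(x)
    [0.3, 0.3, 0.4],   # D_3(x)
])

# Algebraic combination functions F, applied column-wise (per class label).
combiners = {
    "average": lambda d: d.mean(axis=0),
    "minimum": lambda d: d.min(axis=0),
    "maximum": lambda d: d.max(axis=0),
    "product": lambda d: d.prod(axis=0),
}

for name, F in combiners.items():
    g = F(supports)             # g_j(x) = F(d_1,j(x), ..., d_n,j(x))
    k = int(np.argmax(g))       # the label omega_k with maximal discriminator
    print(f"{name}: g = {np.round(g, 3)} -> omega_{k + 1}")
```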

Notice that these decision rules give constraints on which class label ωk should be selected for χ. Here, we apply the intuitive notation D(χ) = ωk instead of D(χ) = (0, …, 0, 1, 0, …, 0), where the only non-zero support 1 is put on the k-th label, since the overall aim of any ensemble classifier is to select a single class label as the final decision.

The classic ensemble classifier based on simple majority voting can be derived by restricting the support of the individual classifiers as di,j(χ) = 1, if the classifier Di labels χ in the class ωj, and di,j(χ) = 0, otherwise. The final labeling of the ensemble is based on determining the class that received the largest support in terms of the number of votes:

Dmaj(χ) = ωk ⇐⇒ gk(χ) = max_{1≤j≤M} gj(χ), where gj(χ) = ∑_{i=1}^{n} di,j(χ) is the number of votes cast for ωj.
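A minimal sketch of this rule, assuming illustrative crisp votes, may look as follows; each classifier casts exactly one vote, and gj(χ) simply counts the votes received by ωj:

```python
from collections import Counter

# Hypothetical crisp votes: the index of the label chosen by each classifier.
votes = [0, 2, 0, 1, 0]                    # votes of D_1, ..., D_5

tally = Counter(votes)                     # g_j(x): number of votes for omega_j
k, g_k = tally.most_common(1)[0]           # label with the maximal vote count
print(f"D_maj(x) = omega_{k + 1} ({g_k} of {len(votes)} votes)")
```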

From the simple majority voting model we can easily derive a weighted one by assigning weights wi ∈ R≥0 to the classifiers Di, implying the following final decision rule:

Dwmaj(χ) = ωk ⇐⇒ gk(χ) = max_{1≤j≤M} ∑_{i=1}^{n} wi di,j(χ).
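Continuing the previous sketch with hypothetical weights (e.g., the individual accuracies of the members), the weighted rule can be illustrated as:

```python
# Hypothetical weights, e.g. the individual accuracies of the classifiers.
weights = [0.9, 0.6, 0.7, 0.55, 0.8]
votes = [0, 2, 0, 1, 0]                    # same crisp votes as above
M = 3

g = [0.0] * M
for w_i, j in zip(weights, votes):
    g[j] += w_i                            # g_j(x): total weight voting for omega_j

k = max(range(M), key=lambda j: g[j])
print(f"D_wmaj(x) = omega_{k + 1}, g = {g}")
```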

In contrast to classic majority voting, here we consider each classifier output equipped with a different weight wi (0 ≤ wi ≤ 1, i = 1, …, n). It seems natural to give the classifiers with higher accuracies greater importance in making the final decision. Notice that the classic majority voting scheme can be considered as a special case of the weighted one, since in the majority rule the weight of each vote given by a classifier is constrained to 1, i.e., wi = 1 for all i = 1, …, n.

Several results of the dissertation correspond to ensembles A = {A1, A2, …, An} of object detector algorithms Ai (i = 1, …, n). Based on the nature of the problem, we separate the cases when a single object or multiple objects are to be detected in an image. Moreover, we focus on objects that can be represented by single pixels, e.g., by their centers. Thus, the object detector algorithms cast their votes in terms of pixels as candidates for the location of the object. According to these considerations, we introduce the related notation as follows.

Let I be a digital image of size r × c. A candidate extractor algorithm for a single object detection scenario is defined as Ȧi : I → P({1, …, c} × {1, …, r}) with i ∈ {1, …, n} and Ȧi(I) = {ċIi,1, ċIi,2, …, ċIi,k}, where k ∈ N and P(A) denotes the power set of a set A. Notice that this definition allows a candidate extractor algorithm to give more than one candidate within an image for the possible location of a single object. However, we will also investigate cases when each ensemble member can have only one candidate for the location of the object. Thus, when k = 1 and the algorithm Ȧi has a single candidate, we will write ċIi instead of {ċIi,1} for its candidate set.

If multiple objects may appear in the image, the definition of a candidate extractor algorithm is modified accordingly as Äi : I → P({1, …, c} × {1, …, r}) with Äi(I) = {c̈Ii,1, c̈Ii,2, …, c̈Ii,k}, where k ∈ N. Note the difference between the notations of the candidates corresponding to the single and multiple object detection scenarios. Namely, ċIi,j for some j ≤ k is the j-th guess of Ȧi for a single object, while c̈Ii,j (j ≤ k) predicts the appearance of a desired object at the corresponding location. Since the chapters of the dissertation separate the single and multiple object detection scenarios, we will use the simple notations Ai and cIi,· for both cases if it does not hurt clarity. As a further simplification of notation, we will omit the symbol I from the upper index of the candidates when only one image is concerned, and write ci,· for short.
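The following sketch shows one possible representation of these notions in Python; the type aliases, the toy_extractor member, and the 1-based (column, row) indexing are illustrative assumptions, not part of the formal definitions:

```python
from typing import Callable, Set, Tuple

import numpy as np

Pixel = Tuple[int, int]            # (column, row) in {1, ..., c} x {1, ..., r}
Image = np.ndarray                 # a digital image of size r x c

# Both the single- and the multiple-object extractors map an image to an
# element of P({1, ..., c} x {1, ..., r}); only the interpretation of the
# returned candidates differs between the two scenarios.
CandidateExtractor = Callable[[Image], Set[Pixel]]

def toy_extractor(image: Image) -> Set[Pixel]:
    """Hypothetical member algorithm: the brightest pixel as sole candidate."""
    row, col = np.unravel_index(np.argmax(image), image.shape)
    return {(int(col) + 1, int(row) + 1)}  # 1-based (column, row) indexing

image = np.zeros((5, 8))                   # r = 5, c = 8
image[2, 3] = 1.0
print(toy_extractor(image))                # {(4, 3)}
```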

Since ensemble candidates are composed from the candidate extractor algorithms A1, A2, …, An via the fusion of their candidate sets A(I) = A1(I) ∪ … ∪ An(I), we define a confidence measure to describe the rate of agreement of the members on the specific candidates. To do so, first we introduce a proximity relation ∼= to decide whether two candidates indicate the same object or not. With c1, c2 ∈ {1, …, c} × {1, …, r}, we say that c1 ∼= c2 if d(c1, c2) < Td with some distance threshold Td ∈ R≥0. As in our applications the objects to be detected are circular, Td can be selected as the diameter of the desired object. Now, the confidence of the ensemble in any of its candidates c ∈ A(I) is defined as

confA(c) = |{Ai ∈ A : ∃ c′ ∈ Ai(I) such that c ∼= c′}| / |A|.   (1.8)

Notice that confA(c) ∈ {k/|A| : k = 1, …, |A|}. We also classify the ensemble candidates based on their degree of confidence. Namely, the α–level candidates of A are defined as

A(I)α = {c ∈ A(I) : confA(c) ≥ α}, where 1/|A| ≤ α ≤ 1.   (1.9)

As specific cases, the 1–level, the α–level with α > 1/2, and the 1/|A|–level candidates are those selected by each of, the majority of, and at least one of the members of the ensemble, respectively. For the latter case, it should be noted that A(I) = A(I)1/|A|. Single algorithms are formally represented by ensembles consisting of one member, providing |A| = 1 and α = 1–level confidence for all the candidates.
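These notions can be illustrated by the following sketch, where the choice of the Euclidean distance for d, the helper names, and the sample member outputs are assumptions made for demonstration only:

```python
import math

def is_same_object(c1, c2, T_d):
    """Proximity relation c1 ~= c2: the Euclidean distance of the two
    candidates is below the threshold T_d (e.g. the object diameter)."""
    return math.dist(c1, c2) < T_d

def confidence(c, ensemble_outputs, T_d):
    """conf_A(c): the fraction of members having a candidate matching c."""
    agreeing = sum(
        1 for candidates in ensemble_outputs
        if any(is_same_object(c, c2, T_d) for c2 in candidates)
    )
    return agreeing / len(ensemble_outputs)

def alpha_level_candidates(ensemble_outputs, T_d, alpha):
    """A(I)_alpha: fused candidates reaching confidence at least alpha."""
    fused = set().union(*ensemble_outputs)     # A(I), the union of the A_i(I)
    return {c for c in fused
            if confidence(c, ensemble_outputs, T_d) >= alpha}

# Illustrative outputs of a three-member ensemble on one image:
outputs = [{(10, 12)}, {(11, 13), (40, 8)}, {(10, 11)}]
print(alpha_level_candidates(outputs, T_d=5.0, alpha=2 / 3))
```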

1.3.1 Error measurement

We must also set up a framework to measure the accuracy of the ensemble-based approaches discussed in the dissertation for object detection in practical problems. We start by deciding whether the candidates found by a specific member algorithm are true or false ones regarding some ground truth, usually provided in terms of manual annotations.

Let GT(I) ⊆ {1, …, c} × {1, …, r} be a so-called ground truth set of candidates for an image I. For the classification of each candidate c ∈ A(I) of an ensemble A in the same image regarding GT(I) and a confidence level 1/|A| ≤ α ≤ 1, we apply the following (see also the sketch after this list):

• c is an α–true positive (TPα), if c ∈ A(I)α and ∃ cgt ∈ GT(I) such that c ∼= cgt;

• c is an α–false positive (FPα), if c ∈ A(I)α and ∄ cgt ∈ GT(I) such that c ∼= cgt.

Regarding a candidate cgt ∈ GT(I), we apply:

• cgt is an α–false negative (FNα), if ∄ c ∈ A(I)α such that c ∼= cgt.

Finally,

• each point in {1, …, c} × {1, …, r} \ (GT(I) ∪ A(I)α) is an α–true negative (TNα).
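A possible implementation of this classification, reusing the hypothetical is_same_object helper of the previous sketch, may look as follows; only |TNα| is computed, since enumerating the α–true negative points is rarely needed in practice:

```python
def classify_candidates(fused_alpha, ground_truth, T_d, shape):
    """Split the alpha-level candidates and the ground truth points of one
    image into TP_alpha, FP_alpha and FN_alpha; TN_alpha is only counted."""
    tp = {c for c in fused_alpha
          if any(is_same_object(c, c_gt, T_d) for c_gt in ground_truth)}
    fp = fused_alpha - tp
    fn = {c_gt for c_gt in ground_truth
          if not any(is_same_object(c, c_gt, T_d) for c in fused_alpha)}
    r, c = shape
    tn_count = r * c - len(ground_truth | fused_alpha)   # |TN_alpha|
    return tp, fp, fn, tn_count
```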

The sets of all true positives, false positives, false negatives, and true negatives for a given image I will be denoted by TP(I)α, FP(I)α, FN(I)α, and TN(I)α, respectively. Notice that GT(I) is usually a manually annotated set created by experts of the application field. Moreover, since performance evaluation is usually expected to be given at database level, digital images are often organized into an image database DB. Now, to calculate the performance of an ensemble A regarding some ground truth, we introduce the following classic measures at both image and image database level [57, 58]:

• Competition Performance Metric:

CPM(DB) = (1/|G|) ∑_{g∈G} {SEN(DB)α : FPI(DB)α = g},

where G is a predefined set of false positives per image (FPI) levels, SEN(DB)α denotes the sensitivity of the ensemble on DB at confidence level α, and FPI(DB)α is the corresponding average number of false positives per image.
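Assuming that the sensitivities have already been measured at the FPI levels of G, the metric itself reduces to a simple average, as in the following sketch; the concrete levels of G below are only illustrative, similar sets being used in competition-style evaluations:

```python
def cpm(sen_at_fpi):
    """Competition Performance Metric: the mean sensitivity taken over the
    predefined false-positives-per-image levels g in G."""
    return sum(sen_at_fpi.values()) / len(sen_at_fpi)

# Hypothetical measurements: SEN(DB)_alpha at the alpha where FPI(DB)_alpha = g.
sen_at_fpi = {0.125: 0.61, 0.25: 0.68, 0.5: 0.73,
              1: 0.79, 2: 0.83, 4: 0.86, 8: 0.88}
print(f"CPM = {cpm(sen_at_fpi):.3f}")      # (0.61 + ... + 0.88) / 7
```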