
5.5 Generalizations to weighted voting systems

In this section, we modify the final decision rule of the ensemble, which results in a further improvement of the system accuracy. Our generalization is based on assigning weights to the ensemble members (classifiers). First, we recall the standard procedure for finding the weights in classic majority voting (see e.g. [166]). Then, we derive how the appropriate weights can be found in our generalized voting case.

5.5.1 Classic weighted voting system

For the weighted voting system, first let us consider the classifiers D_1, D_2, …, D_n with accuracies p_1, p_2, …, p_n, respectively. For this case, from section 1.3 we recall the ensemble classifier D_wmaj corresponding to this type of decision rule, with the discriminant functions

g_j(χ) = Σ_{i=1}^{n} w_i d_{i,j}.    (5.51)

Notice that the following discriminant functions can be equivalently used as decision rules:

g_j(χ) = P(s|ω_j) P(ω_j),    g_j(χ) = log(P(s|ω_j) P(ω_j)),    (5.52)

where s = [s_1, …, s_n] is the vector of label outputs of the ensemble, s_i ∈ Ω is the label suggested for χ by the classifier D_i, and P(ω_j) is the prior probability of the class ω_j. A natural question arises in the weighted majority system: how should the optimal weights be chosen for the classifiers? If we consider independent classifiers D_1, D_2, …, D_n with accuracies p_1, p_2, …, p_n, then the system accuracy is maximized by assigning the weights

w_i ∝ log (p_i / (1 − p_i)),    i = 1, …, n,    (5.53)

where ∝ stands for the proportionality relation. Notice that conditional independence is assumed here, that is, the independence of the labels suggested for χ by the classifiers D_i.

The weights w_i ∝ log(p_i / (1 − p_i)) do not guarantee the minimum classification error in general, because the prior probabilities P(ω_j) of the classes have to be taken into account, too. More precisely, if the individual classifiers are mutually independent and each class is a priori equally likely, then the decision rule that maximizes the system accuracy is the weighted majority voting rule obtained by assigning the weights w_i ∝ log(p_i / (1 − p_i)).
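As a sanity check of (5.53), the following Python sketch (not part of the thesis; the classifier accuracies below are made up for illustration) simulates independent two-class classifiers and compares plain majority voting with the log-odds weighting:

```python
import math
import random

def weighted_majority(votes, weights):
    """Return the label with the largest total weight among the votes."""
    score = {}
    for label, w in zip(votes, weights):
        score[label] = score.get(label, 0.0) + w
    return max(score, key=score.get)

def simulate(accs, weights, trials=20000, seed=1):
    """Fraction of trials in which the weighted vote recovers the true
    label (taken to be 0); classifier i is correct with probability accs[i]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        votes = [0 if rng.random() < p else 1 for p in accs]
        if weighted_majority(votes, weights) == 0:
            hits += 1
    return hits / trials

accs = [0.9, 0.8, 0.7, 0.6, 0.55]                  # illustrative accuracies
equal = [1.0] * len(accs)                          # plain majority voting
logodds = [math.log(p / (1 - p)) for p in accs]    # weights from (5.53)
print(simulate(accs, equal), simulate(accs, logodds))
```

On independent classifiers with unequal accuracies, the log-odds weighting gives a measurably higher system accuracy than equal weights.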

5.5.2 Assigning weights by adopting a shape constraint

This method (using weights in the voting rule) can also be applied in our generalized voting scheme presented in section 5.5.1. If we consider the classifiers D_1, D_2, …, D_n with respective accuracies p_1, p_2, …, p_n and weights w_1, …, w_n, then the final decision is made by choosing the maximal sum of weights, where some additional (e.g. geometric) constraints have to be fulfilled by the classifier outputs. Let us consider the probability (1 − p_i) r_i for the i-th classifier, which means that the i-th classifier makes a wrong classification and participates in making a wrong decision (that is, it also fulfills the additional constraints).

In our application, we choose the maximal sum of the weights of those algorithms whose outputs can be bounded by a circle with an appropriate radius. An algorithm takes part in making a wrong decision if its output falls outside the OD and is close to other wrong candidates.

For the algorithm D_i with accuracy p_i giving a wrong candidate c_i for the OD, we assume that the distribution of c_i is uniform outside the OD for all i (i = 1, …, n). In this case, we have

r_1 = … = r_n = T_0 / (T − T_0),    (5.55)

where T_0 and T are the areas of the OD and the ROI, respectively, so r_i is the same predetermined constant for all i (i = 1, …, n).
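Plugging in concrete numbers: with a circular ROI and OD of radii r_ROI = 716 and r_OD = 51 pixels (these particular values are an assumption here, taken from the experiments of section 5.5.4), (5.55) evaluates as follows:

```python
import math

# Radii (in pixels) of the circular ROI and of the OD; these values are
# taken from the experiments reported in section 5.5.4.
r_ROI, r_OD = 716.0, 51.0

T = math.pi * r_ROI ** 2    # area of the ROI
T0 = math.pi * r_OD ** 2    # area of the OD

r = T0 / (T - T0)           # equation (5.55); note that pi cancels out
print(round(r, 4))          # → 0.0051
```

So r_i is indeed a small constant: a wrong candidate rarely lands in any particular OD-sized region.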

Theorem 5.5.1. If independent classifiers D_1, D_2, …, D_n are given (conditional independence is considered), then the optimal weight w_i for the classifier D_i with accuracy p_i can be calculated as

w_i ∝ log ( p_i / ((1 − p_i)² r_i (1 − r_i)) ).    (5.56)

Proof. See the Appendix.

Notice that the weights w_i ∝ log(p_i / ((1 − p_i)² r_i (1 − r_i))) do not always guarantee the minimum classification error. Only if the individual classifiers are independent and the prior probabilities P(ω_j) of the classes are equal is the decision rule that maximizes the system accuracy the weighted majority voting rule obtained by assigning the weights w_i ∝ log(p_i / ((1 − p_i)² r_i (1 − r_i))).
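To make the comparison with (5.53) concrete, here is a small Python sketch (mine, not the thesis code; the denominator grouping (1 − p)² r (1 − r) reflects my reading of (5.56)) that evaluates both weightings:

```python
import math

def classic_weight(p):
    # Optimal weight for classic weighted majority voting, eq. (5.53).
    return math.log(p / (1 - p))

def generalized_weight(p, r):
    # Eq. (5.56) as reconstructed here: denominator (1-p)^2 * r * (1-r).
    return math.log(p / ((1 - p) ** 2 * r * (1 - r)))

r = 0.0051  # the predetermined constant from (5.55)
for p in (0.3, 0.5, 0.7, 0.9):
    print(p, round(classic_weight(p), 3), round(generalized_weight(p, r), 3))
```

Note that even an accuracy of p = 0.3 yields a positive generalized weight because r is so small; this anticipates the later remark that no p > 0.5 constraint is needed on the individual algorithms.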

5.5.3 The weighted majority voting in OD detection

In our application, the output of each OD detecting algorithm OD_i is the OD center as a single pixel c_i. In our ensemble-based system we have the set of class labels {ω_x | x ∈ ROI}. For the OD detector OD_i with its output c_i, the class label ω_{c_i} is assigned to the OD. In this case, the classification is correct if the output c_i falls inside the OD in the retinal image. We can define the decision rule as the sum of the weights of the OD detecting algorithms whose outputs can be covered by a disc of radius d_OD/2. The disc with the maximal sum of weights is accepted as the final candidate for the OD.
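The decision rule just described can be sketched as follows. This is an illustrative simplification (it centres the trial disc on each candidate point rather than searching over all covering discs, which is sufficient but not exhaustive), and the candidate coordinates and weights are hypothetical:

```python
import math

def best_disc(candidates, weights, radius):
    """Simplified decision rule: try a disc of the given radius centred at
    each candidate, sum the weights of the candidates it covers, and keep
    the heaviest disc.  (The thesis searches all covering discs of radius
    d_OD/2; centring on candidates is a conservative approximation.)"""
    best_centre, best_weight = None, float("-inf")
    for centre in candidates:
        covered = sum(w for c, w in zip(candidates, weights)
                      if math.dist(c, centre) <= radius)
        if covered > best_weight:
            best_centre, best_weight = centre, covered
    return best_centre, best_weight

# Hypothetical candidate OD centres from five detectors and their weights.
cands = [(100, 100), (104, 98), (99, 103), (300, 250), (302, 251)]
wts = [1.2, 0.9, 1.1, 2.0, 0.4]
print(best_disc(cands, wts, radius=25.5))   # radius = d_OD / 2
```

Here the three clustered candidates win (total weight 3.2) even though the heaviest single vote (2.0) sits in the other cluster, which is exactly the conflict-resolution behaviour discussed below.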

In this application, the condition of equal prior probabilities for the classes is fulfilled if we suppose a uniform distribution of the candidates both inside and outside the OD.

In contrast to the non-weighted systems, fewer conflicting situations arise, i.e. cases when the decision is not unique because the same number of outputs fall inside discs of the predetermined radius. A further advantage of this weighted system over majority voting is that no accuracy constraint p > 0.5 is needed on the individual algorithms to achieve larger system accuracy.

It can be shown that this weighted voting rule always outperforms the classic majority rule: in case of a conflict (when the same number of votes is reached in other discs of radius d_OD/2), the majority rule decides randomly between the disc candidates, while the weighted voting system can resolve the conflict by comparing the sums of the weights of the output votes falling inside the discs.

5.5.4 Experimental results

We compare the system accuracies of the classic and the weighted majority voting for different accuracies and weights. We consider three schemes of accuracies for the algorithms:

A1: p_1 = p_2 = … = p_9 = 0.6,

A2: p_i = 1 − 0.1 i, i = 1, …, 9,

A3: p_1 = 0.767, p_2 = 0.647, p_3 = 0.958, p_4 = 0.977, p_5 = 0.759, p_6 = 0.315, p_7 = 0.320, p_8 = 0.228, p_9 = 0.643.

The first case is often examined in the literature with equal weights, the second one describes an artificial scenario with linearly dropping accuracies, while the third case contains the true accuracies of our OD detecting algorithms. The accuracy values p_1, …, p_7 correspond to the seven OD detector algorithms discussed in section 5.4, while p_8 and p_9 to two detectors (from now on denoted by OD_8 and OD_9, respectively) implemented based on [173]. These accuracies were found on the Messidor database described in section 1.5.5.

For the weighted voting system, we apply the following weights w_i for the i-th algorithm having accuracy p_i (i = 1, …, 9):

B1: w_i = p_i,

B2: w_i ∝ log(p_i / (1 − p_i)),

B3: w_i ∝ log(p_i / ((1 − p_i)² r_i (1 − r_i))).

That is, first we study the case when each weight is equal to the accuracy of the individual algorithm (i.e. the i-th algorithm with accuracy p_i participates in the final decision with weight w_i = p_i). The second weighting is the optimal one for the classic weighted majority voting, while the third one is the optimal one for our generalized weighted majority voting. In this way, we give a practical example to confirm the theoretical derivation of the optimal weights given in section 5.5.2.

We apply OD detecting algorithms as classifiers, so we can test and compare the overall performance of the different voting systems on classifier output generated artificially. In the absence of independent OD detecting algorithms providing these accuracies, we are not able to test and compare the voting systems on retinal images. We generate the datasets in the following way: we consider a disc of radius r_ROI (ROI) and a disc of radius r_OD = d_OD/2 inside the ROI (OD), where r_ROI = 716 and r_OD = 51 pixels, respectively (for details on the adjustment of these figures, see section 4.1.2). We generate 9 output points c_i (for the artificial outputs of D_i), where the probability that the point c_i falls inside the OD is p_i, and the distribution of c_i is uniform outside it. Now, the probability r_i (i = 1, …, 9) can be determined from (5.55) as r_i = r_OD² / (r_ROI² − r_OD²) ≈ 0.0051.
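The artificial output generation described above can be sketched as follows; this is a minimal version that assumes the OD sits at the centre of the ROI (the text does not fix its position) and uses the A3 accuracies:

```python
import math
import random

def _uniform_in_annulus(r_min, r_max, rng):
    # Uniform point in the annulus r_min <= r <= r_max (r_min = 0 gives a
    # disc): the radius is the square root of a uniform draw on [r_min², r_max²].
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    a = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(a), r * math.sin(a))

def draw_candidate(p, r_ROI=716.0, r_OD=51.0, rng=random):
    """One artificial detector output: with probability p a uniform point
    inside the OD (assumed here to sit at the ROI centre), otherwise a
    uniform point in the ROI outside the OD."""
    if rng.random() < p:
        return _uniform_in_annulus(0.0, r_OD, rng)   # correct detection
    return _uniform_in_annulus(r_OD, r_ROI, rng)     # wrong candidate

# The nine A3 accuracies from the text, one artificial output each.
rng = random.Random(0)
accs = [0.767, 0.647, 0.958, 0.977, 0.759, 0.315, 0.320, 0.228, 0.643]
outputs = [draw_candidate(p, rng=rng) for p in accs]
print(len(outputs))
```

Repeating this draw many times and feeding each output set to the four decision rules yields accuracy estimates of the kind reported in Tables 5.9–5.11.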

The overall performance of the four voting systems (MV – majority voting, WMV – weighted majority voting, GMV – generalized majority voting, WGMV – weighted generalized majority voting) with the 9 different combinations of the 3 accuracy (A1, A2, A3) and 3 weighting (B1, B2, B3) schemes is presented in Tables 5.9, 5.10 and 5.11, respectively.

Weighting   MV       WMV      GMV      WGMV
B1          0.7323   0.7323   0.9948   0.9996
B2          0.7380   0.7380   0.9941   0.9991
B3          0.7326   0.7326   0.9948   0.9989

Table 5.9: Overall system accuracies for the set of classifier accuracies A1.

Weighting   MV       WMV      GMV      WGMV
B1          0.5012   0.8066   0.9889   0.9943
B2          0.4965   0.9688   0.9901   0.8712
B3          0.5009   0.7289   0.9877   0.9951

Table 5.10: Overall system accuracies for the set of classifier accuracies A2.

Weighting   MV       WMV      GMV      WGMV
B1          0.8241   0.9526   0.9996   1.0000
B2          0.8260   0.9926   0.9989   0.9941
B3          0.8258   0.9481   0.9989   0.9998

Table 5.11: Overall system accuracies for the set of classifier accuracies A3.

If all weights are equal in the weighted voting scheme, then it naturally results in the same system accuracy as the non-weighted scheme; otherwise, the weighted voting outperforms the non-weighted one. Our generalized (non-weighted/weighted) voting system, in which geometric constraints are essential in making the final decision, has a better overall performance than the classic (non-weighted/weighted) majority voting scheme.

For the OD detection application, we can also test and compare our generalized non-weighted and generalized weighted voting systems on a real database of retinal images. The Messidor dataset described in detail in section 1.5.5 is considered for this aim. In this test, we assigned the optimal weights derived in section 5.5.2 to the participating algorithms (classifiers) having individual accuracies p_1 = 0.767, p_2 = 0.647, p_3 = 0.958, p_4 = 0.977, p_5 = 0.759, p_6 = 0.315, p_7 = 0.320, p_8 = 0.228, p_9 = 0.643 (as given in case A3). However, notice that we have no information about the dependencies among these algorithms. Despite the unknown dependencies, we have found that the weighted majority voting (0.98) outperformed the simple majority voting (0.974), while the simple majority voting outperformed the individual accuracies of the member algorithms.

5.6 Diversity measures for majority voting in the spatial