

3.4 Pairwise Comparisons and Political Space

3.4.2 Manifesto Data and the Index of Similarity (SIM)

Measuring the similarity judgements of individuals is rather straightforward: we should just ask them. If we instead try to measure the similarity between two manifestos, either in general or on the basis of the manifesto data set, the options are less obvious. The current work focuses on a way of measuring the difference between manifestos that has been called the “index of similarity” and that echoes many earlier studies of similar problems.

Sigelman and Buell, in a study of issue convergence in U.S. presidential campaigns (Sigelman and Buell 2004), use articles published in newspapers during 11 presidential campaigns to see how much presidential candidates address the same issues. In order to measure what they call convergence between the attention profiles of candidates, that is, the overlap in the proportions of attention devoted to certain issues, they propose a measure of “total block distance between a pair of attention profiles, i.e. the sum of the absolute differences between them” (ibid., p. 653). Such a measure, when scaled to range from 0 to 100, shows how much similarity there is between the issue profiles of a pair of candidates. Even though they refer to it, among other things, as “block distance”, this measure can be interpreted as a measure of similarity (or overlap) between the issue profiles of candidates.

Following Sigelman and Buell, the same measure is also used by Kaplan, Park, and Ridout (2006) in the context of U.S. Senate campaigns and by Dowding et al. (2010) for the study of policy agendas in Australian politics. The same measure is applicable to the complete political profiles of parties as represented by their election manifestos and is thus of particular relevance if we are interested in, or cannot avoid, using manifesto data.

Franzmann, with reference to Duncan and Duncan’s discussion of indexes of segregation and especially what the latter call an index of displacement (Duncan and Duncan 1955, p. 211), has suggested using the same index of similarity (Franzmann 2008; Franzmann 2013) to measure party differences on the basis of manifesto content analysis data. The index of similarity in this case is calculated from the sum of the absolute differences between the coding categories for a party pair. The same method of calculating the difference between parties has also been used separately to characterise party policy differences and change in the case of Estonia (Mölder 2013). The same (Vries 1999) or a similar (van der Brug 1999; van der Brug 2001) step has also been used not as an end measure, but as an intermediate stage before scaling the data down with MDS.

CEU eTD Collection

The manifesto data set gives the breakdown of a manifesto according to the proportion of attention devoted to 56 different policy categories. The index of similarity calculated from it has an intuitive and straightforward interpretation: it can be scaled to range from 0 to 100 and interpreted as the percentage overlap between the political profiles of two parties. Mathematically, this index is equivalent to a measure of city-block distance in a 56-dimensional space and can be represented as follows (see also Franzmann 2008; Sigelman and Buell 2004):

$$S = \frac{200 - \sum_{i=1}^{56} |c_{i1} - c_{i2}|}{2} \tag{3.6}$$

where S denotes the programmatic overlap or similarity between a pair of parties and $c_{i1}$ and $c_{i2}$ refer to the proportions (expressed as percentages) of the manifestos of the two parties that were devoted to each of the 56 policy categories in the manifesto data set. For some purposes it might be more suitable to express the value of the index not as similarity, but as difference. In that case the index would take the following form:

$$D = \frac{\sum_{i=1}^{56} |c_{i1} - c_{i2}|}{2} \tag{3.7}$$

where D denotes the difference between two manifestos on a scale from 0 to 100.
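Equations 3.6 and 3.7 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analyses: the function names and the three-category toy profiles are assumptions made here, and the inputs are assumed to be percentages that sum to 100 for each party.

```python
def manifesto_difference(profile_a, profile_b):
    """Difference D (eq. 3.7): half the summed absolute differences, 0 to 100."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same categories")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / 2


def manifesto_similarity(profile_a, profile_b):
    """Similarity S (eq. 3.6): (200 - summed absolute differences) / 2."""
    return 100 - manifesto_difference(profile_a, profile_b)


# Hypothetical toy profiles over three categories (percentages summing to 100);
# the real data set has 56 categories.
party_1 = [50.0, 30.0, 20.0]
party_2 = [20.0, 30.0, 50.0]

print(manifesto_difference(party_1, party_2))  # 30.0
print(manifesto_similarity(party_1, party_2))  # 70.0
```

In this toy case 30 percentage points of emphasis would have to be moved for the two profiles to coincide, so the two measures always add up to 100.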

The problem of weights

One objection to using all of the issue categories of the manifesto coding scheme is that not all policy areas are equally important. Certainly national security or the economy tend to be more important than culture or minority groups. If different issue categories are indeed of different importance, then one would have to select which issues are important at which moment in time, in which party system and for which party, and assign corresponding weights.

Neither selecting the important issues nor assigning weights to them is feasible, because the necessary information simply does not exist. Fortunately, the situation is not hopeless.

If we assume that the manifesto length each party gets to use for a given election is at least to some extent limited (not an unreasonable assumption), then this problem is not as severe as it might appear. If manifesto length is limited and if different interests within the party compete for that length, then the distribution of relative emphases in the manifesto is likely to be in line with the importance of these issues for the party. If a party spends half of the manifesto on national security and none of it on the welfare state, it is justified to suspect that the former is much more important to the party than the latter and that this relative importance is reflected in the proportion of the manifesto that is devoted to the categories. Therefore, the manifesto data set is most likely already weighted to some extent, by the parties themselves.

The problem of similarity

It was pointed out above (section 2.1.1) that judgements of similarity are a non-linear function of distance in space. Both the left-right measures above, as they have been applied in empirical analyses (see the chapters below), and the index of similarity effectively assume that the distance that is measured is (linearly) equivalent to similarity. Which similarity function would be more appropriate in this context is an empirical question that must be resolved elsewhere. The analyses that follow will therefore also assume a linear mapping between the two.

The problem of distance

The index of similarity is based on city-block distance, even though infinitely many other distance metrics are possible, not just the Euclidean distance. The choice between Euclidean and city-block distance is not an unknown issue in political science: it has been noted that the city-block metric is more appropriate when the dimensions are separable and the Euclidean distance is more appropriate when they are not (Benoit and Laver 2006, p. 27). The same has been noted about the analysis of quality dimensions in conceptual spaces (Gärdenfors 2000, section 1.8). The position taken here is that, across the whole range of issues included in the manifesto data set, it is fair to assume that issues are separable, and thus that the city-block distance is appropriate.
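The practical difference between the two metrics can be seen on a toy example. The sketch below is illustrative only: the four-category profiles are hypothetical and the helper names are assumptions made here. It shows that city-block distance depends only on the total amount of displaced emphasis, whereas Euclidean distance is smaller when the same amount is spread over more categories.

```python
import math


def city_block(p, q):
    """Sum of absolute differences across categories."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the summed squared differences across categories."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical profiles over four categories (percentages summing to 100).
base = [25.0, 25.0, 25.0, 25.0]
concentrated = [45.0, 5.0, 25.0, 25.0]  # 20 points moved between two categories
spread = [35.0, 35.0, 15.0, 15.0]       # the same 20 points, split over all four

print(city_block(base, concentrated), city_block(base, spread))  # 40.0 40.0
print(euclidean(base, concentrated) > euclidean(base, spread))   # True
```

Under the city-block metric each category contributes to the total independently of the others, which is the sense in which the metric treats the dimensions as separable.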

If we keep in mind how the index of similarity is constructed and look back at how all the measures of left-right position outlined above have used the manifesto data to determine party positions, which can then be used to calculate the distances between parties, a sharp contrast should be evident. Constructing a measure of ideological position assumes, to an extent that depends on the index, the adequacy of the dimension. It forces us to select certain categories that should matter, to assume a way to aggregate those categories into a position, and in many cases to make several further assumptions about the nature of the data. Many things are done to the data on the way from raw data to a measure of position and from position to a measure of difference.

The index of similarity, by comparison, assumes almost nothing on its way from data to estimates of difference. It does use the city-block metric of distance instead of other possible options, and a one-to-one correspondence between distance and difference, but that is all. And it uses all the issue categories of the manifesto data set instead of selecting just a few that should matter. This already suggests the benefits of the index. If it comes from the same source, but uses more data and does not try to reduce a 56-dimensional space to a 1-dimensional space before measuring the distance between objects, then chances are that in the end it captures more information than the assumption-heavy and reductionist alternatives. This is, fortunately, an empirical question, which will be dealt with in later parts of this work.
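The kind of information that can be lost in the reduction to one dimension can be illustrated with a toy case. The sketch below rests on loud assumptions: the five-category scheme, the two party profiles, and the right-minus-left scoring rule are all hypothetical and deliberately simplified, and the scoring rule is only a stand-in for a position measure, not one of the left-right measures discussed above.

```python
def difference(profile_a, profile_b):
    """Index-of-similarity difference D: half the summed absolute differences."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / 2


def toy_left_right(profile, left_idx, right_idx):
    """Toy position score: share of 'right' categories minus share of 'left' ones."""
    right = sum(profile[i] for i in right_idx)
    left = sum(profile[i] for i in left_idx)
    return right - left


# Hypothetical five-category coding scheme: indices 0-1 are treated as 'left',
# indices 2-3 as 'right', index 4 as neither. Values are percentages.
LEFT, RIGHT = (0, 1), (2, 3)
party_a = [30.0, 10.0, 20.0, 20.0, 20.0]
party_b = [10.0, 30.0, 40.0, 0.0, 20.0]

pos_a = toy_left_right(party_a, LEFT, RIGHT)
pos_b = toy_left_right(party_b, LEFT, RIGHT)

print(abs(pos_a - pos_b))            # 0.0
print(difference(party_a, party_b))  # 40.0
```

The two hypothetical parties occupy the same point on the toy dimension, yet 40 per cent of their manifesto content does not overlap. Whether real manifesto data behave in this way remains the empirical question raised above.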