
**7.1 Important concepts and notation for frequent pattern mining**

Before delving into the details of frequent pattern mining algorithms, we first define a few concepts and introduce some notation to make the upcoming discussion easier.

To introduce the definitions and concepts, let us consider the example transactional dataset from Table 7.1. This is a rather small dataset of only five transactions, but it can be conveniently used for illustrative purposes; the task of frequent pattern mining naturally becomes more interesting and challenging for transactional datasets of much larger size. Note that in the remainder of the chapter, we will use the concepts (market) basket and transaction interchangeably.

| Basket ID | Items |
|-----------|-------|
| 1 | {milk, bread, salami} |
| 2 | {beer, diapers} |
| 3 | {beer, wurst} |
| 4 | {beer, baby food, diapers} |
| 5 | {diapers, coke, bread} |

Table 7.1: Example transactional dataset.

*7.1.1* Support of item sets

Let us first define the **support** of an item set. Given some item set I, its support is simply the number of transactions Tj from the entire transactional database **T** such that Tj ⊇ I, that is, the market basket with index j contains the item set I. Support is often reported as a number between 0 and 1, quantifying the proportion of the transactional database which contains item set I.

Note that a basket increases the support of an item set only once all elements of the item set are found in that particular basket. Should a single element of an item set be missing from a basket, the basket no longer qualifies to increase the support of that item set. On the other hand, a basket might include an arbitrary number of excess items relative to some item set and still contribute to its overall support.

**Example 7.2.** *Let us calculate the support of the item set {beer} from the example transactional dataset in Table 7.1. The item beer can be found in three transactions (cf. baskets with IDs 2, 3 and 4), hence its support is three. For our particular example – when the transactional database contains 5 transactions – this support can also be expressed as 3/5 = 0.6.*

It is also possible to quantify the support of multi-item item sets. The support of the item set {beer, diapers} is two (cf. baskets with IDs 2 and 4), or 2/5 = 0.4 in the relative sense when normalized by the size of the transactional dataset.
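The definition can be sketched in a few lines of Python as a quick sanity check. The basket data mirrors Table 7.1; the function names `support` and `relative_support` are our own, not part of any library:

```python
# The dataset of Table 7.1 as a list of baskets (Python sets of items).
baskets = [
    {"milk", "bread", "salami"},
    {"beer", "diapers"},
    {"beer", "wurst"},
    {"beer", "baby food", "diapers"},
    {"diapers", "coke", "bread"},
]

def support(itemset, baskets):
    """Absolute support: the number of baskets containing every item of `itemset`."""
    return sum(1 for basket in baskets if basket >= set(itemset))

def relative_support(itemset, baskets):
    """Support as a proportion of the whole transactional database."""
    return support(itemset, baskets) / len(baskets)

print(support({"beer"}, baskets))             # → 3
print(relative_support({"beer"}, baskets))    # → 0.6
print(support({"beer", "diapers"}, baskets))  # → 2
```

Note that the superset test `basket >= set(itemset)` directly encodes the condition Tj ⊇ I from the definition, so a basket with excess items still counts.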

There is an important property of the support of item sets that we will rely on heavily, as it ensures the correctness of the Apriori algorithm, one of the powerful algorithms for tackling the problem of frequent pattern mining. This important property is the **anti-monotonicity** of the support of item sets. In general, anti-monotonicity of some function f : X → **R** means that for any pair of inputs x1, x2 ∈ X from the domain of the function, the property

x1 > x2 ⇒ f(x1) ≤ f(x2)

holds, meaning that the value returned by the function for a larger input is at most as large as any of the outputs returned by the function for any smaller input.

We define a partial ordering over the subsets of items as depicted in Figure 7.1. According to this partial ordering, we say that an item set I is "larger" than item set J whenever the relation I ⊃ J holds between the two item sets. The anti-monotonicity property is then naturally satisfied for the support of item sets, as the support of a superset of some item set is at most as large as the support of the narrower set.

In the example illustrated by Figure 7.1, the item sets {b}, {c} and {b, c} are assumed to be frequent. We indicate the fact that these item sets are considered frequent by marking them with red. Note how the anti-monotone property of support is reflected graphically in Figure 7.1: all the proper subsets of the frequent item sets are frequent as well. If this were not the case, it would be a violation of the anti-monotone property.
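On a dataset as small as Table 7.1, anti-monotonicity can even be verified exhaustively. The following brute-force Python sketch (data and names are ours) checks that every subset has at least the support of every item set containing it:

```python
from itertools import combinations

# The dataset of Table 7.1.
baskets = [
    {"milk", "bread", "salami"},
    {"beer", "diapers"},
    {"beer", "wurst"},
    {"beer", "baby food", "diapers"},
    {"diapers", "coke", "bread"},
]
items = sorted(set().union(*baskets))

def support(itemset):
    return sum(1 for b in baskets if b >= set(itemset))

# For every item set I and every subset J ⊆ I, anti-monotonicity demands
# support(J) >= support(I), since any basket containing I also contains J.
for r in range(1, len(items) + 1):
    for I in combinations(items, r):
        s_I = support(I)
        for k in range(r):
            for J in combinations(I, k):
                assert support(J) >= s_I
print("anti-monotonicity holds for every subset pair")
```

With 8 distinct items this enumerates all 2^8 item sets and their subsets, which is trivial here but of course infeasible for realistic numbers of items, which is exactly why Apriori exploits this property instead of enumerating.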


{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

∅

Figure 7.1: An example Hasse diagram for items a, b and c. Item sets marked by red are frequent.

*7.1.2* Association rules

The next important concept is that of association rules. From a market basket analysis point of view, an **association rule** is a pair of (disjoint) item sets (X, Y) such that the purchase of item set X makes the purchase of item set Y likely. It is denoted X ⇒ Y.

In order to quantify the strength of an association rule, one can calculate its **confidence**, i.e.,

c(X ⇒ Y) = support(X ∪ Y) / support(X),

that is, the number of transactions containing all of the items present in the association rule, divided by the number of transactions that include at least the items on the left hand side of the rule (and potentially, but not necessarily, any other items, including the ones on the right hand side of the association rule). Intuitively, the confidence of an association rule quantifies a conditional probability: it tells us the probability that a basket contains item set Y given that the basket already contains item set X.

**Example 7.3.** *Revisiting the example transactional dataset from Table 7.1, let us calculate the confidence of the association rule {beer} ⇒ {diapers}. In order to do so, we need the support of the item pair {beer, diapers} and that of the single item on the left hand side of the association rule, i.e., {beer}.*

*Recall that these support values are exactly the ones we calculated in Example 7.2, that is*

c({beer} ⇒ {diapers}) = support({beer, diapers}) / support({beer}) = 2/3.

Recall that conditional probabilities are not symmetric, i.e., P(A|B) = P(B|A) need not hold by definition. The same applies to the confidence of item sets, meaning that

c(X ⇒ Y) = c(Y ⇒ X)

**does not** necessarily hold.

As an example to see when the symmetry breaks, calculate the confidences of the association rules

{bread} ⇒ {milk} and {milk} ⇒ {bread}.
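The exercise can be worked out directly from the definition of confidence; a short Python sketch over the Table 7.1 data (the helper names are our own):

```python
# The dataset of Table 7.1.
baskets = [
    {"milk", "bread", "salami"},
    {"beer", "diapers"},
    {"beer", "wurst"},
    {"beer", "baby food", "diapers"},
    {"diapers", "coke", "bread"},
]

def support(itemset):
    return sum(1 for b in baskets if b >= set(itemset))

def confidence(X, Y):
    # c(X => Y) = support(X ∪ Y) / support(X)
    return support(set(X) | set(Y)) / support(X)

print(confidence({"bread"}, {"milk"}))  # → 0.5 (bread is in baskets 1 and 5, milk only in 1)
print(confidence({"milk"}, {"bread"}))  # → 1.0 (every basket containing milk also contains bread)
```

The two directions indeed differ, illustrating that confidence, like conditional probability, is not symmetric.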

Also note that association rules can have multiple items on either of their sides, meaning that association rules of the form

{beer} ⇒ {diapers, baby food}

are perfectly valid.

*7.1.3* The interestingness of an association rule

One could potentially think that association rules with high confidence are always useful. This is not necessarily the case, however.

Just imagine the simple case when there is some product A which gets purchased by every customer. Since this product can be found in every market basket, no matter what product B we choose, the confidence c(B ⇒ A) would inevitably be 1.0 for any product B.

In order to better assess the usefulness of an association rule, we need to devise some notion of true interestingness for association rules. There exists a variety of such interestingness measures. Going through all of them and detailing their properties is beyond our scope; here we simply mention a few of the possible ways to quantify the interestingness of an association rule.

A simple way to measure how interesting an association rule A ⇒ B is, is to calculate the so-called **lift** of the association rule by the formula

lift(A ⇒ B) = c(A ⇒ B) / s(B),

with c(A ⇒ B) and s(B) denoting the confidence of the association rule and the relative support of item set B, respectively. Taking into consideration that the confidence of an association rule can be regarded as the conditional probability of purchasing item set B given that item set A has been purchased, and that the relative support of an item set is nothing but the probability of purchasing that item set, it is easy to see that the lift of an association rule can be rewritten as

lift(A ⇒ B) = P(A, B) / (P(A) P(B)),


with P(A, B) indicating the joint probability of buying both item sets A and B simultaneously, and P(A) and P(B) referring to the marginal probabilities of purchasing item sets A and B, respectively. What this means in the end is that the lift of a rule investigates to what extent the purchase of item set A is independent from that of item set B. A lift value of 1 means that item sets A and B are purchased independently of each other. Larger lift values mean a stronger connection between item sets A and B.

We get a further notion of interestingness for an association rule if we calculate

i(A ⇒ B) = c(A ⇒ B) − s(B),

where c(A ⇒ B) denotes the confidence of the association rule and s(B) marks the relative support of the item set on the right hand side of the association rule. Unlike lift, this quantity can take a negative value, namely once the condition s(B) > c(A ⇒ B) holds. This happens when item set B is less frequently present among the baskets that contain item set A than among baskets in general (irrespective of item set A). A value of zero means that we see item set B just as frequently in baskets that also contain item set A as in baskets in general. A positive value, on the other hand, means that the presence of item set A in a basket makes the presence of item set B more likely compared to the case when we do not know whether A is also present in the basket.
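Both measures can be evaluated on the running example. A Python sketch over the Table 7.1 data, applied to the rule {beer} ⇒ {diapers} from Example 7.3 (function names are our own):

```python
# Interestingness measures on the Table 7.1 data.
baskets = [
    {"milk", "bread", "salami"},
    {"beer", "diapers"},
    {"beer", "wurst"},
    {"beer", "baby food", "diapers"},
    {"diapers", "coke", "bread"},
]

def support(itemset):
    return sum(1 for b in baskets if b >= set(itemset))

def relative_support(itemset):
    return support(itemset) / len(baskets)

def confidence(X, Y):
    # c(X => Y) = support(X ∪ Y) / support(X)
    return support(set(X) | set(Y)) / support(X)

def lift(X, Y):
    # lift(X => Y) = c(X => Y) / s(Y)
    return confidence(X, Y) / relative_support(Y)

def interest(X, Y):
    # i(X => Y) = c(X => Y) - s(Y)
    return confidence(X, Y) - relative_support(Y)

# {beer} => {diapers}: confidence 2/3, while s({diapers}) = 3/5 = 0.6
print(lift({"beer"}, {"diapers"}))      # → 1.111..., slightly above 1
print(interest({"beer"}, {"diapers"}))  # → 0.0666..., slightly positive
```

Both values point the same way: buying beer makes diapers slightly more likely than their base rate, but the association in this toy dataset is weak.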

*7.1.4* The cardinality of potential association rules

In order to illustrate the difficulty of the problem we are trying to solve from a combinatorial point of view, let us quantify the number of possible association rules that one can construct out of d products. Intuitively, in an association rule every item can be either absent, present on the left hand side, or present on the right hand side of the rule. This means that for every item there are three possibilities in which it can be involved in an association rule, meaning that there are exponentially many, i.e., O(3^d), potential association rules that can be assembled from d distinct items.

Note, however, that the quantity 3^d is an overestimate of the true number of *valid* association rules, in which we expect both sides to be disjoint and non-empty. By discounting for the invalid association rules and utilizing the equation

(1 + x)^d = ∑_{j=1}^{d} \binom{d}{j} x^{d−j} + x^d,

we get the exact number of valid association rules to be

3^d − 2^{d+1} + 1.

Although our earlier answer of 3^d was not correct in the strict sense – because it also counted the invalid association rules – it was correct in the asymptotic sense.

Based on the above exact result, we can formulate as many as 18,660 possible association rules even when there are just d = 9 single items to form association rules from. This quick exponential growth in the number of potential association rules is illustrated in Figure 7.2, making it apparent that without efficient algorithms, finding association rules would not be practically feasible. Since association rules can be created by partitioning frequent item sets, it is of utmost importance that we can find frequent item sets efficiently in the first place.

Figure 7.2: Illustration of the number of distinct potential association rules as a function of the number of different items/features in our dataset (d).
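The closed form 3^d − 2^{d+1} + 1 for the number of valid rules (which indeed gives 18,660 for d = 9) can be cross-checked by direct enumeration; a short Python sketch with names of our own choosing:

```python
from itertools import combinations

def n_association_rules(d):
    """Exact number of valid rules X => Y (X, Y disjoint and non-empty) over d items."""
    return 3 ** d - 2 ** (d + 1) + 1

def n_rules_brute_force(d):
    """The same count obtained by enumerating every possible left hand side."""
    count = 0
    for r in range(1, d + 1):
        for X in combinations(range(d), r):
            # any non-empty subset of the remaining d - r items may serve as Y
            count += 2 ** (d - r) - 1
    return count

# The two agree for every small d we can afford to enumerate.
assert all(n_association_rules(d) == n_rules_brute_force(d) for d in range(1, 8))
print(n_association_rules(9))  # → 18660, the figure quoted in the text
```

The enumeration also makes the exponential growth tangible: adding a single item roughly triples the number of candidate rules.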

*7.1.5* Special subtypes of frequent item sets

Before delving into the details of actual algorithms that efficiently determine frequent item sets, we first define a few important special subtypes of item sets. These special classes of item sets are beneficial because they allow for a compressed storage of the frequent item sets that can be found in some transactional dataset.

• A **maximal frequent item set** is a frequent item set none of whose supersets are frequent. To put it formally, an item set I is maximal frequent if the property

{I | I frequent ∧ ∄ frequent J ⊃ I}

holds for it.

• A **closed item set** is an item set such that none of its supersets has a support equal to its own, or to put it differently, all of its supersets have a strictly smaller support. Formally, the property

{I | ∄ J ⊃ I : s(J) = s(I)}

has to hold for an item set I to be closed.

• A **closed frequent item set** is simply an item set which is closed in the above sense and whose support exceeds some previously defined frequency threshold *τ*.

It can easily be seen that maximal frequent item sets always need to be closed as well. This statement can be verified by contradiction: if we suppose that there exists some item set I which is maximal frequent but not closed, we arrive at a contradiction. Indeed, if item set I is not closed, then there is at least one superset J ⊃ I with the exact same support, i.e., s(I) = s(J). Now, since I is a frequent item set by our initial assumption, so is J, because it has the same support as I.

This, however, contradicts the assumption of I being a maximal frequent item set, since no superset of a maximal frequent item set may be frequent. Hence maximality of an item set implies its closedness as well. Figure 7.3 summarizes the relation of the different subtypes of frequent item sets.

As depicted by Figure 7.3, the maximal frequent item sets form just a privileged subset of the frequent item sets, namely those special ones that are located at the **frequent item set border**, also referred to as the **positive border**. Item sets located on the positive border are extremely important, since storing only these item sets is sufficient to implicitly store all the frequent item sets. This is again assured by the anti-monotone property of the support of item sets: all proper subsets of the item sets on the positive border need to be frequent as well, because they have at least the same or an even higher support than the item sets present on the positive border.

Additionally, by the definition of maximal frequent item sets, it is also the case that none of their supersets is frequent, hence the maximal frequent item sets are indeed sufficient to store in order to implicitly store all the frequent item sets. Storing maximal frequent item sets, however, would not allow us to reconstruct the supports of all the frequent item sets. If it is also important for us to be able to tell the exact support of every frequent item set, then we need to store all the closed frequent item sets instead.

Frequent item sets ⊇ Closed frequent item sets ⊇ Maximal frequent item sets

Figure 7.3: The relation of frequent item sets to closed and maximal frequent item sets.

Note, however, that the collection of closed frequent item sets is still narrower than that of the frequent item sets (cf. Figure 7.3). Hence, storing the closed frequent item sets alone also implicitly stores all the frequent item sets in a compressed form, together with their supports.
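These categories can be computed by brute force on a small dataset. Since Table 7.2 is not reproduced here, the sketch below uses the Table 7.1 data with a minimum support of 2 (our own choice, so that both categories are non-trivial); all names are ours:

```python
from itertools import combinations

# The dataset of Table 7.1.
baskets = [
    {"milk", "bread", "salami"},
    {"beer", "diapers"},
    {"beer", "wurst"},
    {"beer", "baby food", "diapers"},
    {"diapers", "coke", "bread"},
]
items = sorted(set().union(*baskets))

def support(itemset):
    return sum(1 for b in baskets if b >= set(itemset))

min_support = 2
frequent = [set(I) for r in range(1, len(items) + 1)
            for I in combinations(items, r) if support(I) >= min_support]

# Maximal frequent: no frequent proper superset exists.
maximal = [I for I in frequent if not any(J > I for J in frequent)]

# Closed: no superset with equal support. By anti-monotonicity it suffices
# to check the one-item extensions of I.
def is_closed(I):
    return all(support(I | {x}) < support(I) for x in set(items) - I)

closed_frequent = [I for I in frequent if is_closed(I)]

print(maximal)          # the maximal frequent item sets: {bread} and {beer, diapers}
print(closed_frequent)  # {beer}, {bread}, {diapers} and {beer, diapers}
```

Note that here the closed frequent item sets strictly contain the maximal ones: {beer} and {diapers} are closed (each has support 3, all supersets have support at most 2) but not maximal, since their superset {beer, diapers} is itself frequent.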

**Example 7.4.** *Table 7.2 contains a small transactional dataset alongside the categorization of the different item sets. In this example, we take the minimum support for regarding an item set as frequent to be 3. We can see, as mentioned earlier, that whenever an item set qualifies as maximal frequent, it always holds that the given item set is closed as well.*

Additionally, we can see that by taking all the proper subsets of the maximal frequent item sets identified in the transactional dataset, we can generate all further frequent item sets: from the proper subsets of the item sets {A, B} and {B, C}, it follows that the individual singleton item sets – {A}, {B} and {C} – are also above our frequency threshold of 3.