
The online auditing problem is defined as follows. We are given $t-1$ queries $q_1, \dots, q_{t-1}$ of the form $(Q_i, \mathrm{AVG})$ over the stored data set $X = \{x_1, \dots, x_n\}$, together with the corresponding answers $a_1, \dots, a_{t-1}$. The value of each $x_i$ is assumed to be a real number in a finite interval $[\alpha, \beta]$, $\beta > \alpha$. When a new query $q_t$ is posed, the online auditor must decide in real time whether to answer or deny it. More specifically, my goal is to propose an auditor that detects whether answering with the true answer $a_t$ causes full disclosure of MAX. First, I discuss the construction of a simulatable auditor for this problem and show its limitation in this case. Thereafter, I propose another method that gets around this limitation.

Note that in Algorithm 1, which is based on the concept of the simulatable auditor shown in Fig. 4.1, we ignore the true answer $a_t$ and examine every data set $X'$ consistent with the past queries and answers, checking whether it causes full disclosure of MAX. This means that the answer $a'_t$, computed from $X'$ and $Q_t$, is included in the analysis. The auditor is simulatable because it never looks at the true answer when making a decision. The main drawback of using a simulatable auditor for this problem, however, is its poor utility. To see this, consider any AVG query $q$ that specifies a subset $\{x_{i_1}, x_{i_2}, \dots, x_{i_k}\}$ of $X$ as the query set. There always exists a data set $X'$ for which this query is not safe to answer, namely the one where $x_{i_1} = x_{i_2} = \dots = x_{i_k} = \beta$. In this case the true response would be $\beta$, and since no value can exceed $\beta$, the querier can infer that all values in the query set equal $\beta$, so MAX $= \beta$ is fully disclosed. Since the simulatable auditor must deny whenever some consistent data set makes the answer unsafe, it denies every query.

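Algorithm 1: Simulatable online auditor Auditor_maxavg
Inputs: $q_1, \dots, q_t$, $a_1, \dots, a_{t-1}$, $\alpha$, $\beta$

  foreach data set $X'$ consistent with $(q_1, a_1), \dots, (q_{t-1}, a_{t-1})$ do
    compute the AVG answer $a'_t$ based on $Q_t$ and $X'$;
    let $L_t$ be the feasible set formed by the $t$ queries/answers $(q_1, a_1), \dots, (q_{t-1}, a_{t-1}), (q_t, a'_t)$;
    if $L_t$ yields an exact maximum then output DENY; return; endif
  endfor
  output $a_t$;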
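The following Python sketch illustrates Algorithm 1 and the utility problem described above. It is a hypothetical brute-force illustration, not the author's implementation. It assumes that data values are integers in $[\alpha, \beta]$ so that the consistent data sets can be enumerated, represents a query as a tuple of 0-based indices, and reads "$L_t$ yields an exact maximum" as "every consistent data set has the same maximum". The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def consistent_datasets(n, queries, answers, alpha, beta):
    """Yield every integer data set in [alpha, beta]^n matching all (query, AVG answer) pairs.

    Assumption: the real interval [alpha, beta] is discretized to its integers so the
    feasible set can be enumerated exhaustively (only sensible for tiny n).
    """
    for xs in product(range(alpha, beta + 1), repeat=n):
        if all(Fraction(sum(xs[i] for i in q), len(q)) == a
               for q, a in zip(queries, answers)):
            yield xs


def max_fully_disclosed(n, queries, answers, alpha, beta):
    """True if all consistent data sets share one maximum, i.e. MAX is fully disclosed."""
    maxima = {max(xs) for xs in consistent_datasets(n, queries, answers, alpha, beta)}
    return len(maxima) == 1


def simulatable_auditor(n, past_queries, past_answers, q_t, alpha, beta):
    """Sketch of Algorithm 1: the decision never uses the true answer a_t."""
    for xs in consistent_datasets(n, past_queries, past_answers, alpha, beta):
        # Answer a'_t that q_t would receive if xs were the stored data set.
        a_sim = Fraction(sum(xs[i] for i in q_t), len(q_t))
        if max_fully_disclosed(n, past_queries + [q_t], past_answers + [a_sim], alpha, beta):
            return "DENY"
    return "ANSWER"  # the caller would now release the true a_t


# Every non-empty query set over three variables in [0, 5] is denied, because the
# consistent data set with all queried values equal to beta makes MAX = beta visible.
all_queries = [q for k in (1, 2, 3) for q in combinations(range(3), k)]
decisions = {q: simulatable_auditor(3, [], [], q, 0, 5) for q in all_queries}
print(decisions)
```

The enumeration is exponential in $n$ and serves only to make the argument above tangible on a toy domain.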

Algorithm 2/a: Online auditor Auditor_maxavg
Inputs: $q_1, \dots, q_t$, $a_1, \dots, a_t$, $d_{tr}$, $\alpha$, $\beta$

  let $\bar{L}_t$ be the feasible set formed by the $t$ queries/answers;
  let $x^{opt}_t$ be the maximum returned by solving P with $\bar{L}_t$;
  if $|x^{opt}_t - \mathrm{MAX}| > d_{tr}$ AND $(\mathrm{MAX} - \max_t) > d_{tr}$ then
    output $a_t$;
  else   (i.e., $|x^{opt}_t - \mathrm{MAX}| \le d_{tr}$ OR $(\mathrm{MAX} - \max_t) \le d_{tr}$)
    output DENY;
  endif

To achieve better utility, I therefore propose two methods (Algorithms 2/a and 2/b) that are not simulatable, but I show that they still ensure the privacy of the maximum value in the full disclosure model. I start with Algorithm 2/a. Let $|x^{opt} - \mathrm{MAX}|$ denote the absolute distance between $x^{opt}$ and MAX, and let $\max_t$ be the maximum of the first $t$ answers. Let $\bar{L}$ be a feasible set similar to $L$, except that the constraint $\alpha \le x_i \le \beta$ is imposed only on those $x_i$ that occur in the first $t$ queries, rather than on all $n$ variables. Namely, in $\bar{L}$ the second line of $L$ is changed to $\alpha \le x_i \le \beta$ for all $i$ such that $x_i$ occurs in the first $t$ queries:

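$$\bar{L} = \begin{cases} \bar{A}\bar{x} = \bar{b}, & \text{where } \bar{x} \text{ is the vector } (x_1, \dots, x_n)^T, \\ \alpha \le x_i \le \beta, & \forall x_i \text{ that occur in the first } t \text{ queries.} \end{cases}$$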

Note that I use $\bar{L}$ instead of $L$ in my online auditor because this way the auditor leaks less information to the attacker, whether it answers or denies. To illustrate this, consider the example in which the data set is $\{x_1, x_2, x_3, x_4\}$ and $x_i \in [0, 5]$ for all $i$. Assume also that the first query $q_1$ is $\frac{x_1 + x_2}{2}$ and its answer is 4. The feasible set $L$ induced by this information is:

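$$L = \begin{cases} \bar{A}\bar{x} = \bar{b}, & \text{where } \bar{x} \text{ is the vector } (x_1, x_2, x_3, x_4)^T, \\ 0 \le x_i \le 5, & \forall x_i \in \{x_1, x_2, x_3, x_4\}, \end{cases}$$

where

$$\bar{A} = \begin{pmatrix} 1 & 1 & 0 & 0 \end{pmatrix}, \qquad \bar{b} = 8.$$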

However, in this situation the attacker knows that the value of $x^{opt}$ returned by P is always 5, because all possible values of the four variables are taken into account when estimating the maximum (in particular, $x_3$ and $x_4$, which occur in no query, can always take the value 5). In contrast, with $\bar{L}$ in the same example, the value of $x^{opt}$ returned by P is the maximum over only the variables $x_1, x_2$ that occur in $q_1$. Hence $x^{opt}$ depends on the true answer of $q_1$, and the attacker cannot know the exact value of $x^{opt}$ without obtaining the true answer.
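To make the comparison concrete, the sketch below computes $x^{opt}$ for the example above under both feasible sets. It assumes, hypothetically, that P returns the largest value that any bounded variable can take in the feasible set, and it approximates the reals in $[0, 5]$ by integers so that feasible points can be enumerated. The answer 1 shown next to the true answer 4 is a hypothetical alternative answer, and the name `x_opt` is illustrative.

```python
from fractions import Fraction
from itertools import product


def x_opt(n, queries, answers, alpha, beta, only_queried):
    """Brute-force stand-in for P: the largest value a bounded variable can take.

    only_queried=False models L (bounds on all n variables);
    only_queried=True models L-bar (bounds only on variables occurring in the queries,
    so the remaining, unconstrained variables are left out of the estimate).
    Assumption: values are restricted to the integers in [alpha, beta].
    """
    if only_queried:
        variables = sorted({i for q in queries for i in q})
    else:
        variables = list(range(n))
    best = None
    for values in product(range(alpha, beta + 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if all(Fraction(sum(x[i] for i in q), len(q)) == a
               for q, a in zip(queries, answers)):
            best = max(values) if best is None else max(best, max(values))
    return best


q1 = (0, 1)  # AVG(x1, x2) with 0-based indices
for a1 in (4, 1):  # the true answer 4 and a hypothetical answer 1
    print(a1,
          x_opt(4, [q1], [a1], 0, 5, only_queried=False),
          x_opt(4, [q1], [a1], 0, 5, only_queried=True))
```

Under $L$ the estimate is 5 whatever the answer, so the attacker can compute it in advance; under $\bar{L}$ it is 5 for the answer 4 but 2 for the answer 1, i.e. it depends on the answer, as described above.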

The online auditor based on Algorithm 2/a works as follows. Recall that $\bar{L}$ is defined over $t$ queries and answers. Whenever a new query $q_t$ is posed, the auditor computes the true answer $a_t$ and then solves the problem P with $\bar{L}$, obtaining $x^{opt}$. If, for a given threshold value $d_{tr}$, both $|x^{opt} - \mathrm{MAX}| > d_{tr}$ and $(\mathrm{MAX} - \max_t) > d_{tr}$ hold, the true answer $a_t$ is provided. Otherwise, that is, if $|x^{opt} - \mathrm{MAX}| \le d_{tr}$ or $(\mathrm{MAX} - \max_t) \le d_{tr}$, the auditor denies.
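A minimal sketch of the Algorithm 2/a decision rule follows. It is an illustration under stated assumptions, not the author's implementation: P is approximated by enumerating the integer points of $\bar{L}_t$ and taking the largest value of a queried variable, queries are tuples of 0-based indices, denied queries are excluded from the history, and the function names, data set and threshold in the usage example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def avg(data, q):
    """True AVG answer of the query set q (tuple of 0-based indices)."""
    return Fraction(sum(data[i] for i in q), len(q))


def x_opt_bar(queries, answers, alpha, beta):
    """Stand-in for solving P with L-bar_t: largest feasible value of any queried variable.

    Assumption: values are restricted to the integers in [alpha, beta] and enumerated.
    """
    variables = sorted({i for q in queries for i in q})
    targets = [a * len(q) for q, a in zip(queries, answers)]  # required sums
    best = None
    for values in product(range(alpha, beta + 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if all(sum(x[i] for i in q) == s for q, s in zip(queries, targets)):
            best = max(values) if best is None else max(best, max(values))
    return best


def auditor_2a(data, answered, q_t, d_tr, alpha, beta):
    """Sketch of Algorithm 2/a: return the true answer a_t or "DENY".

    `answered` holds the earlier queries that were answered (denied ones are excluded).
    """
    queries = list(answered) + [q_t]
    answers = [avg(data, q) for q in queries]  # true answers a_1..a_t
    MAX, max_t = max(data), max(answers)
    x_opt = x_opt_bar(queries, answers, alpha, beta)
    if abs(x_opt - MAX) > d_tr and (MAX - max_t) > d_tr:
        return answers[-1]
    return "DENY"  # |x_opt - MAX| <= d_tr or (MAX - max_t) <= d_tr


# Hypothetical data set in [0, 10] with MAX = 9 and threshold d_tr = 2.
data = (1, 2, 3, 9)
print(auditor_2a(data, [], (0, 1), 2, 0, 10))        # x_opt = 3, far from MAX
print(auditor_2a(data, [(0, 1)], (2, 3), 2, 0, 10))  # x_opt = 10, within d_tr of MAX
```

The second query is denied even though its own answer (6) is not close to MAX: $\bar{L}_t$ admits $x_4 = 10$, which lies within $d_{tr}$ of MAX $= 9$.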

In the following, I discuss Algorithm 2/b, which provides better utility than Algorithm 2/a. Note that directly using algorithm $A_{sum}$ or the offline auditor proposed in Section 4.5 to construct an online auditor does not work, because a denial could lead to full disclosure of MAX. For instance, according to the offline auditor, if the feasible set $L$ is such that some variable $x_i$ can be uniquely determined to equal the upper bound, the auditor should deny. However, upon receiving a denial, the attacker knows that this particular variable equals the upper bound, since otherwise the auditor would have answered.
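The leak described above can be reproduced with a small brute-force experiment. The sketch below implements a hypothetical naive online rule in the spirit of the offline check (deny if, after adding the true answer, some variable is uniquely determined to equal $\beta$), restricting values to integers so the feasible set can be enumerated. It lists the data sets on which the single query AVG($x_1$, $x_2$) is denied and shows that all of them have MAX $= \beta$. The domain and function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ALPHA, BETA, N = 0, 5, 3  # hypothetical domain: three integer values in [0, 5]


def consistent(queries, answers):
    """All integer data sets in [ALPHA, BETA]^N matching the (query, AVG answer) pairs."""
    return [xs for xs in product(range(ALPHA, BETA + 1), repeat=N)
            if all(Fraction(sum(xs[i] for i in q), len(q)) == a
                   for q, a in zip(queries, answers))]


def naive_auditor(data, q_t):
    """Hypothetical naive rule: deny iff answering pins some x_i to the upper bound."""
    a_t = Fraction(sum(data[i] for i in q_t), len(q_t))
    feasible = consistent([q_t], [a_t])
    pinned = any(all(xs[i] == BETA for xs in feasible) for i in range(N))
    return "DENY" if pinned else a_t


q = (0, 1)  # AVG(x1, x2)
denied = [d for d in product(range(ALPHA, BETA + 1), repeat=N)
          if naive_auditor(d, q) == "DENY"]
# An attacker who sees DENY learns that x1 = x2 = BETA, hence MAX = BETA.
print(sorted({max(d) for d in denied}), sorted({(d[0], d[1]) for d in denied}))
```

Because the denial happens exactly on the data sets where $x_1 = x_2 = \beta$, the denial itself discloses MAX, which motivates the threshold-based modification that follows.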

Nevertheless, with a minor modification, I can still incorporate the concept of the offline auditor from Section 4.5. Namely, the auditor denies not when some variable $x_i$ can be uniquely inferred to equal the upper bound, but when a variable is deduced to be within the tolerance threshold $d_{tr}$ of the stored maximum. The proposed online auditor based on the concept of the offline auditor is as follows:

Algorithm 2/b: Online auditor Auditor_maxavg
Inputs: $q_1, \dots, q_t$, $a_1, \dots, a_t$, $d_{tr}$, $\alpha$, $\beta$

  let $L_t$ be the feasible set formed by the $t$ queries/answers;
  if the linear equation system of $L_t$ has a unique solution then
    output DENY; return;
  else if there is an $x_i$ that can be uniquely determined then
    if $(\mathrm{MAX} - x_i) > d_{tr}$ AND $(\mathrm{MAX} - \max_t) > d_{tr}$ then
      output $a_t$; return;
    else   (i.e., $(\mathrm{MAX} - x_i) \le d_{tr}$ OR $(\mathrm{MAX} - \max_t) \le d_{tr}$)
      output DENY; return;
    endif
  else
    output $a_t$;
  endif
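The sketch below shows one possible way to implement the checks of Algorithm 2/b with exact rational arithmetic. It is an assumption-laden illustration rather than the author's code. "Uniquely determined" is interpreted as "determined by the equations $\bar{A}\bar{x} = \bar{b}$ alone", which holds exactly when the unit vector $e_i$ lies in the row space of $\bar{A}$, tested here through the reduced row echelon form. Determinations induced by the bounds are not covered, and when several variables are determined, all of them must pass the threshold test. The usage example takes the data set $x_1 = 3$, $x_2 = 4$, $x_3 = 1$ with $d_{tr} = 2$ from the utility discussion. The function names are hypothetical.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form of an augmented matrix [A | b]; returns (matrix, pivot columns)."""
    m = [list(r) for r in rows]
    pivots, r = [], 0
    for c in range(ncols):  # column index ncols holds the right-hand side
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def determined_values(m, pivots, n):
    """x_c is fixed by the equations iff its pivot row has no other non-zero coefficient."""
    return {c: m[k][n] for k, c in enumerate(pivots)
            if all(m[k][j] == 0 for j in range(n) if j != c)}


def auditor_2b(data, answered, q_t, d_tr):
    """Sketch of Algorithm 2/b: return the true answer a_t or "DENY"."""
    n = len(data)
    queries = list(answered) + [q_t]
    answers = [Fraction(sum(data[i] for i in q), len(q)) for q in queries]
    MAX, max_t = max(data), max(answers)
    # One row per query: the sum of the queried x_i equals |Q| * answer.
    rows = [[Fraction(int(i in q)) for i in range(n)] + [a * len(q)]
            for q, a in zip(queries, answers)]
    m, pivots = rref(rows, n)
    if len(pivots) == n:  # the whole data set is uniquely determined
        return "DENY"
    fixed = determined_values(m, pivots, n)
    if fixed:
        if all(MAX - x > d_tr for x in fixed.values()) and MAX - max_t > d_tr:
            return answers[-1]
        return "DENY"
    return answers[-1]


data = (3, 4, 1)  # x1 = 3, x2 = 4, x3 = 1, so MAX = 4
print(auditor_2b(data, [], (2,), 2))  # AVG(x3) fixes x3 = 1, far from MAX
print(auditor_2b(data, [], (1,), 2))  # AVG(x2) fixes x2 = 4 = MAX
```

With $d_{tr} = 2$ the query AVG($x_3$) is answered, while AVG($x_2$) is denied because it would pin a value to MAX itself.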

This online auditor prevents MAX from being fully disclosed, because in case of a denial the attacker still cannot learn the exact value of MAX; at most, it learns that some value derived from the queries lies within the tolerance threshold $d_{tr}$ of MAX. This version of the online auditor provides better utility than Algorithm 2/a, because its decision does not rely on an uncertain estimate of MAX. The auditor based on Algorithm 2/a can deny a large number of queries when the stored maximum MAX is within the tolerance threshold $d_{tr}$ of the upper bound $\beta$. For instance, consider the example data set $x_1 = 3$, $x_2 = 4$, $x_3 = 1$, with bounds $\alpha = 0$ and $\beta = 5$. If the threshold $d_{tr}$ is 2, the auditor based on Algorithm 2/a will deny the query $q_1 = \mathrm{AVG}(x_3)$, because the estimated maximum based on the feasible set formed by the first query is the upper bound 5, which is within the threshold of the stored maximum 4. This scenario cannot happen in Algorithm 2/b.

Lemma 6. Assuming that $d_{tr} > 0$, the online auditors implemented by Algorithms 2/a and 2/b provide privacy of MAX in the full disclosure model.

Proof. (Sketch taken from Section VI.B of my report [Th10, 2012].) Let $f_{att}(d_{tr}, q_1, \dots, q_t, a_1, \dots, a_{t-1}, \alpha, \beta)$ represent the attacker's function, which takes the input parameters and returns a deny or an answer as output. I prove that my online auditors do not leak information about MAX in the full disclosure model by showing that the number of data sets and parameter sets for which $f_{att}$ returns a deny or an answer is always larger than 1. In other words, in every possible scenario, the number of possible maximum values for the attacker is always greater than 1; hence, the value of MAX cannot be uniquely determined. I apply mathematical induction in each case to show this.

The complexity and utility of the online auditors: The worst-case complexity of the online auditor depends on the worst-case complexity of P and on the number of posed queries. We can assume that the number of queries is $O(n)$, where $n$ is the size of the data set. In this case, by solving P with a polynomial-time linear programming method (for instance, an interior-point method), the overall complexity remains polynomial.
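As a rough, hypothetical accounting (the cost model is an assumption, not part of the analysis above), suppose that P over a feasible set with $n$ variables and $t$ query constraints is evaluated by at most $n$ linear programs, one maximization per variable, each taking time $T_{LP}(n, t)$, which is polynomial for interior-point methods and nondecreasing in $t$. With at most $cn$ queries for some constant $c$, the total auditing cost is then bounded by

$$\sum_{t=1}^{cn} n \, T_{LP}(n, t) \;\le\; c\, n^2 \, T_{LP}(n, cn),$$

which is polynomial in $n$.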

The utility of the auditor can be measured by the number of denials, which is controlled by the threshold value $d_{tr}$. Broadly speaking, if $d_{tr}$ is large, the expected number of denials is greater, while if $d_{tr}$ is small, the degree of privacy decreases, because the estimated maximum may be very close to the real maximum (MAX). A more specific choice of $d_{tr}$ that achieves a good trade-off between utility and privacy in specific application scenarios is an interesting open question, which I leave for future work.
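To see how $d_{tr}$ drives utility, the sketch below counts the denials of the Algorithm 2/a rule over one hypothetical query sequence for several thresholds. The data set, the query sequence and the thresholds are invented for illustration. P is approximated by enumerating the integer points of $\bar{L}_t$ and taking the largest value of a queried variable, and a denied query is not added to the history; these choices, like the function names, are assumptions.

```python
from fractions import Fraction
from itertools import product


def x_opt_bar(queries, sums, alpha, beta):
    """Largest feasible value of a queried variable (integer enumeration stand-in for P)."""
    variables = sorted({i for q in queries for i in q})
    best = None
    for values in product(range(alpha, beta + 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if all(sum(x[i] for i in q) == s for q, s in zip(queries, sums)):
            best = max(values) if best is None else max(best, max(values))
    return best


def count_denials(data, workload, d_tr, alpha, beta):
    """Run the Algorithm 2/a rule online; denied queries are left out of later feasible sets."""
    MAX = max(data)
    answered, sums, answers = [], [], []
    denials = 0
    for q in workload:
        s = sum(data[i] for i in q)  # true sum, i.e. |Q| * a_t
        a_t = Fraction(s, len(q))
        x_opt = x_opt_bar(answered + [q], sums + [s], alpha, beta)
        max_t = max(answers + [a_t])
        if abs(x_opt - MAX) > d_tr and (MAX - max_t) > d_tr:
            answered.append(q)
            sums.append(s)
            answers.append(a_t)
        else:
            denials += 1
    return denials


data = (1, 2, 3, 9)                              # hypothetical, values in [0, 10]
workload = [(0,), (0, 1), (1, 2), (2, 3), (3,)]  # hypothetical query sequence
print([count_denials(data, workload, d, 0, 10) for d in (1, 3, 6, 7)])
```

For this particular workload the number of denials grows from 2 to 4 as $d_{tr}$ increases from 1 to 7, which matches the qualitative trade-off described above.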

4.7 Simulatable Auditor_maxavg in the probabilistic disclosure