## Learning to predict classes not seen during training (Machine Learning)

In conventional machine learning we train an algorithm on data/classes pairs (supervised learning) or only on raw data( Unsupervised learning) and make predi...

Having gone through the importance of differential privacy and its definition , this article motivates the theory with a practical example to make it more intuitive how the definition is applied to attain privacy. The small town example in this previous article is just to develop an intuitive understanding of why DP would be important. In practice , for DP to be effective you would need a really large database such that the **law of large numbers** work. The law of large numbers is a law from statistics which proves that even in presence of some noisy measurements, the result of a statistic would converge to the true value(or very close to).

Let us consider the query release problem where a analyst is querying a dataset with the two queries resulting in database without a given row (D2) and database with the given row(D1) as defined in previous tutorial.

The previous tutorial defined DP as two probability distributions being close to each other by a factor of exp(epsilon). We put this definition into practice by adding noise to result of every query. The noise is sampled from a probability distribution that is calibrated according to the definition of DP. Note that the noise is added to the result of a query and not to the data itself. Since DP defines how two probability distributions vary a factor of epsilon , we are free to sample from any probability distribution. Most popular distributions used in practice are Laplace and Gaussain.

**Note:** The article provides a high level overview and intuitive understanding of various mechanism. For more comphrehensive proofs and details check out the book Algorithmic Foundations of Differential Privacy

Laplace mechanism is a DP mechanism in which noise sampled from a Laplacian distribution is added to a query result.The Laplace distribution is very suitable for DP because of its sharp rise and fall which for a given standard deviation (or epsilon) has lesser chances of values being sample being close to each other.

Laplace distribution is given by

denoted by Lap(μ,b) , where μ is the mean of the distribution and b is the standard deviation or scale. The Laplace noise we add is given by Lap(0,sensitivity/epsilon). This gives a privacy guarantee of DP(epsilon,0).

Below is an example of adding Laplacian noise to a counting query. A counting query is a type of query where you find the number of occurances or counts of a given data. The true value is 1001. Consider two queries , the true value D1 with count 1001 and the value in which a count is dropped 1000(D2). We add Laplacian noise to both the datasets. It is intiuitive from the below diagram that the examples with large noise between the two queries have highest guarantees and noise which cancel out each other could release the private value.

Notice that when sampled from a probability distribution , when we query with just dataset D1 and D2 if the noise added is very close at both instances , there is very little privacy guarantee. But , if both the noise are in opposite direction there is a maximal privacy guarantee. So the probability of the noises added depends on the standard deviation of the distribution we sample from. Based on the above we could develop a mechanism in which we sample for D1 and always add noise in opposite direction when D2 is queried , but we want to develop a generalized querying mechanism where we treat even differencing attacks query as the same as any other query. The adversery having a knowledge of few rows in the database wont find it hard to find out such a mechanism. We rely on DP being a randomized mechanism.

Exponential Mechanism can be useful when perturbation of the values by themselves could lead to drastically reduce utility. That is the privacy -utlity relationship isnt linear. Suppose a age of a person was 35 and we add noise to it the query answer is 36 or 37 , the utility degrades by a factor of 1 and 2. But , there are scenerios where the utility degrades much quickly. For example , given item to be sold at price p (Assume If there are multiple people quoting same price all of them get the product) . There are three biders for 1$ 1$ and 3$. If the price is below the price p , the buyers would buy the item. Now we want to set the price p such that it makes use of the prices put by the bidder but the exact price is to be kept a secret. Suppose we add noise to $3 bidded price , it becomes 3.1$ , there would 0 revenue generated. Now suppose noise is added to 1$ and it becomes $1.1 , the revenue generated is $1.1 from the person who put a price of $3. But , if the price was $0.9 three people would buy the items making $2.7 revenue. From above scenerios it is evident that the utility-privacy trade off is not linear. So instead of adding noise directly to the value , we choose a value based on some criteria that accounts for the utility gained and accordingly selects a value with some randomness. The criteria for choosing the value is given by exponential mechanism.

In exponential mechanism the probability of a given choice/number is given by exponential of a scoring function given by exp(εu(x, r)/∆u). u is called the utility function ∆u is the maximum change in utility (akin to sensitivity of Laplacian Noise) Utility is a function which gives how useful a given range of value is to a query. Exponential could be a great way to discount for utility , selecting values with higher utility and discarding values with lower utility. It also ensures when there exists values of almost equal utility the utility guarantees are still as high as possible while maintaing privacy. If two or more choices have almost the same scores they have almost equal chances of being randomly picked which sacrifices utility to a certain extent but by a small loss since score functions of both are nearly equal. But , having such choices also provides a privacy bound.

To solve the above bidding problem , we can have a range of continious values from 0.8 to 3.0 with a step of 0.1. The utility score will be the revenue generated if the item is sold for a given price. We calculate for utility for all the values and find the maximum sensitivity denoted by ∆u.Calculating exp(εu(x, r)/∆u) normalized by sum of exp(εu(x, r)/∆u) for all the continious values in the range give us a probability.The probability values generated by the mechanism is the probability with which each of the values would be selected.

Composition is the property of DP which describes how the privacy guarantees hold not just over a single query but also multiple queries. It also provides mechanism to handle privacy guarantees over several queries. Simplest rule of them is basic composition , which states that if k DP algorithms are run the privacy is now DP(k,k).

I will explain different types of composition in detail in a future article. There are various ways of handling composition over several queries. Epsilon delta definition of differential privacyCheck out Part-III of the tutorials where I will explain how DP algorithms privacy guarantees hold good over multiple queries

In conventional machine learning we train an algorithm on data/classes pairs (supervised learning) or only on raw data( Unsupervised learning) and make predi...

Conventional machine learning required the practitioner to manually look at images/text and handcraft appropriate features. Deep Learning models are powerful...

Having gone through the importance of differential privacy and its definition , this article motivates the theory with a practical example to make it more in...

Personal data is a personal valuable asset , it could be used for economic , social or even malicious benifits. Most internet companies survive on personal d...