Differential Privacy Part-IV: Advanced DP Mechanisms

Part IV: Advanced Differentially Private Mechanisms

This blog post introduces a few differentially private mechanisms. It assumes you have some understanding of the definition and aims of differential privacy. It is not an exhaustive list or an in-depth analysis, just an introduction to a few mechanisms to build an understanding of DP and of why mechanisms based on DP work. It also motivates the privacy-utility tradeoffs that can be made. For proofs of how the mechanisms' bounds are derived, you can refer to the book <a target="_blank" href="https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf">The Algorithmic Foundations of Differential Privacy</a>. All of the mechanisms described in this section are (ε, 0)-differentially private.
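For reference, here is the standard (ε, δ)-DP definition in its usual notation. A randomized mechanism M satisfies it if, for every pair of neighboring databases D and D' (differing in one individual's record) and every set of outputs S:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

Setting δ = 0 gives (ε, 0)-DP, also called pure ε-DP, which is the guarantee the mechanisms in this section provide.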

No single mechanism guarantees both privacy and utility across all possible queries and problems, but for some classes of queries there are common DP algorithms that offer a decent privacy-utility tradeoff.

To understand these mechanisms, we will look at them from the point of view of the query-release problem, where an analyst studies the database by posing queries to it, much like querying a regular SQL database.
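As a concrete illustration of the query-release view, the sketch below answers a SQL-style counting query with the Laplace mechanism, the same kind of Laplace noise that AboveThreshold adds to each query. It is a minimal sketch under stated assumptions: the `patients` table and the `private_count` and `laplace_noise` helpers are hypothetical, and Laplace noise is sampled with Python's standard `random` module (as the difference of two exponential draws) rather than with a DP library.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Hypothetical helper: answer "how many records satisfy predicate?"
    with epsilon-DP. A counting query has sensitivity 1 (adding or removing
    one record changes the count by at most 1), so Lap(1/epsilon) suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Toy table standing in for: SELECT COUNT(*) FROM patients WHERE age > 40
patients = [{"age": 34}, {"age": 51}, {"age": 67}, {"age": 29}, {"age": 45}]
rng = random.Random(0)
print(private_count(patients, lambda r: r["age"] > 40, epsilon=0.5, rng=rng))
```

The true count here is 3; each call returns 3 plus fresh noise, and a smaller ε means a larger noise scale (1/ε) and therefore less accurate answers.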

Sparse Vector Technique

The Sparse Vector Technique (SVT) can save a large part of the privacy budget. The fundamental idea is to compare each noisy query answer against a noisy threshold and to report only whether the answer lies above it. Queries that fall below the threshold are not reported and, in the basic algorithm, do not consume additional privacy budget: the cost depends on the number of above-threshold answers, not on the total number of queries asked. This makes SVT useful when most queries are expected to fall below the threshold. One major disadvantage of vanilla SVT is that the noise can flip the outcome: a value below the threshold may cross it after a large positive noise draw, while a value above the threshold may fall below it after a large negative draw. Although such errors are unlikely, some adaptive SVT variants improve on these drawbacks.

SVT is also a building block for scaling the private training of machine learning models. PATE (Private Aggregation of Teacher Ensembles) trains an ensemble of teacher models on disjoint parts of the sensitive data, and the teachers' votes on a query are aggregated as a histogram. When a large majority of the teachers output the same class, that answer is likely to be correct and no single training example can easily change it; when there is no clear winner, releasing the answer could reveal information about the training data. A noisy threshold check lets the aggregator answer only the queries with strong consensus.
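The consensus check described for PATE can be sketched as follows. This is a simplified, hypothetical aggregator, not the PATE implementation: the scalable PATE variants (such as Confident-GNMax) use Gaussian noise and a careful privacy analysis, whereas this sketch swaps in Laplace noise and does not compute any privacy cost. The names `consensus_label` and `laplace_noise`, the noise scales, and the vote data are assumptions for illustration.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def consensus_label(teacher_votes, num_classes, threshold, epsilon, rng):
    """Hypothetical PATE-style aggregator (Laplace noise, no privacy accounting).
    Returns a label only when the teachers show strong noisy consensus,
    otherwise None (abstain)."""
    counts = Counter(teacher_votes)
    histogram = [counts.get(c, 0) for c in range(num_classes)]
    # SVT-style step: compare the noisy top vote count with a noisy threshold.
    noisy_threshold = threshold + laplace_noise(2.0 / epsilon, rng)
    if max(histogram) + laplace_noise(4.0 / epsilon, rng) < noisy_threshold:
        return None  # no clear winner: answering could reveal training data
    # Strong consensus: release the argmax of the noisy vote histogram.
    noisy = [h + laplace_noise(1.0 / epsilon, rng) for h in histogram]
    return max(range(num_classes), key=lambda c: noisy[c])

rng = random.Random(0)
strong = [2] * 240 + [0] * 10     # 250 teachers, 240 vote for class 2
split = list(range(10)) * 10      # 100 teachers, 10 votes for each class
print(consensus_label(strong, 10, threshold=150, epsilon=1.0, rng=rng))
print(consensus_label(split, 10, threshold=80, epsilon=1.0, rng=rng))
```

With these settings the first call almost always returns class 2 and the second almost always abstains, because the gap between the top vote count and the threshold is many times larger than the noise scale.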

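The basic AboveThreshold algorithm, as given in the book referenced above, takes a private database D, an adaptively chosen stream of sensitivity-1 queries f1, f2, ..., a threshold T and a privacy parameter ε:

AboveThreshold(D, {fi}, T, ε):
        Let T̂ = T + Lap(2/ε)
        for each query i do
              Let νi = Lap(4/ε)
              if fi(D) + νi ≥ T̂ then
                     Output ai = ⊤
                     Halt.
              else
                     Output ai = ⊥
              end if
        end for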
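To make AboveThreshold concrete, here is a minimal Python sketch. It follows the algorithm from the book (threshold perturbed once with Lap(2/ε), each query with fresh Lap(4/ε), halting after the first above-threshold answer) under a few assumptions: queries are plain Python callables with sensitivity 1, ⊤ and ⊥ are represented as `True` and `False`, and Laplace noise is sampled with the standard library. The names `above_threshold` and `laplace_noise` and the toy data are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def above_threshold(data, queries, threshold, epsilon, rng=None):
    """Sketch of AboveThreshold for a stream of sensitivity-1 queries.

    Returns False (bottom) for each query judged below the noisy threshold,
    then True (top) for the first query judged above it, and halts there.
    """
    rng = random.Random() if rng is None else rng
    # The threshold is perturbed once, with Lap(2/epsilon).
    noisy_threshold = threshold + laplace_noise(2.0 / epsilon, rng)
    answers = []
    for query in queries:
        # Each query answer gets fresh Lap(4/epsilon) noise.
        nu = laplace_noise(4.0 / epsilon, rng)
        if query(data) + nu >= noisy_threshold:
            answers.append(True)
            break  # halt after the first above-threshold answer
        answers.append(False)
    return answers

# Toy usage: how many ages exceed k, for decreasing k?
ages = [23, 35, 41, 52, 58, 63, 67, 71, 74, 80]
queries = [lambda d, k=k: sum(a > k for a in d) for k in (75, 65, 50, 30)]
print(above_threshold(ages, queries, threshold=6, epsilon=1.0, rng=random.Random(7)))
```

With ε = 1 the noise scale (4) is large relative to these small counts, so the position of the first `True` can vary between runs. This is the vanilla-SVT drawback described above: noise can push a below-threshold count over the threshold, or an above-threshold count under it.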

Check out the next part of this series, where I will explain how the privacy guarantees of DP algorithms hold up over multiple queries.

