Open Source Contributions


Core Contributor, SyMPC [2021-]
“A library for training and evaluation Neural Networks using Multi Party Computations in Pytorch”

I review pull requests, fix bugs, and develop features for the repository.
I have worked on parts of the Falcon MPC protocol and improved the automatic differentiation module.

Core Contributor, PySyft [2018-19]
A framework for privacy-preserving deep learning using PyTorch and TensorFlow.

I worked on the PyTorch side of development. The experience taught me many software engineering practices, such as unit testing, code review, Git, and writing clean, documented code. Some of the tasks I worked on:

  • Wrote Federated Learning use cases: learning word embeddings from private data, and a demonstration on CIFAR10
  • Implemented Private Aggregation of Teacher Ensembles (PATE), a Differential Privacy method
  • Worked on a Polynomial Tensor for non-linear computation in the multi-party computation (MPC) setting. The method approximated non-linear functions using interpolation or Taylor series. It could not be integrated into the Syft chain of tensors.
  • Refactored the data loaders and federated dataset, and wrote a tutorial on developing custom federated datasets
  • Added type annotations to the codebase and wrote documentation for parts of it
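
The aggregation step behind PATE can be illustrated with a short sketch. This is not the code I contributed to PySyft. It is a standalone, standard-library illustration of the noisy-max vote: each teacher model votes for a label, Laplace noise is added to the vote counts, and the label with the highest noisy count is released. The names `laplace_noise` and `noisy_max_label` and the parameter `gamma` are chosen here for illustration only.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_label(teacher_votes, num_classes, gamma, rng):
    """Return the label with the highest noisy vote count.

    teacher_votes: one predicted label per teacher (hypothetical input).
    gamma: privacy parameter; the noise added to each count has scale 1 / gamma.
    """
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts.get(label, 0) + laplace_noise(1.0 / gamma, rng)
        for label in range(num_classes)
    ]
    return max(range(num_classes), key=noisy_counts.__getitem__)


# Five teachers vote on a three-class problem (made-up votes)
rng = random.Random(0)
print(noisy_max_label([1, 1, 1, 0, 2], num_classes=3, gamma=0.5, rng=rng))
```

A smaller `gamma` adds more noise, which gives stronger privacy but makes the released label less likely to match the true majority vote.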


Deepgaze
Computer Vision library for human-computer interaction in Python

Refactored parts of the library for improved usability and wrote test cases to verify its functionality. Ported the library to Python 3.0.


Some Personal Projects I am proud of

For a complete list of projects, feel free to check my GitHub profile.

GreyNSights

[Repository] [Introductory Blogpost]

GreyNSights is a Framework for Privacy-Preserving Data Analysis.

It currently supports only Pandas. The framework allows analysts to remotely query a dataset while the dataset remains at its source and stays private from the analyst. Query results are differentially private. Analysts can use the same pandas syntax for analyzing and transforming datasets, but cannot view individual rows. GreyNSights can also query several parties together and return aggregate statistics without revealing the individual counts of each party.
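
To make "differentially private query results" concrete, here is a minimal sketch of one common approach, the Laplace mechanism applied to a counting query. It is an assumption-laden illustration, not the GreyNSights implementation: the function names `laplace_noise` and `dp_count`, the `epsilon` value, and the sample data are all made up for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon, rng):
    """Return an epsilon-differentially-private count of matching values.

    Adding or removing one row changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


carrots_eaten = [12, 85, 73, 40, 91]  # made-up values for illustration
rng = random.Random(42)
noisy = dp_count(carrots_eaten, lambda v: v > 70, epsilon=0.5, rng=rng)
print(f"noisy count of rows with carrots_eaten > 70: {noisy:.2f}")
```

The analyst only ever sees the noisy value, so the presence or absence of any single row has a bounded effect on what is revealed.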

Example Usage

The owner of a sensitive dataset hosts it:

import pandas
from GreyNsights.analyst import Pointer
from GreyNsights.host import Dataset, DataOwner
from GreyNsights.config import Config

# Load the sensitive data into a regular pandas DataFrame
dataset = pandas.read_csv("animals_and_carrots.csv", sep=",", names=["animal", "carrots_eaten"])

# Identify the data owner and load the configuration
owner = DataOwner("Bob", port=6544, host="127.0.0.1")
config = Config(owner)
config.load("test_config.yaml")

# Register the dataset under a name, whitelisting the analyst "Alice"
dataset = Dataset(owner, "Sample Data", dataset, config, whitelist={"Alice": None})

# Start serving incoming queries
dataset.listen()

A data analyst interested in the dataset can now analyze it while the data stays private and remains at its source:

# Initialization code for GreyNSights
import GreyNsights
from GreyNsights.analyst import DataWorker, DataSource, Pointer, Command, Analyst
from GreyNsights.frameworks import framework

identity = Analyst("Alice", port=65441, host="127.0.0.1")
worker = DataWorker(port=6544, host="127.0.0.1")
dataset = DataSource(identity, worker, "Sample Data")
config = dataset.get_config()

# Initialize the pointer
dataset_pt = config.approve().init_pointer()

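# Analysis of the dataset
# `pandas` is never bound in this snippet; it is assumed to be the GreyNsights
# pandas wrapper (compare the `framework` import), not the stock library
df = pandas.DataFrame(dataset_pt)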
df.columns
df.describe().get()
df['carrots_eaten'].mean().get()
df['carrots_eaten'].sum().get()
(df['carrots_eaten'] > 70).sum().get()
df['carrots_eaten'].max().get()


Weakly Supervised Street View Text Detection

[Repository]


Trained a Convolutional Neural Network for text segmentation and localisation using a dataset labelled by a character detection network.

  • Used PyTorch, OpenCV, and Pillow in Python
  • Trained a character-agnostic text detector on the Chars74K dataset, along with images of indoor and outdoor scenes without text. The character-agnostic model is an AlexNet network pretrained on ImageNet.
  • Trained a Fully Convolutional Network (FCN) for street text localisation and detection on the weakly supervised labelled dataset. The dataset was labelled by the character-agnostic text detector.
  • Reduced the time to label a single image by 34% by training a smaller network with Knowledge Distillation, making it possible to label thousands of images in a few hours.
  • Explored and analyzed other methods for Neural Network compression.
  • Used the distilled network and sliding windows to annotate images in the UCSD SVT and NEOCR datasets and derive bounding boxes
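
The sliding-window annotation step can be sketched as follows. This is a simplified illustration rather than the project's code: `is_text` is a hypothetical stand-in for the distilled character-agnostic network, and merging every positive window into a single box is an assumption made to keep the example short (a real image may contain several separate text regions).

```python
def sliding_windows(width, height, window, stride):
    """Yield (left, top, right, bottom) boxes that sweep across the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (left, top, left + window, top + window)


def annotate(width, height, window, stride, is_text):
    """Return one box enclosing every window that `is_text` accepts, or None."""
    hits = [
        box
        for box in sliding_windows(width, height, window, stride)
        if is_text(box)
    ]
    if not hits:
        return None
    lefts, tops, rights, bottoms = zip(*hits)
    return (min(lefts), min(tops), max(rights), max(bottoms))


# Stand-in classifier: pretend text occupies the top half of a 64x64 image
def fake_is_text(box):
    return box[3] <= 32


print(annotate(64, 64, window=32, stride=16, is_text=fake_is_text))
# prints (0, 0, 64, 32)
```

A smaller stride gives tighter boxes at the cost of more classifier calls, which is where a faster distilled network pays off.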