This page outlines my unpublished projects: open-source contributions, course research projects, and homework assignments I designed. My published research can be seen here.

Other Research (Course Projects)

Self-supervised Learning on 3D Point Clouds: New algorithms for self-supervised learning on point clouds, where we teach models to discriminate between real and fake objects. To create fake objects, we perform global perturbations to segments of an object derived from Approximate Convex Decomposition (report).
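To make the "fake object" idea concrete, here is a minimal, hypothetical sketch in pure Python of one global perturbation applied to a single segment. It assumes the point cloud is a list of (x, y, z) tuples and that each point already carries a segment label (for instance from an Approximate Convex Decomposition computed beforehand). The specific perturbation (a random rotation of one segment about its own centroid around the z axis, plus a small random translation), the function names, and the offset range are illustrative assumptions, not the exact procedure used in the report.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def perturb_segment(points: Sequence[Point], labels: Sequence[int], segment_id: int,
                    angle: float, offset: Point) -> List[Point]:
    """Rotate the points of one segment about its centroid (around the z axis)
    by `angle` radians, then translate them by `offset`. Points belonging to
    other segments are returned unchanged."""
    seg = [p for p, lab in zip(points, labels) if lab == segment_id]
    cx = sum(p[0] for p in seg) / len(seg)
    cy = sum(p[1] for p in seg) / len(seg)
    c, s = math.cos(angle), math.sin(angle)
    out: List[Point] = []
    for (x, y, z), lab in zip(points, labels):
        if lab != segment_id:
            out.append((x, y, z))
            continue
        dx, dy = x - cx, y - cy
        out.append((cx + c * dx - s * dy + offset[0],
                    cy + s * dx + c * dy + offset[1],
                    z + offset[2]))
    return out


def make_fake(points: Sequence[Point], labels: Sequence[int],
              rng: random.Random) -> List[Point]:
    """Hypothetical fake-object generator: perturb one randomly chosen segment.
    A discriminator would then be trained on (real -> 1, fake -> 0) pairs."""
    segment_id = rng.choice(sorted(set(labels)))
    angle = rng.uniform(-math.pi, math.pi)
    offset = (rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2))
    return perturb_segment(points, labels, segment_id, angle, offset)
```

Only one segment moves while the rest of the object stays fixed, so the result is locally plausible but globally inconsistent, which is the kind of cue a real-vs-fake discriminator has to pick up on.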

MixMatch on Vision + Language Tasks (NLVR2): An attempt to adapt MixMatch, a data-augmentation-based algorithm for semi-supervised image classification, to the challenging setting of NLVR2, where the input space contains both images and text (report).

Research Exchange - A Collaborative Paper Annotation Tool - A platform for collaboratively annotating scientific literature to help newcomers understand research papers, built as a Human-Computer Interaction course project (report).

Inference Networks for Structured Prediction - A TensorFlow implementation of the multi-label classification experiments in Learning Approximate Inference Networks for Structured Prediction. Also contains experiments on the FIGMENT dataset and an extension to the inference network training algorithm based on Wasserstein GANs (report).

Diversity Sampling in Machine Learning - An implementation of Diverse Beam Search for neural language models. Also contains the original (slightly modified) code for the interactive segmentation experiments in Diverse M-Best Solutions in MRFs (report).

Macro Actions in Reinforcement Learning - A suite of five algorithms (including ideas from “Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning”) that encourage agents to repeat actions (report).

Single Image Haze Removal - An implementation of He et al. 2009, “Single Image Haze Removal using Dark Channel Prior” and an ongoing implementation of Bahat & Irani 2016, “Blind Dehazing using Internal Patch Recurrence” (report).
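For reference, the core equations of the dark channel prior method from He et al. 2009 can be summarized as follows (my own summary of the paper, not code from the repository). Here $I$ is the observed hazy image, $J$ the haze-free scene radiance, $A$ the global atmospheric light, $t$ the transmission, $\Omega(x)$ a local patch centred at pixel $x$, and $c$ ranges over the colour channels:

```latex
% Haze imaging model
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)

% Dark channel of an image J
J^{\text{dark}}(x) = \min_{y \in \Omega(x)} \Bigl( \min_{c \in \{r,g,b\}} J^{c}(y) \Bigr)

% Transmission estimate (omega < 1 keeps a little haze for distant objects)
\tilde{t}(x) = 1 - \omega \min_{y \in \Omega(x)} \Bigl( \min_{c} \frac{I^{c}(y)}{A^{c}} \Bigr)

% Scene radiance recovery, with a lower bound t_0 on the transmission
J(x) = \frac{I(x) - A}{\max\bigl(t(x),\, t_0\bigr)} + A
```

The paper uses $\omega = 0.95$ and $t_0 = 0.1$, and refines $\tilde{t}$ (with soft matting) before the recovery step.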

CNNs for Sentence Classification - A TensorFlow 1.1 implementation of Kim 2014, “Convolutional Neural Networks for Sentence Classification”.

Brittle Fracture Simulation - Python implementation of O’Brien and Hodgins 1999, “Graphical Modeling and Animation of Brittle Fracture”.

ECG Signal Analysis - Python implementation of parts of Christopher Buck and Aneesh Sampath 2013, “ECG Signal Analysis for Myocardial Infarction Detection”.

Course Materials

A homework assignment on linguistic probing tasks, built with AllenNLP, designed for UMass Amherst’s graduate NLP class.

Open Source Contributions

Indian Language Datasets

As part of my R&D project at IIT Bombay, I am releasing the datasets used to train my neural network language models. They were mined from Wikipedia, and I hope they will help further research in language modelling for morphologically rich Indian languages. The folder also contains the original Penn Treebank (PTB) dataset.

  • Malayalam (denoted by ml)
  • Tamil (denoted by ta)
  • Kannada (denoted by kn)
  • Telugu (denoted by te)
  • Hindi (denoted by hi)
  • PTB (denoted by ptb)

All these datasets are compatible with SRILM. In files marked unk, all singletons (words that occur only once) have been replaced with <unk> tokens. Files marked char are character-level versions. Each dataset has a train, valid and test file. You will find the datasets here.
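The two preprocessing variants can be illustrated with a short, hypothetical Python sketch. It assumes one sentence per line with whitespace-separated tokens (the plain-text format SRILM reads). The function names, the choice to count singletons over the lines passed in, and the `_` word-boundary symbol in the character-level version are assumptions for illustration; they are not necessarily the exact scripts used to build the released files.

```python
from collections import Counter
from typing import List

UNK = "<unk>"


def replace_singletons(lines: List[str]) -> List[str]:
    """Replace every token that occurs exactly once across `lines` with <unk>.
    Assumes whitespace tokenization and one sentence per line."""
    counts = Counter(tok for line in lines for tok in line.split())
    return [" ".join(tok if counts[tok] > 1 else UNK for tok in line.split())
            for line in lines]


def to_char_level(line: str, space_symbol: str = "_") -> str:
    """Hypothetical character-level conversion: every character becomes its own
    token, and word boundaries are marked with `space_symbol` (an assumption)."""
    normalized = " ".join(line.split())  # collapse runs of whitespace
    return " ".join(space_symbol if ch == " " else ch for ch in normalized)
```

In practice one would typically count singletons on the train file only and then map any valid/test word outside the resulting vocabulary to <unk> as well, so that all three splits share one vocabulary.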