My interests include Natural Language Processing, Speech Recognition, Machine Learning and Computer Security.
Inference Networks for Structured Prediction - A TensorFlow implementation of the multi-label classification experiments in Learning Approximate Inference Networks for Structured Prediction. Also contains experiments on the FIGMENT dataset and an extension to the inference network training algorithm based on the paper Improved Training of Wasserstein GANs.
Diversity Sampling in Machine Learning - An implementation of Diverse Beam Search for Neural Networks in Language Modelling. Also contains the original (slightly modified) code for the interactive segmentation experiments in Diverse M-Best Solutions in MRFs.
Macro Actions in Reinforcement Learning - A suite of five algorithms (including ideas from “Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning”) encouraging agents to repeat actions.
Single Image Haze Removal - An implementation of He et al. 2009, “Single Image Haze Removal using Dark Channel Prior” and an ongoing implementation of Bahat & Irani 2016, “Blind Dehazing using Internal Patch Recurrence”.
CNNs for Sentence Classification - A TensorFlow 1.1 implementation of Kim 2014, “Convolutional Neural Networks for Sentence Classification”.
Brittle Fracture Simulation - Python implementation of O’Brien and Hodgins 1999, “Graphical Modeling and Animation of Brittle Fracture”.
ECG Signal Analysis - Python implementation of parts of Christopher Buck and Aneesh Sampath 2013, “ECG Signal Analysis for Myocardial Infarction Detection”.
Indian Language Datasets
As a part of my R&D project at IIT Bombay, I am releasing the datasets used to train my neural network language models. These have been mined from Wikipedia, and I hope they will help further research in language modelling for morphologically rich Indian languages. The folder also contains the original PTB dataset.
- Malayalam (denoted by
- Tamil (denoted by
- Kannada (denoted by
- Telugu (denoted by
- Hindi (denoted by
- PTB (denoted by
All these datasets are compatible with SRILM. In files marked with unk, all singletons have been replaced with <unk> tokens. Files marked with char are character versions. All datasets have a test file. You will find the dataset here.
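For readers who want to apply the same preprocessing to their own text, here is a minimal Python sketch of the two variants described above. It is not the script used to build these datasets. It assumes whitespace-tokenized text with one sentence per line, and the function names and the `_` word-boundary marker in the character version are hypothetical choices made for illustration.

```python
from collections import Counter

UNK = "<unk>"


def replace_singletons(lines):
    """Replace every token that occurs exactly once across `lines` with <unk>."""
    counts = Counter(tok for line in lines for tok in line.split())
    return [
        " ".join(tok if counts[tok] > 1 else UNK for tok in line.split())
        for line in lines
    ]


def to_characters(lines, space_marker="_"):
    """Emit each line as space-separated characters.

    Word boundaries are written as `space_marker` (an assumed convention,
    not necessarily the one used in the released files).
    """
    return [
        " ".join(space_marker if ch == " " else ch for ch in line.strip())
        for line in lines
    ]


corpus = ["the cat sat", "the dog sat"]
unk_corpus = replace_singletons(corpus)
char_corpus = to_characters(corpus)
```

Counting singletons over the training split only, and then mapping unseen validation and test words to `<unk>` as well, is the usual way to keep the vocabulary closed. Whether the released files were built that way is not stated here.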