My interests include Natural Language Processing, Speech Recognition and Computer Graphics. I have primarily worked on three research projects: End-to-End Speech Recognition during my summer internship at Toyota Technological Institute at Chicago, Neural Language Modelling as part of an R&D project at IIT Bombay, and Constraint-Driven Learning for NLP applications as part of my Bachelor's thesis at IIT Bombay.
- Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
A Study of All-Convolutional Encoders for Connectionist Temporal Classification
ICASSP-2018 (Awarded SPS Travel Grant)
Macro Actions in Reinforcement Learning - A suite of five algorithms (including ideas from “Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning”) encouraging agents to repeat actions - macro-actions-rl
Single Image Haze Removal - An implementation of He et al. 2009, “Single Image Haze Removal using Dark Channel Prior” and an ongoing implementation of Bahat & Irani 2016, “Blind Dehazing using Internal Patch Recurrence” - blind-dehazing
TensorFlow 1.1 implementation of Kim 2014, “Convolutional Neural Networks for Sentence Classification” - tf-sentence-classification
Python implementation of O’Brien and Hodgins 1999, “Graphical Modeling and Animation of Brittle Fracture” - brittle-fracture-simulation
Python implementation of parts of Buck and Sampath 2013, "ECG Signal Analysis for Myocardial Infarction Detection" - ecg-analysis
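The dark channel prior behind the dehazing project above can be sketched briefly. This is a minimal illustration, not code from the blind-dehazing repository: it assumes a NumPy float image of shape (H, W, 3), and the function name and default patch size are my own choices.

```python
import numpy as np

def dark_channel(image, patch=15):
    # Per-pixel minimum over the three colour channels...
    min_rgb = image.min(axis=2)
    # ...followed by a minimum filter over a local patch (naive loop for clarity).
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

In He et al. 2009, this dark channel is near zero for haze-free regions, so its deviation from zero drives the transmission estimate.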
Indian Language Datasets
As part of my R&D project at IIT Bombay, I am releasing the datasets used to train my neural network language models. They were mined from Wikipedia, and I hope they will help further research in language modelling for morphologically rich Indian languages. The folder also contains the original PTB dataset.
- Malayalam
- Tamil
- Kannada
- Telugu
- Hindi
- PTB

All these datasets are compatible with SRILM. Files marked "unk" have all singleton tokens replaced with <unk>. Files marked "char" are character-level versions. Every dataset includes a test file. You will find the dataset here.
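The singleton-to-<unk> preprocessing described above can be sketched in a few lines. This is an illustrative sketch assuming a whitespace-tokenized corpus; the function name is hypothetical and not part of the released dataset tooling.

```python
from collections import Counter

def replace_singletons(tokens):
    # Count token frequencies over the corpus, then map every token
    # that occurs exactly once to the <unk> symbol.
    counts = Counter(tokens)
    return [tok if counts[tok] > 1 else "<unk>" for tok in tokens]
```

Replacing singletons this way gives the model a trained <unk> embedding, so unseen test-time words still receive a sensible probability.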