My interests include Natural Language Processing, Speech Recognition and Computer Graphics.

Publications

Thesis

  • Constraint Driven Learning
    (under guidance of Prof. Preethi Jyothi)
    IIT Bombay (2017-2018)
    [[pdf]](http://martiansideofthemoon.github.io/assets/btp-report.pdf)

Research Implementations

Indian Language Datasets

As part of my R&D project at IIT Bombay, I am releasing the datasets used to train my neural network language models. They have been mined from Wikipedia, and I hope they will help further research in language modelling for morphologically rich Indian languages. The folder also contains the original Penn Treebank (PTB) dataset.

  • Malayalam (denoted by ml)
  • Tamil (denoted by ta)
  • Kannada (denoted by kn)
  • Telugu (denoted by te)
  • Hindi (denoted by hi)
  • PTB (denoted by ptb)

All these datasets are compatible with SRILM. In files marked with unk, every singleton (a token that occurs only once) has been replaced with an <unk> token. Files marked with char are character-level versions. Each dataset has a train, valid and test file. You will find the datasets here.
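
For readers who want to apply the same kind of singleton replacement to their own text, here is a minimal sketch. It is not the script used to build the released files: the function name `replace_singletons`, the whitespace tokenization, and the choice to count frequencies over exactly the sentences passed in are all assumptions made for illustration.

```python
from collections import Counter


def replace_singletons(sentences, unk="<unk>"):
    """Replace every token that occurs exactly once with `unk`.

    `sentences` is a list of whitespace-tokenized strings. Counts are
    taken over the whole list (an assumption; the released files may
    have been counted differently, e.g. on the train split only).
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return [
        " ".join(tok if counts[tok] > 1 else unk for tok in sent.split())
        for sent in sentences
    ]


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat"]
    for line in replace_singletons(corpus):
        print(line)
```

In this toy corpus, "cat" and "dog" each occur once, so both are mapped to <unk>, while "the" and "sat" are kept.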