My interests include Natural Language Processing, Speech Recognition, and Computer Graphics. I’ve primarily worked on three research projects: End-to-End Speech Recognition during my summer internship at Toyota Technological Institute at Chicago, Neural Language Modelling as part of an R&D project at IIT Bombay, and Constraint-Driven Learning for NLP applications as part of my Bachelor’s thesis at IIT Bombay.

Preprints

  • Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
    “A Study of All-Convolutional Encoders for Connectionist Temporal Classification”
    Submitted to ICASSP-2018
    [arXiv] [code]

Research Implementations

Indian Language Datasets

As part of my R&D project at IIT Bombay, I am releasing the datasets used to train my neural network language models. They have been mined from Wikipedia, and I hope they will help further research in language modelling for morphologically rich Indian languages. The folder also contains the original PTB dataset.

  • Malayalam (denoted by ml)
  • Tamil (denoted by ta)
  • Kannada (denoted by kn)
  • Telugu (denoted by te)
  • Hindi (denoted by hi)
  • PTB (denoted by ptb)

All these datasets are compatible with SRILM. In files marked with unk, all singletons (words that occur only once) have been replaced with <unk> tokens. Files marked with char are character-level versions. Each dataset has a train, valid and test file. You will find the dataset here.
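
For readers who want to build similar unk and char variants from their own text, here is a minimal Python sketch. It is illustrative only and is not the script used to produce the released files. The function names, the choice to count singletons on the training split alone, and the use of "_" as a word-boundary symbol in the character version are all assumptions made for this example.

```python
from collections import Counter

UNK = "<unk>"


def replace_singletons(train_lines, other_lines=()):
    """Replace words seen only once in the training lines with <unk>.

    Assumption: counts come from the training split only, and the same
    vocabulary is then applied to any other split (valid/test).
    """
    counts = Counter(tok for line in train_lines for tok in line.split())
    vocab = {tok for tok, count in counts.items() if count > 1}

    def convert(line):
        return " ".join(tok if tok in vocab else UNK for tok in line.split())

    return [convert(l) for l in train_lines], [convert(l) for l in other_lines]


def to_characters(line, space_symbol="_"):
    """Turn a line into space-separated characters.

    Assumption: original spaces are kept as a hypothetical "_" symbol so
    that word boundaries remain visible to a character-level model.
    """
    return " ".join(space_symbol if ch == " " else ch for ch in line.strip())


if __name__ == "__main__":
    train = ["the cat sat", "the dog sat"]
    valid = ["the bird"]
    train_unk, valid_unk = replace_singletons(train, valid)
    print(train_unk)
    print(valid_unk)
    print(to_characters("the cat"))
```

Both functions keep one sentence per line with whitespace-separated tokens, which is the plain-text layout that n-gram toolkits such as SRILM read.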