Word embeddings for native languages

Abstract

We scraped data from over 30 websites to construct a corpus of nearly 15 million words in Malayalam, a native Indian language. We performed a comparative study of the effectiveness of existing word2vec models on this corpus, and developed custom metrics and test cases in Malayalam for model evaluation.
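One way such custom evaluation metrics can work is as pairwise similarity test cases: a test passes when the model scores a semantically related word pair higher than an unrelated one. The sketch below illustrates the idea with toy hand-written vectors and cosine similarity; the specific Malayalam words, the `EMBEDDINGS` dictionary, and the `similarity_test` helper are illustrative assumptions, not the project's actual test suite.

```python
import math

# Toy 4-dimensional vectors for a few Malayalam words (illustrative only;
# real vectors would come from a word2vec model trained on the corpus).
EMBEDDINGS = {
    "രാജാവ്": [0.9, 0.1, 0.8, 0.0],   # "king"
    "രാജ്ഞി": [0.8, 0.2, 0.9, 0.1],   # "queen"
    "പൂച്ച":  [0.0, 0.9, 0.1, 0.8],   # "cat"
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_test(word_a, word_b, word_c):
    """Pass when the related pair (a, b) scores higher than
    the unrelated pair (a, c)."""
    sim_related = cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_b])
    sim_unrelated = cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_c])
    return sim_related > sim_unrelated

passed = similarity_test("രാജാവ്", "രാജ്ഞി", "പൂച്ച")
```

Aggregating the pass rate over a hand-built list of such word triples gives a single accuracy number per model, which makes comparing different word2vec variants on the same corpus straightforward.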

Ganga Meghanath
Data & Applied Scientist

My research interests include Reinforcement Learning, Deep Learning, Game Theory, Vision & Robotics.
