Word embeddings for native languages

Abstract

We scraped data from over 30 websites to construct a corpus of nearly 15 million words in Malayalam, a native Indian language. We performed a comparative study of the effectiveness of existing word2vec models on this corpus, and developed custom metrics and test cases in Malayalam for model evaluation.
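One way such custom evaluation metrics can work is as pairwise similarity test cases: a test passes when the model scores a semantically related word pair higher than an unrelated one. The sketch below illustrates the idea with toy hand-written vectors and cosine similarity; the specific Malayalam words, the `EMBEDDINGS` dictionary, and the `similarity_test` helper are illustrative assumptions, not the project's actual test suite.

```python
import math

# Toy 4-dimensional vectors for a few Malayalam words (illustrative only;
# real vectors would come from a word2vec model trained on the corpus).
EMBEDDINGS = {
    "രാജാവ്": [0.9, 0.1, 0.8, 0.0],   # "king"
    "രാജ്ഞി": [0.8, 0.2, 0.9, 0.1],   # "queen"
    "പൂച്ച":  [0.0, 0.9, 0.1, 0.8],   # "cat"
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_test(word_a, word_b, word_c):
    """Pass when the related pair (a, b) scores higher than
    the unrelated pair (a, c)."""
    sim_related = cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_b])
    sim_unrelated = cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_c])
    return sim_related > sim_unrelated

passed = similarity_test("രാജാവ്", "രാജ്ഞി", "പൂച്ച")
```

Aggregating the pass rate over a hand-built list of such word triples gives a single accuracy number per model, which makes comparing different word2vec variants on the same corpus straightforward.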

Ganga Meghanath
Data & Applied Scientist

My research interests include Reinforcement Learning, Deep Learning, Game Theory, Vision & Robotics.
