Dr. Udayasanker Thayasivam

  • Utility of Multilingual embedding for Tamil Language Technology
  • Moratuwa University, Sri Lanka


Utility of Multilingual embedding for Tamil Language Technology: The abundance of resources, training data, and benchmarks in English has led to a disproportionate focus on English and to the neglect of the many other languages spoken worldwide. Researchers have therefore been investigating methods for transferring the knowledge gained from high-resource languages such as English to low-resource languages such as Tamil.

A multilingual embedding model is a powerful tool that encodes text from different languages into a shared embedding space. This allows the embeddings to be applied to various downstream tasks, such as text classification and clustering, while leveraging semantic information for language understanding. Tamil language technology is a rapidly growing field whose main limitation is the lack of useful annotated data at scale. Multilingual embeddings make it possible to build useful machine learning models for languages that lack large annotated corpora, using limited or no annotated data in the target language.
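As a rough illustration of the idea, the sketch below trains a nearest-centroid classifier on labelled English examples and applies it unchanged to a Tamil input. It assumes both have already been encoded into the same embedding space. The three-dimensional vectors, the labels, and the helper names (cosine, centroids, classify) are invented for illustration; a real system would obtain the vectors from an actual multilingual encoder.

```python
import math

# Toy 3-dimensional vectors standing in for the output of a multilingual
# encoder. They are made up for illustration, not produced by a real model.
ENGLISH_TRAIN = [
    ([0.9, 0.1, 0.0], "sports"),    # e.g. an English sentence about a match
    ([0.8, 0.2, 0.1], "sports"),
    ([0.1, 0.9, 0.1], "politics"),  # e.g. an English sentence about a bill
    ([0.0, 0.8, 0.2], "politics"),
]

# Assumed embedding of an unlabelled Tamil sentence about a cricket match.
TAMIL_QUERY = [0.7, 0.2, 0.1]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroids(examples):
    """Average the embeddings of each label into one centroid per label."""
    grouped = {}
    for vec, label in examples:
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(vec, cents):
    """Return the label whose centroid is most similar to vec."""
    return max(cents, key=lambda label: cosine(vec, cents[label]))


# Only English examples were labelled; the Tamil input is classified
# purely through its position in the shared space.
predicted = classify(TAMIL_QUERY, centroids(ENGLISH_TRAIN))
print(predicted)
```

No Tamil annotations are used at any point: the classifier relies entirely on the assumption that sentences with similar meaning land close together in the shared space, regardless of language.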

Session