Tutorials – Tamil Internet Conference 2021

Tutorial I: Neural Machine Translation (NMT)

Resource Persons:

Dr. Pattabhi Rao

Dr. Vijay Sundar Ram

Anna University - KBC Research Center.

Abstract:
Translation is the process of decoding the meaning of the source text and re-encoding this meaning in
the target language without information loss. Automatic machine translation has been one of the prime
research areas in Natural Language Processing (NLP) for decades. Different methodologies have
evolved in the development of Machine Translation (MT) systems, and Neural Machine
Translation (NMT) is currently the most popular approach. In this tutorial, we present a brief
introduction to the different approaches to MT, the building blocks of NMT and different data preparation
methodologies, and we demonstrate the steps involved in building an English-to-Tamil NMT system
using OpenNMT. The tutorial will be useful for students and professionals working in the area of
language technology.
Description:
Translation is the act of rendering a discourse from one language into another without losing
any information. Automatic machine translation has been one of the most active research areas in Natural
Language Processing (NLP) for the past few decades. Machine translation (MT) systems have been
developed using different approaches, such as the dictionary-based approach, the rule-based approach,
interlingual MT, the transfer-based approach, statistical MT, hybrid MT and Neural Machine
Translation (NMT). There has been considerable development in NMT in the last decade. NMT systems are
generally built with recurrent neural networks (RNNs) using Long Short-Term Memory
(LSTM) or Gated Recurrent Units (GRU). Attention-based NMT and Transformer-based NMT
approaches have been experimented with extensively by researchers for different language pairs and in the
development of multilingual translation language models. In this tutorial, we start with a brief
introduction to the different approaches used in building MT systems and then focus on NMT
approaches, the building blocks of NMT and different data preparation approaches. We conclude with a
demonstration of building an English-to-Tamil NMT system using OpenNMT, an open source neural
machine translation framework.
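As a rough illustration of the data preparation step mentioned above, here is a minimal sketch that cleans a small English-Tamil parallel corpus and splits it into training and validation sets while keeping each sentence pair aligned. The helper name `split_parallel`, the sample sentences, the split ratio and the seed are all hypothetical choices for illustration; they are not taken from the tutorial material, and a real corpus would need further cleaning and tokenization.

```python
import random


def split_parallel(src_lines, tgt_lines, valid_fraction=0.1, seed=13):
    """Shuffle aligned sentence pairs and split them into train and validation sets.

    Hypothetical helper for illustration only; not part of OpenNMT.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    # Pair up the two sides so that shuffling never breaks the alignment.
    pairs = [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]
    # Drop pairs where either side is empty.
    pairs = [(s, t) for s, t in pairs if s and t]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    # Hold out at least one pair for validation.
    n_valid = max(1, int(len(pairs) * valid_fraction))
    return pairs[n_valid:], pairs[:n_valid]


# Tiny made-up corpus (assumption: one sentence per line, aligned by index).
english = ["Hello.", "Thank you.", "", "Good morning.", "How are you?"]
tamil = ["வணக்கம்.", "நன்றி.", "", "காலை வணக்கம்.", "எப்படி இருக்கிறீர்கள்?"]

train, valid = split_parallel(english, tamil, valid_fraction=0.25)
print(len(train), "training pairs,", len(valid), "validation pair(s)")
```

Each resulting list can then be written out as two line-aligned text files, one for the source language and one for the target language, which is the usual input layout for NMT toolkits.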
The intended audience for this tutorial is students and professionals working in the area of language
technology, and enthusiasts working in language computing.

Date : Saturday 4 December 2021.

Time : 16.30 – 18.30 IST

Tutorial II: TamilBERT: Natural Language Modeling for Tamil


Resource Person: Ms. Vinitra Swamy, ML Ph.D. student at EPF-Lausanne, Switzerland
Ms. Vinitra is a researcher in the area of AI and deep learning, currently working on her doctoral thesis at the Swiss Federal Institute of Technology in Lausanne, Switzerland. Vinitra studied at the Computer Science Department of the University of California, Berkeley, USA, where she received her B.A. and M.S. degrees. Before moving to Switzerland, she worked at Microsoft AI as a lead engineer for the Open Neural Network eXchange project. Vinitra has also served as a machine learning lecturer for the Berkeley Division of Data Sciences and the University of Washington CSE Department.

Tutorial description

Released by Google AI researchers in late 2018, the BERT model (Bidirectional Encoder Representations from Transformers) has taken the natural language processing (NLP) world by storm. BERT is used in many applications and countless downstream language tasks, from text classification to question answering to Google’s very own search engine, where it helps interpret user queries. Today, we’ll be harnessing the power of that model on Tamil Wikipedia text, so you can use it in your own Tamil NLP tasks. This workshop will start with an overview of deep learning in NLP, then move on to showcasing BERT’s capabilities and fine-tuning BERT for the Tamil language.
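For readers new to BERT, the model is pre-trained with a masked language modeling objective: some input tokens are hidden and the model learns to predict them from context on both sides. The sketch below shows only the masking idea, under loud simplifications: it splits on whitespace instead of using BERT's WordPiece tokenizer, and it always substitutes `[MASK]`, whereas the original BERT recipe also sometimes keeps the token or swaps in a random one. The function name `mask_tokens`, the sample sentence and the probabilities are hypothetical and not taken from the workshop material.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and, per position, the original token to
    predict (or None where nothing was masked). Illustrative helper only.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model would be trained to recover this
        else:
            masked.append(tok)
            labels.append(None)  # position is ignored by the training loss
    return masked, labels


# Whitespace tokenization is an assumption made to keep the sketch short.
tokens = "தமிழ் ஒரு செம்மொழி ஆகும்".split()
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(masked)
print(labels)
```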

The background knowledge required for this workshop is a base-level understanding of Python and an interest in NLP. Experience with language processing libraries or neural networks is a plus. See you soon!

Date : Saturday 4 December 2021.

Time : 18.30 – 19.30

——————————

Tutorial III: “Beginning AI applications for Tamil”

Resource Person: Dr. Muthu Annamalai, California, USA

Dr. Muthu Annamalai is a Senior Principal Engineer with SambaNova Systems, a California-based company specialising in hardware and integrated systems for AI, deep learning and big data analytics. Muthu is very active in open source software for Tamil and is the author of Ezhil, the first programming language in Tamil. Muthu received M.S. and Ph.D. degrees from Univ of Texas…

The tutorial will address:

– shift in software development for AI, ML

– what is ML and how it is being used

– Keras overview

– walk-through of a notebook for a computer vision application with Keras

– walk through applications in NLP

– conclusion, next steps

Date : Saturday 4 December 2021.

Time : 19.30 – 20.30