CLST

Interdisciplinary

Founded in 2014, the Centre for Linguistic Science and Technology (CLST) is an interdisciplinary research hub dedicated to advancing language science and technology, with particular emphasis on the languages of North-East India. From its inception, the Centre has sought to build a strong and sustainable knowledge ecosystem for under-resourced and typologically diverse languages through collaborative, cross-disciplinary research.

CLST brings together faculty and researchers from Computer Science and Engineering, Electronics and Electrical Engineering, Mechanical and Civil Engineering, Design, Biosciences and Bioengineering, Humanities & Social Sciences, and the Mehta Family School of Data Science and Artificial Intelligence. This integrative framework allows the Centre to treat language as a holistic domain, bridging theoretical linguistics, computational modeling, cognitive science, signal processing, artificial intelligence, and human-centred design. It also places the Centre at the intersection of linguistic science and emerging AI-driven language systems.

Interdisciplinary collaboration is central to the Centre’s academic and research activities. Research projects supported by government and industry, including initiatives focused on data archiving and preservation, demonstrate the Centre’s commitment to integrating linguistic analysis with computational techniques, signal processing, and data-driven methodologies. These efforts aim to create enduring digital infrastructures that support both research and long-term language preservation.

The Centre advances research across a broad spectrum of areas, including speech and language technologies, multilingual text and computational script analysis, cognitive and neural studies of language, multimodal vision-language systems, immersive and interactive interfaces, and computational modeling of phonology and speech. A significant thrust of its work involves the development of high-quality linguistic resources such as annotated corpora, lexical databases, parallel corpora, transliteration datasets, and speech repositories. Many of these resources are developed through community-driven and web-based data collection models, ensuring both scalability and local engagement.

The linguistic landscape of North-East India is marked by rich diversity, with many languages belonging to the Tibeto-Burman and Austro-Asiatic families. These languages exhibit structural properties that differ substantially from Indo-European languages, presenting distinctive challenges for linguistic theory, documentation, and computational modeling. CLST addresses these challenges through systematic language documentation, corpus creation, development of language processing tools, and archival of grammars and linguistic materials. A key objective of the Centre is to make these resources accessible to local communities and the wider research community, thereby advancing language preservation, scholarship, and inclusive language technologies.

Minority languages of India

The Centre is dedicated to the scientific study and technological advancement of minority languages spoken in India, with particular emphasis on the languages of North-East India. Its work includes systematic language documentation, linguistic analysis of speech and text, and the development of computational tools tailored to under-resourced and typologically diverse languages.

A central component of this effort is the creation of structured speech and text corpora that support empirical linguistic research and the development of data-driven language technologies.

Empowerment of NE local communities

The Centre seeks to strengthen regional research capacity and technological self-reliance in the North-East. To this end, it engages with local communities, scholars, and institutions through collaborative projects, training initiatives, and mentorship in areas such as language documentation, computational analysis, and technology development.

By serving as a nodal institution for research coordination and skill development, the Centre fosters an ecosystem that enables sustainable growth in language science and technology within the region.

Storehouse of information about NE languages

The Centre aspires to serve as a scholarly reference point for the study of North-East Indian languages within the global research community. Given the prominence of Tibeto-Burman and Austro-Asiatic language families in the region, systematic research and analysis contribute to broader theoretical and comparative linguistic inquiry.

Through sustained research output and academic collaboration, the Centre aims to advance international understanding of structurally diverse and under-documented languages.

CLST - IIT Guwahati | About Us