The Centre was founded in 2014. It is interdisciplinary in structure: diverse projects spanning various streams of specialization will contribute towards building a knowledge base and developing technologies for Indian languages, with particular emphasis on the languages of North-East (NE) India. Some of the centre's collaborators already hold sponsored projects that are run in an interdisciplinary manner. For example, the DeITy-sponsored project on archiving and preserving speech data draws on several research areas.
The centre aims to foster collaboration among different streams, namely computer science, linguistics, optical character recognition, handwriting recognition, speech technology and typography.
In the domain of computer science, the centre will focus on natural language processing and on developing language resources for the major languages of the NE. These resources will include synset dictionaries, part-of-speech (POS) tagged data, parallel corpora and transliteration data. The centre will also create parallel resources in some major Indian languages. Language resources will be created primarily through a crowd-based model, supplemented by data sourced from the web. This part of the project also aims to build tools for NE languages, such as morphological analyzers and synthesizers, an automatic POS tagger, an NE Transliteration and Unknown Word Transliteration Module, and a translation memory tool for human editing.
In the domain of linguistics, the centre will focus on the languages spoken in the NE region and study them through analysis and experimentation. The centre's linguists will build an archive of speech and text resources that will support work in phonology, phonetics, syntactic processing and technology development, and that will help linguists, speech technologists and NLP specialists gain a proper understanding of these languages for further research and development. The centre aims to create speech corpora of the languages spoken in North-East India primarily because little work has been done on these languages. As mentioned before, most of the languages of the NE region are Tibeto-Burman, and many others are Austro-Asiatic; this sets them apart from the Indo-European languages. Hence, from both a typological and a technology development perspective, these languages pose a very different set of challenges, which the centre will try to address. The centre will also archive grammars, linguistic texts and other materials on the languages of the NE and make them available to local and global communities.
The centre's work will be built around language, and its primary functions will include preserving and archiving the minority languages spoken in India, analyzing the speech and written texts of these languages, and developing technology for Indian languages, with special emphasis on the languages of North-East India.
As part of its goal of archiving and preserving local languages, the centre will create speech and text databases for the languages spoken in the North-East, especially its minority languages.
The centre will be responsible for disseminating the information and technology it develops at both the local and global levels. At the local level, the centre will focus on empowering communities in the NE region by making its knowledge and experience available to them. At the same time, the centre will serve as a nodal centre that involves and mentors other institutions in the region in research and development, especially in language analysis and related technology development.
At the global level, the centre envisions itself as a repository of information on the languages of the NE region. As most of the languages spoken in this area belong to the Tibeto-Burman family, information about them will contribute to the global understanding of this language family. The speech and text databases created by the centre will surely be of global interest, so the centre plans to disseminate this information to the global community through the Internet.