The Multimedia Analytics Laboratory was set up in the Department of Electronics and Electrical Engineering (EEE), Indian Institute of Technology (IIT) Guwahati in July 2013. The laboratory focuses on research and development activities related to video, speech and text analytics, and on applications of computer vision in graphics and robotics.

Sponsored Projects
Project Title: Multi-Modal Broadcast Analytics – Structured Evidence Visualization for Events of Security Concern
Funding Agency: DIT, New Delhi
PI: Dr. Prithwijit Guha
Co-PI: Dr. Sanasam Ranbir Singh
CI: Prof. S. R. M. Prasanna, Prof. S. Nandi
Duration: 2013 to 2016
Multi-Modal Broadcast Analytics
Multi-modal content from different broadcasting authorities presents news events and related opinions from varying perspectives. The information content of events, their varying levels of sensitivity and their capacity to trigger further events are of paramount importance to national security strategists, content monitoring agencies and media analysts. For example, the “Mumbai Attack” was widely reported across all news channels and websites. This event also triggered other news events like “Honorary Awards and Recognition for Martyrs’ Families”, “Terror Tourism”, “Home Ministry Reshuffle”, “Kasab Trial Case”, etc. These events are linked through common keywords or key phrases like “Terrorist”, “Kasab”, “Martyr”, “Hemant Karkare”, “ATS”, etc. Audio-visual descriptions, interviews and debates were also broadcast on television channels. Video sequences of news reporting or debates also contained commercial breaks and logo animations. Given such a huge amount of news on this topic (along with other news), how can we automatically obtain a consolidated report on the “Mumbai Attack” (while rejecting the rest), together with the development of related events as chronologically ordered news articles and videos? Or, can we be alerted to news items containing keywords like “Terrorist”, “Attack” or “Bomb Blast”? The need for such functionalities motivates us to propose a multi-modal analytics platform for indexing and querying broadcast media content.
We propose to develop a prototype system capable of analyzing, indexing and querying multi-modal information, cross-linking keywords with meaningful audio-video segments (e.g., after removing commercials, logo animations, etc.) and relating news events along a timeline. This will provide a structured organization of the multi-modal broadcast data along with their inter-relations. The availability of such a structured knowledge base will enable us to generate reports in response to queries on specific news events or keywords. Thus, the scope of the work involves the development of an analytical framework to extract structured information from unstructured multi-modal data through video analytics, text mining and speech processing.
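The structured organization described above can be illustrated with a minimal inverted index mapping keywords to chronologically ordered news items. This is only a sketch of the idea; the class, record fields and method names are hypothetical and not part of the proposed system.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class NewsItem:
    # Hypothetical record for one indexed broadcast item.
    timestamp: int     # any sortable time key, e.g. seconds since epoch
    title: str
    keywords: tuple

class EventIndex:
    """Minimal inverted index: keyword -> chronologically ordered items."""

    def __init__(self):
        self._by_keyword = defaultdict(list)

    def add(self, item: NewsItem):
        # File the item under each of its (case-folded) keywords.
        for kw in item.keywords:
            self._by_keyword[kw.lower()].append(item)

    def query(self, keyword: str):
        # Return matching items sorted along the timeline.
        return sorted(self._by_keyword[keyword.lower()],
                      key=lambda it: it.timestamp)

# Usage: index two related events (made-up timestamps) and
# retrieve them in chronological order.
idx = EventIndex()
idx.add(NewsItem(200, "Kasab Trial Case", ("Kasab", "Terrorist")))
idx.add(NewsItem(100, "Mumbai Attack", ("Terrorist", "Kasab")))
ordered = idx.query("terrorist")
# ordered -> "Mumbai Attack" first, then "Kasab Trial Case"
```

A real system would of course index automatically extracted keywords and video/audio segment references rather than hand-entered records, but the query-by-keyword, ordered-by-time organization is the same.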
The task of speech/audio analytics involves identifying different segments in broadcast audio, such as pure speech, speech with background music and pure music, and classifying them into their respective classes. Pure speech generally carries much of the information about a particular event. Hence, speaker-independent speech-to-text transcription is a necessary first step for extracting keywords or event tags. The presence of a particular keyword in a speech segment relating to a particular event can be detected by performing keyword spotting on the continuous speech segment. The textual information so extracted is a useful source for text mining and video analytics in a multi-modal analytical framework.
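The keyword-spotting step can be sketched over text transcripts. Note the simplification: real keyword spotting operates on the acoustic signal, whereas this sketch assumes the speech-to-text stage has already produced time-stamped transcript segments (the segment data below is invented for illustration).

```python
import re

def spot_keywords(segments, keywords):
    """Return (start, end, keyword) hits for each keyword found in a
    transcribed speech segment. `segments` is a list of
    (start_sec, end_sec, transcript) tuples -- a stand-in for the
    output of the speech-to-text stage."""
    # Whole-word, case-insensitive patterns for each keyword/phrase.
    patterns = {kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
                for kw in keywords}
    hits = []
    for start, end, text in segments:
        for kw, pat in patterns.items():
            if pat.search(text):
                hits.append((start, end, kw))
    return hits

# Usage with made-up transcript segments:
segments = [
    (0.0, 5.2, "reports of a bomb blast in the city"),
    (5.2, 9.8, "the weather today is sunny"),
]
alerts = spot_keywords(segments, ["bomb blast", "terrorist"])
# alerts -> [(0.0, 5.2, "bomb blast")]
```

The returned (start, end) pairs are what lets the platform cross-link a spotted keyword back to the corresponding audio-video segment, as described above.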
Multiple Object Tracking
Faculty Incharge: Dr. Prithwijit Guha
Associate Faculty Incharge: Dr. Suresh Sundaram
Staff Incharge: -