Speech and Music Processing

Our research focuses mainly on speech processing, identification of Indian languages, audio-visual speech recognition, and music processing.

Language Identification:

We have focused mainly on identifying Indian spoken languages, especially those prevalent in Northeastern India. Various existing statistical approaches and deep learning algorithms were compared, and new algorithms were proposed for identification of Indian languages. Our experiments show that the proposed hybrid approach performs better than the existing techniques it was compared with. We created and worked on our own recorded database. The research was published at various IEEE conferences in India and abroad, and in international journals such as IJST, Springer (regular issue), Inderscience and other renowned SCI/SCIE/Scopus journals. One student, Mr. Himanish Shekar Das, has already been awarded a PhD degree in this field under the guidance of Dr. Pinki Roy; his work was an extension of her PhD work. This work will be extended further by other PhD scholars in future.

Dr. Pinki Roy received the “Young Scientist Award” from the Venus International Foundation, Chennai, in 2015 for her major contribution to the field of language identification of Indian languages during her PhD. Her award profile is available at http://viraw.info/2015/ra15/Pinki.html. The title of her PhD thesis was “Automatic Identification of Indian languages based on various statistical approaches”, and she worked under the able guidance of Dr. Pradip Kumar Das, Professor, IIT-Guwahati, Assam.

Audio-Visual Speech Recognition:

At present, cyber security of sensitive applications is a great challenge for computer scientists. ATMs, credit cards, debit cards and online transactions rely on a PIN, which is just a combination of digits, and an OTP is additionally generated for higher security. Both techniques carry a high risk of intrusion, and cases of hacking are increasing day by day. As we move towards digitization of money and depend increasingly on e-currency, security of sensitive applications is the need of the hour. For improved authentication, we have focused on audio-visual recognition of speech: along with the audio, video of the speaker is also taken into account. Work on audio-visual speech recognition based on a digits database has already been carried out by a PhD scholar of the group. This work was published in SCI/Scopus journals as well as at prestigious international conferences such as TENCON 2019 and ICCMS 2018 (Sydney, Australia), and it has been appreciated by the scientific community as a major contribution to the field of audio-visual digit recognition. The publications are listed below for reference. Saswati Debnath has already been awarded a PhD degree in this field under the guidance of Dr. Pinki Roy. This work is at present being continued by another PhD scholar, Ms. Arpita Choudhury.
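
As a rough illustration of how audio and video evidence can be combined, the sketch below shows a generic score-level (late) fusion of two recognisers for a spoken digit. It is not the method used in the publications listed on this page; the function names, the example scores, and the default weight of 0.6 are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float],
                visual: Dict[str, float],
                audio_weight: float = 0.6) -> Dict[str, float]:
    """Weighted sum of per-digit scores from two modalities (late fusion).

    audio_weight is an assumed tuning parameter in [0, 1]; the visual
    modality receives the remaining weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # Only digits scored by both modalities can be fused.
    labels = audio.keys() & visual.keys()
    if not labels:
        raise ValueError("no common labels between the two modalities")
    return {
        label: audio_weight * audio[label] + (1.0 - audio_weight) * visual[label]
        for label in labels
    }


def recognise_digit(audio: Dict[str, float],
                    visual: Dict[str, float],
                    audio_weight: float = 0.6) -> str:
    """Return the digit with the highest fused score."""
    fused = fuse_scores(audio, visual, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the audio model slightly prefers "3" (for example,
# in a noisy recording), while the lip-movement model clearly prefers "8".
audio_scores = {"3": 0.55, "8": 0.45}
visual_scores = {"3": 0.30, "8": 0.70}
decision = recognise_digit(audio_scores, visual_scores)
```

With these made-up scores the fused decision follows the visual evidence, whereas setting `audio_weight=1.0` reduces the system to audio-only recognition. In practice the weight would have to be tuned on held-out data.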

Music Processing

Speech and music are two prominent research areas in the domain of audio signal processing. With recent advancements in speech and music technology, the area has grown manifold, bringing together interdisciplinary researchers in computer science, musicology and speech analysis. The language we speak propagates as sound waves through various media and allows us to communicate or be entertained. The music we hear or create can be perceived in different aspects such as rhythm, melody, harmony, timbre, or mood. The multifaceted nature of speech and music information requires algorithms and systems that use sophisticated signal processing, machine learning and deep learning techniques to extract useful information. Mr. Ranjeet Jha, PhD scholar, is currently working in this field under the able guidance of Dr. Anupam Biswas and Dr. Pinki Roy.

Other fields of interest in the speech and music domain are as follows:

  • Speech acquisition/Melody extraction
  • Speaker verification/Singer identification
  • Audio-visual speech recognition & analysis
  • Speech Processing
  • Digit Recognition
  • Automated speech/music transcription
  • Language Identification of Indian Languages
  • Speech/music classification
  • Music information retrieval
  • Indian classical/Bollywood music

Faculty Members:

Current Students:

  • Yeshwant Singh, JRF and Pursuing PhD at NIT Silchar
  • Ranjeet Kumar, Pursuing PhD at NIT Silchar
  • Arpita Choudhury, Pursuing PhD at NIT Silchar

Graduated Students:

  • Saswati Debnath, PhD
  • Himanish Shekar Das, PhD
  • Nilima Ahmed, MTech
  • Apoorv Singh, BTech
  • Dhananjay Mishra, BTech
  • Dipjul Rahman, BTech
  • Bishal Kahar, BTech
  • SNG Mowarngam, BTech

Sponsored Research Projects:

SL No | Project Title | Principal Investigator | Co-Investigator | Amount | Sponsor | Status
1 | Leveraging Machine Learning and Soft Computing Techniques to Investigate Raag Formation in Indian Classical Music | Dr. Anupam Biswas | N/A | Rs. 24.68 Lakhs | SERB-DST | Ongoing
2 | Development of speech based multi-level person authentication system | Dr. Ujwala Baruah | | Rs. 57.93 Lakhs | DEITY | Completed

Edited Books:


  • Dr. Anupam Biswas, Prof. Emile Wennekes, Prof. Tzung-Pei Hong, and Dr. Alicja Wieczorkowska, “Advances in Speech and Music Technology – Proceedings of FRSM 2020”, in “Advances in Intelligent Systems and Computing (AISC Series)”, Springer Nature, 2021. (Link https://www.springer.com/gp/book/9789813368804)
  • Dr. Anupam Biswas, Prof. Emile Wennekes, Dr. Alicja Wieczorkowska, and Dr. Rabul Hussain Laskar, “Advances in Speech and Music Technology – Proceedings of FRSM 2020”, in “Signal and Communication Technology Series”, Springer Nature, 2022. (Ongoing, CFP Link https://easychair.org/cfp/asmt2021)

Publications:


  1. Yeshwant Singh and Anupam Biswas, “Swaragram based Residual Neural Architecture for Raag Identification in Indian Classical Music”, in 12th International Conference on Computing, Communication and Networking Technologies (ICCCNT 2021), 6-8 July 2021, Kharagpur, India.
  2. Yeshwant Singh, Ranjeet Kumar and Anupam Biswas, “Swaragram: Shruti based Chromagram for Indian Classical Music”, in 25th International Symposium in Frontiers of Research on Speech and Music (FRSM 2020), 8-9 Oct. 2020, Silchar, India.
  3. Anupam Biswas, Apoorv Singh and Sayandeep Roy, “Note Repositioning Algorithm for Musical Instruments in Indian Classical Music”, in 24th International Symposium in Frontiers of Research on Speech and Music (FRSM 2019), 6-7 Jul. 2019, Kanpur, India.
  4. Ranjeet Kumar, Anupam Biswas and Pinki Roy, “Melody Extraction from Polyphonic Music using Deep Neural Network: A Literature Survey”, in 24th International Symposium in Frontiers of Research on Speech and Music (FRSM 2019), 6-7 Jul. 2019, Kanpur, India.
  5. Yeshwant Singh and Anupam Biswas, “Computational Approaches for Indian Classical Music: A Comprehensive Review”, in “Advances in Speech and Music Technology: Computational Aspects and Applications” in Book series “Signal and Communication Technology”, Springer, 2021. (Accepted)
  6. Yeshwant Singh, Anupam Biswas, Angshuman Bora, Debashish Malakar, Subham Chakraborty, and Suman Bera, “Design Perspectives of Multitask Deep Learning Models and Applications”, in “Applications based Understanding of Machine and Deep Learning Algorithms for Signal and Image Processing” , Wiley-IEEE Press, 2021. (Accepted)
  7. Ranjeet Kumar, Anupam Biswas and Pinki Roy, “Melody Extraction from Music: A Comprehensive Study”, in Applications of Machine Learning, pp. 141-155. Springer, Singapore, 2020.
  8. Saswati Debnath, Pinki Roy (2021), “Audio-Visual Automatic Speech Recognition using PZM, MFCC and Statistical analysis”, International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 2021. (Indexed: SCIE, IF: 3.137) (Accepted)
  9. Saswati Debnath, Pinki Roy (2020), “User authentication system based on Speech and cascade hybrid Facial feature”, International Journal of Image and Graphics, World Scientific, Vol. 20, Issue no. 3, 2020. (Indexed: SCOPUS+ESCI)
  10. Saswati Debnath, Pinki Roy (2020), “Appearance and shape based hybrid visual feature: Towards audio-visual automatic speech recognition”, Journal of Signal, Image and Video Processing, Springer, 2020. (Indexed: SCIE, IF = 5.17)
  11. Saswati Debnath, Pinki Roy (2019), “Study of speech enabled healthcare technology”, International Journal of Medical Engineering and Informatics, Inderscience Publishers, ISSN online: 1755-0661, 2019. (Indexed: SCOPUS)
  12. Saswati Debnath and Pinki Roy (2018), “Audio-Visual Speech Recognition based on Machine Learning approach”, International Journal of Advanced Intelligence Paradigms (IJAIP), Inderscience. (SCOPUS+ESCI Indexed) (In press)
  13. Saswati Debnath, Pinki Roy (2021), “Study of different feature extraction method for visual speech recognition”, 2021 International Conference of Computer Communication and Informatics (ICCCI-2021), January 27-29, 2021, Coimbatore, India. (Accepted)
  14. Saswati Debnath, Pinki Roy (2019), “Multi-modal authentication system based on audio-visual data”, IEEE TENCON 2019, Kerala, India. DOI: 10.1109/TENCON.2019.8929592.
  15. Saswati Debnath, Pinki Roy (2018), “Speaker Independent Isolated Word Recognition based on ANOVA and IFS”, 10th International Conference on Computer Modeling and Simulation (ICCMS-2018), 8-10 January, Sydney, Australia. International Conference Proceedings Series by ACM (ISBN: 978-1-4503-6339-6), indexed by Ei Compendex, Scopus, Thomson Reuters Conference Proceedings Citation Index (ISI Web of Science).
  16. Saswati Debnath, Pinki Roy (2018), “Automatic Speech Recognition based on Clustering technique”, published in Advances in Intelligent Systems and Computing (AISC Series, Springer), The First International Conference on Emerging Technology in Modeling and Graphics (IEMGraph’18), 6-7 September 2018, Kolkata, India.
  17. Saswati Debnath, Pinki Roy, Akash Gupta and Avinash Gurjar (2018), “Automatic Speech Recognition based on Clustering technique”, Advances in Intelligent Systems and Computing (AISC Series, Springer), The First International Conference on Emerging Technology in Modelling and Graphics (IEMGraph’18), 6-7 September 2018, Kolkata, India. (In Press)
  18. Saswati Debnath, Pinki Roy (2017), “Isolated Word Recognition based on Different Statistical Analysis and Feature Selection Technique”, International Conference on Cognitive Informatics & Soft Computing (CISC-2017), Advances in Intelligent Systems and Computing (AISC Series, Springer), pp. 463-473. Indexed by: ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.
  19. Saswati Debnath and Pinki Roy (2017), “Isolated Word Recognition based on Difference Statistical and Feature Selection Technique”, Advances in Intelligent Systems and Computing (AISC Series, Springer), International Conference on Cognitive Informatics & Soft Computing (CISC-2017), 20-21 December 2017, Hyderabad, India.
  20. Das, H. S., & Roy, P. (2021). A CNN-BiLSTM based hybrid model for Indian language identification. Journal of Applied Acoustics, 182, 108274. (Accepted, Elsevier, SCIE, IF 2.639, Will be published in November 2021)
  21. Das, H. S., & Roy, P. (2020). “Bottleneck feature-based hybrid deep autoencoder approach for Indian language identification.” Arabian Journal for Science and Engineering, 45(4), 3425-3436. (Springer, SCIE, IF 2.334)
  22. Das, H. S., & Roy, P. (2019). “Optimal prosodic feature extraction and classification in parametric excitation source information for Indian language identification using neural network based Q-learning algorithm.” International Journal of Speech Technology, 22(1), 67-77. (Springer, ESCI, Scopus)
  23. Pinki Roy, Pradip K. Das (2013), “Comparison of VQ and GMM approach for Identifying Indian Languages”, International Journal of Applied Pattern Recognition (IJAPR), Inderscience Publisher. (ESCI Indexed)
  24. Pinki Roy, Pradip K. Das (2012), “A Hybrid VQ-GMM Approach for Identifying Indian Languages”, International Journal of Speech Technology, Springer, June 2012, Vol. 15, Issue 2, DOI: 10.1007/s10772-012-9152-6. (Scopus Indexed)
  25. Pinki Roy, Pradip K. Das (2011), “Language Identification of Indian Languages based on Gaussian Mixture Model”, International Journal of Wisdom based Computing, Volume 1(3), December 2011, ISSN 2231-4857, pp. 54-59.
  26. Das, H. S., & Roy, P. (2021). Impact of Visual Representation of Audio Signals for Indian Language Identification. In: 25th International Symposium on Frontiers of Research in Speech and Music (FRSM), 2020. In: Biswas A., Wennekes E., Hong TP., Wieczorkowska A. (eds) Advances in Speech and Music Technology. Advances in Intelligent Systems and Computing, vol 1320. Springer, Singapore.
  27. Pinki Roy, Nilima Ahmed, Saswati Debnath (2017), “Facial feature based authentication system from video stream”, 58th International Conference on Best Researches, International Organization of Scientific Research and Development (IOSRD), 2017, Bangalore, India.
  28. Nilima Ahmed, Pinki Roy (2017), “A Review Study of Digit Recognition System”, IEEE International Conference on NextGen Electronic Technologies: Silicon to Software, March 23-25, 2017, VIT University Vellore, Chennai.
  29. Nilima Ahmed, Pinki Roy (2016), “A secured and authenticated online banking transaction system”, IEEE 1st International Conference on Accessibility to Digital World (16-18 December 2016), Work In Progress Forum, Poster presentation.
  30. Pinki Roy, Pradip K. Das, Sandeep Kumar Gupta (2014), “Comparison of SVMs and NNs approach for automatic identification of Indian languages”, 1st International Science & Technology Congress, IEMCONGRESS-2014, August 28-31, 2014, Science City, Kolkata, India, pp. 39-44, ISBN: 978-93-5107-248-5.
  31. Pinki Roy, Pradip K. Das (2011), “Automatic Language Identification of three Indian Languages using Vector Quantization”, 4th International Conference on Computer and Electrical Engineering, ICCEE’2011, October 14-16, 2011, Singapore, pp. 293-297, ISBN: 978079185984.
  32. Pinki Roy (2010), “Language Recognition of Three Indian Languages based on clustering and supervised Learning”, International Conference on Computer Applications 2010, ICCA’2010, 24-27 December 2010, Pondicherry, India, pp. 77-82, doi: 10.3850/978-981-08-7302-8_1287.
  33. Pinki Roy, Pradip K. Das (2010), “Review of Language Identification Techniques”, 2010 IEEE International Conference on Computational Intelligence and Computing Research, ICCIC’2010, December 28-29, 2010, Coimbatore, India, pp. 1-4, ISBN: 978-1-4244-5965-0, DOI: 10.1109/ICCIC.2010.5705780.
  34. Das, H. S., & Roy, P. (2019). A deep dive into deep learning techniques for solving spoken language identification problems. In Intelligent Speech Signal Processing (Elsevier, pp. 81-100).