Text-to-Speech Synthesis for Hindi Language Using MFCC and LPC Feature Extraction Techniques
DOI:
https://doi.org/10.46947/joaasr632024943
Keywords:
Devanagari; Text-to-Speech (TTS); Hindi language; DCT; MFCC; LPC
Abstract
India is a large country with a population of over a billion people who speak numerous languages. About 43% of Indians speak Hindi, which is written in the Devanagari script, followed by Bengali, Telugu, Marathi, and other languages. Text-to-speech systems for such languages would therefore greatly benefit content generation and accessibility. In this work we improve an existing Text-to-Speech (TTS) system by applying advanced preprocessing techniques to the Hindi corpus database and by applying several feature extraction techniques to obtain better results. The resulting system achieves an accuracy of 98% using MFCC and LPC feature extraction. The developed model can take its input from an audio file and read it aloud using the developed TTS system.
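As a rough illustration of the two feature extraction techniques named in the abstract, the sketch below computes MFCCs (framing, power spectrum, mel filterbank, log, DCT) and LPC coefficients (autocorrelation followed by the Levinson-Durbin recursion) with NumPy only. It is not the authors' pipeline: the function names, the 25 ms / 10 ms framing at 16 kHz, the 26 filters, and the 13 cepstral coefficients are all assumptions chosen for illustration.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, n_fft=512, frame_len=400, hop=160,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs (assumed parameter values)."""
    frames = frame_signal(signal, frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II basis, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

def lpc(frame, order):
    """LPC coefficients a[0..order] with a[0] = 1, via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a
```

With these assumed defaults, `mfcc(x, 16000)` on one second of audio yields one 13-dimensional vector per 10 ms hop, and `lpc(frame, order)` returns the prediction-error filter coefficients for a single frame. In practice the two feature sets could be concatenated per frame before being passed to a model, although the abstract does not say how they are combined.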
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.