Synthetic Vocal Tracts - A Review

Kandhal Yazhini P; Yuvaraj Babu K

doi:https://doi.org/10.26452/ijrps.v11iSPL3.3526

Kandhal Yazhini P ¹ , Yuvaraj Babu K ✉ ¹

1 Department of Anatomy, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India, +91-9840210597

Abstract

Synthetic vocal tracts are devices powered by a computer system capable of translating brain activity into synthesized speech by decoding the movements of the muscles involved in vocalization using advanced computer programming. A new standardized method for developing a synthetic vocal tract is the 3D vocal tract model with binary conversion. Typical features considered while creating a synthetic vocal tract include fundamental frequency, perturbation measures, jitter and change in pitch. The tissue-engineered larynx is a promising development in synthetic vocal tract treatment for patients requiring vocal fold repair and regeneration. The future of this technology lies in synthetic vocal cords based on high-speed video endoscopy. This review, based on articles obtained from various platforms, elaborates on the principle of the synthetic vocal tract. The quality of each article used was assessed with a quality assessment tool and graded as strong, moderate or weak. The aim of this study is to understand the concept of the synthetic vocal tract and its significance. The synthetic vocal tract is a recently established biomedical tool that has come as a boon in treating patients with severe disabilities. Speech synthesis is evolving as a viable solution as more research is carried out. To understand the full significance of this technique, more research has to be carried out in this field of biomedical engineering and viable solutions need to be developed, so that the technique is fully utilized.

Keywords

vocal cord disorders, synthetic vocal tracts, voice restoration, speech synthesis, tissue-engineered larynx

Introduction

Synthetic vocal tracts are devices powered by a computer system capable of translating brain activity into synthesized speech by decoding the movements of the muscles involved in vocalization using advanced computer programming. MRI captures of the vocal tract in different vowel positions provide the data for a one-dimensional area function. The complex shapes of the MRI cross-sections recorded while producing different vowel sounds are condensed and merged to provide the 2D vocal tract shape data (Mullen, Howard, & Murphy, 2007). A digital waveguide mesh is created from this, using the 3D model generated from the 2D data. The digital waveguide mesh, driven by a laryngeal excitation source, provides the acoustic output.
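As a simplified illustration of how an area function feeds a waveguide model, the sketch below computes the pressure-wave reflection coefficient at each junction between adjacent tube sections. It assumes a one-dimensional tube model rather than the 2-D mesh of Mullen et al.; the function name and the area values are hypothetical and are not taken from the reviewed articles.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between
    adjacent tube sections of a 1-D area function (assumed simplification)."""
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    # Junction between section k and k+1: r = (A_k - A_{k+1}) / (A_k + A_{k+1})
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


# Hypothetical area function in cm^2, ordered from glottis to lips.
areas_cm2 = [0.5, 1.0, 2.0, 4.0, 4.0, 2.0]
coeffs = reflection_coefficients(areas_cm2)
```

A junction between two sections of equal area gives a coefficient of zero, meaning the wave passes without reflection; larger changes in area give stronger reflections.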

For analyzing laryngeal sources, the Liljencrants-Fant (LF) model is commonly used in speech synthesis systems. It is a mathematical model for adjusting voice quality in the synthesized vocal tract (Fant, 2013). The typical features considered while creating the synthetic vocal tract are fundamental frequency, perturbation measures such as jitter (cycle-to-cycle change in pitch) and shimmer (cycle-to-cycle change in amplitude), the harmonics-to-noise ratio, etc. Some studies used mel-frequency cepstral coefficients (MFCCs) and their derivatives to represent the speech signal in statistical speech signal processing systems.
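The perturbation measures named above can be sketched as follows. This is a minimal example that assumes the pitch periods and peak amplitudes of consecutive glottal cycles have already been extracted; it uses the common "local" definition (mean absolute difference between consecutive cycles, relative to the mean). The function name and sample values are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles,
    expressed as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements for four consecutive glottal cycles.
periods_ms = [5.0, 5.1, 4.9, 5.0]     # pitch period of each cycle
amplitudes = [1.0, 0.9, 1.1, 1.0]     # peak amplitude of each cycle

jitter_percent = local_perturbation_percent(periods_ms)    # pitch perturbation
shimmer_percent = local_perturbation_percent(amplitudes)   # amplitude perturbation
f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))       # mean fundamental frequency
```

The same function serves both measures because jitter and shimmer differ only in the quantity being tracked from cycle to cycle.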

Assessment tools: the Multi-Dimensional Voice Program (MDVP) supplies the measurements and gives information related to the vocal source (Kay Elemetrics Corp, 1999). The articulatory configuration of the vocal tract is related to articulation at the vocal folds. For acoustic discrimination of pathological voice, vocal tract characteristics were also investigated. Observations of how the shape of the vocal tract of sopranos changes when singing different notes are also taken into account, and another important parameter considered is physiological vowel production (Takemoto, Kitamura, & Adachi, 2006).

Over the past years, our team has carried out research in osteology on the importance of the posterior condylar canal (Choudhari & Thenmozhi, 2016), accessory foramina present in the middle cranial fossa (Hafeez & Thenmozhi, 2016), the clinical importance of the styloid process (Kannan & Thenmozhi, 2016), the occurrence of the foramen of Huschke (Keerthana & Thenmozhi, 2016), morphometric analysis of the foramen meningo-orbitale (Pratha & Thenmozhi, 2016), Gerdy's tubercle in the tibia (Nandhini, Babu, & Mohanraj, 2018) and the clinical implications of occipital emissary foramina (Subashri & Thenmozhi, 2016), as well as on stature estimation from facial lengths (Krishna & Babu, 2016), radiation effects of mobile phones on the brain (Sriram, Thenmozhi, & Yuvaraj, 2015), use of iPads versus textbooks in education (Thejeswar & Thenmozhi, 2015), miRNA in hypertension (Johnson et al., 2020), microRNA in preeclampsia patients (Sekar, Lakshmanan, Mani, & Biruntha, 2019), animal studies (Seppan et al., 2018), and a few other fields such as thyroid function and obesity (Menon & Thenmozhi, 2016) and vision impairment in amblyopia (Samuel & Thenmozhi, 2015). Little information is available on the current topic of synthetic vocal tracts; hence this review article elaborates on the principle of the synthetic vocal tract. The aim of this study is to understand the concept of the synthetic vocal tract and its significance.

Methodology

The review was based on articles obtained from platforms such as PubMed, PubMed Central and Google Scholar. Articles were restricted to those published between 1970 and 2020. The inclusion criteria were original research papers and articles that discussed pros and cons. Retracted articles and articles in other languages were excluded. All articles were selected for their relevance to synthetic vocal tracts.

Articles were screened by title, abstract and full text. A search of article databases on the topic of synthetic vocal tracts returned more than 100 articles, which were then filtered by title, abstract and full text and reviewed. Keywords of this study, such as vocal cord disorders, treatments and synthetic vocal tracts, were used to query the databases. The quality of each article used was assessed with a quality assessment tool and graded as strong, moderate or weak (Table 1).

Vocal cord

Diseases

Nocturnal stridor is a breathing disorder commonly found in patients with multiple system atrophy (MSA). An improved understanding of this disorder is essential, since nocturnal stridor carries an increased risk of sudden death. It is essential to be familiar with the additional features of MSA diagnosis and its treatments. Continuous positive airway pressure and tracheostomy are the prevalent treatments (Cortelli et al., 2019). When treating MSA patients with nocturnal stridor, it is essential to confirm the diagnosis by analyzing the movements of the vocal cord using drug-induced sleep endoscopy (DISE) and by conducting polysomnography (Heo, Kim, Lee, & Park, 2020).

Intravascular papillary endothelial hyperplasia is a vascular tumor which causes loss or deterioration of voice. It rarely arises from the true vocal cord. Inducible laryngeal obstruction (ILO) is another disorder, caused by inappropriate obstruction at the true vocal folds or supraglottic structures in response to a trigger or stimulus; when the trigger is exercise, it is termed exercise-induced laryngeal obstruction (EILO). Aggressive laryngeal fibromatosis is a lesion that presents with dyspnea and stridor. In treating these cases, a total laryngectomy with clear margins is needed to avoid a high risk of local recurrence (Khan et al., 2019).

Treatments

The treatment for vocal cord paralysis depends on several factors. It is very important to analyze the cause and symptoms before advising treatments such as voice therapy, surgery or both. Vocal fold nodules, polyps and cysts are diagnosed with ultrasonography techniques, and treatment involves laryngeal microsurgery. Benign vocal cord lesions are another vocal cord disorder, for which treatment consists of voice therapy and vocal improvement. Paradoxical vocal cord motion is a disorder which can be helped with synthetic vocal tracts. Negative pressure wound therapy is the common treatment for false vocal cord perforation and abscess (Makihara et al., 2020).

Synthetic vocal tract

A new standardized method for developing a synthetic vocal tract is the 3D vocal tract model with binary conversion (Inohara et al., 2010). In this procedure, MRI scanning of the vocal cord is employed. The data captured in different positions during sound production provide the 2D vocal tract shape data (Mullen et al., 2007). The steps involved in developing the synthetic vocal tract include generating a digital waveguide mesh and generating synthetic speech with a computer algorithm, among others.
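The binary conversion step can be illustrated with a minimal sketch: each pixel of an image slice is marked as airway or tissue by comparing its intensity with a threshold, and the airway pixels are counted to estimate a cross-sectional area. The sketch assumes that air appears darker than tissue; the threshold, pixel size and intensity values are hypothetical and do not reproduce the standardized threshold of Inohara et al.

```python
def binarize(slice_2d, threshold):
    """Return a mask with 1 where intensity is below the threshold
    (assumed to be air) and 0 elsewhere (assumed to be tissue)."""
    return [[1 if value < threshold else 0 for value in row] for row in slice_2d]


def airway_area_mm2(mask, pixel_area_mm2):
    """Cross-sectional airway area: number of airway pixels times pixel area."""
    return sum(sum(row) for row in mask) * pixel_area_mm2


# Hypothetical 4 x 4 image slice; the low values in the centre represent the airway.
image_slice = [
    [200, 180, 190, 210],
    [190,  20,  30, 200],
    [185,  25,  15, 195],
    [205, 190, 180, 200],
]
mask = binarize(image_slice, threshold=100)          # hypothetical threshold
area = airway_area_mm2(mask, pixel_area_mm2=0.25)    # hypothetical pixel size
```

Repeating the area calculation over successive slices along the tract would yield the kind of area function that tube and waveguide models take as input.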

Various innovative research models

Computer models of the vocal tract are extensively used to study the vocal ability of Neanderthals (Bunton & Story, 2009). One such model specifies the vocal tract by its area functions. Physiological changes of the vocal tract are also explored when constructing 3D-printed vocal tract models based on MRI. The LF model, which is commonly used in designing speech synthesis systems, is a mathematical model used to adjust voice quality during synthesis (Fant, 2013). Another technique employed in designing the synthetic vocal tract is the two-mass model, which evaluates the biomechanical interrelation between vocal fold mass, stiffness and subglottal pressure.
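The relation between vocal fold mass and stiffness can be illustrated with the natural frequency of a single undamped mass-spring oscillator, the basic element from which a two-mass model is built. This sketch deliberately omits the coupling between the two masses, damping and subglottal pressure, so it is not a two-mass model in itself; the function name and parameter values are illustrative assumptions, not data from the reviewed articles.

```python
import math


def natural_frequency_hz(mass_kg, stiffness_n_per_m):
    """Natural frequency of an undamped mass-spring oscillator:
    f = sqrt(k / m) / (2 * pi)."""
    if mass_kg <= 0 or stiffness_n_per_m <= 0:
        raise ValueError("mass and stiffness must be positive")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Illustrative (assumed) values for the lower and upper masses of one vocal fold.
lower_hz = natural_frequency_hz(mass_kg=0.125e-3, stiffness_n_per_m=80.0)
upper_hz = natural_frequency_hz(mass_kg=0.025e-3, stiffness_n_per_m=8.0)
```

Because the frequency depends on the square root of stiffness over mass, a fourfold increase in stiffness doubles the frequency, while a fourfold increase in mass halves it.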

Current trends

The current voice conversion approach uses a Gaussian mixture model (GMM) for the restoration of speech (Nakamura, Toda, Saruwatari, & Shikano, 2012). Speech synthesis is another method currently in use, applied to vocal disability through voice banking and reconstruction (Yamagishi, Veaux, King, & Renals, 2012). During the COVID-19 pandemic, transcervical laryngeal ultrasonography has been used in laryngeal evaluation (Noel, Orloff, & Sung, 2020).
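One common formulation of GMM-based voice conversion maps a source feature x to the expected target feature y under a joint mixture model. The sketch below applies that mapping to a single scalar feature; it is not necessarily the formulation used by Nakamura et al., the mixture parameters are hypothetical, and the training step that would estimate them from paired recordings is omitted.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Expected target feature given source feature x:
    sum over m of P(m | x) * (mean_y + cov_yx / var_x * (x - mean_x))."""
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"])
                   for c in components]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("x is too far from every mixture component")
    converted = 0.0
    for likelihood, c in zip(likelihoods, components):
        posterior = likelihood / total
        local_map = c["mean_y"] + c["cov_yx"] / c["var_x"] * (x - c["mean_x"])
        converted += posterior * local_map
    return converted


# Hypothetical two-component joint model of source (x) and target (y) features.
components = [
    {"weight": 0.5, "mean_x": 0.0, "mean_y": 10.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mean_x": 4.0, "mean_y": 20.0, "var_x": 1.0, "cov_yx": 0.5},
]
y_mid = gmm_convert(2.0, components)    # source value halfway between the components
y_near = gmm_convert(0.0, components)   # source value at the first component mean
```

Each component contributes a local linear mapping, and the posterior weights blend these mappings smoothly, which is why a source value halfway between two components yields a target halfway between their local predictions.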

Future possibilities

The future promise in the advancement of the synthetic vocal tract is the tissue-engineered larynx, which has been tried experimentally for cancer patients (Hamilton & Birchall, 2017). Stem cell therapy is another technological breakthrough that could have a big impact on vocal cord treatment. The future also looks bright with the possibility of synthetic vocal cords based on high-speed video endoscopy. Propylene glycol gel PRG is a synthetic vocal tract. Methods and technologies for rejuvenating the human voice, restoring the flexibility that the vocal cord loses with disease or age, are emerging as a result of collaboration between scientists and physicians (Miyamaru et al., 2019).

Results and Discussion

Synthetic vocal tracts are computer-powered devices capable of translating brain activity into synthesized speech by decoding the movement of the muscles involved in vocalization using special algorithms. According to Mullen et al. and Inohara et al., the new standardized method of designing a synthetic vocal tract involves binary conversion of 3D and 2D vocal tract models obtained by MRI and condensed (Inohara et al., 2010; Mullen et al., 2007). Khan et al. have done extensive research on this subject and concur that laryngectomy is the preferred treatment for vocal disorders such as vocal cord polyps, cysts, fibromatosis, stridor and abscess (Khan et al., 2019). In contrast, Makihara et al. suggested negative pressure wound therapy as the preferred treatment for vocal cord perforation and abscess (Makihara et al., 2020).

Table 1: Quality of study for articles used in the review

S. no.	Author	Type of study	Results	Quality of study
1	(Cortelli et al., 2019)	Systematic review	Continuous positive airway pressure and tracheostomy are the prevalent treatments for MSA	Moderate
2	(Heo et al., 2020)	Case control study	It is essential to evaluate vocal cord movement when treating MSA patients with nocturnal stridor	Moderate
3	(Makihara et al., 2020)	Case series	Negative pressure wound therapy is the common treatment for false vocal cord perforation and abscess	Moderate
4	(Inohara et al., 2010)	Research	Synthetic vocal tract is the 3D vocal tract model with binary conversion	Weak
5	(Mullen et al., 2007)	Systematic review	Vocal tract shape data for 2D	Moderate
6	(Bunton & Story, 2009)	Case control study	Computer model of vocal tract is used to study vocal ability of Neanderthals	Weak
7	(Fant, 2013)	Research	LF model synthesizes different voice qualities	Strong
8	(Nakamura et al., 2012)	Systematic review	GMM used for voice conversion and restoration of speech	Moderate
9	(Yamagishi et al., 2012)	Case series	Speech synthesis technique used in treating vocal disabilities	Moderate
10	(Hamilton & Birchall, 2017)	Case control study	Tissue-engineered larynx for cancer patients in future	Moderate
11	(Khan et al., 2019)	Research	Total laryngectomy with clear margins is needed to avoid local recurrence of aggressive laryngeal fibromatosis	Moderate
12	(Noel et al., 2020)	Systematic review	Transcervical laryngeal ultrasonography is used in laryngeal evaluation	Weak

Yamagishi et al. proposed HMM-based speech synthesis, which uses a text-to-speech technique for voice banking and voice building for people with disordered speech (Yamagishi et al., 2012). For the future of the synthetic vocal tract, the tissue-engineered larynx is the promising development in treatment for patients requiring vocal fold repair and regeneration, and the technology may also advance through synthetic vocal cords based on high-speed video endoscopy. Future advancement in stem cell therapy will have a telling impact on synthetic vocal tract treatment (Hamilton & Birchall, 2017).

Due to time limitations, the study was restricted to only a few of the 100 articles selected. Larger and deeper research is necessary. A more elaborate study on this subject is needed, with a focus on future possibilities in this area given the fast-paced developments in technology, especially in computing. The synthetic vocal tract and its growing significance in offering viable solutions to patients with vocal disorders have to be studied in depth.

The study of the synthetic vocal tract is a nascent field with developing technology. We are yet to discover the full potential of this technique and apply it completely. This tool is significant in the treatment of a wide array of voice disorders, including cancer. Research in this field of biomedical solutions is going on all over the world, and physicians and scientists are working relentlessly to understand its significance and full potential and to exploit it fully for the treatment of multivariate vocal diseases in future.

Conclusion

The synthetic vocal tract is a recently established biomedical tool that has come as a boon in treating patients with severe disabilities. Speech synthesis is evolving as a viable solution as more research is carried out. To understand the full significance of this technique, more research has to be carried out in this field of biomedical engineering. Viable solutions need to be developed so that this novel technique is fully utilized.

Conflict of interest

The authors declare that they have no conflict of interest for this study.

Funding support

The authors declare that they have no funding support for this study.

[1] Mullen, Jack, Howard, David M. & Murphy, Damian T. 2007. Real-Time Dynamic Articulations in the 2-D Waveguide Mesh Vocal Tract Model. IEEE Transactions on Audio, Speech and Language Processing 15(2):577–585.

[2] Fant, G. 2013. Frequency domain analysis of glottal flow: The LF-model revisited. Speech Production and Language. De Gruyter.

[3] Kay Elemetrics Corp. 1999. Multi-speech, Model 3700 CSL for Windows, Models 4100, 4300B; Version 2.3.

[4] Takemoto, Hironori, Kitamura, Tatsuya & Adachi, Seiji. 2006. Changes in vocal tract resonance during a pitch cycle. The Journal of the Acoustical Society of America 120(5):3375.

[5] Choudhari, Sahil & Thenmozhi, M.S. 2016. Occurrence and Importance of Posterior Condylar Foramen. Research Journal of Pharmacy and Technology 9(8):1083.

[6] Hafeez, Nauma & Thenmozhi. 2016. Accessory foramen in the middle cranial fossa. Research Journal of Pharmacy and Technology 9(11):1880.

[7] Kannan, Roghith & Thenmozhi, M.S. 2016. Morphometric Study of Styloid Process and its Clinical Importance on Eagle's Syndrome. Research Journal of Pharmacy and Technology 9(8):1137.

[8] Keerthana, B. & Thenmozhi, M.S. 2016. Occurrence of foramen of huschke and its clinical significance. Research Journal of Pharmacy and Technology 9(11):1835.

[9] Pratha, A. Ashwatha & Thenmozhi, M.S. 2016. A Study of Occurrence and Morphometric Analysis on Meningo Orbital Foramen. Research Journal of Pharmacy and Technology 9(7):880.

[10] Nandhini, J. S. Thaslima, Babu, K. Yuvaraj & Mohanraj, Karthik Ganesh. 2018. Size, Shape, Prominence and Localization of Gerdy's Tubercle in Dry Human Tibial Bones. Research Journal of Pharmacy and Technology 11(8):3604.

[11] Subashri, A. & Thenmozhi, M.S. 2016. Occipital Emissary Foramina in Human Adult Skull and Their Clinical Implications. Research Journal of Pharmacy and Technology 9(6):716.

[12] Krishna, R. Nivesh & Babu, K. Yuvaraj. 2016. Estimation of stature from physiognomic facial length and morphological facial length. Research Journal of Pharmacy and Technology 9(11):2071.

[13] Sriram, Nirisha, Thenmozhi & Yuvaraj, Samrithi. 2015. Effects of Mobile Phone Radiation on Brain: A questionnaire based study. Research Journal of Pharmacy and Technology 8(7):867.

[14] Thejeswar, E. P. & Thenmozhi, M.S. 2015. Educational Research-iPad System vs Textbook System. Research Journal of Pharmacy and Technology 8(8):1158.

[15] Johnson, Jayapriya, Lakshmanan, Ganesh, M, Biruntha, R.M, Vidhyavathi, Kalimuthu, Kohila & Sekar, Durairaj. 2020. Computational identification of MiRNA-7110 from pulmonary arterial hypertension (PAH) ESTs: a new microRNA that links diabetes and PAH. Hypertension Research 43(4):360–362.

[16] Sekar, Durairaj, Lakshmanan, Ganesh, Mani, Panagal & Biruntha, M. 2019. Methylation-dependent circulating microRNA 510 in preeclampsia patients. Hypertension Research 42(10):1647–1648.

[17] Seppan, Prakash, Muhammed, Ibrahim, Mohanraj, Karthik Ganesh, Lakshmanan, Ganesh, Premavathy, Dinesh, Muthu, Sakthi Jothi, Wungpam Shimray, Khayinmi & Sathyanathan, Sathya Bharathy. 2018. Therapeutic potential of Mucuna pruriens (Linn.) on ageing induced damage in dorsal nerve of the penis and its implication on erectile function: an experimental study using albino rats. The Aging Male 1–14.

[18] Menon, Aniruddh & Thenmozhi, M.S. 2016. Correlation between thyroid function and obesity. Research Journal of Pharmacy and Technology 9(10):1568.

[19] Samuel, Ashika Rachael & Thenmozhi, M.S. 2015. Study of impaired vision due to Amblyopia. Research Journal of Pharmacy and Technology 8(7):912.

[20] Cortelli, Pietro, Calandra-Buonaura, Giovanna, Benarroch, Eduardo E., Giannini, Giulia, Iranzo, Alex, Low, Phillip A., Martinelli, Paolo, Provini, Federica & Quinn, Niall. 2019. Stridor in multiple system atrophy. Neurology 93(14):630–639.

[21] Heo, S. J., Kim, J. S., Lee, B. J. & Park, D. 2020. Isolated stridor without any other sleeping breathing disorder diagnosed using drug-induced sleep endoscopy in a patient with multiple system atrophy: A case report. Medicine.

[22] Khan, Nayel, Clemens, Mark, Liu, Jun, Garden, Adam S., Lawyer, Anne, Weber, Randal, Gunn, G. Brandon, Morrison, William H. & Kupferman, Michael E. 2019. The role of salvage surgery with interstitial brachytherapy for the Management of Regionally Recurrent Head and Neck Cancers. Cancers of the Head and Neck 4(1).

[23] Makihara, Seiichiro, Kariya, Shin, Naito, Tomoyuki, Uraguchi, Kensuke, Matsumoto, Junya, Noda, Yohei, Okano, Mitsuhiro & Nishizaki, Kazunori. 2020. False vocal cord perforation with abscess treated by negative pressure wound therapy. SAGE Open Medical Case Reports 8:2050313X2091541.

[24] Inohara, Ken, Sumita, Yuka I., Ohbayashi, Naoto, Ino, Shuichi, Kurabayashi, Tohru, Ifukube, Tohru & Taniguchi, Hisashi. 2010. Standardization of Thresholding for Binary Conversion of Vocal Tract Modeling in Computed Tomography. Journal of Voice 24(4):503–509.

[25] Nakamura, Keigo, Toda, Tomoki, Saruwatari, Hiroshi & Shikano, Kiyohiro. 2012. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication 54(1):134–146.

[26] Yamagishi, Junichi, Veaux, Christophe, King, Simon & Renals, Steve. 2012. Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology 33(1):1–5.

[27] Noel, Julia E., Orloff, Lisa A. & Sung, Kwang. 2020. Laryngeal Evaluation during the COVID-19 Pandemic: Transcervical Laryngeal Ultrasonography. Otolaryngology–Head and Neck Surgery 163(1):51–53.

[28] Hamilton, Nick J. I. & Birchall, Martin A. 2017. Tissue-Engineered Larynx: Future Applications in Laryngeal Cancer. Current Otorhinolaryngology Reports 5(1):42–48.

[29] Miyamaru, Satoru, Kumai, Yoshihiko, Murakami, Daizo, Kodama, Narihiro, Miyamoto, Takumi, Yumoto, Eiji & Orita, Yorihisa. 2019. Phonatory function in patients with well-differentiated thyroid carcinoma following meticulous resection of tumors adhering to the recurrent laryngeal nerve. International Journal of Clinical Oncology 24(12):1536–1542.

[30] Bunton, Kate & Story, Brad H. 2009. Identification of synthetic vowels based on selected vocal tract area functions. The Journal of the Acoustical Society of America 125(1):19–22.