Synthetic Vocal Tracts - A Review
Abstract
Synthetic vocal tracts are computer-driven devices capable of translating brain activity into synthesized speech by decoding the movements of the muscles involved in vocalization. A new standardized method for developing a synthetic vocal tract is the 3D vocal tract model with binary conversion. Typical features considered when creating a synthetic vocal tract include fundamental frequency, perturbation measures, jitter and change in pitch. The tissue-engineered larynx is a promising development for patients requiring vocal fold repair and regeneration. The future of this technology may lie in synthetic vocal cords based on high-speed video endoscopy. This review, which elaborates on the principle of the synthetic vocal tract, was based on articles obtained from various platforms. The quality of each article was assessed using a quality assessment tool and graded as strong, moderate or weak. The aim of this study is to understand the concept of the synthetic vocal tract and its significance. The synthetic vocal tract is a recently established biomedical tool that has come as a boon in treating patients with severe disabilities. Speech synthesis is evolving as a viable solution as more research is carried out. To understand the full significance of this technique, more research is needed in this field of biomedical engineering, and viable solutions need to be developed so that the technique can be fully utilized.
Keywords
vocal cord disorders, synthetic vocal tracts, voice restoration, speech synthesis, tissue-engineered larynx
Introduction
Synthetic vocal tracts are computer-driven devices capable of translating brain activity into synthesized speech by decoding the movements of the muscles involved in vocalization. MRI scans of the vocal tract captured at different vowel positions provide the data for a one-dimensional area function. The complex cross-sectional shapes recorded while different vowel sounds are produced are condensed and merged to provide vocal tract shape data for 2D (Mullen, Howard, & Murphy, 2007). A digital waveguide mesh is then created from this shape data, with the 3D model generated from the 2D data. The digital waveguide mesh is driven by a laryngeal excitation source and provides the acoustic output.
For analyzing laryngeal sources, the Liljencrants-Fant (LF) model is commonly used in speech synthesis systems. It serves as a mathematical model for adjusting voice quality in the synthesized vocal tract (Fant, 2013). The typical features considered when creating a synthetic vocal tract are fundamental frequency, perturbation measures, jitter (cycle-to-cycle change in pitch), shimmer (change in amplitude over time), harmonic-to-noise ratio and so on. Some studies used mel-frequency cepstral coefficients (MFCCs) and their derivatives to represent the speech signal in statistical speech signal processing systems.
Assessment tools: the Multi-Dimensional Voice Program (MDVP) supplies the measurements and gives information related to the vocal source (Kay Elemetrics Corp, 1999). The articulatory configuration of the vocal tract is related to vocal tract articulation at the vocal folds. For acoustic discrimination of pathological voice, an investigation of vocal tract characteristics was also considered. The observation of how the shape of the vocal tract of sopranos changes when singing different notes is also taken into account, and another important parameter considered is physiological vowel production (Takemoto, Kitamura, & Adachi, 2006).
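The perturbation measures listed above can be made concrete with a short calculation. The sketch below is a minimal illustration, not the procedure of any cited study: it assumes that glottal cycle lengths and peak amplitudes have already been extracted from a recording, and the function names and sample values are hypothetical. It uses the common "local" definitions, in which jitter is the mean absolute difference between consecutive cycle lengths divided by the mean cycle length, and shimmer is the same ratio computed on cycle amplitudes.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def fundamental_frequency(periods_s):
    """Fundamental frequency in Hz, taken as the reciprocal of the mean cycle length."""
    return 1.0 / mean(periods_s)


# Hypothetical cycle lengths (seconds) and peak amplitudes (arbitrary units).
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.82, 0.79, 0.81]

jitter_percent = local_perturbation(periods)      # pitch perturbation
shimmer_percent = local_perturbation(amplitudes)  # amplitude perturbation
f0_hz = fundamental_frequency(periods)

print(f"F0 = {f0_hz:.1f} Hz, jitter = {jitter_percent:.2f} %, shimmer = {shimmer_percent:.2f} %")
```

A perfectly periodic signal gives zero for both measures, which is a convenient sanity check before applying the functions to real data.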
Over the past years, our team has carried out research in osteology on the importance of the posterior condylar canal (Choudhari & Thenmozhi, 2016), accessory foramina present in the middle cranial fossa (Hafeez & Thenmozhi, 2016), the clinical importance of the styloid process (Kannan & Thenmozhi, 2016), the occurrence of the foramen of Huschke (Keerthana & Thenmozhi, 2016), morphometric analysis of the foramen meningo-orbitale (Pratha & Thenmozhi, 2016), Gerdy’s tubercle in the tibia (Nandhini, Babu, & Mohanraj, 2018), the clinical implication of the occipital emissary foramen (Subashri & Thenmozhi, 2016) and stature estimation from facial lengths (Krishna & Babu, 2016). Other work has covered the radiation effects of mobile phones on the brain (Sriram, Thenmozhi, & Yuvaraj, 2015), the use of iPads versus textbooks in education (Thejeswar & Thenmozhi, 2015), miRNA in hypertension (Johnson et al., 2020), microRNA in preeclampsia patients (Sekar, Lakshmanan, Mani, & Biruntha, 2019), animal studies (Seppan et al., 2018), and a few other fields such as thyroid function and obesity (Menon & Thenmozhi, 2016) and vision impairment in amblyopia (Samuel & Thenmozhi, 2015). Little information is available on the current topic of synthetic vocal tracts; hence this review article elaborates on the principle of the synthetic vocal tract. The aim of this study is to understand the concept of the synthetic vocal tract and its significance.
Methodology
The review was based on articles obtained from platforms such as PubMed, PubMed Central and Google Scholar. The search was restricted to articles published between 1970 and 2020. The inclusion criteria were original research papers and articles that discussed pros and cons. Retracted articles and articles in other languages were excluded. All articles were selected for their relevance to synthetic vocal tracts.
Articles were screened by title, abstract and full text. A search of the databases on the topic of the synthetic vocal tract returned more than 100 articles, which were filtered by title, abstract and full text and then reviewed. Key words of this study, such as vocal cord disorders, treatments and synthetic vocal tracts, were used to query the databases. The quality of each article was assessed using a quality assessment tool and graded as strong, moderate or weak (Table 1).
Vocal cord
Diseases
Nocturnal stridor is a breathing disorder commonly found in patients with multiple system atrophy (MSA). An improved understanding of this disorder is essential, since nocturnal stridor carries an increased risk of sudden death. It is essential to be familiar with the additional features of MSA that are relevant to its diagnosis and treatment. Continuous positive airway pressure and tracheostomy are the prevalent treatments (Cortelli et al., 2019). When treating MSA patients with nocturnal stridor, it is essential to confirm the diagnosis by analyzing movements of the vocal cord using drug-induced sleep endoscopy (DISE) and by conducting polysomnography (Heo, Kim, Lee, & Park, 2020).
Intravascular papillary endothelial hyperplasia is a vascular tumor which causes loss or deterioration of voice. It rarely arises from the true vocal cord. Inducible laryngeal obstruction (ILO) is another disorder, caused by inappropriate obstruction at the level of the true vocal folds or supraglottic structures in response to a trigger or stimulus; when the trigger is exercise, it is termed exercise-induced laryngeal obstruction (EILO). Aggressive laryngeal fibromatosis is a lesion seen in patients presenting with dyspnea and stridor. In treating these cases, a total laryngectomy with clear margins is needed to avoid a high risk of local recurrence (Khan et al., 2019).
Treatments
The treatment for vocal cord paralysis depends on several factors. It is very important to analyze the cause and symptoms before advising treatments such as voice therapy, surgery or both. Vocal fold nodules, polyps and cysts are diagnosed with ultrasonography techniques, and treatment involves laryngeal microsurgery. Benign vocal cord lesions are another vocal cord disorder; treatment consists of voice therapy and vocal improvement. Paradoxical vocal cord motion is a disorder which can be helped with synthetic vocal tracts. Negative pressure wound therapy is the common treatment for false vocal cord perforation and abscess (Makihara et al., 2020).
Synthetic vocal tract
A new standardized method for developing a synthetic vocal tract is the 3D vocal tract model with binary conversion (Inohara et al., 2010). In this procedure, MRI scanning of the vocal tract is employed. The data captured at different positions during sound production provide the vocal tract shape data for 2D (Mullen et al., 2007). The steps involved in developing the synthetic vocal tract include generating a digital waveguide mesh and generating synthetic speech using a computer algorithm.
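To illustrate how an area function feeds a waveguide model, the sketch below uses the one-dimensional case rather than the 2D or 3D mesh described above; this is a deliberate simplification and not the method of the cited work. It assumes the tract is approximated by a chain of short uniform tubes, where the fraction of a pressure wave reflected at each junction follows from the neighbouring cross-sectional areas. The function name and the area values are hypothetical.

```python
def reflection_coefficients(areas_cm2):
    """Pressure-wave reflection coefficient at each junction between adjacent tube sections.

    For sections with areas A_k and A_(k+1), r_k = (A_k - A_(k+1)) / (A_k + A_(k+1)),
    so a constriction gives r_k > 0 and a widening gives r_k < 0.
    """
    if any(a <= 0 for a in areas_cm2):
        raise ValueError("cross-sectional areas must be positive")
    return [(a - b) / (a + b) for a, b in zip(areas_cm2, areas_cm2[1:])]


# Hypothetical area function from glottis to lips (cm^2), not measured data.
areas = [0.5, 1.0, 2.0, 2.0, 4.0, 3.0]
for k, r in enumerate(reflection_coefficients(areas)):
    print(f"junction {k}: r = {r:+.3f}")
```

A uniform tube yields zero at every junction, so no energy is reflected inside it; the vowel-specific acoustics come entirely from how the areas vary along the tract.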
Various innovative research models
Computer models of the vocal tract are extensively used to study the vocal ability of Neanderthals. One such model specifies the vocal tract through its area functions. Physiological changes of the vocal tract are also explored when constructing a 3D-printed vocal tract model based on MRI. The LF model, which is commonly used in designing speech synthesis systems, serves as a mathematical model for adjusting voice quality during synthesis (Fant, 2013). Another technique employed in designing the synthetic vocal tract is the two-mass model, which evaluates the biomechanical interrelation between vocal fold masses, stiffness and subglottal pressure.
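The relation between mass and stiffness in such lumped models can be shown with the simplest possible case. The sketch below treats one vocal fold mass as an undamped mass on a spring and computes its natural frequency, f = sqrt(k / m) / (2π). It leaves out the coupling between the two masses, damping and the subglottal pressure that drives the oscillation, and the parameter values are illustrative assumptions rather than figures from the cited studies.

```python
import math


def natural_frequency_hz(mass_kg, stiffness_n_per_m):
    """Natural frequency of an undamped mass-spring oscillator: f = sqrt(k / m) / (2 * pi)."""
    if mass_kg <= 0 or stiffness_n_per_m <= 0:
        raise ValueError("mass and stiffness must be positive")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Illustrative values only (assumed, not taken from the source).
mass = 0.125e-3    # kg
stiffness = 80.0   # N/m

f = natural_frequency_hz(mass, stiffness)
print(f"natural frequency: {f:.1f} Hz")

# A stiffer fold of the same mass oscillates faster; a heavier fold of the same stiffness, slower.
assert natural_frequency_hz(mass, 4 * stiffness) > f
assert natural_frequency_hz(4 * mass, stiffness) < f
```

Because the frequency depends on the ratio k / m, quadrupling the stiffness doubles the frequency, while quadrupling the mass halves it.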
Current trends
The current voice conversion approach uses a Gaussian mixture model (GMM) for the restoration of speech (Nakamura, Toda, Saruwatari, & Shikano, 2012). Speech synthesis is another method that is currently employed; it is used in treating vocal disability and in voice banking and reconstruction (Yamagishi, Veaux, King, & Renals, 2012). During the COVID-19 pandemic, transcervical laryngeal ultrasonography has been used for laryngeal evaluation (Noel, Orloff, & Sung, 2020).
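As a rough illustration of GMM-based conversion, the sketch below implements the widely used joint-density mapping for a single scalar feature: each mixture component contributes a linear regression from source to target, weighted by the posterior probability that the source value belongs to that component. This is a generic textbook form and is not claimed to be the exact formulation of the cited study. The class name, field names and parameter values are hypothetical, and a real system would work on multidimensional spectral features with parameters trained on parallel speech data.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float   # mixture weight
    mean_x: float   # source-feature mean
    mean_y: float   # target-feature mean
    var_x: float    # source-feature variance
    cov_yx: float   # covariance between target and source features


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a source feature x to the expected target feature under the joint GMM."""
    likelihoods = [c.weight * gaussian_pdf(x, c.mean_x, c.var_x) for c in components]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]
    return sum(
        p * (c.mean_y + c.cov_yx / c.var_x * (x - c.mean_x))
        for p, c in zip(posteriors, components)
    )


# Hypothetical two-component model (values chosen for illustration only).
model = [
    Component(weight=0.5, mean_x=-1.0, mean_y=0.0, var_x=1.0, cov_yx=0.5),
    Component(weight=0.5, mean_x=2.0, mean_y=3.0, var_x=1.0, cov_yx=0.8),
]
print(convert(0.5, model))
```

With a single component the mapping reduces to ordinary linear regression; the mixture lets different regions of the source feature space use different regressions, blended smoothly by the posteriors.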
Future possibilities
The future promise in the advancement of the synthetic vocal tract is the tissue-engineered larynx, which has been tried experimentally for cancer patients (Hamilton & Birchall, 2017). Stem cell therapy is another technological breakthrough that could have a big impact on vocal cord treatment. The future also looks bright with the possibility of synthetic vocal cords based on high-speed video endoscopy. Propylene glycol gel (PRG) is a synthetic vocal tract. Methods and technologies for rejuvenating the human voice by restoring the flexibility that the vocal cord loses with disease or age are emerging as a result of collaboration between scientists and physicians (Miyamaru et al., 2019).
Results and Discussion
Synthetic vocal tracts are computer-driven devices capable of translating brain activity into synthesized speech by decoding the movement of the muscles involved in vocalization using special algorithms. According to Mullen et al. and Inohara et al., the new standardized method of designing a synthetic vocal tract involves binary conversion of 3D and 2D vocal tract models derived from condensed MRI data (Inohara et al., 2010; Mullen et al., 2007). Khan et al. have done extensive research on this subject and concur that laryngectomy is the preferred treatment for vocal disorders such as vocal cord polyps, cysts, fibromatosis, stridor and abscess (Khan et al., 2019). In contrast, Makihara et al. suggested negative pressure wound therapy as the preferred treatment for vocal cord perforation and abscess (Makihara et al., 2020).
S.no | Author | Type of study | Results | Quality of study
---|---|---|---|---
1 | Cortelli et al., 2019 | Systematic review | Continuous positive airway pressure and tracheostomy are the prevalent treatments for MSA | Moderate
2 | Heo et al., 2020 | Case control study | It is essential to evaluate vocal cord movement when treating MSA patients with nocturnal stridor | Moderate
3 | Makihara et al., 2020 | Case series | Negative pressure wound therapy is the common treatment for false vocal cord perforation and abscess | Moderate
4 | Inohara et al., 2010 | Research | The new standardized method for the synthetic vocal tract is the 3D vocal tract model with binary conversion | Weak
5 | Mullen et al., 2007 | Systematic review | MRI data provide vocal tract shape data for 2D | Moderate
6 | | Case control study | Computer models of the vocal tract are used to study the vocal ability of Neanderthals | Weak
7 | Fant, 2013 | Research | The LF model synthesizes different voice qualities | Strong
8 | Nakamura et al., 2012 | Systematic review | GMM is used for voice conversion and restoration of speech | Moderate
9 | Yamagishi et al., 2012 | Case series | Speech synthesis is used in treating vocal disabilities | Moderate
10 | Hamilton & Birchall, 2017 | Case control study | Tissue-engineered larynx for cancer patients in future | Moderate
11 | Khan et al., 2019 | Research | Total laryngectomy with clear margins is needed to avoid local recurrence of aggressive laryngeal fibromatosis | Moderate
12 | Noel et al., 2020 | Systematic review | Transcervical laryngeal ultrasonography is used in laryngeal evaluation | Weak
Yamagishi et al. proposed HMM-based speech synthesis, which uses a text-to-speech technique for voice banking and voice building for people with disordered speech (Yamagishi et al., 2012). Regarding the future of the synthetic vocal tract, the tissue-engineered larynx is suggested as a promising development for patients requiring vocal fold repair and regeneration, and the future of this technology may lie in synthetic vocal cords based on high-speed video endoscopy. Future advances in stem cell therapy will have a telling impact on synthetic vocal tract treatment (Hamilton & Birchall, 2017).
Due to time limitations, the study was restricted to only a few of the 100 articles selected. Larger and deeper research is necessary. A more elaborate study on this subject is needed, with a focus on future possibilities in this area given the fast-paced developments in technology, especially in computing. The synthetic vocal tract and its growing significance in offering viable solutions to patients with vocal disorders have to be studied in depth.
The synthetic vocal tract is a nascent technique whose underlying technology is still developing. We are yet to discover its full potential and apply it completely. This tool is significant for the treatment of a wide array of voice disorders, including those caused by cancer. Research in this field of biomedical solutions is going on all over the world, and physicians and scientists are working relentlessly to understand its significance and full potential and to exploit it for the treatment of a variety of vocal diseases in future.
Conclusion
The synthetic vocal tract is a recently established biomedical tool that has come as a boon in treating patients with severe disabilities. Speech synthesis is evolving as a viable solution as more research is carried out. To understand the full significance of this technique, more research is needed in this field of biomedical engineering. Viable solutions need to be developed so that this novel technique can be fully utilized.
Conflict of interest
The authors declare that they have no conflict of interest for this study.
Funding support
The authors declare that they have no funding support for this study.