Synfonica LLC (formerly NovaSpeech LLC) was founded in 2004 to develop next-generation text-to-speech and related technologies.
Building on linguistic and perceptual speech models developed by President and Chief Technology Officer Dr. Susan Hertz and her collaborators over more than forty years, Synfonica is developing new technologies for speech synthesis. Our work is founded on a knowledge-based approach and a set of core innovations, which are described on our Technology page and in our publications.
Our research and development projects build on our team's extensive experience and expertise in multi-language and multi-voice speech synthesis and other areas, as described more fully in our team section.
Dr. Sue Hertz
President and Chief Technology Officer
Dr. Sue Hertz has more than forty years' experience in multi-language text-to-speech synthesis, spanning text analysis and speech generation and covering both rule-based and concatenative methods. She has extensive business, technical, and research experience in these areas; she also has a strong background in speech processing, linguistics, acoustic phonetics, speech perception, and software development.
In 1983, Dr. Hertz founded Eloquent Technology, Inc. (ETI), a text-to-speech software company, and transformed it over a number of years from a basement operation into a profitable, worldwide leader in multi-language text-to-speech technology.
As President and Chief Technology Officer at ETI, Dr. Hertz oversaw all of the company's technical and business operations throughout its seventeen-year existence, and she invented or designed much of its core technology. This technology included the multi-voice ETI-Eloquence text-to-speech system for thirteen languages/dialects and the sophisticated Delta programming language and interactive environment used to develop the ETI-Eloquence synthesis rules. The ETI-Eloquence product was known for its extremely small memory footprint, its flexibility, its exceptionally accurate text processing, and its consistent and intelligible speech output. ETI's history and technology are detailed as part of the Smithsonian Speech Synthesis History Project.
In January 2001, Dr. Hertz sold Eloquent Technology, Inc. to SpeechWorks International, Inc. (now part of Nuance Communications, Inc.). After the ETI-SpeechWorks merger, Dr. Hertz worked for a year and a half in the SpeechWorks Ithaca office as Chief Scientist and Executive Director of text-to-speech technologies. Since 1979, she has also held positions in the Linguistics Department at Cornell University. She is currently an Adjunct Professor in the department, teaching occasional graduate-level courses in speech synthesis and phonetics. In addition, she has been the Principal Investigator or Project Director on 17 government grants or contracts in the area of speech synthesis.
In 2004, Dr. Hertz founded NovaSpeech to exploit her finding that many segments in natural speech can be replaced with formant-synthesized ones with little if any degradation in speech quality, naturalness, or intelligibility—see Hertz (2002). This finding eventually resulted in the company's novel hybrid speech synthesis system (see our Technology page). More recently, the company's focus has shifted to an innovative all-formant approach that is a natural outgrowth of the hybrid approach; Dr. Hertz is leading the development of Synfonica's all-formant technology.
Among Dr. Hertz's current research interests is the development of a model that accounts for how listeners parse the continuous speech stream into linguistic units like phonemes, syllables, and words. This model has not only provided an important framework for Synfonica's synthesis system, but also promises to advance other areas of speech processing, such as speech recognition. The model is also at the foundation of an extensive set of educational materials that Dr. Hertz is developing with Dr. Gibson for teaching about methods and models for speech research.
In her other life, Dr. Hertz is an oil painter and photographer—see suehertzfineart.com. As explained on her art site, she is fascinated by the similarities in how visual and auditory stimuli are parsed into meaningful objects, be they particular kinds of trees or specific words.
Selected Publications and Patent Citations
Hertz, S.R. (1982) From text to speech with SRS, Journal of the Acoustical Society of America 72, 1155-1170.
Hertz, S.R. (1983) The “morphology” of English spelling: a look at the SRS text-modification rules for English, Working Papers of the Cornell Phonetics Laboratory 1, 17-28.
Hertz, S.R., Kadin, J. and Karplus, K. (1985) The Delta rule development system for speech synthesis from text, Proceedings of the IEEE 73, no. 11, Special Issue on Man-Machine Speech Communication, 1589-1601.
Hertz, S.R. (1990) The Delta programming language: an integrated approach to non-linear phonology, phonetics, and speech synthesis, in J. Kingston and M. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, Cambridge University Press.
Hertz, S.R. (1990) A modular approach to multi-dialect and multi-language speech synthesis using the Delta System, Proceedings of the Workshop on Speech Synthesis, European Speech Communication Association, Autrans, France, 225-228.
Hertz, S.R. (1991) Streams, phones, and transitions: toward a phonological and phonetic model of formant timing, Journal of Phonetics 19, Special Issue on Speech Synthesis and Phonetics, edited by R. Carlson.
Clements, G.N. and Hertz, S.R. (1996) An integrated approach to phonology and phonetics, in J. Durand and B. Laks (eds.), Current Trends in Phonology: Models and Methods, CNRS, Paris X and Univ. of Salford Publications.
Hertz, S.R. (1997) The technology of text-to-speech. Speech Technology, CI Publishing, 18-21.
Hertz, S.R., Younes, R.J., and Zinovieva, N. (1999) Language-universal and language-specific components in the multi-language ETI-Eloquence text-to-speech system. Proceedings of the 14th International Congress of Phonetic Sciences, 2283-2286.
Hertz, S.R., Younes, R.J., and Hoskins, S.R. (2000) Space, speed, quality, and flexibility: Advantages of rule-based speech synthesis. Proceedings of AVIOS 2000, San Jose, CA May 22-24, 217-227.
Hertz, S.R. (2002) Integration of rule-based formant synthesis and waveform concatenation: a hybrid approach to text-to-speech synthesis, Proc. IEEE 2002 Workshop On Speech Synthesis.
Hertz, S.R., Spencer, I.C., Church, T.F., and Goldhor, R. (2004) Perceptual consequences of nasal surrogates in English: Implications for speech synthesis, Poster presented at the 147th Meeting of the Acoustical Society of America.
Hertz, S.R. and Goldhor, R. (2004) When can speech segments serve as surrogates?, Proc. From Sound to Sense: 50+ Years of Discoveries in Speech Communication.
Hertz, S.R. (2006) A model of the regularities underlying speaker variation: Evidence from hybrid synthesis, Proc. Interspeech 2006.
Hertz, S.R., Gibson, M., Glatthorn, N., Hegde, P., Mills, H., and Spencer, I. (2008) The role of prosody in speech parsing, poster presented at Experimental and Theoretical Advances in Prosody, Cornell.
Hertz, S.R. and Mills, H.G. (2010) System and method for hybrid speech synthesis, European Patent number EP 2140447 B1.
Hertz, S.R. and Mills, H.G. (2011) System and method for hybrid speech synthesis, United States Patent number 7953600.
Dr. Masayuki Gibson
Dr. Masayuki Gibson joined the company part-time in 2007 while completing their Ph.D. in Linguistics, and they joined the team full-time in 2011. Masayuki is responsible for research and development related to our knowledge-based text-to-speech rules, our speech therapy application, and our expressive speech synthesis project. They are the lead developer of the back end (speech generation component) of Synfonica's text-to-speech system, drawing on their extensive knowledge of speech perception, acoustic phonetics, phonology, and linguistics more generally. Together with Dr. Hertz, they are also working on a set of educational materials for teaching about methods and models for speech research.
Isaac Spencer has been working at the company since its inception in 2004. Isaac is the primary developer of the front end (text analysis component) of Synfonica's knowledge-based text-to-speech synthesis system, implementing rules for text normalization, phrase prediction, morphological analysis, lexical stress prediction, letter-to-phoneme conversion, and more. Having expertise in both linguistics and software development, he also contributes to many of the company's other research and development activities.
Patrick Hegde has been working at the company since 2006. Patrick has a background in software development and linguistics and works on our desktop, mobile, and web applications. He supports Synfonica's research and development activities by implementing software that facilitates text-to-speech rule development, perceptual experiments, and speech data analysis. He is also the main developer of our speech therapy application.