Synfonica LLC (formerly NovaSpeech LLC) was founded in 2004 to develop next-generation speech technologies and related educational materials.
Building on linguistic and perceptual speech models developed over more than forty years by President and Chief Scientist Dr. Susan Hertz and her collaborators, Synfonica is currently developing new technologies for speech synthesis. Our work is founded on a knowledge-based approach and a set of core innovations, which are described on our Technology page and in our publications.
Our research and development projects build on our team's extensive experience and expertise in multi-language and multi-voice speech synthesis, signal processing, and other areas, as described more fully in our team section.
Dr. Sue Hertz
President and Chief Scientist
Dr. Susan Hertz has more than forty years' experience in multi-language text-to-speech synthesis. This covers both text analysis and speech generation, and both rule-based and concatenative methods. She has extensive business, technical, and research experience in these areas, as well as a strong background in speech processing, linguistics, acoustic phonetics, speech perception, and software development.
In 1983, Dr. Hertz founded Eloquent Technology, Inc. (ETI), a text-to-speech software company, and transformed it over a number of years from a basement operation into a profitable, worldwide leader in multi-language text-to-speech technology.
As President and Chief Technology Officer at ETI, Dr. Hertz oversaw all of the company's technical and business operations throughout its seventeen-year existence, and she invented or designed much of its core technology. This technology included the multi-voice ETI-Eloquence text-to-speech system for thirteen languages/dialects and the sophisticated Delta programming language and interactive environment used to develop the ETI-Eloquence synthesis rules. The ETI-Eloquence product was known for its extremely small memory footprint, its flexibility, its exceptionally accurate text processing, and its consistent and intelligible speech output.
In January 2001, Dr. Hertz sold Eloquent Technology, Inc. to SpeechWorks International, Inc. (now part of Nuance Communications, Inc.). After the ETI-SpeechWorks merger, Dr. Hertz worked for a year and a half in the SpeechWorks Ithaca office as Chief Scientist and Executive Director of text-to-speech technologies. Since 1979, she has also held positions in the Linguistics Department at Cornell University. She is currently an Adjunct Professor in the department, teaching occasional graduate-level courses in speech synthesis and phonetics. In addition, she has been the Principal Investigator or Project Director on 14 government grants or contracts in the area of speech synthesis.
In 2004, Dr. Hertz founded NovaSpeech to exploit her finding that many segments in natural speech can be replaced with formant-synthesized ones with little if any degradation in speech quality, naturalness, or intelligibility—see Hertz (2002). This finding eventually resulted in the company's novel hybrid speech synthesis system (see our Technology page). More recently, the company's focus has shifted to an innovative all-formant approach that is a natural outgrowth of the hybrid approach; Dr. Hertz is leading the development of Synfonica's all-formant technology.
Among Dr. Hertz's current research interests is the development of a model that accounts for how listeners parse the continuous speech stream into linguistic units such as phonemes, syllables, and words. This model not only provides an important framework for Synfonica's synthesis system but also promises to advance other areas of speech processing, such as speech recognition. The model also underlies an extensive set of educational materials that Dr. Hertz is developing with Dr. Gibson for teaching about methods and models for speech research.
In her other life, Dr. Hertz is an oil painter and photographer—see suehertzfineart.com. As explained on her art site, she is fascinated by the similarities in how visual and auditory stimuli are parsed into meaningful objects, be they particular kinds of trees or specific words.
Selected Publications and Patent Citations
Hertz, S.R. (1982) From text to speech with SRS, Journal of the Acoustical Society of America 72, 1155-1170.
Hertz, S.R. (1983) The “morphology” of English spelling: a look at the SRS text-modification rules for English, Working Papers of the Cornell Phonetics Laboratory 1, 17-28.
Hertz, S.R., Kadin, J. and Karplus, K. (1985) The Delta rule development system for speech synthesis from text, Proceedings of the IEEE 73, no. 11, Special Issue on Man-Machine Speech Communication, 1589-1601.
Hertz, S.R. (1990) The Delta programming language: an integrated approach to non-linear phonology, phonetics, and speech synthesis, in J. Kingston and M. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, Cambridge University Press.
Hertz, S.R. (1990) A modular approach to multi-dialect and multi-language speech synthesis using the Delta System, Proceedings of the Workshop on Speech Synthesis, European Speech Communication Association, Autrans, France, 225-228.
Hertz, S.R. (1991) Streams, phones, and transitions: toward a phonological and phonetic model of formant timing, Journal of Phonetics 19, Special Issue on Speech Synthesis and Phonetics, edited by R. Carlson.
Clements, G.N. and Hertz, S.R. (1996) An integrated approach to phonology and phonetics, in J. Durand and B. Laks (eds.), Current Trends in Phonology: Models and Methods, CNRS, Paris X and Univ. of Salford Publications.
Hertz, S.R. (1997) The technology of text-to-speech, Speech Technology, CI Publishing, 18-21.
Hertz, S.R., Younes, R.J., and Zinovieva, N. (1999) Language-universal and language-specific components in the multi-language ETI-Eloquence text-to-speech system, Proceedings of the 14th International Congress of Phonetic Sciences, 2283-2286.
Hertz, S.R., Younes, R.J., and Hoskins, S.R. (2000) Space, speed, quality, and flexibility: advantages of rule-based speech synthesis, Proceedings of AVIOS 2000, San Jose, CA, May 22-24, 217-227.
Hertz, S.R. (2002) Integration of rule-based formant synthesis and waveform concatenation: a hybrid approach to text-to-speech synthesis, Proc. IEEE 2002 Workshop on Speech Synthesis.
Hertz, S.R., Spencer, I.C., Church, T.F., and Goldhor, R. (2004) Perceptual consequences of nasal surrogates in English: implications for speech synthesis, poster presented at the 147th Meeting of the Acoustical Society of America.
Hertz, S.R. and Goldhor, R. (2004) When can speech segments serve as surrogates?, Proc. From Sound to Sense: 50+ Years of Discoveries in Speech Communication.
Hertz, S.R. (2006) A model of the regularities underlying speaker variation: evidence from hybrid synthesis, Proc. Interspeech 2006.
Hertz, S.R., Gibson, M., Glatthorn, N., Hegde, P., Mills, H., and Spencer, I. (2008) The role of prosody in speech parsing, poster presented at Experimental and Theoretical Advances in Prosody, Cornell.
Hertz, S.R. and Mills, H.G. (2010) System and method for hybrid speech synthesis, European Patent No. EP 2140447 B1.
Hertz, S.R. and Mills, H.G. (2011) System and method for hybrid speech synthesis, United States Patent No. 7,953,600.
Harold Mills
Senior Software Engineer
Harold Mills joined the company in August 2006. He has more than twenty-five years of experience in signal processing, as well as software and hardware development in the areas of speech analysis and synthesis, animal bioacoustics, planetary image processing, and satellite communications.
Harold earned B.S. and M.Eng. degrees in Computer Science and Electrical Engineering, respectively, from Cornell University in 1986 and 1988. From 1987 to 1990 he worked at TRW in Redondo Beach, California, where he participated in software and hardware projects related to space satellite communication systems, including the design and construction of a real-time FFT-based digital spectrum analyzer and a digital interpolation ASIC.
From 1990 until 1994 he worked at the Center for Radiophysics and Space Research at Cornell University, where he wrote image processing and display software for NASA's Mars Observer mission and developed a prototype JPEG image compressor for other missions. During this time, Harold also consulted for Dr. Hertz's former company, Eloquent Technology, Inc., writing a software implementation of a Klatt-style formant-based speech synthesizer.
In 1994 Harold joined the Bioacoustics Research Program of the Cornell Laboratory of Ornithology, where he worked for the next twelve years developing software for collecting and analyzing animal acoustical and other behavioral data. From 1999 through 2005 he was the lead developer of the Raven sound analysis software. During 2000 he led the development, deployment, and operation of the acoustical component of the BirdCast project, a largely automated network of acoustical bird migration monitoring stations that operated in the Delaware River valley during the spring and fall migrations.
At Synfonica, Harold is the lead programmer on all our synthesis projects. He is responsible for the implementation of our synthesis system development kit (SDK), our tablet-based applications, all our specialized signal processing algorithms, and much more.
Dr. Masayuki Gibson
Linguist and Speech Scientist
Dr. Masayuki Gibson joined the company in 2008. He worked part-time as a Research Assistant for three years before joining the team as a full-time Speech Scientist in February 2011. He graduated magna cum laude from Rutgers University in 2003 with a B.A. in Linguistics and Music and received a Ph.D. from Cornell University in Linguistics in 2013. His doctoral dissertation was on the interaction of lexical tone and sentential intonation in tone languages.
At Synfonica, Masayuki is responsible for many aspects of the research and development related to our current speech synthesis projects, including both our knowledge-based synthesis rule development and our speech therapy applications. Together with Dr. Hertz, he is also working on an extensive set of educational materials for teaching about methods and models for speech research.
Isaac Spencer has been working at the company since June 2004, after graduating magna cum laude from Cornell University with a B.A. in Linguistics and English Literature. Isaac supports Synfonica's research activities through a variety of programming tasks—everything from scripting to main-line development. Currently, one of his main tasks is the implementation of the text-analysis component (front end) of our text-to-speech system.
Patrick Hegde has been working as a Research Associate at the company since September 2006, after obtaining a B.A. in Linguistics and History with distinction in all subjects from Cornell University. Patrick contributes to many of Synfonica's activities, including software development, technical documentation, and public relations.