Jody Kreiman, PhD
Department of Surgery
Professor Kreiman’s NIH-supported research, jointly conducted with Bruce Gerratt, PhD, in The Voice Perception Laboratory, focuses on the perception (and secondarily on the production) of normal and pathological voice. Voice quality is a primary means by which humans signal their identity, internal state, and intentions to others, and voice disorders can have devastating personal and professional consequences, creating an undesirable personal image and making vocal communication difficult or impossible. However, despite the importance of voice perception and large literatures in disciplines ranging from music to medicine, little progress has been made in understanding how listeners perceive voices. In fact, the modern history of voice research may be viewed as a series of efforts to circumvent the problem of measuring quality by substituting “objective” measures of acoustics, physiological functions, or airflow. Unfortunately, objective measures of quality are meaningless unless they are validated against perceptual measures. Thus, perception of voice remains of central importance even in efforts to eliminate perceptual measures.
Their research attempts to develop models of voice perception and speaker recognition. Without such models, the goal of understanding how listeners perceive voices will not be achieved. Initial studies in the laboratory sought to specify the sources of variability in listeners’ ratings of vocal quality. More recently, studies have focused on developing reliable, valid methods to measure perceived vocal quality by controlling the factors underlying response variability. They have devised a new, theoretically motivated method of assessing quality, listener-mediated analysis-resynthesis, in which listeners explicitly compare synthetic and natural voice samples and adjust speech synthesizer parameters to create acceptable auditory matches to voice stimuli. This method is designed to replace unstable internal standards for qualities like breathiness and roughness with externally presented stimuli. Initial results indicate that this technique does control the major hypothetical sources of disagreement in rating scale judgments.
A reliable and valid method of measuring what listeners hear is an essential component of a common theoretical framework that links together physiology, aerodynamics, acoustics, and perception to explain how tissue movement finally results in the perception of speech sounds. However, voice production, perception, and acoustics have in the past been studied as nearly independent disciplines, with little cross-fertilization of ideas and virtually no theory to link levels of description. A unified approach to the study of voice could have many potential benefits, including theoretically motivating surgeries to improve voice quality, allowing prediction of post-surgical voice quality given a patient’s particular findings, motivating objective measures of voice, specifying which aspects of a voice are essential to its identification, and so on. Development of such a theory (in collaboration with other faculty members in Head and Neck Surgery, Engineering, and Linguistics) is the ultimate goal of this ongoing research.