An Overview of the CHAINS Project
Speech offers one of many ways to identify an individual reliably, drawing on anatomical and physiological characteristics. However, we are still far from having a 'voiceprint' as reliable as a fingerprint or DNA profile. In this work, we use diverse methods to distinguish between those speech characteristics which an individual can change at will (e.g. by intentional disguise) and those which are essentially invariant, irrespective of speaking style.
In the first phase of the project, we developed a corpus containing speech (read fables and sentences) collected from the same speakers under a variety of conditions and in a variety of speaking styles. The corpus includes both imitative and non-imitative speech styles, and was recorded in two distinct sessions.
Using the corpus, we went on to develop a new way to represent speech within a speaker identification system. Based on an AM/FM analysis of the speech signal, we developed a featural representation called pykfec, which we use in place of standard MFCC coefficients. Details of the encoding are available in the following paper:
Grimaldi, M. and Cummins, F. (2008). Speaker Identification Using Instantaneous Frequencies. IEEE Transactions on Audio, Speech, and Language Processing, 16(6):1097-1111.
Code to extract pykfec coefficients is now available from the downloads page.
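To give a flavour of the approach, the sketch below shows one simplified way to derive amplitude-weighted instantaneous-frequency features from a bank of band-pass filters, using the analytic (Hilbert) signal. It is an illustration only, not the released pykfec code: the filterbank layout, frame sizes, and function names here are assumptions, and the paper above describes the actual encoding.

```python
# Illustrative sketch of instantaneous-frequency features.
# NOT the released pykfec implementation: filterbank layout, frame
# sizes, and the Hilbert-based estimator are assumptions for clarity.
import numpy as np
from scipy.signal import butter, lfilter, hilbert

def band_instantaneous_frequency(x, fs, low, high, order=4):
    """Band-pass x, then estimate instantaneous frequency (Hz) and
    amplitude envelope from the analytic (Hilbert) signal."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    y = lfilter(b, a, x)
    analytic = hilbert(y)
    amp = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    inst_f = np.diff(phase) * fs / (2 * np.pi)   # Hz, length N-1
    return inst_f, amp[:-1]

def if_features(x, fs, n_bands=12, frame_len=0.025, hop=0.010):
    """One amplitude-weighted mean instantaneous frequency per band
    per frame -- a crude stand-in for pykfec coefficients."""
    edges = np.linspace(100, 0.95 * fs / 2, n_bands + 1)  # assumed layout
    flen, fhop = int(frame_len * fs), int(hop * fs)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        inst_f, amp = band_instantaneous_frequency(x, fs, lo, hi)
        band = []
        for start in range(0, len(inst_f) - flen, fhop):
            f = inst_f[start:start + flen]
            a = amp[start:start + flen]
            band.append(np.sum(f * a) / (np.sum(a) + 1e-12))
        feats.append(band)
    return np.array(feats).T   # shape: (n_frames, n_bands)

# Example usage: feats = if_features(signal, 16000)
```

A deployable front end would add further steps (e.g. pre-emphasis and decorrelation of the per-band estimates) that are omitted here for brevity.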
In addition to the core work in speaker identification, a model of speech production is under development that offers insight into how coordination in speech production is achieved, despite the complexity of the underlying production system. When speakers change style, they have the experience of making a relatively simple change, yet even a simple stylistic change has very many effects in the resulting speech. Our production model implements gestural sequencing under efficiency constraints. More details soon.