Coarticulatory relations in a compact model of articulatory dynamics
Veena Singampalli and Philip J.B. Jackson

Abstract: This paper presents a study of coarticulatory relations amongst articulators, aimed at building a compact coarticulation model. The principles behind our model are presented, and its performance is compared with that of an existing model of coarticulation. Our ultimate goal is to incorporate such a model into the architecture of a multi-level SHMM-based speech recognition system to improve its recognition performance.

Many researchers have tried to model coarticulatory effects under different circumstances using different kinds of data. Öhman [3] used formant data to study coarticulation in VCV utterances. He proposed a mathematical model [4] to obtain the vocal tract shape of stop consonants in VCV utterances and compared the model results with x-ray data. His model needs further development before it can generate vocal tract shapes for the consonants and allow for coarticulation between consonants. Blackburn and Young [1] used articulatory accelerations between the preceding and following phonemes to model the conditional variations in the positions of the articulators for the intermediate phoneme, using x-ray microbeam data. Their coarticulation model did not consider correlations between the horizontal and vertical movements of the articulators and was limited to the immediate left and right contexts. Dang et al. [2] modelled coarticulation at the planning stage in VCVCV utterances as a modulation process, with the vocalic component as the carrier wave and the consonantal component as the modulating wave. Their proposed carrier model estimated its parameters statistically from empirical electromagnetic articulography (EMA) data. There is a need to understand the underlying correlations that affect articulatory movements and to develop a generalised model.
The three main kinds of relationship in modelling articulatory trajectories are (1) correlated movements of each articulator in 3-D space, (2) correlated movements of the articulators with respect to one another, and (3) correlated movements of the articulators over time, as in [1]. Our aim is to incorporate all three kinds of correlation in our model. Our current work addresses (1) and (2), using statistical techniques to identify, quantify, test and represent the relationships amongst the movements of the articulators. Critical and dependent articulators for every phone were identified in a data-driven way, using statistical tests on EMA data from the MOCHA corpus. Hotelling's T² test, Student's t-test and the Kullback-Leibler (KL) divergence were considered as distance measures for identifying critical articulators, whose distributions are characterised by shifted mean positions and smaller covariance compared with the overall statistics of those articulators, i.e., their neutral position. A simple univariate coarticulation model was developed based on the findings of the statistical correlation tests. Each phone model was associated with a mean and variance for each articulator in the horizontal and vertical directions. The parameters of the model were initialised to those of the neutral position. The critical articulator was identified as the one with the maximum KL divergence between the phone-specific distribution and the model's distribution, provided that this divergence exceeded a given threshold. The critical articulator's model parameters were set equal to the phone-specific statistics, and its effect on the other articulators was determined by the strength of any significant correlation between them (as assessed by Pearson's correlation test). Bivariate models that incorporated the relationships between critical and dependent articulators, and between different dimensions of the same articulator, were also investigated.
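The univariate selection and propagation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each articulator channel is a univariate Gaussian; the critical articulator is chosen by KL(phone || model) with the model initialised to the neutral statistics; significance of Pearson's r is judged with Fisher's z normal approximation rather than the exact t-based test; and the effect on a dependent articulator is modelled as a linear-Gaussian regression on the critical one. The channel names (`TT_y`, `TD_y`, `LL_y`), the numbers, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist, fmean, pvariance


@dataclass
class Gauss1D:
    """Univariate Gaussian for one articulator channel (hypothetical, e.g. tongue-tip y)."""
    mean: float
    var: float


def fit_gauss(xs):
    """Maximum-likelihood mean and variance of a list of positions."""
    return Gauss1D(fmean(xs), pvariance(xs))


def kl_gauss(p, q):
    """KL(p || q) between univariate Gaussians, in nats."""
    return 0.5 * (math.log(q.var / p.var)
                  + (p.var + (p.mean - q.mean) ** 2) / q.var - 1.0)


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, alpha=0.05):
    """Two-sided test of r != 0 using Fisher's z (a normal approximation, assumed here)."""
    if abs(r) >= 1.0:
        return True
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def build_phone_model(neutral, phone, corr, n, threshold):
    """Return (model, critical channel or None) for one phone.

    neutral, phone: dict channel -> Gauss1D
    corr: dict (channel_a, channel_b) -> Pearson r on the overall data
    n: number of frames used to estimate corr
    """
    model = dict(neutral)  # initialise every channel to its neutral statistics
    kl = {ch: kl_gauss(phone[ch], model[ch]) for ch in model}
    crit = max(kl, key=kl.get)
    if kl[crit] <= threshold:
        return model, None  # no channel deviates enough to count as critical
    model[crit] = phone[crit]  # critical channel takes the phone-specific statistics
    c0, c1 = neutral[crit], phone[crit]
    for ch, d0 in neutral.items():
        if ch == crit:
            continue
        r = corr.get((crit, ch), corr.get((ch, crit), 0.0))
        if r == 0.0 or not is_significant(r, n):
            continue  # uncorrelated channels stay at neutral
        # Assumed linear-Gaussian regression of the dependent channel on the critical one.
        slope = r * math.sqrt(d0.var / c0.var)
        mean = d0.mean + slope * (c1.mean - c0.mean)
        var = d0.var * (1 - r ** 2) + slope ** 2 * c1.var
        model[ch] = Gauss1D(mean, var)
    return model, crit


# Illustrative (made-up) statistics for a single phone.
neutral = {"TT_y": Gauss1D(0.0, 4.0), "TD_y": Gauss1D(0.0, 4.0), "LL_y": Gauss1D(0.0, 1.0)}
phone = {"TT_y": Gauss1D(3.0, 0.25), "TD_y": Gauss1D(1.0, 2.0), "LL_y": Gauss1D(0.1, 0.9)}
corr = {("TT_y", "TD_y"): 0.6, ("TT_y", "LL_y"): 0.05}
model, crit = build_phone_model(neutral, phone, corr, n=1000, threshold=1.0)
print(crit)  # TT_y
```

With these illustrative numbers, the tongue tip is selected as critical, the strongly correlated tongue dorsum has its mean pulled to about 1.8, and the weakly correlated lower lip stays at its neutral statistics. The regression form was chosen because it reduces exactly to the neutral distribution when the correlation is zero; the paper's own update rule may differ.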
The goodness of fit of our models was measured by the KL divergence between the model and phone-specific distributions. Synthetic articulatory trajectories were generated using the models, and the RMS error between the synthetic and empirical data was used to assess the accuracy of the models. Results of this data-driven technique for identifying critical articulators and their effect on dependent ones are presented, and the performance of the coarticulation model is evaluated. For comparison, results obtained by implementing the coarticulation model proposed in [1] are also presented. Future work is needed to develop a complete model of coarticulation that can be compactly integrated into the speech recognition system. Such a model would also find applications in multimodal ASR, pronunciation training and foreign language systems, and multimodal synthesis.

[1] Simon C. Blackburn and Steve Young. A self-learning predictive model of articulator movements during speech production. J. Acoust. Soc. Am., 107(3), March 2000.
[2] Jianwu Dang, Jianguo Wei, Takeharu Suzuki, and Pascal Perrier. Investigation and modeling of coarticulation during speech. Eurospeech, 2005.
[3] Sven E.G. Öhman. Coarticulation in VCV utterances: Spectrographic measurements. J. Acoust. Soc. Am., 39, 1966.
[4] Sven E.G. Öhman. Numerical model of coarticulation. J. Acoust. Soc. Am., 41(2), 1967.