Speech Recognition System Using Hilbert Huang Transform and DHMM
Abstract
This paper presents a robust speech
recognition system for use in the presence of noise.
A Discrete Hidden Markov Model (DHMM) is used
mainly to reduce the computational burden of
voice recognition, which in turn increases speed.
The Hilbert–Huang Transform (HHT) is an empirical
approach that decomposes any complicated data set
into a finite number of Intrinsic Mode Functions
(IMFs) to obtain instantaneous frequency data.
The Empirical Mode Decomposition (EMD)
step of the HHT operates in the time domain on the
local characteristic time scale of the data, making
it adaptive and highly efficient for
nonlinear and nonstationary data, unlike Fourier
transforms. Mel Frequency Cepstral
Coefficients (MFCCs) are derived from the cepstral
coefficients of the IMFs. The features are then
weighted and summed to reconstruct the original
speech signal. A Genetic Algorithm (GA) is
designed for each IMF to obtain a better optimal
solution. This results in a significant reduction in
computation time, and thus it improves the
speech recognition rate.
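The EMD front end described above repeatedly "sifts" a signal: it finds the local extrema, fits upper and lower envelopes through them, and subtracts the envelope mean until an IMF remains; the IMF is then removed and the process repeats on the residue. The following is a minimal, illustrative sketch of that sifting loop, not the authors' implementation; it uses piecewise-linear envelopes in place of the cubic splines of the standard HHT method, and its stopping criteria are simplified assumptions.

```python
# Minimal sketch of Empirical Mode Decomposition (EMD), the first stage of
# the HHT pipeline. Linear envelopes stand in for cubic splines, so this is
# illustrative only, not the method of the paper.
import math

def _extrema(x):
    """Indices of local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def _envelope(x, idx):
    """Piecewise-linear envelope through the points (idx, x[idx])."""
    if not idx:
        return [0.0] * len(x)
    pts = [0] + idx + [len(x) - 1]  # pin the signal ends to limit edge drift
    env = [0.0] * len(x)
    for a, b in zip(pts, pts[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a) if b != a else 0.0
            env[i] = x[a] + t * (x[b] - x[a])
    return env

def sift(x, max_iter=50, tol=1e-3):
    """Extract one IMF from x by iteratively removing the envelope mean."""
    h = list(x)
    for _ in range(max_iter):
        maxi, mini = _extrema(h)
        if len(maxi) + len(mini) < 2:
            break  # too few extrema: h is a residue, not an IMF
        upper = _envelope(h, maxi)
        lower = _envelope(h, mini)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]
        if max(abs(m) for m in mean) < tol:
            break  # envelope mean is near zero: IMF criterion met
        h = [hi - m for hi, m in zip(h, mean)]
    return h

def emd(x, max_imfs=5):
    """Decompose x into a list of IMFs plus a final residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxi, mini = _extrema(residue)
        if len(maxi) + len(mini) < 2:
            break  # monotonic residue: decomposition is complete
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue

if __name__ == "__main__":
    # Two-tone test signal: EMD peels off the faster oscillation first.
    n = 256
    sig = [math.sin(2 * math.pi * 8 * i / n) + 0.5 * math.sin(2 * math.pi * i / n)
           for i in range(n)]
    imfs, residue = emd(sig)
    print(f"{len(imfs)} IMFs extracted")
```

By construction the IMFs and the residue sum back to the original signal, which mirrors the weighted-sum reconstruction step of the abstract: in the paper's scheme the per-IMF weights are tuned by the GA rather than fixed at one.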
S. Dhanalakshmi, IJSRE, Volume 1, Issue 1, May 2013