Discrete-Time Speech Signal Processing: Principles and Practice (IE-Hardcover)

Thomas F. Quatieri


Product Description

Essential principles, practical examples, current applications, and leading-edge research.

In this book, Thomas F. Quatieri presents the field's most intensive, up-to-date tutorial and reference on discrete-time speech signal processing. Building on his MIT graduate course, he introduces key principles, essential applications, and state-of-the-art research, and he identifies limitations that point the way to new research opportunities.

Quatieri provides an excellent balance of theory and application, beginning with a complete framework for understanding discrete-time speech signal processing. Along the way, he presents important advances never before covered in a speech signal processing textbook, including sinusoidal speech processing, advanced time-frequency analysis, and nonlinear aeroacoustic speech production modeling. Coverage includes:

  • Speech production and speech perception: a dual view
  • Crucial distinctions between stochastic and deterministic problems
  • Pole-zero speech models
  • Homomorphic signal processing
  • Short-time Fourier transform analysis/synthesis
  • Filter-bank and wavelet analysis/synthesis
  • Nonlinear measurement and modeling techniques

The book's in-depth applications coverage includes speech coding, enhancement, and modification; speaker recognition; noise reduction; signal restoration; dynamic range compression; and more. Discrete-Time Speech Signal Processing also contains an exceptionally complete series of examples and MATLAB exercises, all carefully integrated into the book's coverage of theory and applications.

Table of Contents

1. Introduction.

Discrete-Time Speech Signal Processing. The Speech Communication Pathway. Analysis/Synthesis Based on Speech Production and Perception. Applications. Outline of Book.


2. A Discrete-Time Signal Processing Framework.

Discrete-Time Signals. Discrete-Time Systems. Discrete-Time Fourier Transform. Uncertainty Principle. z-Transform. LTI Systems in the Frequency Domain. Properties of LTI Systems. Time-Varying Systems. Discrete Fourier Transform. Conversion of Continuous Signals and Systems to Discrete Time.


3. Production and Classification of Speech Sounds.

Anatomy and Physiology of Speech Production. Spectrographic Analysis of Speech. Categorization of Speech Sounds. Prosody: The Melody of Speech. Speech Perception.


4. Acoustics of Speech Production.

Physics of Sound. Uniform Tube Model. A Discrete-Time Model Based on Tube Concatenation. Vocal Fold/Vocal Tract Interaction.


5. Analysis and Synthesis of Pole-Zero Speech Models.

Time-Dependent Processing. All-Pole Modeling of Deterministic Signals. Linear Prediction Analysis of Stochastic Speech Sounds. Criterion of “Goodness.” Synthesis Based on All-Pole Modeling. Pole-Zero Estimation. Decomposition of the Glottal Flow Derivative.


 

Appendix 5.A: Properties of Stochastic Processes.

Random Processes. Ensemble Averages. Stationary Random Process. Time Averages. Power Density Spectrum.


 

Appendix 5.B: Derivation of the Lattice Filter in Linear Prediction Analysis.

6. Homomorphic Signal Processing.

Concept. Homomorphic Systems for Convolution. Complex Cepstrum of Speech-Like Sequences. Spectral Root Homomorphic Filtering. Short-Time Homomorphic Analysis of Periodic Sequences. Short-Time Speech Analysis. Analysis/Synthesis Structures. Contrasting Linear Prediction and Homomorphic Filtering.


7. Short-Time Fourier Transform Analysis and Synthesis.

Short-Time Analysis. Short-Time Synthesis. Short-Time Fourier Transform Magnitude. Signal Estimation from the Modified STFT or STFTM. Time-Scale Modification and Enhancement of Speech.


 

Appendix 7.A: FBS Method with Multiplicative Modification.

8. Filter-Bank Analysis/Synthesis.

Revisiting the FBS Method. Phase Vocoder. Phase Coherence in the Phase Vocoder. Constant-Q Analysis/Synthesis. Auditory Modeling.


9. Sinusoidal Analysis/Synthesis.

Sinusoidal Speech Model. Estimation of Sinewave Parameters. Synthesis. Source/Filter Phase Model. Additive Deterministic-Stochastic Model.


 

Appendix 9.A: Derivation of the Sinewave Model.

Appendix 9.B: Derivation of Optimal Cubic Phase Parameters.

10. Frequency-Domain Pitch Estimation.

A Correlation-Based Pitch Estimator. Pitch Estimation Based on a “Comb Filter.” Pitch Estimation Based on a Harmonic Sinewave Model. Glottal Pulse Onset Estimation. Multi-Band Pitch and Voicing Estimation.


11. Nonlinear Measurement and Modeling Techniques.

The STFT and Wavelet Transform Revisited. Bilinear Time-Frequency Distributions. Aeroacoustic Flow in the Vocal Tract. Instantaneous Teager Energy Operator.


12. Speech Coding.

Statistical Models of Speech. Scalar Quantization. Vector Quantization (VQ). Frequency-Domain Coding. Model-Based Coding. LPC Residual Coding.


13. Speech Enhancement.

Introduction. Preliminaries. Wiener Filtering. Model-Based Processing. Enhancement Based on Auditory Masking.


 

Appendix 13.A: Stochastic-Theoretic Parameter Estimation.

14. Speaker Recognition.

Introduction. Spectral Features for Speaker Recognition. Speaker Recognition Algorithms. Non-Spectral Features in Speaker Recognition. Signal Enhancement for the Mismatched Condition. Speaker Recognition from Coded Speech.


 

Appendix 14.A: Expectation-Maximization (EM) Estimation.

Glossary.
Speech Signal Processing.
Units.
Databases.
Index.
About the Author.
