Hands-on Speech Recognition with Kaldi/TIMIT: Demystify Automatic Speech Recognition (ASR) & Deep Neural Networks (DNN)
- Length: 220 pages
- Edition: 1
- Language: English
- Publisher: HYPEch.com Publishing
- Publication Date: 2020-11-26
- ISBN-10: B08P79K6Q1
- Sales Rank: #3712573
With increasing demand for speech interfaces in in-car systems, health care, the military, telephony, and daily life, the job market for Automatic Speech Recognition (ASR) is booming. As the leading open-source software in the ASR field, Kaldi may be the best starting point. We can learn the concepts and technologies by building and running a Kaldi model and then using it in the real world.
We don’t yet know how far this trend will go, but if you’re a software developer, now might be the time to capitalize on the rising job opportunities as major apps work to integrate Kaldi.
From installation to the final results, we go through the whole life cycle of Kaldi development using the TIMIT corpus. You will build a real ASR model that you can apply in your own working environment.
Every step in building the Kaldi/TIMIT model is documented with screenshots, code, and output, so you will not get lost or miss a step.
Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Machine Learning are hot topics right now.
Chapter 1 discusses the background of ASR. In my humble opinion, knowing the history and context of a new topic is the best way to understand it. Always ask: Who is it? Where did it come from? Where is it going?
Once these soul questions have been answered, we install Kaldi. Chapter 2, Installation, explains the Kaldi environment and the installation process. You can use Mac, Linux, Windows, or another platform.
We test the Kaldi installation with a small project such as the yes/no recipe. Another recipe, 10-digit speech recognition, is also good for testing but is not covered in this book.
Chapter 3 downloads and sets up TIMIT in Kaldi with specific environment parameters.
Chapter 4 prepares the data for TIMIT. We learn about finite-state transducers (FSTs), the dictionary, and other relevant concepts during preparation.
Chapter 5 extracts features. MFCC and CMVN are discussed in detail.
Chapter 6 runs the monophone model for TIMIT. All the fundamental ASR concepts are explained here.
Chapters 7 to 9 run the triphone models tri1, tri2, and tri3.
Chapter 10 runs SGMM2 model.
Chapter 11 runs MMI + SGMM2.
Chapter 12 runs Dan’s DNN.
Chapter 13 covers all the stages of Karel’s DNN, including feature storage, pre-training, frame-level cross-entropy training, sequence-discriminative training, and sMBR iterations.
Chapter 14 presents the final results of the whole TIMIT run, which can be used as a template for comparison.
By the end of the book, we will be able to run Kaldi ASR neural network models.
This book gives you a starting point for pursuing higher goals in the world of Artificial Intelligence.