Speech Recognition Course
Material for learning speech recognition, based on Microsoft's teaching material on edX (ported from CNTK to PyTorch). Learning and teaching materials are provided in each module's directory.
GitHub Pages: https://bagustris.github.io/speech-recognition-course
Repository: https://github.com/bagustris/speech-recognition-course
Modules
Convert the Markdown notes to PDF with pandoc inside each module directory:

```bash
pandoc readme.md -o readme.pdf
```

Then inspect the generated readme.pdf. To build every module at once, see the sketch below.
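For batch conversion, a short helper script along these lines should work. It is a minimal sketch, not part of the repository: it assumes pandoc is on the PATH, the script is run from the repository root, and each module directory contains a readme.md.

```python
#!/usr/bin/env python3
"""Batch-convert every module's readme.md to PDF with pandoc.

Minimal sketch: assumes pandoc is installed, the script is run from
the repository root, and each module directory holds a readme.md.
"""
import subprocess
from pathlib import Path

for readme in sorted(Path(".").glob("*/readme.md")):
    pdf = readme.with_suffix(".pdf")
    print(f"{readme} -> {pdf}")
    # Run pandoc inside the module directory so relative links and
    # image paths in the Markdown resolve correctly.
    subprocess.run(
        ["pandoc", readme.name, "-o", pdf.name],
        cwd=readme.parent,
        check=True,  # raise CalledProcessError if pandoc fails
    )
```

Save it under any name you like (e.g., build_pdfs.py, a hypothetical name) in the repository root and run it with python build_pdfs.py.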
References
- https://learning.edx.org/course/course-v1:Microsoft+DEV287x+1T2019a/home
- L. Gillick and S. J. Cox, “Some statistical issues in the comparison of speech recognition algorithms,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1989, vol. 1, pp. 532–535, doi: 10.1109/icassp.1989.266481.
- M. Mohri, F. Pereira, and M. Riley, “Speech recognition with weighted finite-state transducers,” in Springer Handbook on Speech Processing and Speech Communication.
- D. S. Pallett, W. M. Fisher, and J. G. Fiscus, “Tools for the analysis of benchmark speech recognition tests,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1990, vol. 1, pp. 97–100, doi: 10.1109/icassp.1990.115546.
- T. Morioka, T. Iwata, T. Hori, and T. Kobayashi, “Multiscale recurrent neural network based language model,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2015, vol. 2015-January, pp. 2366–2370.
- M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM neural networks for language modeling,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2012, vol. 1, pp. 194–197.
- Y. Bengio, R. Ducharme, and P. Vincent, “A neural probabilistic language model,” J. Mach. Learn. Res., vol. 3, pp. 1137–1155, 2003.
- P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai, “Class-based n-gram models of natural language,” Comput. Linguist., vol. 18, no. 4, pp. 467–480, 1992.
- M. Levit, S. Parthasarathy, S. Chang, A. Stolcke, and B. Dumoulin, “Word-phrase-entity language models: Getting more mileage out of N-grams,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2014, pp. 666–670.
- X. Shen, Y. Oualil, C. Greenberg, M. Singh, and D. Klakow, “Estimation of gap between current language models and human performance,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017, vol. 2017-August, pp. 553–557, doi: 10.21437/Interspeech.2017-729.
- A. Stolcke, “Entropy-based pruning of backoff language models,” pp. 579–588, 2000, doi: 10.3115/1075218.107521.