Voice cleaning methods and algorithms play a key role both in preprocessing speech for further analysis and recognition and in improving the quality of communication between users of information networks. Real-time streaming noise reduction is the most demanding and complex area: the requirement to process streaming data without delay imposes significant restrictions on the algorithm, which cannot be iterative with a number of iterations unknown in advance and cannot explicitly use the data before or after the block currently being processed. This work proposes a modern adaptive noise reduction method for speech that operates with minimal signal transmission delay. A large-scale study of existing approaches has been conducted, with special attention paid to two groups of algorithms: noise detection algorithms and noise suppression algorithms. On this basis, an algorithm meeting the stated requirements has been built and analyzed. A set of audio recordings of Russian speech with various superimposed noises has been created; the algorithm has been tested on this set and compared with current noise cleaning methods. The proposed adaptive noise cleaning method requires no specialized hardware or auxiliary information and is able to operate in real time. Testing of the developed algorithm with the segmental SNR and PESQ metrics has shown its high efficiency and its superiority over the common Speex and WebRTC noise cleaning implementations in both cleaning quality and processing speed.
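To make the evaluation criterion concrete, the segmental SNR mentioned above can be sketched as follows. This is a minimal illustrative implementation, not the one used in the paper: the frame length of 256 samples and the common per-frame clamping range of [-10, 35] dB are assumptions, and real evaluations typically also apply voice-activity weighting.

```python
import math

def segmental_snr(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR in dB between a clean reference signal and a
    processed (denoised) signal, with each frame's SNR clamped to
    [floor_db, ceil_db] so outlier frames do not dominate the mean."""
    n_frames = min(len(clean), len(processed)) // frame_len
    frame_snrs = []
    for i in range(n_frames):
        start, stop = i * frame_len, (i + 1) * frame_len
        sig = sum(x * x for x in clean[start:stop])            # reference energy
        err = sum((x - y) ** 2                                 # distortion energy
                  for x, y in zip(clean[start:stop], processed[start:stop]))
        if sig == 0.0:
            continue  # skip silent reference frames
        snr = ceil_db if err == 0.0 else 10.0 * math.log10(sig / err)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    return sum(frame_snrs) / len(frame_snrs) if frame_snrs else 0.0
```

Unlike the global SNR, the frame-wise average rewards algorithms that suppress noise consistently across the whole recording rather than only in high-energy speech segments.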
Igor E. Vishnyakov
Bauman Moscow State Technical University (National Research University)
1. Lukin A.S. AES San Francisco 2008: Tutorial T3. Broadband noise reduction: theory and application // Aes.org: [Web] / Audio Engineering Society. October 2–5, 2008. URL: https://www.aes.org/events/125/tutorials/session.cfm?code=T3 (accessed 31.03.2021).
2. Pascual S., Bonafonte A., Serrà J. SEGAN: speech enhancement generative adversarial network // Interspeech 2017. Stockholm: ISCA, 2017. P. 3642–3646.
3. Gabbay A., Shamir A., Peleg Sh. Visual speech enhancement // Interspeech 2018. Hyderabad: ISCA, 2018. P. 1170–1174.
4. Schoenenberg K., Raake A., Koeppe J. Why are you so slow? Misattribution of transmission delay to attributes of the conversation partner at the far-end // International Journal of Human-Computer Studies. 2014. Vol. 72. Issue 5: May. P. 477–487.
5. Burnett G.C. Noise suppressing multi-microphone headset // The Journal of the Acoustical Society of America. 2013. No. 133. P. 4352.
6. Doclo S. Multi-microphone noise reduction and dereverberation techniques for speech applications: PhD Diss. Katholieke Universiteit Leuven, 2003. 50 p.
7. Khan F., Milner B.P. Speaker separation using virtually-derived binary masks // Auditory-Visual Speech Processing. Annecy: ISCA, 2013. P. 215–220.
8. Speex: [Web]. URL: https://www.speex.org/ (accessed 20.05.2020).
9. WebRTC: [Web]. URL: https://webrtc.org/ (accessed 20.05.2020).
10. Cohen I., Berdugo B. Speech enhancement for non-stationary noise environments // Signal Processing. 2001. Vol. 81. No. 11. P. 2403–2418.
11. Ephraim Y., Malah D. Speech enhancement using minimum mean-square error log-spectral amplitude estimator // IEEE Transactions on Acoustics, Speech, and Signal Processing. 1985. P. 443–445.
12. Shrawankar U., Thakare V. Performance analysis of noise filters and speech enhancement techniques in adverse mixed noisy environment for HCI // International Journal of Research and Reviews in Computer Science. 2012. No. 3. P. 1817–1825.
13. Loizou P. Speech enhancement. Theory and practice. 2nd ed. CRC Press, 2013. 705 p.
14. Rangachari S., Loizou P.C. A noise estimation algorithm for highly nonstationary environments // Speech Communications. 2006. Vol. 48. No. 2. P. 220–231.
15. Scalart P., Filho J.V. Speech enhancement based on a priori signal to noise estimation // 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing. Atlanta, GA: IEEE, 1996. Vol. 2. P. 629–632.
16. Plapous C., Marro C., Mauuary L., Scalart P. A two-step noise reduction technique // 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal: IEEE, 2004. Vol. 1. P. 289–292.
17. Shifeng O., Chao G., Ying G. Improved a priori SNR estimation for speech enhancement incorporating speech distortion component // TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013. Vol. 11 (9). P. 5359–5364.
18. glibc // The GNU Operating System: [Web] / Free Software Foundation. URL: https://www.gnu.org/software/libc/ (accessed 26.05.2020).
19. Rix A., Beerends J., Hollier M., Hekstra A.P. Perceptual evaluation of speech quality (PESQ) // 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Salt Lake City, UT: IEEE, 2001. P. 1–4.
20. ZvukiPro: [Web]. URL: https://zvukipro.com/ (accessed 26.05.2020).