Psychoacoustically-Weighted Adaptive Digital Filtering for Enhanced Speech Quality and Audio Size Efficiency

Hane Yorda Dinata; Eldyana Citra Laksita

doi:10.56211/sudo.v5i1.1373

Authors

Hane Yorda Dinata, Universitas Pendidikan Indonesia
Eldyana Citra Laksita, Universitas Pendidikan Indonesia

Keywords:

Adaptive Filtering; Psychoacoustic Modeling; Speech Enhancement; Wiener Filtering

Abstract

Balancing perceptual quality with computational efficiency remains challenging in speech enhancement systems. This research presents an adaptive filtering framework integrating psychoacoustic modeling with multi-stage noise reduction. The architecture combines spectral subtraction and Wiener filtering, modulated by Bark-scale perceptual weighting derived from critical band theory. Unlike conventional approaches, the system exploits frequency-dependent auditory sensitivity to concentrate processing on perceptually salient regions while reducing the representation of masked components. Experimental validation across diverse acoustic conditions yielded an average SNR improvement of 4.2 dB over baseline techniques, with a simultaneous 31.7% file size reduction through psychoacoustically guided quantization. PESQ assessment produced a mean opinion score of 4.23, confirming excellent quality preservation. Convergence analysis revealed 23% faster adaptation, attributed to perceptually weighted cost functions. Robustness testing across white noise, babble, and environmental sounds demonstrated consistent performance with minimal variance, indicating strong generalization capability. These findings show that incorporating human auditory principles simultaneously improves perceptual quality, computational efficiency, and system adaptability, which is critical for bandwidth-constrained applications in mobile communications, streaming platforms, and assistive devices.
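As a rough illustration of how spectral subtraction, Wiener filtering, and Bark-scale grouping can fit together, the sketch below computes one suppression gain per FFT bin from powers pooled over integer Bark bands. It is not the authors' implementation: the function names, the pooling by integer Bark band, and the gain floor of 0.05 are all assumptions made for the example, and the Bark conversion uses the widely cited Zwicker-Terhardt approximation.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark (critical band) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Wiener gain; clean-speech power is estimated by power spectral subtraction.

    The floor (an assumed value) limits how far any band is attenuated.
    """
    if noisy_power <= 0.0:
        return floor
    # Spectral subtraction step: clamp negative estimates to zero.
    speech_power = max(noisy_power - noise_power, 0.0)
    # Wiener step: speech power over speech-plus-noise power.
    gain = speech_power / (speech_power + noise_power)
    return max(gain, floor)


def bark_band_gains(noisy_psd, noise_psd, sample_rate, floor=0.05):
    """Return one gain per bin of a one-sided power spectrum (at least 2 bins).

    Powers are pooled over integer Bark bands (an assumption of this sketch),
    so every bin inside a critical band receives the same gain.
    """
    n_bins = len(noisy_psd)
    bin_width_hz = (sample_rate / 2.0) / (n_bins - 1)
    bands = [int(hz_to_bark(k * bin_width_hz)) for k in range(n_bins)]

    pooled = {}
    for band, noisy, noise in zip(bands, noisy_psd, noise_psd):
        total_noisy, total_noise = pooled.get(band, (0.0, 0.0))
        pooled[band] = (total_noisy + noisy, total_noise + noise)

    return [wiener_gain(*pooled[band], floor=floor) for band in bands]


if __name__ == "__main__":
    # Hypothetical 9-bin spectrum: flat noise with a speech-like peak.
    noise = [1.0] * 9
    noisy = [1.0, 1.0, 5.0, 9.0, 5.0, 1.0, 1.0, 1.0, 1.0]
    gains = bark_band_gains(noisy, noise, sample_rate=16000)
    print([round(g, 2) for g in gains])
```

Bands dominated by noise fall to the floor, while bands with clear speech energy keep a gain close to 1. A fuller system would estimate the noise spectrum adaptively and could replace the simple pooling with a masking-threshold model.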
