Automated Human Transcription Error Detection Framework for Korean ASR Corpus

Automated Transcription Error Detection for KSponSpeech

  • Developed a framework for automatically detecting and correcting human transcription errors in the KSponSpeech corpus using ASR and language model–based validation.
  • Presented the work at the Korea Computer Congress (KCC) 2023.
    • A Model-based Method for Automatic Transcription Error Detection in ASR Corpora.
    • Won the Best Paper award!
  • Published the extended study in the Journal of the Korean Institute of Information Scientists and Engineers (KIISE) (2024).
    • Jeongpil Lee, Jeehyun Lee, Yerin Choi, Jaehoo Jang, & Myoung-Wan Koo (2024). An Automated Error Detection Method for Speech Transcription Corpora Based on Speech Recognition and Language Models. Journal of KIISE, 51(4), 362–369. https://doi.org/10.5626/JOK.2024.51.4.362

My Contribution

  • Led the design and implementation of the automatic error detection framework.
  • Tuned model confidence and LM scoring thresholds through extensive validation experiments.
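
Illustrative sketch

The threshold-based validation described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the published method: the `Utterance` fields, the `is_suspect` rule, and the default threshold values are all hypothetical, and the ASR confidence and LM scores are assumed to be computed elsewhere. The idea shown is to flag a human transcript only when a confident ASR hypothesis disagrees with it and the language model prefers the ASR hypothesis.

```python
from dataclasses import dataclass


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


@dataclass
class Utterance:
    # All fields are hypothetical; scores are assumed to be precomputed.
    human_text: str        # human transcription from the corpus
    asr_text: str          # ASR hypothesis for the same audio
    asr_confidence: float  # assumed to lie in [0, 1]
    lm_score_human: float  # assumed LM log-probability, higher is better
    lm_score_asr: float


def is_suspect(u: Utterance,
               min_confidence: float = 0.9,  # placeholder threshold
               min_lm_margin: float = 0.5,   # placeholder threshold
               min_cer: float = 0.0) -> bool:
    """Flag a human transcript as a likely transcription error."""
    disagrees = cer(u.human_text, u.asr_text) > min_cer
    confident = u.asr_confidence >= min_confidence
    lm_prefers_asr = (u.lm_score_asr - u.lm_score_human) >= min_lm_margin
    return disagrees and confident and lm_prefers_asr
```

In a setup like this, the two thresholds trade precision against recall: raising `min_confidence` or `min_lm_margin` flags fewer utterances but with fewer false alarms, which is the kind of trade-off that validation experiments would be used to settle.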