Speech-Adaptive Detection of Unnatural Intra-Sentential Pauses Using Contextual Anomaly Modeling for Interpreter Training

Detecting unnatural pauses is a critical component of automated quality assessment (AQA) in interpreter training, because pause patterns directly reflect an interpreter’s cognitive load and fluency. Traditional pause detection methods rely on static temporal thresholds (e.g., 1.0 second), which fail to account for segment-specific variation in speech rate and for individual speaking styles. This study proposes a context-adaptive pause detection framework that combines unsupervised anomaly detection using Isolation Forest (iForest) with a sliding window technique. To enhance pedagogical validity, we focus specifically on intra-sentential pauses, delineating sentence boundaries with a specialized segmentation model. The proposed model was evaluated against ground-truth labels annotated by professional interpreting experts. Our results demonstrate that the sliding window–based contextual anomaly detection model significantly outperforms the conventional static baseline, particularly in terms of recall and Cohen’s kappa. Furthermore, by applying a weighted F3-score and the “Recognition-over-Recall” principle, we confirm that the proposed model substantially reduces the instructor’s total operational burden by shifting the workload from de novo annotation to more efficient corrective pruning of model-proposed candidates. These findings suggest that speech-adaptive modeling provides a more reliable and labor-saving framework for automated interpreting assessment and feedback.
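To illustrate why an F3-score favors a recall-oriented detector, here is a minimal sketch. It assumes that "F3" refers to the standard F-beta score with beta = 3, which the abstract does not state explicitly. The function name `f_beta` and all confusion counts are hypothetical illustrations, not results from this study.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta score from confusion counts; beta > 1 weights recall above precision."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detector A: flags many pauses (high recall, more false alarms).
a_counts = dict(tp=90, fp=60, fn=10)
# Hypothetical detector B: flags few pauses (high precision, more misses).
b_counts = dict(tp=60, fp=10, fn=40)

f1_a = f_beta(**a_counts, beta=1.0)
f1_b = f_beta(**b_counts, beta=1.0)
f3_a = f_beta(**a_counts, beta=3.0)
f3_b = f_beta(**b_counts, beta=3.0)

print(f"F1: A={f1_a:.3f}  B={f1_b:.3f}")
print(f"F3: A={f3_a:.3f}  B={f3_b:.3f}")
```

With these made-up counts the two detectors are nearly tied on F1, but F3 clearly prefers detector A. This matches the abstract's reasoning: a missed pause forces the instructor to annotate from scratch, whereas a false alarm only needs to be pruned.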
