- Processes input in chunks of 3 seconds.
- Pads the audio with silence.
- Reuses 1 second of audio from the previous pass.
- Uses no text context.
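
The chunking scheme listed above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `iter_windows`, the 16 kHz sample rate, and the choice to prepend the carried-over second to each 3-second chunk (giving a 4-second window) are all assumptions, since the notes do not specify them. Silence is modelled as zero-valued samples.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000   # assumed sample rate in Hz; not stated in the notes
CHUNK_SECONDS = 3      # new audio consumed per pass
OVERLAP_SECONDS = 1    # audio carried over from the previous pass


def iter_windows(samples: List[float],
                 sample_rate: int = SAMPLE_RATE) -> Iterator[List[float]]:
    """Yield one window per pass: 1 s of previous audio plus 3 s of new audio.

    Hypothetical helper for illustration only.
    """
    chunk = CHUNK_SECONDS * sample_rate
    overlap = OVERLAP_SECONDS * sample_rate

    for start in range(0, len(samples), chunk):
        new_audio = samples[start:start + chunk]
        # Pad a short final chunk with silence (zeros) up to the full 3 s.
        new_audio = new_audio + [0.0] * (chunk - len(new_audio))

        # Reuse the last second of earlier audio. The first pass has no
        # earlier audio, so that second is filled with silence too.
        previous = samples[max(0, start - overlap):start]
        previous = [0.0] * (overlap - len(previous)) + previous

        # No text context: each window stands alone, so nothing from the
        # previous transcription is passed along with it.
        yield previous + new_audio
```

Under these assumptions, 7 seconds of input produces three windows of 4 seconds each: the first begins with 1 second of silence, and the last ends with 2 seconds of silence padding.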