AI Captions (Whisper)

Novus uses OpenAI's Whisper model — running entirely on your device — to automatically transcribe your audio and sync captions to the beat.

Generating Captions

Open the Captions panel in the editor (right sidebar) and click 'Auto-transcribe'. On first use, Novus downloads the Whisper model (~150 MB) and caches it locally, so subsequent runs skip the download and start transcribing right away.

The model processes your audio offline. No audio data is ever uploaded to a server. After transcription you'll see a list of caption segments with their timestamps.
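
For readers who want a mental model of that segment list, here is a minimal Python sketch. It is an illustration only: the `CaptionSegment` class, its field names, and the `format_timestamp` helper are assumptions made for this example, not Novus's actual data format or API.

```python
from dataclasses import dataclass


@dataclass
class CaptionSegment:
    """Hypothetical shape of one caption segment (not Novus's real format)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm for display in a segment list."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


# Example data, invented for illustration.
segments = [
    CaptionSegment(start=0.0, end=2.4, text="First line of the verse"),
    CaptionSegment(start=2.4, end=5.15, text="Second line of the verse"),
]

for seg in segments:
    print(f"{format_timestamp(seg.start)} - {format_timestamp(seg.end)}  {seg.text}")
```

Each segment carries a start time, an end time, and the transcribed text, which are the values you edit in the sections that follow.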

Tips

  • For spoken word and rap, transcription is usually very accurate. For heavily processed vocals or non-English lyrics, accuracy varies; use the manual edit mode to correct any errors.

Editing Captions

Click any caption segment to edit the text or adjust the start/end time. Drag the time handles in the Timeline to sync captions precisely to words or phrases.

You can also add caption segments manually by clicking 'Add Caption' and typing the text — useful for tracks where the lyrics aren't spoken clearly enough for Whisper to transcribe.

Caption Styles

The Style dropdown in the Captions panel offers 9 preset animation styles: Fade, Pop, Slide Up, Slide Down, Typewriter, Bounce, Scale, Glow, and Karaoke highlight. Each style controls how individual caption segments appear and disappear.

Tips

  • Karaoke highlight works best for lyric videos — it progressively highlights each word as it's spoken.
  • Fade is the cleanest option for spoken-word or podcast content.
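
The progressive highlighting described for Karaoke highlight can be pictured with a short Python sketch. It assumes each word has its own start time and that words are listed in playback order; the `TimedWord` class and both functions are hypothetical names for this example and say nothing about how Novus implements the style.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """Hypothetical word with timing; names are illustrative only."""
    text: str
    start: float  # seconds
    end: float    # seconds


def highlighted_count(words: list[TimedWord], t: float) -> int:
    """Return how many words have started by playback time t."""
    return sum(1 for w in words if w.start <= t)


def render(words: list[TimedWord], t: float) -> str:
    """Mark already-started words with brackets to stand in for the highlight."""
    n = highlighted_count(words, t)
    return " ".join(
        f"[{w.text}]" if i < n else w.text for i, w in enumerate(words)
    )


# Example timings, invented for illustration.
words = [
    TimedWord("Hold", 0.0, 0.4),
    TimedWord("the", 0.4, 0.6),
    TimedWord("line", 0.6, 1.2),
]

print(render(words, 0.5))
```

As playback time advances, more words count as started, so the highlight moves through the caption one word at a time.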
