Irodori-TTS-500M-v2-VoiceDesign Demo

Model | GitHub

Caption-conditioned Japanese TTS model based on rectified flow over DACVAE latents.
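
The phrase "rectified flow over DACVAE latents" means audio is generated by integrating a learned velocity field that carries Gaussian noise (t = 0) to a latent sequence (t = 1), which a codec decoder then turns into a waveform. Below is a minimal, generic sketch of that sampling loop using fixed-step Euler integration plus classifier-free guidance (CFG). It is not the Irodori-TTS implementation: the function names, the step count, and the toy "velocity" functions (which stand in for the trained network conditioned on text and caption) are all hypothetical, and the demo's actual solver and CFG modes are not documented here.

```python
import numpy as np
from typing import Callable

# Hypothetical signature: velocity(x, t) -> dx/dt, same shape as x.
Velocity = Callable[[np.ndarray, float], np.ndarray]


def sample_rectified_flow(v_cond: Velocity, v_uncond: Velocity, shape: tuple,
                          num_steps: int = 32, cfg_scale: float = 1.0,
                          seed: int = 0) -> np.ndarray:
    """Euler-integrate dx/dt = v(x, t) from noise at t=0 to a sample at t=1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # x_0 ~ N(0, I), stand-in for a latent sequence
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        vc = v_cond(x, t)    # velocity with conditioning (e.g. text + caption)
        vu = v_uncond(x, t)  # velocity with conditioning dropped
        # Classifier-free guidance: move from the unconditional prediction
        # toward (or past, if cfg_scale > 1) the conditional one.
        v = vu + cfg_scale * (vc - vu)
        x = x + dt * v
    return x


def toward(target: np.ndarray) -> Velocity:
    """Toy velocity field whose straight-line flow ends exactly at `target`."""
    return lambda x, t: (target - x) / (1.0 - t)


if __name__ == "__main__":
    cond_target = np.full((4, 8), 2.0)  # pretend "conditioned" latent
    uncond_target = np.zeros((4, 8))    # pretend "unconditioned" latent
    z = sample_rectified_flow(toward(cond_target), toward(uncond_target), (4, 8))
    print(np.allclose(z, cond_target))
```

With `cfg_scale=1.0` the guided velocity equals the conditional one, so the toy sampler lands on `cond_target`; larger scales extrapolate further along the conditional direction, which is the usual trade-off CFG controls.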

  • Caption / Style Prompt: Optional. Leave blank for text-only generation.
  • Generates up to 30 seconds of audio, automatically trimmed to content length.
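
The page does not say how the 30-second output is trimmed to content length. One common approach, shown here only as an assumed illustration, is to measure short-frame RMS energy and cut everything after the last frame that rises above a silence threshold, keeping a little padding. The function name, the -40 dBFS threshold, the frame and padding lengths, and the 24 kHz sample rate in the usage note are hypothetical choices, not values taken from the demo.

```python
import numpy as np


def trim_trailing_silence(audio: np.ndarray, sr: int, threshold_db: float = -40.0,
                          frame_ms: float = 20.0, pad_ms: float = 100.0) -> np.ndarray:
    """Drop trailing frames whose RMS level stays below `threshold_db` (dBFS)."""
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = -(-len(audio) // frame)  # ceil division keeps a partial final frame
    last_voiced = -1
    for i in range(n_frames):
        chunk = audio[i * frame:(i + 1) * frame].astype(np.float64)
        rms = np.sqrt(np.mean(chunk ** 2))
        level_db = 20.0 * np.log10(rms + 1e-12)  # dBFS for float audio in [-1, 1]
        if level_db > threshold_db:
            last_voiced = i
    if last_voiced < 0:
        return audio[:0]  # nothing above the threshold: treat as all silence
    # Cut after the last voiced frame, plus a short tail so endings are not clipped.
    end = (last_voiced + 1) * frame + int(sr * pad_ms / 1000)
    return audio[:min(end, len(audio))]
```

For example, a 30-second buffer at 24 kHz holding one second of speech-level signal followed by silence would be cut to roughly 1.1 seconds.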