Irodori-TTS-500M-v2-VoiceDesign Demo
Model | GitHub
Caption-conditioned Japanese TTS model based on rectified flow over DACVAE latents.
Caption / Style Prompt: Optional. Leave blank for text-only generation.
Generates up to 30 seconds of audio, automatically trimmed to content length.
Text
Caption / Style Prompt (optional)
Sampling
Num Steps (1 to 120)
Num Candidates (1 to 32)
Seed (blank=random)
CFG Guidance Mode
CFG Scale Text (0 to 10)
CFG Scale Caption (0 to 10)
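For readers unfamiliar with how controls such as Num Steps, CFG Scale Text, and CFG Scale Caption typically interact in a rectified-flow sampler, the following is a minimal sketch, not this model's actual code. It assumes fixed-step Euler integration from noise at t=0 to data at t=1 and an independent two-condition form of classifier-free guidance. The function names `guided_velocity`, `euler_sample`, and `toy_field` are hypothetical, and the toy velocity field stands in for a trained network. The guidance mode this demo really uses may differ.

```python
import numpy as np


def guided_velocity(v_uncond, v_text, v_caption, scale_text, scale_caption):
    """Combine velocity predictions with two independent guidance scales.

    Assumed form (hypothetical, not confirmed for this model):
        v = v_uncond + s_text * (v_text - v_uncond)
                     + s_caption * (v_caption - v_uncond)
    """
    return (v_uncond
            + scale_text * (v_text - v_uncond)
            + scale_caption * (v_caption - v_uncond))


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so (1 - t) is never zero
        x = x + dt * velocity_fn(x, t)
    return x


def toy_field(target):
    """Toy stand-in for a network: straight-line flow toward `target`."""
    return lambda x, t: (target - x) / (1.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4)  # stands in for a noisy latent

    # Three hypothetical prediction branches with different targets.
    f_uncond = toy_field(np.zeros(4))
    f_text = toy_field(np.ones(4))
    f_caption = toy_field(np.full(4, 2.0))

    def velocity(x, t, s_text=3.0, s_caption=1.5):
        return guided_velocity(f_uncond(x, t), f_text(x, t), f_caption(x, t),
                               s_text, s_caption)

    latent = euler_sample(velocity, noise, num_steps=40)
    print(latent)
```

In this sketch, more steps mean a finer integration grid, and a scale of 0 for a condition removes its contribution entirely, which matches the sliders' lower bounds of 0. A Num Candidates control would typically repeat this loop from different noise draws.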
Advanced (Optional)
Generate
Generated Audio 1
Run Log