Announcing MOSS-TTS-PNY 1.7B v0.1
Finetune of the MOSS-TTS-Local model with fixed speaker embeddings and a custom iSTFTNet3 vocoder that intercepts the MOSS tokenizer's features and outputs 48 kHz audio.
Runs at 1.8x realtime on a single RTX 5090 with PyTorch:
(see replies)
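For anyone checking the speed number: a minimal sketch of how a realtime factor can be computed, assuming "1.8x realtime" means 1.8 seconds of audio generated per second of wall-clock time. The function names and the 10-second clip are hypothetical, for illustration only; they are not taken from the model's code or from a measured run.

```python
SAMPLE_RATE_HZ = 48_000  # output sample rate stated in the post


def audio_seconds(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a mono clip in seconds."""
    return num_samples / sample_rate_hz


def realtime_factor(num_samples: int, wall_seconds: float,
                    sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Seconds of audio produced per second of wall-clock time.

    Values above 1.0 mean generation is faster than playback.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds(num_samples, sample_rate_hz) / wall_seconds


# Hypothetical example: a 10 s clip generated at 1.8x realtime
# would take 10 / 1.8 (about 5.6) seconds of wall-clock time.
clip_samples = 10 * SAMPLE_RATE_HZ
wall = 10 / 1.8
rtf = realtime_factor(clip_samples, wall)
print(f"{rtf:.1f}x realtime")
```

When timing GPU inference in PyTorch, the wall-clock measurement should bracket the full generation call and synchronize the device before reading the clock, otherwise asynchronous kernel launches can make the number look better than it is.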
Finetuned MOSS-TTS (the 1.7B version) on some speakers, then trained a new decoder from scratch, based on iSTFTNet3, that takes in the tokenizer's features and outputs 48 kHz audio. I am a genius.
I'll open-source this later.