Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles.

Motivation for Thorsten-Voice project 🗣️ 💬

A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles.

Some personal words before using Thorsten-Voice

I contribute my voice as a person who believes in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color, or the geocoordinates of their birthplace. A global world where everybody is warmly welcome anywhere on this planet, and where open and free knowledge and education are available to everyone. 🌍 (Thorsten Müller)

Please keep in mind that I am not a professional voice talent. I'm just a normal guy sharing his voice with the world.

Voice-Datasets

Voice datasets are listed on Zenodo:

Dataset	DOI Link
Thorsten-21.02-neutral	https://doi.org/10.5281/zenodo.5525342
Thorsten-21.06-emotional	https://doi.org/10.5281/zenodo.5525023
Thorsten-22.05-neutral	soon to come

Thorsten-21.02-neutral

@dataset{muller_thorsten_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Neutral) Dataset},
  month        = feb,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
}

Samples

Dataset summary

Recorded by Thorsten Müller
Optimized by Dominik Kreutz
LJSpeech file and directory structure
22,668 recorded phrases (WAV files)
More than 23 hours of pure audio
Sample rate: 22,050 Hz
Mono
Normalized to -24 dB
Phrase length (min/avg/max): 2 / 52 / 180 chars
No silence at beginning/ending
Avg spoken chars per second: 14
Sentences with question mark: 2,780
Sentences with exclamation mark: 1,840
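Because the dataset follows the LJSpeech layout, its transcripts can be read with a few lines of standard-library Python. The sketch below is only an illustration: it assumes a pipe-delimited `metadata.csv` whose first field is the clip ID and whose second field is the transcript, with audio stored under `wavs/`. Those names come from the LJSpeech convention rather than from this README, and the helper names are hypothetical.

```python
import csv
from pathlib import Path


def read_metadata(lines):
    """Parse LJSpeech-style rows: clip ID, then transcript, separated by '|'."""
    # QUOTE_NONE keeps quotation marks inside transcripts untouched.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def phrase_length_stats(entries):
    """Return (min, average, max) transcript length in characters."""
    lengths = [len(text) for _, text in entries]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)


def load_dataset(root):
    """Read <root>/metadata.csv (assumed name) and pair each clip with its WAV path."""
    root = Path(root)
    with open(root / "metadata.csv", encoding="utf-8", newline="") as handle:
        entries = read_metadata(handle)
    # Assumed layout: audio for clip <id> lives at <root>/wavs/<id>.wav
    return [(root / "wavs" / f"{clip_id}.wav", text) for clip_id, text in entries]
```

Running `phrase_length_stats` over the parsed entries is one way to recompute the min/avg/max phrase lengths listed above for a full download or a subset.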

Dataset evolution

As described in the PDF document (Evolution of Thorsten Dataset), this dataset consists of three recording phases.

Phase 1: Recorded with a cheap USB microphone (low quality)
Phase 2: Recorded with a good microphone (good quality)
Phase 3: Recorded with the same good microphone but longer phrases (> 100 chars) (good quality)

If you want to use a subset of the dataset, the recording-quality CSV file shows which files belong to which recording phase.

Thorsten-21.06-emotional

@dataset{muller_thorsten_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Emotional) Dataset},
  month        = jun,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
}

I recorded all emotional recordings myself and tried to feel and pronounce each emotion, even if the content of the phrase does not match it. Example: I pronounced the sleepy recordings in the tone I have shortly before falling asleep.

Samples

Listen to the phrase "Mist, wieder nichts geschafft." in the following emotions:

🙂 Neutral
🤢 Disgusted
😠 Angry
😀 Amused
😲 Surprised
😔 Sleepy
😵 Drunk
🤫 Whispering

Dataset summary

Recorded by Thorsten Müller
Optimized by Dominik Kreutz
300 sentences × 8 emotions = 2,400 recordings
Mono
Sample rate: 22,050 Hz
Normalized to -24 dB
No silence at beginning/ending
Sentence length: 59 - 148 chars
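To confirm that downloaded or self-recorded clips match the format listed above (mono, 22,050 Hz), the WAV header can be inspected with Python's standard `wave` module. This is a minimal sketch with hypothetical helper names. It assumes uncompressed PCM WAV files and does not check loudness normalization or leading/trailing silence.

```python
import wave


def wav_properties(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def matches_dataset_format(path, expected_rate=22050, expected_channels=1):
    """True if the file is mono at 22,050 Hz, as stated in the dataset summary."""
    channels, rate, _ = wav_properties(path)
    return channels == expected_channels and rate == expected_rate
```

Looping `matches_dataset_format` over a folder of clips is a quick sanity check before training.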

Thorsten-22.05-neutral

Soon to come

TTS Models

Thorsten-21.04-Tacotron2-DCA

This TTS model was trained on the Thorsten-21.02-neutral dataset. The recommended trained Fullband-MelGAN vocoder can be downloaded here.

Run the model:

pip install TTS==0.5.0
tts-server --model_name tts_models/de/thorsten/tacotron2-DCA
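Once the server is running, speech can be requested over HTTP. The sketch below assumes the server listens on `localhost:5002` and serves audio from an `/api/tts` endpoint that takes a `text` query parameter. Neither detail is stated in this README, so please check them against the output of your `tts-server` start-up log. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL; the endpoint path and default port are assumptions."""
    return f"http://{host}:{port}/api/tts?{urlencode({'text': text})}"


def synthesize(text, out_path="output.wav", **server):
    """Request speech for `text` from a running tts-server and save the audio."""
    with urlopen(build_tts_url(text, **server)) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With the server started as shown above, `synthesize("Guten Tag, ich bin Thorsten.")` would write the returned audio to `output.wav`.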

Thorsten-22.05-VITS

Trained on dataset Thorsten-22.05-neutral.

TODO

Thorsten-22.05-Tacotron2-DDC

Trained on dataset Thorsten-22.05-neutral.

TODO

TTS audio samples

TODO

Text	Fullband-MelGAN	VITS	Tacotron2-DDC (22.05)
Die übernächste Generation von Gamern sitzt auch stundenlang vor Bildschirmen.	Sample (Dataset 21.02)	Sample (Dataset 22.05)	Sample (Dataset 22.05)
Ich weiß, es ist vorbei.	Sample (Dataset 21.02)	Sample (Dataset 22.05)	Sample (Dataset 22.05)
Insgesamt gab es bisher 563 333 Infektionen und 9226 Todesfälle.	Sample (Dataset 21.02)	Sample (Dataset 22.05)	Sample (Dataset 22.05)
Und irgendwann Schwupps, ist das Ende da.	Sample (Dataset 21.02)	Sample (Dataset 22.05)	Sample (Dataset 22.05)

Other models

Silero

You can use free AGPL-licensed models trained on the Thorsten-21.02-neutral dataset via the silero-models project.

Thorsten 16kHz

ZDisket

ZDisket made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by monatis. Thanks for sharing that 👍. See it in action on YouTube.

Public talks

I really want to bring the topic "Open Voice For An Open Future" to wider public attention.

I took part in a Linux User Group podcast about Mycroft AI and talked about my TTS efforts (May 2021).
I was invited by Yusuf from the Turkish TensorFlow community to give a talk on "How to make machines speak with your own voice". The talk was streamed live on YouTube and is available here. If you're interested in the slides I showed, feel free to download my presentation here (June 2021).
I was invited to speak at VoiceLunch language & linguistics on 3 January 2022. Here are my slides (January 2022).
In addition, I share my thoughts and knowledge on Open Voice on my YouTube channel.

Feel free to file an issue if you ...

Use my TTS voice in your project(s)
Want to share your trained "Thorsten" model
Become aware of any misuse of my voice

Thanks section

Cool projects

Cool people

Even more special people

Additionally, a big thank-you to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.

And last but not least, I want to say a huge, huge thank-you to a special guy who has supported me on this journey as a partner right from the beginning. Not just with nice words, but with his time, audio optimization know-how, and finally GPU power.

Thank you so much, dear Dominik (@domcross) for being my partner on this journey.

Thorsten (Twitter: @ThorstenVoice)