mirror of https://github.com/thorstenMueller/Thorsten-Voice.git synced 2024-11-21 23:43:12 +01:00

Thorsten Mueller ab94596d00 Small adjustments

2024-02-18 10:48:01 +01:00


Project motivation
Personal note
Thorsten Voice Datasets
Thorsten TTS-Models
Public talks
My Youtube channel

Motivation for Thorsten-Voice project 🗣️ 💬

A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing hassle.

Personal words by Thorsten Müller

I contribute my voice as a person believing in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color, or the geocoordinates of their birthplace. A global world where everybody is warmly welcome anywhere on this planet and where open and free knowledge and education are available to everyone. 🌍 (Thorsten Müller)

Please keep in mind that I am not a professional voice talent. I'm just a normal guy sharing his voice with the world.

Feel free to contact me on social media 🤗.

Platform	Link
Youtube	ThorstenVoice on Youtube
LinkedIn	Thorsten Müller on LinkedIn
Twitter	ThorstenVoice on Twitter
Huggingface	ThorstenVoice on Huggingface
Instagram	ThorstenVoice on Instagram

Voice-Datasets

All my "Thorsten-Voice" datasets are listed and downloadable on Zenodo. A citation is highly appreciated if you use them in your projects, products, or papers.

Dataset	DOI
Thorsten-Voice Dataset 2021.02 (Neutral)	10.5281/zenodo.5525342
Thorsten-Voice Dataset 2021.06 (Emotional)	10.5281/zenodo.5525023
Thorsten-Voice Dataset 2022.10 (Neutral)	10.5281/zenodo.7265581
Thorsten-Voice Dataset 2023.09 (Hessisch)	10.5281/zenodo.10511260

Thorsten-Voice Dataset 2021.02 (Neutral)

@dataset{muller_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice Dataset 2021.02},
  month        = sep,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
}

Dataset summary

Recorded by Thorsten Müller
Optimized by Dominik Kreutz
LJSpeech file and directory structure
22,668 recorded phrases (wav files)
More than 23 hours of pure audio
Sample rate 22,050 Hz
Mono
Normalized to -24 dB
Phrase length (min/avg/max): 2 / 52 / 180 chars
No silence at beginning/ending
Avg spoken chars per second: 14
Sentences with question mark: 2,780
Sentences with exclamation mark: 1,840
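
Because the dataset follows the LJSpeech layout, transcripts can be paired with their audio files in a few lines of Python. The following is a minimal sketch, not an official loader. It assumes a pipe-separated metadata file whose first field is the file ID and whose second field is the transcript, with audio stored in a `wavs/` directory. These details and the sample lines are assumptions, so check them against the downloaded archive.

```python
import csv
import io
from pathlib import Path


def read_ljspeech_metadata(lines, wav_dir="wavs"):
    """Yield (wav_path, text) pairs from LJSpeech-style metadata lines.

    Assumes each line looks like `<file_id>|<text>[|<normalized text>]`.
    """
    # QUOTE_NONE keeps quote characters inside a transcript as literal text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        file_id, text = row[0], row[1]
        yield Path(wav_dir) / f"{file_id}.wav", text


# Hypothetical sample lines for illustration only; not real dataset entries.
sample = io.StringIO("utt_0001|Guten Morgen.\nutt_0002|Wie geht es dir?\n")
pairs = list(read_ljspeech_metadata(sample))
```

For a real download, pass an open file instead of the in-memory sample, for example `open("metadata.csv", encoding="utf-8", newline="")`, where the file name is again an assumption.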

Dataset evolution

As described in the PDF document (evolution of thorsten dataset), this dataset consists of three recording phases.

Phase 1: Recorded with a cheap usb microphone (low quality)
Phase 2: Recorded with a good microphone (good quality)
Phase 3: Recorded with same good microphone but longer phrases (> 100 chars) (good quality)

If you want to use a subset of the dataset, the recording quality CSV file shows which files belong to which recording phase.
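
Selecting the files of one recording phase could look like the sketch below. The column names `file_id` and `phase`, the comma delimiter, and the sample rows are all hypothetical, since the layout of the recording quality CSV file is not described here. Adjust them to the real header before use.

```python
import csv
import io


def files_in_phase(csv_lines, phase, id_column="file_id", phase_column="phase"):
    """Return the file IDs whose phase column equals `phase`.

    The column names are hypothetical defaults, not the real CSV header.
    """
    reader = csv.DictReader(csv_lines)
    return [row[id_column] for row in reader
            if row[phase_column].strip() == str(phase)]


# Made-up rows for illustration; the real file's header and values may differ.
sample = io.StringIO(
    "file_id,phase\nutt_0001,1\nutt_0002,2\nutt_0003,3\nutt_0004,2\n"
)
phase_two_ids = files_in_phase(sample, phase=2)
```

Dropping phase 1 this way would leave only the recordings made with the better microphone.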

Thorsten-Voice Dataset 2021.06 (Emotional)

@dataset{muller_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice Dataset 2021.06 emotional},
  month        = sep,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
}

I recorded all emotional recordings myself and tried to feel and pronounce each emotion even when the content of the phrase does not match it. Example: I spoke the sleepy recordings in the tone I have shortly before falling asleep.

Dataset summary

Recorded by Thorsten Müller
Optimized by Dominik Kreutz
300 sentences × 8 emotions = 2,400 recordings
Mono
Sample rate 22,050 Hz
Normalized to -24 dB
No silence at beginning/ending
Sentence length: 59 - 148 chars
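
The audio properties listed above (mono, 22,050 Hz) can be checked with Python's standard `wave` module before training. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the demo builds a short silent 16-bit WAV in memory so that it runs without the dataset. The 16-bit sample width is an assumption of the demo, not a stated property of the recordings.

```python
import io
import wave


def check_wav_format(source, expected_rate=22050, expected_channels=1):
    """Return (ok, rate, channels) for a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate and channels == expected_channels, rate, channels


# Build a tiny silent WAV in memory so the sketch runs without the dataset.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples (assumption for this demo)
    wav.setframerate(22050)  # 22,050 Hz, as listed in the summary
    wav.writeframes(b"\x00\x00" * 2205)  # 0.1 s of silence
buffer.seek(0)

ok, rate, channels = check_wav_format(buffer)
```

To check a downloaded recording, pass its file path as `source` instead of the in-memory buffer.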

Thorsten-Voice Dataset 2022.10 (Neutral)

🗣️ Listen to some audio recordings from this dataset here.

@dataset{muller_2022_7265581,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice Dataset 2022.10},
  month        = nov,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.7265581},
  url          = {https://doi.org/10.5281/zenodo.7265581}
}

Thorsten-Voice Dataset 2023.09 (Hessisch)

@dataset{muller_2024_10511260,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice Dataset 2023.09 Hessisch},
  month        = jan,
  year         = 2024,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.10511260},
  url          = {https://doi.org/10.5281/zenodo.10511260}
}

TTS Models

Several TTS (text-to-speech) models have been trained on these open-source voice datasets using machine learning.

Models trained with the projects by Coqui AI, Piper TTS, Silero and ZDisket are available. You can find more information on how to use them, along with audio samples and video tutorials, on the Thorsten-Voice project website.

Thorsten-22.05-VITS

Trained on the Thorsten-22.05-neutral dataset. Audio samples are available on the Thorsten-Voice website.

To run the TTS server, just follow these steps:

pip install tts==0.7.1
tts-server --model_name tts_models/de/thorsten/vits
Open your browser at http://localhost:5002 and enjoy playing
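
Once the server from the steps above is running, it can also be queried from a script instead of the browser. The sketch below assumes the server exposes a GET endpoint at `/api/tts` that takes a `text` parameter and returns WAV data, which is what the Coqui TTS demo page uses as far as I know. Verify this against your installed version. The function names and the output file name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for the server's /api/tts endpoint (assumed)."""
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize_to_file(text, out_path="output.wav"):
    """Fetch synthesized audio from a running tts-server and save it."""
    with urlopen(build_tts_url(text)) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path


# Building the URL needs no running server; synthesize_to_file() does.
url = build_tts_url("Guten Tag")
```

Calling `synthesize_to_file("Guten Tag")` while the server is up should write the spoken phrase to `output.wav`.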

Thorsten-22.08-Tacotron2-DDC

Trained on the Thorsten-22.05-neutral dataset. Audio samples are available on the Thorsten-Voice website.

To run the TTS server, just follow these steps:

pip install tts==0.8.0
tts-server --model_name tts_models/de/thorsten/tacotron2-DDC
Open your browser at http://localhost:5002 and enjoy playing

Silero

You can use free AGPL-licensed models trained on the Thorsten-21.02-neutral dataset via the silero-models project.

Thorsten 16kHz

ZDisket

ZDisket made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by monatis. Thanks for sharing that 👍. See it in action on YouTube.

Support & Thanks

If you like my voice contribution and would like to support my effort for an open-source voice technology future, here is how you can help:

Subscribe to and share my YouTube channel (https://youtube.com/@ThorstenMueller/) and follow me on my social media profiles
Buy me a tea using Ko-Fi or GitHub Sponsors

I want to say thank you to the great people who supported me on this journey with nice words, support and compute power: Thanks El-Tocino, Eren Gölge, Gras64, Kris Gesling, Nmstoker, Othiele, Repodiac, SanjaESC, Synesthesiam.

Special thanks to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design, and of course to the dear Dominik (@domcross) for being so close by my side on this amazing journey.

Public talks

I really like to talk about the importance of an open-source voice technology future. If you would like me to speak at a conference or event, I'd be happy to be contacted through the contact form on the Thorsten-Voice website.

"Thorsten-Voice" Youtube channel

In summer 2021 I started to share my lessons learned and experiences with open voice tech, especially TTS, on my little YouTube channel. If you check out and like my videos, I'd be happy to welcome you as a subscriber and member of my little YouTube community.
