Added Windows TTS training recipe

Added modified vits recipe for Thorsten-Voice model training using Windows
German Corpus for Mimic-Recording-Studio
2023-03-05 16:19:50 +01:00 · 2022-12-16 22:54:02 +01:00 · 2022-11-13 17:08:26 +01:00 · 2022-11-13 16:47:46 +01:00 · 2022-08-23 19:03:47 +02:00 · 2022-06-24 18:00:12 +02:00
83 changed files with 24467 additions and 107 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@ -0,0 +1,2 @@
 # These are supported funding model platforms
--- a/CITATION.cff
+++ b/CITATION.cff
@ -0,0 +1,28 @@
 # This CITATION.cff file was generated with cffinit.
 # Visit https://bit.ly/cffinit to generate yours today!
 cff-version: 1.2.0
 title: Thorsten-Voice
 message: >-
  Please cite Thorsten-Voice project if you use
  datasets or trained TTS models.
 type: dataset
 authors:
  - given-names: Thorsten
    family-names: Müller
    email: tm@thorsten-voice.de
  - given-names: Dominik
    family-names: Kreutz
 repository-code: 'https://github.com/thorstenMueller/Thorsten-Voice'
 url: 'https://www.Thorsten-Voice.de'
 abstract: >-
  A free to use, offline working, high quality german
  TTS voice should be available for every project
  without any license struggling.
 keywords:
  - Thorsten
  - Voice
  - Open
  - German
  - TTS
  - Dataset
--- a/Logo_Thorsten-Voice.png
+++ b/Logo_Thorsten-Voice.png
--- a/README.md
+++ b/README.md
@ -1,135 +1,245 @@
-# Introduction
+![Thorsten-Voice logo](Logo_Thorsten-Voice.png)
 Many smart voice assistants like Amazon Alexa, Google Home, Apple Siri and Microsoft Cortana use cloud services to offer their (base) functionality.
-As some people have privacy concerns using these services there are some (open source) projects trying to build offline and/or privacy aware alternatives.
+- [Project motivation](#motivation-for-thorsten-voice-project-speaking_head-speech_balloon)
 - [Personal note](#some-personal-words-before-using-thorsten-voice)
-But speech recognition and text synthesis still requires cloud services for providing these in a decent quality.
+- [**Thorsten** Voice Datasets](#voice-datasets)
  - [Thorsten-21.02-neutral](#thorsten-2102-neutral)
  - [Thorsten-21.06-emotional](#thorsten-2106-emotional)
  - [Thorsten-22.10-neutral](#thorsten-2210-neutral)
-# MyCroft AI
+- [**Thorsten** TTS-Models](#tts-models)
-> https://mycroft.ai/
+  - [Thorsten-21.04-Tacotron2-DCA](#thorsten-2104-tacotron2-dca)
  - [Thorsten-22.05-VITS](#thorsten-2205-vits)
  - [Thorsten-22.08-Tacotron2-DDC](#thorsten-2208-tacotron2-ddc)
  - [Other models](#other-models)
 - [Public talks](#public-talks)
-MyCroft is a company developing an opensource voice assistant with a very nice and active community. But the stt/tts parts are still cloud based (eg. google services), even if requests are anonymized by a mycroft proxy in between. But integration with locally hosted services such as deepspeech (stt) or mimic/tacotron (tts) is possible.
+- [My Youtube channel](#youtube-channel)
-# Mozilla
+- [Special Thanks](#thanks-section)
 Mozilla works on these really important aspects for free and open human machine voice interaction.
 ## STT - speech to text
 > https://commonvoice.mozilla.org/
 "STT" needs lots of audio training data by many speakers (women/men/kids) of all ages, dialects and in various audio quality levels. So any voice contribution for common voice project is highly welcome.
 ## TTS - text to speech
 > https://github.com/mozilla/tts
 "TTS" needs lots of clean recordings by one speaker to train a model. Mozilla is developing a software stack for proper model training based on tacotron2 papers.
 # And?!
 I want to make the most personal contribution i can give and contribute my personal voice (**german**) for TTS training to the community for free usage.
 ## Please read some personal words before downloading the dataset
 I contribute my voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone.
 So hopefully my voice is used in this manner to make this world a better place for all of us :-).
 **tl;dr** Please don't use for evil!
 # Dataset "thorsten"
 ## Samples of my voice
 To give you an impression of how my voice sounds and whether it fits your project, I published some sample recordings, so there is no need to download the complete dataset first.
 * [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
 * [Der Prophet spricht stets in Gleichnissen.](./samples/original_recording/recorded_sample_02.wav )
 * [Bitte schmeißt euren Müll nicht einfach in die Walachei.](./samples/original_recording/recorded_sample_03.wav )
 * [So etwas würde mir nie in den Sinn kommen.](./samples/original_recording/recorded_sample_04.wav )
 * [Sie klettert auf einen Stein und nimmt eine Denkerpose ein.](./samples/original_recording/recorded_sample_05.wav )
 * [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
 * [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )
-## Dataset information
+# Motivation for Thorsten-Voice project :speaking_head: :speech_balloon:
 A **free** to use, **offline** working, **high quality** **german** **TTS** voice should be available for every project without any license struggling.
-* ljspeech-1.1 structure
+<a href="https://twitter.com/intent/follow?screen_name=ThorstenVoice"><img src="https://img.shields.io/twitter/follow/ThorstenVoice?style=social&logo=twitter" alt="follow on Twitter"></a>
-* 22.668 recorded phrases (wav files)
+[![YouTube Channel Subscribers](https://img.shields.io/youtube/channel/subscribers/UCjqqTVVBTsxpm0iOhQ1fp9g?style=social)](https://www.youtube.com/c/ThorstenMueller)
-* more than 23 hours of pure audio
+[![Project website](https://img.shields.io/badge/Project_website-www.Thorsten--Voice.de-92a0c0)](https://www.Thorsten-Voice.de)
-* samplerate 22.050Hz
+
-* mono
+# Social media
-* normalized to -24dB
+Please check and follow me on my social media profiles - Thank you.
-* phrase length (min/avg/max): 2 / 52 / 180 chars
+
-* no silence at beginning/ending
+| Platform         | Link                                                                                                            |
-* avg spoken chars per second: 14
+| --------------- | ------- |
-* sentences with question mark: 2.780
+| Youtube | [ThorstenVoice on Youtube](https://www.youtube.com/c/ThorstenMueller) |
-* sentences with exclamation mark: 1.840
+| Twitter | [ThorstenVoice on Twitter](https://twitter.com/ThorstenVoice) |
 | Instagram | [ThorstenVoice on Instagram](https://www.instagram.com/thorsten_voice/) |
 | LinkedIn | [Thorsten Müller on LinkedIn](https://www.linkedin.com/in/thorsten-m%C3%BCller-848a344/) |
 # Some personal words before using **Thorsten-Voice**
 > I contribute my voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone. :earth_africa: (*Thorsten Müller*)
 Please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with the world.
 # Voice-Datasets
 Voice datasets are listed on Zenodo:
 | Dataset         | DOI Link                                                                                                            |
 | --------------- | ------- |
 | Thorsten-21.02-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342) |
 | Thorsten-21.06-emotional | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023) |
 | Thorsten-22.10-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581) |
 ## Thorsten-21.02-neutral
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342)
 ```
@dataset{muller_thorsten_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice - "Thorsten-21.02-neutral" Dataset},
  month        = feb,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
 }
 ```
 > :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1KVjGXG2ij002XRHb3fgFK4j0OEq1FsWm?usp=sharing).**
 ### Dataset summary
 * Recorded by Thorsten Müller
 * Optimized by Dominik Kreutz
 * LJSpeech file and directory structure
 * 22.668 recorded phrases (*wav files*)
 * More than 23 hours of pure audio
 * Samplerate 22.050Hz
 * Mono
 * Normalized to -24dB
 * Phrase length (min/avg/max): 2 / 52 / 180 chars
 * No silence at beginning/ending
 * Avg spoken chars per second: 14
 * Sentences with question mark: 2.780
 * Sentences with exclamation mark: 1.840
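The figures in the summary above can be recomputed from the dataset's metadata file. The following is a minimal sketch, not the script that produced the published numbers. It assumes an LJSpeech-style pipe-separated file in which the transcript is the second field; both the field index and the helper name `phrase_stats` are assumptions made for illustration.

```python
from statistics import mean


def phrase_stats(lines, text_field=1):
    """Summarize transcripts from LJSpeech-style 'id|text' lines.

    text_field is an assumption: adjust it if the transcript sits in
    another column of your metadata file.
    """
    texts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        texts.append(line.split("|")[text_field])
    if not texts:
        raise ValueError("no transcripts found")
    lengths = [len(text) for text in texts]
    return {
        "phrases": len(texts),
        "min_chars": min(lengths),
        "avg_chars": round(mean(lengths)),
        "max_chars": max(lengths),
        "questions": sum("?" in text for text in texts),
        "exclamations": sum("!" in text for text in texts),
    }


# Tiny in-memory example instead of the real metadata file.
example = ["a|Hallo?", "b|Ja!", "c|Eure Tröte nervt."]
stats = phrase_stats(example)
```

To run it on the real data, pass an open file handle, for example `phrase_stats(open("metadata.csv", encoding="utf-8"))`, where the file name is again an assumption about the local layout.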
 ### Dataset evolution
 As described in the PDF document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)) this dataset consists of three recording phases.
 * **Phase 1**: Recorded with a cheap usb microphone (*low quality*)
 * **Phase 2**: Recorded with a good microphone (*good quality*)
 * **Phase 3**: Recorded with same good microphone but longer phrases (> 100 chars) (*good quality*)
 If you want to use only a subset of the dataset, the [recording quality](./RecordingQuality.csv) CSV file shows which files belong to which recording phase.
-![text length vs. mean audio duration](./img/thorsten-de---datasetAnalysis1.png)
+## Thorsten-21.06-emotional
-![text length vs. median audio duration](./img/thorsten-de---datasetAnalysis2.png)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023)
 ![text length vs. STD](./img/thorsten-de---datasetAnalysis3.png)
 ![text length vs. number instances](./img/thorsten-de---datasetAnalysis4.png)
 ![signal noise ratio](./img/thorsten-de---datasetAnalysis5.png)
 ![bokeh](./img/thorsten-de---datasetAnalysis6.png)
-## Dataset evolution
+```
-As decribed in the pdf document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)) this dataset consists of three recording phases.
+@dataset{muller_thorsten_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {{Thorsten-Voice - "Thorsten-21.06-emotional" 
                   Dataset}},
  month        = jun,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
 }
 ```
-* phase1: Recorded with a cheap usb microphone
+All emotional recordings were recorded by myself, and I tried to feel and pronounce each emotion even if the phrase context does not match it. Example: I pronounced the sleepy recordings in the tone I have shortly before falling asleep.
 * phase2: Recorded with a good microphone
 * phase3: Recorded with same good microphone but longer phrases (> 100 chars)
-If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you can see which files belong to which recording phase in [recording quality](./RecordingQuality.csv) csv file.
+### Samples
 Listen to the phrase "**Mist, wieder nichts geschafft.**" in the following emotions.
 * :slightly_smiling_face: [Neutral](./samples/thorsten-21.06-emotional/neutral.wav)
 * :nauseated_face: [Disgusted](./samples/thorsten-21.06-emotional/disgusted.wav)
 * :angry: [Angry](./samples/thorsten-21.06-emotional/angry.wav)
 * :grinning: [Amused](./samples/thorsten-21.06-emotional/amused.wav)
 * :astonished: [Surprised](./samples/thorsten-21.06-emotional/surprised.wav)
 * :pensive: [Sleepy](./samples/thorsten-21.06-emotional/sleepy.wav)
 * :dizzy_face: [Drunk](./samples/thorsten-21.06-emotional/drunk.wav)
 * 🤫 [Whispering](./samples/thorsten-21.06-emotional/whisper.wav)
 ### Dataset summary
 * Recorded by Thorsten Müller
 * Optimized by Dominik Kreutz
 * 300 sentences * 8 emotions = 2.400 recordings
 * Mono
 * Samplerate 22.050Hz
 * Normalized to -24dB
 * No silence at beginning/ending
 * Sentence length: 59 - 148 chars
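Before training, it can help to confirm that a recording really matches the format listed above (mono, 22,050 Hz). The sketch below uses only Python's standard `wave` module. The helper names are made up for illustration, and the in-memory silent file merely stands in for a real dataset recording.

```python
import io
import wave


def wav_properties(source):
    """Read channel count, sample rate and duration from a WAV file or file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_dataset_format(props, sample_rate=22050, channels=1):
    """True if the properties match the mono / 22,050 Hz format from the summary."""
    return props["sample_rate"] == sample_rate and props["channels"] == channels


def make_silent_wav(seconds=1, sample_rate=22050, channels=1):
    """Build a silent 16-bit WAV in memory (a stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds * channels)
    buf.seek(0)
    return buf


props = wav_properties(make_silent_wav())
```

For a file on disk, pass its path instead of the buffer, for example `wav_properties("wavs/some_recording.wav")` with a path of your own.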
-## Download information
+## Thorsten-22.10-neutral
-> Download size: 2,7GB
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581)
 > :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1dxoSo8Ktmh-5E0rSVqkq_Jm1r4sFnwJM?usp=sharing).**
-Version | Description | Date | Link
+```
------------ | ------------- | ------------- | -------------
+@dataset{muller_thorsten_2022_7265581,
-thorsten-de-v01 | Initial version | 2020-06-28 | [Google Drive Download v01](https://drive.google.com/file/d/1yKJM1LAOQpRVojKunD9r8WN_p5KzBxjc/view?usp=sharing)
+  author       = {Müller, Thorsten and
-thorsten-de-v02 | normalized to -24dB and split metadata.csv into shuffeled metadata_train.csv and metadata_val.csv | 2020-08-22 | [Google Drive Download v02](https://drive.google.com/file/d/1mGWfG0s2V2TEg-AI2m85tze1m4pyeM7b/view?usp=sharing)
+                  Kreutz, Dominik},
  title        = {ThorstenVoice Dataset 2022.10},
  month        = oct,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.7265581},
  url          = {https://doi.org/10.5281/zenodo.7265581}
 }
 ```
 # TTS Models
 ## Thorsten-21.04-Tacotron2-DCA
 This [TTS-model](https://drive.google.com/drive/folders/1m4RuffbvdOmQWnmy_Hmw0cZ_q0hj2o8B?usp=sharing) has been trained on [**Thorsten-21.02-neutral**](#thorsten-2102-neutral) dataset. The recommended trained Fullband-MelGAN Vocoder can be downloaded [here](https://drive.google.com/drive/folders/1hsfaconm4Yd9wPVyOtrXjWQs4ZAPoouY?usp=sharing).
 Run the model:
 * pip install TTS==0.5.0
 * tts-server --model_name tts_models/de/thorsten/tacotron2-DCA
-# Trained tacotron2 model "thorsten"
+## Thorsten-22.05-VITS
-If you trained a model on "thorsten" dataset please file an issue with some information on it. Sharing a trained model is highly appreciated. 
+Trained on dataset **Thorsten-22.05-neutral**.
 Audio samples are available on [Thorsten-Voice website](https://www.thorsten-voice.de/en/just-get-started/).
-## Trained models (TODO)
+To run TTS server just follow these steps:
 * pip install tts==0.7.1
 * tts-server --model_name tts_models/de/thorsten/vits
 * Open browser on http://localhost:5002 and enjoy playing
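Besides the browser UI, the running server can also be queried from a script. This is a minimal sketch under the assumption that the server exposes an `/api/tts` endpoint taking a `text` query parameter and returning WAV data, as the Coqui TTS demo server does in the versions I am aware of; please verify this against your installed version. The function names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL; the '/api/tts' path is an assumption about the server."""
    return f"http://{host}:{port}/api/tts?{urlencode({'text': text})}"


def synthesize_to_file(text, out_path, host="localhost", port=5002):
    """Ask a running tts-server for audio and store the response as a WAV file."""
    with urlopen(build_tts_url(text, host, port)) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path


url = build_tts_url("Sei eine Stimme, kein Echo.")
```

With the server started as described above, `synthesize_to_file("Sei eine Stimme, kein Echo.", "output.wav")` should write the spoken sentence to `output.wav`.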
 ## Thorsten-22.08-Tacotron2-DDC
 Trained on dataset [**Thorsten-22.05-neutral**](#thorsten-2205-neutral).
 Audio samples are available on the [Thorsten-Voice website](https://www.thorsten-voice.de/2022/08/14/welches-tts-modell-klingt-besser/).
 To run TTS server just follow these steps:
 * pip install tts==0.8.0
 * tts-server --model_name tts_models/de/thorsten/tacotron2-DDC
 * Open browser on http://localhost:5002 and enjoy playing
 ## Other models
 ### Silero
 You can use free AGPL-licensed models trained on the **Thorsten-21.02-neutral** dataset via the [silero-models](https://github.com/snakers4/silero-models/blob/master/models.yml) project.
 * [Thorsten 16kHz](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing)
 * [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb)
 ### ZDisket
 [ZDisket](https://github.com/ZDisket/TensorVox) made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by [monatis](https://github.com/monatis/german-tts). Thanks for sharing that :thumbsup:. See it in action on [Youtube](https://youtu.be/tY6_xZnkv-A).
 # Public talks
 I really want to bring the topic "**Open Voice For An Open Future**" to wider public attention.
 * I took part in a Linux User Group podcast about Mycroft AI and talked about my TTS efforts (*May 2021*).
 * I was invited by [Yusuf](https://github.com/monatis/) from the Turkish TensorFlow community to talk about "How to make machines speak with your own voice". The talk was streamed live on Youtube and is available [here](https://www.youtube.com/watch?v=m-Uwb-Bg144&t=2303s). If you're interested in the slides shown, feel free to download my presentation [here](https://docs.google.com/presentation/d/1ynnw0ilKV3WwMSJHytrN3GXRiFr8x3r0DUimBm1y0LI/edit?usp=sharing) (*June 2021*).
 * I was invited as a speaker at VoiceLunch language & linguistics on 03.01.2022. [Here are my slides](https://docs.google.com/presentation/d/1Gi6BmYHs7g4ZgdAiIKGBnBwZDCvJOD9DJxQOGlgds1o/edit?usp=sharing) (*January 2022*).
 # Youtube channel
 In summer 2021 I started sharing my lessons learned and experiences with open voice tech, especially **TTS**, on my little [Youtube channel](https://www.youtube.com/c/ThorstenMueller). If you check out and like my videos, I'd be happy to welcome you as a subscriber and member of my little Youtube community.
 Folder | Date | Link | Description
 ------------ | ------------- | ------------- | -------------
 thorsten-taco2-ddc-v0.1 | to do | to do | to do
 # Feel free to file an issue if you ...
-* have improvements on dataset
+* Use my TTS voice in your project(s)
-* use my TTS voice in your project(s)
+* Want to share your trained "Thorsten" model
-* want to share your trained "thorsten" model
+* Get to know about any abuse usage of my voice
 * get to know about any abuse usage of my voice
-# Special thanks
+# Thanks section
-I want to thank all open source communities for providing great projects.
+## Cool projects
 * https://commonvoice.mozilla.org/
 * https://coqui.ai/
 * https://mycroft.ai/
 * https://github.com/rhasspy/
-Just to name some nice guys who joined me on this tts-roadtrip:
+## Cool people
 * [El-Tocino](https://github.com/el-tocino/)
 * [Eren Gölge](https://github.com/erogol/)
 * [Gras64](https://github.com/gras64/)
 * [Kris Gesling](https://github.com/krisgesling/)
 * [Nmstoker](https://github.com/nmstoker)
 * [Othiele](https://discourse.mozilla.org/u/othiele/summary)
 * [Repodiac](https://github.com/repodiac)
 * [SanjaESC](https://github.com/SanjaESC)
 * [Synesthesiam](https://github.com/synesthesiam/)
-* eltocino (https://github.com/el-tocino/)
+## Even more special people
-* erogol (https://github.com/erogol/)
+Additionally, a big thank-you to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.
 * gras64 (https://github.com/gras64/)
 * krisgesling (https://github.com/krisgesling/)
 * nmstoker (https://github.com/nmstoker)
 * othiele (https://discourse.mozilla.org/u/othiele/summary)
 * repodiac (https://github.com/repodiac)
-And last but not least i want to say a huge thank you to a special guy who supported me on this journey right from the beginning. Not just with nice words, but with his time, audio optimization knowhow and finally his gpu computing power. 
+And last but not least, I want to say a **huge, huge thank you** to a special guy who supported me on this journey as a partner right from the beginning. Not just with nice words, but with his time, audio optimization know-how and, finally, GPU power.
-Without his amazing support this dataset (in it's current way) would not exists.
+**Thank you so much, dear Dominik ([@domcross](https://github.com/domcross/)), for being my partner on this journey.**
-Thank you Dominik (@domcross / https://github.com/domcross/)
+Thorsten (*Twitter: @ThorstenVoice*)
 # Links
 * https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150
 * https://community.mycroft.ai/
 * https://github.com/MycroftAI/mimic-recording-studio
 * https://voice.mozilla.org/
 * https://github.com/mozilla/TTS
 (https://github.com/repodiac/tit-for-tat/tree/master/thorsten-TTS)
 * https://raw.githubusercontent.com/mozilla/voice-web/master/server/data/de/sentence-collector.txt
 We'll hear us in future :-)
 Thorsten
--- a/Youtube/train_vits_win.py
+++ b/Youtube/train_vits_win.py
@ -0,0 +1,94 @@
 import os
 from trainer import Trainer, TrainerArgs
 from TTS.tts.configs.shared_configs import BaseDatasetConfig
 from TTS.tts.configs.vits_config import VitsConfig
 from TTS.tts.datasets import load_tts_samples
 from TTS.tts.models.vits import Vits, VitsAudioConfig
 from TTS.tts.utils.text.tokenizer import TTSTokenizer
 from TTS.utils.audio import AudioProcessor
 def main():
 	output_path = os.path.dirname(os.path.abspath(__file__))
 	#output_path = "c:\\temp\\tts"
 	dataset_config = BaseDatasetConfig(
 		formatter="ljspeech", meta_file_train="metadata_small.csv", path="C:\\Users\\ThorstenVoice\\TTS-Training\\ThorstenVoice-Dataset_2022.10"
 	)
 	audio_config = VitsAudioConfig(
 		sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
 	)
 	config = VitsConfig(
 		audio=audio_config,
 		run_name="vits_thorsten-voice",
 		batch_size=4,
 		eval_batch_size=4,
 		batch_group_size=5,
 		num_loader_workers=1,
 		num_eval_loader_workers=1,
 		run_eval=True,
 		test_delay_epochs=-1,
 		epochs=1000,
 		text_cleaner="phoneme_cleaners",
 		use_phonemes=True,
 		phoneme_language="de",
 		phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
 		compute_input_seq_cache=True,
 		print_step=25,
 		print_eval=True,
 		mixed_precision=False,
 		output_path=output_path,
 		datasets=[dataset_config],
 		cudnn_benchmark=False,
 		test_sentences=[
 		  "Es hat mich viel Zeit gekostet eine Stimme zu entwickeln, jetzt wo ich sie habe werde ich nicht mehr schweigen.",
 		  "Sei eine Stimme, kein Echo.",
 		  "Es tut mir Leid David. Das kann ich leider nicht machen.",
 		  "Dieser Kuchen ist großartig. Er ist so lecker und feucht.",
 		  "Vor dem 22. November 1963.",
 		],
 	)
 	# INITIALIZE THE AUDIO PROCESSOR
 	# Audio processor is used for feature extraction and audio I/O.
 	# It mainly serves to the dataloader and the training loggers.
 	ap = AudioProcessor.init_from_config(config)
 	# INITIALIZE THE TOKENIZER
 	# Tokenizer is used to convert text to sequences of token IDs.
 	# config is updated with the default characters if not defined in the config.
 	tokenizer, config = TTSTokenizer.init_from_config(config)
 	# LOAD DATA SAMPLES
 	# Each sample is a list of ```[text, audio_file_path, speaker_name]```
 	# You can define your custom sample loader returning the list of samples.
 	# Or define your custom formatter and pass it to the `load_tts_samples`.
 	# Check `TTS.tts.datasets.load_tts_samples` for more details.
 	train_samples, eval_samples = load_tts_samples(
 		dataset_config,
 		eval_split=True,
 		eval_split_max_size=config.eval_split_max_size,
 		eval_split_size=config.eval_split_size,
 	)
 	# init model
 	model = Vits(config, ap, tokenizer, speaker_manager=None)
 	# init the trainer and 🚀
 	trainer = Trainer(
 		TrainerArgs(),
 		config,
 		output_path,
 		model=model,
 		train_samples=train_samples,
 		eval_samples=eval_samples,
 	)
 	trainer.fit()
 	print("Fertig!")
 from multiprocessing import freeze_support
 if __name__ == '__main__':
    freeze_support()  # needed for Windows
    main()
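The script uses Windows paths written as ordinary Python string literals, where every backslash has to be doubled. A single backslash followed by `t`, for instance, is read as a tab character. The sketch below shows the pitfall and three safe spellings. The path `c:\temp\tts` is only an example, and `has_control_chars` is a helper made up for this illustration.

```python
from pathlib import PureWindowsPath

# "\\" is an escaped backslash, but "\t" is a TAB character,
# so this literal does NOT contain the folder name "tts".
broken = "c:\\temp\tts"

# Three equivalent, safe ways to spell the same path.
escaped = "c:\\temp\\tts"                        # every backslash doubled
raw = r"c:\temp\tts"                             # raw string, no escape processing
built = str(PureWindowsPath("c:/temp") / "tts")  # pathlib normalizes the separators


def has_control_chars(path):
    """True if the string contains control characters such as TAB or newline."""
    return any(ord(ch) < 32 for ch in path)
```

Checking a configured path with `has_control_chars` before starting a long training run is a cheap way to catch this kind of typo early.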
--- a/docs/_config.yml
+++ b/docs/_config.yml
@ -0,0 +1 @@
 theme: jekyll-theme-cayman
--- a/docs/audio_compare.md
+++ b/docs/audio_compare.md
@ -0,0 +1,449 @@
 # Vocoder comparison based on the "thorsten" Tacotron 2 model
 Here are audio samples generated with different vocoders. All spoken texts (*Samples 1 - 4*) correspond to recordings in the dataset, but they are synthesized from the trained Tacotron 2 model rather than from the "ground truth" spectrogram. Sample 5 is the opening of the fairy tale "Der Froschkönig" (The Frog Prince) and was not recorded for the dataset.
 ## Sentences
 * **Sample #01**: Eure Schoko-Bonbons sind sagenhaft lecker!
 * **Sample #02**: Eure Tröte nervt.
 * **Sample #03**: Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.
 * **Sample #04**: Euer Plan hat ja toll geklappt.
 * *Sample #05: "In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön." (opening of "Der Froschkönig")*
 # Ground truth
 Original recordings from the "thorsten" dataset.
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-gt.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-gt.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-gt.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-gt.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # Griffin Lim
 > Model details: (todo: link)  
 > Tacotron2 + DDC: trained for 460k steps
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-griffin-lim.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-griffin-lim.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-griffin-lim.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-griffin-lim.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-griffin-lim.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # ParallelWaveGAN
 > Details: [Notebook by Olaf](https://colab.research.google.com/drive/15kJHTDTVxyIjxiZgqD1G_s5gUeVNLkfy?usp=sharing)  
 > Tacotron2 + DDC: trained for 360k steps, PWGAN vocoder: trained for 925k steps
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-pwgan.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-pwgan.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-pwgan.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-pwgan.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-pwgan.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # WaveGrad
 > Tacotron2 + DDC: trained for 460k steps, WaveGrad vocoder: trained for 510k steps (incl. noise schedule)
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-wavegrad.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-wavegrad.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-wavegrad.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-wavegrad.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-wavegrad.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # HifiGAN
 > Thanks to SanjaESC (https://github.com/SanjaESC) for training this model.
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-hifigan.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-hifigan.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-hifigan.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-hifigan.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-hifigan.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # VocGAN
 > **These examples are based on "ground truth" and not on the Tacotron 2 model**  
 > 200 epochs / 284k training steps
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-vocgan.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-vocgan.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-vocgan.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-vocgan.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # GlowTTS / Waveglow
 > Details: [GitHub repository by Synesthesiam](https://github.com/rhasspy/de_larynx-thorsten)
 > GlowTTS trained for 380k steps and the vocoder for 500k steps.
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-waveglow.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-waveglow.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-waveglow.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-waveglow.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-waveglow.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # TensorFlowTTS
 ## Multiband MelGAN
 > Thanks [Monatis](https://github.com/monatis)  
 > Details: [Notebook by Monatis](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing#scrollTo=SCbWCChVkfnn)  
 > Taco2 model trained for 80k steps, Multiband MelGAN for 800k steps.
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-TensorFlowTTS.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-TensorFlowTTS.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-TensorFlowTTS.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-TensorFlowTTS.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-TensorFlowTTS.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # Silero models
 > Thanks [snakers4](https://github.com/snakers4/silero-models)  
 > Details: [Notebook by Silero](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb#scrollTo=indirect-berry)  
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-silero.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-silero.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-silero.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-silero.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-silero.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
 # Forward Tacotron
 > Thanks [cschaefer26](https://github.com/as-ideas/ForwardTacotron)  
 > Config: Forward-Tacotron, trained to 300k, alpha set to 0.8, pretrained HifiGAN vocoder
 <dl>
 <table>
 <thead>
  <tr>
    <th>Sample</th>
    <th>Text</th>
    <th>Audio</th>
  </tr>
 </thead>
 <tbody>
  <tr>
    <td>01</td>
    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
    <td><audio controls="" preload="none"><source src="samples/sample01-ForwardTacotron-HifiGAN.wav"></audio></td>
  </tr>
  <tr>
    <td>02</td>
    <td>Eure Tröte nervt</td>
    <td><audio controls="" preload="none"><source src="samples/sample02-ForwardTacotron-HifiGAN.wav"></audio></td>
  </tr>
  <tr>
    <td>03</td>
    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
    <td><audio controls="" preload="none"><source src="samples/sample03-ForwardTacotron-HifiGAN.wav"></audio></td>
  </tr>
  <tr>
    <td>04</td>
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-ForwardTacotron-HifiGAN.wav"></audio></td>
  </tr>
  <tr>
    <td>05</td>
    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-ForwardTacotron-HifiGAN.wav"></audio></td>
  </tr>
 </tbody>
 </table>
 </dl>
--- a/docs/index.md
+++ b/docs/index.md
@ -0,0 +1,48 @@
 # Motivation
 <span style="font-size:1.5em;font-weight:bold">
 A free, high-quality German TTS voice that can be generated offline should be available to every project without any licensing problems.
 </span>
 # No matter what background you come from:
 * Private hobby project
 * Open source/community project
 * Education/research/science
 * Commercial company
 * ...
 # No matter which area you are interested in:
 * Smart voice assistants
 * Navigation systems
 * Smart homes
 * Talking refrigerators
 * Reading on-screen text aloud (accessibility)
 * Interactive robotics
 * ...
 # Who we are
 We are a small, motivated group of hobbyist TTS enthusiasts who named ourselves after a modified "Lord of the Rings" quote: "**Fellowership of free german tts**"
 # Where we currently stand
 We keep working on training models of even better quality, but you can listen to the current "stable" state here:
 * [Es ist im Moment klarer Himmel bei 18 Grad.](https://drive.google.com/file/d/1cDIq4QG6i60WjUYNT6fr2cpEjFQIi8w5/view?usp=sharing)
 * [Ich verstehe das nicht, aber ich lerne jeden Tag neue Dinge.](https://drive.google.com/file/d/1kja_2RsFt6EmC33HTB4ozJyFlvh_DTFQ/view?usp=sharing)
 * [Ich bin jetzt bereit.](https://drive.google.com/file/d/1GkplGH7LMJcPDpgFJocXHCjRln_ccVFs/view?usp=sharing)
 * [Bitte warte einen Moment, bis ich fertig mit dem Booten bin.](https://drive.google.com/file/d/19Td-F14n_05F-squ3bNlt2BDE-NMFaq1/view?usp=sharing)
 * [Mein Name ist Mycroft und ich bin funky.](https://drive.google.com/file/d/1dbyOyE7Oy8YdAsYqQ4vz4VJjiWIyc8oV/view?usp=sharing)
 ## Comparison of some vocoders
 We are currently experimenting with different configurations to find the best model. A comparison of the results so far can be found on this page.
 > [Comparison of the different models](./audio_compare)
 # Interested?
 [Further details, downloads and acknowledgements can be found here.](https://github.com/thorstenMueller/deep-learning-german-tts "Dataset details and Thorsten model download")
 ---
 <span style="font-size:1.5em;font-weight:bold">
 We wish you lots of fun and success in implementing your projects :-)
 </span>
--- a/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample01-TensorFlowTTS.wav
+++ b/docs/samples/sample01-TensorFlowTTS.wav
--- a/docs/samples/sample01-griffin-lim.wav
+++ b/docs/samples/sample01-griffin-lim.wav
--- a/docs/samples/sample01-gt.wav
+++ b/docs/samples/sample01-gt.wav
--- a/docs/samples/sample01-hifigan.wav
+++ b/docs/samples/sample01-hifigan.wav
--- a/docs/samples/sample01-pwgan.wav
+++ b/docs/samples/sample01-pwgan.wav
--- a/docs/samples/sample01-silero.wav
+++ b/docs/samples/sample01-silero.wav
--- a/docs/samples/sample01-vocgan.wav
+++ b/docs/samples/sample01-vocgan.wav
--- a/docs/samples/sample01-waveglow.wav
+++ b/docs/samples/sample01-waveglow.wav
--- a/docs/samples/sample01-wavegrad.wav
+++ b/docs/samples/sample01-wavegrad.wav
--- a/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample02-TensorFlowTTS.wav
+++ b/docs/samples/sample02-TensorFlowTTS.wav
--- a/docs/samples/sample02-griffin-lim.wav
+++ b/docs/samples/sample02-griffin-lim.wav
--- a/docs/samples/sample02-gt.wav
+++ b/docs/samples/sample02-gt.wav
--- a/docs/samples/sample02-hifigan.wav
+++ b/docs/samples/sample02-hifigan.wav
--- a/docs/samples/sample02-pwgan.wav
+++ b/docs/samples/sample02-pwgan.wav
--- a/docs/samples/sample02-silero.wav
+++ b/docs/samples/sample02-silero.wav
--- a/docs/samples/sample02-vocgan.wav
+++ b/docs/samples/sample02-vocgan.wav
--- a/docs/samples/sample02-waveglow.wav
+++ b/docs/samples/sample02-waveglow.wav
--- a/docs/samples/sample02-wavegrad.wav
+++ b/docs/samples/sample02-wavegrad.wav
--- a/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample03-TensorFlowTTS.wav
+++ b/docs/samples/sample03-TensorFlowTTS.wav
--- a/docs/samples/sample03-griffin-lim.wav
+++ b/docs/samples/sample03-griffin-lim.wav
--- a/docs/samples/sample03-gt.wav
+++ b/docs/samples/sample03-gt.wav
--- a/docs/samples/sample03-hifigan.wav
+++ b/docs/samples/sample03-hifigan.wav
--- a/docs/samples/sample03-pwgan.wav
+++ b/docs/samples/sample03-pwgan.wav
--- a/docs/samples/sample03-silero.wav
+++ b/docs/samples/sample03-silero.wav
--- a/docs/samples/sample03-vocgan.wav
+++ b/docs/samples/sample03-vocgan.wav
--- a/docs/samples/sample03-waveglow.wav
+++ b/docs/samples/sample03-waveglow.wav
--- a/docs/samples/sample03-wavegrad.wav
+++ b/docs/samples/sample03-wavegrad.wav
--- a/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample04-TensorFlowTTS.wav
+++ b/docs/samples/sample04-TensorFlowTTS.wav
--- a/docs/samples/sample04-griffin-lim.wav
+++ b/docs/samples/sample04-griffin-lim.wav
--- a/docs/samples/sample04-gt.wav
+++ b/docs/samples/sample04-gt.wav
--- a/docs/samples/sample04-hifigan.wav
+++ b/docs/samples/sample04-hifigan.wav
--- a/docs/samples/sample04-pwgan.wav
+++ b/docs/samples/sample04-pwgan.wav
--- a/docs/samples/sample04-silero.wav
+++ b/docs/samples/sample04-silero.wav
--- a/docs/samples/sample04-vocgan.wav
+++ b/docs/samples/sample04-vocgan.wav
--- a/docs/samples/sample04-waveglow.wav
+++ b/docs/samples/sample04-waveglow.wav
--- a/docs/samples/sample04-wavegrad.wav
+++ b/docs/samples/sample04-wavegrad.wav
--- a/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample05-TensorFlowTTS.wav
+++ b/docs/samples/sample05-TensorFlowTTS.wav
--- a/docs/samples/sample05-griffin-lim.wav
+++ b/docs/samples/sample05-griffin-lim.wav
--- a/docs/samples/sample05-hifigan.wav
+++ b/docs/samples/sample05-hifigan.wav
--- a/docs/samples/sample05-pwgan.wav
+++ b/docs/samples/sample05-pwgan.wav
--- a/docs/samples/sample05-silero.wav
+++ b/docs/samples/sample05-silero.wav
--- a/docs/samples/sample05-waveglow.wav
+++ b/docs/samples/sample05-waveglow.wav
--- a/docs/samples/sample05-wavegrad.wav
+++ b/docs/samples/sample05-wavegrad.wav
--- a/german_corpus-mimic_recording_studio.csv
+++ b/german_corpus-mimic_recording_studio.csv
--- a/helperScripts/Dockerfile.Jetson-Coqui
+++ b/helperScripts/Dockerfile.Jetson-Coqui
@ -0,0 +1,51 @@
 # Dockerfile for running Coqui TTS trainings in a docker container on the NVIDIA Jetson platform.
 # Based on NVIDIA Jetson ML Image, provided without any warranty as is by Thorsten Müller (https://twitter.com/ThorstenVoice) in August 2021
 FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3
 RUN echo "deb https://repo.download.nvidia.com/jetson/common r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
 RUN echo "deb https://repo.download.nvidia.com/jetson/t194 r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
 RUN apt-get update -y
 RUN apt-get install vim python-mecab libmecab-dev cuda-toolkit-10-2 libcudnn8 libcudnn8-dev libsndfile1-dev locales -y
 # Setting some environment vars
 ENV LLVM_CONFIG=/usr/bin/llvm-config-9
 ENV PYTHONPATH=/coqui/TTS/
 ENV LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
 # Skipping OPENBLAS_CORETYPE might cause an "Illegal instruction (core dumped)" error
 ENV OPENBLAS_CORETYPE=ARMV8
 ENV NVIDIA_VISIBLE_DEVICES all
 ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
 LABEL com.nvidia.volumes.needed="nvidia_driver"
 # Adjust locale setting to your personal needs
 RUN sed -i '/de_DE.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen
 ENV LANG de_DE.UTF-8
 ENV LANGUAGE de_DE:de
 ENV LC_ALL de_DE.UTF-8
 RUN mkdir /coqui
 WORKDIR /coqui
 ARG COQUI_BRANCH
 RUN git clone -b ${COQUI_BRANCH} https://github.com/coqui-ai/TTS.git
 WORKDIR /coqui/TTS
 RUN pip3 install pip setuptools wheel --upgrade
 RUN pip uninstall -y tensorboard tensorflow tensorflow-estimator nbconvert matplotlib
 RUN pip install -r requirements.txt
 RUN python3 ./setup.py develop
 # Jupyter Notebook
 RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"
 CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root"
 # Build example:
 #   nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui
 # Run example:
 #   nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v /ssd/___prj/tts/dataset-july21:/coqui/TTS/data jetson-coqui
 # Bash example:
 #   nvidia-docker exec -it <containerId> /bin/bash
--- a/helperScripts/MRS2LJSpeech.py
+++ b/helperScripts/MRS2LJSpeech.py
@ -0,0 +1,157 @@
 # This script generates the folder structure for ljspeech-1.1 processing from mimic-recording-studio database
 # Changelog
 # v1.0  - Initial release by Thorsten Müller (https://github.com/thorstenMueller/deep-learning-german-tts)
 # v1.1  - Great improvements by Peter Schmalfeldt (https://github.com/manifestinteractive)
 #           - Audio processing with ffmpeg (mono and samplerate of 22.050 Hz)
 #           - Much better Python coding than my original version
 #           - Greater logging output to command line
 #           - See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba
 #           - Thanks Peter, it's a great contribution :-)
 # v1.2  - Added choice for choosing which recording session should be exported as LJSpeech
 # v1.3  - Added parameter mrs_dir to pass directory of Mimic-Recording-Studio
 # v1.4  - Script won't crash when audio recorded has been deleted on disk
 # v1.5  - Added parameter "ffmpeg" to make converting with ffmpeg optional
 from genericpath import exists
 import glob
 import sqlite3
 import os
 import argparse
 import sys
 from shutil import copyfile
 from shutil import rmtree
 # Setup Directory Data
 cwd = os.path.dirname(os.path.abspath(__file__))
 output_dir = os.path.join(cwd, "dataset")
 output_dir_audio = ""
 output_dir_audio_temp=""
 output_dir_speech = ""
 # Create folders needed for ljspeech
 def create_folders():
  global output_dir
  global output_dir_audio
  global output_dir_audio_temp
  global output_dir_speech
  print('→ Creating Dataset Folders')
  output_dir_speech = os.path.join(output_dir, "LJSpeech-1.1")
  # Delete existing folder if exists for clean run
  if os.path.exists(output_dir_speech):
    rmtree(output_dir_speech)
  output_dir_audio = os.path.join(output_dir_speech, "wavs")
  output_dir_audio_temp = os.path.join(output_dir_speech, "temp")
  # Create Clean Folders
  os.makedirs(output_dir_speech)
  os.makedirs(output_dir_audio)
  os.makedirs(output_dir_audio_temp)
 def convert_audio():
  global output_dir_audio
  global output_dir_audio_temp
  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
  print('→ Converting %s Audio Files to 22050 Hz, 16 Bit, Mono\n' % "{:,}".format(recordings))
  # Please use `pip install ffmpeg-python`
  import ffmpeg
  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):
    percent = (idx + 1) / recordings
    print('› \033[96m%s\033[0m \033[2m%s / %s (%s)\033[0m ' % (os.path.basename(wav), "{:,}".format((idx + 1)), "{:,}".format(recordings), "{:.0%}".format(percent)))
    # Convert WAV file to required format
    (ffmpeg
      .input(wav)
      .output(os.path.join(output_dir_audio, os.path.basename(wav)), acodec='pcm_s16le', ac=1, ar=22050, loglevel='error')
      .overwrite_output()
      .run(capture_stdout=True)
    )
 def copy_audio():
  global output_dir_audio
  global output_dir_audio_temp
  # No conversion happens here: recordings are copied unchanged (ffmpeg is only used in convert_audio)
  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
  print('→ Copy %s Audio Files to LJSpeech Dataset\n' % "{:,}".format(recordings))
  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):    
    copyfile(wav,os.path.join(output_dir_audio, os.path.basename(wav)))
 def create_meta_data(mrs_dir):
  print('→ Creating META Data')
  conn = sqlite3.connect(os.path.join(mrs_dir, "backend", "db", "mimicstudio.db"))
  c = conn.cursor()
  # Create metadata.csv for ljspeech
  metadata = open(os.path.join(output_dir_speech, "metadata.csv"), mode="w", encoding="utf8")
  # List available recording sessions
  user_models = c.execute('SELECT uuid, user_name from usermodel ORDER BY created_date DESC').fetchall()
  user_id = user_models[0][0]
  for row in user_models:
    print(row[0] + ' -> ' + row[1])
  user_answer = input('Please choose ID of recording session to export (default is newest session) [' + user_id + ']: ')
  if user_answer:
    user_id = user_answer
  # Bind user_id as a query parameter instead of concatenating it into the SQL string
  for row in c.execute('SELECT audio_id, prompt, lower(prompt) FROM audiomodel WHERE user_id = ? ORDER BY length(prompt)', (user_id,)):
    source_file = os.path.join(mrs_dir, "backend", "audio_files", user_id, row[0] + ".wav")
    if exists(source_file):
      metadata.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
      copyfile(source_file, os.path.join(output_dir_audio_temp, row[0] + ".wav"))
    else:
      print("Wave file {} not found.".format(source_file))
  metadata.close()
  conn.close()
 def cleanup():
  global output_dir_audio_temp
  # Remove Temp Folder
  rmtree(output_dir_audio_temp)
 def main():
  parser = argparse.ArgumentParser()
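  parser.add_argument('--mrs_dir', required=True, help='Main directory of Mimic-Recording-Studio (must contain the "backend" folder)')
  # Note: any non-empty value enables conversion, because the value is only checked for truthiness
  parser.add_argument('--ffmpeg', required=False, default=False, help='Pass any non-empty value (e.g. --ffmpeg True) to convert recordings with ffmpeg; omit to copy them unchanged')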
  args = parser.parse_args()
  if not os.path.isdir(os.path.join(args.mrs_dir,"backend")):
    sys.exit("Passed directory is no valid Mimic-Recording-Studio main directory!")
  print('\n\033[48;5;22m  MRS to LJ Speech Processor  \033[0m\n')
  create_folders()
  create_meta_data(args.mrs_dir)
  if(args.ffmpeg):
    convert_audio()
  else:
    copy_audio()
  cleanup()
  print('\n\033[38;5;86;1m✔\033[0m COMPLETE【ツ】\n')
 if __name__ == '__main__':
  main()
--- a/helperScripts/README.md
+++ b/helperScripts/README.md
@ -0,0 +1,27 @@
 # Short collection of helpful scripts for dataset creation and/or TTS training stuff
 ## MRS2LJSpeech
 Python script that takes recordings (filesystem and SQLite db) made with Mycroft Mimic-Recording-Studio (https://github.com/MycroftAI/mimic-recording-studio) and creates an audio-optimized dataset in the widely supported LJSpeech directory structure.
 Peter Schmalfeldt (https://github.com/manifestinteractive) did an amazing job optimizing my original (quick'n'dirty) version of that script, so thank you Peter :-)
 See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba#file-mrs2ljspeech-py
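
As a rough illustration of what the exported dataset looks like, the sketch below reads a `metadata.csv` in the pipe-separated layout the script writes (`audio_id|prompt|lowercased prompt`) and reports entries whose wav file is missing from the `wavs` folder. The function names `check_ljspeech_dataset` and `build_demo_dataset` and the tiny demo dataset are hypothetical and only serve the example; they are not part of MRS2LJSpeech.

```python
import csv
import os
import tempfile


def check_ljspeech_dataset(dataset_dir):
    """Return (rows, missing): all metadata rows and the ids without a wav file."""
    rows, missing = [], []
    metadata_path = os.path.join(dataset_dir, "metadata.csv")
    with open(metadata_path, encoding="utf8", newline="") as csvfile:
        # QUOTE_NONE keeps quote characters inside prompts untouched
        reader = csv.reader(csvfile, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            audio_id, text, normalized = row[0], row[1], row[2]
            rows.append((audio_id, text, normalized))
            wav_path = os.path.join(dataset_dir, "wavs", audio_id + ".wav")
            if not os.path.isfile(wav_path):
                missing.append(audio_id)
    return rows, missing


def build_demo_dataset():
    """Create a throwaway two-line dataset (hypothetical ids) for demonstration."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "wavs"))
    with open(os.path.join(root, "metadata.csv"), "w", encoding="utf8") as f:
        f.write("a1|Eure Tröte nervt|eure tröte nervt\n")
        f.write("a2|Ich bin jetzt bereit.|ich bin jetzt bereit.\n")
    # Only the first entry gets an (empty) wav file, so "a2" is reported missing
    open(os.path.join(root, "wavs", "a1.wav"), "wb").close()
    return root


demo_dir = build_demo_dataset()
rows, missing = check_ljspeech_dataset(demo_dir)
print(len(rows), "rows, missing wav files:", missing)
```

To check a real export, pass the `LJSpeech-1.1` folder that the script creates inside its `dataset` directory instead of the demo folder.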
 ## Dockerfile.Jetson-Coqui
 > Add your user to the `docker` group so you don't need sudo for every operation.
 Thanks to NVIDIA for providing docker images for the Jetson platform. I use the "machine learning (ML)" image as the base image for setting up a Coqui environment.
 > You can use any branch or tag as the COQUI_BRANCH argument. v0.1.3 is just the current stable version.
 Switch to the directory containing the Dockerfile and run `nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui` to build your container image. When the build process has finished, you can start a container from that image.
 ### Mapped volumes
 We need to bring your dataset and configuration file into the container, so we map a volume when starting it:
 `nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v [host path with dataset and config.json]:/coqui/TTS/data jetson-coqui`. Now we have a running container ready for Coqui TTS magic.
 ### Jupyter notebook
 Coqui provides lots of useful Jupyter notebooks for dataset analysis. Once your container is up and running you should be able to call 
 ### Running bash into container
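 Run `nvidia-docker exec -it <containerId> /bin/bash` (use the container ID shown by `docker ps`, since the run command above does not assign a container name). Now you're inside the container and `ls /coqui/TTS/data` should show your dataset files.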
--- a/helperScripts/getDatasetSpeechRate.py
+++ b/helperScripts/getDatasetSpeechRate.py
@ -0,0 +1,41 @@
 # This script gets speech rate per audio recording from a voice dataset (ljspeech file and directory structure)
 # Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
 # https://github.com/thorstenMueller/deep-learning-german-tts/
 # https://twitter.com/ThorstenVoice
 # Changelog:
 # v0.1 - 26.09.2021 - Initial version
 from genericpath import exists
 import os
 import librosa
 import csv
 dataset_dir = "/home/thorsten/___dev/tts/dataset/Thorsten-neutral-Dec2021-44k/" # Directory where metadata.csv is in
 out_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
 decimal_use_comma = True # False: Splitting decimal value with a dot (.); True: Comma (,)
 out_csv = open(out_csv_file,"w")
 out_csv.write("filename;audiolength_sec;number_chars;chars_per_sec;remove_from_dataset\n")
 # Open metadata.csv file
 with open(os.path.join(dataset_dir,"metadata.csv")) as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        wav_file = os.path.join(dataset_dir,"wavs",row[0] + ".wav")
        if exists(wav_file):
            # Gather values for report.csv output
            phrase_len = len(row[1]) - 1 # Subtract one so the trailing punctuation mark is not counted.
            duration = round(librosa.get_duration(filename=wav_file),2)
            char_per_sec = round(phrase_len / duration,2)
            if decimal_use_comma:
                duration = str(duration).replace(".",",")
                char_per_sec = str(char_per_sec).replace(".",",")
            out_csv.write(row[0] + ".wav;" + str(duration) + ";" + str(phrase_len) + ";" + str(char_per_sec) + ";no\n")
        else:
            print("File " + wav_file + " does not exist.")
 out_csv.close()
--- a/helperScripts/removeFilesFromDataset.py
+++ b/helperScripts/removeFilesFromDataset.py
@ -0,0 +1,48 @@
 # This script removes recordings from an ljspeech file/directory structured dataset based on CSV file from "getDatasetSpeechRate"
 # Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
 # https://github.com/thorstenMueller/deep-learning-german-tts/
 # https://twitter.com/ThorstenVoice
 # Changelog:
 # v0.1 - 26.09.2021 - Initial version
 import os
 import csv
 import shutil
 dataset_dir = "/Users/thorsten/Downloads/thorsten-export-20210909/" # Directory where metadata.csv is in
 subfolder_removed = "___removed"
 in_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
 to_remove = []
 # Open the speech rate report created by getDatasetSpeechRate (in_csv_file already contains dataset_dir)
 with open(in_csv_file) as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row[4] == "yes":
            # Recording in that row should be removed from dataset
            to_remove.append(row[0])
            print("Recording " + row[0] + " will be removed from dataset.")
 print("\n" + str(len(to_remove)) + " recordings have been marked for deletion.")
 if len(to_remove) > 0:
    metadata_cleaned = open(os.path.join(dataset_dir,"metadata_cleaned.csv"),"w")
    # Create new subdirectory for removed wav files
    removed_dir = os.path.join(dataset_dir,subfolder_removed)
    if not os.path.exists(removed_dir):
        os.makedirs(removed_dir)
    # Remove lines from metadata.csv and move wav files to new subdirectory
    with open(os.path.join(dataset_dir,"metadata.csv")) as csvfile:
        reader = csv.reader(csvfile, delimiter='|')
        for row in reader:
            if (row[0] + ".wav") not in to_remove:
                metadata_cleaned.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
            else:
                # Move recording to new subfolder
                shutil.move(os.path.join(dataset_dir,"wavs",row[0] + ".wav"),removed_dir)
    metadata_cleaned.close()
--- a/img/thorsten-de---datasetAnalysis1.png
+++ b/img/thorsten-de---datasetAnalysis1.png
--- a/img/thorsten-de---datasetAnalysis2.png
+++ b/img/thorsten-de---datasetAnalysis2.png
--- a/img/thorsten-de---datasetAnalysis3.png
+++ b/img/thorsten-de---datasetAnalysis3.png
--- a/img/thorsten-de---datasetAnalysis4.png
+++ b/img/thorsten-de---datasetAnalysis4.png
--- a/img/thorsten-de---datasetAnalysis5.png
+++ b/img/thorsten-de---datasetAnalysis5.png
--- a/img/thorsten-de---datasetAnalysis6.png
+++ b/img/thorsten-de---datasetAnalysis6.png
--- a/samples/original_recording/recorded_sample_01.wav
+++ b/samples/original_recording/recorded_sample_01.wav
--- a/samples/original_recording/recorded_sample_02.wav
+++ b/samples/original_recording/recorded_sample_02.wav
--- a/samples/original_recording/recorded_sample_03.wav
+++ b/samples/original_recording/recorded_sample_03.wav
--- a/samples/original_recording/recorded_sample_04.wav
+++ b/samples/original_recording/recorded_sample_04.wav
--- a/samples/original_recording/recorded_sample_05.wav
+++ b/samples/original_recording/recorded_sample_05.wav
--- a/samples/original_recording/recorded_sample_06.wav
+++ b/samples/original_recording/recorded_sample_06.wav
--- a/samples/original_recording/recorded_sample_07.wav
+++ b/samples/original_recording/recorded_sample_07.wav
--- a/samples/thorsten-21.06-emotional/amused.wav
+++ b/samples/thorsten-21.06-emotional/amused.wav
--- a/samples/thorsten-21.06-emotional/angry.wav
+++ b/samples/thorsten-21.06-emotional/angry.wav
--- a/samples/thorsten-21.06-emotional/disgusted.wav
+++ b/samples/thorsten-21.06-emotional/disgusted.wav
--- a/samples/thorsten-21.06-emotional/drunk.wav
+++ b/samples/thorsten-21.06-emotional/drunk.wav
--- a/samples/thorsten-21.06-emotional/neutral.wav
+++ b/samples/thorsten-21.06-emotional/neutral.wav
--- a/samples/thorsten-21.06-emotional/sleepy.wav
+++ b/samples/thorsten-21.06-emotional/sleepy.wav
--- a/samples/thorsten-21.06-emotional/surprised.wav
+++ b/samples/thorsten-21.06-emotional/surprised.wav
--- a/samples/thorsten-21.06-emotional/whisper.wav
+++ b/samples/thorsten-21.06-emotional/whisper.wav
Author	SHA1	Message	Date
Thorsten Müller	f13bcaf63e	Added Windows TTS training recipe Added modified vits recipe for Thorsten-Voice model training using Windows	2023-03-05 16:19:50 +01:00
Thorsten Müller	04c5683194	German Corpus for Mimic-Recording-Studio	2022-12-16 22:54:02 +01:00
Thorsten Müller	50e09d49bf	Added social media info	2022-11-13 17:08:26 +01:00
Thorsten Müller	b0afed75f4	Added new 2022.10 ThorstenVoice dataset.	2022-11-13 16:47:46 +01:00
Thorsten Müller	9b7b4c6836	Added new released Tacotron2 DDC model to README tts-server --model_name tts_models/de/thorsten/tacotron2-DDC	2022-08-23 19:03:47 +02:00
Thorsten Müller	aba10bc64a	Added info on new VITS model.	2022-06-24 18:00:12 +02:00
Thorsten Müller	07e85b3905	Merge pull request #35 from thorstenMueller/thorstenMueller-patch-1 Add new project logo to header.	2022-05-09 20:56:03 +02:00
Thorsten Müller	e08d50d6bb	Added new logo to header	2022-05-09 20:46:17 +02:00
Thorsten Müller	e691aa4ee3	Delete Logo_Thorsten-Voice-kleiner.jpg	2022-05-09 20:45:31 +02:00
Thorsten Müller	625f73e986	Delete Logo_Thorsten-Voice.jpg	2022-05-09 20:45:18 +02:00
Thorsten Müller	de1802f8ce	Update README.md	2022-05-09 20:34:50 +02:00
Thorsten Müller	f0500309d6	Test with embedded logo	2022-05-09 18:19:02 +02:00
Thorsten Müller	41c91b9865	Add files via upload	2022-05-09 18:14:50 +02:00
Thorsten Müller	fcb1e705a9	Add files via upload	2022-05-09 18:13:06 +02:00
Thorsten Müller	b8802db4f8	Uploaded transparent Thorsten-Voice logo.	2022-05-09 18:12:03 +02:00
Thorsten Müller	b00c768343	Added badge links.	2022-04-28 18:13:49 +02:00
Thorsten Mueller	3b0b4f898f	Fixed typo.	2022-04-24 09:13:13 +02:00
Thorsten Müller	2106fc6b00	Test	2022-04-23 23:31:15 +02:00
Thorsten Müller	e4ff3ce04a	Initial draft FUNDING.yml	2022-04-23 23:29:15 +02:00
Thorsten Müller	f408508cd7	Merge pull request #31 from thorstenMueller/prep-thorsten-22.05 Merge new README (preparation for new TTS model release)	2022-04-23 23:26:17 +02:00
Thorsten Mueller	6b4cfb41d4	Added Youtube link.	2022-04-23 23:22:27 +02:00
Thorsten Mueller	521dd33483	Updated TOC	2022-04-23 21:15:26 +02:00
Thorsten Mueller	6efb25310a	preparations for new Thorsten models	2022-04-23 21:13:30 +02:00
Thorsten Müller	5654397f3e	Add citation file.	2022-04-20 23:48:54 +02:00
Thorsten Mueller	b5ec9ef991	Fixed minor issues	2022-02-15 17:52:03 +01:00
Thorsten Mueller	77ad01d4ff	Making ffmpeg conversion optional.	2022-02-15 17:28:40 +01:00
Thorsten Mueller	c35507b1f7	Added link for VoiceLunch slides.	2022-01-03 20:09:43 +01:00
Thorsten Mueller	b536dfd958	Added check if audio file exists in getDatasetSpeechRate	2021-12-19 18:44:01 +01:00
Thorsten Mueller	29238f2a31	Updated Download links / Cites	2021-12-11 17:44:49 +01:00
Thorsten Müller	8c5f4503f3	Added two hyperlinks To http://www.Thorsten-Voice.de and https://OpenVoice-Tech.net Wiki	2021-11-28 11:33:54 +01:00
Thorsten Mueller	2ff7e3961b	Added Forward Tacotron samples.	2021-10-30 21:48:21 +02:00
Thorsten Müller	1221713314	Remove Wikipedia link to "Thorsten (Stimme)"	2021-10-23 16:52:59 +02:00
Thorsten Mueller	d3225b48f8	Added Citation to README.	2021-10-08 18:22:34 +02:00
Thorsten Mueller	33c030f844	Added two scripts for dataset analysis/cleaning.	2021-09-28 06:10:21 +02:00
Thorsten Müller	2daabae53e	Added DOIs in README	2021-09-24 16:32:16 +02:00
Thorsten Müller	1d445b09f8	Added DOI badge for emotional dataset	2021-09-23 21:58:54 +02:00
Thorsten Mueller	2853f111dc	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-09-18 16:04:59 +02:00
Thorsten Mueller	7540606247	Added download link for new recording-in-progress neutral dataset.	2021-09-18 16:04:33 +02:00
Thorsten Mueller	0b9e929ce0	Added Fullband-MelGAN model download path. Thanks to (see #26 )	2021-08-20 06:02:47 +02:00
Thorsten Mueller	bc06fa923f	Added info on TensorVox by ZDisket - thanks :-)	2021-08-12 18:30:55 +02:00
Thorsten Mueller	f19144b085	Adjusted quick setup example to new vocoder model.	2021-08-06 09:50:44 +02:00
Thorsten Müller	251c093ad4	Added locale settings for german Umlaut handling.	2021-08-04 09:24:51 +02:00
Thorsten Mueller	f505fd38df	Dockerfile draft for NVIDIA Jetson Xavier AGX and Coqui	2021-08-02 19:54:38 +02:00
Thorsten Mueller	3e09ae8615	Added link to my Youtube channel.	2021-07-21 22:49:47 +02:00
Thorsten Mueller	2ed2413dda	Explain how i recorded emotional phrases.	2021-07-13 21:53:55 +02:00
Thorsten Mueller	51c5f55bbd	Added check that recording exists before export.	2021-07-12 23:27:50 +02:00
Thorsten Mueller	4f875ac591	Added --mrs_dir param for more flexibility	2021-07-07 22:00:47 +02:00
Thorsten Mueller	2ea44ede87	Added REAME for helperScripts	2021-07-04 22:38:38 +02:00
Thorsten Mueller	ba60fc57d4	Added script to create LJSpeech dataset out of Mimic-Recording-Studio recordings.	2021-07-04 22:33:38 +02:00
Thorsten Müller	9e68d99ee7	Updated emotional dataset v02 download link	2021-06-20 08:57:39 +02:00
Thorsten Mueller	7172604eed	Added v02 emotional dataset (drunk + whispering)	2021-06-13 10:59:04 +02:00
Thorsten Mueller	58dece7c55	Added chapter on public talks	2021-06-08 07:18:30 +02:00
Thorsten Mueller	c81f374aca	Test Commit	2021-06-07 21:52:31 +02:00
Thorsten Mueller	2c6aca780b	Added table with trained model checkpoint downloads	2021-05-11 22:34:10 +02:00
Thorsten Müller	68e60f2a92	Format Wikipedia link	2021-04-22 18:57:40 +02:00
Thorsten Mueller	a3b0dde296	Added info about Wikipedia article	2021-04-22 18:53:39 +02:00
Thorsten Mueller	28d81a0fb2	Update on emotional dataset info	2021-04-11 11:42:24 +02:00
Thorsten Mueller	12c6d26dbd	Moved emotional samples to other table.	2021-04-11 11:39:29 +02:00
Thorsten Mueller	4c06db69dd	Added silero models to audio comparison	2021-04-11 11:04:20 +02:00
Thorsten Müller	bae96a75a5	Added badge for link to TTS comparison page	2021-04-09 19:29:24 +02:00
Thorsten Müller	1313520064	Playing around with some cool badges :-)	2021-04-09 19:05:43 +02:00
Thorsten Mueller	e2ecf68c13	added details on coqui model usage.	2021-04-05 16:57:36 +02:00
Thorsten Mueller	c8a5e1082e	Small TOC fix	2021-04-03 23:48:10 +02:00
Thorsten Mueller	40aae591d7	Small fixes in TOC	2021-04-03 23:45:46 +02:00
Thorsten Mueller	4f722e96a9	Adding info on emotional dataset.	2021-04-03 23:24:53 +02:00
Thorsten Müller	7e1530b742	Merge pull request #14 from snakers4/master Add silero-models	2021-04-03 22:12:09 +02:00
snakers4	647786be6c	Add silero-models	2021-04-03 05:17:14 +00:00
Thorsten Müller	00685a008d	Added cute sloth smiley.	2021-03-30 12:07:41 +02:00
Thorsten Mueller	e5481a82a6	Added smaller logo	2021-03-30 08:00:58 +02:00
Thorsten Mueller	2d1428cd13	Switch to non-transparent logo	2021-03-30 07:55:08 +02:00
Thorsten Mueller	df55a19ae2	Added ThorstenVoice logo	2021-03-30 07:53:48 +02:00
Thorsten Müller	9585b73cc3	Modify title	2021-03-16 20:23:29 +01:00
Thorsten Müller	70158ba7c8	Small README updates	2021-03-16 18:51:21 +01:00
Thorsten Mueller	e1e9f8666a	Small text adjustments and formatting on README.	2021-03-16 18:41:39 +01:00
Thorsten Müller	cca10c215e	Added download link to v03 dataset.	2021-02-10 19:46:21 +01:00
Thorsten Mueller	09705597b8	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-01-23 18:50:15 +01:00
Thorsten Mueller	bdb3aa7d47	Added hifiGAN samples trained by SanjaESC	2021-01-23 18:15:56 +01:00
Thorsten Müller	f0c0f63ae1	Added nice guy SanjaESC to thanks section	2021-01-22 16:24:56 +01:00
Thorsten Müller	036c266ad7	Added Sebastian to thanks section - Thank you :-)	2021-01-16 08:24:10 +01:00
Thorsten Mueller	8e6137b3af	Added wavegrad samples (training in progress)	2020-12-14 17:53:32 +01:00
Thorsten Mueller	9ee0353da4	Changed main and subheading for TensorFlowTTS	2020-12-02 12:23:20 +01:00
Thorsten Mueller	a99d4b6477	Added first samples for TensorFlowTTS	2020-12-02 12:14:16 +01:00
Thorsten Mueller	02020e54f7	added sample 05 for griffin lim.	2020-11-21 10:19:13 +01:00
Thorsten Mueller	5347394f3e	Added Griffin Lim vocoder samples	2020-11-21 10:08:08 +01:00
Thorsten Mueller	c59d19e0a1	Added detail on glowtts training steps.	2020-11-17 22:04:09 +01:00
Thorsten Mueller	e45736f62d	added sample05 with GlowTTS.	2020-11-17 21:53:08 +01:00
Thorsten Mueller	e96de3a095	fixed typo	2020-11-16 18:25:38 +01:00
Thorsten Mueller	eaead5cebe	Rename to docs folder for Github pages	2020-11-16 17:28:20 +01:00
Thorsten Mueller	7b27bdac2d	Added github page with index and sample wavs	2020-11-16 17:25:42 +01:00
Thorsten Müller	f55e16d0fc	fixed typo	2020-09-23 19:32:27 +02:00
		`@ -0,0 +1,2 @@`
							`# These are supported funding model platforms`