Added Windows TTS training recipe

Added modified vits recipe for Thorsten-Voice model training using Windows
German Corpus for Mimic-Recording-Studio
2023-03-05 16:19:50 +01:00 · 2022-12-16 22:54:02 +01:00 · 2022-11-13 17:08:26 +01:00 · 2022-11-13 16:47:46 +01:00 · 2022-08-23 19:03:47 +02:00 · 2022-06-24 18:00:12 +02:00
83 changed files with 24467 additions and 107 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1,2 @@
+# These are supported funding model platforms
+
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -0,0 +1,28 @@
+# This CITATION.cff file was generated with cffinit.
+# Visit https://bit.ly/cffinit to generate yours today!
+
+cff-version: 1.2.0
+title: Thorsten-Voice
+message: >-
+  Please cite the Thorsten-Voice project if you use
+  its datasets or trained TTS models.
+type: dataset
+authors:
+  - given-names: Thorsten
+    family-names: Müller
+    email: tm@thorsten-voice.de
+  - given-names: Dominik
+    family-names: Kreutz
+repository-code: 'https://github.com/thorstenMueller/Thorsten-Voice'
+url: 'https://www.Thorsten-Voice.de'
+abstract: >-
+  A free-to-use, offline, high-quality German
+  TTS voice should be available for every project
+  without any licensing hassle.
+keywords:
+  - Thorsten
+  - Voice
+  - Open
+  - German
+  - TTS
+  - Dataset
--- a/Logo_Thorsten-Voice.png
+++ b/Logo_Thorsten-Voice.png
--- a/README.md
+++ b/README.md
@@ -1,135 +1,245 @@
-# Introduction
-Many smart voice assistants like Amazon Alexa, Google Home, Apple Siri and Microsoft Cortana use cloud services to offer their (base) functionality.
+![Thorsten-Voice logo](Logo_Thorsten-Voice.png)

-As some people have privacy concerns using these services there are some (open source) projects trying to build offline and/or privacy aware alternatives.
+- [Project motivation](#motivation-for-thorsten-voice-project-speaking_head-speech_balloon)
+  
+- [Personal note](#some-personal-words-before-using-thorsten-voice)

-But speech recognition and text synthesis still requires cloud services for providing these in a decent quality.
+- [**Thorsten** Voice Datasets](#voice-datasets)
+  - [Thorsten-21.02-neutral](#thorsten-2102-neutral)
+  - [Thorsten-21.06-emotional](#thorsten-2106-emotional)
+  - [Thorsten-22.10-neutral](#thorsten-2210-neutral)

-# MyCroft AI
-> https://mycroft.ai/
+- [**Thorsten** TTS-Models](#tts-models)
+  - [Thorsten-21.04-Tacotron2-DCA](#thorsten-2104-tacotron2-dca)
+  - [Thorsten-22.05-VITS](#thorsten-2205-vits)
+  - [Thorsten-22.08-Tacotron2-DDC](#thorsten-2208-tacotron2-ddc)
+  - [Other models](#other-models)
+  
+- [Public talks](#public-talks)

-MyCroft is a company developing an opensource voice assistant with a very nice and active community. But the stt/tts parts are still cloud based (eg. google services), even if requests are anonymized by a mycroft proxy in between. But integration with locally hosted services such as deepspeech (stt) or mimic/tacotron (tts) is possible.
+- [My Youtube channel](#youtube-channel)

-# Mozilla
-Mozilla works on these really important aspects for free and open human machine voice interaction.
-
-## STT - speech to text
-> https://commonvoice.mozilla.org/
-
-"STT" needs lots of audio training data by many speakers (women/men/kids) of all ages, dialects and in various audio quality levels. So any voice contribution for common voice project is highly welcome.
-
-## TTS - text to speech
-> https://github.com/mozilla/tts
-
-"TTS" needs lots of clean recordings by one speaker to train a model. Mozilla is developing a software stack for proper model training based on tacotron2 papers.
-
-# And?!
-I want to make the most personal contribution i can give and contribute my personal voice (**german**) for TTS training to the community for free usage.
-
-## Please read some personal words before downloading the dataset
-I contribute my voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone.
-
-So hopefully my voice is used in this manner to make this world a better place for all of us :-).
-
-**tl;dr** Please don't use for evil!
-
-# Dataset "thorsten"
-## Samples of my voice
-To get an impression what my voice sounds to decide if it fits to your project i published some sample recordings, so no need to download complete dataset first.
-
-* [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
-* [Der Prophet spricht stets in Gleichnissen.](./samples/original_recording/recorded_sample_02.wav )
-* [Bitte schmeißt euren Müll nicht einfach in die Walachei.](./samples/original_recording/recorded_sample_03.wav )
-* [So etwas würde mir nie in den Sinn kommen.](./samples/original_recording/recorded_sample_04.wav )
-* [Sie klettert auf einen Stein und nimmt eine Denkerpose ein.](./samples/original_recording/recorded_sample_05.wav )
-* [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
-* [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )
+- [Special Thanks](#thanks-section)


-## Dataset information
+# Motivation for Thorsten-Voice project :speaking_head: :speech_balloon:
+A **free**-to-use, **offline**, **high-quality** **German** **TTS** voice should be available for every project without any licensing hassle.

-* ljspeech-1.1 structure
-* 22.668 recorded phrases (wav files)
-* more than 23 hours of pure audio
-* samplerate 22.050Hz
-* mono
-* normalized to -24dB
-* phrase length (min/avg/max): 2 / 52 / 180 chars
-* no silence at beginning/ending
-* avg spoken chars per second: 14
-* sentences with question mark: 2.780
-* sentences with exclamation mark: 1.840
+<a href="https://twitter.com/intent/follow?screen_name=ThorstenVoice"><img src="https://img.shields.io/twitter/follow/ThorstenVoice?style=social&logo=twitter" alt="follow on Twitter"></a>
+[![YouTube Channel Subscribers](https://img.shields.io/youtube/channel/subscribers/UCjqqTVVBTsxpm0iOhQ1fp9g?style=social)](https://www.youtube.com/c/ThorstenMueller)
+[![Project website](https://img.shields.io/badge/Project_website-www.Thorsten--Voice.de-92a0c0)](https://www.Thorsten-Voice.de)
+
+# Social media
+Please check out and follow my social media profiles. Thank you.
+
+| Platform         | Link                                                                                                            |
+| --------------- | ------- |
+| Youtube | [ThorstenVoice on Youtube](https://www.youtube.com/c/ThorstenMueller) |
+| Twitter | [ThorstenVoice on Twitter](https://twitter.com/ThorstenVoice) |
+| Instagram | [ThorstenVoice on Instagram](https://www.instagram.com/thorsten_voice/) |
+| LinkedIn | [Thorsten Müller on LinkedIn](https://www.linkedin.com/in/thorsten-m%C3%BCller-848a344/) |
+
+# Some personal words before using **Thorsten-Voice**
+> I contribute my voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone. :earth_africa: (*Thorsten Müller*)
+
+Please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with the world.
+
+# Voice-Datasets
+Voice datasets are listed on Zenodo:
+| Dataset         | DOI Link                                                                                                            |
+| --------------- | ------- |
+| Thorsten-21.02-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342) |
+| Thorsten-21.06-emotional | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023) |
+| Thorsten-22.10-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581) |
+
+## Thorsten-21.02-neutral
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342)
+
+```
+@dataset{muller_thorsten_2021_5525342,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {Thorsten-Voice - "Thorsten-21.02-neutral" Dataset},
+  month        = feb,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {3.0},
+  doi          = {10.5281/zenodo.5525342},
+  url          = {https://doi.org/10.5281/zenodo.5525342}
+}
+```
+
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1KVjGXG2ij002XRHb3fgFK4j0OEq1FsWm?usp=sharing).**
+
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* LJSpeech file and directory structure
+* 22.668 recorded phrases (*wav files*)
+* More than 23 hours of pure audio
+* Samplerate 22.050Hz
+* Mono
+* Normalized to -24dB
+* Phrase length (min/avg/max): 2 / 52 / 180 chars
+* No silence at beginning/ending
+* Avg spoken chars per second: 14
+* Sentences with question mark: 2.780
+* Sentences with exclamation mark: 1.840
+
+### Dataset evolution
+As described in the PDF document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)) this dataset consists of three recording phases.
+
+* **Phase 1**: Recorded with a cheap usb microphone (*low quality*)
+* **Phase 2**: Recorded with a good microphone (*good quality*)
+* **Phase 3**: Recorded with same good microphone but longer phrases (> 100 chars) (*good quality*)
+
+If you want to use only a subset of the dataset, the [recording quality](./RecordingQuality.csv) CSV file shows which files belong to which recording phase.


-![text length vs. mean audio duration](./img/thorsten-de---datasetAnalysis1.png)
-![text length vs. median audio duration](./img/thorsten-de---datasetAnalysis2.png)
-![text length vs. STD](./img/thorsten-de---datasetAnalysis3.png)
-![text length vs. number instances](./img/thorsten-de---datasetAnalysis4.png)
-![signal noise ratio](./img/thorsten-de---datasetAnalysis5.png)
-![bokeh](./img/thorsten-de---datasetAnalysis6.png)
+## Thorsten-21.06-emotional
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023)

-## Dataset evolution
-As decribed in the pdf document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)) this dataset consists of three recording phases.
+```
+@dataset{muller_thorsten_2021_5525023,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {{Thorsten-Voice - "Thorsten-21.06-emotional" 
+                   Dataset}},
+  month        = jun,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {2.0},
+  doi          = {10.5281/zenodo.5525023},
+  url          = {https://doi.org/10.5281/zenodo.5525023}
+}
+```

-* phase1: Recorded with a cheap usb microphone
-* phase2: Recorded with a good microphone
-* phase3: Recorded with same good microphone but longer phrases (> 100 chars)
+All emotional recordings were recorded by me, and I tried to feel and express each emotion even if the content of the phrase does not match it. Example: I spoke the sleepy recordings in the tone I have shortly before falling asleep.

-If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you can see which files belong to which recording phase in [recording quality](./RecordingQuality.csv) csv file.
+### Samples
+Listen to the phrase "**Mist, wieder nichts geschafft.**" in the following emotions:
+
+* :slightly_smiling_face: [Neutral](./samples/thorsten-21.06-emotional/neutral.wav)
+* :nauseated_face: [Disgusted](./samples/thorsten-21.06-emotional/disgusted.wav)
+* :angry: [Angry](./samples/thorsten-21.06-emotional/angry.wav)
+* :grinning: [Amused](./samples/thorsten-21.06-emotional/amused.wav)
+* :astonished: [Surprised](./samples/thorsten-21.06-emotional/surprised.wav)
+* :pensive: [Sleepy](./samples/thorsten-21.06-emotional/sleepy.wav)
+* :dizzy_face: [Drunk](./samples/thorsten-21.06-emotional/drunk.wav)
+* 🤫 [Whispering](./samples/thorsten-21.06-emotional/whisper.wav)
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* 300 sentences * 8 emotions = 2.400 recordings
+* Mono
+* Samplerate 22.050Hz
+* Normalized to -24dB
+* No silence at beginning/ending
+* Sentence length: 59 - 148 chars


-## Download information
-> Download size: 2,7GB
+## Thorsten-22.10-neutral
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581)
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1dxoSo8Ktmh-5E0rSVqkq_Jm1r4sFnwJM?usp=sharing).**

-Version | Description | Date | Link
------------ | ------------- | ------------- | -------------
-thorsten-de-v01 | Initial version | 2020-06-28 | [Google Drive Download v01](https://drive.google.com/file/d/1yKJM1LAOQpRVojKunD9r8WN_p5KzBxjc/view?usp=sharing)
-thorsten-de-v02 | normalized to -24dB and split metadata.csv into shuffeled metadata_train.csv and metadata_val.csv | 2020-08-22 | [Google Drive Download v02](https://drive.google.com/file/d/1mGWfG0s2V2TEg-AI2m85tze1m4pyeM7b/view?usp=sharing)
+```
+@dataset{muller_thorsten_2022_7265581,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {ThorstenVoice Dataset 2022.10},
+  month        = oct,
+  year         = 2022,
+  publisher    = {Zenodo},
+  version      = {1.0},
+  doi          = {10.5281/zenodo.7265581},
+  url          = {https://doi.org/10.5281/zenodo.7265581}
+}
+```
+
+# TTS Models
+
+## Thorsten-21.04-Tacotron2-DCA
+This [TTS-model](https://drive.google.com/drive/folders/1m4RuffbvdOmQWnmy_Hmw0cZ_q0hj2o8B?usp=sharing) has been trained on [**Thorsten-21.02-neutral**](#thorsten-2102-neutral) dataset. The recommended trained Fullband-MelGAN Vocoder can be downloaded [here](https://drive.google.com/drive/folders/1hsfaconm4Yd9wPVyOtrXjWQs4ZAPoouY?usp=sharing).
+
+Run the model:
+* pip install TTS==0.5.0
+* tts-server --model_name tts_models/de/thorsten/tacotron2-DCA


-# Trained tacotron2 model "thorsten"
-If you trained a model on "thorsten" dataset please file an issue with some information on it. Sharing a trained model is highly appreciated. 
+## Thorsten-22.05-VITS
+Trained on dataset **Thorsten-22.05-neutral**.
+Audio samples are available on [Thorsten-Voice website](https://www.thorsten-voice.de/en/just-get-started/).

-## Trained models (TODO)
+To run the TTS server, follow these steps:
+* pip install tts==0.7.1
+* tts-server --model_name tts_models/de/thorsten/vits
+* Open http://localhost:5002 in your browser and enjoy playing
+
+## Thorsten-22.08-Tacotron2-DDC
+Trained on dataset [**Thorsten-22.05-neutral**](#thorsten-2205-neutral).
+Audio samples are available on the [Thorsten-Voice website](https://www.thorsten-voice.de/2022/08/14/welches-tts-modell-klingt-besser/).
+
+To run the TTS server, follow these steps:
+* pip install tts==0.8.0
+* tts-server --model_name tts_models/de/thorsten/tacotron2-DDC
+* Open http://localhost:5002 in your browser and enjoy playing
+
+
+## Other models
+### Silero
+
+You can use free AGPL-licensed models trained on the **Thorsten-21.02-neutral** dataset via the [silero-models](https://github.com/snakers4/silero-models/blob/master/models.yml) project.
+
+* [Thorsten 16kHz](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing)
+* [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb)
+
+### ZDisket
+[ZDisket](https://github.com/ZDisket/TensorVox) made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by [monatis](https://github.com/monatis/german-tts). Thanks for sharing that :thumbsup:. See it in action on [Youtube](https://youtu.be/tY6_xZnkv-A).
+
+# Public talks
+I really want to bring the topic "**Open Voice For An Open Future**" to wider public attention.
+
+* I took part in a Linux User Group podcast about Mycroft AI and talked about my TTS efforts there (*May 2021*).
+* I was invited by [Yusuf](https://github.com/monatis/) from the Turkish TensorFlow community to talk about "How to make machines speak with your own voice". The talk was streamed live on Youtube and is available [here](https://www.youtube.com/watch?v=m-Uwb-Bg144&t=2303s).
+  If you're interested in the slides shown, feel free to download my presentation [here](https://docs.google.com/presentation/d/1ynnw0ilKV3WwMSJHytrN3GXRiFr8x3r0DUimBm1y0LI/edit?usp=sharing) (*June 2021*).
+* I was invited as a speaker at VoiceLunch Language & Linguistics on 03.01.2022. [Here are my slides](https://docs.google.com/presentation/d/1Gi6BmYHs7g4ZgdAiIKGBnBwZDCvJOD9DJxQOGlgds1o/edit?usp=sharing) (*January 2022*).
+
+# Youtube channel
+In summer 2021 I started sharing my lessons learned and experiences with open voice tech, especially **TTS**, on my little [Youtube channel](https://www.youtube.com/c/ThorstenMueller). If you check out and like my videos, I'd be happy to welcome you as a subscriber and member of my little Youtube community.

-Folder | Date | Link | Description
------------ | ------------- | ------------- | -------------
-thorsten-taco2-ddc-v0.1 | to do | to do | to do

 # Feel free to file an issue if you ...
-* have improvements on dataset
-* use my TTS voice in your project(s)
-* want to share your trained "thorsten" model
-* get to know about any abuse usage of my voice
+* Use my TTS voice in your project(s)
+* Want to share your trained "Thorsten" model
+* Learn of any abusive use of my voice

-# Special thanks
-I want to thank all open source communities for providing great projects.
+# Thanks section
+## Cool projects
+* https://commonvoice.mozilla.org/
+* https://coqui.ai/
+* https://mycroft.ai/
+* https://github.com/rhasspy/

-Just to name some nice guys who joined me on this tts-roadtrip:
+## Cool people
+* [El-Tocino](https://github.com/el-tocino/)
+* [Eren Gölge](https://github.com/erogol/)
+* [Gras64](https://github.com/gras64/)
+* [Kris Gesling](https://github.com/krisgesling/)
+* [Nmstoker](https://github.com/nmstoker)
+* [Othiele](https://discourse.mozilla.org/u/othiele/summary)
+* [Repodiac](https://github.com/repodiac)
+* [SanjaESC](https://github.com/SanjaESC)
+* [Synesthesiam](https://github.com/synesthesiam/)

-* eltocino (https://github.com/el-tocino/)
-* erogol (https://github.com/erogol/)
-* gras64 (https://github.com/gras64/)
-* krisgesling (https://github.com/krisgesling/)
-* nmstoker (https://github.com/nmstoker)
-* othiele (https://discourse.mozilla.org/u/othiele/summary)
-* repodiac (https://github.com/repodiac)
+## Even more special people
+Additionally, a very special thanks to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.

-And last but not least i want to say a huge thank you to a special guy who supported me on this journey right from the beginning. Not just with nice words, but with his time, audio optimization knowhow and finally his gpu computing power. 
+And last but not least, I want to say a **huge, huge thank you** to a special guy who supported me on this journey as a partner right from the beginning. Not just with nice words, but with his time, audio optimization know-how and finally GPU power.

-Without his amazing support this dataset (in it's current way) would not exists.
+**Thank you so much, dear Dominik ([@domcross](https://github.com/domcross/)), for being my partner on this journey.**

-Thank you Dominik (@domcross / https://github.com/domcross/)
-
-# Links
-* https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150
-* https://community.mycroft.ai/
-* https://github.com/MycroftAI/mimic-recording-studio
-* https://voice.mozilla.org/
-* https://github.com/mozilla/TTS
-(https://github.com/repodiac/tit-for-tat/tree/master/thorsten-TTS)
-* https://raw.githubusercontent.com/mozilla/voice-web/master/server/data/de/sentence-collector.txt
-
-We'll hear us in future :-)
-
-Thorsten
+Thorsten (*Twitter: @ThorstenVoice*)
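
The README steps above start `tts-server` on http://localhost:5002 and use the browser as a client. As a minimal sketch, the same server could probably be queried from Python. This assumes the installed Coqui TTS version exposes a GET endpoint `/api/tts` that takes a `text` query parameter and returns WAV bytes; check the routes of your server version before relying on it. The helper names `build_tts_url` and `synthesize_to_file` are hypothetical and not part of the TTS package.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, base_url="http://localhost:5002"):
    """Build the request URL for the (assumed) /api/tts endpoint."""
    # urlencode percent-encodes umlauts and turns spaces into '+'.
    query = urlencode({"text": text})
    return f"{base_url}/api/tts?{query}"


def synthesize_to_file(text, out_path, base_url="http://localhost:5002"):
    """Request synthesized speech and write the response bytes to out_path.

    Assumes a running tts-server that answers with WAV audio.
    """
    with urlopen(build_tts_url(text, base_url), timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path
```

With the server running, `synthesize_to_file("Sei eine Stimme, kein Echo.", "echo.wav")` would save the spoken sentence next to the script. Only the standard library is used, so nothing beyond the `pip install` step above is needed.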
--- a/Youtube/train_vits_win.py
+++ b/Youtube/train_vits_win.py
@@ -0,0 +1,94 @@
+import os
+
+from trainer import Trainer, TrainerArgs
+
+from TTS.tts.configs.shared_configs import BaseDatasetConfig
+from TTS.tts.configs.vits_config import VitsConfig
+from TTS.tts.datasets import load_tts_samples
+from TTS.tts.models.vits import Vits, VitsAudioConfig
+from TTS.tts.utils.text.tokenizer import TTSTokenizer
+from TTS.utils.audio import AudioProcessor
+
+def main():
+
+	output_path = os.path.dirname(os.path.abspath(__file__))
+	#output_path = "c:\\temp\\tts"
+	dataset_config = BaseDatasetConfig(
+		formatter="ljspeech", meta_file_train="metadata_small.csv", path="C:\\Users\\ThorstenVoice\\TTS-Training\\ThorstenVoice-Dataset_2022.10"
+	)
+	audio_config = VitsAudioConfig(
+		sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
+	)
+
+	config = VitsConfig(
+		audio=audio_config,
+		run_name="vits_thorsten-voice",
+		batch_size=4,
+		eval_batch_size=4,
+		batch_group_size=5,
+		num_loader_workers=1,
+		num_eval_loader_workers=1,
+		run_eval=True,
+		test_delay_epochs=-1,
+		epochs=1000,
+		text_cleaner="phoneme_cleaners",
+		use_phonemes=True,
+		phoneme_language="de",
+		phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
+		compute_input_seq_cache=True,
+		print_step=25,
+		print_eval=True,
+		mixed_precision=False,
+		output_path=output_path,
+		datasets=[dataset_config],
+		cudnn_benchmark=False,
+		test_sentences=[
+		  "Es hat mich viel Zeit gekostet eine Stimme zu entwickeln, jetzt wo ich sie habe werde ich nicht mehr schweigen.",
+		  "Sei eine Stimme, kein Echo.",
+		  "Es tut mir Leid David. Das kann ich leider nicht machen.",
+		  "Dieser Kuchen ist großartig. Er ist so lecker und feucht.",
+		  "Vor dem 22. November 1963.",
+		],
+	)
+
+	# INITIALIZE THE AUDIO PROCESSOR
+	# Audio processor is used for feature extraction and audio I/O.
+	# It mainly serves to the dataloader and the training loggers.
+	ap = AudioProcessor.init_from_config(config)
+
+	# INITIALIZE THE TOKENIZER
+	# Tokenizer is used to convert text to sequences of token IDs.
+	# config is updated with the default characters if not defined in the config.
+	tokenizer, config = TTSTokenizer.init_from_config(config)
+
+	# LOAD DATA SAMPLES
+	# Each sample is a list of ```[text, audio_file_path, speaker_name]```
+	# You can define your custom sample loader returning the list of samples.
+	# Or define your custom formatter and pass it to the `load_tts_samples`.
+	# Check `TTS.tts.datasets.load_tts_samples` for more details.
+	train_samples, eval_samples = load_tts_samples(
+		dataset_config,
+		eval_split=True,
+		eval_split_max_size=config.eval_split_max_size,
+		eval_split_size=config.eval_split_size,
+	)
+
+	# init model
+	model = Vits(config, ap, tokenizer, speaker_manager=None)
+
+	# init the trainer and 🚀
+	trainer = Trainer(
+		TrainerArgs(),
+		config,
+		output_path,
+		model=model,
+		train_samples=train_samples,
+		eval_samples=eval_samples,
+	)
+	trainer.fit()
+	print("Fertig!")
+
+from multiprocessing import freeze_support
+if __name__ == '__main__':
+    freeze_support()  # needed for Windows
+    main()
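
The recipe above points `BaseDatasetConfig` at an LJSpeech-style dataset folder. A quick pre-flight check can catch a wrong `path` or a malformed metadata file before a long training run starts. The sketch below assumes the usual LJSpeech layout: pipe-separated metadata lines whose first field is the file ID, with audio stored under `wavs/<id>.wav`. `check_ljspeech_metadata` is a hypothetical helper, not part of Coqui TTS.

```python
import os


def check_ljspeech_metadata(dataset_path, meta_file="metadata.csv", min_fields=2):
    """Return a list of (line_number, problem) tuples for an LJSpeech-style dataset.

    Assumed layout: <dataset_path>/<meta_file> holds lines like 'id|text|...'
    and the audio for each line lives in <dataset_path>/wavs/<id>.wav.
    """
    problems = []
    meta_path = os.path.join(dataset_path, meta_file)
    with open(meta_path, encoding="utf-8") as meta:
        for line_no, line in enumerate(meta, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore empty lines
            fields = line.split("|")
            if len(fields) < min_fields:
                problems.append((line_no, "too few fields"))
                continue
            wav_path = os.path.join(dataset_path, "wavs", fields[0] + ".wav")
            if not os.path.isfile(wav_path):
                problems.append((line_no, "missing wav: " + wav_path))
    return problems
```

Calling it with the same `path` and `meta_file_train` values that are passed to `BaseDatasetConfig` should return an empty list for a clean dataset. Using `os.path.join` also avoids hand-written backslash escapes in Windows paths.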
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -0,0 +1 @@
+theme: jekyll-theme-cayman
--- a/docs/audio_compare.md
+++ b/docs/audio_compare.md
@@ -0,0 +1,449 @@
+# Vocoder comparison based on the "thorsten" Tacotron 2 model
+Here are audio samples produced with different vocoders. All spoken texts (*Samples 1 - 4*) are based on recordings in the dataset, but they were generated from the trained Tacotron 2 model rather than from the "ground truth" spectrogram. Sample 5 is the opening of the fairy tale "Der Froschkönig" and was not recorded for the dataset.
+
+## Sentences
+* **Sample #01**: Eure Schoko-Bonbons sind sagenhaft lecker!
+* **Sample #02**: Eure Tröte nervt.
+* **Sample #03**: Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.
+* **Sample #04**: Euer Plan hat ja toll geklappt.
+* *Sample #05: "In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön." (opening of "Der Froschkönig")*
+
+# Ground truth
+Original recordings from the "thorsten" dataset.
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-gt.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-gt.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-gt.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-gt.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+# Griffin Lim
+> Model details: (todo: link)  
+> Tacotron2 + DDC: trained for 460k steps
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-griffin-lim.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# ParallelWaveGAN
+> Details: [Notebook by Olaf](https://colab.research.google.com/drive/15kJHTDTVxyIjxiZgqD1G_s5gUeVNLkfy?usp=sharing)  
+> Tacotron2 + DDC: trained for 360k steps, PWGAN vocoder: trained for 925k steps
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-pwgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-pwgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-pwgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-pwgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-pwgan.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+# WaveGrad
+> Tacotron2 + DDC: trained for 460k steps, WaveGrad vocoder: trained for 510k steps (incl. noise schedule)
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-wavegrad.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# HifiGAN
+> Thanks to SanjaESC (https://github.com/SanjaESC) for training this model.
+<dl>
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-hifigan.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# VocGAN
+> **These examples are based on "ground truth" and not on the Tacotron 2 model**  
+> 200 epochs / 284k training steps
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-vocgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-vocgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-vocgan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-vocgan.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# GlowTTS / Waveglow
+> Details: [Synesthesiam's GitHub](https://github.com/rhasspy/de_larynx-thorsten)  
+> GlowTTS trained for 380k steps and the vocoder for 500k steps.
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-waveglow.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-waveglow.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-waveglow.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-waveglow.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-waveglow.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+
+# TensorFlowTTS
+## Multiband MelGAN
+> Thanks [Monatis](https://github.com/monatis)  
+> Details: [Notebook by Monatis](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing#scrollTo=SCbWCChVkfnn)  
+> Taco2 model trained for 80k steps, Multiband MelGAN for 800k steps.
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-TensorFlowTTS.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+# Silero models
+> Thanks [snakers4](https://github.com/snakers4/silero-models)  
+> Details: [Notebook by Silero](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb#scrollTo=indirect-berry)  
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-silero.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# Forward Tacotron
+> Thanks [cschaefer26](https://github.com/as-ideas/ForwardTacotron)  
+> Config: Forward-Tacotron, trained to 300k, alpha set to 0.8, pretrained HifiGAN vocoder
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
--- a/docs/index.md
+++ b/docs/index.md
@ -0,0 +1,48 @@
+# Motivation
+
+<span style="font-size:1.5em;font-weight:bold">
+A free, high-quality German TTS voice that can be generated offline should be available to every project without any licensing problems.
+</span>
+
+
+# No matter which field you come from:
+* Private hobby project
+* Open source/community project
+* Education/research/science
+* Commercial company
+* ...
+
+# No matter which area interests you:
+* Smart voice assistants
+* Navigation systems
+* Smart homes
+* Talking refrigerators
+* Reading on-screen text aloud (accessibility)
+* Interactive robotics
+* ...
+
+# Who we are
+We are a small, motivated group of hobbyist TTS enthusiasts who named ourselves after a modified "Lord of the Rings" quote: "**Fellowership of free german tts**"
+
+# Where we currently stand
+We are still working on training even better models, but you can listen to the current "stable" state here:
+* [Es ist im Moment klarer Himmel bei 18 Grad.](https://drive.google.com/file/d/1cDIq4QG6i60WjUYNT6fr2cpEjFQIi8w5/view?usp=sharing)
+* [Ich verstehe das nicht, aber ich lerne jeden Tag neue Dinge.](https://drive.google.com/file/d/1kja_2RsFt6EmC33HTB4ozJyFlvh_DTFQ/view?usp=sharing)
+* [Ich bin jetzt bereit.](https://drive.google.com/file/d/1GkplGH7LMJcPDpgFJocXHCjRln_ccVFs/view?usp=sharing)
+* [Bitte warte einen Moment, bis ich fertig mit dem Booten bin.](https://drive.google.com/file/d/19Td-F14n_05F-squ3bNlt2BDE-NMFaq1/view?usp=sharing)
+* [Mein Name ist Mycroft und ich bin funky.](https://drive.google.com/file/d/1dbyOyE7Oy8YdAsYqQ4vz4VJjiWIyc8oV/view?usp=sharing)
+
+
+## Comparison of several vocoders
+We are currently experimenting with different configurations to find the best model. A comparison of the results so far is available on this page. 
+> [Comparison of the different models](./audio_compare)
+
+# Interested?
+[Further details, downloads and acknowledgements can be found here.](https://github.com/thorstenMueller/deep-learning-german-tts "Dataset details and Thorsten model download")
+
+
+---
+
+<span style="font-size:1.5em;font-weight:bold">
+We wish you lots of fun and success with your projects :-)
+</span>
--- a/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample01-TensorFlowTTS.wav
+++ b/docs/samples/sample01-TensorFlowTTS.wav
--- a/docs/samples/sample01-griffin-lim.wav
+++ b/docs/samples/sample01-griffin-lim.wav
--- a/docs/samples/sample01-gt.wav
+++ b/docs/samples/sample01-gt.wav
--- a/docs/samples/sample01-hifigan.wav
+++ b/docs/samples/sample01-hifigan.wav
--- a/docs/samples/sample01-pwgan.wav
+++ b/docs/samples/sample01-pwgan.wav
--- a/docs/samples/sample01-silero.wav
+++ b/docs/samples/sample01-silero.wav
--- a/docs/samples/sample01-vocgan.wav
+++ b/docs/samples/sample01-vocgan.wav
--- a/docs/samples/sample01-waveglow.wav
+++ b/docs/samples/sample01-waveglow.wav
--- a/docs/samples/sample01-wavegrad.wav
+++ b/docs/samples/sample01-wavegrad.wav
--- a/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample02-TensorFlowTTS.wav
+++ b/docs/samples/sample02-TensorFlowTTS.wav
--- a/docs/samples/sample02-griffin-lim.wav
+++ b/docs/samples/sample02-griffin-lim.wav
--- a/docs/samples/sample02-gt.wav
+++ b/docs/samples/sample02-gt.wav
--- a/docs/samples/sample02-hifigan.wav
+++ b/docs/samples/sample02-hifigan.wav
--- a/docs/samples/sample02-pwgan.wav
+++ b/docs/samples/sample02-pwgan.wav
--- a/docs/samples/sample02-silero.wav
+++ b/docs/samples/sample02-silero.wav
--- a/docs/samples/sample02-vocgan.wav
+++ b/docs/samples/sample02-vocgan.wav
--- a/docs/samples/sample02-waveglow.wav
+++ b/docs/samples/sample02-waveglow.wav
--- a/docs/samples/sample02-wavegrad.wav
+++ b/docs/samples/sample02-wavegrad.wav
--- a/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample03-TensorFlowTTS.wav
+++ b/docs/samples/sample03-TensorFlowTTS.wav
--- a/docs/samples/sample03-griffin-lim.wav
+++ b/docs/samples/sample03-griffin-lim.wav
--- a/docs/samples/sample03-gt.wav
+++ b/docs/samples/sample03-gt.wav
--- a/docs/samples/sample03-hifigan.wav
+++ b/docs/samples/sample03-hifigan.wav
--- a/docs/samples/sample03-pwgan.wav
+++ b/docs/samples/sample03-pwgan.wav
--- a/docs/samples/sample03-silero.wav
+++ b/docs/samples/sample03-silero.wav
--- a/docs/samples/sample03-vocgan.wav
+++ b/docs/samples/sample03-vocgan.wav
--- a/docs/samples/sample03-waveglow.wav
+++ b/docs/samples/sample03-waveglow.wav
--- a/docs/samples/sample03-wavegrad.wav
+++ b/docs/samples/sample03-wavegrad.wav
--- a/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample04-TensorFlowTTS.wav
+++ b/docs/samples/sample04-TensorFlowTTS.wav
--- a/docs/samples/sample04-griffin-lim.wav
+++ b/docs/samples/sample04-griffin-lim.wav
--- a/docs/samples/sample04-gt.wav
+++ b/docs/samples/sample04-gt.wav
--- a/docs/samples/sample04-hifigan.wav
+++ b/docs/samples/sample04-hifigan.wav
--- a/docs/samples/sample04-pwgan.wav
+++ b/docs/samples/sample04-pwgan.wav
--- a/docs/samples/sample04-silero.wav
+++ b/docs/samples/sample04-silero.wav
--- a/docs/samples/sample04-vocgan.wav
+++ b/docs/samples/sample04-vocgan.wav
--- a/docs/samples/sample04-waveglow.wav
+++ b/docs/samples/sample04-waveglow.wav
--- a/docs/samples/sample04-wavegrad.wav
+++ b/docs/samples/sample04-wavegrad.wav
--- a/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample05-TensorFlowTTS.wav
+++ b/docs/samples/sample05-TensorFlowTTS.wav
--- a/docs/samples/sample05-griffin-lim.wav
+++ b/docs/samples/sample05-griffin-lim.wav
--- a/docs/samples/sample05-hifigan.wav
+++ b/docs/samples/sample05-hifigan.wav
--- a/docs/samples/sample05-pwgan.wav
+++ b/docs/samples/sample05-pwgan.wav
--- a/docs/samples/sample05-silero.wav
+++ b/docs/samples/sample05-silero.wav
--- a/docs/samples/sample05-waveglow.wav
+++ b/docs/samples/sample05-waveglow.wav
--- a/docs/samples/sample05-wavegrad.wav
+++ b/docs/samples/sample05-wavegrad.wav
--- a/german_corpus-mimic_recording_studio.csv
+++ b/german_corpus-mimic_recording_studio.csv
--- a/helperScripts/Dockerfile.Jetson-Coqui
+++ b/helperScripts/Dockerfile.Jetson-Coqui
@ -0,0 +1,51 @@
+# Dockerfile for running Coqui TTS trainings in a docker container on the NVIDIA Jetson platform.
+# Based on the NVIDIA Jetson ML image; provided as is, without any warranty, by Thorsten Müller (https://twitter.com/ThorstenVoice) in August 2021
+
+FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3
+
+RUN echo "deb https://repo.download.nvidia.com/jetson/common r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
+RUN echo "deb https://repo.download.nvidia.com/jetson/t194 r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
+
+RUN apt-get update -y
+RUN apt-get install vim python-mecab libmecab-dev cuda-toolkit-10-2 libcudnn8 libcudnn8-dev libsndfile1-dev locales -y
+
+# Setting some environment vars
+ENV LLVM_CONFIG=/usr/bin/llvm-config-9
+ENV PYTHONPATH=/coqui/TTS/
+ENV LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
+# Skipping OPENBLAS_CORETYPE might cause an "Illegal instruction (core dumped)" error
+ENV OPENBLAS_CORETYPE=ARMV8
+
+ENV NVIDIA_VISIBLE_DEVICES all
+ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
+LABEL com.nvidia.volumes.needed="nvidia_driver"
+
+# Adjust locale setting to your personal needs
+RUN sed -i '/de_DE.UTF-8/s/^# //g' /etc/locale.gen && \
+    locale-gen
+ENV LANG de_DE.UTF-8
+ENV LANGUAGE de_DE:de
+ENV LC_ALL de_DE.UTF-8
+
+RUN mkdir /coqui
+WORKDIR /coqui
+
+ARG COQUI_BRANCH
+RUN git clone -b ${COQUI_BRANCH} https://github.com/coqui-ai/TTS.git
+WORKDIR /coqui/TTS
+RUN pip3 install pip setuptools wheel --upgrade
+RUN pip uninstall -y tensorboard tensorflow tensorflow-estimator nbconvert matplotlib
+RUN pip install -r requirements.txt
+RUN python3 ./setup.py develop
+
+# Jupyter Notebook
+RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"
+CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root"
+
+
+# Build example:
+#   nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui
+# Run example:
+#   nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v /ssd/___prj/tts/dataset-july21:/coqui/TTS/data jetson-coqui
+# Bash example:
+#   nvidia-docker exec -it <containerId> /bin/bash
--- a/helperScripts/MRS2LJSpeech.py
+++ b/helperScripts/MRS2LJSpeech.py
@ -0,0 +1,157 @@
+# This script generates the folder structure for ljspeech-1.1 processing from mimic-recording-studio database
+
+# Changelog
+# v1.0  - Initial release by Thorsten Müller (https://github.com/thorstenMueller/deep-learning-german-tts)
+# v1.1  - Great improvements by Peter Schmalfeldt (https://github.com/manifestinteractive)
+#           - Audio processing with ffmpeg (mono and sample rate of 22050 Hz)
+#           - Much better Python coding than my original version
+#           - Greater logging output to command line
+#           - See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba
+#           - Thanks Peter, it's a great contribution :-)
+# v1.2  - Added choice for choosing which recording session should be exported as LJSpeech
+# v1.3  - Added parameter mrs_dir to pass directory of Mimic-Recording-Studio
+# v1.4  - Script won't crash when audio recorded has been deleted on disk
+# v1.5  - Added parameter "ffmpeg" to make converting with ffmpeg optional
+
+from os.path import exists
+import glob
+import sqlite3
+import os
+import argparse
+import sys
+
+from shutil import copyfile
+from shutil import rmtree
+
+# Setup Directory Data
+cwd = os.path.dirname(os.path.abspath(__file__))
+output_dir = os.path.join(cwd, "dataset")
+output_dir_audio = ""
+output_dir_audio_temp=""
+output_dir_speech = ""
+
+# Create folders needed for ljspeech
+def create_folders():
+  global output_dir
+  global output_dir_audio
+  global output_dir_audio_temp
+  global output_dir_speech
+
+  print('→ Creating Dataset Folders')
+
+  output_dir_speech = os.path.join(output_dir, "LJSpeech-1.1")
+
+  # Delete existing folder if exists for clean run
+  if os.path.exists(output_dir_speech):
+    rmtree(output_dir_speech)
+
+  output_dir_audio = os.path.join(output_dir_speech, "wavs")
+  output_dir_audio_temp = os.path.join(output_dir_speech, "temp")
+
+  # Create Clean Folders
+  os.makedirs(output_dir_speech)
+  os.makedirs(output_dir_audio)
+  os.makedirs(output_dir_audio_temp)
+
+def convert_audio():
+  global output_dir_audio
+  global output_dir_audio_temp
+
+  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
+  
+  print('→ Converting %s Audio Files to 22050 Hz, 16 Bit, Mono\n' % "{:,}".format(recordings))
+
+  # Please use `pip install ffmpeg-python`
+  import ffmpeg
+
+  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):
+
+    percent = (idx + 1) / recordings
+
+    print('› \033[96m%s\033[0m \033[2m%s / %s (%s)\033[0m ' % (os.path.basename(wav), "{:,}".format((idx + 1)), "{:,}".format(recordings), "{:.0%}".format(percent)))
+
+    # Convert WAV file to required format
+    (ffmpeg
+      .input(wav)
+      .output(os.path.join(output_dir_audio, os.path.basename(wav)), acodec='pcm_s16le', ac=1, ar=22050, loglevel='error')
+      .overwrite_output()
+      .run(capture_stdout=True)
+    )
+
+
+def copy_audio():
+  global output_dir_audio
+
+  print('→ Skipping ffmpeg conversion, copying recordings as-is')
+  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
+  
+  print('→ Copy %s Audio Files to LJSpeech Dataset\n' % "{:,}".format(recordings))
+
+  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):    
+    copyfile(wav,os.path.join(output_dir_audio, os.path.basename(wav)))
+
+def create_meta_data(mrs_dir):
+  print('→ Creating META Data')
+
+  conn = sqlite3.connect(os.path.join(mrs_dir, "backend", "db", "mimicstudio.db"))
+  c = conn.cursor()
+
+  # Create metadata.csv for ljspeech
+  metadata = open(os.path.join(output_dir_speech, "metadata.csv"), mode="w", encoding="utf8")
+
+  # List available recording sessions
+  user_models = c.execute('SELECT uuid, user_name from usermodel ORDER BY created_date DESC').fetchall()
+  user_id = user_models[0][0]
+
+  for row in user_models:
+    print(row[0] + ' -> ' + row[1])
+
+  user_answer = input('Please choose ID of recording session to export (default is newest session) [' + user_id + ']: ')
+
+  if user_answer:
+    user_id = user_answer
+
+
+  # Bind user_id as a query parameter instead of concatenating it into the SQL string
+  for row in c.execute('SELECT audio_id, prompt, lower(prompt) FROM audiomodel WHERE user_id = ? ORDER BY length(prompt)', (user_id,)):
+    source_file = os.path.join(mrs_dir, "backend", "audio_files", user_id, row[0] + ".wav")
+    if exists(source_file):
+      metadata.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
+      copyfile(source_file, os.path.join(output_dir_audio_temp, row[0] + ".wav"))
+    else:
+      print("Wave file {} not found.".format(source_file))
+
+  metadata.close()
+  conn.close()
+
+def cleanup():
+  global output_dir_audio_temp
+
+  # Remove Temp Folder
+  rmtree(output_dir_audio_temp)
+
+def main():
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mrs_dir', required=True)
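+  # Parse the value explicitly: without a type, any non-empty string (even "False") would be truthy
+  parser.add_argument('--ffmpeg', required=False, default=False, type=lambda v: v.lower() in ('1', 'true', 'yes'))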
+  args = parser.parse_args()
+  
+  if not os.path.isdir(os.path.join(args.mrs_dir,"backend")):
+    sys.exit("Passed directory is no valid Mimic-Recording-Studio main directory!")
+
+  print('\n\033[48;5;22m  MRS to LJ Speech Processor  \033[0m\n')
+
+  create_folders()
+  create_meta_data(args.mrs_dir)
+
+  if(args.ffmpeg):
+    convert_audio()
+  
+  else:
+    copy_audio()
+  
+  cleanup()
+
+  print('\n\033[38;5;86;1m✔\033[0m COMPLETE【ツ】\n')
+
+if __name__ == '__main__':
+  main()
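
The metadata export in `create_meta_data` can be tried without a Mimic-Recording-Studio installation. The following is a minimal sketch, assuming a reduced `audiomodel` table with only the three columns the script reads; the helper name `build_metadata_lines` and the sample rows are made up for illustration, and the session ID is bound as a query parameter.

```python
import sqlite3


def build_metadata_lines(conn, user_id):
    """Return LJSpeech-style lines: audio_id|prompt|lowercased prompt."""
    rows = conn.execute(
        "SELECT audio_id, prompt, lower(prompt) FROM audiomodel "
        "WHERE user_id = ? ORDER BY length(prompt)",
        (user_id,),  # bound parameter, no string concatenation
    )
    return ["|".join(row) for row in rows]


# Hypothetical stand-in for backend/db/mimicstudio.db (assumed, reduced schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audiomodel (audio_id TEXT, prompt TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO audiomodel VALUES (?, ?, ?)",
    [
        ("a1", "Eure Tröte nervt", "session-1"),
        ("a2", "Hallo", "session-1"),
        ("b1", "Ich bin jetzt bereit.", "session-2"),
    ],
)

lines = build_metadata_lines(conn, "session-1")
conn.close()

for line in lines:
    print(line)
```

Only the rows of the chosen session are returned, shortest prompt first. Note that SQLite's built-in `lower()` converts ASCII letters only by default, so uppercase umlauts in a prompt would stay uppercase in the third column.
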
--- a/helperScripts/README.md
+++ b/helperScripts/README.md
@ -0,0 +1,27 @@
+# Short collection of helpful scripts for dataset creation and/or TTS training stuff
+
+## MRS2LJSpeech
+Python script that takes recordings (filesystem and sqlite db) made with Mycroft Mimic-Recording-Studio (https://github.com/MycroftAI/mimic-recording-studio) and creates an audio-optimized dataset in the widely supported LJSpeech directory structure.
+
+Peter Schmalfeldt (https://github.com/manifestinteractive) did an amazing job optimizing my original (quick and dirty) version of that script, so thank you, Peter :-)
+See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba#file-mrs2ljspeech-py
+
+## Dockerfile.Jetson-Coqui
+> Add your user to `docker` group to not require sudo on all operations.
+
+Thanks to NVIDIA for providing docker images for the Jetson platform. I use the "machine learning (ML)" image as the base image for setting up a Coqui environment.
+
+> You can use any branch or tag as COQUI_BRANCH argument. v0.1.3 is just the current stable version.
+
+Switch to the directory that contains the Dockerfile and run `nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui` to build your container image. When the build process has finished, you can start a container from that image.
+
+
+### Mapped volumes
+We need to bring the dataset and configuration file into the container, so we map a volume when starting it:
+`nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v [host path with dataset and config.json]:/coqui/TTS/data jetson-coqui`. Now we have a running container ready for Coqui TTS magic.
+
+### Jupyter notebook
+Coqui provides lots of useful Jupyter notebooks for dataset analysis. Once your container is up and running, Jupyter Lab should be reachable on port 8888 of your host (the port mapped by the `run` command above).
+
+### Running bash into container
+Run `nvidia-docker exec -it <containerId> /bin/bash` (look up the container ID with `docker ps`). You are now inside the container, and `ls /coqui/TTS/data` should show your dataset files.
--- a/helperScripts/getDatasetSpeechRate.py
+++ b/helperScripts/getDatasetSpeechRate.py
@ -0,0 +1,41 @@
+# This script gets speech rate per audio recording from a voice dataset (ljspeech file and directory structure)
+# Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
+# https://github.com/thorstenMueller/deep-learning-german-tts/
+# https://twitter.com/ThorstenVoice
+
+# Changelog:
+# v0.1 - 26.09.2021 - Initial version
+
+from os.path import exists
+import os
+import librosa
+import csv
+
+dataset_dir = "/home/thorsten/___dev/tts/dataset/Thorsten-neutral-Dec2021-44k/" # Directory where metadata.csv is in
+out_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
+decimal_use_comma = True # False: Splitting decimal value with a dot (.); True: Comma (,)
+
+out_csv = open(out_csv_file,"w")
+out_csv.write("filename;audiolength_sec;number_chars;chars_per_sec;remove_from_dataset\n")
+
+# Open metadata.csv file
+with open(os.path.join(dataset_dir,"metadata.csv")) as csvfile:
+    reader = csv.reader(csvfile, delimiter='|')
+    for row in reader:
+        wav_file = os.path.join(dataset_dir,"wavs",row[0] + ".wav")
+
+        if exists(wav_file):
+            # Gather values for report.csv output
+            phrase_len = len(row[1]) - 1 # Subtract 1 so the final punctuation mark is not counted.
+            duration = round(librosa.get_duration(filename=wav_file),2)
+            char_per_sec = round(phrase_len / duration,2)
+
+            if decimal_use_comma:
+                duration = str(duration).replace(".",",")
+                char_per_sec = str(char_per_sec).replace(".",",")
+
+            out_csv.write(row[0] + ".wav;" + str(duration) + ";" + str(phrase_len) + ";" + str(char_per_sec) + ";no\n")
+        else:
+            print("File " + wav_file + " does not exist.")
+
+out_csv.close()
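
The characters-per-second figure written by this script can be illustrated without a real dataset. The sketch below is an assumption-laden stand-in: it reads the duration with the standard library `wave` module instead of `librosa`, writes a synthetic two-second silent WAV file, and uses the hypothetical helper names `wav_duration_seconds` and `chars_per_second`.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chars_per_second(text, duration):
    """Speech rate using the script's convention of not counting the last character."""
    phrase_len = len(text) - 1
    return round(phrase_len / duration, 2)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")

    # Write 2 seconds of silence: 22050 Hz, 16 bit, mono
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(22050)
        wav.writeframes(b"\x00\x00" * 22050 * 2)

    duration = wav_duration_seconds(path)
    rate = chars_per_second("Euer Plan hat ja toll geklappt.", duration)

print(duration, rate)
```

Recordings whose rate lies far from the rest of the dataset are the candidates to mark with `yes` in the `remove_from_dataset` column of the report.
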
--- a/helperScripts/removeFilesFromDataset.py
+++ b/helperScripts/removeFilesFromDataset.py
@ -0,0 +1,48 @@
+# This script removes recordings from an ljspeech file/directory structured dataset based on CSV file from "getDatasetSpeechRate"
+# Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
+# https://github.com/thorstenMueller/deep-learning-german-tts/
+# https://twitter.com/ThorstenVoice
+
+# Changelog:
+# v0.1 - 26.09.2021 - Initial version
+
+import os
+import csv
+import shutil
+
+dataset_dir = "/Users/thorsten/Downloads/thorsten-export-20210909/" # Directory where metadata.csv is in
+subfolder_removed = "___removed"
+in_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
+to_remove = []
+
+# Open speech rate report (CSV file created by "getDatasetSpeechRate")
+with open(in_csv_file) as csvfile:
+    reader = csv.reader(csvfile, delimiter=';')
+    for row in reader:
+        if row[4] == "yes":
+            # Recording in that row should be removed from dataset
+            to_remove.append(row[0])
+            print("Recording " + row[0] + " will be removed from dataset.")
+
+print("\n" + str(len(to_remove)) + " recordings have been marked for deletion.")
+
+if len(to_remove) > 0:
+
+    metadata_cleaned = open(os.path.join(dataset_dir,"metadata_cleaned.csv"),"w")
+
+    # Create new subdirectory for removed wav files
+    removed_dir = os.path.join(dataset_dir,subfolder_removed)
+    if not os.path.exists(removed_dir):
+        os.makedirs(removed_dir)
+
+    # Remove lines from metadata.csv and move wav files to new subdirectory
+    with open(os.path.join(dataset_dir,"metadata.csv")) as csvfile:
+        reader = csv.reader(csvfile, delimiter='|')
+        for row in reader:
+            if (row[0] + ".wav") not in to_remove:
+                metadata_cleaned.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
+            else:
+                # Move recording to new subfolder
+                shutil.move(os.path.join(dataset_dir,"wavs",row[0] + ".wav"),removed_dir)
+    
+    metadata_cleaned.close()
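
The filtering logic of this script can be sketched on in-memory data, without touching any files. This is a minimal illustration under assumptions: the helper name `split_metadata` and the sample rows are invented, and, unlike the script, the metadata reader uses `csv.QUOTE_NONE` so that quotation marks inside prompts are kept verbatim.

```python
import csv
import io


def split_metadata(report_text, metadata_text):
    """Split metadata rows into (kept, removed) based on the speech rate report."""
    # File names flagged with "yes" in the fifth column of the report
    to_remove = {
        row[0]
        for row in csv.reader(io.StringIO(report_text), delimiter=";")
        if len(row) > 4 and row[4] == "yes"
    }

    kept, removed = [], []
    reader = csv.reader(
        io.StringIO(metadata_text), delimiter="|", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        target = removed if row[0] + ".wav" in to_remove else kept
        target.append(row)
    return kept, removed


# Made-up sample data in the two file formats used by the helper scripts
report = (
    "filename;audiolength_sec;number_chars;chars_per_sec;remove_from_dataset\n"
    "a1.wav;2,0;15;7,5;no\n"
    "a2.wav;0,5;30;60,0;yes\n"
)
metadata = 'a1|Hallo Welt|hallo welt\na2|"Ja", sagte er|"ja", sagte er\n'

kept, removed = split_metadata(report, metadata)
print(kept)
print(removed)
```

Collecting the flagged file names in a set keeps the membership test cheap for large datasets; the header row of the report is skipped automatically because its fifth column is not `yes`.
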
--- a/img/thorsten-de---datasetAnalysis1.png
+++ b/img/thorsten-de---datasetAnalysis1.png
--- a/img/thorsten-de---datasetAnalysis2.png
+++ b/img/thorsten-de---datasetAnalysis2.png
--- a/img/thorsten-de---datasetAnalysis3.png
+++ b/img/thorsten-de---datasetAnalysis3.png
--- a/img/thorsten-de---datasetAnalysis4.png
+++ b/img/thorsten-de---datasetAnalysis4.png
--- a/img/thorsten-de---datasetAnalysis5.png
+++ b/img/thorsten-de---datasetAnalysis5.png
--- a/img/thorsten-de---datasetAnalysis6.png
+++ b/img/thorsten-de---datasetAnalysis6.png
--- a/samples/original_recording/recorded_sample_01.wav
+++ b/samples/original_recording/recorded_sample_01.wav
--- a/samples/original_recording/recorded_sample_02.wav
+++ b/samples/original_recording/recorded_sample_02.wav
--- a/samples/original_recording/recorded_sample_03.wav
+++ b/samples/original_recording/recorded_sample_03.wav
--- a/samples/original_recording/recorded_sample_04.wav
+++ b/samples/original_recording/recorded_sample_04.wav
--- a/samples/original_recording/recorded_sample_05.wav
+++ b/samples/original_recording/recorded_sample_05.wav
--- a/samples/original_recording/recorded_sample_06.wav
+++ b/samples/original_recording/recorded_sample_06.wav
--- a/samples/original_recording/recorded_sample_07.wav
+++ b/samples/original_recording/recorded_sample_07.wav
--- a/samples/thorsten-21.06-emotional/amused.wav
+++ b/samples/thorsten-21.06-emotional/amused.wav
--- a/samples/thorsten-21.06-emotional/angry.wav
+++ b/samples/thorsten-21.06-emotional/angry.wav
--- a/samples/thorsten-21.06-emotional/disgusted.wav
+++ b/samples/thorsten-21.06-emotional/disgusted.wav
--- a/samples/thorsten-21.06-emotional/drunk.wav
+++ b/samples/thorsten-21.06-emotional/drunk.wav
--- a/samples/thorsten-21.06-emotional/neutral.wav
+++ b/samples/thorsten-21.06-emotional/neutral.wav
--- a/samples/thorsten-21.06-emotional/sleepy.wav
+++ b/samples/thorsten-21.06-emotional/sleepy.wav
--- a/samples/thorsten-21.06-emotional/surprised.wav
+++ b/samples/thorsten-21.06-emotional/surprised.wav
--- a/samples/thorsten-21.06-emotional/whisper.wav
+++ b/samples/thorsten-21.06-emotional/whisper.wav
Author	SHA1	Message	Date
Thorsten Müller	f13bcaf63e	Added Windows TTS training recipe Added modified vits recipe for Thorsten-Voice model training using Windows	2023-03-05 16:19:50 +01:00
Thorsten Müller	04c5683194	German Corpus for Mimic-Recording-Studio	2022-12-16 22:54:02 +01:00
Thorsten Müller	50e09d49bf	Added social media info	2022-11-13 17:08:26 +01:00
Thorsten Müller	b0afed75f4	Added new 2022.10 ThorstenVoice dataset.	2022-11-13 16:47:46 +01:00
Thorsten Müller	9b7b4c6836	Added new released Tacotron2 DDC model to README tts-server --model_name tts_models/de/thorsten/tacotron2-DDC	2022-08-23 19:03:47 +02:00
Thorsten Müller	aba10bc64a	Added info on new VITS model.	2022-06-24 18:00:12 +02:00
Thorsten Müller	07e85b3905	Merge pull request #35 from thorstenMueller/thorstenMueller-patch-1 Add new project logo to header.	2022-05-09 20:56:03 +02:00
Thorsten Müller	e08d50d6bb	Added new logo to header	2022-05-09 20:46:17 +02:00
Thorsten Müller	e691aa4ee3	Delete Logo_Thorsten-Voice-kleiner.jpg	2022-05-09 20:45:31 +02:00
Thorsten Müller	625f73e986	Delete Logo_Thorsten-Voice.jpg	2022-05-09 20:45:18 +02:00
Thorsten Müller	de1802f8ce	Update README.md	2022-05-09 20:34:50 +02:00
Thorsten Müller	f0500309d6	Test with embedded logo	2022-05-09 18:19:02 +02:00
Thorsten Müller	41c91b9865	Add files via upload	2022-05-09 18:14:50 +02:00
Thorsten Müller	fcb1e705a9	Add files via upload	2022-05-09 18:13:06 +02:00
Thorsten Müller	b8802db4f8	Uploaded transparent Thorsten-Voice logo.	2022-05-09 18:12:03 +02:00
Thorsten Müller	b00c768343	Added badge links.	2022-04-28 18:13:49 +02:00
Thorsten Mueller	3b0b4f898f	Fixed typo.	2022-04-24 09:13:13 +02:00
Thorsten Müller	2106fc6b00	Test	2022-04-23 23:31:15 +02:00
Thorsten Müller	e4ff3ce04a	Initial draft FUNDING.yml	2022-04-23 23:29:15 +02:00
Thorsten Müller	f408508cd7	Merge pull request #31 from thorstenMueller/prep-thorsten-22.05 Merge new README (preparation for new TTS model release)	2022-04-23 23:26:17 +02:00
Thorsten Mueller	6b4cfb41d4	Added Youtube link.	2022-04-23 23:22:27 +02:00
Thorsten Mueller	521dd33483	Updated TOC	2022-04-23 21:15:26 +02:00
Thorsten Mueller	6efb25310a	preparations for new Thorsten models	2022-04-23 21:13:30 +02:00
Thorsten Müller	5654397f3e	Add citation file.	2022-04-20 23:48:54 +02:00
Thorsten Mueller	b5ec9ef991	Fixed minor issues	2022-02-15 17:52:03 +01:00
Thorsten Mueller	77ad01d4ff	Making ffmpeg conversion optional.	2022-02-15 17:28:40 +01:00
Thorsten Mueller	c35507b1f7	Added link for VoiceLunch slides.	2022-01-03 20:09:43 +01:00
Thorsten Mueller	b536dfd958	Added check if audio file exists in getDatasetSpeechRate	2021-12-19 18:44:01 +01:00
Thorsten Mueller	29238f2a31	Updated Download links / Cites	2021-12-11 17:44:49 +01:00
Thorsten Müller	8c5f4503f3	Added two hyperlinks To http://www.Thorsten-Voice.de and https://OpenVoice-Tech.net Wiki	2021-11-28 11:33:54 +01:00
Thorsten Mueller	2ff7e3961b	Added Forward Tacotron samples.	2021-10-30 21:48:21 +02:00
Thorsten Müller	1221713314	Remove Wikipedia link to "Thorsten (Stimme)"	2021-10-23 16:52:59 +02:00
Thorsten Mueller	d3225b48f8	Added Citation to README.	2021-10-08 18:22:34 +02:00
Thorsten Mueller	33c030f844	Added two scripts for dataset analysis/cleaning.	2021-09-28 06:10:21 +02:00
Thorsten Müller	2daabae53e	Added DOIs in README	2021-09-24 16:32:16 +02:00
Thorsten Müller	1d445b09f8	Added DOI badge for emotional dataset	2021-09-23 21:58:54 +02:00
Thorsten Mueller	2853f111dc	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-09-18 16:04:59 +02:00
Thorsten Mueller	7540606247	Added download link for new recording-in-progress neutral dataset.	2021-09-18 16:04:33 +02:00
Thorsten Mueller	0b9e929ce0	Added Fullband-MelGAN model download path. Thanks to (see #26 )	2021-08-20 06:02:47 +02:00
Thorsten Mueller	bc06fa923f	Added info on TensorVox by ZDisket - thanks :-)	2021-08-12 18:30:55 +02:00
Thorsten Mueller	f19144b085	Adjusted quick setup example to new vocoder model.	2021-08-06 09:50:44 +02:00
Thorsten Müller	251c093ad4	Added locale settings for german Umlaut handling.	2021-08-04 09:24:51 +02:00
Thorsten Mueller	f505fd38df	Dockerfile draft for NVIDIA Jetson Xavier AGX and Coqui	2021-08-02 19:54:38 +02:00
Thorsten Mueller	3e09ae8615	Added link to my Youtube channel.	2021-07-21 22:49:47 +02:00
Thorsten Mueller	2ed2413dda	Explain how I recorded emotional phrases.	2021-07-13 21:53:55 +02:00
Thorsten Mueller	51c5f55bbd	Added check that recording exists before export.	2021-07-12 23:27:50 +02:00
Thorsten Mueller	4f875ac591	Added --mrs_dir param for more flexibility	2021-07-07 22:00:47 +02:00
Thorsten Mueller	2ea44ede87	Added README for helperScripts	2021-07-04 22:38:38 +02:00
Thorsten Mueller	ba60fc57d4	Added script to create LJSpeech dataset out of Mimic-Recording-Studio recordings.	2021-07-04 22:33:38 +02:00
Thorsten Müller	9e68d99ee7	Updated emotional dataset v02 download link	2021-06-20 08:57:39 +02:00
Thorsten Mueller	7172604eed	Added v02 emotional dataset (drunk + whispering)	2021-06-13 10:59:04 +02:00
Thorsten Mueller	58dece7c55	Added chapter on public talks	2021-06-08 07:18:30 +02:00
Thorsten Mueller	c81f374aca	Test Commit	2021-06-07 21:52:31 +02:00
Thorsten Mueller	2c6aca780b	Added table with trained model checkpoint downloads	2021-05-11 22:34:10 +02:00
Thorsten Müller	68e60f2a92	Format Wikipedia link	2021-04-22 18:57:40 +02:00
Thorsten Mueller	a3b0dde296	Added info about Wikipedia article	2021-04-22 18:53:39 +02:00
Thorsten Mueller	28d81a0fb2	Update on emotional dataset info	2021-04-11 11:42:24 +02:00
Thorsten Mueller	12c6d26dbd	Moved emotional samples to other table.	2021-04-11 11:39:29 +02:00
Thorsten Mueller	4c06db69dd	Added silero models to audio comparison	2021-04-11 11:04:20 +02:00
Thorsten Müller	bae96a75a5	Added badge for link to TTS comparison page	2021-04-09 19:29:24 +02:00
Thorsten Müller	1313520064	Playing around with some cool badges :-)	2021-04-09 19:05:43 +02:00
Thorsten Mueller	e2ecf68c13	added details on coqui model usage.	2021-04-05 16:57:36 +02:00
Thorsten Mueller	c8a5e1082e	Small TOC fix	2021-04-03 23:48:10 +02:00
Thorsten Mueller	40aae591d7	Small fixes in TOC	2021-04-03 23:45:46 +02:00
Thorsten Mueller	4f722e96a9	Adding info on emotional dataset.	2021-04-03 23:24:53 +02:00
Thorsten Müller	7e1530b742	Merge pull request #14 from snakers4/master Add silero-models	2021-04-03 22:12:09 +02:00
snakers4	647786be6c	Add silero-models	2021-04-03 05:17:14 +00:00
Thorsten Müller	00685a008d	Added cute sloth smiley.	2021-03-30 12:07:41 +02:00
Thorsten Mueller	e5481a82a6	Added smaller logo	2021-03-30 08:00:58 +02:00
Thorsten Mueller	2d1428cd13	Switch to non-transparent logo	2021-03-30 07:55:08 +02:00
Thorsten Mueller	df55a19ae2	Added ThorstenVoice logo	2021-03-30 07:53:48 +02:00
Thorsten Müller	9585b73cc3	Modify title	2021-03-16 20:23:29 +01:00
Thorsten Müller	70158ba7c8	Small README updates	2021-03-16 18:51:21 +01:00
Thorsten Mueller	e1e9f8666a	Small text adjustments and formatting on README.	2021-03-16 18:41:39 +01:00
Thorsten Müller	cca10c215e	Added download link to v03 dataset.	2021-02-10 19:46:21 +01:00
Thorsten Mueller	09705597b8	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-01-23 18:50:15 +01:00
Thorsten Mueller	bdb3aa7d47	Added hifiGAN samples trained by SanjaESC	2021-01-23 18:15:56 +01:00
Thorsten Müller	f0c0f63ae1	Added nice guy SanjaESC to thanks section	2021-01-22 16:24:56 +01:00
Thorsten Müller	036c266ad7	Added Sebastian to thanks section - Thank you :-)	2021-01-16 08:24:10 +01:00
Thorsten Mueller	8e6137b3af	Added wavegrad samples (training in progress)	2020-12-14 17:53:32 +01:00
Thorsten Mueller	9ee0353da4	Changed main and subheading for TensorFlowTTS	2020-12-02 12:23:20 +01:00
Thorsten Mueller	a99d4b6477	Added first samples for TensorFlowTTS	2020-12-02 12:14:16 +01:00
Thorsten Mueller	02020e54f7	added sample 05 for griffin lim.	2020-11-21 10:19:13 +01:00
Thorsten Mueller	5347394f3e	Added Griffin Lim vocoder samples	2020-11-21 10:08:08 +01:00
Thorsten Mueller	c59d19e0a1	Added detail on glowtts training steps.	2020-11-17 22:04:09 +01:00
Thorsten Mueller	e45736f62d	added sample05 with GlowTTS.	2020-11-17 21:53:08 +01:00
Thorsten Mueller	e96de3a095	fixed typo	2020-11-16 18:25:38 +01:00
Thorsten Mueller	eaead5cebe	Rename to docs folder for Github pages	2020-11-16 17:28:20 +01:00
Thorsten Mueller	7b27bdac2d	Added github page with index and sample wavs	2020-11-16 17:25:42 +01:00
Thorsten Müller	f55e16d0fc	fixed typo	2020-09-23 19:32:27 +02:00
				`@ -0,0 +1,2 @@`
				`# These are supported funding model platforms`