Added Windows TTS training recipe

Added modified vits recipe for Thorsten-Voice model training using Windows
German Corpus for Mimic-Recording-Studio
2023-03-05 16:19:50 +01:00 · 2022-12-16 22:54:02 +01:00 · 2022-11-13 17:08:26 +01:00 · 2022-11-13 16:47:46 +01:00 · 2022-08-23 19:03:47 +02:00 · 2022-06-24 18:00:12 +02:00
54 changed files with 47217 additions and 70 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@ -0,0 +1,2 @@
+# These are supported funding model platforms
+
--- a/CITATION.cff
+++ b/CITATION.cff
@ -0,0 +1,28 @@
+# This CITATION.cff file was generated with cffinit.
+# Visit https://bit.ly/cffinit to generate yours today!
+
+cff-version: 1.2.0
+title: Thorsten-Voice
+message: >-
+  Please cite Thorsten-Voice project if you use
+  datasets or trained TTS models.
+type: dataset
+authors:
+  - given-names: Thorsten
+    family-names: Müller
+    email: tm@thorsten-voice.de
+  - given-names: Dominik
+    family-names: Kreutz
+repository-code: 'https://github.com/thorstenMueller/Thorsten-Voice'
+url: 'https://www.Thorsten-Voice.de'
+abstract: >-
+  A free to use, offline working, high quality german
+  TTS voice should be available for every project
+  without any license struggling.
+keywords:
+  - Thorsten
+  - Voice
+  - Open
+  - German
+  - TTS
+  - Dataset
--- a/EvolutionOfThorstenDataset.pdf
+++ b/EvolutionOfThorstenDataset.pdf
--- a/Logo_Thorsten-Voice.png
+++ b/Logo_Thorsten-Voice.png
--- a/README.md
+++ b/README.md
@ -0,0 +1,245 @@
+![Thorsten-Voice logo](Logo_Thorsten-Voice.png)
+
+- [Project motivation](#motivation-for-thorsten-voice-project-speaking_head-speech_balloon)
+  
+- [Personal note](#some-personal-words-before-using-thorsten-voice)
+
+- [**Thorsten** Voice Datasets](#voice-datasets)
+  - [Thorsten-21.02-neutral](#thorsten-2102-neutral)
+  - [Thorsten-21.06-emotional](#thorsten-2106-emotional)
+  - [Thorsten-22.10-neutral](#thorsten-2210-neutral)
+
+- [**Thorsten** TTS-Models](#tts-models)
+  - [Thorsten-21.04-Tacotron2-DCA](#thorsten-2104-tacotron2-dca)
+  - [Thorsten-22.05-VITS](#thorsten-2205-vits)
+  - [Thorsten-22.08-Tacotron2-DDC](#thorsten-2208-tacotron2-ddc)
+  - [Other models](#other-models)
+  
+- [Public talks](#public-talks)
+
+- [My Youtube channel](#youtube-channel)
+
+- [Special Thanks](#thanks-section)
+
+
+# Motivation for Thorsten-Voice project :speaking_head: :speech_balloon:
+A **free** to use, **offline** working, **high quality** **german** **TTS** voice should be available for every project without any license struggling.
+
+<a href="https://twitter.com/intent/follow?screen_name=ThorstenVoice"><img src="https://img.shields.io/twitter/follow/ThorstenVoice?style=social&logo=twitter" alt="follow on Twitter"></a>
+[![YouTube Channel Subscribers](https://img.shields.io/youtube/channel/subscribers/UCjqqTVVBTsxpm0iOhQ1fp9g?style=social)](https://www.youtube.com/c/ThorstenMueller)
+[![Project website](https://img.shields.io/badge/Project_website-www.Thorsten--Voice.de-92a0c0)](https://www.Thorsten-Voice.de)
+
+# Social media
+Please check and follow me on my social media profiles - Thank you.
+
+| Platform         | Link                                                                                                            |
+| --------------- | ------- |
+| Youtube | [ThorstenVoice on Youtube](https://www.youtube.com/c/ThorstenMueller) |
+| Twitter | [ThorstenVoice on Twitter](https://twitter.com/ThorstenVoice) |
+| Instagram | [ThorstenVoice on Instagram](https://www.instagram.com/thorsten_voice/) |
+| LinkedIn | [Thorsten Müller on LinkedIn](https://www.linkedin.com/in/thorsten-m%C3%BCller-848a344/) |
+
+# Some personal words before using **Thorsten-Voice**
+> I contribute my voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone. :earth_africa: (*Thorsten Müller*)
+
+Please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with the world.
+
+# Voice-Datasets
+Voice datasets are listed on Zenodo:
+| Dataset         | DOI Link                                                                                                            |
+| --------------- | ------- |
+| Thorsten-21.02-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342) |
+| Thorsten-21.06-emotional | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023) |
+| Thorsten-22.10-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581) |
+
+## Thorsten-21.02-neutral
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342)
+
+```
+@dataset{muller_thorsten_2021_5525342,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {Thorsten-Voice - "Thorsten-21.02-neutral" Dataset},
+  month        = feb,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {3.0},
+  doi          = {10.5281/zenodo.5525342},
+  url          = {https://doi.org/10.5281/zenodo.5525342}
+}
+```
+
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1KVjGXG2ij002XRHb3fgFK4j0OEq1FsWm?usp=sharing).**
+
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* LJSpeech file and directory structure
+* 22,668 recorded phrases (*wav files*)
+* More than 23 hours of pure audio
+* Sample rate 22,050 Hz
+* Mono
+* Normalized to -24dB
+* Phrase length (min/avg/max): 2 / 52 / 180 chars
+* No silence at beginning/ending
+* Avg spoken chars per second: 14
+* Sentences with question mark: 2,780
+* Sentences with exclamation mark: 1,840
+
+### Dataset evolution
+As described in the PDF document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)) this dataset consists of three recording phases.
+
+* **Phase 1**: Recorded with a cheap usb microphone (*low quality*)
+* **Phase 2**: Recorded with a good microphone (*good quality*)
+* **Phase 3**: Recorded with same good microphone but longer phrases (> 100 chars) (*good quality*)
+
+If you want to use a subset of the dataset, you can see which files belong to which recording phase in the [recording quality](./RecordingQuality.csv) CSV file.
+
+
+## Thorsten-21.06-emotional
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023)
+
+```
+@dataset{muller_thorsten_2021_5525023,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {{Thorsten-Voice - "Thorsten-21.06-emotional" 
+                   Dataset}},
+  month        = jun,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {2.0},
+  doi          = {10.5281/zenodo.5525023},
+  url          = {https://doi.org/10.5281/zenodo.5525023}
+}
+```
+
+All emotional recordings were recorded by myself, and I tried to feel and pronounce each emotion even if the phrase context does not match it. Example: I pronounced the sleepy recordings in the tone I have shortly before falling asleep.
+
+### Samples
+Listen to the phrase "**Mist, wieder nichts geschafft.**" in the following emotions.
+
+* :slightly_smiling_face: [Neutral](./samples/thorsten-21.06-emotional/neutral.wav)
+* :nauseated_face: [Disgusted](./samples/thorsten-21.06-emotional/disgusted.wav)
+* :angry: [Angry](./samples/thorsten-21.06-emotional/angry.wav)
+* :grinning: [Amused](./samples/thorsten-21.06-emotional/amused.wav)
+* :astonished: [Surprised](./samples/thorsten-21.06-emotional/surprised.wav)
+* :pensive: [Sleepy](./samples/thorsten-21.06-emotional/sleepy.wav)
+* :dizzy_face: [Drunk](./samples/thorsten-21.06-emotional/drunk.wav)
+* 🤫 [Whispering](./samples/thorsten-21.06-emotional/whisper.wav)
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* 300 sentences * 8 emotions = 2,400 recordings
+* Mono
+* Sample rate 22,050 Hz
+* Normalized to -24dB
+* No silence at beginning/ending
+* Sentence length: 59 - 148 chars
+
+
+## Thorsten-22.10-neutral
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7265581.svg)](https://doi.org/10.5281/zenodo.7265581)
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1dxoSo8Ktmh-5E0rSVqkq_Jm1r4sFnwJM?usp=sharing).**
+
+```
+@dataset{muller_thorsten_2022_7265581,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {ThorstenVoice Dataset 2022.10},
+  month        = oct,
+  year         = 2022,
+  publisher    = {Zenodo},
+  version      = {1.0},
+  doi          = {10.5281/zenodo.7265581},
+  url          = {https://doi.org/10.5281/zenodo.7265581}
+}
+```
+
+# TTS Models
+
+## Thorsten-21.04-Tacotron2-DCA
+This [TTS-model](https://drive.google.com/drive/folders/1m4RuffbvdOmQWnmy_Hmw0cZ_q0hj2o8B?usp=sharing) has been trained on [**Thorsten-21.02-neutral**](#thorsten-2102-neutral) dataset. The recommended trained Fullband-MelGAN Vocoder can be downloaded [here](https://drive.google.com/drive/folders/1hsfaconm4Yd9wPVyOtrXjWQs4ZAPoouY?usp=sharing).
+
+Run the model:
+* pip install TTS==0.5.0
+* tts-server --model_name tts_models/de/thorsten/tacotron2-DCA
+
+
+## Thorsten-22.05-VITS
+Trained on dataset **Thorsten-22.05-neutral**.
+Audio samples are available on [Thorsten-Voice website](https://www.thorsten-voice.de/en/just-get-started/).
+
+To run TTS server just follow these steps:
+* pip install tts==0.7.1
+* tts-server --model_name tts_models/de/thorsten/vits
+* Open browser on http://localhost:5002 and enjoy playing
+
+## Thorsten-22.08-Tacotron2-DDC
+Trained on dataset [**Thorsten-22.05-neutral**](#thorsten-2205-neutral).
+Audio samples are available on the [Thorsten-Voice website](https://www.thorsten-voice.de/2022/08/14/welches-tts-modell-klingt-besser/).
+
+To run TTS server just follow these steps:
+* pip install tts==0.8.0
+* tts-server --model_name tts_models/de/thorsten/tacotron2-DDC
+* Open browser on http://localhost:5002 and enjoy playing
+
+
+## Other models
+### Silero
+
+You can use free AGPL-licensed models trained on the **Thorsten-21.02-neutral** dataset via the [silero-models](https://github.com/snakers4/silero-models/blob/master/models.yml) project.
+
+* [Thorsten 16kHz](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing)
+* [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb)
+
+### ZDisket
+[ZDisket](https://github.com/ZDisket/TensorVox) made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by [monatis](https://github.com/monatis/german-tts). Thanks for sharing that :thumbsup:. See it in action on [Youtube](https://youtu.be/tY6_xZnkv-A).
+
+# Public talks
+I really want to bring the topic "**Open Voice For An Open Future**" to wider public attention.
+
+* I've been part of a Linux User Group podcast about Mycroft AI and talked about my TTS efforts there (*May 2021*).
+* I was invited by [Yusuf](https://github.com/monatis/) from the Turkish TensorFlow community to talk about "How to make machines speak with your own voice". This talk was streamed live on Youtube and is available [here](https://www.youtube.com/watch?v=m-Uwb-Bg144&t=2303s). If you're interested in the slides shown, feel free to download my presentation [here](https://docs.google.com/presentation/d/1ynnw0ilKV3WwMSJHytrN3GXRiFr8x3r0DUimBm1y0LI/edit?usp=sharing) (*June 2021*).
+* I was invited as a speaker at VoiceLunch language & linguistics on 03.01.2022. [Here are my slides](https://docs.google.com/presentation/d/1Gi6BmYHs7g4ZgdAiIKGBnBwZDCvJOD9DJxQOGlgds1o/edit?usp=sharing) (*January 2022*).
+
+# Youtube channel
+In summer 2021 I started to share my lessons learned and experiences with open voice tech, especially **TTS**, on my little [Youtube channel](https://www.youtube.com/c/ThorstenMueller). If you check out and like my videos, I'd be happy to welcome you as a subscriber and member of my little Youtube community.
+
+
+# Feel free to file an issue if you ...
+* Use my TTS voice in your project(s)
+* Want to share your trained "Thorsten" model
+* Become aware of any abuse of my voice
+
+# Thanks section
+## Cool projects
+* https://commonvoice.mozilla.org/
+* https://coqui.ai/
+* https://mycroft.ai/
+* https://github.com/rhasspy/
+
+## Cool people
+* [El-Tocino](https://github.com/el-tocino/)
+* [Eren Gölge](https://github.com/erogol/)
+* [Gras64](https://github.com/gras64/)
+* [Kris Gesling](https://github.com/krisgesling/)
+* [Nmstoker](https://github.com/nmstoker)
+* [Othiele](https://discourse.mozilla.org/u/othiele/summary)
+* [Repodiac](https://github.com/repodiac)
+* [SanjaESC](https://github.com/SanjaESC)
+* [Synesthesiam](https://github.com/synesthesiam/)
+
+## Even more special people
+Additionally, a big thank-you to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.
+
+And last but not least I want to say a **huge, huge thank you** to a special guy who supported me on this journey as a partner right from the beginning. Not just with nice words, but with his time, audio optimization know-how and finally GPU power.
+
+**Thank you so much, dear Dominik ([@domcross](https://github.com/domcross/)), for being my partner on this journey.**
+
+Thorsten (*Twitter: @ThorstenVoice*)
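Review note on the README's `tts-server` steps: the README stops at "open the browser on http://localhost:5002". For scripted use, a minimal client sketch could look like the one below. It assumes the server exposes a GET endpoint `/api/tts` that takes a `text` query parameter and returns WAV bytes; that endpoint name is an assumption about the installed TTS version and is not stated in this commit. The helper names `build_tts_url` and `synthesize` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the GET URL for the assumed /api/tts endpoint of tts-server."""
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize(text, out_path, host="localhost", port=5002):
    """Request speech for `text` and save the returned bytes to `out_path`.

    Needs a running tts-server; the response is assumed to be WAV audio.
    """
    with urlopen(build_tts_url(text, host, port)) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


# Usage, with the server from the README already running:
# synthesize("Sei eine Stimme, kein Echo.", "echo.wav")
```

Keeping the URL construction separate from the network call makes the request easy to inspect before a server is available.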
--- a/RecordingQuality.csv
+++ b/RecordingQuality.csv
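Review note on `RecordingQuality.csv`: the README says this file shows which recordings belong to which recording phase, but the diff does not show its layout. The sketch below illustrates one way to pick a subset. The column names and the sample rows are placeholders, not the real header, so check the actual file before using it. `files_matching` is a hypothetical helper.

```python
import csv
import io


def files_matching(csv_text, file_column, quality_column, wanted):
    """Return the file names whose quality/phase column equals `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[file_column] for row in reader if row[quality_column] == wanted]


# Made-up sample rows; the real file's header and values may differ.
SAMPLE = "filename,quality\na.wav,low\nb.wav,good\nc.wav,good\n"
good_files = files_matching(SAMPLE, "filename", "quality", "good")
```

Passing the column names as arguments keeps the helper usable once the real header is known.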
--- a/Youtube/train_vits_win.py
+++ b/Youtube/train_vits_win.py
@ -0,0 +1,94 @@
+import os
+
+from trainer import Trainer, TrainerArgs
+
+from TTS.tts.configs.shared_configs import BaseDatasetConfig
+from TTS.tts.configs.vits_config import VitsConfig
+from TTS.tts.datasets import load_tts_samples
+from TTS.tts.models.vits import Vits, VitsAudioConfig
+from TTS.tts.utils.text.tokenizer import TTSTokenizer
+from TTS.utils.audio import AudioProcessor
+
+def main():
+
+	output_path = os.path.dirname(os.path.abspath(__file__))
+	#output_path = "c:\\temp\\tts"
+	dataset_config = BaseDatasetConfig(
+		formatter="ljspeech", meta_file_train="metadata_small.csv", path="C:\\Users\\ThorstenVoice\\TTS-Training\\ThorstenVoice-Dataset_2022.10"
+	)
+	audio_config = VitsAudioConfig(
+		sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
+	)
+
+	config = VitsConfig(
+		audio=audio_config,
+		run_name="vits_thorsten-voice",
+		batch_size=4,
+		eval_batch_size=4,
+		batch_group_size=5,
+		num_loader_workers=1,
+		num_eval_loader_workers=1,
+		run_eval=True,
+		test_delay_epochs=-1,
+		epochs=1000,
+		text_cleaner="phoneme_cleaners",
+		use_phonemes=True,
+		phoneme_language="de",
+		phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
+		compute_input_seq_cache=True,
+		print_step=25,
+		print_eval=True,
+		mixed_precision=False,
+		output_path=output_path,
+		datasets=[dataset_config],
+		cudnn_benchmark=False,
+		test_sentences=[
+		  "Es hat mich viel Zeit gekostet eine Stimme zu entwickeln, jetzt wo ich sie habe werde ich nicht mehr schweigen.",
+		  "Sei eine Stimme, kein Echo.",
+		  "Es tut mir Leid David. Das kann ich leider nicht machen.",
+		  "Dieser Kuchen ist großartig. Er ist so lecker und feucht.",
+		  "Vor dem 22. November 1963.",
+		],
+	)
+
+	# INITIALIZE THE AUDIO PROCESSOR
+	# Audio processor is used for feature extraction and audio I/O.
+	# It mainly serves to the dataloader and the training loggers.
+	ap = AudioProcessor.init_from_config(config)
+
+	# INITIALIZE THE TOKENIZER
+	# Tokenizer is used to convert text to sequences of token IDs.
+	# config is updated with the default characters if not defined in the config.
+	tokenizer, config = TTSTokenizer.init_from_config(config)
+
+	# LOAD DATA SAMPLES
+	# Each sample is a list of ```[text, audio_file_path, speaker_name]```
+	# You can define your custom sample loader returning the list of samples.
+	# Or define your custom formatter and pass it to the `load_tts_samples`.
+	# Check `TTS.tts.datasets.load_tts_samples` for more details.
+	train_samples, eval_samples = load_tts_samples(
+		dataset_config,
+		eval_split=True,
+		eval_split_max_size=config.eval_split_max_size,
+		eval_split_size=config.eval_split_size,
+	)
+
+	# init model
+	model = Vits(config, ap, tokenizer, speaker_manager=None)
+
+	# init the trainer and 🚀
+	trainer = Trainer(
+		TrainerArgs(),
+		config,
+		output_path,
+		model=model,
+		train_samples=train_samples,
+		eval_samples=eval_samples,
+	)
+	trainer.fit()
+	print("Fertig!")
+
+from multiprocessing import freeze_support
+if __name__ == '__main__':
+    freeze_support()  # needed for Windows
+    main()
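Review note on `meta_file_train="metadata_small.csv"`: the script trains on a file named `metadata_small.csv` with the `ljspeech` formatter, but the commit does not show how that file was made. Assuming it is simply a shortened copy of the dataset's LJSpeech-style metadata (one `id|text|normalized text` entry per line), a sketch for producing such a subset could be the following. All function names are hypothetical.

```python
def parse_ljspeech_line(line):
    """Split one `id|text[|normalized text]` line into (file_id, text)."""
    fields = line.rstrip("\n").split("|")
    return fields[0], fields[1]


def first_n_entries(lines, n):
    """Keep the first `n` non-empty metadata lines, e.g. for a quick test run."""
    kept = [line for line in lines if line.strip()]
    return kept[:n]


def write_subset(src_path, dst_path, n):
    """Copy the first `n` entries of an LJSpeech metadata file to `dst_path`."""
    with open(src_path, encoding="utf-8") as src:
        subset = first_n_entries(src.readlines(), n)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(subset)
    return len(subset)


# Usage (paths are placeholders):
# write_subset("metadata.csv", "metadata_small.csv", 500)
```

A small subset like this is handy for checking that the Windows setup trains at all before starting a long run on the full dataset.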
--- a/docs/audio_compare.md
+++ b/docs/audio_compare.md
@ -6,7 +6,7 @@ Hier sind Hörproben mit unterschiedlichen Vocodern. Alle gesprochenen Texte (*S
 * **Sample #02**: Eure Tröte nervt.
 * **Sample #03**: Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.
 * **Sample #04**: Euer Plan hat ja toll geklappt.
-* *Sample #05: "In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön ..." (Anfang vom "Froschkönig")*
+* *Sample #05: "In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön." (Anfang vom "Froschkönig")*

 # Ground truth
 Original recordings from the "thorsten" dataset.
@ -52,10 +52,50 @@ Originalaufnahmen aus dem "thorsten" Dataset.
 > Model details: (todo: link)  
 > Tacotron2 + DDC: trained for 460k steps

-# ParallelWaveGAN
-> Tacotron2 + DDC: trained for 360k steps, PWGAN vocoder: trained for 925k steps
+<dl>

+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-griffin-lim.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-griffin-lim.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# ParallelWaveGAN
 > Details: [Notebook by Olaf](https://colab.research.google.com/drive/15kJHTDTVxyIjxiZgqD1G_s5gUeVNLkfy?usp=sharing)  
+> Tacotron2 + DDC: trained for 360k steps, PWGAN vocoder: trained for 925k steps
 <dl>

 <table>
@ -89,7 +129,7 @@ Originalaufnahmen aus dem "thorsten" Dataset.
  </tr>
  <tr>
    <td>05</td>
-    <td>Anfang vom Froschkönig</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
    <td><audio controls="" preload="none"><source src="samples/sample05-pwgan.wav"></audio></td>
  </tr>
 </tbody>
@ -99,10 +139,89 @@ Originalaufnahmen aus dem "thorsten" Dataset.


 # WaveGrad
-> todo
+> Tacotron2 + DDC: trained for 460k steps, WaveGrad vocoder: trained for 510k steps (incl. noise schedule)
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-wavegrad.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-wavegrad.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>

 # HifiGAN
-> todo
+> Thanks to SanjaESC (https://github.com/SanjaESC) for training this model.
+<dl>
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-hifigan.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-hifigan.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>

 # VocGAN
 > **These examples are based on "ground truth" and not on the Tacotron 2 model**  
@ -146,6 +265,7 @@ Originalaufnahmen aus dem "thorsten" Dataset.

 # GlowTTS / Waveglow
 > Details: [GitHub repository by Synesthesiam](https://github.com/rhasspy/de_larynx-thorsten)
+> GlowTTS trained for 380k steps and the vocoder for 500k steps.

 <dl>

@ -178,6 +298,151 @@ Originalaufnahmen aus dem "thorsten" Dataset.
    <td>Euer Plan hat ja toll geklappt.</td>
    <td><audio controls="" preload="none"><source src="samples/sample04-waveglow.wav"></audio></td>
  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-waveglow.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+
+# TensorFlowTTS
+## Multiband MelGAN
+> Thanks [Monatis](https://github.com/monatis)  
+> Details: [Notebook by Monatis](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing#scrollTo=SCbWCChVkfnn)  
+> Taco2 model trained for 80k steps, Multiband MelGAN for 800k steps.
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-TensorFlowTTS.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-TensorFlowTTS.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+
+# Silero models
+> Thanks [snakers4](https://github.com/snakers4/silero-models)  
+> Details: [Notebook by Silero](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb#scrollTo=indirect-berry)  
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-silero.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-silero.wav"></audio></td>
+  </tr>
+</tbody>
+</table>
+
+</dl>
+
+# Forward Tacotron
+> Thanks [cschaefer26](https://github.com/as-ideas/ForwardTacotron)  
+> Config: Forward-Tacotron, trained to 300k, alpha set to 0.8, pretrained HifiGAN vocoder
+
+<dl>
+
+<table>
+<thead>
+  <tr>
+    <th>Sample</th>
+    <th>Text</th>
+    <th>Audio</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>01</td>
+    <td>Eure Schoko-Bonbons sind sagenhaft lecker</td>
+    <td><audio controls="" preload="none"><source src="samples/sample01-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>02</td>
+    <td>Eure Tröte nervt</td>
+    <td><audio controls="" preload="none"><source src="samples/sample02-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>03</td>
+    <td>Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet</td>
+    <td><audio controls="" preload="none"><source src="samples/sample03-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>04</td>
+    <td>Euer Plan hat ja toll geklappt.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample04-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
+  <tr>
+    <td>05</td>
+    <td>In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön.</td>
+    <td><audio controls="" preload="none"><source src="samples/sample05-ForwardTacotron-HifiGAN.wav"></audio></td>
+  </tr>
 </tbody>
 </table>

--- a/docs/index.md
+++ b/docs/index.md
@ -32,9 +32,10 @@ Wir arbeiten weiterhin daran qualitativ noch bessere Modell zu trainieren, aber
 * [Bitte warte einen Moment, bis ich fertig mit dem Booten bin.](https://drive.google.com/file/d/19Td-F14n_05F-squ3bNlt2BDE-NMFaq1/view?usp=sharing)
 * [Mein Name ist Mycroft und ich bin funky.](https://drive.google.com/file/d/1dbyOyE7Oy8YdAsYqQ4vz4VJjiWIyc8oV/view?usp=sharing)

+
 ## Comparison of some vocoders
 We are currently experimenting with different configurations to find the best model. You can find a comparison of the results so far on this page. 
-> [Comparison of the different model](./audio_compare)
+> [Comparison of the different models](./audio_compare)

 # Interested?
 [Further details, downloads and acknowledgements can be found here.](https://github.com/thorstenMueller/deep-learning-german-tts "Dataset details and Thorsten model download")
--- a/docs/index_longDesc.md
+++ b/docs/index_longDesc.md
@ -1,60 +0,0 @@
-# tl;dr
---
-
-<span style="font-family:Papyrus; font-size:3em;color:green"> A free, high-quality German voice that can be generated locally!</span> 
-
---
-
-
-# A free German voice
-Even though the heading sounds very much like a political statement, this is about a completely different topic.
-
-Voice-based control of machines is currently gaining importance rapidly. Many people already know this kind of communication from everyday use of smartphones or so-called smart assistants such as Apple Siri, Google Home or Amazon Alexa.
-
-Alongside many advantages, the systems of the big vendors also bring some quite serious privacy drawbacks (forced cloud use, lack of sovereignty over one's own data, concerns about "eavesdroppers", ...). There are therefore people who would like to use the advantages of such systems but refrain from doing so because of these privacy concerns.
-
-# Alternatives to (_online speech generation_) from Amazon, Google, Apple, ...
-Fortunately, alternatives (open source among them) are emerging to counter the market power of the "big players". Some of them are:
-
-* Mozilla Voice projects
-* MyCroft AI
-
-These (and other) communities are working on offering such alternatives. However, the focus is often on the English language, which is of course not helpful when interacting with German-speaking users.
-
-# Free German TTS - what is that?
-Most people have probably asked a personal smart assistant (or smartphone) about the weather, appointments or something similar at some point.
-If so, and the device gave an easily understandable German answer, "cloud resources" were used in that case.
-
-Of course Amazon, Google and Apple know about the good quality of their artificial voices and are, partly for that reason, not willing to make them available for private, free offline use.
-And that is exactly one of the big problems for (open-source) alternatives. Even if large parts can be run free of charge and offline, at the latest when it comes to speech output they depend on the "big players" if they want a certain level of quality.
-
-# How and whom does this project help
-The free German dataset contains more than 23 recorded hours based on free texts. The TTS models trained with machine learning are based on it.
-It can be used **without licensing concerns** and is therefore open to everyone interested. For example:
-
-* Open-source projects/communities
-* Education/research/science
-* Commercial use
-
-Small communities in particular should be given the opportunity to ship offline TTS functionality with their projects.
-
-# Examples
-* [Es ist im Moment klarer Himmel bei 18 Grad.](https://drive.google.com/file/d/1cDIq4QG6i60WjUYNT6fr2cpEjFQIi8w5/view?usp=sharing)
-* [Ich verstehe das nicht, aber ich lerne jeden Tag neue Dinge.](https://drive.google.com/file/d/1kja_2RsFt6EmC33HTB4ozJyFlvh_DTFQ/view?usp=sharing)
-* [Ich bin jetzt bereit.](https://drive.google.com/file/d/1GkplGH7LMJcPDpgFJocXHCjRln_ccVFs/view?usp=sharing)
-* [Bitte warte einen Moment, bis ich fertig mit dem Booten bin.](https://drive.google.com/file/d/19Td-F14n_05F-squ3bNlt2BDE-NMFaq1/view?usp=sharing)
-* [Mein Name ist MyCroft und ich bin funky.](https://drive.google.com/file/d/1dbyOyE7Oy8YdAsYqQ4vz4VJjiWIyc8oV/view?usp=sharing)
-
-# Current status
-We (a group of nice TTS enthusiasts) know that the current model still has a lot of room for improvement in quality. But we remain motivated and hope to provide even better models in the future.
-
-# Last but not least
-Since I have little influence over which statements will be made with my voice in the future, I would like to state a few points that matter to me personally:
-
-I share my voice as a person who believes that all people are equal, regardless of gender, sexual orientation, religion, skin color or the geocoordinates of their birth. I believe in a world where every person is warmly welcome at any time and where education and knowledge are freely available to everyone.
-
-# Links
-* https://github.com/thorstenMueller/deep-learning-german-tts/
-* https://medium.com/@thorsten_Mueller/why-ive-chosen-to-donate-my-german-voice-for-mankind-177beeb91675
-* TODO GitHub links of fellow contributors
-* TODO Publish model (TTS server package)
--- a/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample01-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample01-TensorFlowTTS.wav
+++ b/docs/samples/sample01-TensorFlowTTS.wav
--- a/docs/samples/sample01-griffin-lim.wav
+++ b/docs/samples/sample01-griffin-lim.wav
--- a/docs/samples/sample01-hifigan.wav
+++ b/docs/samples/sample01-hifigan.wav
--- a/docs/samples/sample01-silero.wav
+++ b/docs/samples/sample01-silero.wav
--- a/docs/samples/sample01-wavegrad.wav
+++ b/docs/samples/sample01-wavegrad.wav
--- a/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample02-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample02-TensorFlowTTS.wav
+++ b/docs/samples/sample02-TensorFlowTTS.wav
--- a/docs/samples/sample02-griffin-lim.wav
+++ b/docs/samples/sample02-griffin-lim.wav
--- a/docs/samples/sample02-hifigan.wav
+++ b/docs/samples/sample02-hifigan.wav
--- a/docs/samples/sample02-silero.wav
+++ b/docs/samples/sample02-silero.wav
--- a/docs/samples/sample02-wavegrad.wav
+++ b/docs/samples/sample02-wavegrad.wav
--- a/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample03-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample03-TensorFlowTTS.wav
+++ b/docs/samples/sample03-TensorFlowTTS.wav
--- a/docs/samples/sample03-griffin-lim.wav
+++ b/docs/samples/sample03-griffin-lim.wav
--- a/docs/samples/sample03-hifigan.wav
+++ b/docs/samples/sample03-hifigan.wav
--- a/docs/samples/sample03-silero.wav
+++ b/docs/samples/sample03-silero.wav
--- a/docs/samples/sample03-wavegrad.wav
+++ b/docs/samples/sample03-wavegrad.wav
--- a/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
+++ b/docs/samples/sample04-ForwardTacotron-HifiGAN.wav.wav
--- a/docs/samples/sample04-TensorFlowTTS.wav
+++ b/docs/samples/sample04-TensorFlowTTS.wav
--- a/docs/samples/sample04-griffin-lim.wav
+++ b/docs/samples/sample04-griffin-lim.wav
--- a/docs/samples/sample04-hifigan.wav
+++ b/docs/samples/sample04-hifigan.wav
--- a/docs/samples/sample04-silero.wav
+++ b/docs/samples/sample04-silero.wav
--- a/docs/samples/sample04-wavegrad.wav
+++ b/docs/samples/sample04-wavegrad.wav
--- a/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
+++ b/docs/samples/sample05-ForwardTacotron-HifiGAN.wav
--- a/docs/samples/sample05-TensorFlowTTS.wav
+++ b/docs/samples/sample05-TensorFlowTTS.wav
--- a/docs/samples/sample05-griffin-lim.wav
+++ b/docs/samples/sample05-griffin-lim.wav
--- a/docs/samples/sample05-hifigan.wav
+++ b/docs/samples/sample05-hifigan.wav
--- a/docs/samples/sample05-silero.wav
+++ b/docs/samples/sample05-silero.wav
--- a/docs/samples/sample05-wavegrad.wav
+++ b/docs/samples/sample05-wavegrad.wav
--- a/german_corpus-mimic_recording_studio.csv
+++ b/german_corpus-mimic_recording_studio.csv
--- a/helperScripts/Dockerfile.Jetson-Coqui
+++ b/helperScripts/Dockerfile.Jetson-Coqui
@ -0,0 +1,51 @@
+# Dockerfile for running Coqui TTS trainings in a docker container on the NVIDIA Jetson platform.
+# Based on NVIDIA Jetson ML Image, provided as is, without any warranty, by Thorsten Müller (https://twitter.com/ThorstenVoice) in August 2021
+
+FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3
+
+RUN echo "deb https://repo.download.nvidia.com/jetson/common r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
+RUN echo "deb https://repo.download.nvidia.com/jetson/t194 r32.4 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
+
+RUN apt-get update -y
+RUN apt-get install vim python-mecab libmecab-dev cuda-toolkit-10-2 libcudnn8 libcudnn8-dev libsndfile1-dev locales -y
+
+# Setting some environment vars
+ENV LLVM_CONFIG=/usr/bin/llvm-config-9
+ENV PYTHONPATH=/coqui/TTS/
+ENV LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
+# Skipping OPENBLAS_CORETYPE might show an "Illegal instruction (core dumped)" error
+ENV OPENBLAS_CORETYPE=ARMV8
+
+ENV NVIDIA_VISIBLE_DEVICES all
+ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
+LABEL com.nvidia.volumes.needed="nvidia_driver"
+
+# Adjust locale setting to your personal needs
+RUN sed -i '/de_DE.UTF-8/s/^# //g' /etc/locale.gen && \
+    locale-gen
+ENV LANG de_DE.UTF-8
+ENV LANGUAGE de_DE:de
+ENV LC_ALL de_DE.UTF-8
+
+RUN mkdir /coqui
+WORKDIR /coqui
+
+ARG COQUI_BRANCH
+RUN git clone -b ${COQUI_BRANCH} https://github.com/coqui-ai/TTS.git
+WORKDIR /coqui/TTS
+RUN pip3 install pip setuptools wheel --upgrade
+RUN pip uninstall -y tensorboard tensorflow tensorflow-estimator nbconvert matplotlib
+RUN pip install -r requirements.txt
+RUN python3 ./setup.py develop
+
+# Jupyter Notebook
+RUN python3 -c "from notebook.auth.security import set_password; set_password('nvidia', '/root/.jupyter/jupyter_notebook_config.json')"
+CMD /bin/bash -c "jupyter lab --ip 0.0.0.0 --port 8888 --allow-root"
+
+
+# Build example:
+#   nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui
+# Run example:
+#   nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v /ssd/___prj/tts/dataset-july21:/coqui/TTS/data jetson-coqui
+# Bash example:
+#   nvidia-docker exec -it <containerId> /bin/bash
--- a/helperScripts/MRS2LJSpeech.py
+++ b/helperScripts/MRS2LJSpeech.py
@ -0,0 +1,157 @@
+# This script generates the folder structure for ljspeech-1.1 processing from mimic-recording-studio database
+
+# Changelog
+# v1.0  - Initial release by Thorsten Müller (https://github.com/thorstenMueller/deep-learning-german-tts)
+# v1.1  - Great improvements by Peter Schmalfeldt (https://github.com/manifestinteractive)
+#           - Audio processing with ffmpeg (mono and sample rate of 22050 Hz)
+#           - Much better Python coding than my original version
+#           - Greater logging output to command line
+#           - See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba
+#           - Thanks Peter, it's a great contribution :-)
+# v1.2  - Added choice for choosing which recording session should be exported as LJSpeech
+# v1.3  - Added parameter mrs_dir to pass directory of Mimic-Recording-Studio
+# v1.4  - Script won't crash when audio recorded has been deleted on disk
+# v1.5  - Added parameter "ffmpeg" to make converting with ffmpeg optional
+
+from os.path import exists
+import glob
+import sqlite3
+import os
+import argparse
+import sys
+
+from shutil import copyfile
+from shutil import rmtree
+
+# Setup Directory Data
+cwd = os.path.dirname(os.path.abspath(__file__))
+output_dir = os.path.join(cwd, "dataset")
+output_dir_audio = ""
+output_dir_audio_temp=""
+output_dir_speech = ""
+
+# Create folders needed for ljspeech
+def create_folders():
+  global output_dir
+  global output_dir_audio
+  global output_dir_audio_temp
+  global output_dir_speech
+
+  print('→ Creating Dataset Folders')
+
+  output_dir_speech = os.path.join(output_dir, "LJSpeech-1.1")
+
+  # Delete existing folder if exists for clean run
+  if os.path.exists(output_dir_speech):
+    rmtree(output_dir_speech)
+
+  output_dir_audio = os.path.join(output_dir_speech, "wavs")
+  output_dir_audio_temp = os.path.join(output_dir_speech, "temp")
+
+  # Create Clean Folders
+  os.makedirs(output_dir_speech)
+  os.makedirs(output_dir_audio)
+  os.makedirs(output_dir_audio_temp)
+
+def convert_audio():
+  global output_dir_audio
+  global output_dir_audio_temp
+
+  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
+  
+  print('→ Converting %s Audio Files to 22050 Hz, 16 Bit, Mono\n' % "{:,}".format(recordings))
+
+  # Please use `pip install ffmpeg-python`
+  import ffmpeg
+
+  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):
+
+    percent = (idx + 1) / recordings
+
+    print('› \033[96m%s\033[0m \033[2m%s / %s (%s)\033[0m ' % (os.path.basename(wav), "{:,}".format((idx + 1)), "{:,}".format(recordings), "{:.0%}".format(percent)))
+
+    # Convert WAV file to required format
+    (ffmpeg
+      .input(wav)
+      .output(os.path.join(output_dir_audio, os.path.basename(wav)), acodec='pcm_s16le', ac=1, ar=22050, loglevel='error')
+      .overwrite_output()
+      .run(capture_stdout=True)
+    )
+
+
+def copy_audio():
+  global output_dir_audio
+
+  print('→ Skipping ffmpeg conversion, recordings are copied unchanged')
+  recordings = len([name for name in os.listdir(output_dir_audio_temp) if os.path.isfile(os.path.join(output_dir_audio_temp,name))])
+  
+  print('→ Copy %s Audio Files to LJSpeech Dataset\n' % "{:,}".format(recordings))
+
+  for idx, wav in enumerate(glob.glob(os.path.join(output_dir_audio_temp, "*.wav"))):    
+    copyfile(wav,os.path.join(output_dir_audio, os.path.basename(wav)))
+
+def create_meta_data(mrs_dir):
+  print('→ Creating META Data')
+
+  conn = sqlite3.connect(os.path.join(mrs_dir, "backend", "db", "mimicstudio.db"))
+  c = conn.cursor()
+
+  # Create metadata.csv for ljspeech
+  metadata = open(os.path.join(output_dir_speech, "metadata.csv"), mode="w", encoding="utf8")
+
+  # List available recording sessions
+  user_models = c.execute('SELECT uuid, user_name from usermodel ORDER BY created_date DESC').fetchall()
+  user_id = user_models[0][0]
+
+  for row in user_models:
+    print(row[0] + ' -> ' + row[1])
+
+  user_answer = input('Please choose ID of recording session to export (default is newest session) [' + user_id + ']: ')
+
+  if user_answer:
+    user_id = user_answer
+
+
+  # Pass user_id as a bound parameter instead of concatenating it into the SQL string
+  for row in c.execute('SELECT audio_id, prompt, lower(prompt) FROM audiomodel WHERE user_id = ? ORDER BY length(prompt)', (user_id,)):
+    source_file = os.path.join(mrs_dir, "backend", "audio_files", user_id, row[0] + ".wav")
+    if exists(source_file):
+      metadata.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
+      copyfile(source_file, os.path.join(output_dir_audio_temp, row[0] + ".wav"))
+    else:
+      print("Wave file {} not found.".format(source_file))
+
+  metadata.close()
+  conn.close()
+
+def cleanup():
+  global output_dir_audio_temp
+
+  # Remove Temp Folder
+  rmtree(output_dir_audio_temp)
+
+def main():
+  parser = argparse.ArgumentParser()
+  parser.add_argument('--mrs_dir', required=True)
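+  # Parse the value explicitly; otherwise any non-empty string (even "False") would enable ffmpeg
+  parser.add_argument('--ffmpeg', required=False, default=False, type=lambda value: value.lower() in ('1', 'true', 'yes'))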
+  args = parser.parse_args()
+  
+  if not os.path.isdir(os.path.join(args.mrs_dir,"backend")):
+    sys.exit("Passed directory is no valid Mimic-Recording-Studio main directory!")
+
+  print('\n\033[48;5;22m  MRS to LJ Speech Processor  \033[0m\n')
+
+  create_folders()
+  create_meta_data(args.mrs_dir)
+
+  if(args.ffmpeg):
+    convert_audio()
+  
+  else:
+    copy_audio()
+  
+  cleanup()
+
+  print('\n\033[38;5;86;1m✔\033[0m COMPLETE【ツ】\n')
+
+if __name__ == '__main__':
+  main()
--- a/helperScripts/README.md
+++ b/helperScripts/README.md
@ -0,0 +1,27 @@
+# Short collection of helpful scripts for dataset creation and/or TTS training stuff
+
+## MRS2LJSpeech
+Python script that takes recordings (filesystem and SQLite database) made with Mycroft Mimic-Recording-Studio (https://github.com/MycroftAI/mimic-recording-studio) and creates an audio-optimized dataset in the widely supported LJSpeech directory structure.
+
+Peter Schmalfeldt (https://github.com/manifestinteractive) did an amazing job optimizing my original (quick and dirty) version of that script, so thank you Peter :-)
+See more details here: https://gist.github.com/manifestinteractive/6fd9be62d0ede934d4e1171e5e751aba#file-mrs2ljspeech-py
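
The exported `metadata.csv` holds one `audio_id|prompt|lowercased prompt` row per recording, and the audio lives in `wavs/<audio_id>.wav`. The following is a minimal sketch for sanity-checking such an export; it is not part of MRS2LJSpeech, and the function names `read_ljspeech_metadata` and `missing_wavs` are hypothetical.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(metadata_path):
    """Return a list of (audio_id, text, normalized_text) tuples."""
    rows = []
    with open(metadata_path, encoding="utf8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside prompts untouched.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if len(row) != 3:
                raise ValueError(
                    f"line {line_number}: expected 3 fields, got {len(row)}"
                )
            rows.append(tuple(row))
    return rows


def missing_wavs(dataset_dir):
    """Return the audio ids listed in metadata.csv that have no wav file."""
    dataset_dir = Path(dataset_dir)
    rows = read_ljspeech_metadata(dataset_dir / "metadata.csv")
    return [
        audio_id
        for audio_id, _, _ in rows
        if not (dataset_dir / "wavs" / f"{audio_id}.wav").is_file()
    ]
```

Calling `missing_wavs("dataset/LJSpeech-1.1")` after an export should return an empty list if every metadata row has its recording.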
+
+## Dockerfile.Jetson-Coqui
+> Add your user to the `docker` group so that you don't need sudo for every operation.
+
+Thanks to NVIDIA for providing Docker images for the Jetson platform. I use the "machine learning (ML)" image as the base image for setting up a Coqui environment.
+
+> You can use any branch or tag as COQUI_BRANCH argument. v0.1.3 is just the current stable version.
+
+Switch to the directory containing the Dockerfile and run `nvidia-docker build . -f Dockerfile.Jetson-Coqui --build-arg COQUI_BRANCH=v0.1.3 -t jetson-coqui` to build your container image. When the build process has finished, you can start a container from that image.
+
+
+### Mapped volumes
+We need to bring your dataset and configuration file into the container, so we map a volume when starting it:
+`nvidia-docker run -p 8888:8888 -d --shm-size 32g --gpus all -v [host path with dataset and config.json]:/coqui/TTS/data jetson-coqui`. Now we have a running container ready for Coqui TTS magic.
+
+### Jupyter notebook
+Coqui provides lots of useful Jupyter notebooks for dataset analysis. Once your container is up and running you should be able to call 
+
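+### Running bash inside the container
+Run `nvidia-docker exec -it <containerId> /bin/bash` (`docker ps` shows the container ID; `jetson-coqui` is the image name, not the container name). You are now inside the container, and `ls /coqui/TTS/data` should show your dataset files.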
--- a/helperScripts/getDatasetSpeechRate.py
+++ b/helperScripts/getDatasetSpeechRate.py
@ -0,0 +1,41 @@
+# This script gets speech rate per audio recording from a voice dataset (ljspeech file and directory structure)
+# Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
+# https://github.com/thorstenMueller/deep-learning-german-tts/
+# https://twitter.com/ThorstenVoice
+
+# Changelog:
+# v0.1 - 26.09.2021 - Initial version
+
+from os.path import exists
+import os
+import librosa
+import csv
+
+dataset_dir = "/home/thorsten/___dev/tts/dataset/Thorsten-neutral-Dec2021-44k/" # Directory where metadata.csv is in
+out_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
+decimal_use_comma = True # False: Splitting decimal value with a dot (.); True: Comma (,)
+
+out_csv = open(out_csv_file,"w")
+out_csv.write("filename;audiolength_sec;number_chars;chars_per_sec;remove_from_dataset\n")
+
+# Open metadata.csv file
+with open(os.path.join(dataset_dir,"metadata.csv"), encoding="utf8") as csvfile:
+    reader = csv.reader(csvfile, delimiter='|')
+    for row in reader:
+        wav_file = os.path.join(dataset_dir,"wavs",row[0] + ".wav")
+
+        if exists(wav_file):
+            # Gather values for report.csv output
+            phrase_len = len(row[1]) - 1 # Subtract 1 so the trailing punctuation mark is not counted.
+            duration = round(librosa.get_duration(filename=wav_file),2)
+            char_per_sec = round(phrase_len / duration,2)
+
+            if decimal_use_comma:
+                duration = str(duration).replace(".",",")
+                char_per_sec = str(char_per_sec).replace(".",",")
+
+            out_csv.write(row[0] + ".wav;" + str(duration) + ";" + str(phrase_len) + ";" + str(char_per_sec) + ";no\n")
+        else:
+            print("File " + wav_file + " does not exist.")
+
+out_csv.close()
--- a/helperScripts/removeFilesFromDataset.py
+++ b/helperScripts/removeFilesFromDataset.py
@ -0,0 +1,48 @@
+# This script removes recordings from an ljspeech file/directory structured dataset based on CSV file from "getDatasetSpeechRate"
+# Written by Thorsten Müller (deep-learning-german@gmx.net) and provided without any warranty.
+# https://github.com/thorstenMueller/deep-learning-german-tts/
+# https://twitter.com/ThorstenVoice
+
+# Changelog:
+# v0.1 - 26.09.2021 - Initial version
+
+import os
+import csv
+import shutil
+
+dataset_dir = "/Users/thorsten/Downloads/thorsten-export-20210909/" # Directory where metadata.csv is in
+subfolder_removed = "___removed"
+in_csv_file = os.path.join(dataset_dir,"speech_rate_report.csv")
+to_remove = []
+
+# Open speech_rate_report.csv (in_csv_file already contains dataset_dir)
+with open(in_csv_file) as csvfile:
+    reader = csv.reader(csvfile, delimiter=';')
+    for row in reader:
+        if row[4] == "yes":
+            # Recording in that row should be removed from dataset
+            to_remove.append(row[0])
+            print("Recording " + row[0] + " will be removed from dataset.")
+
+print("\n" + str(len(to_remove)) + " recordings have been marked for deletion.")
+
+if len(to_remove) > 0:
+
+    # Read and write metadata as UTF-8 so German umlauts survive on every platform
+    metadata_cleaned = open(os.path.join(dataset_dir,"metadata_cleaned.csv"),"w",encoding="utf8")
+
+    # Create new subdirectory for removed wav files
+    removed_dir = os.path.join(dataset_dir,subfolder_removed)
+    if not os.path.exists(removed_dir):
+        os.makedirs(removed_dir)
+
+    # Remove lines from metadata.csv and move wav files to new subdirectory
+    with open(os.path.join(dataset_dir,"metadata.csv"),encoding="utf8") as csvfile:
+        reader = csv.reader(csvfile, delimiter='|')
+        for row in reader:
+            if (row[0] + ".wav") not in to_remove:
+                metadata_cleaned.write(row[0] + "|" + row[1] + "|" + row[2] + "\n")
+            else:
+                # Move recording to new subfolder
+                shutil.move(os.path.join(dataset_dir,"wavs",row[0] + ".wav"),removed_dir)
+    
+    metadata_cleaned.close()
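
The two scripts above are connected through `speech_rate_report.csv`: the first writes every row with `remove_from_dataset` set to `no`, and the second removes the rows that were changed to `yes`. How rows get marked is not specified in the scripts. The following is a minimal sketch of one possible way to pre-mark candidates, assuming that recordings whose `chars_per_sec` lies far from the dataset mean are the ones worth reviewing. The function name `flag_outliers` and the `max_deviation` threshold are hypothetical.

```python
import csv
import statistics


def parse_decimal(value):
    """Accept both '12.5' and '12,5' (the report may use a decimal comma)."""
    return float(value.replace(",", "."))


def flag_outliers(report_in, report_out, max_deviation=2.0):
    """Mark rows whose speech rate deviates strongly from the mean.

    Writes a copy of the report to report_out and returns the number of
    rows marked with remove_from_dataset = "yes".
    """
    with open(report_in, encoding="utf8", newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter=";"))
    if not rows:
        return 0

    rates = [parse_decimal(row["chars_per_sec"]) for row in rows]
    mean = statistics.mean(rates)
    spread = statistics.pstdev(rates)

    flagged = 0
    for row, rate in zip(rows, rates):
        # Assumption: "too fast or too slow" means more than
        # max_deviation standard deviations away from the mean.
        is_outlier = spread > 0 and abs(rate - mean) > max_deviation * spread
        row["remove_from_dataset"] = "yes" if is_outlier else "no"
        flagged += int(is_outlier)

    with open(report_out, "w", encoding="utf8", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=list(rows[0].keys()),
            delimiter=";",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return flagged
```

Writing to a separate output file keeps the original report intact, so the marked rows can be reviewed by ear before the removal script is run.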
--- a/samples/thorsten-21.06-emotional/amused.wav
+++ b/samples/thorsten-21.06-emotional/amused.wav
--- a/samples/thorsten-21.06-emotional/angry.wav
+++ b/samples/thorsten-21.06-emotional/angry.wav
--- a/samples/thorsten-21.06-emotional/disgusted.wav
+++ b/samples/thorsten-21.06-emotional/disgusted.wav
--- a/samples/thorsten-21.06-emotional/drunk.wav
+++ b/samples/thorsten-21.06-emotional/drunk.wav
--- a/samples/thorsten-21.06-emotional/neutral.wav
+++ b/samples/thorsten-21.06-emotional/neutral.wav
--- a/samples/thorsten-21.06-emotional/sleepy.wav
+++ b/samples/thorsten-21.06-emotional/sleepy.wav
--- a/samples/thorsten-21.06-emotional/surprised.wav
+++ b/samples/thorsten-21.06-emotional/surprised.wav
--- a/samples/thorsten-21.06-emotional/whisper.wav
+++ b/samples/thorsten-21.06-emotional/whisper.wav
Author	SHA1	Message	Date
Thorsten Müller	f13bcaf63e	Added Windows TTS training recipe Added modified vits recipe for Thorsten-Voice model training using Windows	2023-03-05 16:19:50 +01:00
Thorsten Müller	04c5683194	German Corpus for Mimic-Recording-Studio	2022-12-16 22:54:02 +01:00
Thorsten Müller	50e09d49bf	Added social media info	2022-11-13 17:08:26 +01:00
Thorsten Müller	b0afed75f4	Added new 2022.10 ThorstenVoice dataset.	2022-11-13 16:47:46 +01:00
Thorsten Müller	9b7b4c6836	Added new released Tacotron2 DDC model to README tts-server --model_name tts_models/de/thorsten/tacotron2-DDC	2022-08-23 19:03:47 +02:00
Thorsten Müller	aba10bc64a	Added info on new VITS model.	2022-06-24 18:00:12 +02:00
Thorsten Müller	07e85b3905	Merge pull request #35 from thorstenMueller/thorstenMueller-patch-1 Add new project logo to header.	2022-05-09 20:56:03 +02:00
Thorsten Müller	e08d50d6bb	Added new logo to header	2022-05-09 20:46:17 +02:00
Thorsten Müller	e691aa4ee3	Delete Logo_Thorsten-Voice-kleiner.jpg	2022-05-09 20:45:31 +02:00
Thorsten Müller	625f73e986	Delete Logo_Thorsten-Voice.jpg	2022-05-09 20:45:18 +02:00
Thorsten Müller	de1802f8ce	Update README.md	2022-05-09 20:34:50 +02:00
Thorsten Müller	f0500309d6	Test with embedded logo	2022-05-09 18:19:02 +02:00
Thorsten Müller	41c91b9865	Add files via upload	2022-05-09 18:14:50 +02:00
Thorsten Müller	fcb1e705a9	Add files via upload	2022-05-09 18:13:06 +02:00
Thorsten Müller	b8802db4f8	Uploaded transparent Thorsten-Voice logo.	2022-05-09 18:12:03 +02:00
Thorsten Müller	b00c768343	Added badge links.	2022-04-28 18:13:49 +02:00
Thorsten Mueller	3b0b4f898f	Fixed typo.	2022-04-24 09:13:13 +02:00
Thorsten Müller	2106fc6b00	Test	2022-04-23 23:31:15 +02:00
Thorsten Müller	e4ff3ce04a	Initial draft FUNDING.yml	2022-04-23 23:29:15 +02:00
Thorsten Müller	f408508cd7	Merge pull request #31 from thorstenMueller/prep-thorsten-22.05 Merge new README (preparation for new TTS model release)	2022-04-23 23:26:17 +02:00
Thorsten Mueller	6b4cfb41d4	Added Youtube link.	2022-04-23 23:22:27 +02:00
Thorsten Mueller	521dd33483	Updated TOC	2022-04-23 21:15:26 +02:00
Thorsten Mueller	6efb25310a	preparations for new Thorsten models	2022-04-23 21:13:30 +02:00
Thorsten Müller	5654397f3e	Add citation file.	2022-04-20 23:48:54 +02:00
Thorsten Mueller	b5ec9ef991	Fixed minor issues	2022-02-15 17:52:03 +01:00
Thorsten Mueller	77ad01d4ff	Making ffmpeg conversion optional.	2022-02-15 17:28:40 +01:00
Thorsten Mueller	c35507b1f7	Added link for VoiceLunch slides.	2022-01-03 20:09:43 +01:00
Thorsten Mueller	b536dfd958	Added check if audio file exists in getDatasetSpeechRate	2021-12-19 18:44:01 +01:00
Thorsten Mueller	29238f2a31	Updated Download links / Cites	2021-12-11 17:44:49 +01:00
Thorsten Müller	8c5f4503f3	Added two hyperlinks To http://www.Thorsten-Voice.de and https://OpenVoice-Tech.net Wiki	2021-11-28 11:33:54 +01:00
Thorsten Mueller	2ff7e3961b	Added Forward Tacotron samples.	2021-10-30 21:48:21 +02:00
Thorsten Müller	1221713314	Remove Wikipedia link to "Thorsten (Stimme)"	2021-10-23 16:52:59 +02:00
Thorsten Mueller	d3225b48f8	Added Citation to README.	2021-10-08 18:22:34 +02:00
Thorsten Mueller	33c030f844	Added two scripts for dataset analysis/cleaning.	2021-09-28 06:10:21 +02:00
Thorsten Müller	2daabae53e	Added DOIs in README	2021-09-24 16:32:16 +02:00
Thorsten Müller	1d445b09f8	Added DOI badge for emotional dataset	2021-09-23 21:58:54 +02:00
Thorsten Mueller	2853f111dc	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-09-18 16:04:59 +02:00
Thorsten Mueller	7540606247	Added download link for new recording-in-progress neutral dataset.	2021-09-18 16:04:33 +02:00
Thorsten Mueller	0b9e929ce0	Added Fullband-MelGAN model download path. Thanks to (see #26 )	2021-08-20 06:02:47 +02:00
Thorsten Mueller	bc06fa923f	Added info on TensorVox by ZDisket - thanks :-)	2021-08-12 18:30:55 +02:00
Thorsten Mueller	f19144b085	Adjusted quick setup example to new vocoder model.	2021-08-06 09:50:44 +02:00
Thorsten Müller	251c093ad4	Added locale settings for german Umlaut handling.	2021-08-04 09:24:51 +02:00
Thorsten Mueller	f505fd38df	Dockerfile draft for NVIDIA Jetson Xavier AGX and Coqui	2021-08-02 19:54:38 +02:00
Thorsten Mueller	3e09ae8615	Added link to my Youtube channel.	2021-07-21 22:49:47 +02:00
Thorsten Mueller	2ed2413dda	Explain how i recorded emotional phrases.	2021-07-13 21:53:55 +02:00
Thorsten Mueller	51c5f55bbd	Added check that recording exists before export.	2021-07-12 23:27:50 +02:00
Thorsten Mueller	4f875ac591	Added --mrs_dir param for more flexibility	2021-07-07 22:00:47 +02:00
Thorsten Mueller	2ea44ede87	Added REAME for helperScripts	2021-07-04 22:38:38 +02:00
Thorsten Mueller	ba60fc57d4	Added script to create LJSpeech dataset out of Mimic-Recording-Studio recordings.	2021-07-04 22:33:38 +02:00
Thorsten Müller	9e68d99ee7	Updated emotional dataset v02 download link	2021-06-20 08:57:39 +02:00
Thorsten Mueller	7172604eed	Added v02 emotional dataset (drunk + whispering)	2021-06-13 10:59:04 +02:00
Thorsten Mueller	58dece7c55	Added chapter on public talks	2021-06-08 07:18:30 +02:00
Thorsten Mueller	c81f374aca	Test Commit	2021-06-07 21:52:31 +02:00
Thorsten Mueller	2c6aca780b	Added table with trained model checkpoint downloads	2021-05-11 22:34:10 +02:00
Thorsten Müller	68e60f2a92	Format Wikipedia link	2021-04-22 18:57:40 +02:00
Thorsten Mueller	a3b0dde296	Added info about Wikipedia article	2021-04-22 18:53:39 +02:00
Thorsten Mueller	28d81a0fb2	Update on emotional dataset info	2021-04-11 11:42:24 +02:00
Thorsten Mueller	12c6d26dbd	Moved emotional samples to other table.	2021-04-11 11:39:29 +02:00
Thorsten Mueller	4c06db69dd	Added silero models to audio comparison	2021-04-11 11:04:20 +02:00
Thorsten Müller	bae96a75a5	Added badge for link to TTS comparison page	2021-04-09 19:29:24 +02:00
Thorsten Müller	1313520064	Playing around with some cool badges :-)	2021-04-09 19:05:43 +02:00
Thorsten Mueller	e2ecf68c13	added details on coqui model usage.	2021-04-05 16:57:36 +02:00
Thorsten Mueller	c8a5e1082e	Small TOC fix	2021-04-03 23:48:10 +02:00
Thorsten Mueller	40aae591d7	Small fixes in TOC	2021-04-03 23:45:46 +02:00
Thorsten Mueller	4f722e96a9	Adding info on emotional dataset.	2021-04-03 23:24:53 +02:00
Thorsten Müller	7e1530b742	Merge pull request #14 from snakers4/master Add silero-models	2021-04-03 22:12:09 +02:00
snakers4	647786be6c	Add silero-models	2021-04-03 05:17:14 +00:00
Thorsten Müller	00685a008d	Added cute sloth smiley.	2021-03-30 12:07:41 +02:00
Thorsten Mueller	e5481a82a6	Added smaller logo	2021-03-30 08:00:58 +02:00
Thorsten Mueller	2d1428cd13	Switch to non-transparent logo	2021-03-30 07:55:08 +02:00
Thorsten Mueller	df55a19ae2	Added ThorstenVoice logo	2021-03-30 07:53:48 +02:00
Thorsten Müller	9585b73cc3	Modify title	2021-03-16 20:23:29 +01:00
Thorsten Müller	70158ba7c8	Small README updates	2021-03-16 18:51:21 +01:00
Thorsten Mueller	e1e9f8666a	Small text adjustments and formatting on README.	2021-03-16 18:41:39 +01:00
Thorsten Müller	cca10c215e	Added download link to v03 dataset.	2021-02-10 19:46:21 +01:00
Thorsten Mueller	09705597b8	Merge branch 'master' of https://github.com/thorstenMueller/deep-learning-german-tts	2021-01-23 18:50:15 +01:00
Thorsten Mueller	bdb3aa7d47	Added hifiGAN samples trained by SanjaESC	2021-01-23 18:15:56 +01:00
Thorsten Müller	f0c0f63ae1	Added nice guy SanjaESC to thanks section	2021-01-22 16:24:56 +01:00
Thorsten Müller	036c266ad7	Added Sebastian to thanks section - Thank you :-)	2021-01-16 08:24:10 +01:00
Thorsten Mueller	8e6137b3af	Added wavegrad samples (training in progress)	2020-12-14 17:53:32 +01:00
Thorsten Mueller	9ee0353da4	Changed main and subheading for TensorFlowTTS	2020-12-02 12:23:20 +01:00
Thorsten Mueller	a99d4b6477	Added first samples for TensorFlowTTS	2020-12-02 12:14:16 +01:00
Thorsten Mueller	02020e54f7	added sample 05 for griffin lim.	2020-11-21 10:19:13 +01:00
Thorsten Mueller	5347394f3e	Added Griffin Lim vocoder samples	2020-11-21 10:08:08 +01:00
Thorsten Mueller	c59d19e0a1	Added detail on glowtts training steps.	2020-11-17 22:04:09 +01:00
Thorsten Mueller	e45736f62d	added sample05 with GlowTTS.	2020-11-17 21:53:08 +01:00
Thorsten Mueller	e96de3a095	fixed typo	2020-11-16 18:25:38 +01:00
Thorsten Mueller	eaead5cebe	Rename to docs folder for Github pages	2020-11-16 17:28:20 +01:00
Thorsten Mueller	7b27bdac2d	Added github page with index and sample wavs	2020-11-16 17:25:42 +01:00
Thorsten Müller	f55e16d0fc	fixed typo	2020-09-23 19:32:27 +02:00
				`@ -0,0 +1,2 @@`
				`# These are supported funding model platforms`