Formatting

2022-04-23 15:55:09 +02:00 · 2022-04-23 15:32:19 +02:00 · 2022-04-23 15:30:13 +02:00 · 2022-04-23 15:29:08 +02:00 · 2022-04-23 15:07:19 +02:00 · 2022-04-23 15:03:20 +02:00
14 changed files with 189 additions and 104 deletions
--- a/README.md
+++ b/README.md
@@ -1,126 +1,210 @@
-# Introduction
-Many smart voice assistants like Amazon Alexa, Google Home, Apple Siri and Microsoft Cortana use cloud services to offer their (base) functionality.
+- [Project motivation](#motivation-for-thorsten-voice-project-speaking_head-speech_balloon)
+  
+- [Personal note](#some-personal-words-before-using-thorsten-voice)

-As some people have privacy concerns about using these services, there are some (open source) projects trying to build offline and/or privacy-aware alternatives.
+- [**Thorsten** Voice Datasets](#voice-datasets)
+  - [Thorsten-21.02-neutral](#thorsten-2102-neutral)
+  - [Thorsten-21.06-emotional](#thorsten-2106-emotional)
+  - [Thorsten-22.05-neutral](#thorsten-2205-neutral)

-But speech recognition and speech synthesis still require cloud services to provide them in decent quality.
+- [**Thorsten** TTS-Models](#tts-models)
+  - [Thorsten-21.04-Tacotron2-DCA](#thorsten-2104-tacotron2-dca)
+  - [Thorsten-22.05-VITS](#thorsten-2205-vits)
+  - [Thorsten-22.05-Tacotron2-DDC](#thorsten-2205-tacotron2-ddc)
+  - [Audio samples](#tts-audio-samples)
+  - [Other models](#other-models)
+  
+- [Public talks](#public-talks)

-# MyCroft AI
-> https://mycroft.ai/
-
-Mycroft is a company developing an open-source voice assistant with a very nice and active community. But the STT/TTS parts are still cloud-based (e.g. Google services), even if requests are anonymized by a Mycroft proxy in between. However, integration with locally hosted services such as DeepSpeech (STT) or Mimic/Tacotron (TTS) is possible.
-
-# Mozilla
-Mozilla works on these really important aspects for free and open human machine voice interaction.
-
-## STT - speech to text
-> https://commonvoice.mozilla.org/
-
-"STT" needs lots of audio training data from many speakers (women/men/kids) of all ages and dialects, in various audio quality levels. So any voice contribution to the Common Voice project is highly welcome.
-
-## TTS - text to speech
-> https://github.com/mozilla/tts
-
-"TTS" needs lots of clean recordings by one speaker to train a model. Mozilla is developing a software stack for proper model training based on tacotron2 papers.
-
-# And?!
-I want to make the most personal contribution I can give and contribute my own voice (**German**) for TTS training to the community for free usage.
-
-## Please read some personal words before downloading the dataset
-I contribute my voice as a person believing in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome in any place on this planet and where open and free knowledge and education are available to everyone.
-
-So hopefully my voice is used in this manner to make this world a better place for all of us :-).
-
-**tl;dr** Please don't use for evil!
-
-# Dataset "thorsten"
-## Samples of my voice
-To give you an impression of what my voice sounds like, so you can decide whether it fits your project, I have published some sample recordings; there is no need to download the complete dataset first.
-
-* [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
-* [Der Prophet spricht stets in Gleichnissen.](./samples/original_recording/recorded_sample_02.wav )
-* [Bitte schmeißt euren Müll nicht einfach in die Walachei.](./samples/original_recording/recorded_sample_03.wav )
-* [So etwas würde mir nie in den Sinn kommen.](./samples/original_recording/recorded_sample_04.wav )
-* [Sie klettert auf einen Stein und nimmt eine Denkerpose ein.](./samples/original_recording/recorded_sample_05.wav )
-* [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
-* [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )
+- [Special Thanks](#thanks-section)
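The table-of-contents links rely on the anchor IDs GitHub derives from headings. As a rough aid for checking such links, here is a minimal Python sketch of that slug rule. `heading_anchor` is a hypothetical helper, and it simplifies GitHub's real behavior: it ignores the numeric suffixes added to duplicate headings and assumes emoji shortcodes stay in the heading as plain text.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate the GitHub-style anchor for a markdown heading (simplified sketch)."""
    text = heading.lstrip("#").strip()
    # Emphasis markers are not part of the rendered heading text.
    text = text.replace("**", "").replace("*", "")
    text = text.lower()
    # Keep letters, digits, underscores, hyphens and spaces; drop everything else (":", ".", ...).
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes a hyphen.
    return text.replace(" ", "-")
```

For example, `heading_anchor("## Thorsten-21.02-neutral")` yields `thorsten-2102-neutral`, which is the anchor used in the TOC entry for that dataset.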


-## Dataset information
-
-* ljspeech-1.1 structure
-* 22,668 recorded phrases (wav files)
-* more than 23 hours of pure audio
-* sample rate 22,050 Hz
-* mono
-* phrase length (min/avg/max): 2 / 52 / 180 chars
-* no silence at beginning/ending
-* avg spoken chars per second: 14
-* sentences with question mark: 2,780
-* sentences with exclamation mark: 1,840
+# Motivation for Thorsten-Voice project :speaking_head: :speech_balloon:
+A **free**-to-use, **offline**, **high-quality** **German** **TTS** voice should be available for every project without any licensing hassle.


-![text length vs. mean audio duration](./img/thorsten-de---datasetAnalysis1.png)
-![text length vs. median audio duration](./img/thorsten-de---datasetAnalysis2.png)
-![text length vs. STD](./img/thorsten-de---datasetAnalysis3.png)
-![text length vs. number instances](./img/thorsten-de---datasetAnalysis4.png)
-![signal noise ratio](./img/thorsten-de---datasetAnalysis5.png)
-![bokeh](./img/thorsten-de---datasetAnalysis6.png)
+[![Open Source](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://opensource.org/)
+<a href="https://twitter.com/intent/follow?screen_name=ThorstenVoice"><img src="https://img.shields.io/twitter/follow/ThorstenVoice?style=social&logo=twitter" alt="follow on Twitter"></a>
+![YouTube Channel Subscribers](https://img.shields.io/youtube/channel/subscribers/UCjqqTVVBTsxpm0iOhQ1fp9g?style=social)
+![Project website](https://img.shields.io/badge/Project_website-www.Thorsten--Voice.de-92a0c0)

-> Interested in the evolution of this dataset? See the following PDF document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf))
+# Some personal words before using **Thorsten-Voice**
+> I contribute my voice as a person believing in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome in any place on this planet and where open and free knowledge and education are available to everyone. :earth_africa: (*Thorsten Müller*)

-## Download information
-> Download size: 2.7 GB
+Please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with the world.

-Version | Description | Date | Link
------------ | ------------- | ------------- | -------------
-thorsten-de-v01 | Initial version | 2020-06-28 | [Google Drive Download v01](https://drive.google.com/file/d/1yKJM1LAOQpRVojKunD9r8WN_p5KzBxjc/view?usp=sharing)
-thorsten-de-v02 | normalized to -24dB and split metadata.csv into shuffled metadata_train.csv and metadata_val.csv | 2020-08-22 | [Google Drive Download v02](https://drive.google.com/file/d/1mGWfG0s2V2TEg-AI2m85tze1m4pyeM7b/view?usp=sharing)
+# Voice-Datasets
+Voice datasets are listed on Zenodo:
+| Dataset         | DOI Link                                                                                                            |
+| --------------- | ------- |
+| Thorsten-21.02-neutral | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342) |
+| Thorsten-21.06-emotional | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023) |
+| Thorsten-22.05-neutral | soon to come |
+
+## Thorsten-21.02-neutral
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525342.svg)](https://doi.org/10.5281/zenodo.5525342)
+
+```
+@dataset{muller_thorsten_2021_5525342,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {Thorsten-Voice - "Thorsten-21.02-neutral" Dataset},
+  month        = feb,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {3.0},
+  doi          = {10.5281/zenodo.5525342},
+  url          = {https://doi.org/10.5281/zenodo.5525342}
+}
+```
+
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1KVjGXG2ij002XRHb3fgFK4j0OEq1FsWm?usp=sharing).**
+
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* LJSpeech file and directory structure
+* 22,668 recorded phrases (*wav files*)
+* More than 23 hours of pure audio
+* Sample rate 22,050 Hz
+* Mono
+* Normalized to -24dB
+* Phrase length (min/avg/max): 2 / 52 / 180 chars
+* No silence at beginning/ending
+* Avg spoken chars per second: 14
+* Sentences with question mark: 2,780
+* Sentences with exclamation mark: 1,840
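Because the dataset follows the LJSpeech layout (a pipe-separated `metadata.csv` with one `file_id|transcript` row per clip, no header, and the audio in `wavs/<file_id>.wav`; LJSpeech itself adds a third, normalized column), summary figures like the ones above can be recomputed from the files. The following Python sketch assumes that layout; the helper names are hypothetical, and this is not the script that produced the numbers listed here.

```python
import csv
import io
import wave
from statistics import mean
from typing import Optional


def read_ljspeech_metadata(text: str) -> list[tuple[str, str]]:
    """Parse 'file_id|transcript[|normalized]' rows; uses the last column as the text."""
    rows = []
    # QUOTE_NONE: quotation marks inside transcripts are kept as literal characters.
    for fields in csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE):
        if fields:
            rows.append((fields[0], fields[-1]))
    return rows


def wav_duration(path) -> float:
    """Length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def summarize(rows: list[tuple[str, str]],
              durations: Optional[dict[str, float]] = None) -> dict:
    """Phrase-length and punctuation statistics, plus chars/second if durations are given."""
    lengths = [len(text) for _, text in rows]
    stats = {
        "min_chars": min(lengths),
        "avg_chars": round(mean(lengths)),
        "max_chars": max(lengths),
        "questions": sum("?" in text for _, text in rows),
        "exclamations": sum("!" in text for _, text in rows),
    }
    if durations:
        stats["chars_per_second"] = sum(lengths) / sum(durations[fid] for fid, _ in rows)
    return stats
```

On a real copy, one could pass `Path("metadata.csv").read_text(encoding="utf-8")` to `read_ljspeech_metadata` and build `durations` by calling `wav_duration(Path("wavs") / f"{fid}.wav")` for each row.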
+
+### Dataset evolution
+As described in the PDF document ([evolution of thorsten dataset](./EvolutionOfThorstenDataset.pdf)), this dataset consists of three recording phases.
+
+* **Phase 1**: Recorded with a cheap USB microphone (*low quality*)
+* **Phase 2**: Recorded with a good microphone (*good quality*)
+* **Phase 3**: Recorded with the same good microphone, but with longer phrases (> 100 chars) (*good quality*)
+
+If you want to use a subset of the dataset, the [recording quality](./RecordingQuality.csv) CSV file shows which files belong to which recording phase.
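Since phases 2 and 3 were recorded with the better microphone, one likely use of this file is to drop phase-1 clips. The column layout of `RecordingQuality.csv` is not described here, so the sketch below **assumes** rows of the form `file_id,phase` (with an optional header row); adjust the parsing to the real file. `load_phases` and `filter_metadata` are hypothetical helpers.

```python
import csv
import io


def load_phases(csv_text: str) -> dict[str, int]:
    """Map file id -> recording phase.

    ASSUMPTION: each row is 'file_id,phase'; the real RecordingQuality.csv layout may differ.
    """
    phases = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Rows whose second column is not a number (e.g. a header) are skipped.
        if len(row) >= 2 and row[1].strip().isdigit():
            phases[row[0].strip()] = int(row[1])
    return phases


def filter_metadata(metadata_text: str, phases: dict[str, int], keep: set[int]) -> str:
    """Keep only LJSpeech metadata lines whose file id was recorded in a wanted phase."""
    kept = [
        line for line in metadata_text.splitlines()
        if line and phases.get(line.split("|", 1)[0]) in keep
    ]
    return "\n".join(kept) + ("\n" if kept else "")
```

Calling `filter_metadata(metadata_text, phases, {2, 3})` would then produce a metadata file containing only the good-microphone recordings, under the assumptions above.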


-# Trained tacotron2 model "thorsten"
-If you trained a model on the "thorsten" dataset, please file an issue with some information on it. Sharing a trained model is highly appreciated. 
+## Thorsten-21.06-emotional
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5525023.svg)](https://doi.org/10.5281/zenodo.5525023)

-## Trained models (TODO)
+```
+@dataset{muller_thorsten_2021_5525023,
+  author       = {Müller, Thorsten and
+                  Kreutz, Dominik},
+  title        = {{Thorsten-Voice - "Thorsten-21.06-emotional" 
+                   Dataset}},
+  month        = jun,
+  year         = 2021,
+  note         = {{Please use it to make the world a better place for 
+                   whole humankind.}},
+  publisher    = {Zenodo},
+  version      = {2.0},
+  doi          = {10.5281/zenodo.5525023},
+  url          = {https://doi.org/10.5281/zenodo.5525023}
+}
+```

-Folder | Date | Link | Description
------------ | ------------- | ------------- | -------------
-thorsten-taco2-ddc-v0.1 | to do | to do | to do
+All emotional recordings were recorded by me, and I tried to feel and pronounce each emotion even when the phrase context does not match it. For example, I pronounced the sleepy recordings in the tone I have shortly before falling asleep.
+
+### Samples
+Listen to the phrase "**Mist, wieder nichts geschafft.**" (*"Damn, got nothing done again."*) in the following emotions.
+
+* :slightly_smiling_face: [Neutral](./samples/emotional_recording/neutral.wav)
+* :nauseated_face: [Disgusted](./samples/emotional_recording/disgusted.wav)
+* :angry: [Angry](./samples/emotional_recording/angry.wav)
+* :grinning: [Amused](./samples/emotional_recording/amused.wav)
+* :astonished: [Surprised](./samples/emotional_recording/surprised.wav)
+* :pensive: [Sleepy](./samples/emotional_recording/sleepy.wav)
+* :dizzy_face: [Drunk](./samples/emotional_recording/drunk.wav)
+* 🤫 [Whispering](./samples/emotional_recording/whisper.wav)
+### Dataset summary
+* Recorded by Thorsten Müller
+* Optimized by Dominik Kreutz
+* 300 sentences × 8 emotions = 2,400 recordings
+* Mono
+* Sample rate 22,050 Hz
+* Normalized to -24dB
+* No silence at beginning/ending
+* Sentence length: 59 to 148 chars
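Both dataset summaries mention normalization to -24dB and the removal of leading and trailing silence, without saying which loudness measure or trimming method was used. Purely as an illustration, the following sketch **assumes** RMS level in dB relative to digital full scale and a fixed amplitude threshold (here -40 dB, an arbitrary choice) for silence. Samples are plain floats in [-1.0, 1.0]; a real pipeline would more likely use an audio library.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dB relative to full scale (1.0); -inf for empty or silent input."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def normalize(samples: list[float], target_db: float = -24.0) -> list[float]:
    """Scale samples so their RMS level equals target_db (ASSUMED meaning of '-24dB')."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** ((target_db - current) / 20)
    # Note: a large gain can push peaks past 1.0 (clipping); this sketch does not limit them.
    return [s * gain for s in samples]


def trim_silence(samples: list[float], threshold_db: float = -40.0) -> list[float]:
    """Drop leading/trailing samples whose magnitude stays below the threshold."""
    threshold = 10 ** (threshold_db / 20)
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]
```

Trimming before normalizing keeps the removed silence from lowering the measured RMS level.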
+
+
+## Thorsten-22.05-neutral
+> :speaking_head: **Listen to some audio recordings from this dataset [here](https://drive.google.com/drive/folders/1dxoSo8Ktmh-5E0rSVqkq_Jm1r4sFnwJM?usp=sharing).**
+
+Soon to come
+
+# TTS Models
+
+## Thorsten-21.04-Tacotron2-DCA
+This [TTS model](https://drive.google.com/drive/folders/1m4RuffbvdOmQWnmy_Hmw0cZ_q0hj2o8B?usp=sharing) has been trained on the [**Thorsten-21.02-neutral**](#thorsten-2102-neutral) dataset. The recommended pre-trained Fullband-MelGAN vocoder can be downloaded [here](https://drive.google.com/drive/folders/1hsfaconm4Yd9wPVyOtrXjWQs4ZAPoouY?usp=sharing).
+
+Run the model:
+* `pip install TTS==0.5.0`
+* `tts-server --model_name tts_models/de/thorsten/tacotron2-DCA`
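Once `tts-server` is running, other programs can request audio over HTTP. The sketch below **assumes** the server's default port (5002) and a `/api/tts` GET endpoint that takes a `text` query parameter and returns WAV data; please check these against the installed TTS version. `tts_url` and `synthesize` are hypothetical helpers.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def tts_url(text: str, host: str = "localhost", port: int = 5002) -> str:
    """Build the synthesis URL (port and /api/tts path are ASSUMED server defaults)."""
    # urlencode percent-encodes umlauts and punctuation and turns spaces into '+'.
    return f"http://{host}:{port}/api/tts?" + urlencode({"text": text})


def synthesize(text: str, out_path: str = "out.wav", **url_options) -> None:
    """Fetch synthesized speech from a running tts-server and save it as a WAV file."""
    with urlopen(tts_url(text, **url_options)) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With the server started as shown above, `synthesize("Hallo Welt!", "hallo.wav")` would save the spoken phrase to `hallo.wav`.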
+
+
+## Thorsten-22.05-VITS
+Trained on the **Thorsten-22.05-neutral** dataset.
+> TODO
+
+## Thorsten-22.05-Tacotron2-DDC
+Trained on the [**Thorsten-22.05-neutral**](#thorsten-2205-neutral) dataset.
+> :speaking_head: **Listen to synthesized samples [here](https://drive.google.com/drive/folders/1cZlLYkLWKtF0cZQ74Pef8fJ8fiG1G7du?usp=sharing).**
+
+Soon to come.
+
+
+## Other models
+### Silero
+
+You can use free AGPL-licensed models trained on the **Thorsten-21.02-neutral** dataset via the [silero-models](https://github.com/snakers4/silero-models/blob/master/models.yml) project.
+
+* [Thorsten 16kHz](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing)
+* [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb)
+
+### ZDisket
+[ZDisket](https://github.com/ZDisket/TensorVox) made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by [monatis](https://github.com/monatis/german-tts). Thanks for sharing that :thumbsup:. See it in action on [YouTube](https://youtu.be/tY6_xZnkv-A).
+
+# Public talks
+I really want to bring the topic "**Open Voice For An Open Future**" to the attention of a wider public.
+
+* I took part in a Linux User Group podcast about Mycroft AI and talked about my TTS efforts there (*May 2021*).
+* I was invited by [Yusuf](https://github.com/monatis/) from the Turkish TensorFlow community to talk about "How to make machines speak with your own voice". This talk was streamed live on YouTube and is available [here](https://www.youtube.com/watch?v=m-Uwb-Bg144&t=2303s). If you're interested in the slides shown, feel free to download my presentation [here](https://docs.google.com/presentation/d/1ynnw0ilKV3WwMSJHytrN3GXRiFr8x3r0DUimBm1y0LI/edit?usp=sharing) (*June 2021*).
+* I was invited as a speaker at VoiceLunch language & linguistics on January 3, 2022. [Here are my slides](https://docs.google.com/presentation/d/1Gi6BmYHs7g4ZgdAiIKGBnBwZDCvJOD9DJxQOGlgds1o/edit?usp=sharing) (*January 2022*).
+* In addition, I share my thoughts and knowledge on Open Voice on my [YouTube channel](https://www.youtube.com/c/ThorstenMueller).

 # Feel free to file an issue if you ...
-* have improvements on dataset
-* use my TTS voice in your project(s)
-* want to share your trained "thorsten" model
-* get to know about any abusive use of my voice
+* Use my TTS voice in your project(s)
+* Want to share your trained "Thorsten" model
+* Learn about any abusive use of my voice

-# Special thanks
-I want to thank all open source communities for providing great projects.
+# Thanks section
+## Cool projects
+* https://commonvoice.mozilla.org/
+* https://coqui.ai/
+* https://mycroft.ai/
+* https://github.com/rhasspy/

-Just to name some nice guys who joined me on this tts-roadtrip:
+## Cool people
+* [El-Tocino](https://github.com/el-tocino/)
+* [Eren Gölge](https://github.com/erogol/)
+* [Gras64](https://github.com/gras64/)
+* [Kris Gesling](https://github.com/krisgesling/)
+* [Nmstoker](https://github.com/nmstoker)
+* [Othiele](https://discourse.mozilla.org/u/othiele/summary)
+* [Repodiac](https://github.com/repodiac)
+* [SanjaESC](https://github.com/SanjaESC)
+* [Synesthesiam](https://github.com/synesthesiam/)

-* eltocino (https://github.com/el-tocino/)
-* erogol (https://github.com/erogol/)
-* gras64 (https://github.com/gras64/)
-* krisgesling (https://github.com/krisgesling/)
-* nmstoker (https://github.com/nmstoker)
-* othiele (https://discourse.mozilla.org/u/othiele/summary)
-* repodiac (https://github.com/repodiac)
+## Even more special people
+Additionally, a big thank-you to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.

-And last but not least, I want to say a huge thank you to a special guy who supported me on this journey right from the beginning. Not just with nice words, but with his time, audio optimization know-how and finally his GPU computing power. 
+And last but not least, I want to say a **huge, huge thank you** to a special guy who supported me on this journey as a partner right from the beginning. Not just with nice words, but with his time, audio optimization know-how and finally GPU power. 

-Without his amazing support, this dataset (in its current form) would not exist.
+**Thank you so much, dear Dominik ([@domcross](https://github.com/domcross/)), for being my partner on this journey.**

-Thank you Dominik (@domcross / https://github.com/domcross/)
-
-# Links
-* https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150
-* https://community.mycroft.ai/
-* https://github.com/MycroftAI/mimic-recording-studio
-* https://voice.mozilla.org/
-* https://github.com/mozilla/TTS
-(https://github.com/repodiac/tit-for-tat/tree/master/thorsten-TTS)
-* https://raw.githubusercontent.com/mozilla/voice-web/master/server/data/de/sentence-collector.txt
-
-We'll hear each other in the future :-)
-
-Thorsten
+Thorsten (*Twitter: @ThorstenVoice*)
--- a/_config.yml
+++ b/_config.yml
@@ -0,0 +1 @@
+theme: jekyll-theme-architect
--- a/samples/tts_compare/21.04-DCA-die
+++ b/samples/tts_compare/21.04-DCA-die
--- a/samples/tts_compare/21.04-DCA-ich
+++ b/samples/tts_compare/21.04-DCA-ich
--- a/samples/tts_compare/21.04-DCA-ingesamt
+++ b/samples/tts_compare/21.04-DCA-ingesamt
--- a/samples/tts_compare/21.05-DCA-und
+++ b/samples/tts_compare/21.05-DCA-und
--- a/samples/tts_compare/22.05-DDC-die
+++ b/samples/tts_compare/22.05-DDC-die
--- a/samples/tts_compare/22.05-DDC-ich
+++ b/samples/tts_compare/22.05-DDC-ich
--- a/samples/tts_compare/22.05-DDC-insgesamt
+++ b/samples/tts_compare/22.05-DDC-insgesamt
--- a/samples/tts_compare/22.05-DDC-und
+++ b/samples/tts_compare/22.05-DDC-und
--- a/samples/tts_compare/22.05-VITS-die
+++ b/samples/tts_compare/22.05-VITS-die
--- a/samples/tts_compare/22.05-VITS-ich
+++ b/samples/tts_compare/22.05-VITS-ich
--- a/samples/tts_compare/22.05-VITS-insgesamt
+++ b/samples/tts_compare/22.05-VITS-insgesamt
--- a/samples/tts_compare/22.05-VITS-und
+++ b/samples/tts_compare/22.05-VITS-und
Author	SHA1	Message	Date
Thorsten Müller	186fc786cb	Formatting	2022-04-23 15:55:09 +02:00
Thorsten Mueller	99eb4c2f9f	Formatting	2022-04-23 15:32:19 +02:00
Thorsten Mueller	7a2850ccd1	Formatting	2022-04-23 15:30:13 +02:00
Thorsten Mueller	23aa867ff4	Formatting stuff	2022-04-23 15:29:08 +02:00
Thorsten Mueller	abba39410d	Revert: Open links in new tab.	2022-04-23 15:07:19 +02:00
Thorsten Mueller	11a2c9ee54	Open links in new window	2022-04-23 15:03:20 +02:00
Thorsten Mueller	972c73d789	Updated samples links.	2022-04-23 15:00:15 +02:00
Thorsten Mueller	283a79e8c2	Added dataset samples	2022-04-23 14:51:08 +02:00
Thorsten Mueller	f3ee21e8d9	Remove model dummy folder	2022-04-23 14:40:24 +02:00
Thorsten Mueller	f76a8548e8	Cite update	2022-04-23 14:16:31 +02:00
Thorsten Mueller	3a4e78ffe7	Added tts audio samples.	2022-04-22 22:30:29 +02:00
Thorsten Mueller	cc2125de53	Added emoji and sample section	2022-04-22 22:14:28 +02:00
Thorsten Müller	8dec2f4ef4	Update README	2022-04-21 21:10:28 +02:00
Thorsten Müller	7efbf34b65	Update TOC	2022-04-21 16:58:55 +02:00
Thorsten Mueller	1db1be8f83	Reorg. README	2022-04-21 16:51:52 +02:00
Thorsten Müller	3b81154d42	Set theme jekyll-theme-architect	2020-09-28 13:25:03 +02:00
Thorsten Müller	61911f230c	Set theme jekyll-theme-hacker	2020-09-28 13:23:40 +02:00
Thorsten Mueller	930a4d2803	Added script and config for taco2 + ddc training	2020-08-23 12:00:07 +02:00
Thorsten Mueller	e960ad4b6c	Added info on normalization	2020-08-22 13:15:27 +02:00