diff --git a/README.md b/README.md index 195b95b..e1b0bb6 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,18 @@ ![Thorsten - Open German Voice Dataset](./img/ThorstenVoice_Logo_Small.png "Thorsten - Open German Voice Dataset") - [Introduction to "Thorsten-Voice" :speaking_head: :speech_balloon: :sloth:](#introduction-to-thorsten-voice-speaking_head-speech_balloon-sloth) - - [**A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.**](#a-free-to-use-offline-working-high-quality-german-tts-voice-should-be-available-for-every-project-without-any-license-struggling) - - [True, but what is this all about](#true-but-what-is-this-all-about) -- [Please read some personal words before using dataset / TTS models](#please-read-some-personal-words-before-using-dataset--tts-models) - - [Dataset "Thorsten"](#dataset-thorsten) - - [Samples of my voice](#samples-of-my-voice) - - [Dataset information :microphone:](#dataset-information-microphone) - - [Dataset evolution](#dataset-evolution) - - [Download information](#download-information) + +- [A personal note](#please-read-some-personal-words-before-using-dataset--tts-models) + +- [Voice "Thorsten" (neutral)](#dataset-thorsten-neutral) + - [Samples of my original voice](#samples-of-my-neutral-voice) + - [Dataset information :microphone:](#dataset-information-microphone) + - [Dataset evolution / changelog](#dataset-evolution) + - [Download information](#download-information-neutral-dataset) + +- [Voice "Thorsten" (emotional)](#dataset-thorsten-emotional) + + - [Pretrained TTS models](#pretrained-tts-models) - [Trained models](#trained-models) - [Pre-trained Silero-models](#pre-trained-silero-models) @@ -27,8 +31,9 @@ Speaking tech devices and voice based smart assistants are very popular ourdays. > I want to (*hopefully*) fill that german TTS gap and make the most personal contribution i can give.
**I contribute my personal voice!** :green_heart: -This contribution is split into two parts: -* "Thorsten" dataset +## This contribution is split into three parts: +* "Thorsten" **neutral** dataset +* "Thorsten" **emotional** dataset * Pretrained TTS models based on "Thorsten" dataset # Please read some personal words before using dataset / TTS models @@ -38,9 +43,13 @@ This contribution is split into two parts: **tl;dr** Please don't use for evil! -## Dataset "Thorsten" -> Please keep in mind that **i am no professional voice artists**. I'm just a normal guy sharing his voice with you. -### Samples of my voice +# Datasets + + +> For both datasets, please keep in mind that **I am no professional voice talent**. I'm just a normal guy sharing his voice with you. + +## Dataset "Thorsten" neutral +### Samples of my neutral voice To get an impression what my voice sounds to decide if it fits to your project i published some sample recordings, so no need to download complete dataset first. * [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav ) @@ -51,7 +60,6 @@ To get an impression what my voice sounds to decide if it fits to your project i * [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav ) * [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav ) - ### Dataset information :microphone: * ljspeech-1.1 structure @@ -83,7 +91,7 @@ As described in the pdf document ([evolution of thorsten dataset](./EvolutionOfT If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you can see which files belong to which recording phase in [recording quality](./RecordingQuality.csv) csv file.
-### Download information +### Download information (**neutral dataset**) > Download size: 2,7GB | Version | Description | Date | Link | @@ -93,15 +101,49 @@ If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you | thorsten-de-v03 | Based on v02 dataset, but with increased speed by 10% (using ffmpeg atempo=1.1). | 2021-02-10 | [Google Drive Download v03](https://drive.google.com/file/d/134_UramfCRoAxRrOnhbPJ2YHHTwxRtr-/view?usp=sharing) | -## Pretrained TTS models +## Dataset "Thorsten" (**emotional**) +### Samples of my emotional voice +*Btw. I mentioned that I'm no professional voice talent, didn't I?* +> "Mist, wieder nichts geschafft." (English: "Damn, got nothing done again.") +* [neutral](./samples/emotional_recording/neutral.wav) +* [disgusted](./samples/emotional_recording/disgusted.wav) +* [angry](./samples/emotional_recording/angry.wav) +* [amused](./samples/emotional_recording/amused.wav) +* [surprised](./samples/emotional_recording/surprised.wav) +* [sleepy](./samples/emotional_recording/sleepy.wav) +### Emotional dataset information :microphone: +* 300 sentences × 6 emotions = 1,800 recordings +* recorded by Thorsten Müller (optimized by Dominik Kreutz) +* mono +* sample rate: 22,050 Hz +* normalized to -24 dB +* no silence at beginning/end +* sentence length: 59 to 148 characters + +| Emotion | Minutes | +|-----------|---------| +| Normal :slightly_smiling_face: | 19 min. | +| Disgusted :nauseated_face: | 23 min. | +| Angry :angry: | 20 min. | +| Amused :grinning: | 18 min. | +| Surprised :astonished: | 18 min. | +| Sleepy :pensive: | 30 min.
| + +### Download **emotional** dataset +> Download size: 300MB + +| Version | Description | Date | Link | +| --------------- | ------------------------------------------------------------------------------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------- | +| thorsten-de-emotional-v01 | Initial version | 2021-04-03 | [Google Drive Download v01](https://drive.google.com/file/d/1fm2IqXMLr6jaZCgG_Mt4vq_O3ZubiIQ6/view?usp=sharing) | + + +# Pretrained TTS models If you trained a model on "Thorsten" dataset please file an issue with some information on it. Sharing a trained model is highly appreciated. My personal training sessions are based on TTS repo code (originally initiated by Mozilla) and now maintained through https://www.coqui.ai (:frog:) - -### Trained models -> training in progress. Available models will be listed here in future. - -### Pre-trained Silero-models +## Coqui models +todo +## Silero-models You can use a free A-GPL licensed models trained on this dataset via the [silero-models](https://github.com/snakers4/silero-models) project. The full list of models including their older version is available via this [yaml file](https://github.com/snakers4/silero-models/blob/master/models.yml). @@ -111,6 +153,7 @@ You can use a free A-GPL licensed models trained on this dataset via the [silero | thorsten_16khz | m | de | [8000](https://drive.google.com/drive/folders/1mpQCK5E_IqhcSurnYuGePJiJWL4ZL08z?usp=sharing) / [16000](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | + # Feel free to file an issue if you ... 
* have improvements on dataset * use my TTS voice in your project(s) @@ -156,3 +199,4 @@ Thank you Dominik (@domcross / https://github.com/domcross/) We'll hear us in future :speaking_head: Thorsten +(https://twitter.com/ThorstenVoice) diff --git a/samples/emotional_recording/amused.wav b/samples/emotional_recording/amused.wav new file mode 100644 index 0000000..bec7814 Binary files /dev/null and b/samples/emotional_recording/amused.wav differ diff --git a/samples/emotional_recording/angry.wav b/samples/emotional_recording/angry.wav new file mode 100644 index 0000000..ebc7bd5 Binary files /dev/null and b/samples/emotional_recording/angry.wav differ diff --git a/samples/emotional_recording/disgusted.wav b/samples/emotional_recording/disgusted.wav new file mode 100644 index 0000000..37ea523 Binary files /dev/null and b/samples/emotional_recording/disgusted.wav differ diff --git a/samples/emotional_recording/neutral.wav b/samples/emotional_recording/neutral.wav new file mode 100644 index 0000000..4c1ac8d Binary files /dev/null and b/samples/emotional_recording/neutral.wav differ diff --git a/samples/emotional_recording/sleepy.wav b/samples/emotional_recording/sleepy.wav new file mode 100644 index 0000000..d9c7dab Binary files /dev/null and b/samples/emotional_recording/sleepy.wav differ diff --git a/samples/emotional_recording/surprised.wav b/samples/emotional_recording/surprised.wav new file mode 100644 index 0000000..1ff1c20 Binary files /dev/null and b/samples/emotional_recording/surprised.wav differ
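The emotional dataset is described as mono audio at a 22,050 Hz sample rate. A minimal Python sketch for checking the sample WAV files added in this change (or a downloaded copy of the dataset) against those two properties could use only the standard-library `wave` module. It assumes the files are uncompressed PCM WAV, which `wave` requires. The helper names `check_wav`, `duration_seconds` and `check_folder` are hypothetical and not part of the repository.

```python
import wave
from pathlib import Path

# Properties stated for the emotional dataset: mono, 22,050 Hz.
# Assumption: files are uncompressed PCM WAV, which the `wave` module requires.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 22050


def check_wav(path):
    """Return a list of mismatches for one WAV file (empty list if it matches)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"{rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    return problems


def duration_seconds(path):
    """Length of one WAV file in seconds (frame count divided by sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_folder(folder):
    """Map each *.wav file name in `folder` to its list of mismatches."""
    return {p.name: check_wav(p) for p in sorted(Path(folder).glob("*.wav"))}
```

For example, `check_folder("samples/emotional_recording")` returns one entry per sample file, with an empty list for every file that is mono at 22,050 Hz. Summing `duration_seconds` over all files of one emotion could also be used to cross-check the per-emotion minutes, assuming the downloaded files can be grouped by emotion.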