Adding info on emotional dataset.

2025-06-27 13:11:48 +02:00 · 2021-04-03 23:24:53 +02:00
commit 4f722e96a9
parent 7e1530b742
7 changed files with 65 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -1,14 +1,18 @@
 ![Thorsten - Open German Voice Dataset](./img/ThorstenVoice_Logo_Small.png "Thorsten - Open German Voice Dataset")
 - [Introduction to "Thorsten-Voice" :speaking_head: :speech_balloon: :sloth:](#introduction-to-thorsten-voice-speaking_head-speech_balloon-sloth)
-  - [**A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.**](#a-free-to-use-offline-working-high-quality-german-tts-voice-should-be-available-for-every-project-without-any-license-struggling)
+  
-  - [True, but what is this all about](#true-but-what-is-this-all-about)
+- [A personal note](#please-read-some-personal-words-before-using-dataset--tts-models)
- [Please read some personal words before using dataset / TTS models](#please-read-some-personal-words-before-using-dataset--tts-models)
+
-  - [Dataset "Thorsten"](#dataset-thorsten)
+- [Voice "Thorsten" (neutral)](#dataset-thorsten-neutral)
-    - [Samples of my voice](#samples-of-my-voice)
+  - [Samples of my original voice](#samples-of-my-voice)
-    - [Dataset information :microphone:](#dataset-information-microphone)
+  - [Dataset information :microphone:](#dataset-information-microphone)
-    - [Dataset evolution](#dataset-evolution)
+  - [Dataset evolution / changelog](#dataset-evolution)
-    - [Download information](#download-information)
+  - [Download information](#download-information)
 - [Voice "Thorsten" (emotional)](#dataset-thorsten-emotional)
  - [Pretrained TTS models](#pretrained-tts-models)
    - [Trained models](#trained-models)
    - [Pre-trained Silero-models](#pre-trained-silero-models)
@@ -27,8 +31,9 @@ Speaking tech devices and voice based smart assistants are very popular ourdays.
 > I want to (*hopefully*) fill that German TTS gap and make the most personal contribution I can give.<br>
 **I contribute my personal voice!** :green_heart:
-This contribution is split into two parts:
+## This contribution is split into three parts:
-* "Thorsten" dataset
+* "Thorsten" **neutral** dataset
 * "Thorsten" **emotional** dataset
 * Pretrained TTS models based on "Thorsten" dataset
 # Please read some personal words before using dataset / TTS models
@@ -38,9 +43,13 @@ This contribution is split into two parts:
 **tl;dr** Please don't use for evil!
-## Dataset "Thorsten"
+# Datasets
-> Please keep in mind that **i am no professional voice artists**. I'm just a normal guy sharing his voice with you.
+
-### Samples of my voice
+
 > For both datasets, please keep in mind that **I am no professional voice talent**. I'm just a normal guy sharing his voice with you.
 ## Dataset "Thorsten" neutral
 ### Samples of my neutral voice
 To give you an impression of how my voice sounds, so you can decide whether it fits your project, I published some sample recordings. There is no need to download the complete dataset first.
 * [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
@@ -51,7 +60,6 @@ To get an impression what my voice sounds to decide if it fits to your project i
 * [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
 * [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )
 ### Dataset information :microphone:
 * ljspeech-1.1 structure
@@ -83,7 +91,7 @@ As described in the pdf document ([evolution of thorsten dataset](./EvolutionOfT
 If you want to use only a subset of the dataset (phase1 and/or phase2 and/or phase3), you can see which files belong to which recording phase in the [recording quality](./RecordingQuality.csv) CSV file.
-### Download information
+### Download information (**neutral dataset**)
 > Download size: 2.7 GB
 | Version         | Description                                                                                       | Date       | Link                                                                                                            |
@@ -93,15 +101,49 @@ If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you
 | thorsten-de-v03 | Based on v02 dataset, but with increased speed by 10% (using ffmpeg atempo=1.1).                  | 2021-02-10 | [Google Drive Download v03](https://drive.google.com/file/d/134_UramfCRoAxRrOnhbPJ2YHHTwxRtr-/view?usp=sharing) |
-## Pretrained TTS models
+## Dataset "Thorsten" (**emotional**)
 ### Samples of my emotional voice
 *By the way, I mentioned that I'm no professional voice talent, didn't I?*
 > "Mist, wieder nichts geschafft."
 * [neutral](./samples/emotional_recording/neutral.wav)
 * [disgusted](./samples/emotional_recording/disgusted.wav)
 * [angry](./samples/emotional_recording/angry.wav)
 * [amused](./samples/emotional_recording/amused.wav)
 * [surprised](./samples/emotional_recording/surprised.wav)
 * [sleepy](./samples/emotional_recording/sleepy.wav)
 ### Emotional dataset information :microphone:
 * 300 sentences * 6 emotions = 1,800 recordings
 * recorded by Thorsten Müller (optimized by Dominik Kreutz)
 * mono
 * sample rate 22,050 Hz
 * normalized to -24dB
 * no silence at beginning/ending
 * sentence length: 59 - 148 chars
 | Emotion   | Minutes |
 |-----------|---------|
 | Normal :slightly_smiling_face:    | 19 min. |
 | Disgusted :nauseated_face: | 23 min. |
 | Angry :angry:    | 20 min. |
 | Amused :grinning:    | 18 min. |
 | Surprised :astonished: | 18 min. |
 | Sleepy :pensive:    | 30 min. |
 ### Download **emotional** dataset
 > Download size: 300MB
 | Version         | Description                                                                                       | Date       | Link                                                                                                            |
 | --------------- | ------------------------------------------------------------------------------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------- |
 | thorsten-de-emotional-v01 | Initial version                                                                                   | 2021-04-03 | [Google Drive Download v01](https://drive.google.com/file/d/1fm2IqXMLr6jaZCgG_Mt4vq_O3ZubiIQ6/view?usp=sharing) |
 # Pretrained TTS models
 If you have trained a model on the "Thorsten" dataset, please file an issue with some information about it. Sharing a trained model is highly appreciated.
 My personal training sessions are based on TTS repo code (originally initiated by Mozilla) and now maintained through https://www.coqui.ai (:frog:)
-
+## Coqui models
-### Trained models
+todo
-> training in progress. Available models will be listed here in future.
+## Silero-models
 ### Pre-trained Silero-models
 You can use free AGPL-licensed models trained on this dataset via the [silero-models](https://github.com/snakers4/silero-models) project. The full list of models, including their older versions, is available via this [yaml file](https://github.com/snakers4/silero-models/blob/master/models.yml).
@@ -111,6 +153,7 @@ You can use a free A-GPL licensed models trained on this dataset via the [silero
 | thorsten_16khz | m      | de       | [8000](https://drive.google.com/drive/folders/1mpQCK5E_IqhcSurnYuGePJiJWL4ZL08z?usp=sharing) / [16000](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
 # Feel free to file an issue if you ...
 * have improvements on dataset
 * use my TTS voice in your project(s)
@@ -156,3 +199,4 @@ Thank you Dominik (@domcross / https://github.com/domcross/)
 We'll hear us in future :speaking_head:
 Thorsten
 (https://twitter.com/ThorstenVoice)
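The figures this commit adds for the emotional dataset (300 sentences per emotion, six emotions, and the per-emotion minutes table) can be cross-checked with a few lines of Python. This is only a sketch: the dictionary is transcribed by hand from the README table, the minutes there are rounded, and the names `MINUTES_PER_EMOTION` and `summarize` are made up for illustration.

```python
# Per-emotion durations in minutes, transcribed from the README table
# (rounded values, so every derived number is approximate).
MINUTES_PER_EMOTION = {
    "normal": 19,
    "disgusted": 23,
    "angry": 20,
    "amused": 18,
    "surprised": 18,
    "sleepy": 30,
}

SENTENCES_PER_EMOTION = 300  # stated in the README


def summarize(minutes, sentences_per_emotion):
    """Return (total recordings, total minutes, average seconds per recording)."""
    total_recordings = sentences_per_emotion * len(minutes)
    total_minutes = sum(minutes.values())
    avg_seconds = total_minutes * 60 / total_recordings
    return total_recordings, total_minutes, avg_seconds


if __name__ == "__main__":
    recordings, minutes, avg = summarize(MINUTES_PER_EMOTION, SENTENCES_PER_EMOTION)
    print(f"{recordings} recordings, about {minutes} minutes, about {avg:.1f} s each")
```

The recording count matches the 1,800 stated in the README; the total duration and the per-recording average are derived values that the README itself does not state.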
--- a/samples/emotional_recording/amused.wav
+++ b/samples/emotional_recording/amused.wav
--- a/samples/emotional_recording/angry.wav
+++ b/samples/emotional_recording/angry.wav
--- a/samples/emotional_recording/disgusted.wav
+++ b/samples/emotional_recording/disgusted.wav
--- a/samples/emotional_recording/neutral.wav
+++ b/samples/emotional_recording/neutral.wav
--- a/samples/emotional_recording/sleepy.wav
+++ b/samples/emotional_recording/sleepy.wav
--- a/samples/emotional_recording/surprised.wav
+++ b/samples/emotional_recording/surprised.wav
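The README states that the emotional recordings are mono at 22,050 Hz. A minimal sketch for checking the six sample files added above against those two properties, using only Python's standard `wave` module. It assumes the files are uncompressed PCM WAV (the only kind `wave` reads) and that the script runs from the repository root; `check_wav` is a hypothetical helper, not part of the repository.

```python
import wave
from pathlib import Path

# Properties stated in the README for the emotional dataset.
EXPECTED_CHANNELS = 1      # mono
EXPECTED_RATE = 22050      # Hz


def check_wav(path):
    """Read a PCM WAV header and compare it with the stated properties."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    return {
        "channels": channels,
        "rate": rate,
        "seconds": seconds,
        "ok": channels == EXPECTED_CHANNELS and rate == EXPECTED_RATE,
    }


if __name__ == "__main__":
    # Assumed location, taken from the file paths in this commit.
    sample_dir = Path("samples/emotional_recording")
    if sample_dir.is_dir():
        for wav_path in sorted(sample_dir.glob("*.wav")):
            print(wav_path.name, check_wav(wav_path))
```

The check covers channel count and sample rate only; loudness normalization and silence trimming would need an audio library and are left out here.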