Adding info on emotional dataset.

2025-06-27 05:01:51 +02:00 · 2021-04-03 23:24:53 +02:00 · 4f722e96a9
commit 4f722e96a9
parent 7e1530b742
7 changed files with 65 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -1,14 +1,18 @@
 ![Thorsten - Open German Voice Dataset](./img/ThorstenVoice_Logo_Small.png "Thorsten - Open German Voice Dataset")

 - [Introduction to "Thorsten-Voice" :speaking_head: :speech_balloon: :sloth:](#introduction-to-thorsten-voice-speaking_head-speech_balloon-sloth)
-  - [**A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.**](#a-free-to-use-offline-working-high-quality-german-tts-voice-should-be-available-for-every-project-without-any-license-struggling)
-  - [True, but what is this all about](#true-but-what-is-this-all-about)
- [Please read some personal words before using dataset / TTS models](#please-read-some-personal-words-before-using-dataset--tts-models)
-  - [Dataset "Thorsten"](#dataset-thorsten)
-    - [Samples of my voice](#samples-of-my-voice)
-    - [Dataset information :microphone:](#dataset-information-microphone)
-    - [Dataset evolution](#dataset-evolution)
-    - [Download information](#download-information)
+  
+- [A personal note](#please-read-some-personal-words-before-using-dataset--tts-models)
+
+- [Voice "Thorsten" (neutral)](#dataset-thorsten-neutral)
+  - [Samples of my original voice](#samples-of-my-neutral-voice)
+  - [Dataset information :microphone:](#dataset-information-microphone)
+  - [Dataset evolution / changelog](#dataset-evolution)
+  - [Download information](#download-information-neutral-dataset)
+
+- [Voice "Thorsten" (emotional)](#dataset-thorsten-emotional)
+
+
  - [Pretrained TTS models](#pretrained-tts-models)
    - [Trained models](#trained-models)
    - [Pre-trained Silero-models](#pre-trained-silero-models)
@@ -27,8 +31,9 @@ Speaking tech devices and voice based smart assistants are very popular ourdays.
 > I want to (*hopefully*) fill that german TTS gap and make the most personal contribution i can give.<br>
 **I contribute my personal voice!** :green_heart:

-This contribution is split into two parts:
-* "Thorsten" dataset
+## This contribution is split into three parts:
+* "Thorsten" **neutral** dataset
+* "Thorsten" **emotional** dataset
 * Pretrained TTS models based on "Thorsten" dataset

 # Please read some personal words before using dataset / TTS models
@@ -38,9 +43,13 @@ This contribution is split into two parts:

 **tl;dr** Please don't use for evil!

-## Dataset "Thorsten"
-> Please keep in mind that **i am no professional voice artists**. I'm just a normal guy sharing his voice with you.
-### Samples of my voice
+# Datasets
+
+
+> For both datasets, please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with you.
+
+## Dataset "Thorsten" neutral
+### Samples of my neutral voice
 To get an impression what my voice sounds to decide if it fits to your project i published some sample recordings, so no need to download complete dataset first.

 * [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
@@ -51,7 +60,6 @@ To get an impression what my voice sounds to decide if it fits to your project i
 * [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
 * [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )

-
 ### Dataset information :microphone:

 * ljspeech-1.1 structure
@@ -83,7 +91,7 @@ As described in the pdf document ([evolution of thorsten dataset](./EvolutionOfT
 If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you can see which files belong to which recording phase in [recording quality](./RecordingQuality.csv) csv file.


-### Download information
+### Download information (**neutral dataset**)
 > Download size: 2,7GB

 | Version         | Description                                                                                       | Date       | Link                                                                                                            |
@@ -93,15 +101,49 @@ If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you
 | thorsten-de-v03 | Based on v02 dataset, but with increased speed by 10% (using ffmpeg atempo=1.1).                  | 2021-02-10 | [Google Drive Download v03](https://drive.google.com/file/d/134_UramfCRoAxRrOnhbPJ2YHHTwxRtr-/view?usp=sharing) |


-## Pretrained TTS models
+## Dataset "Thorsten" (**emotional**)
+### Samples of my emotional voice
+*By the way, I did mention that I'm no professional voice talent, didn't I?*
+> "Mist, wieder nichts geschafft."
+* [neutral](./samples/emotional_recording/neutral.wav)
+* [disgusted](./samples/emotional_recording/disgusted.wav)
+* [angry](./samples/emotional_recording/angry.wav)
+* [amused](./samples/emotional_recording/amused.wav)
+* [surprised](./samples/emotional_recording/surprised.wav)
+* [sleepy](./samples/emotional_recording/sleepy.wav)
+### Emotional dataset information :microphone:
+* 300 sentences × 6 emotions = 1,800 recordings
+* recorded by Thorsten Müller (optimized by Dominik Kreutz)
+* mono
+* sample rate: 22,050 Hz
+* normalized to -24 dB
+* no silence at beginning/ending
+* sentence length: 59 - 148 chars
+
+| Emotion   | Minutes |
+|-----------|---------|
+| Neutral :slightly_smiling_face:   | 19 min. |
+| Disgusted :nauseated_face: | 23 min. |
+| Angry :angry:    | 20 min. |
+| Amused :grinning:    | 18 min. |
+| Surprised :astonished: | 18 min. |
+| Sleepy :pensive:    | 30 min. |
+
+### Download **emotional** dataset
+> Download size: 300MB
+
+| Version         | Description                                                                                       | Date       | Link                                                                                                            |
+| --------------- | ------------------------------------------------------------------------------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------- |
+| thorsten-de-emotional-v01 | Initial version                                                                                   | 2021-04-03 | [Google Drive Download v01](https://drive.google.com/file/d/1fm2IqXMLr6jaZCgG_Mt4vq_O3ZubiIQ6/view?usp=sharing) |
+
+
+# Pretrained TTS models
 If you trained a model on "Thorsten" dataset please file an issue with some information on it. Sharing a trained model is highly appreciated.

 My personal training sessions are based on TTS repo code (originally initiated by Mozilla) and now maintained through https://www.coqui.ai (:frog:)
-
-### Trained models
-> training in progress. Available models will be listed here in future.
-
-### Pre-trained Silero-models
+## Coqui models
+todo
+## Silero-models

 You can use a free A-GPL licensed models trained on this dataset via the [silero-models](https://github.com/snakers4/silero-models) project. The full list of models including their older version is available via this [yaml file](https://github.com/snakers4/silero-models/blob/master/models.yml).

@@ -111,6 +153,7 @@ You can use a free A-GPL licensed models trained on this dataset via the [silero
 | thorsten_16khz | m      | de       | [8000](https://drive.google.com/drive/folders/1mpQCK5E_IqhcSurnYuGePJiJWL4ZL08z?usp=sharing) / [16000](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |


+
 # Feel free to file an issue if you ...
 * have improvements on dataset
 * use my TTS voice in your project(s)
@@ -156,3 +199,4 @@ Thank you Dominik (@domcross / https://github.com/domcross/)
 We'll hear us in future :speaking_head:

 Thorsten
+(https://twitter.com/ThorstenVoice)
--- a/samples/emotional_recording/amused.wav
+++ b/samples/emotional_recording/amused.wav
--- a/samples/emotional_recording/angry.wav
+++ b/samples/emotional_recording/angry.wav
--- a/samples/emotional_recording/disgusted.wav
+++ b/samples/emotional_recording/disgusted.wav
--- a/samples/emotional_recording/neutral.wav
+++ b/samples/emotional_recording/neutral.wav
--- a/samples/emotional_recording/sleepy.wav
+++ b/samples/emotional_recording/sleepy.wav
--- a/samples/emotional_recording/surprised.wav
+++ b/samples/emotional_recording/surprised.wav