Adding info on emotional dataset.

Thorsten Mueller 2021-04-03 23:24:53 +02:00
parent 7e1530b742
commit 4f722e96a9
7 changed files with 65 additions and 21 deletions


@ -1,14 +1,18 @@
![Thorsten - Open German Voice Dataset](./img/ThorstenVoice_Logo_Small.png "Thorsten - Open German Voice Dataset")
- [Introduction to "Thorsten-Voice" :speaking_head: :speech_balloon: :sloth:](#introduction-to-thorsten-voice-speaking_head-speech_balloon-sloth)
- [**A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.**](#a-free-to-use-offline-working-high-quality-german-tts-voice-should-be-available-for-every-project-without-any-license-struggling)
- [True, but what is this all about](#true-but-what-is-this-all-about)
- [Please read some personal words before using dataset / TTS models](#please-read-some-personal-words-before-using-dataset--tts-models)
- [Dataset "Thorsten"](#dataset-thorsten)
- [Samples of my voice](#samples-of-my-voice)
- [Dataset information :microphone:](#dataset-information-microphone)
- [Dataset evolution](#dataset-evolution)
- [Download information](#download-information)
- [A personal note](#please-read-some-personal-words-before-using-dataset--tts-models)
- [Voice "Thorsten" (neutral)](#dataset-thorsten-neutral)
- [Samples of my neutral voice](#samples-of-my-neutral-voice)
- [Dataset information :microphone:](#dataset-information-microphone)
- [Dataset evolution / changelog](#dataset-evolution)
- [Download information](#download-information-neutral-dataset)
- [Voice "Thorsten" (emotional)](#dataset-thorsten-emotional)
- [Pretrained TTS models](#pretrained-tts-models)
- [Coqui models](#coqui-models)
- [Silero-models](#silero-models)
@ -27,8 +31,9 @@ Speaking tech devices and voice based smart assistants are very popular ourdays.
> I want to (*hopefully*) fill that German TTS gap and make the most personal contribution I can give.<br>
**I contribute my personal voice!** :green_heart:
This contribution is split into two parts:
* "Thorsten" dataset
## This contribution is split into three parts:
* "Thorsten" **neutral** dataset
* "Thorsten" **emotional** dataset
* Pretrained TTS models based on "Thorsten" dataset
# Please read some personal words before using dataset / TTS models
@ -38,9 +43,13 @@ This contribution is split into two parts:
**tl;dr** Please don't use for evil!
## Dataset "Thorsten"
> Please keep in mind that **I am not a professional voice artist**. I'm just a normal guy sharing his voice with you.
### Samples of my voice
# Datasets
> For both datasets, please keep in mind that **I am not a professional voice talent**. I'm just a normal guy sharing his voice with you.
## Dataset "Thorsten" neutral
### Samples of my neutral voice
To help you decide whether my voice fits your project, I published some sample recordings, so there is no need to download the complete dataset first.
* [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
@ -51,7 +60,6 @@ To get an impression what my voice sounds to decide if it fits to your project i
* [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
* [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )
### Dataset information :microphone:
* ljspeech-1.1 structure
@ -83,7 +91,7 @@ As described in the pdf document ([evolution of thorsten dataset](./EvolutionOfT
If you want to use only a subset of the dataset (phase1 and/or phase2 and/or phase3), the [recording quality](./RecordingQuality.csv) CSV file shows which files belong to which recording phase.
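
Selecting a subset by recording phase could be sketched roughly as follows. The column names `file` and `phase`, the comma delimiter, and the function name are all assumptions made for illustration; the actual layout of `RecordingQuality.csv` is not described here, so check the file's header and adjust the parameters.

```python
import csv
from typing import Iterable, List, Set


def files_for_phases(
    rows: Iterable[str],
    phases: Set[str],
    file_column: str = "file",    # hypothetical column name
    phase_column: str = "phase",  # hypothetical column name
    delimiter: str = ",",         # assumed delimiter
) -> List[str]:
    """Return the file names whose recording phase is in `phases`."""
    reader = csv.DictReader(rows, delimiter=delimiter)
    return [row[file_column] for row in reader if row[phase_column] in phases]
```

With an open file handle, a call might look like `files_for_phases(open("RecordingQuality.csv", encoding="utf-8"), {"phase2", "phase3"})`, again assuming those column names and phase labels match the real file.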
### Download information
### Download information (**neutral dataset**)
> Download size: 2.7 GB
| Version | Description | Date | Link |
@ -93,15 +101,49 @@ If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you
| thorsten-de-v03 | Based on v02 dataset, but with increased speed by 10% (using ffmpeg atempo=1.1). | 2021-02-10 | [Google Drive Download v03](https://drive.google.com/file/d/134_UramfCRoAxRrOnhbPJ2YHHTwxRtr-/view?usp=sharing) |
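
The v03 row mentions ffmpeg's `atempo=1.1` filter. As a minimal sketch of what such a step might look like, the helper below only builds the ffmpeg argument list for one file. The function name and file paths are hypothetical, and the exact command used to produce v03 is not shown in this README, so treat this as an assumption rather than the original procedure.

```python
from __future__ import annotations

from pathlib import Path


def build_atempo_command(src: Path, dst: Path, tempo: float = 1.1) -> list[str]:
    """Build an ffmpeg command that changes audio tempo via the atempo filter."""
    # Conservative sanity range chosen for this sketch; check the limits
    # documented for your ffmpeg version.
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo should be between 0.5 and 2.0 in this sketch")
    return ["ffmpeg", "-i", str(src), "-filter:a", f"atempo={tempo}", str(dst)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` if ffmpeg is installed and on the `PATH`.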
## Pretrained TTS models
## Dataset "Thorsten" (**emotional**)
### Samples of my emotional voice
*By the way, I did mention that I'm no professional voice talent, didn't I?*
> "Mist, wieder nichts geschafft."
* [neutral](./samples/emotional_recording/neutral.wav)
* [disgusted](./samples/emotional_recording/disgusted.wav)
* [angry](./samples/emotional_recording/angry.wav)
* [amused](./samples/emotional_recording/amused.wav)
* [surprised](./samples/emotional_recording/surprised.wav)
* [sleepy](./samples/emotional_recording/sleepy.wav)
### Emotional dataset information :microphone:
* 300 sentences * 6 emotions = 1,800 recordings
* recorded by Thorsten Müller (optimized by Dominik Kreutz)
* mono
* sample rate 22,050 Hz
* normalized to -24 dB
* no silence at beginning/ending
* sentence length: 59 - 148 characters
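
To confirm that a downloaded recording matches the format listed above (mono, 22,050 Hz), a small check with Python's standard `wave` module could look like this. The function name is hypothetical, and the sketch assumes the files are plain PCM WAV files that `wave` can read.

```python
import wave

# From the list above: 300 sentences in each of 6 emotions.
EXPECTED_RECORDINGS = 300 * 6  # 1800


def check_wav_format(path, expected_rate=22050, expected_channels=1):
    """Return (ok, details) comparing a WAV file with the stated format."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    ok = rate == expected_rate and channels == expected_channels
    return ok, {"rate": rate, "channels": channels, "seconds": seconds}
```

Running this over every file in the extracted archive and counting the results against `EXPECTED_RECORDINGS` is a quick sanity check before training.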
| Emotion | Minutes |
|-----------|---------|
| Normal :slightly_smiling_face: | 19 min. |
| Disgusted :nauseated_face: | 23 min. |
| Angry :angry: | 20 min. |
| Amused :grinning: | 18 min. |
| Surprised :astonished: | 18 min. |
| Sleepy :pensive: | 30 min. |
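
As a quick worked example, the durations in the table can be summed and turned into a rough average length per recording. The minute values are the rounded figures from the table, so the averages are approximate; the variable and function names are only illustrative.

```python
# Rounded minutes per emotion, copied from the table above.
MINUTES_PER_EMOTION = {
    "Normal": 19,
    "Disgusted": 23,
    "Angry": 20,
    "Amused": 18,
    "Surprised": 18,
    "Sleepy": 30,
}
SENTENCES_PER_EMOTION = 300  # 300 sentences recorded once per emotion


def total_minutes(minutes_by_emotion):
    """Sum the per-emotion durations in minutes."""
    return sum(minutes_by_emotion.values())


def average_seconds_per_recording(minutes, count):
    """Approximate average recording length in seconds."""
    return minutes * 60 / count


if __name__ == "__main__":
    print(f"total: {total_minutes(MINUTES_PER_EMOTION)} min")
    for emotion, minutes in MINUTES_PER_EMOTION.items():
        avg = average_seconds_per_recording(minutes, SENTENCES_PER_EMOTION)
        print(f"{emotion}: ~{avg:.1f} s per recording")
```

Summing the table gives 128 minutes, a little over two hours of emotional speech in total.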
### Download **emotional** dataset
> Download size: 300MB
| Version | Description | Date | Link |
| --------------- | ------------------------------------------------------------------------------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------- |
| thorsten-de-emotional-v01 | Initial version | 2021-04-03 | [Google Drive Download v01](https://drive.google.com/file/d/1fm2IqXMLr6jaZCgG_Mt4vq_O3ZubiIQ6/view?usp=sharing) |
# Pretrained TTS models
If you have trained a model on the "Thorsten" dataset, please file an issue with some information about it. Sharing a trained model is highly appreciated.
My personal training sessions are based on the TTS repo code (originally initiated by Mozilla), which is now maintained by https://www.coqui.ai (:frog:)
### Trained models
> Training in progress. Available models will be listed here in the future.
### Pre-trained Silero-models
## Coqui models
todo
## Silero-models
You can use free AGPL-licensed models trained on this dataset via the [silero-models](https://github.com/snakers4/silero-models) project. The full list of models, including their older versions, is available in this [yaml file](https://github.com/snakers4/silero-models/blob/master/models.yml).
@ -111,6 +153,7 @@ You can use a free A-GPL licensed models trained on this dataset via the [silero
| thorsten_16khz | m | de | [8000](https://drive.google.com/drive/folders/1mpQCK5E_IqhcSurnYuGePJiJWL4ZL08z?usp=sharing) / [16000](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
# Feel free to file an issue if you ...
* have improvements to the dataset
* use my TTS voice in your project(s)
@ -156,3 +199,4 @@ Thank you Dominik (@domcross / https://github.com/domcross/)
We'll hear from each other in the future :speaking_head:
Thorsten
(https://twitter.com/ThorstenVoice)

Binary file not shown.
