mirror of https://github.com/thorstenMueller/Thorsten-Voice.git (synced 2024-11-21 23:43:12 +01:00)
Adding info on emotional dataset.
parent 7e1530b742
commit 4f722e96a9
86
README.md
@@ -1,14 +1,18 @@
![Thorsten - Open German Voice Dataset](./img/ThorstenVoice_Logo_Small.png "Thorsten - Open German Voice Dataset")

- [Introduction to "Thorsten-Voice" :speaking_head: :speech_balloon: :sloth:](#introduction-to-thorsten-voice-speaking_head-speech_balloon-sloth)
- [**A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.**](#a-free-to-use-offline-working-high-quality-german-tts-voice-should-be-available-for-every-project-without-any-license-struggling)
- [True, but what is this all about](#true-but-what-is-this-all-about)
- [Please read some personal words before using dataset / TTS models](#please-read-some-personal-words-before-using-dataset--tts-models)
- [Dataset "Thorsten"](#dataset-thorsten)
- [Samples of my voice](#samples-of-my-voice)
- [Dataset information :microphone:](#dataset-information-microphone)
- [Dataset evolution](#dataset-evolution)
- [Download information](#download-information)

- [A personal note](#please-read-some-personal-words-before-using-dataset--tts-models)

- [Voice "Thorsten" (neutral)](#dataset-thorsten-neutral)
- [Samples of my original voice](#samples-of-my-voice)
- [Dataset information :microphone:](#dataset-information-microphone)
- [Dataset evolution / changelog](#dataset-evolution)
- [Download information](#download-information)

- [Voice "Thorsten" (emotional)](#Dataset "Thorsten" (**emotional**))


- [Pretrained TTS models](#pretrained-tts-models)
- [Trained models](#trained-models)
- [Pre-trained Silero-models](#pre-trained-silero-models)
@@ -27,8 +31,9 @@ Speaking tech devices and voice based smart assistants are very popular ourdays.
> I want to (*hopefully*) fill that german TTS gap and make the most personal contribution i can give.<br>
**I contribute my personal voice!** :green_heart:

This contribution is split into two parts:
* "Thorsten" dataset
## This contribution is split into three parts:
* "Thorsten" **neutral** dataset
* "Thorsten" **emotional** dataset
* Pretrained TTS models based on "Thorsten" dataset

# Please read some personal words before using dataset / TTS models
@@ -38,9 +43,13 @@ This contribution is split into two parts:

**tl;dr** Please don't use for evil!

## Dataset "Thorsten"
> Please keep in mind that **i am no professional voice artists**. I'm just a normal guy sharing his voice with you.
### Samples of my voice
# Datasets


> For both datasets please keep in mind, that **i am no professional voice talent**. I'm just a normal guy sharing his voice with you.

## Dataset "Thorsten" neutral
### Samples of my neutral voice
To get an impression what my voice sounds to decide if it fits to your project i published some sample recordings, so no need to download complete dataset first.

* [Das Teilen eines Benutzerkontos ist strengstens untersagt.](./samples/original_recording/recorded_sample_01.wav )
@@ -51,7 +60,6 @@ To get an impression what my voice sounds to decide if it fits to your project i
* [Jede gute Küchenwaage hat eine Tara-Funktion.](./samples/original_recording/recorded_sample_06.wav )
* [Jeden Gedanken kannst du hier loswerden.](./samples/original_recording/recorded_sample_07.wav )


### Dataset information :microphone:

* ljspeech-1.1 structure
@@ -83,7 +91,7 @@ As described in the pdf document ([evolution of thorsten dataset](./EvolutionOfT
If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you can see which files belong to which recording phase in [recording quality](./RecordingQuality.csv) csv file.


### Download information
### Download information (**neutral dataset**)
> Download size: 2,7GB

| Version | Description | Date | Link |
@@ -93,15 +101,49 @@ If you wanna use just a dataset subset (phase1 and/or phase2 and/or phase3) you
| thorsten-de-v03 | Based on v02 dataset, but with increased speed by 10% (using ffmpeg atempo=1.1). | 2021-02-10 | [Google Drive Download v03](https://drive.google.com/file/d/134_UramfCRoAxRrOnhbPJ2YHHTwxRtr-/view?usp=sharing) |
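
The v03 row above names ffmpeg's `atempo=1.1` audio filter, which raises the tempo without changing the pitch. The README does not give the exact command line that was used, so the following is only a sketch of how such a call could be assembled in Python; the helper names `atempo_command` and `sped_up_duration` and the file names are hypothetical.

```python
def atempo_command(src, dst, factor=1.1):
    """Build an ffmpeg argument list that applies the atempo audio filter.

    Assumption: plain input/output files; the author's real options are unknown.
    """
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={factor}", dst]


def sped_up_duration(seconds, factor=1.1):
    """Playback time after the tempo change: a 10% faster clip is 1/1.1 as long."""
    return seconds / factor


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(atempo_command("in.wav", "out.wav")))
    print(round(sped_up_duration(11.0), 2))
```

The argument list can be handed to `subprocess.run` on a machine where ffmpeg is installed. Note that a tempo factor of 1.1 shortens each clip to roughly 91% of its original length, not 90%.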


## Pretrained TTS models
## Dataset "Thorsten" (**emotional**)
### Samples of my emotional voice
*Btw. i mentioned, that i'm no professional voice talent, did i?*
> "Mist, wieder nichts geschafft."
* [neutral](./samples/emotional_recording/neutral.wav)
* [disgusted](./samples/emotional_recording/disgusted.wav)
* [angry](./samples/emotional_recording/angry.wav)
* [amused](./samples/emotional_recording/amused.wav)
* [surprised](./samples/emotional_recording/surprised.wav)
* [sleepy](./samples/emotional_recording/sleepy.wav)
### Emotional dataset information :microphone:
* 300 sentences * 6 emotions = 1.800 recordings
* recorded by Thorsten Müller (optimized by Dominik Kreutz)
* mono
* samplerate 22.050Hz
* normalized to -24dB
* no silence at beginning/ending
* sentence length: 59 - 148 chars
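
Two of the properties listed above (mono, 22,050 Hz sample rate) can be checked after download with Python's standard `wave` module. This is a minimal sketch and not part of the repository: `check_wav` and `write_silence` are hypothetical helper names, and the demo uses a synthetic file instead of a real recording. Loudness normalization and silence trimming are not checked here.

```python
import os
import tempfile
import wave

EXPECTED_CHANNELS = 1      # "mono" in the list above
EXPECTED_RATE = 22050      # "samplerate 22.050Hz" in the list above


def check_wav(path):
    """Return (matches_spec, channels, sample_rate) for one WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    ok = channels == EXPECTED_CHANNELS and rate == EXPECTED_RATE
    return ok, channels, rate


def write_silence(path, seconds=0.1, rate=22050, channels=1):
    """Write a short silent 16-bit WAV file (demo stand-in for a recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "demo.wav")
        write_silence(sample)
        print(check_wav(sample))
```

To check a downloaded copy, `check_wav` could be called for every `.wav` file in the extracted folder; the folder layout is not described in this section, so that loop is left out.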

| Emotion | Minutes |
|-----------|---------|
| Normal :slightly_smiling_face: | 19 min. |
| Disgusted :nauseated_face: | 23 min. |
| Angry :angry: | 20 min. |
| Amused :grinning: | 18 min. |
| Surprised :astonished: | 18 min. |
| Sleepy :pensive: | 30 min. |
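
The per-emotion minutes in the table can be added up to estimate the total size of the emotional dataset. The sketch below only uses the numbers stated above (300 sentences, 6 emotions, minutes per emotion); since the table values are rounded to whole minutes, the results are approximate.

```python
# Minutes per emotion, copied from the table above.
MINUTES_PER_EMOTION = {
    "normal": 19,
    "disgusted": 23,
    "angry": 20,
    "amused": 18,
    "surprised": 18,
    "sleepy": 30,
}
SENTENCES_PER_EMOTION = 300

total_recordings = SENTENCES_PER_EMOTION * len(MINUTES_PER_EMOTION)
total_minutes = sum(MINUTES_PER_EMOTION.values())
hours, minutes = divmod(total_minutes, 60)
average_seconds = total_minutes * 60 / total_recordings

print(total_recordings)            # 1800
print(total_minutes)               # 128
print(hours, minutes)              # 2 8
print(round(average_seconds, 1))   # 4.3
```

So the emotional dataset holds roughly 2 hours of audio, with an average of a little over 4 seconds per recording.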

### Download **emotional** dataset
> Download size: 300MB

| Version | Description | Date | Link |
| --------------- | ------------------------------------------------------------------------------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------- |
| thorsten-de-emotional-v01 | Initial version | 2021-04-03 | [Google Drive Download v01](https://drive.google.com/file/d/1fm2IqXMLr6jaZCgG_Mt4vq_O3ZubiIQ6/view?usp=sharing) |


# Pretrained TTS models
If you trained a model on "Thorsten" dataset please file an issue with some information on it. Sharing a trained model is highly appreciated.

My personal training sessions are based on TTS repo code (originally initiated by Mozilla) and now maintained through https://www.coqui.ai (:frog:)

### Trained models
> training in progress. Available models will be listed here in future.

### Pre-trained Silero-models
## Coqui models
todo
## Silero-models

You can use a free A-GPL licensed models trained on this dataset via the [silero-models](https://github.com/snakers4/silero-models) project. The full list of models including their older version is available via this [yaml file](https://github.com/snakers4/silero-models/blob/master/models.yml).

@@ -111,6 +153,7 @@ You can use a free A-GPL licensed models trained on this dataset via the [silero
| thorsten_16khz | m | de | [8000](https://drive.google.com/drive/folders/1mpQCK5E_IqhcSurnYuGePJiJWL4ZL08z?usp=sharing) / [16000](https://drive.google.com/drive/folders/1tR6w4kgRS2JJ1TWZhwoFuU04Xkgo6YAs?usp=sharing) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |



# Feel free to file an issue if you ...
* have improvements on dataset
* use my TTS voice in your project(s)
@@ -156,3 +199,4 @@ Thank you Dominik (@domcross / https://github.com/domcross/)
We'll hear us in future :speaking_head:

Thorsten
(https://twitter.com/ThorstenVoice)
BIN samples/emotional_recording/amused.wav (normal file, binary file not shown)
BIN samples/emotional_recording/angry.wav (normal file, binary file not shown)
BIN samples/emotional_recording/disgusted.wav (normal file, binary file not shown)
BIN samples/emotional_recording/neutral.wav (normal file, binary file not shown)
BIN samples/emotional_recording/sleepy.wav (normal file, binary file not shown)
BIN samples/emotional_recording/surprised.wav (normal file, binary file not shown)