Thorsten Müller a74179c72a

2020-08-09 11:39:15 +02:00

Introduction

Many smart voice assistants like Amazon Alexa, Google Home, Apple Siri and Microsoft Cortana use cloud services to offer their (base) functionality.

Because some people have privacy concerns about these services, several (open source) projects are trying to build offline and/or privacy-aware alternatives.

However, speech recognition and speech synthesis still require cloud services to deliver decent quality.

Mycroft AI

https://mycroft.ai/

Mycroft is a company developing an open source voice assistant with a very nice and active community. However, the STT/TTS parts are still cloud based (e.g. Google services), even though requests are anonymized by a Mycroft proxy in between. Integration with locally hosted services such as DeepSpeech (STT) or Mimic/Tacotron (TTS) is possible, though.

Mozilla

Mozilla works on two really important aspects of free and open human-machine voice interaction.

STT - speech to text

https://commonvoice.mozilla.org/

"STT" needs lots of audio training data from many speakers (women/men/kids) of all ages and dialects, recorded at various audio quality levels. So any voice contribution to the Common Voice project is highly welcome.

TTS - text to speech

https://github.com/mozilla/tts

"TTS" needs lots of clean recordings from a single speaker to train a model. Mozilla is developing a software stack for proper model training based on the Tacotron 2 papers.

And?!

I want to make the most personal contribution I can and donate my own voice (German) to the community for TTS training, free to use.

Please read some personal words before downloading the dataset

I contribute my voice as a person who believes in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome in any place on this planet, and where open and free knowledge and education are available to everyone.

So hopefully my voice is used in this manner to make this world a better place for all of us :-).

tl;dr Please don't use for evil!

Dataset "thorsten"

Samples of my voice

To give you an impression of what my voice sounds like, so you can decide whether it fits your project, I have published some sample recordings. There is no need to download the complete dataset first.

Dataset information

ljspeech-1.1 structure
22,668 recorded phrases (wav files)
more than 23 hours of pure audio
sample rate: 22,050 Hz
mono
phrase length (min/avg/max): 2 / 52 / 180 chars
no silence at beginning/end
avg spoken chars per second: 14
sentences with question mark: 2,780
sentences with exclamation mark: 1,840
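
As a rough way to check a downloaded copy against the figures above, here is a minimal Python sketch. It assumes the usual LJSpeech-1.1 layout: a pipe-delimited metadata.csv whose first field is the clip id and whose second field is the transcript, next to a folder of PCM wav files. The function names are hypothetical helpers and not part of this repository, and the exact file names inside the archive are an assumption.

```python
import csv
import wave


def read_metadata(metadata_path):
    """Return (clip_id, text) pairs from a pipe-delimited LJSpeech-style file."""
    rows = []
    with open(metadata_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: quotation marks inside transcripts are ordinary characters.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if len(fields) >= 2:  # skip empty or malformed lines
                rows.append((fields[0], fields[1]))
    return rows


def phrase_length_stats(rows):
    """Return (min, avg, max) transcript length in characters."""
    lengths = [len(text) for _, text in rows]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)


def wav_format(wav_path):
    """Return (sample_rate_hz, channels, duration_seconds) of a PCM wav file."""
    with wave.open(str(wav_path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate
```

Calling read_metadata on the extracted metadata file and passing the result to phrase_length_stats should give values close to the min/avg/max listed above, and wav_format on any clip should report a sample rate of 22050 and 1 channel if the copy matches this description.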

Interested in how this dataset evolved? See the following PDF document (evolution of thorsten dataset).

Download information

https://drive.google.com/file/d/1yKJM1LAOQpRVojKunD9r8WN_p5KzBxjc/view?usp=sharing

Download size: 2.7 GB

Trained tacotron2 model "thorsten"

Training is currently in progress.

If you have trained a model on the "thorsten" dataset, please file an issue with some information about it. Sharing a trained model is highly appreciated.

Trained models (with at least acceptable quality)

The "models" (sub)folders contain the configs and Dockerfiles for running a specific training from scratch.

Thanks to @erogol and @repodiac for bringing in ideas/code for the script/container files.

Folder	Date	Branch (Mozilla TTS repo)	Description
thorsten-taco2-v0.0.1	August 2020	dev	pure Tacotron 2 training without vocoder
thorsten-taco2-v0.0.2	to do	to do	to do

Feel free to file an issue if you ...

have improvements for the dataset
use my TTS voice in your project(s)
want to share your trained "thorsten" model
learn about any abuse of my voice

Special thanks

I want to thank all open source communities for providing great projects.

Just to name some nice guys who joined me on this TTS road trip:

eltocino (https://github.com/el-tocino/)
erogol (https://github.com/erogol/)
gras64 (https://github.com/gras64/)
krisgesling (https://github.com/krisgesling/)
nmstoker (https://github.com/nmstoker)
othiele (https://discourse.mozilla.org/u/othiele/summary)
repodiac (https://github.com/repodiac)

And last but not least, I want to say a huge thank you to a special guy who supported me on this journey right from the beginning. Not just with nice words, but with his time, audio optimization know-how and finally his GPU computing power.

Without his amazing support, this dataset (in its current form) would not exist.

Thank you Dominik (@domcross / https://github.com/domcross/)

Links

We'll hear from each other in the future :-)

Thorsten
