mirror of
https://github.com/thorstenMueller/Thorsten-Voice.git
synced 2025-02-17 18:20:55 +01:00
First samples
parent
a77b919eaf
commit
84f791f106
@ -1,67 +1,56 @@
> english version below

# Vocoder comparison based on the "thorsten" Tacotron 2 model

Here are audio samples produced with different vocoders. The spoken texts of samples 1 to 4 are sentences that were also recorded for the dataset. However, the audio is not derived from the spectrogram of the "ground truth" recording but generated from the trained Tacotron 2 model. Sample 5 is the beginning of the fairy tale "Der Froschkönig" and was not recorded for the dataset.

## Sentences

* **Sample #01**: Eure Schoko-Bonbons sind sagenhaft lecker!
* **Sample #02**: Eure Tröte nervt.
* **Sample #03**: Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.
* **Sample #04**: Euer Plan hat ja toll geklappt.
* **Sample #05**: "In den alten Zeiten, wo das Wünschen noch geholfen hat, lebte ein König, dessen Töchter waren alle schön ..." (beginning of "Der Froschkönig")

# Ground truth

Original recordings from the "thorsten" dataset.

# Audio comparison

<dl>
<ul>
<li>Sample #01 - Eure Schoko-Bonbons sind sagenhaft lecker!: <audio controls="" preload="none"><source src="../samples/sample01-gt.wav"></audio></li>
<li>Sample #02 - Eure Tröte nervt.: <audio controls="" preload="none"><source src="../samples/sample02-gt.wav"></audio></li>
<li>Sample #03 - Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.: <audio controls="" preload="none"><source src="../samples/sample03-gt.wav"></audio></li>
<li>Sample #04 - Euer Plan hat ja toll geklappt.: <audio controls="" preload="none"><source src="../samples/sample04-gt.wav"></audio></li>
</ul>
</dl>
<table width="80%">
<thead>
<tr>
<th >Vocoder</th>
<th >RealTimeFactor</th>
<th >Sample #1</th>
<th >Sample #2</th>
<th >Sample #3</th>
<th >Sample #4</th>
<th >Details</th>
</tr>
</thead>
<tbody>
<tr>
<td >Ground truth</td>
<td > --- </td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
</tr>
<tr>
<td >Griffin-Lim</td>
<td >0.5</td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
</tr>
<tr>
<td >ParallelWaveGAN</td>
<td >0.6</td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
</tr>
<tr>
<td >WaveGrad</td>
<td >0.7</td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
</tr>
<tr>
<td >HifiGAN</td>
<td >1.5</td>
<td ><audio controls="" preload="none"><source src="./sample.wav"></audio></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
</tr>
</tbody>
</table>

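
The `RealTimeFactor` column in the table is not defined on this page. A common convention, assumed here, is the time spent synthesizing divided by the duration of the generated audio, so values below 1 mean faster than real time. The sketch below uses only the Python standard library; the function names `wav_duration_seconds` and `real_time_factor` are illustrative and not part of any TTS toolkit.

```python
import wave


def wav_duration_seconds(wav_file) -> float:
    """Duration of a PCM WAV file, given a path or a binary file-like object."""
    with wave.open(wav_file, "rb") as wav:
        # Number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: synthesis time / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

Under this assumed definition, a clip of 4 seconds that took 2 seconds to synthesize gives `real_time_factor(2.0, 4.0) == 0.5`. The duration can be read from a generated file such as `samples/sample01-pwgan.wav` with `wav_duration_seconds`; the synthesis time has to be measured separately around the vocoder call.
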
# Griffin-Lim

> Model details: (todo: link)
> Tacotron2 + DDC: trained for 460k steps

# ParallelWaveGAN

> Tacotron2 + DDC: trained for 360k steps
> PWGAN Vocoder: trained for 925k steps

<dl>
<ul>
<li>Sample #01 - Eure Schoko-Bonbons sind sagenhaft lecker!:
<audio controls="" preload="none"><source src="../samples/sample01-pwgan.wav"></audio></li>
<li>Sample #02 - Eure Tröte nervt.:
<audio controls="" preload="none"><source src="../samples/sample02-pwgan.wav"></audio></li>
<li>Sample #03 - Europa und Asien zusammengenommen wird auch als Eurasien bezeichnet.:
<audio controls="" preload="none"><source src="../samples/sample03-pwgan.wav"></audio></li>
<li>Sample #04 - Euer Plan hat ja toll geklappt.:
<audio controls="" preload="none"><source src="../samples/sample04-pwgan.wav"></audio></li>
<li>Sample #05 - Beginning of "Der Froschkönig":
<audio controls="" preload="none"><source src="../samples/sample05-pwgan.wav"></audio></li>
</ul>
</dl>

# WaveGrad

> todo

# HifiGAN

> todo


Binary files (not shown):
samples/sample01-gt.wav
samples/sample01-pwgan.wav
samples/sample02-gt.wav
samples/sample02-pwgan.wav
samples/sample03-gt.wav
samples/sample03-pwgan.wav
samples/sample04-gt.wav
samples/sample04-pwgan.wav
samples/sample05-pwgan.wav