mirror of https://github.com/ggerganov/whisper.cpp.git (synced 2025-01-14 01:48:32 +01:00)
talk.wasm : final touches
parent 9aea96f774
commit 326573de9a
@@ -1,8 +1,8 @@
 # talk.wasm

-Talk with an Artificial Intelligence entity in your browser:
+Talk with an Artificial Intelligence in your browser:

-https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4
+https://user-images.githubusercontent.com/1991296/203411580-fedb4839-05e4-4474-8364-aaf1e9a9b615.mp4

 Online demo: https://talk.ggerganov.com

@@ -14,13 +14,12 @@ This demo leverages 2 modern neural network models to create a high-quality voic
 - Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
 - The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)

-The web page does the processing locally on your machine. However, in order to run the models, it first needs to
-download the model data which is about ~350 MB. The model data is then cached in your browser's cache and can be reused
-in future visits without downloading it again.
+The web page does the processing locally on your machine. The processing of these heavy neural network models in the
+browser is possible by implementing them efficiently in C/C++ and using the browser's WebAssembly SIMD capabilities for
+extra performance. For more detailed information, checkout the [current repository](https://github.com/ggerganov/whisper.cpp).

-The processing of these heavy neural network models in the browser is possible by implementing them efficiently in C/C++
-and using WebAssembly SIMD capabilities for extra performance. For more detailed information, checkout the
-[current repository](https://github.com/ggerganov/whisper.cpp).
+In order to run the models, the web page first needs to download the model data which is about ~350 MB. The model data
+is then cached in your browser's cache and can be reused in future visits without downloading it again.

 ## Requirements

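The download-once behavior described in this hunk can be pictured with a small sketch. It is not the demo's actual code: `loadModelOnce`, `store`, and `download` are hypothetical names, the URL and payload are made up, and a plain `Map` stands in for the browser's persistent storage so the snippet runs outside a browser.

```js
// Hypothetical helper, not taken from talk.wasm: return the cached model data
// for `url`, calling `download` only when the store has no entry yet.
function loadModelOnce(url, store, download) {
  if (!store.has(url)) {
    store.set(url, download(url)); // first visit: fetch and remember
  }
  return store.get(url); // later visits: reuse without downloading
}

// A Map stands in for the browser cache (assumption for illustration only).
const store = new Map();
let downloads = 0;
const fakeDownload = (url) => {
  downloads += 1; // count how often the "network" is hit
  return new Uint8Array([1, 2, 3]); // placeholder for ~350 MB of model data
};

const first = loadModelOnce("https://example.com/model.bin", store, fakeDownload);
const second = loadModelOnce("https://example.com/model.bin", store, fakeDownload);
console.log("downloads:", downloads, "same data reused:", first === second);
```

In a real page the store would need to survive reloads (for example IndexedDB or the Cache API), and `download` would be asynchronous; storing the returned promise in the same way keeps the logic unchanged.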
@@ -37,8 +36,13 @@ Also, the prompting strategy can likely be improved to achieve better results.
 The demo is quite computationally heavy - it's not usual to run these transformer models in a browser. Typically, they
 run on powerful GPU hardware. So for better experience, you do need to have a powerful computer.

-Probably in the near future, mobile browsers will start to support the WASM SIMD capabilities and this will allow
-to run the demo on your phone or tablet. But for now it seems to be not supported (at least on iPhone).
+Probably in the near future, mobile browsers will start supporting WASM SIMD. This will allow to run the demo on your
+phone or tablet. But for now this functionality is not supported on mobile devices (at least not on iPhone).

+## Todo
+
+- Better UI (contributions are welcome)
+- Better GPT-2 prompting
+
 ## Feedback

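Since the README ties the demo to WASM SIMD support, a page can probe for it before loading anything heavy. The sketch below is an assumption about how such a check could look, not the check used by talk.wasm: `simdProbe` and `supportsWasmSimd` are hypothetical names, and the bytes encode a tiny module whose only function uses SIMD instructions, so `WebAssembly.validate` accepts it only when fixed-width SIMD is available.

```js
// Tiny WebAssembly module used purely as a feature probe (illustrative sketch).
const simdProbe = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // "\0asm" magic number and version 1
  1, 5, 1, 96, 0, 1, 123,      // type section: one function type () -> v128
  3, 2, 1, 0,                  // function section: one function of type 0
  10, 10, 1, 8, 0,             // code section: one body, 8 bytes, no locals
  65, 0,                       //   i32.const 0
  253, 15,                     //   i8x16.splat
  253, 98,                     //   i8x16.popcnt
  11,                          //   end
]);

// Returns true when the engine validates a module containing SIMD opcodes.
function supportsWasmSimd() {
  return typeof WebAssembly === "object" && WebAssembly.validate(simdProbe);
}

console.log("WASM SIMD supported:", supportsWasmSimd());
```

Validation does not compile or run the module, so the probe is cheap enough to execute on page load and to gate the model download on its result.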
@@ -137,6 +137,16 @@
 <li>Your browser supports WASM <a href="https://webassembly.org/roadmap/">Fixed-width SIMD</a></li>
 </ul>

+Note that these neural network models were not meant to be used in a browser, so the performance and <br>
+quality of the results may not be optimal. If you have any questions or suggestions, checkout the following
+<a href="https://github.com/ggerganov/whisper.cpp/discussions/167">discussion</a>.
+
+<br><br>
+
+Here is a short video of the demo in action: <a href="https://youtu.be/2om-7tFMaNs">https://youtu.be/2om-7tFMaNs</a>
+
+<br><br>
+
 <div class="cell-version">
 <span>

@@ -230,6 +240,8 @@
 }
 }
 }
+
+onPromptChange();
 }
 };

@@ -487,6 +499,7 @@
     doRecording = false;
     audio0 = null;
     audio = null;
+    context = null;
 }

 function startRecording() {
@@ -519,6 +532,9 @@
 reader.onload = function(event) {
     var buf = new Uint8Array(reader.result);

+    if (!context) {
+        return;
+    }
     context.decodeAudioData(buf.buffer, function(audioBuffer) {
         var offlineContext = new OfflineAudioContext(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
         var source = offlineContext.createBufferSource();
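The two recording hunks appear to work together: one clears `context` and the other bails out of the load callback when `context` is gone, which avoids calling `decodeAudioData` on `null` if a pending read completes late. The sketch below illustrates that pattern outside the browser. Only the name `context` and the shape of the guard come from the diff; `onRecordingLoaded`, the fake `context` object, and the `decoded` array are hypothetical stand-ins, and the reduced `stopRecording` keeps just the one assignment relevant here.

```js
// Stand-in for the page's AudioContext (assumption: only decodeAudioData is needed).
let context = {
  decodeAudioData(buffer, onDecoded) {
    onDecoded({ length: buffer.byteLength }); // pretend decoding succeeded
  },
};
const decoded = [];

// Mirrors the effect of the added `context = null;` line.
function stopRecording() {
  context = null;
}

// Hypothetical version of the load callback with the guard from the diff.
function onRecordingLoaded(arrayBuffer) {
  if (!context) {
    return false; // recording was stopped before the data arrived
  }
  context.decodeAudioData(arrayBuffer, (audioBuffer) => {
    decoded.push(audioBuffer.length);
  });
  return true;
}

const ranWhileActive = onRecordingLoaded(new ArrayBuffer(8));
stopRecording();
const ranAfterStop = onRecordingLoaded(new ArrayBuffer(8));
console.log(ranWhileActive, ranAfterStop, decoded);
```

Without the guard, the second call would throw a `TypeError` because `context` is `null` by the time the callback runs.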
@@ -695,9 +711,9 @@ I'm fine, thanks. How are you?\n\
 Thanks, I'm fine too. What are you doing?\n\
 I'm just sitting here.\n\
 It's a lovely day, isn't it?\n\
-Yes, it is.\n\
-Did you know that I'm a robot?\n\
-I wasn't aware of that.\n";
+Yes, it is. I love the weather this time of year.\n\
+I wish it would rain a little bit.\n\
+Me too.\n";
 break;
 case '1':
 // Robot