Update README.md and finalize the whisper.wasm example

2025-08-17 14:52:00 +02:00 · 2022-10-22 18:17:08 +03:00
parent 491ecd7056
commit 6b45e37b2b
7 changed files with 39 additions and 6 deletions
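
A minimal sketch of the two numbers the new README quotes: the 120-second audio limit and the 2x-3x real-time speed. The 16 kHz sample rate comes from `new AudioContext({sampleRate: 16000})` in `index-tmpl.html`; the names `kMaxAudioSec`, `clampAudio`, and `realTimeFactor` are hypothetical and not taken from this commit.

```javascript
// Assumption: helper and constant names are illustrative only.
// Only the 16 kHz rate and the 120 s cap come from this commit.
const kSampleRate = 16000;  // matches new AudioContext({sampleRate: 16000})
const kMaxAudioSec = 120;   // maximum audio length stated in the README

// Truncate mono PCM samples (Float32Array) to the maximum supported length.
function clampAudio(samples) {
    const maxSamples = kMaxAudioSec * kSampleRate;
    return samples.length > maxSamples ? samples.subarray(0, maxSamples) : samples;
}

// Speed relative to real time: audio duration divided by processing time.
function realTimeFactor(audioSec, processingSec) {
    return audioSec / processingSec;
}
```

With these definitions, transcribing 60 seconds of audio in 20 to 30 seconds corresponds to a factor between 3 and 2, which is the range the README describes.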
--- a/examples/whisper.wasm/README.md
+++ b/examples/whisper.wasm/README.md
@@ -1,3 +1,27 @@
 # whisper.wasm

-Live demo: https://whisper.ggerganov.com
+Inference of [OpenAI's Whisper ASR model](https://github.com/openai/whisper) inside the browser
+
+This example uses a WebAssembly (WASM) port of the [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
+implementation of the transformer to run the inference inside a web page. The audio data does not leave your computer -
+it is processed locally on your machine. The performance is not great, but you should be able to achieve 2x or 3x
+real-time speed for the `tiny` and `base` models on a modern CPU and browser (i.e. transcribe 60 seconds of audio in
+about 20-30 seconds).
+
+This WASM port utilizes [WASM SIMD 128-bit intrinsics](https://emcc.zcopy.site/docs/porting/simd/) so you have to make
+sure that [your browser supports them](https://webassembly.org/roadmap/).
+
+The example is capable of running all models up to size `small` inclusive. Beyond that, the memory requirements and
+performance are unsatisfactory. The implementation currently supports only the `Greedy` sampling strategy. Both
+transcription and translation are supported.
+
+Since the model data is quite big (74 MB for the `tiny` model), you need to manually load the model into the web page.
+
+The example supports both loading audio from a file and recording audio from the microphone. The maximum length of the
+audio is limited to 120 seconds.
+
+## Live demo
+
+Link: https://whisper.ggerganov.com
+
+![image](https://user-images.githubusercontent.com/1991296/197348344-1a7fead8-3dae-4922-8b06-df223a206603.png)
--- a/examples/whisper.wasm/index-tmpl.html
+++ b/examples/whisper.wasm/index-tmpl.html
@@ -162,7 +162,7 @@
                </tr>
            </table>

-            <br><br>
+            <br>

            <!-- textarea with height filling the rest of the page -->
            <textarea id="output" rows="20"></textarea>
@@ -254,6 +254,10 @@
                return new type(buffer);
            }

+            //
+            // load model
+            //
+
            function loadFile(event, fname) {
                var file = event.target.files[0] || null;
                if (file == null) {
@@ -281,6 +285,10 @@
                reader.readAsArrayBuffer(file);
            }

+            //
+            // audio file
+            //
+
            function loadAudio(event) {
                if (!context) {
                    context = new AudioContext({sampleRate: 16000});
@@ -327,7 +335,7 @@
            }

            //
-            // Microphone
+            // microphone
            //

            var mediaRecorder = null;