talk.wasm

Talk with an Artificial Intelligence in your browser:

https://user-images.githubusercontent.com/1991296/203411580-fedb4839-05e4-4474-8364-aaf1e9a9b615.mp4

Online demo: https://whisper.ggerganov.com/talk/

Terminal version: examples/talk

How does it work?

This demo leverages two modern neural network models to create a high-quality voice chat directly in your browser:

  • OpenAI's Whisper speech recognition model is used to process your voice and understand what you are saying
  • Upon receiving some voice input, the AI generates a text response using OpenAI's GPT-2 language model
  • The AI then vocalizes the response using the browser's Web Speech API
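The last step above can be sketched with the browser's Web Speech API (`speechSynthesis` and `SpeechSynthesisUtterance` are standard browser globals; the guard and the `speakResponse` helper name are illustrative, not taken from the demo's source):

```javascript
// Sketch of vocalizing a generated text response with the Web Speech API.
// speechSynthesis only exists in browsers, so guard for other runtimes.
function speakResponse(text) {
  if (typeof speechSynthesis === "undefined") {
    return false; // not running in a browser
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = "en-US"; // illustrative choice of voice language
  speechSynthesis.speak(utterance);
  return true;
}
```

In a browser, calling `speakResponse("Hello!")` queues the phrase for the system voice; outside a browser the function simply reports that no speech backend is available.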

All processing happens locally on your machine. Running these heavy neural network models in the browser is made possible by implementing them efficiently in C/C++ and using the browser's WebAssembly SIMD capabilities for extra performance.

In order to run the models, the web page first needs to download the model data (about 350 MB). The data is then cached by your browser and reused on future visits without being downloaded again.
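This kind of caching can be illustrated with the browser's Cache API (a hypothetical sketch: the cache name and the `fetchModelCached` helper are illustrative, not the demo's actual loader):

```javascript
// Sketch: download a model file once, then serve it from the browser
// cache on repeat visits. Falls back to a plain fetch where the Cache
// API is unavailable (e.g. outside a browser).
async function fetchModelCached(url, cacheName = "talk-wasm-models") {
  if (typeof caches === "undefined") {
    const res = await fetch(url); // no Cache API: plain download
    return new Uint8Array(await res.arrayBuffer());
  }
  const cache = await caches.open(cacheName);
  let res = await cache.match(url);
  if (!res) {
    res = await fetch(url);
    await cache.put(url, res.clone()); // store for future visits
  }
  return new Uint8Array(await res.arrayBuffer());
}
```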

Requirements

In order to run this demo efficiently, you need the following:

  • The latest Chrome or Firefox browser (Safari is not supported)
  • A desktop or laptop with a modern CPU (a mobile phone will likely not be powerful enough)
  • Phrases no longer than 10 seconds - this is the audio context of the AI
  • About 1.8 GB of free RAM - this is how much the web page uses
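To put the 10-second limit in numbers: Whisper consumes 16 kHz mono audio, so a maximal phrase corresponds to 160,000 samples (the constants below just restate that arithmetic; the variable names are illustrative):

```javascript
// Whisper expects 16 kHz mono PCM input; the demo records phrases of
// at most 10 seconds, i.e. 16000 * 10 = 160,000 samples per utterance.
const SAMPLE_RATE = 16000;     // Hz - Whisper's expected input rate
const MAX_PHRASE_SECONDS = 10; // the demo's audio context window
const maxSamples = SAMPLE_RATE * MAX_PHRASE_SECONDS;
// Stored as 32-bit floats, that is roughly 0.64 MB of audio per phrase.
```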

Note that this demo uses the smallest GPT-2 model, so the generated text responses are not always very good. The prompting strategy could also likely be improved for better results.

The demo is quite computationally heavy, so you need a fast CPU. Running transformer models like these in a browser is unusual - typically, they run on powerful GPUs.

Currently, mobile browsers do not support the fixed-width SIMD WebAssembly feature, so this demo cannot run on a phone or tablet. Hopefully this will be supported in the near future.

Todo

  • Better UI (contributions are welcome)
  • Better GPT-2 prompting

Build instructions

# build using Emscripten (v3.1.2)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
mkdir build-em && cd build-em
emcmake cmake ..
make -j

# copy the produced page to your HTTP path
cp bin/talk.wasm/*       /path/to/html/
cp bin/libtalk.worker.js /path/to/html/

Feedback

If you have any comments or ideas for improvement, please drop a comment in the following discussion:

https://github.com/ggerganov/whisper.cpp/discussions/167