docs : convert README_sycl.md to utf8 format [no ci] (#3191)

This commit updates the README_sycl.md file to use UTF-8 encoding.

The motivation for this is that, although the file displays correctly on
GitHub, it fails to render with tools that expect UTF-8 encoding, for
example when using `grip` to view the file locally.
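
As an illustration of the kind of failure described above, here is a minimal sketch of how a file's bytes could be checked for UTF-8 validity and re-encoded. The commit does not state the file's original encoding; Latin-1 is assumed here purely for the example, and the helper names `is_valid_utf8` and `to_utf8` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from an assumed source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# "Intel(R) oneAPI" with a real registered sign, encoded as Latin-1
# (assumption: the actual original encoding of the README is not known).
# In Latin-1 the sign is the single byte 0xAE, which is not valid UTF-8.
legacy = "Intel\u00ae oneAPI".encode("latin-1")
converted = to_utf8(legacy, "latin-1")

print(is_valid_utf8(legacy))
print(is_valid_utf8(converted))
```

A strict UTF-8 reader rejects the legacy bytes and accepts the converted ones, which is consistent with the rendering difference the commit message describes.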
Daniel Bevenius 2025-05-27 10:53:50 +02:00 committed by GitHub
parent 450de0787e
commit 2bb7694edb


@@ -1,249 +1,249 @@
# whisper.cpp for SYCL # whisper.cpp for SYCL
[Background](#background) [Background](#background)
[OS](#os) [OS](#os)
[Intel GPU](#intel-gpu) [Intel GPU](#intel-gpu)
[Linux](#linux) [Linux](#linux)
[Environment Variable](#environment-variable) [Environment Variable](#environment-variable)
[Known Issue](#known-issue) [Known Issue](#known-issue)
[Todo](#todo) [Todo](#todo)
## Background ## Background
SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators—such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17. SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators—such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17.
oneAPI is a specification that is open and standards-based, supporting multiple architecture types including but not limited to GPU, CPU, and FPGA. The spec has both direct programming and API-based programming paradigms. oneAPI is a specification that is open and standards-based, supporting multiple architecture types including but not limited to GPU, CPU, and FPGA. The spec has both direct programming and API-based programming paradigms.
Intel uses the SYCL as direct programming language to support CPU, GPUs and FPGAs. Intel uses the SYCL as direct programming language to support CPU, GPUs and FPGAs.
To avoid re-inventing the wheel, this code refers other code paths in llama.cpp (like OpenBLAS, cuBLAS, CLBlast). We use a open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (Commercial release [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) migrate to SYCL. To avoid re-inventing the wheel, this code refers other code paths in llama.cpp (like OpenBLAS, cuBLAS, CLBlast). We use a open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (Commercial release [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) migrate to SYCL.
The whisper.cpp for SYCL is used to support Intel GPUs. The whisper.cpp for SYCL is used to support Intel GPUs.
For Intel CPU, recommend to use whisper.cpp for X86 (Intel MKL build). For Intel CPU, recommend to use whisper.cpp for X86 (Intel MKL build).
## OS ## OS
|OS|Status|Verified| |OS|Status|Verified|
|-|-|-| |-|-|-|
|Linux|Support|Ubuntu 22.04| |Linux|Support|Ubuntu 22.04|
|Windows|Ongoing| | |Windows|Ongoing| |
## Intel GPU ## Intel GPU
|Intel GPU| Status | Verified Model| |Intel GPU| Status | Verified Model|
|-|-|-| |-|-|-|
|Intel Data Center Max Series| Support| Max 1550| |Intel Data Center Max Series| Support| Max 1550|
|Intel Data Center Flex Series| Support| Flex 170| |Intel Data Center Flex Series| Support| Flex 170|
|Intel Arc Series| Support| Arc 770| |Intel Arc Series| Support| Arc 770|
|Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake| |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
|Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7| |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|
## Linux ## Linux
### Setup Environment ### Setup Environment
1. Install Intel GPU driver. 1. Install Intel GPU driver.
a. Please install Intel GPU driver by official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html). a. Please install Intel GPU driver by official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html).
Note: for iGPU, please install the client GPU driver. Note: for iGPU, please install the client GPU driver.
b. Add user to group: video, render. b. Add user to group: video, render.
``` ```
sudo usermod -aG render username sudo usermod -aG render username
sudo usermod -aG video username sudo usermod -aG video username
``` ```
Note: re-login to enable it. Note: re-login to enable it.
c. Check c. Check
``` ```
sudo apt install clinfo sudo apt install clinfo
sudo clinfo -l sudo clinfo -l
``` ```
Output (example): Output (example):
``` ```
Platform #0: Intel(R) OpenCL Graphics Platform #0: Intel(R) OpenCL Graphics
`-- Device #0: Intel(R) Arc(TM) A770 Graphics `-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #0: Intel(R) OpenCL HD Graphics Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49] `-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49]
``` ```
2. Install Intel® oneAPI Base toolkit. 2. Install Intel® oneAPI Base toolkit.
a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html). a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
Recommend to install to default folder: **/opt/intel/oneapi**. Recommend to install to default folder: **/opt/intel/oneapi**.
Following guide use the default folder as example. If you use other folder, please modify the following guide info with your folder. Following guide use the default folder as example. If you use other folder, please modify the following guide info with your folder.
b. Check b. Check
``` ```
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
sycl-ls sycl-ls
``` ```
There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**. There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
Output (example): Output (example):
``` ```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000] [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000] [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50] [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918] [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
``` ```
2. Build locally: 2. Build locally:
``` ```
mkdir -p build mkdir -p build
cd build cd build
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
#for FP16 #for FP16
#cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON #cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON
#for FP32 #for FP32
cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
#build example/main only #build example/main only
#cmake --build . --config Release --target main #cmake --build . --config Release --target main
#build all binary #build all binary
cmake --build . --config Release -v cmake --build . --config Release -v
``` ```
or or
``` ```
./examples/sycl/build.sh ./examples/sycl/build.sh
``` ```
Note: Note:
- By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only. - By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only.
### Run ### Run
1. Put model file to folder **models** 1. Put model file to folder **models**
2. Enable oneAPI running environment 2. Enable oneAPI running environment
``` ```
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
3. List device ID 3. List device ID
Run without parameter: Run without parameter:
``` ```
./build/bin/ls-sycl-device ./build/bin/ls-sycl-device
or or
./build/bin/main ./build/bin/main
``` ```
Check the ID in startup log, like: Check the ID in startup log, like:
``` ```
found 4 SYCL devices: found 4 SYCL devices:
Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3, Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2, Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280 max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0, Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280 max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0, Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
``` ```
|Attribute|Note| |Attribute|Note|
|-|-| |-|-|
|compute capability 1.3|Level-zero running time, recommended | |compute capability 1.3|Level-zero running time, recommended |
|compute capability 3.0|OpenCL running time, slower than level-zero in most cases| |compute capability 3.0|OpenCL running time, slower than level-zero in most cases|
4. Set device ID and execute whisper.cpp 4. Set device ID and execute whisper.cpp
Set device ID = 0 by **GGML_SYCL_DEVICE=0** Set device ID = 0 by **GGML_SYCL_DEVICE=0**
``` ```
GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav
``` ```
or run by script: or run by script:
``` ```
./examples/sycl/run_whisper.sh ./examples/sycl/run_whisper.sh
``` ```
5. Check the device ID in output 5. Check the device ID in output
Like: Like:
``` ```
Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
``` ```
## Environment Variable ## Environment Variable
#### Build #### Build
|Name|Value|Function| |Name|Value|Function|
|-|-|-| |-|-|-|
|WHISPER_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, WHISPER_SYCL=ON is mandatory.| |WHISPER_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, WHISPER_SYCL=ON is mandatory.|
|WHISPER_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path.For FP32, do not set it.| |WHISPER_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path.For FP32, do not set it.|
|CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path| |CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path|
|CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path| |CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path|
#### Running #### Running
|Name|Value|Function| |Name|Value|Function|
|-|-|-| |-|-|-|
|GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output| |GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output|
|GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG| |GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG|
## Known Issue ## Known Issue
- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`. - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
Miss to enable oneAPI running environment. Miss to enable oneAPI running environment.
Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`. Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`.
- Hang during startup - Hang during startup
llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block. llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.
Solution: add **--no-mmap**. Solution: add **--no-mmap**.
## Todo ## Todo
- Support to build in Windows. - Support to build in Windows.
- Support multiple cards. - Support multiple cards.
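
The `sycl-ls` check in the README's setup steps (look for at least one `ext_oneapi_level_zero:gpu` entry) could be scripted. The following is a minimal sketch that parses text in the format of the example output shown in the README; the function name `level_zero_gpus` and the regular expression are assumptions made for illustration and are not part of whisper.cpp or oneAPI. The index it returns is the one printed by `sycl-ls`, which is not necessarily the ID expected by `GGML_SYCL_DEVICE`.

```python
import re

# Sample text copied from the example sycl-ls output in the README.
SAMPLE = """\
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
"""

# Matches a leading tag of the form [backend:kind:index] followed by a description.
TAG = re.compile(
    r"^\[(?P<backend>[^:\]]+):(?P<kind>[^:\]]+):(?P<index>\d+)\]\s*(?P<desc>.*)$"
)


def level_zero_gpus(text: str) -> list[tuple[int, str]]:
    """Return (index, description) for every Level-Zero GPU line in the text."""
    found = []
    for line in text.splitlines():
        match = TAG.match(line.strip())
        if match is None:
            continue  # not a device line
        if match["backend"] == "ext_oneapi_level_zero" and match["kind"] == "gpu":
            found.append((int(match["index"]), match["desc"]))
    return found


for index, desc in level_zero_gpus(SAMPLE):
    print(index, desc)
```

An empty result from such a check would correspond to the case the README warns about, where no Level-Zero device is visible after sourcing `setvars.sh`.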