Default Branch

31aea563a8 · whisper : fix extra memory usage (#2534) · Updated 2024-11-06 22:02:11 +01:00

Branches

Each entry lists the branch's latest commit, its message, the last update time, and how many commits the branch is behind and ahead of the default branch.

552419f2c0 · ggml : aligned malloc -> malloc · Updated 2024-10-31 20:40:11 +01:00 · 35 behind, 3 ahead

ceb77363cd · ggml : disable CUDA graphs for non-llama.cpp projects · Updated 2024-06-26 19:14:22 +02:00 · 290 behind, 1 ahead

267e15a46d · cuda : avoid async allocs in CUDA mel code · Updated 2024-06-12 08:52:15 +02:00 · 407 behind, 1 ahead

5801b8ac64 · cuda : fix HIPBLAS build · Updated 2024-06-11 18:13:43 +02:00 · 408 behind, 1 ahead

13c5446759 · Update ggml-cuda/mmvq.cu · Updated 2024-06-11 16:37:32 +02:00 · 410 behind, 2 ahead

059bcd3009 · ci : fix CUDA builds · Updated 2024-06-11 10:40:19 +02:00 · 410 behind, 1 ahead

ba69578828 · whisper : add whisper_token_count helper · Updated 2024-03-25 13:46:07 +01:00 · 554 behind, 2 ahead

66df44b0b7 · alloc : fix allocation data of pre-allocated leaves · Updated 2024-03-16 15:47:14 +01:00 · 563 behind, 2 ahead

f25edade2b · whisper : alternative way to handle the external encoders · Updated 2024-02-12 15:32:26 +01:00 · 690 behind, 2 ahead

15c4fdce45 · wchess : tuning performance · Updated 2023-11-30 09:50:47 +01:00 · 920 behind, 21 ahead

4260d4fc70 · wchess : minor · Updated 2023-11-28 14:10:18 +01:00 · 920 behind, 11 ahead

c8b3bc6a0d · cuda : use CUBLAS_COMPUTE_F32 instead of CUBLAS_COMPUTE_F16 · Updated 2023-11-27 10:57:07 +01:00 · 902 behind, 1 ahead

ee2971bf6a · bench : multi-thread memcpy · Updated 2023-11-21 20:57:07 +01:00 · 920 behind, 1 ahead

ec96d68402 · whisper : quantize encoder only · Updated 2023-11-16 15:19:02 +01:00 · 930 behind, 1 ahead

270b1e48db · cuda : sync llama.cpp fixes · Updated 2023-11-15 14:52:06 +01:00 · 942 behind, 14 ahead

5031f54717 · whisper : try to fix the parallel whisper_state functionality (#1479) · Updated 2023-11-12 13:52:38 +01:00 · 950 behind, 21 ahead

a2f3b82db3 · whisper : free backend instances in whisper_state · Updated 2023-11-12 13:31:51 +01:00 · 950 behind, 23 ahead

7a91a3ba60 · bench-all : add q4 models · Updated 2023-11-10 21:23:18 +01:00 · 950 behind, 16 ahead

bf4110dbcf · whisper : wip sched (not working yet) · Updated 2023-11-09 18:07:54 +01:00 · 955 behind, 2 ahead

40be74271f · models : update readme · Updated 2023-11-07 12:53:01 +01:00 · 961 behind, 4 ahead