11 Performance Testing
mmcclaskey edited this page 2021-09-16 03:56:40 -04:00

Overview

KasmVNC includes a built-in benchmarking feature that runs the following isolated, single-threaded tests. The tests use source images at a fixed resolution of 1600x1200.

  • Jpeg compression at quality 8 (64 runs)
  • Jpeg compression at quality 4 (64 runs)
  • Webp compression at quality 8 (8 runs)
  • Webp compression at quality 4 (16 runs)
  • Nearest scaling to 80% (64 runs)
  • Nearest scaling to 40% (64 runs)
  • Bilinear scaling to 80% (64 runs)
  • Bilinear scaling to 40% (64 runs)
  • Progressive bilinear scaling to 80% (64 runs)
  • Progressive bilinear scaling to 40% (64 runs)
  • Analysis (64 runs) (incl. memcpy overhead)
  • Analysis w/ vertical scroll detection (64 runs) (incl. memcpy overhead)
  • Analysis w/ horizontal and vertical scroll detection (32 runs) (incl. memcpy overhead)

Purpose

The purpose of these tests is to provide highly isolated and repeatable benchmarks for functions whose performance we intend to improve over time. For example, we are working on vectorizing sections of code for AVX and SSE extensions, and we want a high level of confidence in our testing and a historical record of our improvements.

Running

All our testing is done on an AWS c5.large instance running Ubuntu 18.04 LTS.

ubuntu@hostname:~$ sudo apt-get update
ubuntu@hostname:~$ sudo dpkg -i kasmvncserver_bionic_*.deb
ubuntu@hostname:~$ sudo apt-get install -f
ubuntu@hostname:~$ /usr/bin/Xvnc -interface 0.0.0.0 -selfBench :1

Xvnc KasmVNC 0.9 - built Sep  3 2021 14:29:59
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12010000, The X.Org Foundation

 vncext:      VNC extension running!
 vncext:      Listening for websocket connections on 0.0.0.0 interface(s), port 6800
 VNCServerST: CPU capability: SSE2 yes, AVX512f yes
 SelfBench:   Running micro-benchmarks (single-threaded, runs depending on task)
 SelfBench:   Jpeg compression at quality 8 took 3095 ms (64 runs)
 SelfBench:   Jpeg compression at quality 4 took 1325 ms (64 runs)
 SelfBench:   Webp compression at quality 8 took 3474 ms (8 runs)
 SelfBench:   Webp compression at quality 4 took 2410 ms (16 runs)
 SelfBench:   Nearest scaling to 80% took 237 ms (64 runs)
 SelfBench:   Nearest scaling to 40% took 59 ms (64 runs)
 SelfBench:   Bilinear scaling to 80% took 1272 ms (64 runs)
 SelfBench:   Bilinear scaling to 40% took 319 ms (64 runs)
 SelfBench:   Progressive bilinear scaling to 80% took 1272 ms (64 runs)
 SelfBench:   Progressive bilinear scaling to 40% took 818 ms (64 runs)
 SelfBench:   Analysis took 235 ms (64 runs) (incl. memcpy overhead)
 SelfBench:   Analysis w/ scroll detection took 646 ms (64 runs) (incl. memcpy overhead)
 SelfBench:   Analysis w/ horizontal scroll detection took 2852 ms (32 runs) (incl. memcpy overhead)
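Because the run count differs between tasks (8, 16, 32, or 64), the totals above are not directly comparable across tests. The following is a minimal sketch, not part of KasmVNC, that parses the SelfBench lines and derives a per-run time. The function name `parse_selfbench` and the regular expression are assumptions based only on the output format shown above.

```python
import re

# Matches result lines such as:
#  SelfBench:   Webp compression at quality 8 took 3474 ms (8 runs)
# The pattern is inferred from the sample output, not from the KasmVNC source.
LINE_RE = re.compile(
    r"SelfBench:\s+(?P<name>.+?) took (?P<ms>\d+) ms \((?P<runs>\d+) runs\)"
)

def parse_selfbench(log_text):
    """Return {test name: (total_ms, runs, ms_per_run)} for each result line."""
    results = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            total_ms = int(m.group("ms"))
            runs = int(m.group("runs"))
            results[m.group("name")] = (total_ms, runs, total_ms / runs)
    return results

# A few lines copied from the output above.
sample = """\
 SelfBench:   Running micro-benchmarks (single-threaded, runs depending on task)
 SelfBench:   Jpeg compression at quality 8 took 3095 ms (64 runs)
 SelfBench:   Webp compression at quality 8 took 3474 ms (8 runs)
 SelfBench:   Analysis took 235 ms (64 runs) (incl. memcpy overhead)
"""

parsed = parse_selfbench(sample)
for name, (total_ms, runs, per_run) in parsed.items():
    print(f"{name}: {per_run:.1f} ms/run")
```

In practice the output of the `-selfBench` run could be redirected to a file and passed to `parse_selfbench` as a string.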

Historical Progress

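All timings are in ms.

| Date     | Description                   | Commit  | Jpeg8 | Jpeg4 | Webp8 | Webp4 | NS80 | BS80 | PBS80 | Analysis | ScrollV | ScrollHV |
|----------|-------------------------------|---------|-------|-------|-------|-------|------|------|-------|----------|---------|----------|
| 20210909 | Initial Test                  | dc21d5f | 3071  | 1314  | 3464  | 2407  | 236  | 1271 | 1268  | 166      | 544     | 2743     |
| 20210909 | Vectorize Progressive scaling | 0cb2c0b | 3080  | 1308  | 3388  | 2429  | 215  | 1270 | 783   | 190      | 564     | 2656     |
| 20210909 | LibJPEG Turbo latest.         | 0cb2c0b | 2087  | 887   | 3386  | 2427  | 211  | 1281 | 782   | 196      | 602     | 2695     |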

Update Notes

  • Vectorize Progressive Scaling (20210909) - Manual vectorization of progressive bilinear scaling, which can be used in video mode. The vectorization reduced the run time of progressive bilinear scaling to 80% by 38% (1268 ms to 783 ms) and gave a 71% improvement for scaling to 40% of the original size.
  • LibJPEG Turbo (20210909) - This is a special build of KasmVNC that targets the latest version of libjpeg-turbo rather than the older version available in the respective OS repos. The latest version has optimizations for AVX and SSE2 extensions. Our testing shows a 32% reduction in JPEG encoding time (Jpeg8: 3080 ms to 2087 ms). We do not publicly release this build; however, it is used within Kasm Workspaces Docker images.
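The 38% and 32% figures in these notes correspond to the reduction in elapsed time between consecutive rows of the Historical Progress table. The sketch below reproduces that arithmetic and also shows the equivalent speedup factor, which is a different number. The helper names are hypothetical and the inputs are taken directly from the table.

```python
def time_reduction_pct(before_ms, after_ms):
    """Percentage by which elapsed time dropped, relative to the earlier run."""
    return (before_ms - after_ms) / before_ms * 100.0

def speedup_factor(before_ms, after_ms):
    """How many times faster the later run is than the earlier one."""
    return before_ms / after_ms

# (before, after) timings in ms, taken from the Historical Progress table.
pbs80 = (1268, 783)    # Initial Test -> Vectorize Progressive scaling
jpeg8 = (3080, 2087)   # Vectorize Progressive scaling -> LibJPEG Turbo latest
jpeg4 = (1308, 887)    # same pair of rows, quality 4

for label, (before, after) in [("PBS80", pbs80), ("Jpeg8", jpeg8), ("Jpeg4", jpeg4)]:
    print(
        f"{label}: {time_reduction_pct(before, after):.1f}% less time, "
        f"{speedup_factor(before, after):.2f}x speedup"
    )
```

Stating which of the two measures a figure refers to avoids ambiguity: a 38% reduction in time is the same change as a speedup of roughly 1.6x.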