Thursday, June 10, 2021

Qt WebAssembly performance enhancement

SIMD (single instruction, multiple data) is a performance feature: one instruction operates on several data values at once, which makes certain workloads go faster (simply put). Kind of like sticking laughing gas in your petrol car's fuel line.

https://en.wikipedia.org/wiki/SIMD

Emscripten and WebAssembly now have better support for SIMD (to various degrees):

https://emscripten.org/docs/porting/simd.html


Chrome and Firefox also support SIMD (to various degrees).

So for Qt 6.3, I have been working to get Qt building and running using the SIMD instructions available to WebAssembly in the web browsers (sorry, Safari... catch up soon?).


Just configure the soon-to-be Qt 6.3 with the -sse2 argument (the change has not been reviewed or merged yet):

https://codereview.qt-project.org/c/qt/qtbase/+/343563

To see if it is actually worth adding SIMD support to Qt WebAssembly, I built a couple of Qt Quick benchmarks, namely the declarative particles benchmarks: affectors and emission.

I had to put the image and QML files into a .qrc resource file so that Qt WebAssembly could find them, as we have no real local file system access.


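The results are much better than I expected. Clearly, there is a performance boost from using SIMD in wasm: in the logs below, the per-iteration times drop by roughly 4x to 35x in Chrome and roughly 5x to 18x in Firefox.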
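
To put a number on that boost, the per-iteration times from a no-SIMD run and a SIMD run can simply be divided. The sketch below does this for four data points copied from the QtTest logs further down. The names TIMINGS, speedup and speedup_table are made up for this illustration, and since the logs come from debug builds, the factors are only a rough indication.

```python
# Per-iteration times in msecs, copied from the QtTest logs in this post.
# Key: (browser, benchmark, data tag) -> value: (no SIMD, SIMD)
TIMINGS = {
    ("chrome", "tst_affectors::test_basic", "16ms"): (0.29, 0.059),
    ("chrome", "tst_emission::test_basic", "500ms"): (21.0, 1.3),
    ("firefox", "tst_affectors::test_basic", "16ms"): (0.81, 0.14),
    ("firefox", "tst_emission::test_basic", "500ms"): (43.0, 3.6),
}


def speedup(no_simd_ms, simd_ms):
    """Return how many times faster the SIMD run is (baseline / SIMD)."""
    if simd_ms <= 0:
        raise ValueError("SIMD time must be positive")
    return no_simd_ms / simd_ms


def speedup_table(timings):
    """Map each benchmark key to its speedup factor, rounded to one decimal."""
    return {key: round(speedup(*pair), 1) for key, pair in timings.items()}


if __name__ == "__main__":
    for (browser, bench, tag), factor in speedup_table(TIMINGS).items():
        print(f"{browser:8} {bench}({tag}): {factor}x")
```

For these four data points the factors come out at 4.9x, 16.2x, 5.8x and 11.9x respectively.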

Someone else has had similar results with wasm SIMD:

https://robaboukhalil.medium.com/webassembly-and-simd-7a7daa4f2ecd

 

Next I want to expand the number and type of benchmarks, but this gives us early baseline results.

Chrome, no SIMD:

********* Start testing of tst_affectors *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_affectors::initTestCase()
Heap resize call from 16777216 to 20185088 took 0.09999999403953552 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0.10000000894069672 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0.10000000894069672 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.5 msecs. Success: true
PASS   : tst_affectors::test_basic(16ms)
RESULT : tst_affectors::test_basic():"16ms":
     0.29 msecs per iteration (total: 75, iterations: 256)
Heap resize call from 34930688 to 41943040 took 0.10000000894069672 msecs. Success: true
PASS   : tst_affectors::test_basic(32ms)
RESULT : tst_affectors::test_basic():"32ms":
     0.41 msecs per iteration (total: 53, iterations: 128)
Heap resize call from 41943040 to 50331648 took 0.29999999701976776 msecs. Success: true
PASS   : tst_affectors::test_basic(100ms)
RESULT : tst_affectors::test_basic():"100ms":
     0.87 msecs per iteration (total: 56, iterations: 64)
Heap resize call from 50331648 to 60424192 took 0.3999999910593033 msecs. Success: true
PASS   : tst_affectors::test_basic(500ms)
RESULT : tst_affectors::test_basic():"500ms":
     3.3 msecs per iteration (total: 53, iterations: 16)
Heap resize call from 60424192 to 72548352 took 0.19999998807907104 msecs. Success: true
PASS   : tst_affectors::test_filtered(16ms)
RESULT : tst_affectors::test_filtered():"16ms":
     0.84 msecs per iteration (total: 54, iterations: 64)
PASS   : tst_affectors::test_filtered(32ms)
RESULT : tst_affectors::test_filtered():"32ms":
     0.96 msecs per iteration (total: 62, iterations: 64)
Heap resize call from 72548352 to 87097344 took 0.20000000298023224 msecs. Success: true
PASS   : tst_affectors::test_filtered(100ms)
RESULT : tst_affectors::test_filtered():"100ms":
     1.3 msecs per iteration (total: 89, iterations: 64)
PASS   : tst_affectors::test_filtered(500ms)
RESULT : tst_affectors::test_filtered():"500ms":
     3.7 msecs per iteration (total: 60, iterations: 16)
PASS   : tst_affectors::cleanupTestCase()
Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 15037ms
********* Finished testing of tst_affectors *********

********* Start testing of tst_emission *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_emission::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
PASS   : tst_emission::test_basic(16ms)
RESULT : tst_emission::test_basic():"16ms":
     1.6 msecs per iteration (total: 53, iterations: 32)
Heap resize call from 29097984 to 34930688 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(32ms)
RESULT : tst_emission::test_basic():"32ms":
     3.1 msecs per iteration (total: 51, iterations: 16)
PASS   : tst_emission::test_basic(100ms)
RESULT : tst_emission::test_basic():"100ms":
     4.5 msecs per iteration (total: 73, iterations: 16)
PASS   : tst_emission::test_basic(500ms)
RESULT : tst_emission::test_basic():"500ms":
     21 msecs per iteration (total: 87, iterations: 4)
Heap resize call from 34930688 to 41943040 took 0.09999999403953552 msecs. Success: true
PASS   : tst_emission::test_basic(1000ms)
RESULT : tst_emission::test_basic():"1000ms":
     22 msecs per iteration (total: 89, iterations: 4)
PASS   : tst_emission::test_basic(10000ms)
RESULT : tst_emission::test_basic():"10000ms":
     23 msecs per iteration (total: 92, iterations: 4)
PASS   : tst_emission::cleanupTestCase()
Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 5398ms
********* Finished testing of tst_emission *********

======================================================================
======================================================================

Chrome, SIMD:


********* Start testing of tst_affectors *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_affectors::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.3999999761581421 msecs. Success: true
Heap resize call from 34930688 to 41943040 took 0.19999998807907104 msecs. Success: true
PASS   : tst_affectors::test_basic(16ms)
RESULT : tst_affectors::test_basic():"16ms":
     0.059 msecs per iteration (total: 61, iterations: 1024)
Heap resize call from 41943040 to 50331648 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(32ms)
RESULT : tst_affectors::test_basic():"32ms":
     0.11 msecs per iteration (total: 59, iterations: 512)
Heap resize call from 50331648 to 60424192 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(100ms)
RESULT : tst_affectors::test_basic():"100ms":
     0.15 msecs per iteration (total: 81, iterations: 512)
Heap resize call from 60424192 to 72548352 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(500ms)
RESULT : tst_affectors::test_basic():"500ms":
     0.58 msecs per iteration (total: 75, iterations: 128)
Heap resize call from 72548352 to 87097344 took 0.3999999761581421 msecs. Success: true
PASS   : tst_affectors::test_filtered(16ms)
RESULT : tst_affectors::test_filtered():"16ms":
     0.10 msecs per iteration (total: 52, iterations: 512)
Heap resize call from 87097344 to 104529920 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_filtered(32ms)
RESULT : tst_affectors::test_filtered():"32ms":
     0.12 msecs per iteration (total: 64, iterations: 512)
PASS   : tst_affectors::test_filtered(100ms)
RESULT : tst_affectors::test_filtered():"100ms":
     0.19 msecs per iteration (total: 51, iterations: 256)
Heap resize call from 104529920 to 125435904 took 0.20000001788139343 msecs. Success: true
PASS   : tst_affectors::test_filtered(500ms)
RESULT : tst_affectors::test_filtered():"500ms":
     0.61 msecs per iteration (total: 79, iterations: 128)
PASS   : tst_affectors::cleanupTestCase()
Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 9728ms
********* Finished testing of tst_affectors *********

********* Start testing of tst_emission *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_emission::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.29999998211860657 msecs. Success: true
PASS   : tst_emission::test_basic(16ms)
RESULT : tst_emission::test_basic():"16ms":
     0.046 msecs per iteration (total: 95, iterations: 2048)
Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
PASS   : tst_emission::test_basic(32ms)
RESULT : tst_emission::test_basic():"32ms":
     0.090 msecs per iteration (total: 93, iterations: 1024)
Heap resize call from 41943040 to 50331648 took 0.29999998211860657 msecs. Success: true
PASS   : tst_emission::test_basic(100ms)
RESULT : tst_emission::test_basic():"100ms":
     0.27 msecs per iteration (total: 70, iterations: 256)
Heap resize call from 50331648 to 60424192 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(500ms)
RESULT : tst_emission::test_basic():"500ms":
     1.3 msecs per iteration (total: 85, iterations: 64)
Heap resize call from 60424192 to 72548352 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(1000ms)
RESULT : tst_emission::test_basic():"1000ms":
     1.3 msecs per iteration (total: 87, iterations: 64)
PASS   : tst_emission::test_basic(10000ms)
RESULT : tst_emission::test_basic():"10000ms":
     1.3 msecs per iteration (total: 86, iterations: 64)
PASS   : tst_emission::cleanupTestCase()
Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 6017ms
********* Finished testing of tst_emission *********


Firefox has similar results:


Firefox, no SIMD:

    ********* Start testing of tst_affectors *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/opt/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5f3c99085d4c2ebf57fd0586b013b02e32a8e20b)), unknown unknown
    PASS   : tst_affectors::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
    PASS   : tst_affectors::test_basic(16ms)
    RESULT : tst_affectors::test_basic():"16ms":
         0.81 msecs per iteration (total: 52, iterations: 64)
    PASS   : tst_affectors::test_basic(32ms)
    RESULT : tst_affectors::test_basic():"32ms":
         1.1 msecs per iteration (total: 76, iterations: 64)
    PASS   : tst_affectors::test_basic(100ms)
    RESULT : tst_affectors::test_basic():"100ms":
         2.4 msecs per iteration (total: 79, iterations: 32)
    Heap resize call from 29097984 to 34930688 took 17 msecs. Success: true
    PASS   : tst_affectors::test_basic(500ms)
    RESULT : tst_affectors::test_basic():"500ms":
         9.1 msecs per iteration (total: 73, iterations: 8)
    PASS   : tst_affectors::test_filtered(16ms)
    RESULT : tst_affectors::test_filtered():"16ms":
         2.3 msecs per iteration (total: 74, iterations: 32)
    PASS   : tst_affectors::test_filtered(32ms)
    RESULT : tst_affectors::test_filtered():"32ms":
         2.6 msecs per iteration (total: 86, iterations: 32)
    PASS   : tst_affectors::test_filtered(100ms)
    RESULT : tst_affectors::test_filtered():"100ms":
         3.8 msecs per iteration (total: 62, iterations: 16)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
    PASS   : tst_affectors::test_filtered(500ms)
    RESULT : tst_affectors::test_filtered():"500ms":
         11 msecs per iteration (total: 88, iterations: 8)
    PASS   : tst_affectors::cleanupTestCase()
    Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 16781ms
    ********* Finished testing of tst_affectors *********    
        
     ********* Start testing of tst_emission *********
     Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/opt/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5f3c99085d4c2ebf57fd0586b013b02e32a8e20b)), unknown unknown
     PASS   : tst_emission::initTestCase()
     Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
     Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
     Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
     PASS   : tst_emission::test_basic(16ms)
     RESULT : tst_emission::test_basic():"16ms":
          2.1 msecs per iteration (total: 70, iterations: 32)
     PASS   : tst_emission::test_basic(32ms)
     RESULT : tst_emission::test_basic():"32ms":
          4.1 msecs per iteration (total: 67, iterations: 16)
     PASS   : tst_emission::test_basic(100ms)
     RESULT : tst_emission::test_basic():"100ms":
          8.1 msecs per iteration (total: 65, iterations: 8)
     PASS   : tst_emission::test_basic(500ms)
     RESULT : tst_emission::test_basic():"500ms":
          43 msecs per iteration (total: 87, iterations: 2)
     PASS   : tst_emission::test_basic(1000ms)
     RESULT : tst_emission::test_basic():"1000ms":
          43 msecs per iteration (total: 86, iterations: 2)
     PASS   : tst_emission::test_basic(10000ms)
     RESULT : tst_emission::test_basic():"10000ms":
          43 msecs per iteration (total: 86, iterations: 2)
     PASS   : tst_emission::cleanupTestCase()
     Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 4178ms
     ********* Finished testing of tst_emission *********
                
======================================================================
======================================================================
                
                                         
Firefox, SIMD:


    ********* Start testing of tst_affectors *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
    PASS   : tst_affectors::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
    Heap resize call from 29097984 to 34930688 took 0 msecs. Success: true
    PASS   : tst_affectors::test_basic(16ms)
    RESULT : tst_affectors::test_basic():"16ms":
         0.14 msecs per iteration (total: 73, iterations: 512)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
    Heap resize call from 41943040 to 50331648 took 1 msecs. Success: true
    PASS   : tst_affectors::test_basic(32ms)
    RESULT : tst_affectors::test_basic():"32ms":
         0.21 msecs per iteration (total: 54, iterations: 256)
    Heap resize call from 50331648 to 60424192 took 5 msecs. Success: true
    PASS   : tst_affectors::test_basic(100ms)
    RESULT : tst_affectors::test_basic():"100ms":
         0.45 msecs per iteration (total: 58, iterations: 128)
    PASS   : tst_affectors::test_basic(500ms)
    RESULT : tst_affectors::test_basic():"500ms":
         1.8 msecs per iteration (total: 60, iterations: 32)
    Heap resize call from 60424192 to 72548352 took 7 msecs. Success: true
    PASS   : tst_affectors::test_filtered(16ms)
    RESULT : tst_affectors::test_filtered():"16ms":
         0.28 msecs per iteration (total: 73, iterations: 256)
    Heap resize call from 72548352 to 87097344 took 0 msecs. Success: true
    PASS   : tst_affectors::test_filtered(32ms)
    RESULT : tst_affectors::test_filtered():"32ms":
         0.35 msecs per iteration (total: 90, iterations: 256)
    PASS   : tst_affectors::test_filtered(100ms)
    RESULT : tst_affectors::test_filtered():"100ms":
         0.60 msecs per iteration (total: 77, iterations: 128)
    Heap resize call from 87097344 to 104529920 took 0 msecs. Success: true
    PASS   : tst_affectors::test_filtered(500ms)
    RESULT : tst_affectors::test_filtered():"500ms":
         2.0 msecs per iteration (total: 66, iterations: 32)
    PASS   : tst_affectors::cleanupTestCase()
    Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 6411ms
    ********* Finished testing of tst_affectors *********
                                                
    ********* Start testing of tst_emission *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
    PASS   : tst_emission::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
    Heap resize call from 29097984 to 34930688 took 0 msecs. Success: true
    PASS   : tst_emission::test_basic(16ms)
    RESULT : tst_emission::test_basic():"16ms":
         0.12 msecs per iteration (total: 65, iterations: 512)
    PASS   : tst_emission::test_basic(32ms)
    RESULT : tst_emission::test_basic():"32ms":
         0.24 msecs per iteration (total: 63, iterations: 256)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
    PASS   : tst_emission::test_basic(100ms)
    RESULT : tst_emission::test_basic():"100ms":
         0.75 msecs per iteration (total: 97, iterations: 128)
    Heap resize call from 41943040 to 50331648 took 1 msecs. Success: true
    PASS   : tst_emission::test_basic(500ms)
    RESULT : tst_emission::test_basic():"500ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    Heap resize call from 50331648 to 60424192 took 4 msecs. Success: true
    PASS   : tst_emission::test_basic(1000ms)
    RESULT : tst_emission::test_basic():"1000ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    PASS   : tst_emission::test_basic(10000ms)
    RESULT : tst_emission::test_basic():"10000ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    PASS   : tst_emission::cleanupTestCase()
    Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 3339ms
    ********* Finished testing of tst_emission *********
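
Reading the numbers out of these logs by hand gets tedious once there are more benchmarks. Here is a minimal sketch of a parser for the RESULT entries shown above, assuming each RESULT line is followed by its "msecs per iteration" line, as in these logs. The function name parse_results and the SAMPLE text are hypothetical helpers for this post and not part of QtTest.

```python
import re

# Matches e.g.:  RESULT : tst_emission::test_basic():"500ms":
RESULT_RE = re.compile(r'RESULT\s*:\s*(?P<test>[\w:]+)\(\):"(?P<tag>[^"]+)":')
# Matches e.g.:  21 msecs per iteration (total: 87, iterations: 4)
TIMING_RE = re.compile(
    r'(?P<ms>\d+(?:\.\d+)?)\s+msecs per iteration '
    r'\(total: (?P<total>\d+), iterations: (?P<iters>\d+)\)'
)


def parse_results(log_text):
    """Map (test name, data tag) -> msecs per iteration."""
    results = {}
    pending = None  # key of the RESULT line still waiting for its timing line
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            pending = (match.group("test"), match.group("tag"))
            continue
        match = TIMING_RE.search(line)
        if match and pending is not None:
            results[pending] = float(match.group("ms"))
            pending = None
    return results


# A short excerpt in the same shape as the Chrome no-SIMD log above.
SAMPLE = '''PASS   : tst_emission::test_basic(500ms)
RESULT : tst_emission::test_basic():"500ms":
     21 msecs per iteration (total: 87, iterations: 4)
Heap resize call from 34930688 to 41943040 took 0.09999999403953552 msecs. Success: true
PASS   : tst_emission::test_basic(1000ms)
RESULT : tst_emission::test_basic():"1000ms":
     22 msecs per iteration (total: 89, iterations: 4)
'''

if __name__ == "__main__":
    for (test, tag), ms in parse_results(SAMPLE).items():
        print(f"{test} [{tag}]: {ms} msecs per iteration")
```

The "Heap resize" lines are skipped because they do not contain "msecs per iteration". Running the parser over a no-SIMD log and a SIMD log gives two dictionaries with the same keys, which makes it easy to compare the two runs entry by entry.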


