Sunday, June 13, 2021

(redux) Qt WebAssembly performance enhancement

In my last post, Qt WebAssembly performance enhancement, there were some impressive performance speedups. Unfortunately, as my colleague Morten pointed out, both builds were in debug mode.  *sigh*

So I rebuilt them in release mode, and added a few selected benchmarks from the Qt tests/benchmark source directory:

  • tst_affectors
  • tst_emission
  • tst_QGraphicsScene
  • tst_QGraphicsView
  • tst_QGraphicsWidget
  • tst_qanimation
  • tst_QMatrix4x4
  • BlendBench
  • tst_QImageConversion
  • tst_DrawTexture
  • tst_QPainter

Although not as impressive overall, there is still quite a speed up in the image conversions and QPainter areas, for example:

non-simd:

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), circle":

     2.3 msecs per iteration (total: 76, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), line":

     2.4 msecs per iteration (total: 77, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), solidrect":

     2.4 msecs per iteration (total: 78, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), alpharect":

     2.4 msecs per iteration (total: 78, iterations: 32)


simd:

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), circle":

     0.95 msecs per iteration (total: 61, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), line":

     0.95 msecs per iteration (total: 61, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), solidrect":

     0.92 msecs per iteration (total: 59, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), alpharect":

     0.95 msecs per iteration (total: 61, iterations: 64)


non-simd:

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), circle":

     1.7 msecs per iteration (total: 56, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), line":

     1.7 msecs per iteration (total: 55, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), solidrect":

     1.7 msecs per iteration (total: 55, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), alpharect":

     3.6 msecs per iteration (total: 58, iterations: 16)


simd:

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), circle":

     2.6 msecs per iteration (total: 85, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), line":

     4.0 msecs per iteration (total: 64, iterations: 16)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), solidrect":

     2.2 msecs per iteration (total: 71, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), alpharect":

     4.5 msecs per iteration (total: 73, iterations: 16)


and image conversions:

non-simd:

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> argb32pm -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":

     6.1 msecs per iteration (total: 98, iterations: 16)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgb32 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":

     2.9 msecs per iteration (total: 94, iterations: 32)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgba8888 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":

     4.6 msecs per iteration (total: 75, iterations: 16)

simd:

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> argb32pm -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":

     4.2 msecs per iteration (total: 68, iterations: 16)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgb32 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":

     0.49 msecs per iteration (total: 63, iterations: 128)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgba8888 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":

     0.90 msecs per iteration (total: 58, iterations: 64)



But other benchmarks were slower in the simd build, probably because Emscripten does not fully support every SIMD instruction and emulates the ones it lacks.
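To put numbers on "faster" and "slower", here is a small sketch that turns a few of the per-iteration times quoted above into speedup factors. The dictionary layout and the `speedup` helper are just for illustration; the figures themselves are copied from the results in this post.

```python
# Per-iteration times in msecs, copied from the release-mode results above,
# stored as (non-simd, simd) pairs.
results = {
    "drawPixmap BGR30 on RGB32, circle": (2.3, 0.95),
    "drawPixmap ARGB32_pm on RGB32, circle": (1.7, 2.6),
    "convertGenericInplace argb32 -> rgb32 -> argb32": (2.9, 0.49),
    "convertGenericInplace argb32 -> rgba8888 -> argb32": (4.6, 0.90),
}


def speedup(non_simd_ms, simd_ms):
    """Return how many times faster the simd build is (below 1.0 means slower)."""
    return non_simd_ms / simd_ms


for name, (non_simd_ms, simd_ms) in results.items():
    print(f"{name}: {speedup(non_simd_ms, simd_ms):.2f}x")
```

A factor above 1.0 means the simd build won; the ARGB32_pm drawPixmap case comes out below 1.0, which is one of the slower ones.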


For full benchmark results get the zip file




Thursday, June 10, 2021

Qt WebAssembly performance enhancement

SIMD (single instruction, multiple data) lets the CPU apply one instruction to several data elements at once, which makes certain operations go faster (simply put). Kind of like sticking laughing gas in your petrol car's fuel line.

https://en.wikipedia.org/wiki/SIMD

Emscripten and WebAssembly now have better support for SIMD (to various degrees)

https://emscripten.org/docs/porting/simd.html


Chrome and Firefox also support SIMD (to various degrees)

So for Qt 6.3, I have been working to get Qt building and running using the SIMD instructions available to JavaScript (and thereby WebAssembly) in the web browsers (sorry, Safari.. catch up soon?)


Just configure the soon-to-be Qt 6.3 with the -sse2 argument (the change has not been reviewed or merged yet):

https://codereview.qt-project.org/c/qt/qtbase/+/343563

To see if it is actually worth adding SIMD support to Qt WebAssembly, I built a couple of Qt Quick benchmarks, namely the declarative particles benchmarks: affectors and emission.

I had to put image and qml files into a .qrc resource file so that Qt WebAssembly could find them, as we have no real local file system access.


The results are much better than I expected. Clearly, there is a performance boost by using simd in wasm. 

 Someone else has had similar results with wasm SIMD

https://robaboukhalil.medium.com/webassembly-and-simd-7a7daa4f2ecd

 

Next I want to expand the number and type of benchmarks, but this gives us early baseline results.

Chrome browser
no SIMD:

********* Start testing of tst_affectors *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_affectors::initTestCase()
Heap resize call from 16777216 to 20185088 took 0.09999999403953552 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0.10000000894069672 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0.10000000894069672 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.5 msecs. Success: true
PASS   : tst_affectors::test_basic(16ms)
RESULT : tst_affectors::test_basic():"16ms":
     0.29 msecs per iteration (total: 75, iterations: 256)
Heap resize call from 34930688 to 41943040 took 0.10000000894069672 msecs. Success: true
PASS   : tst_affectors::test_basic(32ms)
RESULT : tst_affectors::test_basic():"32ms":
     0.41 msecs per iteration (total: 53, iterations: 128)
Heap resize call from 41943040 to 50331648 took 0.29999999701976776 msecs. Success: true
PASS   : tst_affectors::test_basic(100ms)
RESULT : tst_affectors::test_basic():"100ms":
     0.87 msecs per iteration (total: 56, iterations: 64)
Heap resize call from 50331648 to 60424192 took 0.3999999910593033 msecs. Success: true
PASS   : tst_affectors::test_basic(500ms)
RESULT : tst_affectors::test_basic():"500ms":
     3.3 msecs per iteration (total: 53, iterations: 16)
Heap resize call from 60424192 to 72548352 took 0.19999998807907104 msecs. Success: true
PASS   : tst_affectors::test_filtered(16ms)
RESULT : tst_affectors::test_filtered():"16ms":
     0.84 msecs per iteration (total: 54, iterations: 64)
PASS   : tst_affectors::test_filtered(32ms)
RESULT : tst_affectors::test_filtered():"32ms":
     0.96 msecs per iteration (total: 62, iterations: 64)
Heap resize call from 72548352 to 87097344 took 0.20000000298023224 msecs. Success: true
PASS   : tst_affectors::test_filtered(100ms)
RESULT : tst_affectors::test_filtered():"100ms":
     1.3 msecs per iteration (total: 89, iterations: 64)
PASS   : tst_affectors::test_filtered(500ms)
RESULT : tst_affectors::test_filtered():"500ms":
     3.7 msecs per iteration (total: 60, iterations: 16)
PASS   : tst_affectors::cleanupTestCase()
Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 15037ms
********* Finished testing of tst_affectors *********

********* Start testing of tst_emission *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_emission::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
PASS   : tst_emission::test_basic(16ms)
RESULT : tst_emission::test_basic():"16ms":
     1.6 msecs per iteration (total: 53, iterations: 32)
Heap resize call from 29097984 to 34930688 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(32ms)
RESULT : tst_emission::test_basic():"32ms":
     3.1 msecs per iteration (total: 51, iterations: 16)
PASS   : tst_emission::test_basic(100ms)
RESULT : tst_emission::test_basic():"100ms":
     4.5 msecs per iteration (total: 73, iterations: 16)
PASS   : tst_emission::test_basic(500ms)
RESULT : tst_emission::test_basic():"500ms":
     21 msecs per iteration (total: 87, iterations: 4)
Heap resize call from 34930688 to 41943040 took 0.09999999403953552 msecs. Success: true
PASS   : tst_emission::test_basic(1000ms)
RESULT : tst_emission::test_basic():"1000ms":
     22 msecs per iteration (total: 89, iterations: 4)
PASS   : tst_emission::test_basic(10000ms)
RESULT : tst_emission::test_basic():"10000ms":
     23 msecs per iteration (total: 92, iterations: 4)
PASS   : tst_emission::cleanupTestCase()
Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 5398ms
********* Finished testing of tst_emission *********

======================================================================
======================================================================

chrome SIMD


********* Start testing of tst_affectors *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_affectors::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.3999999761581421 msecs. Success: true
Heap resize call from 34930688 to 41943040 took 0.19999998807907104 msecs. Success: true
PASS   : tst_affectors::test_basic(16ms)
RESULT : tst_affectors::test_basic():"16ms":
     0.059 msecs per iteration (total: 61, iterations: 1024)
Heap resize call from 41943040 to 50331648 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(32ms)
RESULT : tst_affectors::test_basic():"32ms":
     0.11 msecs per iteration (total: 59, iterations: 512)
Heap resize call from 50331648 to 60424192 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(100ms)
RESULT : tst_affectors::test_basic():"100ms":
     0.15 msecs per iteration (total: 81, iterations: 512)
Heap resize call from 60424192 to 72548352 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_basic(500ms)
RESULT : tst_affectors::test_basic():"500ms":
     0.58 msecs per iteration (total: 75, iterations: 128)
Heap resize call from 72548352 to 87097344 took 0.3999999761581421 msecs. Success: true
PASS   : tst_affectors::test_filtered(16ms)
RESULT : tst_affectors::test_filtered():"16ms":
     0.10 msecs per iteration (total: 52, iterations: 512)
Heap resize call from 87097344 to 104529920 took 0.30000001192092896 msecs. Success: true
PASS   : tst_affectors::test_filtered(32ms)
RESULT : tst_affectors::test_filtered():"32ms":
     0.12 msecs per iteration (total: 64, iterations: 512)
PASS   : tst_affectors::test_filtered(100ms)
RESULT : tst_affectors::test_filtered():"100ms":
     0.19 msecs per iteration (total: 51, iterations: 256)
Heap resize call from 104529920 to 125435904 took 0.20000001788139343 msecs. Success: true
PASS   : tst_affectors::test_filtered(500ms)
RESULT : tst_affectors::test_filtered():"500ms":
     0.61 msecs per iteration (total: 79, iterations: 128)
PASS   : tst_affectors::cleanupTestCase()
Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 9728ms
********* Finished testing of tst_affectors *********

********* Start testing of tst_emission *********
Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
PASS   : tst_emission::initTestCase()
Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
Heap resize call from 29097984 to 34930688 took 0.29999998211860657 msecs. Success: true
PASS   : tst_emission::test_basic(16ms)
RESULT : tst_emission::test_basic():"16ms":
     0.046 msecs per iteration (total: 95, iterations: 2048)
Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
PASS   : tst_emission::test_basic(32ms)
RESULT : tst_emission::test_basic():"32ms":
     0.090 msecs per iteration (total: 93, iterations: 1024)
Heap resize call from 41943040 to 50331648 took 0.29999998211860657 msecs. Success: true
PASS   : tst_emission::test_basic(100ms)
RESULT : tst_emission::test_basic():"100ms":
     0.27 msecs per iteration (total: 70, iterations: 256)
Heap resize call from 50331648 to 60424192 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(500ms)
RESULT : tst_emission::test_basic():"500ms":
     1.3 msecs per iteration (total: 85, iterations: 64)
Heap resize call from 60424192 to 72548352 took 0.4000000059604645 msecs. Success: true
PASS   : tst_emission::test_basic(1000ms)
RESULT : tst_emission::test_basic():"1000ms":
     1.3 msecs per iteration (total: 87, iterations: 64)
PASS   : tst_emission::test_basic(10000ms)
RESULT : tst_emission::test_basic():"10000ms":
     1.3 msecs per iteration (total: 86, iterations: 64)
PASS   : tst_emission::cleanupTestCase()
Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 6017ms
********* Finished testing of tst_emission *********


Firefox has similar results:


firefox nosimd:

    ********* Start testing of tst_affectors *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/opt/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5f3c99085d4c2ebf57fd0586b013b02e32a8e20b)), unknown unknown
    PASS   : tst_affectors::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
    PASS   : tst_affectors::test_basic(16ms)
    RESULT : tst_affectors::test_basic():"16ms":
         0.81 msecs per iteration (total: 52, iterations: 64)
    PASS   : tst_affectors::test_basic(32ms)
    RESULT : tst_affectors::test_basic():"32ms":
         1.1 msecs per iteration (total: 76, iterations: 64)
    PASS   : tst_affectors::test_basic(100ms)
    RESULT : tst_affectors::test_basic():"100ms":
         2.4 msecs per iteration (total: 79, iterations: 32)
    Heap resize call from 29097984 to 34930688 took 17 msecs. Success: true qtloader.js line 443 > eval:11829:17
    PASS   : tst_affectors::test_basic(500ms)
    RESULT : tst_affectors::test_basic():"500ms":
         9.1 msecs per iteration (total: 73, iterations: 8)
    PASS   : tst_affectors::test_filtered(16ms)
    RESULT : tst_affectors::test_filtered():"16ms":
         2.3 msecs per iteration (total: 74, iterations: 32)
    PASS   : tst_affectors::test_filtered(32ms)
    RESULT : tst_affectors::test_filtered():"32ms":
         2.6 msecs per iteration (total: 86, iterations: 32)
    PASS   : tst_affectors::test_filtered(100ms)
    RESULT : tst_affectors::test_filtered():"100ms":
         3.8 msecs per iteration (total: 62, iterations: 16)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
    PASS   : tst_affectors::test_filtered(500ms)
    RESULT : tst_affectors::test_filtered():"500ms":
         11 msecs per iteration (total: 88, iterations: 8)
    PASS   : tst_affectors::cleanupTestCase()
    Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 16781ms
    ********* Finished testing of tst_affectors *********    
        
     ********* Start testing of tst_emission *********
     Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/opt/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5f3c99085d4c2ebf57fd0586b013b02e32a8e20b)), unknown unknown
     PASS   : tst_emission::initTestCase()
     Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
     Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
     Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true qtloader.js line 443 > eval:11829:17
     PASS   : tst_emission::test_basic(16ms)
     RESULT : tst_emission::test_basic():"16ms":
          2.1 msecs per iteration (total: 70, iterations: 32)
     PASS   : tst_emission::test_basic(32ms)
     RESULT : tst_emission::test_basic():"32ms":
          4.1 msecs per iteration (total: 67, iterations: 16)
     PASS   : tst_emission::test_basic(100ms)
     RESULT : tst_emission::test_basic():"100ms":
          8.1 msecs per iteration (total: 65, iterations: 8)
     PASS   : tst_emission::test_basic(500ms)
     RESULT : tst_emission::test_basic():"500ms":
          43 msecs per iteration (total: 87, iterations: 2)
     PASS   : tst_emission::test_basic(1000ms)
     RESULT : tst_emission::test_basic():"1000ms":
          43 msecs per iteration (total: 86, iterations: 2)
     PASS   : tst_emission::test_basic(10000ms)
     RESULT : tst_emission::test_basic():"10000ms":
          43 msecs per iteration (total: 86, iterations: 2)
     PASS   : tst_emission::cleanupTestCase()
     Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 4178ms
     ********* Finished testing of tst_emission *********
                
======================================================================
======================================================================
                
                                         
firefox SIMD:


    ********* Start testing of tst_affectors *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
    PASS   : tst_affectors::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
    Heap resize call from 29097984 to 34930688 took 0 msecs. Success: true
    PASS   : tst_affectors::test_basic(16ms)
    RESULT : tst_affectors::test_basic():"16ms":
         0.14 msecs per iteration (total: 73, iterations: 512)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
    Heap resize call from 41943040 to 50331648 took 1 msecs. Success: true
    PASS   : tst_affectors::test_basic(32ms)
    RESULT : tst_affectors::test_basic():"32ms":
         0.21 msecs per iteration (total: 54, iterations: 256)
    Heap resize call from 50331648 to 60424192 took 5 msecs. Success: true
    PASS   : tst_affectors::test_basic(100ms)
    RESULT : tst_affectors::test_basic():"100ms":
         0.45 msecs per iteration (total: 58, iterations: 128)
    PASS   : tst_affectors::test_basic(500ms)
    RESULT : tst_affectors::test_basic():"500ms":
         1.8 msecs per iteration (total: 60, iterations: 32)
    Heap resize call from 60424192 to 72548352 took 7 msecs. Success: true
    PASS   : tst_affectors::test_filtered(16ms)
    RESULT : tst_affectors::test_filtered():"16ms":
         0.28 msecs per iteration (total: 73, iterations: 256)
    Heap resize call from 72548352 to 87097344 took 0 msecs. Success: true
    PASS   : tst_affectors::test_filtered(32ms)
    RESULT : tst_affectors::test_filtered():"32ms":
         0.35 msecs per iteration (total: 90, iterations: 256)
    PASS   : tst_affectors::test_filtered(100ms)
    RESULT : tst_affectors::test_filtered():"100ms":
         0.60 msecs per iteration (total: 77, iterations: 128)
    Heap resize call from 87097344 to 104529920 took 0 msecs. Success: true
    PASS   : tst_affectors::test_filtered(500ms)
    RESULT : tst_affectors::test_filtered():"500ms":
         2.0 msecs per iteration (total: 66, iterations: 32)
    PASS   : tst_affectors::cleanupTestCase()
    Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted, 6411ms
    ********* Finished testing of tst_affectors *********
                                                
    ********* Start testing of tst_emission *********
    Config: Using QtTest library 6.2.0, Qt 6.2.0 (wasm-little_endian-ilp32 static debug build; by Clang 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 5852582532b3eb3ea8da51a1e272d8d017bd36c9)), unknown unknown
    PASS   : tst_emission::initTestCase()
    Heap resize call from 16777216 to 20185088 took 0 msecs. Success: true
    Heap resize call from 20185088 to 24248320 took 0 msecs. Success: true
    Heap resize call from 24248320 to 29097984 took 0 msecs. Success: true
    Heap resize call from 29097984 to 34930688 took 0 msecs. Success: true
    PASS   : tst_emission::test_basic(16ms)
    RESULT : tst_emission::test_basic():"16ms":
         0.12 msecs per iteration (total: 65, iterations: 512)
    PASS   : tst_emission::test_basic(32ms)
    RESULT : tst_emission::test_basic():"32ms":
         0.24 msecs per iteration (total: 63, iterations: 256)
    Heap resize call from 34930688 to 41943040 took 0 msecs. Success: true
    PASS   : tst_emission::test_basic(100ms)
    RESULT : tst_emission::test_basic():"100ms":
         0.75 msecs per iteration (total: 97, iterations: 128)
    Heap resize call from 41943040 to 50331648 took 1 msecs. Success: true
    PASS   : tst_emission::test_basic(500ms)
    RESULT : tst_emission::test_basic():"500ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    Heap resize call from 50331648 to 60424192 took 4 msecs. Success: true
    PASS   : tst_emission::test_basic(1000ms)
    RESULT : tst_emission::test_basic():"1000ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    PASS   : tst_emission::test_basic(10000ms)
    RESULT : tst_emission::test_basic():"10000ms":
         3.6 msecs per iteration (total: 58, iterations: 16)
    PASS   : tst_emission::cleanupTestCase()
    Totals: 8 passed, 0 failed, 0 skipped, 0 blacklisted, 3339ms
    ********* Finished testing of tst_emission *********



Friday, April 2, 2021

Qt 6 WebAssembly QtQuick3d or, NOT April fools

 I am so happy right now! As of sha 4972fdb350fe79e18b0413e74028cd9b9803f96b (1 April), you can build Qt 6 for WebAssembly!

Not only that, because QtQuick3d in Qt 6 now supports OpenGL ES2/3, it will run in a web browser!

Here is a video of the helloquick3d example, running in Firefox: [edit] running at 62 fps

 
 
if that does not work, try this link:
 
Now I can get on with my life and stop working on build system stuff!

Friday, January 22, 2021

Qt 6 WebAssembly

ahhh well now. We all know that in Qt6, qmake was ditched for cmake for the build system of Qt itself, and we are playing catch-up in the WebAssembly platform.

History

First a little history of the Qt build system.

tmake was a Perl script that generates Makefiles. If I recall correctly, it was originally written by Sam Magnuson. (I am sure someone who started in Trolltech before me will correct me if I am wrong). I met Sam when he was working at Trolltech's Brisbane office when I was hired as Qtopia Community Liaison. (Briefly, as he soon moved back to the US, after he sold me some furniture and gave me his cat! Still have the Ikea chair.)

qmake was tmake re-written in C++, which was started around 2000. (OMG - it's 21 years old! - so old, it farts dust!)

qmake was added on, hacked up to tackle all kinds of things, which brings us to Qt 6.

cmake: The behemoth.

Knowing next to nothing about cmake made this journey a bit bumpy. Add in all the stuff that Qt implements in cmake and, on top of that, the special arguments we use for WebAssembly, and I am glad this patch is about finished!

Some things are not yet working, such as using cmake to build Qt WebAssembly applications. Luckily, qmake for app builds is still there... for now.

First off, until this change gets integrated into the git repo, you can grab it here:

https://codereview.qt-project.org/c/qt/qtbase/+/313243

You will need:

  • Emscripten version 2.0.12 (others in the 2.0.x range should also work)
  • Ninja build tool (optional but highly recommended)
  • cmake (3.19 at least, I think)

Host build

One of the differences between Qt5 and Qt6 is that you will need a host build of the same version, since Qt WebAssembly is a cross-platform build. In the near future, you will be able to install a Qt binary release and use that, but for now, you need to build your host Qt yourself (to get the same version as the git repo).

You need a host build of both QtBase and QtDeclarative if you are to use declarative in your Qt WebAssembly apps.

WebAssembly build

On all platforms, you will need to set CMAKE_TOOLCHAIN_FILE and QT_HOST_PATH. Emscripten comes with a convenient cmake toolchain file, so set CMAKE_TOOLCHAIN_FILE to wherever it is installed. QT_HOST_PATH is set to where the Qt 6 host directory is.

Windows:

Just like with Qt5, you will need the MinGW toolchain installed.

On Windows, we can use the MinGW toolchain that gets installed with the Qt binaries, so make sure mingw32-make is in your PATH. I usually do this after I run emsdk_env.bat to set up the Emscripten toolchain. To configure Qt, I use something like:

cmake -DCMAKE_GENERATOR=Ninja  -DCMAKE_TOOLCHAIN_FILE=H:\development\emsdk\upstream\emscripten\cmake\Modules\Platform\Emscripten.cmake -DFEATURE_developer_build=ON -DFEATURE_headersclean=OFF -DWARNINGS_ARE_ERRORS=OFF -DQT_BUILD_EXAMPLES=OFF -DQT_BUILD_TESTS=OFF -DQT_HOST_PATH=H:\development\platforms\desktop\qtbase H:\development\depot\qt\qt5\qtbase


Linux/Mac:

You will need gcc on Linux and Xcode 10.15 or greater on Mac.

cmake -DFEATURE_developer_build=ON \
-DFEATURE_headersclean=OFF \
-DWARNINGS_ARE_ERRORS=OFF \
-DQT_BUILD_EXAMPLES=OFF \
-DQT_BUILD_TESTS=OFF \
-DCMAKE_GENERATOR=Ninja \
-DQT_HOST_PATH=/development/platforms/desktop/qtbase \
-DCMAKE_TOOLCHAIN_FILE=/emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake \
/depot/qt/qt5/qtbase

There are a couple experimental features you can configure Qt for, such as
  • threads:  -DFEATURE_threads=On
I have probably forgotten something important, but this should get anyone who is interested started. I have tested this build on Linux, Mac and Windows.

QtDeclarative:

Configuring declarative using cmake is fairly straightforward. In the qtbase/bin directory is a helper tool:
qt-configure-module ~/depot/qt/qt5/qtdeclarative


Application builds:

For now, you can use qmake in the old fashioned way to configure applications.

See also the Qt WebAssembly wiki: https://wiki.qt.io/Qt_for_WebAssembly

Wednesday, July 22, 2020

'wasm memory too small' Qt for WebAssembly

Sometimes when I am building a larger project with Qt for WebAssembly, I get this type of message:

wasm-ld: error: initial memory too small, 17553264 bytes needed

and the build fails.

This means that you need to tell the Emscripten compiler to allocate more than the default initial memory (16 MB, i.e. 16777216 bytes, in Emscripten).
Qt lets you request more initial memory by setting QMAKE_TOTAL_MEMORY in your .pro file.

So it makes sense to add something like this:

QMAKE_TOTAL_MEMORY=17553264

BUT the result is:

shared:ERROR: For wasm, TOTAL_MEMORY must be a multiple of 64KB, was 17553264

grrrr...  ok, what if we find a multiple of 64?

17553264 / 64 = 274,269.75

We need a whole number, so let's round up to the next whole number.

274270 * 64 = 17553280

But no. That doesn't work either:

shared:ERROR: For wasm, TOTAL_MEMORY must be a multiple of 64KB, was 17553280

WTH?!?

The answer is that a KB is 1024 bytes, so 64 * 1024 = 65536 bytes

So let's find the next whole number multiple of 64 KB (65536) that is at least as large as what the linker asked for

17553264 / 65536 = 267.841...

so let's try 268

268 * 65536 = 17563648

QMAKE_TOTAL_MEMORY=17563648

and this now builds, and hopefully runs!
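The arithmetic above can be wrapped up in a few lines of Python so you do not have to redo it by hand each time. This is only a sketch; the function name `round_up_to_wasm_page` is made up for this example, and the only input is the byte count from the wasm-ld error message.

```python
WASM_PAGE_SIZE = 64 * 1024  # 64 KB = 65536 bytes


def round_up_to_wasm_page(needed_bytes):
    """Round a byte count up to the next multiple of 64 KB."""
    pages = -(-needed_bytes // WASM_PAGE_SIZE)  # ceiling division
    return pages * WASM_PAGE_SIZE


# The value from the linker error earlier in this post.
print(round_up_to_wasm_page(17553264))  # 17563648
```

The printed value is what goes into QMAKE_TOTAL_MEMORY.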


You can also read more about Mobile and Embedded development and Qt for WebAssembly in the book Hands-On Mobile and Embedded Development with Qt 5

Friday, May 29, 2020

Debugging Qt for WebAssembly


Debugging is often difficult in the best of times. Debugging Qt WebAssembly apps in the web browser is not something I want to do on a nice summer day, and it certainly makes a person more appreciative of mature, GUI-based debugging. But sometimes you have to do what you have to do.

If you have a crash in your Qt for WebAssembly app, you will see a not very human-readable backtrace in the console output. You can start by recompiling your app with CONFIG+=debug, which gives you nice symbol names in the backtrace that appears in the dev console of the browser when an exception or crash happens. Many times this can lead you to your bug fairly quickly.

NOTE: The Qt libraries do not need to be compiled in debug mode for you to just see symbols while debugging. Although, if you want to step through C++ source code using sourcemaps, you will need to compile Qt and your application in debug mode.

The easy way is to add qDebug output, run it and see what happens. I do this quite often. But just as often, it is not enough to find a particular bug.

One issue with that is that linking Emscripten-based wasm files takes a long time (because that is where all the magic happens), so this method is rather time consuming. Thankfully, with Qt 5.15 we use Emscripten 1.39.8, which is based on upstream clang, transpiles directly to wasm binaries and skips the intermediate step of outputting JavaScript, so link time is greatly reduced.

What if you need to step through the source code like in a desktop debugger? Qt Creator does not, at this time, have support for debugging Qt WebAssembly apps. I suppose it could be implemented using the browser's remote debugging interface API.

Emscripten can generate source maps for the browser debugger to use. In Qt, we use qmake to add the necessary linker flags so that Emscripten generates these source maps. Qt should do this by default when you use CONFIG+=debug.

There is a bug that was recently fixed regarding this. If you do not find any source maps files (.map), the workaround is to add QMAKE_LFLAGS_DEBUG += -g4 in your pro file.

By default, the source base in Qt is set to use http://localhost:8000/

You can change this by adding into your .pro file, something like:
QMAKE_WASM_SOURCE_MAP_BASE = http://myip:6931/

Emrun by default uses port 6931.

If you are running a server of some sort, you can use the IP address of that server.

Sources need to be in the directory tree seen by the server, so an in-source build is the easiest. In my experience, Chrome browser seems to be able to debug with sourcemaps better. If you are using Mac or Linux, you can simply symlink the source file path into the web server base.

This is what happens when you do not have this set up correctly:

Error while fetching an original source: request failed with status 404
Source URL: http://192.168.0.21:6931/src/wasm-windows/main.cpp

Or perhaps you see a message in the sources tab of the browser console such as:
Could not load content for http://localhost:8000/depot/qt/qt5/qtbase/examples/widgets/richtext/textedit/textedit.cpp (HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE)

It means the browser could not find the specific source file. You can symlink the sources directory.
I am using Chrome, as I had better luck with it utilizing the source maps. The actual procedure for Firefox might be a bit different. It isn't working for me at this time.
  • emrun --browser chrome --port 8000 --hostname localhost --serve_after_close textedit.html 
    • (or wasmsourceserver.py as below)
  • open the web console
  • [chrome] Go into the debugger by clicking on 'sources'
  • [firefox] Uses the 'Debugger' tab
You should now see some entries for source files, including our main.cpp.


You can now step in, step out, step over and even set breakpoints, just as in a desktop GUI debugger such as Qt Creator.



Note that these file paths are relative, so the debugger will report an error when it tries to step into a source file that is not local. This defeats the purpose of stepping into code for the most part.

Morten has a workaround on the QTBUG https://bugreports.qt.io/browse/QTBUG-72002 with a python web server named:
wasmsourceserver.py

When you run this custom server, it builds a file map of all the files in the current directory, recursively.
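To give a feel for what such a file map does, here is a rough C++17 sketch of the idea. It is not Morten's script: the names FileMap, buildFileMap and resolveSource are invented for this example, and keying the map by bare file name is my assumption about one simple way to do the lookup.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Maps a bare file name (e.g. "main.cpp") to its full path on disk.
using FileMap = std::map<std::string, std::string>;

// Walk `root` recursively and record every regular file by name.
// If two files share a name, the first one found wins (a simplification).
FileMap buildFileMap(const fs::path &root)
{
    assert(fs::is_directory(root));
    FileMap files;
    for (const auto &entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file())
            files.emplace(entry.path().filename().string(), entry.path().string());
    }
    return files;
}

// Resolve the path part of a source map URL, ignoring its directories.
// Returns an empty string when the file is unknown (a server would answer 404).
std::string resolveSource(const FileMap &files, const std::string &urlPath)
{
    const std::string name = fs::path(urlPath).filename().string();
    const auto it = files.find(name);
    return it == files.end() ? std::string() : it->second;
}
```

With a map like this, a request for /src/wasm-windows/main.cpp can be answered from wherever main.cpp really lives under the served directory, instead of failing with a 404.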


There is another way to add a type of breakpoint. If you want to stop execution on a certain line and pop up the browser debugger programmatically, you can add
 emscripten_debugger(); 
on the line where you want it to stop, and then recompile. The function is declared in the header emscripten.h.
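As a small sketch of how that might look in practice: the wrapper breakIntoDebugger and the toy function sumUpTo are made up for this example, and the #ifdef guard is just my way of keeping the same code building on the desktop, where emscripten.h is not available.

```cpp
#include <cassert>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>  // declares emscripten_debugger()
#endif

// Pause in the browser debugger when built with Emscripten; do nothing elsewhere.
// (breakIntoDebugger is a name made up for this example.)
inline void breakIntoDebugger()
{
#ifdef __EMSCRIPTEN__
    emscripten_debugger();
#endif
}

// Toy function standing in for the code you want to inspect.
int sumUpTo(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        if (i == 3)
            breakIntoDebugger();  // in a wasm build, the browser debugger can stop here
        total += i;
    }
    return total;
}
```

In my experience it is best to have the browser dev tools open before the call is reached, so the debugger has a chance to stop on it.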






Thursday, March 26, 2020

Qt on RaspberryPi

Qt on Raspberry Pi is really easy, but building an image to run on a Raspberry Pi with Qt 5 can be rather time consuming. You can do it all yourself, manually grabbing the sources, toolchains, etc. You could also buy the Qt for Device Creation package, which comes with great support, bells, whistles and commercial licensing, or build it yourself and use open source GPL licensing. To get started using the open source parts, you clone one git repo:
  • meta-boot2qt 
git clone git://code.qt.io/yocto/meta-boot2qt.git

 You then need to decide which device to target. I have a fancy new Raspberry Pi 4 hooked up to a touch screen, so I will choose that.

To get a list of the Boot2Qt targets run the command:

  meta-boot2qt/b2qt-init-build-env list-devices

To initialize the build environment for a Raspberry Pi 4:

 meta-boot2qt/b2qt-init-build-env init --device raspberrypi4

This will clone the needed repos.

You will then need to set up the environment specifically for your device by setting MACHINE
and then sourcing the setup-environment.sh script:

 export MACHINE=raspberrypi4 
 source ./setup-environment.sh

Now, if you just want an image to run:

 bitbake b2qt-embedded-qt5-image

Sit back, enjoy a cuppa and a pizza on this fine day... Don't let the computer sleep, or you will kick yourself in a few hours when you check on the progress. My final image, ready to be dd'd onto the SD card, was found in

   tmp/deploy/images/raspberrypi4

This is what you will be presented with when you boot your device:

To build the sdk/sysroot the command would be:

 bitbake meta-toolchain-b2qt-embedded-qt5-sdk 

You can build other targets including qemu, which will give you an image that runs in the qemu emulator.

You can then also set up target kits in Qt Creator to target the Raspberry Pi, and set up the device kit so you can run and debug on the device. More fun for all that time on your hands right now!

I write about creating device OS images using Qt's Boot To Qt and Bitbake for the Raspberry Pi in my book, Hands-On Mobile and Embedded Development with Qt 5