Sunday, June 13, 2021

(redux) Qt WebAssembly performance enhancement

In my last post, Qt WebAssembly performance enhancement, there were some impressive performance speedups. Unfortunately, as my colleague Morten pointed out, both builds were in debug mode. *sigh*

So I rebuilt them in release mode, and added a few selected benchmarks from the Qt tests/benchmark source directory:

  • tst_affectors
  • tst_emission
  • tst_QGraphicsScene
  • tst_QGraphicsView
  • tst_QGraphicsWidget
  • tst_qanimation
  • tst_QMatrix4x4
  • BlendBench
  • tst_QImageConversion
  • tst_DrawTexture
  • tst_QPainter

Although the results are not as impressive overall, there is still quite a speedup in the image conversion and QPainter areas, for example:

non-simd:

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), circle":

     2.3 msecs per iteration (total: 76, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), line":

     2.4 msecs per iteration (total: 77, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), solidrect":

     2.4 msecs per iteration (total: 78, iterations: 32)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), alpharect":

     2.4 msecs per iteration (total: 78, iterations: 32)


simd:

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), circle":

     0.95 msecs per iteration (total: 61, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), line":

     0.95 msecs per iteration (total: 61, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), solidrect":

     0.92 msecs per iteration (total: 59, iterations: 64)

PASS   : tst_QPainter::drawPixmap(BGR30 on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"BGR30 on RGB32, (1000x1000), alpharect":

     0.95 msecs per iteration (total: 61, iterations: 64)


non-simd:

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), circle":

     1.7 msecs per iteration (total: 56, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), line":

     1.7 msecs per iteration (total: 55, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), solidrect":

     1.7 msecs per iteration (total: 55, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), alpharect":

     3.6 msecs per iteration (total: 58, iterations: 16)


simd:

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), circle)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), circle":

     2.6 msecs per iteration (total: 85, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), line)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), line":

     4.0 msecs per iteration (total: 64, iterations: 16)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), solidrect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), solidrect":

     2.2 msecs per iteration (total: 71, iterations: 32)

PASS   : tst_QPainter::drawPixmap(ARGB32_pm on RGB32, (1000x1000), alpharect)

RESULT : tst_QPainter::drawPixmap():"ARGB32_pm on RGB32, (1000x1000), alpharect":

     4.5 msecs per iteration (total: 73, iterations: 16)


and image conversions:

non-simd:

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> argb32pm -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":

     6.1 msecs per iteration (total: 98, iterations: 16)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgb32 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":

     2.9 msecs per iteration (total: 94, iterations: 32)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgba8888 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":

     4.6 msecs per iteration (total: 75, iterations: 16)

simd:

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> argb32pm -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":

     4.2 msecs per iteration (total: 68, iterations: 16)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgb32 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":

     0.49 msecs per iteration (total: 63, iterations: 128)

PASS   : tst_QImageConversion::convertGenericInplace(argb32 -> rgba8888 -> argb32)

RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":

     0.90 msecs per iteration (total: 58, iterations: 64)
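
To make the comparison easier to read, here is a minimal Python sketch that pairs up the RESULT lines from two runs and prints the speedup factor for each benchmark. It is not part of the Qt benchmark tooling; the function names (parse_results, speedups) and the two sample strings are my own illustration, and the regular expressions only assume the plain-text output format pasted above.

```python
import re

# Benchmark label on a QTest "RESULT" line, e.g.
#   RESULT : tst_Foo::bar():"some label":
RESULT_RE = re.compile(r'^RESULT\s*:\s*(\S+?)\(\):"(.*)":\s*$')
# Timing line that follows it, e.g.
#        2.3 msecs per iteration (total: 76, iterations: 32)
TIMING_RE = re.compile(r'^\s*([0-9.]+)\s+msecs per iteration')


def parse_results(text):
    """Return {(test, label): msecs_per_iteration} from plain-text output."""
    results = {}
    current = None
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        m = TIMING_RE.match(line)
        if m and current is not None:
            results[current] = float(m.group(1))
            current = None
    return results


def speedups(non_simd_text, simd_text):
    """Return {key: non_simd_ms / simd_ms}; above 1.0 means simd was faster."""
    base = parse_results(non_simd_text)
    simd = parse_results(simd_text)
    return {key: base[key] / simd[key] for key in base if key in simd}


# Sample input: the image conversion numbers quoted in this post.
NON_SIMD = '''
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":
     6.1 msecs per iteration (total: 98, iterations: 16)
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":
     2.9 msecs per iteration (total: 94, iterations: 32)
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":
     4.6 msecs per iteration (total: 75, iterations: 16)
'''

SIMD = '''
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> argb32pm -> argb32":
     4.2 msecs per iteration (total: 68, iterations: 16)
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgb32 -> argb32":
     0.49 msecs per iteration (total: 63, iterations: 128)
RESULT : tst_QImageConversion::convertGenericInplace():"argb32 -> rgba8888 -> argb32":
     0.90 msecs per iteration (total: 58, iterations: 64)
'''

if __name__ == "__main__":
    for (test, label), ratio in sorted(speedups(NON_SIMD, SIMD).items()):
        print(f"{label}: {ratio:.1f}x")
```

For the three conversions above this works out to roughly 1.5x, 5.9x and 5.1x in favour of the simd build. A factor below 1.0 would mean the simd build was slower.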



But other benchmarks, such as the ARGB32_pm on RGB32 drawPixmap cases above, were slower in the simd build. This is probably because Emscripten does not fully support every SIMD instruction and emulates the ones it lacks.


For the full benchmark results, get the zip file.



