Shootout 0001
Introduction
Motivation
This shootout is a collection of programs that can be run and whose performance can be measured. The purpose of the shootout is to provide a mechanism for comparing machinate’s performance against itself over time, and also against other similar libraries.
The programs are grouped into different implementations of the “same” problem.
Ring Problem
The ring problem involves setting up some number of threads/logical threads/virtual threads/etc to copy from an input channel/stream/queue to an output, arranged to form a ring. The performance measured is the number of laps around the ring that are made in a given fixed time period.
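To make the shape of the ring concrete, here is a minimal Java sketch. It is not the shootout's code: the class name `RingSketch`, the method `countLaps`, and the choice of `ArrayBlockingQueue` as a stand-in for a channel are all assumptions made for illustration, and it uses plain daemon threads so that it runs on older JDKs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the ring problem; names are illustrative only.
final class RingSketch {
    // Starts `relays` threads that each copy from an input queue to an output
    // queue. The calling thread closes the ring and counts laps for `millis` ms.
    static long countLaps(int relays, long millis) {
        List<BlockingQueue<Long>> queues = new ArrayList<>();
        for (int i = 0; i <= relays; i++) {
            queues.add(new ArrayBlockingQueue<Long>(1));
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < relays; i++) {
            BlockingQueue<Long> in = queues.get(i);
            BlockingQueue<Long> out = queues.get(i + 1);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        out.put(in.take()); // copy input to output
                    }
                } catch (InterruptedException e) {
                    // Interrupted at shutdown: let the relay exit.
                }
            });
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        long laps = 0;
        long deadline = System.nanoTime() + millis * 1_000_000L;
        try {
            while (System.nanoTime() < deadline) {
                queues.get(0).put(laps);      // send the token into the ring
                queues.get(relays).take();    // wait for it to come back around
                laps++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread t : threads) {
                t.interrupt();
            }
        }
        return laps;
    }

    public static void main(String[] args) {
        System.out.println("laps: " + countLaps(8, 1000));
    }
}
```

On JDK 21 or later, replacing `new Thread(...)` plus `start()` with `Thread.ofVirtual().start(...)` would give a virtual thread version of the same ring.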
Fan Problem
The fan problem is a sort of “fan out” followed by a “fan in”. A thread writes N messages to a channel, and N threads read a message each from that output and then write that message to a single channel, and the original thread reads N messages from that channel and then loops. The performance measurement is again how many iterations of that process happen in a fixed time period.
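A rough Java sketch of the fan shape follows. It is an illustration under assumptions, not the shootout's implementation: `FanSketch` and `countIterations` are made-up names, `LinkedBlockingQueue` stands in for a channel, and the N workers are long-lived loops, so a given worker is not guaranteed to handle exactly one message per round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the fan problem; names are illustrative only.
final class FanSketch {
    // Runs fan out / fan in rounds of `n` messages for `millis` ms and
    // returns the number of completed rounds.
    static long countIterations(int n, long millis) {
        BlockingQueue<Integer> fanOut = new LinkedBlockingQueue<Integer>();
        BlockingQueue<Integer> fanIn = new LinkedBlockingQueue<Integer>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        fanIn.put(fanOut.take()); // read one message, pass it on
                    }
                } catch (InterruptedException e) {
                    // Interrupted at shutdown: let the worker exit.
                }
            });
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
        long iterations = 0;
        long deadline = System.nanoTime() + millis * 1_000_000L;
        try {
            while (System.nanoTime() < deadline) {
                for (int i = 0; i < n; i++) {
                    fanOut.put(i);   // fan out: write N messages
                }
                for (int i = 0; i < n; i++) {
                    fanIn.take();    // fan in: read N messages back
                }
                iterations++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread t : workers) {
                t.interrupt();
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + countIterations(16, 1000));
    }
}
```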
Observations
Observation Conditions
This iteration of the shootout was run on two different platforms. Platform 1 is a dedicated computer with a very modest 4 cores. Platform 2 is a cloud vm with 48 “vcpus”.
The shootout code attempts to determine how many virtual threads the jvm allows to execute concurrently (how many can be mounted at once) and sets core.async’s dispatch thread pool (used by go) to be that size. This is an attempt to provide something of an apples to apples comparison, but in some cases, like using core.async’s blocking api with virtual threads, it may cause some contention for resources between virtual threads and core.async’s thread pool.
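One way this sizing could be done is sketched below in Java. How the shootout actually does it is not shown here, so treat this as an assumption: it reads the JDK's `jdk.virtualThreadScheduler.parallelism` system property, falls back to the number of available processors (the JDK's documented default), and copies the result into the `clojure.core.async.pool-size` system property, which has to be set before core.async is loaded. The class and method names are hypothetical.

```java
// Hypothetical sketch: align core.async's dispatch pool with the number of
// virtual threads that can be mounted at once.
final class ParallelismSketch {
    // Number of carrier threads the virtual thread scheduler is expected to use.
    static int virtualThreadParallelism() {
        String configured = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        if (configured != null) {
            try {
                int n = Integer.parseInt(configured.trim());
                if (n > 0) {
                    return n;
                }
            } catch (NumberFormatException e) {
                // Fall through to the default below.
            }
        }
        // The JDK default parallelism is the number of available processors.
        return Runtime.getRuntime().availableProcessors();
    }

    // Assumption: this runs before core.async is loaded, since the pool size
    // is read when the dispatch pool is created.
    static void alignCoreAsyncPool() {
        System.setProperty("clojure.core.async.pool-size",
                Integer.toString(virtualThreadParallelism()));
    }

    public static void main(String[] args) {
        alignCoreAsyncPool();
        System.out.println(System.getProperty("clojure.core.async.pool-size"));
    }
}
```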
Platform 1 (4 cores)
tag | name | min(measure) | avg(measure) | max(measure) | count(*) | version | range |
---|---|---|---|---|---|---|---|
fan | machinate-fan | 1.16125 | 1.23849077453366 | 1.29788333333333 | 822 | 0.0.79-35-g7f85ba3-dirty | 0.136633333333333 |
fan | core-async-go-loop-fan | 0.175966666666667 | 0.208628200692042 | 0.2257 | 867 | 0.0.79-35-g7f85ba3-dirty | 0.0497333333333334 |
fan | machinate0-0-79-fan | 0.00483333333333333 | 0.0824708084428514 | 0.169633333333333 | 837 | 0.0.79-35-g7f85ba3-dirty | 0.1648 |
ring | transfer-queue-ring | 0.729683333333333 | 1.05268213317847 | 1.49736666666667 | 861 | 0.0.79-35-g7f85ba3-dirty | 0.767683333333333 |
ring | core-async-go-loop-ring | 0.812433333333333 | 0.910689824360443 | 1.13861666666667 | 873 | 0.0.79-35-g7f85ba3-dirty | 0.326183333333333 |
ring | machinate-ring | 0.5916 | 0.883672948717949 | 0.9794 | 780 | 0.0.79-35-g7f85ba3-dirty | 0.3878 |
ring | thread-parking-locking-strawman-ring | 0.5606 | 0.843746034482759 | 0.944816666666667 | 870 | 0.0.79-35-g7f85ba3-dirty | 0.384216666666667 |
ring | core-async-virtual-thread-ring | 0.395116666666667 | 0.817322968490879 | 1.07066666666667 | 804 | 0.0.79-35-g7f85ba3-dirty | 0.67555 |
ring | thread-parking-single-queue-ring | 0.591833333333333 | 0.815828427895981 | 1.05895 | 846 | 0.0.79-35-g7f85ba3-dirty | 0.467116666666667 |
ring | thread-parking-strawman-ring | 0.599983333333333 | 0.761403076582559 | 0.874516666666667 | 753 | 0.0.79-35-g7f85ba3-dirty | 0.274533333333333 |
ring | machinate-go-loop-ring | 0.213 | 0.237484157832744 | 0.270433333333333 | 849 | 0.0.79-35-g7f85ba3-dirty | 0.0574333333333334 |
ring | machinate0-0-79-ring | 0.0399833333333333 | 0.0832531461569096 | 0.114733333333333 | 837 | 0.0.79-35-g7f85ba3-dirty | 0.07475 |
Platform 2 (48 vcpus)
tag | name | min(measure) | avg(measure) | max(measure) | count(*) | version | range |
---|---|---|---|---|---|---|---|
fan | machinate-fan | 0.45685 | 0.469560222222222 | 0.482666666666667 | 75 | 0.0.79-35-g7f85ba3-dirty | 0.0258166666666667 |
fan | machinate0-0-79-fan | 0.136983333333333 | 0.152629111111111 | 0.168433333333333 | 75 | 0.0.79-35-g7f85ba3-dirty | 0.03145 |
fan | core-async-go-loop-fan | 0.0835 | 0.14226994949495 | 0.165816666666667 | 66 | 0.0.79-35-g7f85ba3-dirty | 0.0823166666666667 |
ring | core-async-go-loop-ring | 1.16768333333333 | 1.26121047008547 | 1.34861666666667 | 78 | 0.0.79-35-g7f85ba3-dirty | 0.180933333333333 |
ring | transfer-queue-ring | 0.4999 | 0.956492831541219 | 1.03095 | 93 | 0.0.79-35-g7f85ba3-dirty | 0.53105 |
ring | thread-parking-single-queue-ring | 0.653516666666667 | 0.702432592592593 | 0.7236 | 90 | 0.0.79-35-g7f85ba3-dirty | 0.0700833333333334 |
ring | core-async-virtual-thread-ring | 0.581516666666667 | 0.610411428571428 | 0.62765 | 105 | 0.0.79-35-g7f85ba3-dirty | 0.0461333333333334 |
ring | thread-parking-strawman-ring | 0.423933333333333 | 0.599186257309941 | 0.6382 | 114 | 0.0.79-35-g7f85ba3-dirty | 0.214266666666667 |
ring | thread-parking-locking-strawman-ring | 0.463966666666667 | 0.586507026143791 | 0.60825 | 102 | 0.0.79-35-g7f85ba3-dirty | 0.144283333333333 |
ring | machinate-ring | 0.401783333333333 | 0.483170899470899 | 0.499083333333333 | 63 | 0.0.79-35-g7f85ba3-dirty | 0.0973 |
ring | machinate0-0-79-ring | 0.118783333333333 | 0.132741555555556 | 0.145783333333333 | 75 | 0.0.79-35-g7f85ba3-dirty | 0.027 |
ring | machinate-go-loop-ring | 0.0755333333333333 | 0.0997511494252874 | 0.139383333333333 | 87 | 0.0.79-35-g7f85ba3-dirty | 0.06385 |
Names
Columns Explained
- `tag` indicates which problem a program belongs to
- `name` is the name of the program
- `count` is the number of scores (measure) recorded for the given program
- `version` is the output of `git describe --tags --dirty` in the git tree where the shootout was run
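The remaining columns are aggregates over a program's recorded scores. The `range` column appears to be max minus min, which matches the rows above (for example, 1.29788333333333 − 1.16125 = 0.136633333333333 for machinate-fan on Platform 1). A small Java sketch of that aggregation, with a hypothetical `SummarySketch` class and made-up input values:

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical sketch of the per-program aggregation behind the table columns.
final class SummarySketch {
    // Returns {min, avg, max, count, range} for the given scores,
    // where range is assumed to be max - min.
    static double[] summarize(double... measures) {
        DoubleSummaryStatistics s = DoubleStream.of(measures).summaryStatistics();
        return new double[] {
            s.getMin(), s.getAverage(), s.getMax(), s.getCount(), s.getMax() - s.getMin()
        };
    }

    public static void main(String[] args) {
        // Made-up scores, not taken from the shootout data.
        double[] r = summarize(1.0, 2.0, 4.0);
        System.out.printf("min=%s avg=%s max=%s count=%s range=%s%n",
                r[0], r[1], r[2], r[3], r[4]);
    }
}
```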
Some Program Names Explained
- `machinate-fan` is the current version of machinate running the fan program using virtual threads
- `machinate0-0-79-fan` is machinate version 0.0.79 running the fan program
- `transfer-queue-ring` is the ring problem implemented using Java’s TransferQueue
- `thread-parking-single-queue-ring`, `thread-parking-strawman-ring`, and `thread-parking-locking-strawman-ring` are channel implementations that exist just as part of the shootout suite as explorations of the channel implementation space
- Except where indicated in the program name, virtual threads are used as the lightweight thread mechanism.
Reactions
- Machinate performance has improved a fair bit between 0.0.79 and the upcoming release
- Machinate performance still trails core.async’s channels.
  - On the 4 core machine performance appears to be fairly close
  - On the 48 vcpu vm core.async is smoking machinate in the ring benchmark
  - Why?
    - machinate may still be allocating more
    - core.async elides some locks
    - machinate may be overly aggressive looping in the channel event implementation causing contention.
    - virtual threads are new :/
- These programs are large enough and complicated enough that it doesn’t take that many runs to get a fairly stable average score.
Future Work
machinate
Performance is not the only goal of machinate, and machinate provides more than just message passing over channels. But the next bit of work on the performance of message passing over channels will likely look at limiting any looping the channel event does while trying to ensure anything queued is processed.
shootout
More platforms
- A single core/vcpu platform might be interesting
- It was very disappointing to see lots of improvement in machinate’s standing on the 4 core platform over the course of development, only to get a smaller improvement on the 48 vcpu platform. I should figure out a way to more regularly test on a beefier machine.
More Problems
- core.async and machinate both provide a pubsub implementation. Add a problem based on that.
- The fan problem is implemented kind of weirdly; maybe it should be using mults or the equivalent.
More Programs
- manifold is glaringly absent from the shootout.
- It would be interesting to see how a transfer queue based fan implementation does.