Shootout 0001

ToC

  1. Introduction
    1. Motivation
    2. Ring Problem
    3. Fan Problem
  2. Observations
    1. Observation Conditions
    2. Platform 1
    3. Platform 2
    4. Names
      1. Columns Explained
      2. Some Program Names Explained
  3. Reactions
  4. Future Work
    1. Machinate
    2. Shootout
      1. More Platforms
      2. More Problems
      3. More Programs

Introduction

Motivation

This shootout is a collection of programs that can be run and whose performance can be observed in some way. Its purpose is to provide a mechanism for comparing machinate’s performance against itself over time, and against other similar libraries.

The programs are grouped by problem, with each group holding different implementations of the “same” problem.

Ring Problem

The ring problem involves setting up some number of threads (threads, logical threads, virtual threads, etc.), each copying from an input channel/stream/queue to an output, with each output feeding the next thread’s input so that the whole arrangement forms a ring. The performance measure is the number of laps made around the ring in a fixed time period.
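To make the shape of the problem concrete, here is a minimal sketch of a ring in plain Java. It is not the shootout's code: the class name RingSketch and the method countLaps are made up for illustration, it uses platform threads rather than virtual threads so that it runs on older JDKs, and it uses LinkedTransferQueue.put and take as the "channel" operations, which is an assumption about how a TransferQueue-based ring could look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TransferQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the ring problem, not the shootout's implementation.
class RingSketch {

    /** Runs a ring of `size` forwarding threads for `millis` ms and returns the laps completed. */
    static long countLaps(int size, long millis) {
        // queues.get(i) is the input of node i and the output of the node before it.
        List<TransferQueue<Long>> queues = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            queues.add(new LinkedTransferQueue<>());
        }
        AtomicLong laps = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            TransferQueue<Long> in = queues.get(i);
            TransferQueue<Long> out = queues.get((i + 1) % size);
            boolean starter = (i == 0);
            // Assumption: platform threads; the shootout itself mostly uses virtual threads.
            Thread t = new Thread(() -> {
                try {
                    if (starter) {
                        out.put(0L); // node 0 injects the single token
                    }
                    while (true) {
                        Long token = in.take();
                        if (starter) {
                            laps.incrementAndGet(); // token is back at node 0: one lap
                        }
                        out.put(token);
                    }
                } catch (InterruptedException e) {
                    // Interrupted at the end of the measurement window: exit quietly.
                }
            });
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }
        try {
            Thread.sleep(millis); // the fixed time period
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long result = laps.get();
        for (Thread t : threads) {
            t.interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("laps: " + countLaps(8, 1000));
    }
}
```

The lap counter lives on node 0 only, so a lap is counted each time the token has passed through every node once.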

Fan Problem

The fan problem is a sort of “fan out” followed by a “fan in”. A thread writes N messages to a channel. N threads each read one message from that channel and write it to a single shared channel. The original thread then reads N messages from that shared channel and loops. The performance measure is again the number of iterations of that process completed in a fixed time period.
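A rough sketch of the fan shape in plain Java follows. Again this is only an illustration under stated assumptions: FanSketch and countIterations are made-up names, LinkedBlockingQueue stands in for a channel, platform threads stand in for virtual threads, and the N workers are long-lived loops, so one worker may occasionally forward more than one message per iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the fan problem, not the shootout's implementation.
class FanSketch {

    /** Runs fan-out/fan-in with `n` workers for `millis` ms and returns completed iterations. */
    static long countIterations(int n, long millis) {
        BlockingQueue<Integer> fanOut = new LinkedBlockingQueue<>(); // driver -> workers
        BlockingQueue<Integer> fanIn = new LinkedBlockingQueue<>();  // workers -> driver
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        fanIn.put(fanOut.take()); // read one message, pass it on
                    }
                } catch (InterruptedException e) {
                    // Interrupted when the measurement ends: exit quietly.
                }
            });
            t.setDaemon(true);
            workers.add(t);
            t.start();
        }
        long iterations = 0;
        long deadline = System.nanoTime() + millis * 1_000_000L;
        try {
            while (deadline - System.nanoTime() > 0) {
                for (int i = 0; i < n; i++) {
                    fanOut.put(i);   // fan out: N messages
                }
                for (int i = 0; i < n; i++) {
                    fanIn.take();    // fan in: N messages back
                }
                iterations++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread t : workers) {
                t.interrupt();
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + countIterations(8, 1000));
    }
}
```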

Observations

Observation Conditions

This iteration of the shootout was run on two different platforms. Platform 1 is a dedicated computer with a very modest 4 cores. Platform 2 is a cloud VM with 48 “vcpus”.

The shootout code attempts to determine how many virtual threads the JVM allows to execute concurrently (how many can be mounted at once) and sets core.async’s dispatch thread pool (used by go) to that size. This is an attempt at something of an apples-to-apples comparison, but in some cases, such as using core.async’s blocking API with virtual threads, it may cause some contention for resources between the virtual threads and core.async’s thread pool.
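One way this kind of sizing could be done is sketched below. This is an assumption, not a description of what the shootout actually does: it reads the JDK system property jdk.virtualThreadScheduler.parallelism (which defaults to the number of available processors when unset) and copies the value into the clojure.core.async.pool-size system property, which core.async versions with a fixed-size dispatch pool read when the pool is first created. The class and method names are made up.

```java
// Hypothetical sketch: derive a pool size from the virtual thread scheduler's parallelism.
class PoolSizing {

    /** Best guess at how many virtual threads can be mounted at once. */
    static int virtualThreadParallelism() {
        // Explicit override, if the JVM was started with -Djdk.virtualThreadScheduler.parallelism=N.
        Integer configured = Integer.getInteger("jdk.virtualThreadScheduler.parallelism");
        if (configured != null && configured > 0) {
            return configured;
        }
        // Otherwise the scheduler defaults to the number of available processors.
        return Runtime.getRuntime().availableProcessors();
    }

    /** Sets the core.async pool-size property to match; must run before core.async is loaded. */
    static int configureCoreAsyncPool() {
        int n = virtualThreadParallelism();
        System.setProperty("clojure.core.async.pool-size", Integer.toString(n));
        return n;
    }

    public static void main(String[] args) {
        System.out.println("pool size: " + configureCoreAsyncPool());
    }
}
```

Because the property is only read when the pool is created, the same effect can be had by passing -Dclojure.core.async.pool-size=N on the command line.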

Platform 1 (4 cores)

tag   name                                  min(measure)         avg(measure)        max(measure)       count(*)  version                   range
fan   machinate-fan                         1.16125              1.23849077453366    1.29788333333333   822       0.0.79-35-g7f85ba3-dirty  0.136633333333333
fan   core-async-go-loop-fan                0.175966666666667    0.208628200692042   0.2257             867       0.0.79-35-g7f85ba3-dirty  0.0497333333333334
fan   machinate0-0-79-fan                   0.00483333333333333  0.0824708084428514  0.169633333333333  837       0.0.79-35-g7f85ba3-dirty  0.1648
ring  transfer-queue-ring                   0.729683333333333    1.05268213317847    1.49736666666667   861       0.0.79-35-g7f85ba3-dirty  0.767683333333333
ring  core-async-go-loop-ring               0.812433333333333    0.910689824360443   1.13861666666667   873       0.0.79-35-g7f85ba3-dirty  0.326183333333333
ring  machinate-ring                        0.5916               0.883672948717949   0.9794             780       0.0.79-35-g7f85ba3-dirty  0.3878
ring  thread-parking-locking-strawman-ring  0.5606               0.843746034482759   0.944816666666667  870       0.0.79-35-g7f85ba3-dirty  0.384216666666667
ring  core-async-virtual-thread-ring        0.395116666666667    0.817322968490879   1.07066666666667   804       0.0.79-35-g7f85ba3-dirty  0.67555
ring  thread-parking-single-queue-ring      0.591833333333333    0.815828427895981   1.05895            846       0.0.79-35-g7f85ba3-dirty  0.467116666666667
ring  thread-parking-strawman-ring          0.599983333333333    0.761403076582559   0.874516666666667  753       0.0.79-35-g7f85ba3-dirty  0.274533333333333
ring  machinate-go-loop-ring                0.213                0.237484157832744   0.270433333333333  849       0.0.79-35-g7f85ba3-dirty  0.0574333333333334
ring  machinate0-0-79-ring                  0.0399833333333333   0.0832531461569096  0.114733333333333  837       0.0.79-35-g7f85ba3-dirty  0.07475

Platform 2 (48 vcpus)

tag   name                                  min(measure)         avg(measure)        max(measure)       count(*)  version                   range
fan   machinate-fan                         0.45685              0.469560222222222   0.482666666666667  75        0.0.79-35-g7f85ba3-dirty  0.0258166666666667
fan   machinate0-0-79-fan                   0.136983333333333    0.152629111111111   0.168433333333333  75        0.0.79-35-g7f85ba3-dirty  0.03145
fan   core-async-go-loop-fan                0.0835               0.14226994949495    0.165816666666667  66        0.0.79-35-g7f85ba3-dirty  0.0823166666666667
ring  core-async-go-loop-ring               1.16768333333333     1.26121047008547    1.34861666666667   78        0.0.79-35-g7f85ba3-dirty  0.180933333333333
ring  transfer-queue-ring                   0.4999               0.956492831541219   1.03095            93        0.0.79-35-g7f85ba3-dirty  0.53105
ring  thread-parking-single-queue-ring      0.653516666666667    0.702432592592593   0.7236             90        0.0.79-35-g7f85ba3-dirty  0.0700833333333334
ring  core-async-virtual-thread-ring        0.581516666666667    0.610411428571428   0.62765            105       0.0.79-35-g7f85ba3-dirty  0.0461333333333334
ring  thread-parking-strawman-ring          0.423933333333333    0.599186257309941   0.6382             114       0.0.79-35-g7f85ba3-dirty  0.214266666666667
ring  thread-parking-locking-strawman-ring  0.463966666666667    0.586507026143791   0.60825            102       0.0.79-35-g7f85ba3-dirty  0.144283333333333
ring  machinate-ring                        0.401783333333333    0.483170899470899   0.499083333333333  63        0.0.79-35-g7f85ba3-dirty  0.0973
ring  machinate0-0-79-ring                  0.118783333333333    0.132741555555556   0.145783333333333  75        0.0.79-35-g7f85ba3-dirty  0.027
ring  machinate-go-loop-ring                0.0755333333333333   0.0997511494252874  0.139383333333333  87        0.0.79-35-g7f85ba3-dirty  0.06385

Names

Columns Explained

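  • tag indicates which problem a program belongs to
  • name is the name of the program
  • min(measure), avg(measure), and max(measure) are the minimum, average, and maximum of the scores (measure) recorded for the given program
  • count(*) is the number of scores (measure) recorded for the given program
  • version is the output of git describe --tags --dirty in the git tree where the shootout was run
  • range is max(measure) minus min(measure)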
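The aggregate columns can be reproduced from a list of raw scores. The sketch below assumes the scores for one program are available as an array of doubles; the class name ScoreSummary and the sample scores in main are made up for illustration. It treats range as max minus min, which is consistent with the numbers in the tables above.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper: summarizes the raw scores (measure) of one program.
class ScoreSummary {

    /** Returns {min, avg, max, count, range} for the given scores. */
    static double[] summarize(double[] measures) {
        DoubleSummaryStatistics s = Arrays.stream(measures).summaryStatistics();
        double range = s.getMax() - s.getMin(); // assumption: range = max - min
        return new double[] {s.getMin(), s.getAverage(), s.getMax(), s.getCount(), range};
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 2.0, 4.0}; // made-up sample scores
        System.out.println(Arrays.toString(summarize(scores)));
    }
}
```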

Some Program Names Explained

  • machinate-fan is the current version of machinate running the fan program using virtual threads
  • machinate0-0-79-fan is machinate version 0.0.79 running the fan program
  • transfer-queue-ring is the ring problem implemented using Java’s TransferQueue
  • thread-parking-single-queue-ring, thread-parking-strawman-ring, and thread-parking-locking-strawman-ring are channel implementations that exist just as part of the shootout suite as explorations of the channel implementation space.
  • Except where indicated in the program name, virtual threads are used as the lightweight thread mechanism.

Reactions

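  1. Machinate performance has improved a fair bit between 0.0.79 and the upcoming release: on the 4 core machine the average ring score went from about 0.08 (machinate0-0-79-ring) to about 0.88 (machinate-ring).
  2. Machinate performance still trails core.async’s channels in the ring benchmark.
    • On the 4 core machine performance appears to be fairly close (average 0.88 for machinate-ring vs 0.91 for core-async-go-loop-ring)
    • On the 48 vcpu VM core.async is smoking machinate in the ring benchmark (average 1.26 vs 0.48)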
    • Why?
      1. machinate may still be allocating more
      2. core.async elides some locks
      3. machinate may be overly aggressive looping in the channel event implementation causing contention.
      4. virtual threads are new :/
  3. These programs are large enough and complicated enough that it doesn’t take that many runs to get a fairly stable average score.

Future Work

Machinate

Performance is not the only goal of machinate, and machinate provides more than just message passing over channels. But the next bit of work on the performance of message passing over channels will likely look at limiting any looping the channel event does while trying to ensure anything queued is processed.

Shootout

More Platforms

  1. A single core/vcpu platform might be interesting.
  2. It was very disappointing to see lots of improvement in machinate’s standing on the 4 core platform over the course of development, only to get a smaller improvement on the 48 vcpu platform. I should figure out a way to test more regularly on a beefier machine.

More Problems

  1. core.async and machinate both provide a pubsub implementation. Add a problem based on that.
  2. The fan problem is implemented in a kind of weird way; maybe it should be using mults or the equivalent.

More Programs

  1. manifold is glaringly absent from the shootout.
  2. It would be interesting to see how a TransferQueue-based fan implementation does.