Shootout 0001

Introduction

Motivation

This shootout is a collection of programs that can be run while some measure of their performance is recorded. Its purpose is to provide a mechanism for comparing machinate’s performance against itself over time, and against other similar libraries.

The programs are grouped by problem: each group contains different implementations of the “same” problem.

Ring Problem

The ring problem involves setting up some number of threads/logical threads/virtual threads/etc to copy from an input channel/stream/queue to an output, arranged to form a ring. The performance measured is the number of laps around the ring that are made in a given fixed time period.
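
As a rough illustration of the ring shape, here is a minimal Java sketch. It is not taken from the shootout: the class RingSketch, the method countLaps, and the choice of LinkedTransferQueue with platform threads are all assumptions made for this example. On a JDK with virtual threads, the nodes could be created with Thread.ofVirtual() instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the ring problem; not the shootout's actual code.
class RingSketch {

    // Runs a ring of `size` threads for about `millis` milliseconds and
    // returns how many laps a single token completed.
    static long countLaps(int size, long millis) {
        // queues.get(i) is the input of node i and the output of node i - 1.
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            queues.add(new LinkedTransferQueue<>());
        }

        AtomicLong laps = new AtomicLong();
        List<Thread> nodes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            BlockingQueue<Integer> in = queues.get(i);
            BlockingQueue<Integer> out = queues.get((i + 1) % size);
            boolean closesRing = (i == size - 1);
            Thread node = new Thread(() -> {
                try {
                    while (true) {
                        out.put(in.take()); // copy input to output
                        if (closesRing) {
                            laps.incrementAndGet(); // token is back at node 0
                        }
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the driver below: stop copying.
                }
            });
            node.setDaemon(true);
            node.start();
            nodes.add(node);
        }

        try {
            queues.get(0).put(0);  // inject the single token
            Thread.sleep(millis);  // the fixed measurement period
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread node : nodes) {
                node.interrupt();
            }
        }
        return laps.get();
    }

    public static void main(String[] args) {
        System.out.println("laps: " + countLaps(4, 1000));
    }
}
```

The lap counter is bumped by the last node, after it hands the token back to the first queue, so the initial injection is not counted as a lap.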

Fan Problem

The fan problem is a sort of “fan out” followed by a “fan in”. A thread writes N messages to a channel; N threads each read one message from that channel and write it to a single shared channel; the original thread then reads N messages from the shared channel and loops. The performance measurement is again the number of iterations of that process completed in a fixed time period.
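
The fan shape can be sketched in Java along the following lines. This is an illustration only, not the shootout's implementation: FanSketch, countIterations, and the use of LinkedBlockingQueue with platform threads are assumptions. It is also slightly looser than the description, since the workers share one queue and nothing forces each worker to take exactly one message per iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the fan problem; not the shootout's actual code.
class FanSketch {

    // Runs the fan with `n` workers for about `millis` milliseconds and
    // returns how many fan-out/fan-in iterations the driver completed.
    static long countIterations(int n, long millis) {
        BlockingQueue<Integer> fanOut = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> fanIn = new LinkedBlockingQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        fanIn.put(fanOut.take()); // read one message, pass it on
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the driver below: stop working.
                }
            });
            worker.setDaemon(true);
            worker.start();
            workers.add(worker);
        }

        long iterations = 0;
        long deadline = System.nanoTime() + millis * 1_000_000L;
        try {
            while (System.nanoTime() - deadline < 0) {
                for (int i = 0; i < n; i++) {
                    fanOut.put(i);   // fan out: write N messages
                }
                for (int i = 0; i < n; i++) {
                    fanIn.take();    // fan in: read N messages back
                }
                iterations++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread worker : workers) {
                worker.interrupt();
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + countIterations(8, 1000));
    }
}
```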

Observations

Observation Conditions

This iteration of the shootout was run on two different platforms. Platform 1 is a dedicated computer with a very modest 4 cores. Platform 2 is a cloud VM with 48 “vcpus”.

The shootout code attempts to determine how many virtual threads the JVM allows to execute concurrently (how many can be mounted at once) and sets core.async’s dispatch thread pool (used by go) to that size. This is an attempt to provide something of an apples-to-apples comparison, but in some cases, such as using core.async’s blocking API with virtual threads, it may cause contention for resources between virtual threads and core.async’s thread pool.
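
One way such a check could be done is sketched below in Java. This is an assumption about the approach, not the shootout's actual code. It relies on the JDK system property jdk.virtualThreadScheduler.parallelism, which defaults to the number of available processors, and on core.async's clojure.core.async.pool-size system property, which would need to be set before core.async creates its dispatch pool. The class and method names are hypothetical.

```java
// Hypothetical sketch; the shootout may determine these values differently.
class DispatchSizing {

    // Number of carrier threads the virtual thread scheduler may use at once:
    // the jdk.virtualThreadScheduler.parallelism property if it is set,
    // otherwise the number of available processors (the JDK's default).
    static int virtualThreadParallelism() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Integer.getInteger("jdk.virtualThreadScheduler.parallelism", cores);
    }

    public static void main(String[] args) {
        int size = virtualThreadParallelism();
        // Assumption: this runs before core.async is loaded, so the property
        // is seen when the dispatch thread pool is created.
        System.setProperty("clojure.core.async.pool-size", Integer.toString(size));
        System.out.println("dispatch pool size: " + size);
    }
}
```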

Platform 1 (4 cores)

tag	name	min(measure)	avg(measure)	max(measure)	count(*)	version	range
fan	machinate-fan	1.16125	1.23849077453366	1.29788333333333	822	0.0.79-35-g7f85ba3-dirty	0.136633333333333
fan	core-async-go-loop-fan	0.175966666666667	0.208628200692042	0.2257	867	0.0.79-35-g7f85ba3-dirty	0.0497333333333334
fan	machinate0-0-79-fan	0.00483333333333333	0.0824708084428514	0.169633333333333	837	0.0.79-35-g7f85ba3-dirty	0.1648
ring	transfer-queue-ring	0.729683333333333	1.05268213317847	1.49736666666667	861	0.0.79-35-g7f85ba3-dirty	0.767683333333333
ring	core-async-go-loop-ring	0.812433333333333	0.910689824360443	1.13861666666667	873	0.0.79-35-g7f85ba3-dirty	0.326183333333333
ring	machinate-ring	0.5916	0.883672948717949	0.9794	780	0.0.79-35-g7f85ba3-dirty	0.3878
ring	thread-parking-locking-strawman-ring	0.5606	0.843746034482759	0.944816666666667	870	0.0.79-35-g7f85ba3-dirty	0.384216666666667
ring	core-async-virtual-thread-ring	0.395116666666667	0.817322968490879	1.07066666666667	804	0.0.79-35-g7f85ba3-dirty	0.67555
ring	thread-parking-single-queue-ring	0.591833333333333	0.815828427895981	1.05895	846	0.0.79-35-g7f85ba3-dirty	0.467116666666667
ring	thread-parking-strawman-ring	0.599983333333333	0.761403076582559	0.874516666666667	753	0.0.79-35-g7f85ba3-dirty	0.274533333333333
ring	machinate-go-loop-ring	0.213	0.237484157832744	0.270433333333333	849	0.0.79-35-g7f85ba3-dirty	0.0574333333333334
ring	machinate0-0-79-ring	0.0399833333333333	0.0832531461569096	0.114733333333333	837	0.0.79-35-g7f85ba3-dirty	0.07475

Platform 2 (48 vcpus)

tag	name	min(measure)	avg(measure)	max(measure)	count(*)	version	range
fan	machinate-fan	0.45685	0.469560222222222	0.482666666666667	75	0.0.79-35-g7f85ba3-dirty	0.0258166666666667
fan	machinate0-0-79-fan	0.136983333333333	0.152629111111111	0.168433333333333	75	0.0.79-35-g7f85ba3-dirty	0.03145
fan	core-async-go-loop-fan	0.0835	0.14226994949495	0.165816666666667	66	0.0.79-35-g7f85ba3-dirty	0.0823166666666667
ring	core-async-go-loop-ring	1.16768333333333	1.26121047008547	1.34861666666667	78	0.0.79-35-g7f85ba3-dirty	0.180933333333333
ring	transfer-queue-ring	0.4999	0.956492831541219	1.03095	93	0.0.79-35-g7f85ba3-dirty	0.53105
ring	thread-parking-single-queue-ring	0.653516666666667	0.702432592592593	0.7236	90	0.0.79-35-g7f85ba3-dirty	0.0700833333333334
ring	core-async-virtual-thread-ring	0.581516666666667	0.610411428571428	0.62765	105	0.0.79-35-g7f85ba3-dirty	0.0461333333333334
ring	thread-parking-strawman-ring	0.423933333333333	0.599186257309941	0.6382	114	0.0.79-35-g7f85ba3-dirty	0.214266666666667
ring	thread-parking-locking-strawman-ring	0.463966666666667	0.586507026143791	0.60825	102	0.0.79-35-g7f85ba3-dirty	0.144283333333333
ring	machinate-ring	0.401783333333333	0.483170899470899	0.499083333333333	63	0.0.79-35-g7f85ba3-dirty	0.0973
ring	machinate0-0-79-ring	0.118783333333333	0.132741555555556	0.145783333333333	75	0.0.79-35-g7f85ba3-dirty	0.027
ring	machinate-go-loop-ring	0.0755333333333333	0.0997511494252874	0.139383333333333	87	0.0.79-35-g7f85ba3-dirty	0.06385

Names

Columns Explained

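tag indicates which problem a program belongs to
name is the name of the program
min(measure), avg(measure), and max(measure) are the minimum, average, and maximum of the scores (measure) recorded for the given program
count is the number of scores (measure) recorded for the given program
version is the output of git describe --tags --dirty in the git tree where the shootout was run
range is max(measure) minus min(measure)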
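
The aggregate columns can be reproduced from a list of raw scores. The range column is consistent with max minus min: for machinate-fan on Platform 1, 1.29788333333333 - 1.16125 = 0.136633333333333. The Java sketch below shows the arithmetic on made-up scores; the class ScoreSummary and its methods are hypothetical and are not part of the shootout.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical helper; the shootout's own tables come from its own tooling.
class ScoreSummary {

    // Collects count, min, average, and max over the recorded scores.
    static DoubleSummaryStatistics summarize(double[] scores) {
        return DoubleStream.of(scores).summaryStatistics();
    }

    // The range column: max(measure) minus min(measure).
    static double range(DoubleSummaryStatistics stats) {
        return stats.getMax() - stats.getMin();
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 2.0, 4.0}; // made-up scores, not real results
        DoubleSummaryStatistics stats = summarize(scores);
        System.out.println("min=" + stats.getMin()
                + " avg=" + stats.getAverage()
                + " max=" + stats.getMax()
                + " count=" + stats.getCount()
                + " range=" + range(stats));
    }
}
```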

Some Program Names Explained

machinate-fan is the current version of machinate running the fan program using virtual threads
machinate0-0-79-fan is machinate version 0.0.79 running the fan program
transfer-queue-ring is the ring problem implemented using Java’s TransferQueue
thread-parking-single-queue-ring, thread-parking-strawman-ring, and thread-parking-locking-strawman-ring are channel implementations that exist only as part of the shootout suite, as explorations of the channel implementation space
Except where indicated in the program name, virtual threads are used as the lightweight thread mechanism.

Reactions

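Machinate performance has improved a fair bit between 0.0.79 and the upcoming release: on Platform 1 the average ring score went from about 0.08 (machinate0-0-79-ring) to about 0.88 (machinate-ring).
Machinate performance still trails core.async’s channels.
- On the 4 core machine the ring scores are fairly close (an average of about 0.91 for core-async-go-loop-ring versus 0.88 for machinate-ring).
- On the 48 vcpu VM core.async is smoking machinate in the ring benchmark (an average of about 1.26 versus 0.48).
- Why?
  1. machinate may still be allocating more than core.async.
  2. core.async elides some locks.
  3. machinate may be looping too aggressively in the channel event implementation, causing contention.
  4. Virtual threads are new :/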
These programs are large enough and complicated enough that it doesn’t take that many runs to get a fairly stable average score.

Future Work

machinate

Performance is not the only goal of machinate, and machinate provides more than just message passing over channels. But the next bit of work on the performance of message passing over channels will likely look at limiting any looping the channel event does while trying to ensure anything queued is processed.

shootout

More platforms

A single core/vcpu platform might be interesting.
It was very disappointing to see lots of improvement in machinate’s standing on the 4 core platform over the course of development, only to get a smaller improvement on the 48 vcpu platform. I should figure out a way to test more regularly on a beefier machine.

More Programs

manifold is glaringly absent from the shootout.
It would be interesting to see how a transfer queue based fan implementation does.
