Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

May 27, 2020

Frederic Cambus (fcambus)

OpenBSD/armv7 on the CubieBoard2 May 27, 2020 10:15 PM

I bought the CubieBoard2 back in 2016 with the idea of running OpenBSD on it, but because of various reliability issues with the onboard NIC, it ran NetBSD for a few weeks before ending up in a drawer.

Back in October, Mark Kettenis committed code to allow switching to the framebuffer "glass" console in the bootloader on OpenBSD/armv7, making it possible to install the system without using a serial cable.

>> OpenBSD/armv7 BOOTARM 1.14
boot> set tty fb0
switching console to fb0

This prompted me to plug the board in again, and having support for the framebuffer console is a game changer. It also allows running Xenocara, if that's your thing.

Here is the output of running file on executables:

ELF 32-bit LSB shared object, ARM, version 1

And this is the result of the md5 -t benchmark:

MD5 time trial.  Processing 10000 10000-byte blocks...
Digest = 52e5f9c9e6f656f3e1800dfa5579d089
Time   = 1.340000 seconds
Speed  = 74626865.671642 bytes/second

For the record, LibreSSL speed benchmark results are available here.

System message buffer (dmesg output):

OpenBSD 6.7-current (GENERIC) #299: Sun May 24 18:25:45 MDT 2020
real mem  = 964190208 (919MB)
avail mem = 935088128 (891MB)
random: good seed from bootblocks
mainbus0 at root: Cubietech Cubieboard2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A7 r0p4
cpu0: 32KB 32b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 256KB 64b/line 8-way L2 cache
cortex0 at mainbus0
psci0 at mainbus0: PSCI 0.0
sxiccmu0 at mainbus0
agtimer0 at mainbus0: tick rate 24000 KHz
simplebus0 at mainbus0: "soc"
sxiccmu1 at simplebus0
sxipio0 at simplebus0: 175 pins
sxirtc0 at simplebus0
sxisid0 at simplebus0
ampintc0 at simplebus0 nirq 160, ncpu 2: "interrupt-controller"
"system-control" at simplebus0 not configured
"interrupt-controller" at simplebus0 not configured
"dma-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"video-codec" at simplebus0 not configured
sximmc0 at simplebus0
sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma
"usb" at simplebus0 not configured
"phy" at simplebus0 not configured
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci0 at simplebus0: version 1.0
"crypto-engine" at simplebus0 not configured
"hdmi" at simplebus0 not configured
sxiahci0 at simplebus0: AHCI 1.1
scsibus0 at sxiahci0: 32 targets
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at simplebus0: version 1.0
"timer" at simplebus0 not configured
sxidog0 at simplebus0
"ir" at simplebus0 not configured
"codec" at simplebus0 not configured
sxits0 at simplebus0
com0 at simplebus0: ns16550, no working fifo
sxitwi0 at simplebus0
iic0 at sxitwi0
axppmic0 at iic0 addr 0x34: AXP209
sxitwi1 at simplebus0
iic1 at sxitwi1
"gpu" at simplebus0 not configured
dwge0 at simplebus0: address 02:0a:09:03:27:08
rlphy0 at dwge0 phy 1: RTL8201L 10/100 PHY, rev. 1
"hstimer" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
gpio0 at sxipio0: 32 pins
gpio1 at sxipio0: 32 pins
gpio2 at sxipio0: 32 pins
gpio3 at sxipio0: 32 pins
gpio4 at sxipio0: 32 pins
gpio5 at sxipio0: 32 pins
gpio6 at sxipio0: 32 pins
gpio7 at sxipio0: 32 pins
gpio8 at sxipio0: 32 pins
usb2 at ohci0: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
usb3 at ohci1: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
simplefb0 at mainbus0: 1920x1080, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
scsibus1 at sdmmc0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <SD/MMC, SC64G, 0080> removable
sd0: 60906MB, 512 bytes/sector, 124735488 sectors
uhidev0 at uhub2 port 1 configuration 1 interface 0 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub2 port 1 configuration 1 interface 1 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev1: iclass 3/1, 22 report ids
ums0 at uhidev1 reportid 1: 5 buttons, Z and W dir
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 16: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 17: input=2, output=0, feature=0
uhid2 at uhidev1 reportid 19: input=8, output=8, feature=8
uhid3 at uhidev1 reportid 21: input=2, output=0, feature=0
uhid4 at uhidev1 reportid 22: input=2, output=0, feature=0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootfile: sd0a:/bsd
boot device: sd0
root on sd0a (f7b555b0fa0e8c49.a) swap on sd0b dump on sd0b

Sensors output:

$ sysctl hw.sensors
hw.sensors.sxits0.temp0=39.50 degC
hw.sensors.axppmic0.temp0=30.00 degC
hw.sensors.axppmic0.volt0=4.95 VDC (ACIN)
hw.sensors.axppmic0.volt1=0.03 VDC (VBUS)
hw.sensors.axppmic0.volt2=4.85 VDC (APS)
hw.sensors.axppmic0.current0=0.11 A (ACIN)
hw.sensors.axppmic0.current1=0.00 A (VBUS)
hw.sensors.axppmic0.indicator0=On (ACIN), OK
hw.sensors.axppmic0.indicator1=Off (VBUS)

May 25, 2020

Gustaf Erikson (gerikson)

4,000 dead in Sweden May 25, 2020 09:48 AM

Alsing dead May 25, 2020 09:48 AM

3,000 dead in Sweden May 25, 2020 09:47 AM

2,000 dead in Sweden May 25, 2020 09:47 AM

1,000 dead in Sweden May 25, 2020 09:47 AM

Work from home begins today May 25, 2020 09:47 AM

WHO declares a pandemic May 25, 2020 09:46 AM

Marc Brooker (mjb)

Reading Research: A Guide for Software Engineers May 25, 2020 12:00 AM

Don't be afraid.

One thing I'm known for at work is reading research papers, and referring to results in technical conversations. People ask me if, and how, they should read papers themselves. This post is a long-form answer to that question. The intended audience is working software engineers.

Why read research?

I read research in one of three mental modes.

The first mode is solution finding: I’m faced with a particular problem, and am looking for solutions. This isn’t too different from the way that you probably use Stack Overflow, but for more esoteric or systemic problems. Solution finding can work directly from papers, but I tend to find books more useful in this mode, unless I know an area well and am looking for something specific.

A more productive mode is what I call discovery. In this case, I’ve been working on a problem or in a space, and know something about it. In discovery mode, I want to explore around the space I know and see if there are better solutions. For example, when I was building a system using Paxos, I read a lot of literature about consensus protocols in general (including classics like Viewstamped Replication1, and newer papers like Raft). The goal in discovery mode is to find alternative solutions, opportunities for optimization, or new ways to think about a problem.

The most intellectually gratifying mode for me is curiosity mode. Here, I’ll read papers that just seem interesting to me, but aren’t related to anything I’m currently working on. I’m constantly surprised by how reading broadly has helped me solve problems, or just informed my approach. For example, reading about misuse-resistant cryptography primitives like GCM-SIV has deeply informed my approach to API design. Similarly, reading about erasure codes around 2005 helped me solve an important problem for my team just this year.

I’ve found reading for discovery and curiosity very helpful to my career. It has also given me tools that make reading for solution finding more efficient. Sometimes, reading for curiosity leads to new paths. About five years ago I completely changed what I was working on after reading Latency lags bandwidth, which describes what I believe is one of the most important trends in computing.

Do I need a degree to read research papers?

No. Don’t expect to be able to pick up every paper and understand it completely. You do need a certain amount of background knowledge, but no credentials. Try to avoid being discouraged when you don't understand a paper, or sections of a paper. I'm often surprised when I revisit something after a couple years and find I now understand it.

Learning a new field from primary research can be very difficult. When tackling a new area, books, blogs, talks, and courses are better options.

How do I find papers worth reading?

That depends on the mode you’re in. In solution finding and discovery modes, search engines like Google Scholar are a great place to start. One challenge with searching is that you might not even know the right things to search for: it’s not unusual for researchers to use different terms from the ones you are used to. If you run into this problem, picking up a book on the topic can often help bridge the gap, and the references in books are a great way to discover papers.

Following particular authors and researchers can be great for discovery and curiosity modes. If there’s a researcher who’s working in a space I’m interested in, I’ll follow them on Twitter or add search alerts to see when they’ve published something new.

Conferences and journals are another great place to go. Most of the computer science research you’ll read is probably published at conferences. There are some exceptions. For example, I followed ACM Transactions on Storage when I was working in that area. Pick a couple of conferences in areas that you’re interested in, and read through their programs when they come out. In my area, NSDI and Eurosys happened earlier this year, and OSDI is coming up. Jeff Huang has a nice list of best paper winners at a wide range of CS conferences.

A lot of research involves going through the graph of references. Most papers include a list of references, and as I read I note down which ones I’d like to follow up on and add them to my reading list. References form a directed (mostly) acyclic graph of research going into the past.

Finally, some research bloggers are worth following. Adrian Colyer's blog is worth its weight in gold. I’ve also written about research from researchers like Leslie Lamport, Nancy Lynch, and others.

That’s quite a fire hose! How do I avoid drowning?

You don’t have to drink that whole fire hose. I know I can’t. Titles and abstracts can be a good way to filter out papers you want to read. Don’t be afraid to scan down a list of titles and pick out one or two papers to read.

Another approach is to avoid reading new papers at all. Focus on the classics, and let time filter out papers that are worth reading. For example, I often find myself recommending Jim Gray's 1986 paper on The 5 Minute Rule and Lisanne Bainbridge's 1983 paper on Ironies of Automation2.

Who writes research papers?

Research papers in the areas of computer science I work in are generally written by one of three groups. First, researchers at universities, including professors, postdocs, and graduate students. These are people whose job it is to do research. They have a lot of freedom to explore quite broadly, and do foundational and theoretical work.

Second, engineering teams at companies publish their work. Amazon’s Dynamo, Firecracker, Aurora and Physalia papers are examples. Here, work is typically more directly aimed at a problem to be solved in a particular context. The strength of industry research is that it’s often been proven in the real world, at scale.

In the middle are industrial research labs. Bell Labs was home to some of the foundational work in computing and communications. Microsoft Research does a great deal of impressive work. Industry labs, as a broad generalization, also tend to focus on concrete problems, but can operate over longer time horizons.

Should I trust the results in research papers?

The right answer to this question is no. Nothing about being in a research paper guarantees that a result is right. Results can range from simply wrong, to flawed in more subtle ways3.

On the other hand, the process of peer review does help set a bar of quality for published results, and results published in reputable conferences and journals are generally trustworthy. Reviewers and editors put a great deal of effort into this, and it’s a real strength of scientific papers over informal publishing.

My general advice is to read methods carefully, and verify results for yourself if you’re going to make critical decisions based on them. A common mistake is to apply a correct result too broadly, and assume it applies to contexts or systems it wasn’t tested on.

Should I distrust results that aren’t in research papers?

No. The process of peer review is helpful, but not magical. Results that haven’t been peer reviewed, or that were rejected in peer review, aren’t necessarily wrong. Some important papers have been rejected from traditional publishing, and were published in other ways. This happened to Leslie Lamport's classic paper introducing Paxos:

I submitted the paper to TOCS in 1990. All three referees said that the paper was mildly interesting, though not very important, but that all the Paxos stuff had to be removed. I was quite annoyed at how humorless everyone working in the field seemed to be, so I did nothing with the paper.

It was eventually published 8 years later, and quite well received:

This paper won an ACM SIGOPS Hall of Fame Award in 2012.

There's a certain dance one needs to know, and follow, to get published in a top conference or journal. Some of the steps are necessary, and lead to better research and better communities. Others are just for show.

What should I look out for in the methods section?

That depends on the field. In distributed systems, one thing to look out for is scale. Due to the constraints of research, systems may be tested and validated at a scale below what you’ll need to run in production. Think carefully about how the scale assumptions in the paper might impact the results. Both academic and industry authors have an incentive to talk up the strengths of their approach, and avoid highlighting the weaknesses. This is very seldom done to the point of dishonesty, but worth paying attention to as you read.

How do I get time to read?

This is going to depend on your personal circumstances, and your job. It's not always easy. Long-term learning is one of the keys to a sustainable and successful career, so it's worth making time to learn. One of the ways I like to learn is by reading research papers. You might find other ways more efficient, effective or enjoyable. That's OK too.


Pekka Enberg pointed me at How to Read a Paper by Srinivasan Keshav. It describes a three-pass approach to reading a paper that I like very much:

The first pass gives you a general idea about the paper. The second pass lets you grasp the paper’s content, but not its details. The third pass helps you understand the paper in depth.

Murat Demirbas shared his post How I Read a Research Paper, which contains a lot of great advice. Like Murat, I like to read on paper, although I have taken to doing my lighter-weight reading using LiquidText.


  1. I wrote a blog post about Viewstamped Replication back in 2014. It's a pity VR isn't more famous, because it's an interestingly different framing that helped me make sense of a lot of what Paxos does.
  2. Obviously stuff like maths is timeless, but even in fast-moving fields like systems there are papers worth reading from the 50s and 60s. I think about Sayre's 1969 paper Is automatic “folding” of programs efficient enough to displace manual? when people talk about how modern programmers don't care about efficiency.
  3. There's a lot of research that looks at the methods and evidence of other research. For a start, and to learn interesting things about your own benchmarking, take a look at Is Big Data Performance Reproducible in Modern Cloud Networks? and A Nine Year Study of File System and Storage Benchmarking.

Joe Nelson (begriffs)

Logging TLS session keys in LibreSSL May 25, 2020 12:00 AM

LibreSSL is a fork of OpenSSL that improves code quality and security. It was originally developed for OpenBSD, but has since been ported to several platforms (Linux, *BSD, HP-UX, Solaris, macOS, AIX, Windows) and is now the default TLS provider for some of them.

When debugging a program that uses LibreSSL, it can be useful to see decrypted network traffic. Wireshark can decrypt TLS if you provide the secret session key; however, the session key is difficult to obtain. It is different from the private key used for functions like tls_config_set_keypair_file(), which merely secures the initial TLS handshake with asymmetric cryptography. The handshake establishes the session key between client and server using a method such as Diffie-Hellman (DH). The session key is then used for efficient symmetric cryptography for the remainder of the communication.

Web browsers, from their Netscape provenance, will log session keys to a file specified by the environment variable SSLKEYLOGFILE when present. Netscape packaged this behavior in its Network Security Services library.

OpenSSL and LibreSSL don’t implement that NSS behavior, although OpenSSL allows code to register a callback for when TLS key material is generated or received. The callback receives a string in the NSS Key Log Format.

In addition to refactoring OpenSSL code, LibreSSL offers a simplified TLS interface called libtls. The simplicity makes it more likely that applications will use it safely. However, I couldn’t find an easy way to log session keys for my libtls connection.

I found a somewhat hacky way to do it, and asked their development list whether there’s a better way. From the lack of response, I assume there isn’t yet. Posting the solution here in case it’s helpful for anyone else.

This module provides a tls_dump_keylog() function that appends to the file specified in SSLKEYLOGFILE.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#include <openssl/ssl.h>
#include <tls.h>

/* A copy of the tls structure from libtls/tls_internal.h
 * This is a fragile hack! When the structure changes in libtls
 * then it will be Undefined Behavior to alias it with this.
 * See C99 section 6.5 (Expressions), paragraph 7 */
struct tls_internal {
	struct tls_config *config;
	struct tls_keypair *keypair;

	struct {
		char *msg;
		int num;
		int tls;
	} error;

	uint32_t flags;
	uint32_t state;

	char *servername;
	int socket;

	SSL *ssl_conn;
	SSL_CTX *ssl_ctx;

	struct tls_sni_ctx *sni_ctx;

	X509 *ssl_peer_cert;
	STACK_OF(X509) *ssl_peer_chain;

	struct tls_conninfo *conninfo;

	struct tls_ocsp *ocsp;

	tls_read_cb read_cb;
	tls_write_cb write_cb;
	void *cb_arg;
};

static void printhex(FILE *fp, const unsigned char *s, size_t len)
{
	while (len-- > 0)
		fprintf(fp, "%02x", *s++);
}

bool tls_dump_keylog(struct tls *tls)
{
	FILE *fp;
	SSL_SESSION *sess;
	unsigned int len_key, len_id;
	unsigned char key[256];
	const unsigned char *id;

	const char *path = getenv("SSLKEYLOGFILE");
	if (!path)
		return false;

	/* potentially nonstrict aliasing */
	sess = SSL_get_session(((struct tls_internal *)tls)->ssl_conn);
	if (!sess) {
		fprintf(stderr, "Failed to get session for TLS\n");
		return false;
	}
	len_key = SSL_SESSION_get_master_key(sess, key, sizeof key);
	id      = SSL_SESSION_get_id(sess, &len_id);

	if ((fp = fopen(path, "a")) == NULL) {
		fprintf(stderr, "Unable to write keylog to '%s'\n", path);
		return false;
	}
	fputs("RSA Session-ID:", fp);
	printhex(fp, id, len_id);
	fputs(" Master-Key:", fp);
	printhex(fp, key, len_key);
	fputs("\n", fp);
	fclose(fp);
	return true;
}

To use the logfile in Wireshark, right click on a TLS packet, and select Protocol Preferences → (Pre)-Master-Secret log filename.

(Pre)-Master-Secret log filename menu item

In the resulting dialog, add the filename of the logfile. Then you can view the decrypted traffic with Follow → TLS Stream.

Follow TLS stream menu item

May 24, 2020

Derek Jones (derek-jones)

New users generate more exceptions than existing users (in one dataset) May 24, 2020 10:42 PM

Application usage data is one of the rarest kinds of public software engineering data.

Even data that might be used to approximate application usage is rare. Server logs might be used as a proxy for browser usage or operating system usage, and the number of Debian package downloads as a proxy for package usage.

Usage data is an important component of fault prediction models, and the failure to incorporate such data is one reason why existing fault models are almost completely worthless.

The paper Deriving a Usage-Independent Software Quality Metric appeared a few months ago (it’s a bit of a kitchen sink of a paper), and included lots of usage data! As far as I know, this is a first.

The data relates to a mobile-based communications App that used Google Analytics to log basic usage information, i.e., daily totals of: App usage time, uses by existing users, uses by new users, operating system+version used by the mobile device, and number of exceptions raised by the App.

Working with daily totals means there is likely to be a non-trivial correlation between usage time and number of uses. Given that this is the only public data of its kind, the correlation has to be handled somehow (in my case, ignored for the time being).

I’m expecting to see a relationship between number of exceptions raised and daily usage (the data includes a count of fatal exceptions, which are less common; because lots of data is needed to build a good model, I went with the more common kind). So a’fishing I went.

On most days no exception occurred (zero is the ideal case for the vendor, but I want lots of exceptions to build a good model). Daily exception counts are likely to be small integers, which suggests a Poisson error model.

It is likely that the same set of exceptions were experienced by many users, rather like the behavior that occurs when fuzzing a program.

Applications often have an initial beta testing period, intended to check that everything works. Lucky for me the beta testing data is included (i.e., more exceptions are likely to occur during beta testing, which get sorted out prior to official release). This is the data I concentrated my modeling on.

The model I finally settled on has the form (code+data):

Exceptions ≈ uses^0.1 × newUserUses^0.54 × e^(0.002·sqrt(usagetime)) × AndroidVersion

Yes, newUserUses had a much bigger impact than uses. This was true for all the models I built using data for all Android/iOS Apps, and the exponent difference was always greater than two.

Why square-root, rather than log? The model fit was much better for square-root; too much better for me to be willing to go with a model which had usagetime as a power-law.

The impact of AndroidVersion varied by several orders of magnitude (which won’t come as a surprise to developers using earlier versions of Android).

There were not nearly as many exceptions once the App became generally available, and there were a lot fewer exceptions for the iOS version.

The outsized impact of new users on exceptions experienced is easily explained by developers failing to check for users doing nonsensical things (which users new to an App are prone to do). Existing users have a better idea of how to drive an App, and tend to do the kind of things that developers expect them to do.

As always, if you know of any interesting software engineering data, please let me know.

Pages From The Fire (kghose)

Just use std::filesystem May 24, 2020 07:13 PM

C++17’s std::filesystem gives me the warm fuzzy feeling Python3’s pathlib does. Easy, intuitive and cross-platform, yet another excuse to use C++17. Don’t look back. Just use it. Especially useful functions in std::filesystem are: The / operator: The committee fought their instincts to be “enterprise”, decided to be more Pythonic, and got this very… Read More: Just use std::filesystem

Ponylang (SeanTAllen)

Last Week in Pony - May 24, 2020 May 24, 2020 02:33 PM

Damon Kwok has been doing a ton of awesome work on the Emacs ponylang-mode. Sean T Allen gave an informal presentation on Pony to the Houston Functional Programmers Users Group.

May 22, 2020

Wesley Moore (wezm)

Software Bounties May 22, 2020 10:07 AM

I don't have time to build all the things I'd like to build, so I'm offering bounties on the following work.


  • Payment will be made via PayPal when the criteria are met. If you would prefer another mechanism, feel free to suggest it, but no guarantees.
  • Amounts are in Australian dollars.
  • I will not pay out a bounty after the expiration date.
  • I may choose to extend the expiration date.
  • How can I trust you'll pay me? I like to think that I'm a trustworthy person. However, if you would like to discuss a partial payment prior to starting work on one of these issues please get in touch.
  • You have to be the primary contributor to claim the bounty. If someone else does all the work and you just nudge it over the line, the other person is the intended recipient.
  • If in doubt contact me.

Emoji Reactions in Fractal

Fractal is a Matrix client written in Rust using GTK.

Criteria: Implement emoji reactions in Fractal to the satisfaction of the maintainers, resulting in the issue being closed as completed.
Language: Rust
Amount: AU$500
Expires: 2021-01-01T00:00:00Z

Update Mattermost Server to Support Emoji Added After Unicode 9.0

Mattermost's emoji picker is stuck on emoji from Unicode 9. We're now up to Unicode 13 and many emoji added in the last few years are missing. This bounty pertains only to the work required in the Mattermost server, not the desktop and mobile apps.

Criteria: Update the list of emoji in the Mattermost server to Unicode 13.0, resulting in the issue being closed as completed.
Language: Go
Amount: AU$200
Expires: 2021-01-01T00:00:00Z

Carlos Fenollosa (carlesfe)

No more Google Analytics May 22, 2020 09:24 AM

I have removed the GA tracking code from this website. This site no longer uses any tracking techniques: no cookies, no JS, no image pixels.

Even though this was one of the first sites to actually implement consent-based GA tracking, the current situation with the cookie banners is terrible.

We are back to the Flash era, where every site had a "home page" and you needed to perform some extra clicks to view the actual content. Now those extra clicks are spent disabling all the tracking code.

I hate the current situation so much that I just couldn't be a part of it any more. So, no banner, no cookies, no JS, nothing. What little traffic I get, I'll analyze with a log parser like webalizer. I wasn't checking it anyway.

Tags: internet, web, security

Comments? Tweet

May 21, 2020

Gonçalo Valério (dethos)

Dynamic DNS using Cloudflare Workers May 21, 2020 11:09 PM

In this post I’ll describe a simple solution I came up with to the problem of dynamically updating DNS records when the IP addresses of your machines/instances change frequently.

While Dynamic DNS isn’t a new thing and many services/tools around the internet already provide solutions to this problem (for more than 2 decades), I had a few requirements that ruled out most of them:

  • I didn’t want to sign up to a new account in one of these external services.
  • I would prefer to use a domain name under my control.
  • I don’t trust the machine/instance that executes the update agent, so according to the principle of least privilege, the client should only be able to update one DNS record.

The first and second points rule out the usual DDNS service providers, and the third point forbids me from using the Cloudflare API as is (as done in other blog posts), since the permissions we are allowed to set up for a new API token aren’t granular enough to only allow access to a single DNS record; at best I would have to give access to all records under that domain.

My solution to the problem at hand was to put a worker in front of the API, basically delegating half of the work to this “serverless function”. The flow is the following:

  • agent gets IP address and timestamp
  • agent signs the data using a previously known key
  • agent contacts the worker
  • worker verifies signature, IP address and timestamp
  • worker fetches DNS record info of a predefined subdomain
  • If the IP address is the same, nothing needs to be done
  • If the IP address is different, worker updates DNS record
  • worker notifies the agent of the outcome

Nothing too fancy or clever, right? But it works like a charm.

I’ve published my implementation on GitHub with a FOSS license, so anyone can modify and reuse it. It doesn’t require any extra dependencies, it consists of only two files, and you just need to drop them at the right locations and you’re ready to go. The repository can be found here and contains the detailed steps to deploy it.

There are other small features that could be implemented, such as using the same worker with several agents that need to update different records, so only one of these “serverless functions” would be required. But these improvements will have to wait for another time; for now I just needed something that worked well for this particular case and could be deployed in a short time.

Robin Schroer (sulami)

Literate Calculations in Emacs May 21, 2020 12:00 AM

It is no secret that I am a big fan of literate programming for many use cases. I think it is a great match for investigative or exploratory notes, research, and configuration.

On a Friday evening about two weeks ago, my flatmate came up with an idea for doing calculations in a literate way. Of course, if you really wanted to, you could use a Jupyter Notebook, but we were looking for something more lightweight, and ideally integrated into Emacs.

A quick search came up empty, so on Saturday morning I got started writing what came to be Literate Calc Mode. The features I wanted included named references to earlier results, spreadsheet-like recalculations on every change, and the ability to save my calculations to a file. And then of course the ability to interlace calculations with explanatory text.

It was in part inspired by the iOS app Tydlig, which also provides calculations with automatically updating references to earlier results, but does not allow saving the workspaces as files, which I find very limiting.

But enough talk, this is what the result looks like in action:

This is literate-calc-minor-mode, running alongside org-mode. As you can see, it automatically picks up calculations and inserts the results as overlays at the end of the line. It allows the user to bind results to variables, which can even have names with spaces. Any change causes all values to be recalculated, similar to a spreadsheet.

Because it uses Emacs’ built-in calc-eval behind the scenes, it supports almost everything M-x calc does, including formulas, complex units, and unresolved mathematical variables.

Of course there are also other convenience functions, such as evaluating just a single line, or inserting the results into the file for sharing. I do have some more plans for the future, which are outlined in the documentation.

In addition to hopefully providing some value to other Emacs users, this was also a great learning experience. On a meta-level, writing this post has taught me how to use <video> on my blog.

I have learned a lot about overlays in Emacs, and I published my first package on MELPA, which was a thoroughly pleasant experience.

eta (eta)

Writing as a form of relief May 21, 2020 12:00 AM

The content on this blog does not get updated very frequently.

This is largely because, as a somewhat permanent and public sort of thing, I have to be quite careful about what stuff I write on here, since it could come back to bite me later, right? We’ve all heard the stories about people getting turned down for job offers due to some embarrassing stuff they posted on the social network du jour [1] [2], and I get the impression that being careful with what you choose to write into your permanent online record is generally a good thing. (Well, when you phrase it like that…)

What are blogs for, though? As far as I can tell, almost nobody reads this one; there are a few stragglers who come here through Google because I posted about something like microG or Rust programming (even though there are now much better resources to learn Rust out there, given the language has changed a whole load since I started learning it…).

Furthermore, I don’t consider myself the kind of person who’s happy to go and do lots of writing about technical topics (for the moment, at least). Some people can sustain an entire blogging habit by packing things full of interesting technical content / deep dives / whatever. This is great, because then the content is at least ‘useful’ to the person reading it (in some sense [3]), instead of being some poor sap whining on about random other things happening in their life.

Unfortunately, though, I’m not one of these people. So, either I go about my life and don’t write any part of it up on the blog, or I do the converse, and end up spewing things I’ll probably regret later out into the web at large.

There’s a pandemic on. Everyone’s feeling miserable, a lot of people have very tragically lost their lives to the COVID-19 virus, and people are beginning to question all sorts of things about the existence we led before this all started.

So, what the hell, let’s just get on with it then.

The utility of personal content

Some people might be of the opinion that personal content (like somewhat soppy blog posts) is not worth reading, and should perhaps be gotten rid of entirely. However – and obviously I’m going to be biased here! – I don’t really think so. Okay, I think the sort of content where people talk about whatever mundane things they’ve been getting up to (“so, this week, I washed my bike, went out for a run, tinkered with Node.js a bit…”) is perhaps a bit of a waste of time – I’d call that ‘oversharing’, perhaps. But I do think it’s possible to read stuff about someone else’s problems and gain some insight into how you might be able to solve your own, so I don’t really want to dismiss personal content entirely.

I guess there’s a distinction between content that is purely descriptive – explaining how much you hate yourself, or how annoying thing X is, or whatever – and content that has an analytical or empathetic component as well – trying to figure out the reasons why this is the case and provide some advice to people feeling the same way, or otherwise attempting to connect your own personal experience to what others may feel. The former has no value to the reader, really – oh, poor random internet commentator. How sad. Imagine an agony aunt column without the agony aunt’s responses. How awful would that be? But the latter kind of stuff can definitely be of some value; I’ve read things on the web that have influenced the way I look at the world and respond to things – most people probably have. So it’s not entirely worthless!

The title of this post

In fact, I think writing about things is a great way to process and deal with said things. I’m not just talking about personal or emotional matters – way back in 2016 when I wrote a short Rust tutorial series, the aim was as much to inform others as to force me to be honest about my own Rust abilities; writing something up gets you to specify what exactly you mean in plain English, which can be great for identifying gaps in your knowledge, or areas of flawed thinking.

This is partially why, as it says in the title, writing can be a form of relief; there’s something about putting pen to paper that makes you feel just a bit better about whatever it is you’re writing about, be that your frustrations learning a new programming language or something more personal.

It’s also, in some ways, a lot lower friction than talking to someone about something. If you start calling up your friends and ranting to them about how much asynchronous programming paradigms suck, you eventually lose most of your friends – whereas you aren’t going to annoy anyone, or take up anyone’s time, by writing about things [4]. (Unless you do something crazy like start sending your friends letters in the post containing your rants. This is also a good way to lose most of your friends.)

‘Mental health awareness’

Now, of course, you don’t actually have to publish anything to get these benefits; simply writing something up should be enough. (This is the idea behind journaling, I think.) In fact, as I discussed at the start, publishing things can be harmful to your career.

However, I think it’s still worth doing: I’m a human being, and you are too. There are thousands of tech blogs that just talk about tech and don’t talk about anything personal or human; there are thousands of people who only talk about technical topics on their website and never mention a thing about their private lives. I’m not saying they should – but I do tend to think that seeing other people talk about their problems publicly can be a great motivator for you to do the same (for example, I’m a big fan of one blogger’s occasional posts about toxic Silicon Valley culture). To me, that’s what this concept of ‘mental health awareness’ is about (at least in part): recognizing that other people are people too, and trying to get people to talk more openly about their thoughts and feelings, instead of just keeping them to themselves.

So, yeah. Write (somewhat critically) about things that bother you, even if they aren’t technical. It’s helpful for you, and you never know what impact it’ll have on somebody else!

Or, you know, just don’t, if you’re not into that sort of thing. But I’m going to give it a try.

Also, the hope is that just trying to get into a semi-regular pattern of writing about /anything/ without much of a filter will mean that more technical stuff seeps out as well. We’ll see what happens!

  1. Well, people tell me this happens. I’m not, ehm, experienced enough to actually have heard of this happening first-hand. 

  2. On a related note, if you’re someone who might have the capability to make me a job offer, just… do me a solid and don’t read the blog, okay? :p 

  3. More on this later. 

  4. You also are probably not going to annoy your friends by talking about personal issues if you really feel the need to talk to someone about them, since that’s what friends are for! However, it doesn’t feel too great having to do this a lot (where ‘a lot’ is subjectively defined) – in other words, even though your friends might not actually get annoyed, your fear of them getting annoyed (and perhaps not telling you) might be enough to make you not want to talk to them. 

May 20, 2020

Benjamin Pollack (gecko)

The Deprecated *nix API May 20, 2020 07:31 PM

I realized the other day that, while I do almost all of my development “in *nix”, I don’t actually meaningfully program in what I traditionally have thought of as “*nix” anymore. And, if things like Hacker News, Lobsters, and random dotfiles I come across on GitHub are any indication, then there are many developers like me.

“I work on *nix” can mean a lot of very different things, depending on who you ask. To some, it honestly just means they’re on the command line: being in cmd.exe on Windows, despite utterly different (and not necessarily inferior!) semantics, might qualify. To others, it means a rigid adherence to POSIX, even if GNU’s incompatible variants might rule the day on the most common Linux distros. To others, it truly means working on an actual, honest-to-goodness Unix derivative, such as some of the BSDs—or perhaps a SunOS or Solaris derivative, like OpenIndiana.

To me, historically, it’s meant that I build on top of the tooling that Unix provides. Even if I’m on Windows, I might be developing “in *nix” as long as I’m using sed, awk, shell scripts, and so on, to get what I need to do done. The fact I’m on Windows doesn’t necessarily matter; what matters is the underlying tooling.

But the other day, I realized that I’ve replaced virtually all of the traditional tooling. I don’t use find; I use fd. I don’t use sed; I use sd. du is gone for dust, bash for fish, vim for kakoune, screen for tmux, and so on. Even the venerable grep and awk are replaced by not one, but two tools, and not one-for-one: depending on my ultimate goal, ripgrep and angle-grinder replace either or both tools, sometimes in concert, and sometimes alone.

I’m not particularly interested in a discussion on whether these tools are “better”; they work better for me, so I use them. Based on what I see on GitHub, enough other people feel similarly that all of these incompatible variations on a theme must be heavily used.

My concern is that, in that context, I think the meaning of “I write in *nix” is starting to blur a bit. The API for Windows is defined in terms of C (or perhaps C++, if you squint). For Linux, it’s syscalls. For macOS, some combo of C and Objective-C. But for “*nix”, without any clarifying context, I for one think in terms of shell scripts and their utilities. And the problem is that my own naïve scripts, despite being written on a legit *nix variant, simply will not run on a vanilla Linux, macOS, or *BSD installation. They certainly can—I can install fish, and sd, and ripgrep, and whatever else I’m using, very easily—but those tools aren’t available out-of-the-box, any more than, I dunno, the PowerShell 6 for Linux is. (Or MinGW is for Windows, to turn that around.) It amounts to a gradual ad-hoc breakage of the traditional ad-hoc “*nix” API, in favor of my own, custom, bespoke variant.

I think, in many ways, what we’re seeing is a good thing. sed, awk, and the other traditional tools all have (let’s be honest) major failings. There’s a reason that awk, despite recent protestations, was legitimately replaced by Perl. (At least, until people forgot why that happened in the first place.) But I do worry about the API split, and our poor ability to handle it. Microsoft, the paragon of backwards compatibility, has failed repeatedly to actually ensure that compatibility, even when armed with much richer metadata than vague, non-version-pinned plain-text shell scripts calling ad-hoc, non-standard tooling. If we all go to our own variants of traditional Unix utilities, I worry that none of my scripts will meaningfully run in a decade.

Or maybe they will. Maybe my specific preferred forks of Unix utilities rule the day and all of my scripts will go through unscathed.

May 19, 2020

Jan van den Berg (j11g)

Unorthodox – Netflix miniseries May 19, 2020 06:19 PM

I was impressed by the Netflix miniseries Unorthodox. Specifically with the talented actors, the believable authentic world-building and the spot-on casting (so good). In all of these respects, this is a very good show.

Huge parts of the show are in Yiddish which is a unique experience (especially when you speak a little bit of German). It felt genuine and intimate.

Moishe and Esty – Two main characters with similar but ultimately opposite experiences

I like that the story works with flashbacks and you are thrown right in the middle. And a lot is left unexplained, specifically Hasidic customs. Some, you intuitively understand (e.g. consistently touching/kissing doorposts) while others left me puzzled (an entire kitchen wrapped in tin-foil?). The show does not over-explain and it keeps the story going, but it does provide enough pointers to dig deeper.

The final audition scene tied a lot of things together — families, friends and worlds — while at the same time it made it clear that some bridges were definitely burned. Wonderfully done.

Loose ends?

However, there were some things that could have been better, or that made little sense. Spoilers ahead.

  • How long was Esty in Berlin, are we watching days, weeks or even months? Sometimes I thought this was a couple of days. But that didn’t always make sense.
  • Why did the grandmother die? Esty was never made aware of this, so what was the purpose of this tragic subplot?
  • What was the meaning of Moishe’s successful gambling scene? The fact that he won in a poker game didn’t add anything new to his character (we already knew he had an ambivalent personality) or the story, but they made it seem significant — including his full monty dive into a Berlin river.
  • I understand that a miniseries that is almost shorter than the latest Scorsese does not have time for everything. However, the relationship Esty quickly gets with the coffee-guy felt a bit forced and far-fetched for her character arc. You don’t go from removing your sheitel to sleeping with a guy in two (?) days.

This being said, maybe some of these are setups; loose ends for another season? It could very well be, because there are still some story lines left to explore (specifically Moishe). I would watch it.

The post Unorthodox – Netflix miniseries appeared first on Jan van den Berg.

Mark J. Nelson (mjn)

Newsgames of the 2000s May 19, 2020 12:00 PM

During grad school in the late 2000s, I used to maintain a list of newsgames, i.e. games that commented in some kind of timely and usually editorial way on current events. I remembered it existed while looking through an archive of my old website, and decided to reproduce it here in case it helps anyone looking for information on newsgames of that period. You can also browse the original list on the Wayback Machine if you prefer.

The list covers games released in 1999–2008, with a bit of ad-hoc summarization, analysis, and categorization, and links to more in-depth commentary elsewhere, where available. It's almost certainly missing other games of the same era that I didn't happen to run across. I've replaced dead links with updated ones or Wayback Machine links, where possible, but I've otherwise left the entries unedited as I originally wrote them. Even in cases where the Wayback Machine has a copy of the game, unfortunately, many of them run on the Flash platform, which is tricky to get working in modern browsers. (Note that since my commentary is unedited, when an entry says that a link is to the Internet Archive's copy or a game is unavailable, it means this was already the case as of the late 2000s.)


Pico's School

Released a few months after the Columbine school shootings, Pico's School drops you, as Pico, into a school that's just been shot up and taken over by angsty goth kids who like KMFDM. You play a graphical-adventure game to defeat them, interspersed with arcade-style boss fights. There are aliens also. Somewhat ambiguous what the commentary is; it was controversial at the time for supposedly making light of the tragedy through farcical elements or even appealing to disaffected teens. On a less newsgamey note, it was also the game that launched the reputation of Newgrounds, as well as an impressive technical achievement given the limitations of 1999-era Flash 3 that had made previous Flash games not nearly as interactive.


Kabul Kaboom!

An arcade-style game where you're an Afghan civilian who has to catch U.S. aid packages (hamburgers) while dodging U.S. bombs. Released during the U.S. war against the Taliban, in which it was also dropping humanitarian aid to Afghan civilians. A bit more discussion from designer Gonzalo Frasca can be found here.


Al Quaidamon

  • URL:
  • Author: Tom Fulp
  • Date: February 2002
  • Platform: Web
  • Types of commentary: direct criticism, satire, rhetoric of failure
  • Types of gameplay: classic arcade, whack a mole
  • Subjects: terrorism, human rights, police

A criticism of liberal-minded criticisms of U.S. treatment of its war-on-terror prisoners. There's a prisoner, and you can choose to either punch him, or feed him donuts, brush his hair, or attend to his wounds, each of which impacts a meter that shows how well he's being treated. The meter steadily decreases if you do nothing, and even with non-stop coddling it's hard to get up to Geneva Convention standards, which are in any case portrayed as being better than the lives of most Americans. Part of the War on Terror collection at Newgrounds, although one of the few games in the collection with editorial content.


  • URL:
  • Author: Josh On
  • Date: May 2002
  • Platform: Web
  • Types of commentary: direct criticism, pointing out tradeoffs
  • Types of gameplay: strategy, resource management
  • Subjects: international affairs, corporations, war

A simulation of the war on terror that critiques war through the simulation rules. For example, business and military spending is basically the same; not using your troops enough reduces their effectiveness; not spending enough on domestic affairs makes you unpopular; not spending enough on the military can get you assassinated; and so on. The commentary isn't particularly subtle, but the way it's built into the simulation rules is a nice use of games' procedurality.

See this blog post by Michael Mateas for a more detailed rundown of the rules and the commentary they produce, and, for those not afraid of books, pp. 82–84 of Ian Bogost's Persuasive Games (MIT Press, 2007) for more discussion.


September 12

A critique of the "war on terror"'s use of missile strikes that cause civilian casualties. You can fire a missile periodically at terrorists, or not. If you do, you'll almost certainly cause the landscape to get increasingly battle-scarred, while causing an increase in the number of terrorists through civilian casualties. If you don't, terrorists will stay present at some default background level. The first game to call itself a "newsgame".

The Howard Dean for Iowa Game

  • URL:
  • Authors: Persuasive Games and Gonzalo Frasca
  • Date: December 2003
  • Platform: Web
  • Type of commentary: electioneering
  • Types of gameplay: strategy, social
  • Subjects: elections, activism

Somewhat of a milestone in political games, since it was commissioned officially by the Howard Dean campaign for his run in the 2004 Democratic primary. Has a map-level strategic view in which you place supporters, and once you place a supporter, goes into a short real-time segment where you try to wave your campaign sign at people. There were initially some social effects, with the map changing colors based on the level of Dean support in various regions created by other players of the game, plus even some instant-messaging integration, though the IM integration has since been disabled, and the social effects are hard to see since few people still play the game. The main aim of the game seems to have been to raise some vague awareness about how the caucus system works, plus just create some buzz.

The creators give a detailed retrospective account in this essay.

Kimdom Come

You play North Korean leader Kim Jong Il in a farcical game of brinksmanship, threatening South Korea with your missiles, staging parades, gaining concessions, and so on. Gameplay is a mixture of strategy and Missile Command-style arcade action. Released during one of many periods where North Korea was making belligerent noises and negotiating concessions from the West. Available for the Mac.

Reelect Bush?

  • URL:
  • Author: EllaZ Systems
  • Date: December 2003
  • Platform: Standalone executable
  • Type of commentary: satire
  • Types of gameplay: sim, quiz
  • Subjects: elections, famous people

One of several games bundled with the chatbot "AI Bush", a Bush-specific version of a chatbot from a group that seems to have state-of-the-art technology as far as chatbots go. This one mocks George W. Bush's reputed lack of knowledge on various subjects by having you play an advisor who whispers answers to him, though he stops listening to you if you feed him too much bad advice. Was available for Windows for purchase, but no longer seems to be, so summary based on the official blurb rather than playing it myself.



  • URL:
  • Author: Gonzalo Frasca
  • Date: March 2004
  • Platform: Web
  • Type of commentary: memorial
  • Type of gameplay: whack a mole
  • Subject: terrorism

Released two days after the March 2004 Madrid subway bombings, this is a simple game where you click on candles' flames to make them burn brighter, raising an overall light meter. They diminish some time after you click them, so gameplay is to keep clicking to keep them as bright as possible.

Useful Voter Guide

  • URL:
  • Author: LoveInWar
  • Date: March 2004
  • Platform: Web
  • Type of commentary: satire
  • Type of gameplay: quiz
  • Subjects: elections, media

A parody of both polarized politics and the voter questionnaires that ask you a series of questions and give you a political position or candidate based on your replies. In this one, you're presented with a series of people representing opposite stereotypes, and shoot the one you hate most: Would you rather put a bullet in the flag-waving guy, or the America-hating one? Based on your choices, you get a political position, invariably described in insulting terms. The original site seems to have disappeared, so the link is to the Internet Archive's copy.

John Kerry Tax Invaders

  • URL:
  • Author: The Republican Party
  • Date: April 2004
  • Platform: Web
  • Types of commentary: direct criticism, electioneering
  • Type of gameplay: classic arcade
  • Subjects: elections, taxes

A skin of Space Invaders with you playing George W. Bush shooting down John Kerry tax proposals. Not very much game rhetoric here beyond a skin, but it gets included in the list since it was an earlyish official political game. The original site is no longer up, so the link is to the Internet Archive's copy.

2004 Everybody Fight

  • URL:
  • Author: Digital Extreme
  • Date: May 2004
  • Platform: Standalone executable
  • Type of commentary: satire
  • Type of gameplay: first person shooter
  • Subject: elections

A first-person shooter featuring personalities from Taiwan's 2004 election. It's tempting to consider this just an opportunistic skin of an FPS, but the manufacturer claims it's a parody of violence and acrimony between supporters of the two main political camps. Summary based on an article in the Taipei Times.


  • URL:
  • Author: The Republican Party
  • Date: May 2004
  • Platform: Web
  • Types of commentary: direct criticism, electioneering, rhetoric of failure
  • Type of gameplay: non videogame
  • Subjects: elections, wealth, famous people

A stripped-down representation of Monopoly, where you start out with $40,000, labeled as the average household income, and roll the dice to move around a board filled with properties owned by Kerry, which inevitably bankrupt you. The gameplay is a bit weak, but I suppose the point that Kerry is rich comes across. The original site is no longer up, so the link is to the Internet Archive's copy, which is somewhat broken unfortunately.

Escape from Woomera

A Half-Life mod that puts you inside Australia's controversial Woomera Detention Centre for asylum applicants. You can try to get yourself asylum; navigate the daily routines of what's essentially prison life; or try to escape from the razor-wire compound. Of course, you can't get asylum, and you can't escape either. The inevitable gameplay failure highlights the no-win situation asylum applicants are put in, and the portrayal of prison-like conditions aims to highlight to the game-playing public why they should oppose the detention center (the game is unapologetically part of an anti-Woomera campaign). Got a good bit of media coverage during development.


  • URL:
  • Author: Persuasive Games
  • Date: September 2004
  • Platform: Web
  • Types of commentary: electioneering, pointing out tradeoffs
  • Types of gameplay: resource management, whack a mole
  • Subjects: elections, activism

A political game commissioned by the Democratic Congressional Campaign Committee (DCCC) for the 2004 elections. You allocate your 10,000 activists between six public-policy areas, and that allocation combined with some whack-a-mole gameplay on your part affects the three factors of money, peace, and quality of life. Mainly a simulation and encouragement of activism itself and promoting the view that there are lots of ways to balance priorities rather than direct commentary on any of the six policy areas (though some references to Democratic policy proposals are thrown in). It also allows players to share their "activism plans", which are indexed by demographic information.

Take Back Illinois

  • URL:
  • Author: Persuasive Games
  • Date: September 2004
  • Platform: Web
  • Types of commentary: electioneering, pointing out tradeoffs
  • Types of gameplay: resource management, sim
  • Subjects: elections, health care, tort reform, education, economy, activism

A political game commissioned by the Illinois state Republican Party for the 2004 election. Actually a sequence of four games, released one per week, about medical malpractice reform, education, participation, and economic development. All except for the participation game have resource-management gameplay, where the game's rhetoric is built into the simulation rules, giving a particular view of the effects of various policy choices. The participation game is a bit different, and has you going around in a simulated world to involve people in politics.

A few comments from designer Ian Bogost at the Persuasive Games page on the game and Water Cooler Games.


  • URL:
  • Author: Gonzalo Frasca
  • Date: October 2004
  • Platform: Web
  • Type of commentary: electioneering
  • Type of gameplay: non videogame
  • Subject: elections

A game created for the October 2004 Uruguayan presidential elections, commissioned by the left-wing coalition Frente Amplio - Encuentro Progresista. You put together tiles to complete a puzzle that shows positive, uplifting images of Uruguay's future. Some commentary from BBC News is available here. The game itself doesn't seem to be online anymore; let me know if you know of or have a copy.



A Monopoly-inspired board game (though on the computer) with inverted goals: you compete to destroy the economy and tear down the properties, satirizing the ongoing conflict and mismanagement of Robert Mugabe in Zimbabwe. Available for the Mac.

Airport Insecurity

A parodic simulation of airport security practices. Models 138 airports, with varying degrees of inconvenience that nonetheless fail to provide very good security. Mocks your annoying fellow travelers for good measure. Sells for $3.99 on certain Nokia mobile phones, so you can play it while in a security line.


Darfur is Dying

  • URL:
  • Author: USC students
  • Date: April 2006
  • Platform: Web
  • Types of commentary: direct criticism, rhetoric of failure
  • Types of gameplay: resource management, sim
  • Subjects: international affairs, war, human rights

A game designed to raise awareness about the humanitarian situation in Darfur. There are two gameplay modes: In one, you try to manage a camp, which is short on resources and under constant threat of attack. In the other, you send children to go fetch water, amidst threat of attack and abduction. The main point is that the situation is hopeless and intolerable without outside assistance. Winner of a digital-activism competition that led to it being distributed by mtvU.

There are writeups at Gameology and at Serious Games Source.

Airport Security

Another parodic simulation of airport security practices, from the makers of Airport Insecurity, but this time from the perspective of the security personnel who have to enforce absurd changing regulations. Released shortly after a rule change banned liquids in carry-on luggage at U.S. airports.

So You Think You Can Drive, Mel?

You play Mel Gibson trying to drive his car while getting increasingly drunk, without running over state troopers or getting hit by the Stars of David that Hasidic Jews on the side of the road throw at you. Mocks an incident in which Gibson was pulled over for drunk driving and went on a tirade about Jews. The editorial content isn't particularly strong, though Zach Whalen over at Gameology thinks it does have a reasonable amount.

Bacteria Salad

This game was released shortly after a bagged-spinach recall in the U.S. due to E. coli contamination, which affected a surprisingly large number of retail brands due to common sourcing. The game has you manage a farm operation, and trades off running separate small farms (less profitable, but less spread of contamination) versus consolidating them into a gigantic farm (very profitable, but contamination spreads more easily). The actual active gameplay is whack-a-mole style cleaning up of contamination and issuing recalls before anyone gets sick.


Food Import Folly

Released shortly after a contaminated food import scandal in the U.S., this game puts you in an impossible role inspecting food imports with few resources. The gameplay is whack-a-mole style, where you click on imports as they come in to inspect them before a contaminated one can get through, but you have only one guy, who takes some time to inspect a shipment and can't do anything else in the meantime. So the game mainly consists of sitting there losing regardless of what you do—a good example of what designer Ian Bogost calls a "rhetoric of failure". Also notable for being the first "playable editorial cartoon" published by a major newspaper (The New York Times).

Operation: Pedopriest

  • URL:
  • Author: Molleindustria
  • Date: June 2007
  • Platform: Web
  • Types of commentary: direct criticism, satire, rhetoric of failure
  • Type of gameplay: sim
  • Subjects: bureaucracy, religion, human rights

Based on a late-2006 BBC documentary alleging the Vatican had a secret process to deal with priest sex-abuse allegations quietly, this game puts you in the role of trying to protect children from pedophilic priests while also warding off police, parents, and so on. It turns out not to be possible to succeed at protecting the children, though just protecting the priests is possible.

Points of Entry

  • URL: (dead link)
  • Author: Persuasive Games
  • Date: June 2007
  • Platform: Web
  • Type of commentary: satire
  • Type of gameplay: sim
  • Subjects: immigration, bureaucracy

A commentary on a proposed system that would give prospective U.S. immigrants points based on various criteria such as job status, age, English skills, and so on. You play an immigration clerk who has to adjust the stats of a prospective immigrant so that they're better than those of the clerk next to you, but by as small a margin as possible. It's also timed. The scenario seems a bit silly, but it succeeds in making you learn what the proposed point allocations are, in addition to portraying the process as arbitrary and bureaucratic.

Presidential Pong

A fairly simple game, but the first newsgame published by CNN, in which presidential debates are played out as a game of pong. Could be interpreted as a fairly boring shallow skin on pong, or as satirizing the quality and level of earnestness of presidential debates.


  • URL:
  • Author: INM Inter Network Marketing
  • Date: October 2007
  • Platform: Web
  • Types of commentary: direct criticism, electioneering, rhetoric of failure
  • Types of gameplay: classic arcade, whack a mole
  • Subjects: elections, immigration, taxes

An almost comically racist game by the Swiss People's Party (SVP). Has good production values and made some news, so probably one of the more successful political games. It features the party's mascot, Zottel, a very Swiss goat, facing off in four games against abuse of naturalization, illegal immigration, EU tax collectors, and federal government waste. The party had previously courted controversy with a poster that showed Zottel kicking out a black sheep, and the theme reappears here, where in one game you need to keep the black sheep off Switzerland's green pastures without harassing the friendly white sheep. Oh, and watch out for the dastardly Green Party, trying to smuggle illegal immigrants in their party buses! The game seems to have disappeared from the internet, but there are pretty good writeups here and here. Let me know if you have or know of an archived copy.

Matt Blunt Document Destroyer

  • URL:
  • Author: The Democratic Party
  • Date: November 2007
  • Platform: Web
  • Types of commentary: direct criticism, electioneering
  • Type of gameplay: whack a mole
  • Subject: corruption

A whack-a-mole game where you try to stop Missouri governor Matt Blunt from deleting emails, apparently as a response to some sort of email-deleting scandal. The editorial content is rather weak.



  • URL:
  • Author: Conor O'Kane
  • Date: January 2008
  • Platform: Standalone executable
  • Type of commentary: satire
  • Type of gameplay: classic arcade
  • Subject: whaling

A shmup billed as a "Japanese Cetacean Research Simulator". The political commentary is of course in the discontinuity between the billing (cetacean research) and the actual gameplay (a whale-harpooning shmup), paralleling the disconnect between the official and actual purposes of Japan's whaling program. Available for Windows.

I Can End Deportation

  • URL:
  • Author: Breakthrough
  • Date: February 2008
  • Platform: Standalone executable
  • Types of commentary: direct criticism, rhetoric of failure
  • Types of gameplay: sim, quiz
  • Subjects: immigration, bureaucracy

A game released amidst ongoing debate over immigration reform in the United States that aims to highlight the brokenness of the immigration and enforcement systems, and raise some sympathy for immigrants' situation. Gameplay takes place in an open world where immigrants need to carry on with life while avoiding police and making various choices, some of which are presented explicitly in pop-up quiz type boxes, which then give vaguely didactic correct answers to try to educate the player on the complexities of immigration law. Some gameplay rules make rhetorical points as well, such as the results of immigration trials being basically random.

There's an interesting exchange about the game at Water Cooler Games: Ian Bogost posts an extended critique that gives it a mostly lukewarm review, and lead designer Heidi Boisvert responds, also at some length (fifth comment down).

Available for the Mac and Windows.

Police Brutality

A response to the "Don't tase me, bro!" incident in which a student protestor at a John Kerry event was tasered. You organize students by knocking them out of their torpor and blocking the police. Sort of the opposite of the more common "rhetoric of failure", in which an impossible-to-win game points out the impossibility of a situation—here a possible-to-win game aims to emphasize the possibility of successfully organizing and resisting the police in such a situation.

Available for Windows.

Sevan Janiyan (sevan)

Heads up for RSS subscribers May 19, 2020 11:40 AM

I’m going to be experimenting with migrating from WordPress to Hugo this week, if you subscribe to the RSS feeds on this site and wish to continue to do so, you might want to check everything is ok at your end after Monday the 25th. One of the key factors of migrating to Hugo …

May 18, 2020

Jan van den Berg (j11g)

Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak May 18, 2020 12:06 PM

I have a lot of respect for Bill Gates and tend to follow what he does. So this book, just like the one on Steve Jobs, is a nice reminder of the man’s personality and his thinking process.

As it spans some 30+ years, there are mild variations noticeable, but overall what you see is what you get with Bill Gates, and that is head-on, rational straightforwardness and a passion for software.

Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak (2012) – 160 pagina’s

The post Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak appeared first on Jan van den Berg.

iSteve – George Beahm en Wim Zefat May 18, 2020 12:05 PM

This is a book just with quotes from late Apple founder Steve Jobs. I already knew most of them, having read more than one book about Steve Jobs. Nonetheless, seeing his most salient quotes in one place is a good indication and reminder of the man’s personality and vision.

iSteve – George Beahm en Wim Zefat (2011) – 160 pagina’s

Since the quotes are all dated I particularly noticed 3 types of Steve.

  • The brash, cocky, young Steve (everything up until 1985, before his Apple exit)
  • The reflective, contemplating Steve (from 1985 – 2000 the in-between NeXT/Pixar years)
  • The seasoned, wise Steve (2000 – 2011)

You can probably date the quotes based on their spirit to one of these three periods.

The timeline after the quotes was a great plus for this book, as well as the references! However, this book was not without mistakes: there never was an iPhone 4GS (a 4S, sure), and the iPod was introduced on October 23, 2001 (not in November).

The post iSteve – George Beahm en Wim Zefat appeared first on Jan van den Berg.

May 17, 2020

Jeff Carpenter (jeffcarp)

Machine Learning Reference May 17, 2020 08:49 PM

I often need to look up random bits of ML-related information. This post is currently a work-in-progress attempt to collect common machine learning terms and formulas into one central place. I plan on updating this post as I come across further useful pieces of information. Learning Resources: Google’s ML Crash Course; Neural Networks and Deep Learning by Michael Nielsen. Problem Framing: Classification: a type of ML model that attempts to sort an input into a discrete set of output classes.

Derek Jones (derek-jones)

Happy 60th birthday: Algol 60 May 17, 2020 08:40 PM

Report on the Algorithmic Language ALGOL 60 is the title of a 16-page paper appearing in the May 1960 issue of the Communications of the ACM. Probably one of the most influential programming languages, and a language that readers may never have heard of.

During the 1960s there were three well known, widely used, programming languages: Algol 60, Cobol, and Fortran.

When somebody created a new programming language, Algol 60 tended to be their role-model. A few of the authors of the Algol 60 report cited beauty as one of their aims, a romantic notion that captured some users’ imaginations. Also, the language was full of quirky, out-there features; plenty of scope for pin-head discussions.

Cobol appears visually clunky, is used by business people and focuses on data formatting (a deadly dull, but very important issue).

Fortran spent 20 years catching up with features supported by Algol 60.

Cobol and Fortran are still with us because they never had any serious competition within their target markets.

Algol 60 had lots of competition and its successor language, Algol 68, was groundbreaking within its academic niche, i.e., not in a developer useful way.

Language family trees ought to have Algol 60 at, or close to their root. But the Algol 60 descendants have been so successful, that the creators of these family trees have rarely heard of it.

In the US the ‘military’ language was Jovial, and in the UK it was Coral 66, both derived from Algol 60 (Coral 66 was the first language I used in industry after graduating). I used to hear people saying that Jovial was derived from Fortran; another example of people citing the popular language they know.

Algol compiler implementers documented their techniques (probably because they were often academics); ALGOL 60 Implementation is a real gem of a book, and still worth a read today (as an introduction to compiling).

Algol 60 was ahead of its time in supporting undefined behaviors 😉 Such as: “The effect, of a go to statement, outside a for statement, which refers to a label within the for statement, is undefined.”

One feature of Algol 60 rarely adopted by other languages is its parameter passing mechanism, call-by-name (now that lambda expressions are starting to appear in widely used languages, call-by-name has a kind-of comeback). Call-by-name essentially has the same effect as textual substitution. Given the following procedure (it’s not a function because it does not return a value):

procedure swap (a, b);
   integer a, b;
begin
   integer temp;
   temp := a;
   a := b;
   b := temp
end

the effect of the call: swap(i, x[i]) is:

  temp := i;
  i := x[i];
  x[i] := temp

which might come as a surprise to some.

Needless to say, programmers came up with ‘clever’ ways of exploiting this behavior; the most famous being Jensen’s device.
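Call-by-name is easy to simulate in any language with closures by passing each argument as a pair of getter/setter thunks that re-evaluate the caller's expression on every access. Here is a minimal Python sketch (my own illustration, not from the report or the post) of both the swap surprise and Jensen's device:

```python
# Simulating Algol 60 call-by-name in Python (illustrative sketch).
# Each by-name argument becomes a pair of closures: a getter and a
# setter that re-evaluate the caller's expression on every access.

def swap(a_get, a_set, b_get, b_set):
    temp = a_get()   # temp := a
    a_set(b_get())   # a := b
    b_set(temp)      # b := temp  (b's expression is re-evaluated *now*)

cell = {"i": 1}      # the variable i, boxed so closures can assign to it
x = [10, 2, 30]

# The call swap(i, x[i]) from the article:
swap(lambda: cell["i"],    lambda v: cell.__setitem__("i", v),
     lambda: x[cell["i"]], lambda v: x.__setitem__(cell["i"], v))

print(cell["i"], x)  # prints: 2 [10, 2, 1] -- not a swap at all, because
                     # x[i] was re-evaluated after i had already changed

# Jensen's device: a summation procedure whose term is passed by name.
def sigma(i_set, lo, hi, term):
    total = 0
    for v in range(lo, hi + 1):
        i_set(v)          # i := v
        total += term()   # term() re-evaluates e.g. x[i]**2 each iteration
    return total

s = sigma(lambda v: cell.__setitem__("i", v), 0, 2,
          lambda: x[cell["i"]] ** 2)
print(s)             # prints: 105  (10**2 + 2**2 + 1**2 over x = [10, 2, 1])
```

The boxing of `i` in a dict is only needed because Python closures cannot rebind an outer local directly; in Algol 60 the compiler generated the equivalent thunks automatically.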

The follow example of go to usage appears in: International Standard 1538 Programming languages – ALGOL 60 (the first and only edition appeared in 1984, after most people had stopped using the language):

go to if Ab  c then L17
  else g[if w  0 then 2 else n]

Orthogonality of language use won out over the goto FUD.

The Software Preservation Group is a great resource for Algol 60 books and papers.

Ponylang (SeanTAllen)

Last Week in Pony - May 17, 2020 May 17, 2020 03:54 PM

Pony 0.35.0 and 0.35.1 have been released! Stable, our little dependency manager that could, has been deprecated in favor of our new dependency manager, Corral.

May 16, 2020

asrpo (asrp)

Speed improvements using hash tables May 16, 2020 03:11 PM

I wrote the Forth Lisp Python Continuum (Flpc)'s self-hosted compiler in stages. When I completed the parser and gave it larger and larger pieces of its own source code, it was running too slow. I tried many things to speed it up; one that helped was using hash tables.

They helped make dictionaries [names] which can

An example of a dictionary:
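The excerpt is truncated here, but the gist translates to any language: a name table backed by a hash table replaces a linear scan per lookup with an expected constant-time probe. A rough Python sketch of the idea (mine, not Flpc's actual code):

```python
# Two toy name tables mapping identifiers to values: one does a linear
# scan over a list of pairs, the other uses a dict (a hash table).

class ScanTable:
    def __init__(self):
        self.entries = []            # list of (name, value) pairs

    def define(self, name, value):
        self.entries.append((name, value))

    def lookup(self, name):          # O(n): walks every entry
        for n, v in self.entries:
            if n == name:
                return v
        raise KeyError(name)

class HashTable:
    def __init__(self):
        self.entries = {}            # Python dicts are hash tables

    def define(self, name, value):
        self.entries[name] = value

    def lookup(self, name):          # expected O(1) per lookup
        return self.entries[name]

# Same observable behaviour; very different cost as the table grows,
# which is what matters when a compiler resolves thousands of names.
for Table in (ScanTable, HashTable):
    t = Table()
    for i in range(1000):
        t.define(f"name{i}", i)
    assert t.lookup("name999") == 999
```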

May 15, 2020

Jeremy Morgan (JeremyMorgan)

What Is Deno and Why Is Everyone Talking About It? May 15, 2020 03:48 PM

Deno is a hot new runtime that may replace Node.js. Everyone’s talking about it like it’s the next big thing. It likely is. Here’s why. What Is Deno? From the manual: Deno is a JavaScript/TypeScript runtime with secure defaults and a great developer experience. It’s built on V8, Rust, and Tokio. Deno is designed to be a replacement for our beloved Node.js, and it’s led by Ryan Dahl, who started the Node.

May 14, 2020

Andrew Montalenti (amontalenti)

Python 3 is here and the sky is not falling May 14, 2020 08:38 PM

James Bennett, a long-time Python developer, blogger, and contributor to Django, recently wrote a nice post about the “end” of Python 2.x, entitled “Variations on the Death of Python 2.” It’s a great read for anyone who, like me, has been in the Python community a long time.

I’ve been a Python user since the early 2.x days, first discovering Python in a print copy of Linux Journal in the year 2000, where a well-known open source developer and advocate described his transition from Perl to Python. He wrote:

I was generating working code nearly as fast as I could type. When I realized this, I was quite startled.

An important measure of effort in coding is the frequency with which you write something that doesn’t actually match your mental representation of the problem, and have to backtrack on realizing that what you just typed won’t actually tell the language to do what you’re thinking. An important measure of good language design is how rapidly the percentage of missteps of this kind falls as you gain experience with the language. When you’re writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you’ve achieved mastery of the language.

But that didn’t make sense, because it was still day one and I was regularly pausing to look up new language and library features!

This was my first clue that, in Python, I was actually dealing with an exceptionally good design.

Python’s wonderful design as a language has always been a source of inspiration for me. I even wrote “The Elements of Python Style”, as an ode to how good Python code, to me, felt like good written prose. And, of course, many of my personal and professional projects are proudly Python Powered.


Thus, I was always a little worried about the Python 2 to 3 transition. I was concerned that this one big risk, taken on by the core team, could imperil the entire language, and thus the entire community. Perl 5 had embarked on a language schism toward Perl 6 (now Raku), and many believe that both communities (Perl 5 and Raku) became weaker as a result.

But, here we are in 2020, and Python 2 is EOL, and Python 3 is here to stay. A lot of the internet debates about Python 2 vs Python 3 (like this flame war on now seem to boil down to this question: was Python 3 a good idea, in retrospect?

Python 3 caused a split in the Python community. It caused confusion about what Python actually is. It also caused a lot of porting pain for many companies, and a decade-long migration effort among major open source libraries.

If you are a relatively recent Python user and have not heard much about the Python 2 vs 3 community-wide migration effort, you can get pointers to some of the history and technical details in this wiki page. There’s a nice tl;dr summary in this Python 3 Q&A. To understand some of the porting pain involved, a somewhat common 2-to-3 porting approach is covered in the official Python 2 to 3 porting guide.

Regardless of the amount of pain caused, ultimately, Python 3 is here now. It works, it’s popular, and the Python core team has officially made its final Python 2.7 release. The community has survived the transition. To quote the Python release manager in a recent email announcing the release of Python 2.7.18:

Python 2.7.18 is a special release. I refer, of course, to the fact that “2.7.18” is the closest any Python version number will ever approximate e, Euler’s number. Simply exquisite! A less transcendent property of Python 2.7.18 is that it is the last Python 2.7 release and therefore the last Python 2 release. It’s time for the CPython community to say a fond but firm farewell to Python 2. Users still on Python 2 can use e to compute the instantaneously compounding interest on their technical debt.

Ubuntu 20.04 LTS — the “Long-Term Support” April 2020 release of one of the world’s most popular desktop and server Linux distributions — includes Python 3.8.2 by default, and includes Python 2.7.18 only optionally (via the python2 and python2-dev packages), for compatibility with old scripts.

On Ubuntu 20.04, you can install a package, python-is-python3, to ensure that the python interpreter and associated commands on your Linux machine run Python 3, which means you can then only access Python 2.7 via commands like python2, pydoc2, pdb2, and so on. The default download links on are for Python 3.x. If a Windows 10 user runs python on a stock install, they're directed to the Windows Store to install Python 3.x. We can only assume the same default will be coming to Mac OS X soon.

Support for Python 2.7 and Django 1.11 ends in 2020, according to the Django project FAQ. Major Python open source project maintainers — such as those behind TensorFlow, scikit-learn, pandas, tornado, PyTorch, PySpark, IPython, and NumPy — have signed a pledge to drop Python 2 support from their projects in 2020.

The Python library “wall of shame” has become a “wall of superpowers”. And it is no longer maintained, since its mission has been accomplished.

So, it’s worth asking some questions.

Now that we’re here, is there any point in resisting any longer? Short answer: no. Python 3 is here to stay, and Python 2 is really, truly end-of-life.

Will Python 4.x ever happen, creating a disruptive 3-to-4 transition akin to 2-to-3? Short answer: no. That’s very unlikely. To quote Brett Cannon:

We will never do this kind of backwards-incompatible change again. We have decided as a team that a change as big as unicode/str/bytes will never happen so abruptly again. […] We don’t see any shortcomings in the fundamental design of the language that could warrant the need to make such a major change.

What can we learn from this experience? What has the grand experiment in language evolution taught us? Short answer: that big changes like this always take longer than you think, even when you take into account Hofstadter’s Law. Breaking backwards compatibility in a large inter-connected open source community has real cost that will test the strength of that community.

We are fortunate that Python’s community was very strong indeed at the time of this transition, and it even grew rapidly during the transition, thanks to the explosion of Python-based web programming (via Django, Flask, Tornado, etc.), numerical computing (via PyData libraries, like NumPy, Pandas), machine learning (via scikit-learn, TensorFlow, PyTorch, etc.), distributed computing (via PySpark, Dask, etc.), cloud computing (boto, google-cloud, libcloud). The network effects driven by these popular communities were like a perpetual motion machine that ensured Python’s adoption and the freshness of libraries with regard to Python 3 support.

The community is even learning to evolve beyond the direction of its creator, the BDFL, who resigned in 2018 and laid the groundwork for a smooth transition to a Python Steering Council.

So, here we are. Where do we go from here? Can the Python community continue to evolve in a positive direction atop the foundation of Python 3.x? Short answer: yes!

Python has never been healthier, and the community has learned many lessons.

So, let’s get on with it! If you’ve been holding back on using Python 3 features for some time, you can get a nice summary from this Python 3 feature round-up, which goes up through Python 3.6. Then, you can check out the official “What’s New” guides for 3.7, 3.8, and 3.9. Some of my favorite new features include:

So, what are you waiting for? It’s time to get hacking! Here’s to the next Python releases, 3.9 and 3.10 (not 4.0)! And to 3.11, 3.12, 3.13, … thereafter!

Pete Corey (petecorey)

Adding More Chords to Glorious Voice Leader May 14, 2020 12:00 AM

Prior to its latest release, Glorious Voice Leader only let you choose from a pitifully small selection of chords to build a progression with. For a tool whose primary purpose is to guide you through the wondrous world of guitar harmony, this was inexcusable.

Glorious Voice Leader was in dire need of more chord types.

That said, faced with the enormous data entry task of manually adding every chord quality I could think of (of which, here are a few), my programmer instincts kicked in. “Music theory is an organized, systematic area of study,” I told myself. “There has to be a way to algorithmically generate all possible chord qualities,” my naive past self believed.

What a poor fool.

What’s the Problem?

I’ve written and re-written this post a handful of times now, and each time I’ve failed to sufficiently capture the complexity of the task at hand. Regardless of the direction I’ve tackled it from, be it generating names directly and inferring notes from that name, or inferring a name from a collection of notes, this is an incredibly complicated problem.

Music is art, and music theory exists to describe it. And unfortunately for me, people have been describing music in various ways for a very long time. This means that music theory is deeply cultural, deeply rooted in tradition, and not always as systematic as we’d like to believe it to be.

The first thing we need to do when coming up with “all possible chord qualities” is deciding which tradition we want to follow. For the purposes of Glorious Voice Leader, I’m largely concerned with the jazz tradition of chord naming, which has largely evolved to describe chords used in modern popular music.

But even within a single niche, ambiguities and asymmetries abound!

A “maj7/6” chord has the same notes as a “maj13 no 9”, assuming your “maj13” chords don’t have an 11th.

Some folks assume that a “maj13” chord includes a natural 11th. Some assume it includes a sharpened 11th. Others still assume that the 11th is omitted entirely from a “maj13” chord.

Is “aug9” an acceptable chord name, or should it be “9#5”? Both qualities share the same set of notes, and both should be understandable to musicians, but only the latter is the culturally accepted name.

Speaking of alterations like “#5” and “b9”, which order should these appear in the chord name? Sorted by the degree being altered? Or sorted by importance? More concretely, is it a “7b9#5” chord, or a “7#5b9”?

Many notes in a chord are optional, including the root note! A Cmaj13 without a 1st or 5th is perfectly acceptable. Even the third can be optional. But is a Cmaj13 without a 1st, 3rd, and 5th still a Cmaj13? At what point does a chord with missing notes cease to be that chord?

The subtleties and nuances go on and on.

A More Human Approach

Rather than fully automating the generation of chord qualities and names through algorithmic means, I decided to take a more human approach. I start with a large set of human-accepted chord formulas and their corresponding names:

const baseQualities = [
  ["maj", "1 3 5"],
  ["maj6", "1 3 5 6"],
  ["maj/9", "1 3 5 9"],
  ["maj6/9", "1 3 5 6 9"],
  ["maj7", "1 3 5 7"],
  // …
];

From there, we can modify our formulas to specify which notes in the chord are optional. It’s important to note that when specifying optional notes, any or all of those notes may be missing and the name must still make sense.

const baseQualities = [
  ["maj", "1 3 5"],
  ["maj6", "1 3 (5) 6"],
  ["maj/9", "1 3 (5) 9"],
  ["maj6/9", "(1) 3 (5) 6 9"],
  ["maj7", "(1) 3 (5) 7"],
  // …
];

So a chord with a formula of “1 3 5 6 9”, “3 5 6 9”, “1 3 6 9”, or “3 6 9” can still be considered a “maj6/9” chord.
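That expansion step is, at its core, an enumeration of subsets of the optional degrees. A compact Python sketch of the idea (my illustration, not Glorious Voice Leader's actual lodash code; it also skips stripping “#”/“b” from the “no” labels):

```python
from itertools import combinations

def expand(name, formula):
    """Expand one formula with (optional) degrees into all variants."""
    tokens = formula.split()
    degrees = [t.strip("()") for t in tokens]
    optionals = [t.strip("()") for t in tokens if t.startswith("(")]
    variants = []
    # Every subset of the optionals may be missing, including the empty set.
    for r in range(len(optionals) + 1):
        for missing in combinations(optionals, r):
            kept = [d for d in degrees if d not in missing]
            suffix = " ".join(f"no {m}" for m in missing)
            variants.append((f"{name} {suffix}".strip(), " ".join(kept)))
    return variants

for name, f in expand("maj6/9", "(1) 3 (5) 6 9"):
    print(f"{name}: {f}")
# prints:
# maj6/9: 1 3 5 6 9
# maj6/9 no 1: 3 5 6 9
# maj6/9 no 5: 1 3 6 9
# maj6/9 no 1 no 5: 3 6 9
```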

For every [name, formula] pair, we'll tease out the full set of scale degrees and the set of optionals. From there, we find all _.combinations of those optionals that we'll remove from the list of degrees. For each combination, the degrees minus the missing degrees form a new formula, and the name is extended to specify which degrees are missing:

export const qualities = _.chain(baseQualities)
  .flatMap(([name, formula]) => {
    // Reconstructed from the truncated original; the split and
    // difference steps are my best guess at the missing pieces.
    let degrees = _.chain(formula)
      .split(" ")
      .map(degree => _.trim(degree, "()"))
      .value();
    let optionals = _.chain(formula)
      .split(" ")
      .filter(degree => _.startsWith(degree, "("))
      .map(degree => _.trim(degree, "()"))
      .value();
    // Note: _.combinations is not core lodash; it comes from a mixin.
    return _.chain(_.range(_.size(optionals) + 1))
      .flatMap(i => _.combinations(optionals, i))
      .map(missing => {
        let remaining = _.difference(degrees, missing);
        let result = {
          name: _.chain(missing)
            .map(degree => `no ${_.replace(degree, /#|b/, "")}`)
            .join(" ")
            .thru(missingString => _.trim(`${name} ${missingString}`))
            .value(),
          formula: _.join(remaining, " "),
          degrees: remaining,
          missing
        };
        result.value = JSON.stringify(result);
        return result;
      })
      .value();
  })
Some formulas have so many optional notes that the removal of enough of them results in a chord with less than three notes. We don’t want that, so we’ll add one final filter to our qualities chain:

export const qualities = _.chain(baseQualities)
  .flatMap(([name, formula]) => { ... })
  .reject(({ degrees }) => _.size(degrees) < 3)
  .value();

And that’s all there is to it.

Final Thoughts

From a set of eighty-two baseQualities that I entered and maintain by hand, this algorithm generates three hundred twenty-four total qualities that users of Glorious Voice Leader are free to choose from.

This list is by no means exhaustive, but with this approach I can easily change and add to it, without concern for the oddities and asymmetries of how actual humans name the chords they play.

A part of me still believes that an algorithmic approach can generate chord quality names that fall in line with human expectations, but I haven’t found it. I imagine this is one of those problems that will live in the back of my mind for years to come.

May 13, 2020

Gustaf Erikson (gerikson)

March May 13, 2020 10:10 AM

May 12, 2020

Jan van den Berg (j11g)

I, Robot – Isaac Asimov May 12, 2020 03:33 PM

If one writer is responsible for how we think about robots it is, of course, Isaac Asimov. The terrifically prolific writer and groundbreaking author of the science-fiction genre produced numerous works with terrific futuristic insight — and some were about robots. And I, Robot is a seminal work in this oeuvre. But this book is of course not really about robots, or the famous laws of robotics.

I, Robot – Isaac Asimov (1950) – 271 pages

No, this law is a vehicle, for these 9 loosely connected stories to present — very clever — logical puzzles often with a philosophical or ethical undertone. And this is what makes this work hold up, even after 70 years (this was written in 1950 🤯).

Our views on robots might have changed but the questions remain valid. And it is not so much the robots that Asimov makes us think about, but what it means to be human.

The post I, Robot – Isaac Asimov appeared first on Jan van den Berg.

May 11, 2020

Jeremy Morgan (JeremyMorgan)

Setting Up Pop!_OS for Front End Development May 11, 2020 08:29 PM

If you’ve heard all the chatter lately about Pop!_OS and have been wanting to try it out, here’s a good guide to setting up a front end development machine. If you’re relatively new to Linux and are just trying it out, I suggest building this in a Virtual Machine. I have the full instructions for installing Pop!_OS in a virtual machine here. This is the best way to dip your toes in without significant modifications to your computer.

Bogdan Popa (bogdan)

Continuations in Racket's Web Server May 11, 2020 08:55 AM

In The Missing Guide to Racket’s Web Server, I said that dispatch/servlet is equivalent to: (lambda (start) (lambda (conn req) (output-response conn (start req)))) That was an oversimplification. It does apply its start argument to incoming requests and it does take care of writing the responses to the appropriate connections, but it has another important job: to handle responses returned from continuations and to dispatch incoming requests to captured continuations.

May 10, 2020

Derek Jones (derek-jones)

Having all the source code in one file May 10, 2020 10:27 PM

An early, and supposedly influential, analysis of the Coronavirus outbreak was based on results from a model whose 15,000 line C implementation was contained in a single file. There has been lots of tut-tutting from the peanut gallery, about the code all being in one file rather than distributed over many files. The source on Github has been heavily reworked.

Why do programmers work with all the code in one file, rather than split across multiple files? What are the costs and benefits of having the 15K of source in one file, compared to distributing it across multiple files?

There are two kinds of people who work with code all in one file, novices and really capable developers. Richard Stallman is an example of a very capable developer who worked using files containing huge amounts of code, as anybody who has looked at the early sources of gcc will know all too well.

The benefit of having all the code in one file is that it is easy to find stuff and make global changes. If the source is scattered over multiple files, then working on the code entails knowing which file to look in to find whatever; there is a learning curve (these days screens have lots of pixels, and editors support multiple windows with a different file in each window; I’m sure lots of readers work like this).

Many years ago, when 64K was a lot of memory, I sometimes had to do developer support: people would come to me complaining that the computer was preventing them writing a larger program. What had happened was they had hit the capacity limit of the editor. The source now had to be spread over multiple files to get over this ‘limitation’. In practice people experienced the benefits of using multiple files, e.g., editor loading files faster (because they were a lot smaller) and reduced program build time (because only the code that changed needed to be recompiled).

These days, 15K of source can be loaded or compiled in a blink of an eye (unless a really cheap laptop is being used). Computing power has significantly reduced these benefits that used to exist.

What costs might be associated with keeping all the source in one file?

Monolithic code makes sharing difficult. I don’t know anything about the development environment within which these researchers worked. If there were lots of different programs using the same algorithms, or reading/writing the same file formats, then code reuse often provides a benefit that makes it worthwhile splitting off the common functionality. But then the researchers have to learn how to build a program from multiple source files, which a surprising number are unwilling to do (at least it has always been surprising to me).

Within a research group, sharing across researchers might be possible (assuming they are making some use of the same algorithms and file formats). Involving multiple people in the ongoing evolution of software creates a need for some coordination. At the individual level it may be more cost-efficient for people to have their own private copies of the source, with savings only occurring at the group level. With software development having a low status in academia, I don’t see any of the senior researchers willingly taking on a management role for this code. Perhaps one of the people working on the code is much better than the others (it often happens), but are they going to volunteer themselves as chief dogsbody for the code?

In the world of Open Source, where source code is available, cut-and-paste is rampant (along with wholesale copying of files). Working with a copy of somebody else’s source removes a dependency, and if their code works well enough, then go for it.

A cost often claimed by the peanut gallery is that having all the code in a single file is a signal of buggy code. Given that most of the programmers who do this are novices, rather than really capable developers, such code is likely to contain many mistakes. But splitting the code up into multiple files will not reduce the number of mistakes it contains, just distribute them among the files. Correlation is not causation.

For an individual developer, the main benefit of splitting code across multiple files is that it makes developers think about the structure of their code.

For multi-person projects there are the added potential benefits of reusing code, and reducing the time spent reading other people’s code (it’s no fun having to deal with 10K lines when only a few functions are of interest).

I’m not saying that the original code is good, bad, or indifferent. What I am saying is that the having all the source in one file may, or may not, be the most effective way of working. It’s complicated, and I have no problem going with the flow (and limiting the size of the source files I write), but let’s not criticise others for doing what works for them.

Carlos Fenollosa (carlesfe)

Ponylang (SeanTAllen)

Last Week in Pony - May 10, 2020 May 10, 2020 03:03 PM

It’s been a big week in Pony world for new ponylang projects and releases. ponyc 0.34.1 has been released for users on glibc-based Linux platforms. We’ve also opened a new GitHub repo for collecting information relevant to Pony contributors at

May 09, 2020

Kevin Burke (kb)

If someone asks if you have any questions, ask a question May 09, 2020 09:05 PM

Let's revisit one of the most humiliating (and expensive) moments of my life. It happened a decade ago and even today I cringe and seethe when I think about it.

I was one of 25 finalists for a $20,000 scholarship in my junior year of college. The last step was an hour-long interview with three faculty members. I wrote down a list of every single question I thought they would ask - why do you want this scholarship, why you, etc - and rehearsed answers, recording myself, for a week straight. The interview came and went and I thought I did pretty well!

Fast forward a week and I got an email that I was not going to be offered a scholarship. Only two other students out of 25 were rejected. I was dumbfounded. There was no way I should have failed this test. I started thinking back to the interview. Some answers stand out as opportunities to improve - I could still tell you exactly what they are. There's one answer that I really wish I could get back.

At the end of the interview they asked "do you have any questions for us?" By this point I'd done so much research into the scholarship that I couldn't think of anything, and said "No." The interview ended.

But think about it from their perspective. I'd just spent an hour talking about myself; what does it show when I refuse to ask any questions? Not only am I denying them an opportunity to talk, I appear very incurious about the program itself. They didn't know that I had been quizzing myself on every aspect of the program for a week. I should have asked about anything — literally anything — and given them a chance to talk.

The funny thing was I had actually asked some of the people who had gone through the interview what they had been asked about. "Ask a question at the end" was both so obvious to them - pretty much every successful candidate had done finance interviews, where cultural signals are more important; I hadn't - and so oblivious to me that it hadn't even come up to people who wanted to help me succeed.

From that point on I made a point to always ask a question when someone asks if I have any questions. Ask anything. Even asking "what did you have for lunch" is better than asking nothing; the interviewer might start talking about whether the company pays for lunch, whether it's any good. My standby question is "what did you do yesterday" - it has a unique answer for each interviewer and reveals how people spend their time (vs. how they say they spend their time).

Finally, "person fails interview because interviewer expects to see cultural signal and interviewee does not broadcast cultural signal" is a common failure mode. Think about someone who wears a suit to a tech interview. Some organizations only want to hire people who can utter the secret words, and that's their choice.

But if your goal is to cast a wide net - I am looking at you, tech companies that put up billboards championing your commitment to diversity - and you have a candidate without a traditional background, maybe make a list of every reason you've used to reject a nontraditional candidate in the past and then email that to the candidate in advance of the interview - "wear a dress shirt and jeans," for example.1 You won't get everything, but it's a good start. (You can try to get your interviewers to discard the cultural signals but that's difficult and it might show up in their feedback without them realizing it.) Note that career services departments at good schools are already doing this for their students; what you are doing is leveling the playing field.

1 This is not a new critique by any means; people have made it about the SAT for some time - if the tests quiz applicants on vocabulary and grammar that are more commonly used in white households than nonwhite households, then identical students with identical aptitudes will score differently, which seems unfair.

May 08, 2020

Patrick Louis (venam)

Domain Driven Design Presentation May 08, 2020 09:00 PM



Basic Concept and Motivations

Concept and motivations

As software engineers, we’re used to trying to make things perfect, to seeing things from above, to thinking we’re great architects and creators. What’s more important, though, is to create software that does an important job for someone. What are the best ways to create such software?

Trying to model real-world problems might be one, but whether it’s the best way, nobody knows. What we do know is that we want to address real-world issues with our software. This is what I was talking about in my previous talk on “What programming and computing represent.”
Say, for example, that I want to create software that acts as a virtual library of books; how would I go about doing that? A naive approach would be simple document storage of names and PDF files. But who are we to say this is the way to solve it?

We don’t work in a vacuum; there’s usually already a ton of software related to this. This is a problem space we’ll become part of, a series of inter-related problems. Which part are we going to be? Are we going to integrate with pre-existing software, upgrade it, or coexist with multiple other programs? Is it an add-on we are trying to make?

This sort of reasoning and questioning is what drives domain-driven design: saying “I don’t know about this topic, and I want to learn it to be able to do my job properly.”

Two Views

Two views

There are generally two views, two sets of eyes with which we can see a problem. The first is the eyes of the engineer: narrow-minded, strict about principles, and knowing which techniques can be used to make great and extensible software. Those are the things we’ve learned over the years while making software, the good practices.
The other view is that of the expert in whatever we want to solve: a person who could talk endlessly and in great detail about their vision and their knowledge.

What we really wish for is to integrate and reflect those deep details in our software, but we usually lose this connection with the experts. We take their talk as just talk. There’s a gap between what we show them in the code, the software components, and what they discuss with us. So we have to find another way to do this; there has to be a better way.
Either we force them to understand software jargon, or we learn the actual issue we are solving and express it clearly in our code, so that from that point on the experts can give direct input to shape and choose which components should exist.

And from our side, as software developers, we should still try to maintain clean code; we should combine both perspectives instead of shifting between two sets of eyes: a collaborative, creative effort between software experts and domain experts.

That sounds too good to be true, though; how are we going to do that?

Experts and Exploration

Experts and exploration

It’s difficult to start a conversation with experts to get information out of them. You can try sitting them down and explaining what UML is, but that doesn’t work. You can try meetings; still nothing. We need a format that helps us brainstorm and extract as much information as possible from the experts.

One such format is called event storming: a sort of free-for-all brainstorming session with sticky notes that you put on a board. That’s one way, but before that we need to get a conversation going; we can’t start from nothing. What we can usually do is ask questions about scenarios that need to be implemented and listen carefully to the language the experts use.

Language hides concepts; language is the most important thing when trying to understand a problem.
That’s one reason why we shouldn’t dismiss what experts say. We should also pay attention to the technical terms that are used. It’s too easy to dumb the language down and map it into our code as we understood it. Instead, we should use the same language the experts use.
We shouldn’t say, for example, “book code” instead of “ISBN”.

Iteratively, through discussions and brainstorming sessions, we’ll get a better idea of the problem space we are tackling. Initially we had a naive and superficial concept. Each and every word in every discussion could be a hint leading to knowledge we might have been missing. We need to pay attention to the meaning of the words themselves, especially technical ones. We need as many ideas as possible, as many brainstorms, not getting stuck on a single thread but continuing to explore.

The refactorings with the greatest impact are the ones motivated by new insights.

We’re like detectives on a continuous discovery-and-delivery track. Doesn’t that remind you of the Agile methodology? This is what it should be about: having developers involved in the business-related activities and knowledgeable about them; taking the time to sit down and understand first; finding techniques to learn the stuff the experts know.

That should also help create an area of shared common knowledge in a team, a common language that everyone can speak.

Domain and models

Domain and models

However, we can’t keep exploring forever without a goal. We have to select the important parts out of all the information we gather. How do we know something is important? It should fulfill the requirements of the stakeholders, the business case we want to solve. Otherwise, no money.

That’s why we need to map and distill this information into a model. We usually think of a UML diagram when we think of a model, but we’re not limited to that here; when we talk about a model we mean something more abstract, the concept that lies behind the UML diagram.
A model is a representation, though imperfect, of a concept. It’s an abstraction serving a purpose. It should suit the particular problem it tries to solve; a model is a tool.
Models try to reduce the gap between reality and code. But remember: the map is not the territory.

Models live in a domain, the domain being the problem space we are tackling, the stuff the expert is an expert about: the sphere of knowledge or activity.

With models, we distill the knowledge and assumptions we have about the domain; we try to find the essence and put it in a form that makes it valuable and purposeful. A model without a purpose shouldn’t exist. In the library example, a model could describe the inventory of books we have; that’s something we have to model somehow.

There are many ways to represent this, so what can we choose as a model? We have so many choices, but which one is most useful for our case, which ones should we reject, and should we try them out to see which one fits? A good method for choosing a model is to create scenarios, business scenarios that is, that the model should fulfill. Those scenarios could be written by the experts themselves; that’s what BDD (Behavior-Driven Development) is about. Here are some good steps for effective modeling.
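The talk itself doesn't include code, but the scenario idea can be sketched. Below is a minimal, hypothetical Python example (the names Inventory, add_copy, lend, and available_copies are invented for illustration, not taken from the talk): a business scenario written as an executable given/when/then check against a candidate model of the book inventory.

```python
# A hypothetical business scenario, written as an executable test, used to
# judge whether a candidate model of the book inventory is good enough.

class Inventory:
    """Candidate model: tracks how many copies of each ISBN we hold."""
    def __init__(self):
        self._copies = {}

    def add_copy(self, isbn):
        self._copies[isbn] = self._copies.get(isbn, 0) + 1

    def lend(self, isbn):
        if self._copies.get(isbn, 0) == 0:
            raise ValueError(f"no available copy of {isbn}")
        self._copies[isbn] -= 1

    def available_copies(self, isbn):
        return self._copies.get(isbn, 0)


def scenario_lending_the_last_copy():
    # Given an inventory holding one copy of a book
    inventory = Inventory()
    inventory.add_copy("978-0-0000-0000-0")
    # When a reader borrows it
    inventory.lend("978-0-0000-0000-0")
    # Then no copies remain available
    assert inventory.available_copies("978-0-0000-0000-0") == 0


scenario_lending_the_last_copy()
```

If the experts can read (or even write) the given/when/then part, the scenario doubles as a check that the model serves its purpose.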

Effective modeling

Ubiquitous Language

Ubiquitous language

The culmination of the domain, the model, and the exploration with domain experts gives rise to a common language that starts to be used everywhere. This language closes the gap between your software-developer teammates and the experts; it shows up everywhere, be it in the code components or in your daily conversations about the software. But that raises a question: ubiquitous within what? Teams are separated by projects, right?
The language, and thus the domain and its model, is limited to a bounded context.

Bounded Contexts

Bounded contexts

To be effective, models need to be unified so that there are no contradictions between them. We use the language we’ve created with the help of our domain experts to draw boundaries inside the domain space. Each context should have a unified, bounded concept, something it solves. We do that by assembling multiple models together, shopping around for them, and creating relationships between them by categorizing them into contexts.

This makes sense here, this makes sense there.

We could have a context related to shipping books and another context related to how we can manufacture books.

The language we use should show where the boundaries are. For example, if two things have the same name but mean different things, they probably live in different contexts. Or, if they are actually the same thing but have different names, then maybe we should combine them. Again, asking questions and exploring with domain experts is the way to go here.
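As a hypothetical illustration of that naming rule (Python; every name here is invented for this sketch): the word "book" carries different attributes in a shipping context and in a manufacturing context, so each context gets its own model, with the shared identity (the ISBN) as the translation point between them.

```python
from dataclasses import dataclass

# Two bounded contexts, sketched as two context-specific models.
# "Book" means different things in each, so each defines its own.

@dataclass(frozen=True)
class ShippingBook:
    """'Book' as the shipping context sees it: a parcel to deliver."""
    isbn: str
    weight_grams: int
    destination: str

@dataclass(frozen=True)
class ManufacturingBook:
    """'Book' as the manufacturing context sees it: something to print."""
    isbn: str
    page_count: int
    paper_type: str

# The shared identity (the ISBN) is the natural point of translation
# between the two contexts.
shipped = ShippingBook("978-0-0000-0000-0", 820, "Beirut")
printed = ManufacturingBook("978-0-0000-0000-0", 560, "offset")
print(shipped.isbn == printed.isbn)  # same book, two context-specific models
```

Collapsing both into one `Book` class would force each context to carry fields it doesn't care about, which is exactly the kind of contradiction bounded contexts are meant to prevent.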

Core Domain

Core domain

Sometimes, those boundaries exist because of business decisions. We care about some things more than others because they make us generate money. This is what the core domain is about.

We could categorize things into the core domain, generic domains, and supporting domains. The core is the highest-value context in our business, the thing we should exploit the most and get the most out of. A generic domain covers common concepts that aren’t core, and a supporting domain contains helpers for non-core activities.

One key aspect of bounded contexts is that they also define the relationships between teams in an organization. Different teams use different languages because they work in different contexts. The shape of the organization sometimes gives rise to the contexts. So we should be explicit about the interrelationships between those bounded contexts.

Integrity Between Bounded Contexts

Integrity between bounded contexts

Different contexts still need to interface with one another, especially when a part of the domain is common between them, something shareable. There are many ways to do this; some people may say that at this level we’re talking about enterprise architecture.

Here’s a list of methods for doing so (I won’t go into detail about them):

  • Shared Kernel (overlap allied contexts)
  • Customer/Supplier relationship (relate allied contexts)
  • Conformist (overlap unilaterally)
  • Separate ways (free teams to go)
  • Anticorruption layer (translate and insulate unilaterally)

The Blocks of DDD

The blocks of ddd

As software engineers we like to memorize things and force ourselves into strict behaviors, and as with anything, DDD has given rise to this too, even though it’s not mandatory for applying the philosophy. However, these building blocks can be helpful when we try to apply the principles in our code, using libraries that implement blocks we can map onto our domain-expert knowledge of DDD.

Those blocks are the following.

An entity is an object with an identity: something that is defined not only by its attributes but also by its role within the system. A good rule of thumb: if getting the identity of the object wrong would corrupt something, it’s probably an entity.

A value object is a container of attributes, a bag of attributes. It is treated as immutable and describes some characteristic; it has no identity. For example: Blue or Yellow. (Though beware, those might be entities in specific domains.)

An aggregate is a collection of objects bound together by a root entity. The root entity gives meaning to this cluster of objects, so the cluster can be treated as a single unit of data.

A factory is a method for creating domain objects that are too complex to construct directly.

A repository is an abstraction that encapsulates a storage mechanism and acts like a collection. It’s a way to retrieve and store domain objects.

A domain event is an object that represents something that happened in the domain.

And services are the operations that do not belong to any object.
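As a rough sketch of how a few of these blocks fit together (Python; every class and method name below is invented for illustration, not taken from any particular DDD library, and the single hard-coded currency is a simplifying assumption):

```python
from dataclasses import dataclass
import uuid

# Value object: immutable, no identity, defined only by its attributes.
@dataclass(frozen=True)
class Money:
    amount: int      # in cents, to avoid float rounding
    currency: str

# Entity: has an identity that matters independently of its attributes.
# It is also the aggregate root guarding its line items.
class Order:
    def __init__(self):
        self.id = str(uuid.uuid4())
        self.lines = []

    def add_line(self, isbn, price):
        self.lines.append((isbn, price))

    def total(self):
        # Assumes a single currency for simplicity.
        return Money(sum(p.amount for _, p in self.lines), "USD")

# Repository: looks like a collection, hides the storage mechanism.
class OrderRepository:
    def __init__(self):
        self._store = {}

    def save(self, order):
        self._store[order.id] = order

    def find(self, order_id):
        return self._store.get(order_id)

repo = OrderRepository()
order = Order()
order.add_line("978-0-0000-0000-0", Money(4500, "USD"))
repo.save(order)
print(repo.find(order.id).total())  # Money(amount=4500, currency='USD')
```

Note how the in-memory dict could be swapped for a database without touching `Order`: that separation is the point of the repository block.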



Let’s conclude.

Most software complexity lies in understanding the domain itself. That complexity is the essential kind, as opposed to the accidental kind. This is why we need to cultivate models based on the language of experts. This is what DDD teaches us.

No matter the amount of SOLID, high-quality, modular things we make, if we don’t tackle the essential complexity of our domain we won’t be solving anything.

This is all extremely important in today’s world as software gets entangled in every part of our lives.

May 07, 2020

Benjamin Pollack (gecko)

Goodbye, Twitter May 07, 2020 10:51 PM

Let’s zoom back to early April. We’re several weeks into the COVID-19 epidemic. I’m not sleeping well. In fact, a “good night” for me is just a few hours. I’ve realized a couple days ago I’m averaging about 30 to 40 hours of sleep per week. The talking heads on Fox have already begun their drumbeat about how we should reopen businesses to save the economy, despite zero economists arguing for that. The Overton window is already starting to shift from trying to avoid deaths to discussing how many hundreds of thousands of deaths are a “reasonable number” of deaths. The CDC itself will ultimately become a part of this narrative, revising their death count below what anyone else thinks so that they can say a “mere” thirty-nine thousand people have died, which they’re in turn doing because it helps Trump’s narrative when the Overton window doesn’t shift fast enough.

But I don’t even know that yet. That happens in late April. It’s still early April. What I do know, even now, is that the deaths are of course going to hit the poorest, the most disenfranchised, the people who cannot do what I’m doing and work from home. The deaths are going to hit people who have to work the grocery stores, the drug stores, the delivery services. I am unsurprised to see the death tolls comically high amongst people of color, and I’m equally unsurprised to see the talking heads “wonder” why. I’d honestly be more surprised if that weren’t the case.

I cannot make my brain stop thinking about this. Ever. So I have night after night of unending insomnia, forever, because I cannot do anything to fix any of this.

But there is one small thing I can “do”: I can scream on Twitter. I have relatively few followers, but I’m an old account, so my screaming “means something.” But that is itself complicated. Alongside the people whom I merely disagree with are the conspiracy theorists who are trying to say the US caused it. And they themselves are right alongside those saying China deliberately unleashed it. And both of them are alongside those saying the entire thing is a scam. None of these arguments make any sense; they don’t survive the first couple of questions, let alone an actual dialog. But that doesn’t matter; Twitter values all of these positions equally, and escalates them equally, as long as The Ratio is high enough.

So it’s bots against bots, all the way down. Even the people who aren’t bots are basically bots, because it’s not about facts; it’s about sides. There is zero room for nuance or discussion. You need enough people who either agree with you—or even disagree with you, as long as they elevate you—and you are going to be elevated in The Feed. It’s about retweets, and likes, and comments, and only the numbers matter; not what you said. If you tweet a middle finger emoji and you get a hundred thousand retweets, you are more valuable than the epidemiologist who tweets out an actual cure. No actual discussion happens. No one learns. No one is persuaded. Everyone is just angry.

Twitter is so caustic that I long ago had to use scripts that block many, many people. I only follow a couple hundred people, but I’ve had to block nearly forty thousand just to make Twitter tolerable, let alone hospitable. But Twitter is how I communicate. I want to stay involved. I want to stay a part of the conversation. And I have convinced myself that staying on Twitter is how I can do that. I cannot, absolutely cannot, put down this particular bullhorn. I must remain a part of the discussion. So I lay awake and don’t sleep and wonder what the debate will be tomorrow.

That was a few weeks ago.

This evening, for whatever reason, I had it. I’m done. There are many great people on Twitter, but it’s no longer worth engaging with them this way. It’s not even worth it when I’m not getting harassed, because Twitter’s algorithms—correctly, for their bottom line—are only sated when I’m angry enough to use Twitter when I’m on the toilet, and that’s a very high bar. So Twitter goes to great lengths to ensure that I’m as angry as possible all the time. And that will only capture me at my worst. No one will be persuaded, and nothing will be gained.

And I realized I’d felt this once before, about Facebook. And I deleted my account then. And I hadn’t deleted my account on Twitter because I “needed” it.

But the truth is…I don’t need Twitter. Twitter needs me.

So I’m done. If you follow me on Twitter at @gecko, please feel free to subscribe to this blog’s RSS feed, or to email me at if you want to contact me. I’m available a multitude of other ways, many in person if you live in the Raleigh/Durham/Chapel Hill area. And if you’re a real person and want to talk, I’d really like to chat with you about what you feel, and why, and about how I feel, and why, and see where we overlap and where we differ, and where we can learn from each other.

Twitter and me, though? We’re done.

May 05, 2020

Bogdan Popa (bogdan)

Using GitHub Actions to Test Racket Code (Revised) May 05, 2020 07:00 AM

A little over a year ago, I wrote about how you could use GitHub’s new-at-the-time Actions feature to test Racket code. A lot has changed since then, including the release of a completely revamped version of GitHub Actions, so I thought it was time for an update. A Basic Package Say you’re working on a Racket package for computing Fibonacci sequences. Your main.rkt module might look something like this:

May 04, 2020

Simon Zelazny (pzel)

Connecting Arduino IDE to Uno on Raspberry PI3B+ May 04, 2020 10:00 PM

Today I got my Arduino Uno working, programmable via Arduino IDE from a Raspbian installation on a Raspberry Pi3B+. I connected the Uno via USB and at first had some problems getting the board detected. Here's how I resolved the issue.

After writing a simple blink application, I tried uploading it to the connected Uno, only to get this error in the Serial Monitor window:

Serial Port 'COM1' not found

The Tools -> Serial Port option in the menu was grayed out. After scouring the internet, I figured it was likely a permissions issue. Here are the group changes I made to get things working:

 sudo gpasswd -a pi uucp
 sudo gpasswd -a pi tty
 sudo gpasswd -a pi dialout

After the changes, group membership is:

pi$ groups
pi adm tty uucp dialout cdrom sudo audio video plugdev games users input netdev gpio i2c spi

This is on Raspbian:

pi$ echo $(uname -a); cat /etc/issue
Linux raspberrypi 4.19.97-v7+ #1294 SMP Thu Jan 30 13:15:58 GMT 2020 armv7l GNU/Linux
Raspbian GNU/Linux 10 \n \l

Now, the Uno is detected successfully as /dev/ttyACM0, and can be programmed with the Arduino IDE.

Jan van den Berg (j11g)

Jitsi finetuning and customization May 04, 2020 06:26 PM

Jitsi offers a great user experience because it doesn’t require an account: you just open a URL in Chrome and you’re pretty much good to go. You get a full-blown video chat environment, complete with grid view, screen sharing, and chat options. No add-ons or third-party installations needed. I greatly prefer this to Zoom, Google Hangouts, Microsoft Teams, or what have you.

Jitsi is also a great piece of software to host, and installing and running your own video conferencing software has never been easier.

Chatting with 8 people. No problem. Emojis added for privacy (not a Jitsi feature, yet)

Here are some tips to run the Jitsi stack smoothly on your server and how to customize Jitsi Meet.


  • Put Nginx in front of Jitsi. This helps handle the bulk of the web requests. Otherwise Java will take care of this, and that is not what Java is particularly good at.
  • Use JRE11 to run jicofo and jitsi-videobridge2. The latest Debian 10 (Buster) comes with both JRE8 and JRE11. Make sure to use JRE11. This made quite a bit of difference in our tests.
ii openjdk-11-jre-headless:amd64 11.0.7+10-2ubuntu2~18.04 amd64 OpenJDK Java runtime, using Hotspot JIT (headless)
  • Always use the latest Jitsi packages. They get updated quite frequently, and you definitely want the latest. E.g. last Friday the latest release had a bug, this was fixed the same day. So make sure you always run the latest version.
root@meet01:/# cat /etc/apt/sources.list.d/jitsi-stable.list
deb stable/

We run the following packages.

ii jitsi-meet 2.0.4548-1 all WebRTC JavaScript video conferences
ii jitsi-meet-prosody 1.0.4074-1 all Prosody configuration for Jitsi Meet
ii jitsi-meet-turnserver 1.0.4074-1 all Configures coturn to be used with Jitsi Meet
ii jitsi-meet-web 1.0.4074-1 all WebRTC JavaScript video conferences
ii jitsi-meet-web-config 1.0.4074-1 all Configuration for web serving of Jitsi Meet
ii jitsi-videobridge2 2.1-197-g38256192-1 all WebRTC compatible Selective Forwarding Unit (SFU)
  • Running from packages also has the benefit that the package maintainers tune several kernel parameters specifically for video chat during installation. You definitely want this.

Other tips

With all of the above you should be good to go. The following two tips are optional and more user specific, if you still run into (bandwidth) issues.

  • Ask clients to scale their video quality to low definition. There is a server-wide setting that should theoretically be able to enforce this, but I have not been able to get it to work.
  • Use Chrome. Jitsi does not work on Safari at all, but it should work just fine on Firefox. However, it seems specifically designed for Chrome. In my experience, when everyone is on Chrome, Jitsi Meet works best.

Customizing Jitsi Meet

Every time you upgrade your Jitsi packages, all your custom changes will be overwritten. You can run a script like the one below after every upgrade to reapply your personal settings. Change the settings as appropriate for your installation.

#Run this after a Jitsi upgrade

cp -ripv own-favicon.ico /usr/share/jitsi-meet/images/favicon.ico
cp -ripv own-watermark.png /usr/share/jitsi-meet/images/watermark.png

sed -i 's/Secure, fully featured, and completely free video conferencing/REPLACE THIS WITH YOUR TITLE TEXT/g' /usr/share/jitsi-meet/libs/app.bundle.min.js

sed -i 's/Go ahead, video chat with the whole team. In fact, invite everyone you know. {{app}} is a fully encrypted, 100% open source video conferencing solution that you can use all day, every day, for free — with no account needed./REPLACE THIS WITH YOUR OWN WELCOME TEXT/g' /usr/share/jitsi-meet/libs/app.bundle.min.js

sed -i 's/Start a new meeting/REPLACE THIS WITH YOUR OWN TEXT/g' /usr/share/jitsi-meet/libs/app.bundle.min.js

sed -i 's/' /usr/share/jitsi-meet/interface_config.js

sed -i 's/Jitsi Meet/YOUR OWN TITLE/g' /usr/share/jitsi-meet/interface_config.js

/etc/init.d/jicofo restart && /etc/init.d/jitsi-videobridge2 restart && /etc/init.d/nginx restart && /etc/init.d/prosody restart

The post Jitsi finetuning and customization appeared first on Jan van den Berg.

Benaiah Mischenko (benaiah)

Everything You Didn’t Want to Know About Lua’s Multi-Values May 04, 2020 07:00 AM

Everything You Didn’t Want to Know About Lua’s Multi-Values

I’ve been spending a lot of time lately writing Fennel and contributing to its compiler. Since it’s a language that compiles to Lua, that also means spending a lot of time getting to know Lua. This article will be focused on Lua (though I’ll address Fennel from time to time in the footnotes).

Lua is a neat, elegant, and relatively simple language. I find it particularly notable for its embeddability into other programs; capability of hosting many different styles of programming; and the excellent performance of one of its implementations, LuaJIT. In many ways, it feels like a more elegant and restrained JavaScript - it supports tail call elimination[1] and coroutines, it doesn’t have the confusing var/let distinction (though it lacked a feature like const until Lua 5.4), and it doesn’t privilege class syntax over other styles of programming (object oriented programming can be done in Lua with metatables, but it’s not given special syntax like it is in JS).

That said, it has aspects that are undoubtedly quirky: arrays and dictionaries are unified into "tables", whose integer indices start at 1; the standard library is unusually minimal, a nod to its embeddability; and, like JavaScript, it makes the immense error of making variables global by default and local only when you ask. Curiously, it also supports a quite uncommon language feature: multiple return values (which we’ll call “multivals” for brevity’s sake). It is this last subject that we’ll be unpacking today.

Since Lua is a dynamic language (as opposed to, say, Go) multiple returns are implemented in a rather odd way. This isn’t explored in as much detail as I’d like in Lua’s documentation, so I wanted to provide a one-stop resource that explains everything I can think to write down about this idiosyncratic part of the language.

A spoiler: unfortunately, multivals will turn out to be somewhat underwhelming for use in code you care about. They’re awkward to work with, have numerous gotchas, are a maintenance hazard, and don’t perform notably better (that I know of) than their primary alternative, tables. This is not a very useful article, unless you have a use for knowing minutiae about Lua’s behavior.

Let’s dive in!

An Introduction to Multivals - 1, 2, 3, unpack, and ...

There are three primary ways to represent a multival in Lua. Each of these has their own distinct uses. Each of the following code snippets produces the same multival:

  • A multival literal: 1, 2, 3. (Note that this does not include table literals like {1, 2, 3}).
  • Unpacking a table: table.unpack({1, 2, 3})
  • The vararg, ..., in a function: local function x(...) return ... end x(1, 2, 3)

Multivals can be packed into a table by simply wrapping a call in a table constructor:

local function two_values() return "first", "second" end
local tab = {two_values()}
print(tab[1], tab[2])
  -- prints "first second"

To manipulate multivals, we can use two techniques: function signatures and select:

local function multival_first(x, ...) return x end
print(multival_first(1, 2, 3))
  -- prints "1"

local function multival_rest(x, ...) return ... end
print(multival_rest(1, 2, 3))
  -- prints "2 3"

print(select(2, 1, 2, 3))
  -- prints "2 3"; identical to the last line

In order to get values from the end of a multival, we have three options. The first is recursion:

local function increment_each_value(first, ...)
  if first then
    return first + 1, increment_each_value(...)
  end
end

print(increment_each_value(1, 2, 3))
  -- prints "2 3 4"

The second is packing the multival into a table. There are two ways to do this, both of which are demonstrated below:

-- 1. You can pack a multival directly into a table literal
local function pack_multival(...)
  return {...}
end

-- Lua's default printing for tables only shows identity; we'll just
-- print the length instead.
print(#pack_multival(1, 2, 3))
  -- prints "3"

-- 2. You can use table.pack, which is built in to Lua 5.2 and later
local tab = table.pack(1, 2, 3)
print(#tab)
  -- prints "3"

-- table.pack also sets the "n" field on the table it returns. This is
-- more efficient than using #tab, which iterates through the table.
print(tab.n)
  -- prints "3"

The final way of getting values from the end of a multival is select combined with select("#", ...) (which we’ll dig into more later on):

local function get_end_of_multival(...)
  local count = select('#', ...)
  return select(count, ...)
end

print(get_end_of_multival(1, 2, 3, 4, 5))
  -- prints "5"

The Vararg: A Second-Class Value

When you start working with multivals, they might seem pretty straightforward. As long as you’re working on something that uses a list with fixed length or that can be expressed by iterating through the head and tail of a list, it seems like working with multivals should be nice and predictable.

The first issue you’ll likely run into if you try to use multivals heavily comes from expecting them to work the same way the rest of Lua does. In short, Lua’s local variables use what’s called lexical scope. This means that the scope of a variable definition is the block of code it appears in within your source file. It doesn’t matter what order other functions are called in, or whether they’re called at all - the definitions of local variables can be determined just by looking at the structure of the code. This is as opposed to dynamic scope, where the variables defined at a given point depend on the runtime behavior of the program itself. Lua’s global variables effectively use dynamic scope.

As discussed above, there are only three ways to refer to a multival: calling a function for its return values, literally writing multiple values separated by commas, or using the vararg symbol ... inside a function. It’s this last case that we’re interested in.

When you use ... in the signature of a function, you must place it at the end of the argument list. (This is what prevents you - syntactically, at least - from accessing the end of ... without either assigning the whole thing to individual variables, recursing through it, or packing it into a table.)

You also can’t assign the whole thing to a variable, because assigning a ... to a variable means unpacking the multival and assigning the first value to the variable - or multiple values, if that’s specified. For example:

-- Assigning the vararg to a single variable will assign that variable
-- to the _first value_ of the vararg.
local function try_to_assign_vararg(...)
  local x = ...
  return x
end

print(try_to_assign_vararg(1, 2, 3))
  -- prints "1"

There’s one additional rule about the vararg, however, that’s very unlike the behavior of the rest of Lua. You can only use a vararg within the function where it is defined. This prevents you from, for instance, saving the vararg into a closure:

local function try_to_return_vararg_from_closure(...)
  return function()
    return ...
  end
end

This fails to run with the following error: cannot use '...' outside a vararg function near '...'. This means that, while you can use ... in two nested functions, each ... can only refer to the vararg of the function that’s currently running. You can’t capture ... within a closure to persist it. Effectively, ... is not lexically scoped, but rather a dynamically scoped variable with certain extra rules like not being reassignable and not being usable outside a function that defined it.

Another way to put this would be to say that Lua’s multivals are “second class” values. You might be familiar with languages that have second class functions, which can’t be assigned to variables or passed as a parameter to a function. This is similar, with the notable exception that we can pass the vararg to another function. We just can’t save it in a variable, save it in a closure, or manipulate it in certain ways.
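To make that contrast concrete, here’s a sketch (the function names are mine) showing that ... can be forwarded to another call even though it can’t be saved:

```lua
local function sum(...)
  local total = 0
  for i = 1, select("#", ...) do
    -- extra parentheses adjust select's multival down to a single value
    total = total + (select(i, ...))
  end
  return total
end

local function sum_plus_one(...)
  -- forwarding the vararg to another call is allowed...
  -- ...but `local saved = ...` would keep only its first value
  return 1 + sum(...)
end

print(sum_plus_one(2, 3, 4))
  -- prints "10"
```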

Cutoff Multivals

Passing multivals to functions and packing them into tables is pretty straightforward, but there’s one major gotcha about it we haven’t gone over yet. Take a look at the following example:

local function returns_three_values()
  return 1, 2, 3
end

print(returns_three_values())
  -- prints "1 2 3"
print(#{returns_three_values()})
  -- prints "3"

print(returns_three_values(), 4, 5)
  -- prints "1 4 5"
print(#{returns_three_values(), 4, 5})
  -- prints "3"

As the second pair of print expressions demonstrates, a function call can only return multiple values into a multival if it is the last thing in the multival. Following a function call in a multival with any other value, even nil, will cut off its return values at the first item.

This also applies to the vararg, ..., in function definitions:

local function f(...)
  return ..., 4, 5
end

print(f(1, 2, 3))
  -- prints "1 4 5"
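Relatedly, wrapping a call in parentheses adjusts its multival to exactly one value, which is another way return values get silently cut off:

```lua
local function returns_three_values()
  return 1, 2, 3
end

-- The extra pair of parentheses truncates the multival to one value.
print((returns_three_values()))
  -- prints "1"
```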

Lua’s Most Jarring Feature: select("#", ...)

One of Lua’s nicer properties is how a key set to nil and a non-existent key are indistinguishable in a table. In JavaScript, for instance, you have both obj.x = undefined to set a property to undefined and delete obj.x to actually remove it from the object. (There’s also obj.x = null, but let’s ignore that for now.) In Lua, there’s just tab.x = nil, and there’s no way to distinguish between a property that was set to nil and a property that was never set at all.

With that in mind, let’s look at the following example:

local y = {nil,nil,nil}

print(#y)
   -- prints "0" - nils at the end of a table don't matter

local function multival_length(...) return select("#", ...) end

print(multival_length())
  -- prints "0"

print(multival_length(nil, nil, nil))
  -- prints "3". wtf?

As this example demonstrates, multivals throw that nice property out the window. Just to be clear:

  • There is no distinction between {nil, nil} and {} in Lua.
  • There is a distinction between multival_length(nil, nil) and multival_length() in Lua. That distinction is made by select("#", ...).

This becomes even more confusing when combined with a function that returns zero values, as opposed to nil:

print(select("#", print("a"), print("b"), print("c")))
  -- first prints each of "a", "b", and "c" on their own lines
  -- next, prints "2". wait, what?

What in the world? What’s going on here?

As it turns out, a function that returns zero values creates an empty multival, as you’d expect. It also doesn’t add anything to the end of a multival if you call it at the end. However, if a call to a zero-return-value function appears before the end of a multival, it results in a nil being inserted into the multival in place of its zero return values. Thus, instead of collapsing the multival made up of all three of the print calls into a single zero-length multival, Lua instead collapses the last call to print, but turns the other two calls into nil. This all doesn’t apply to tables, because, unlike in multivals, nils in tables are truly identical to absent values.
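The rule is easier to see with a function that returns nothing at all:

```lua
local function nothing() end

-- At the end of a multival, a zero-value call adds nothing:
print(select("#", nothing()))
  -- prints "0"

-- Before the end, it is adjusted to a single nil:
print(select("#", nothing(), nothing()))
  -- prints "1"
```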

At first, this all seems like something that doesn’t belong in Lua. Without the select("#", ...) form, there’d be no way of telling a nil at the end of a multival from the end of the multival itself. However, this makes certain abstractions much easier to create.

One notable situation where select("#", ...) becomes particularly useful is when wrapping functions that return multiple values. select("#", ...) lets you easily tell if you can simply save the first return value from the function or if you need to create a table to save all the values.
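Here’s a sketch of that pattern (the names are mine), packing a function’s results with their true count, in the style of Lua 5.2’s table.pack:

```lua
-- Pack a multival into a table, recording its real length in `n`.
local function pack(...)
  return { n = select("#", ...), ... }
end

-- Call f and capture all of its results, embedded nils included.
local function capture(f, ...)
  return pack(f(...))
end

local r = capture(function() return 1, nil, 3 end)
print(r.n)
  -- prints "3": #r couldn't be trusted here because of the embedded nil
```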

Are You Sure That’s a Tail Call?

When first looking at multivals in Lua, a programmer who’s determined to use them somehow might first be inclined to combine them with tail recursion to neatly express different functions. For instance, we could implement range as follows:

local function range1(acc, n)
  if n < 1 then return
  elseif n == 1 then return acc
  else return acc, range1(acc+1, n-1)
  end
end

local function range(n)
  if n < 1 then error("n must be >= 1") end
  return range1(1, n)
end

print(range(3))
  -- prints "1 2 3"

print(#{range(10)})
  -- prints "10"

This works great! We can call range(n) when we want a multival of length n, and surround it with braces like {range(n)} when we want the equivalent table. This should let us avoid allocating extra tables when we don’t need them. And since Lua has tail call elimination, the recursive call range1 makes to itself should never blow the stack. Right?

print(#{range(1000 * 1000)})
  -- throws a stack overflow error


As it turns out, multivals break tail call elimination. If you’re returning multiple values, Lua needs to collect all those values before it can actually begin to return them. This means that it can’t throw away the stack frames of the recursing function. It’s very similar to what would happen if you returned {range1(acc, n)} within range1. That case is clearly not a tail call. It’s the particular syntax of multivals that makes this confusing, since it doesn’t look like the faux-tail-call is “surrounded” by anything.[2]

In order to avoid the stack overflow while still recursing, we can implement range with a table as follows:

local function range1(tab, acc, n)
  if n < 1 then return tab end
  tab[acc] = acc
  return range1(tab, acc+1, n-1)
end

local function range(n)
  if n < 1 then error("n must be >= 1") end
  return range1({}, 1, n)
end

print(#range(1000 * 1000))
  -- prints "1000000"

It’s perhaps less elegant, mixing recursion and mutation, but its runtime characteristics are vastly better.

Maximum Occupancy

It makes sense that recursing too much could cause issues by blowing the stack. What’s less intuitive is that multivals have a cap on their size even when you make them without recursion. Consider this use of the range function we defined above:

local t = range(1000 * 1000)
print(t)
  -- prints a table identity

Great! Now, let’s try unpacking that into a multival. No recursion is involved, so this should work fine, right?

local t = range(1000 * 1000)
print(table.unpack(t))
  -- throws "too many results to unpack" error

As it turns out, multivals simply have a cap on the number of values they can contain. There’s no way to work around this - if you need to handle a list of items without worrying about how big it is, use a table, not a multival.
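When you only need part of a large list as a multival, note that table.unpack (Lua 5.2+; plain unpack in 5.1) takes explicit bounds, so you can take small slices instead of unpacking the whole thing:

```lua
local t = {}
for i = 1, 10 do t[i] = i * i end

-- Unpack only elements 1 through 3 of the table.
print(table.unpack(t, 1, 3))
  -- prints "1 4 9"
```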

Multival literals in function arguments and return values are even more restricted:

print((function() return 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
            14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
            28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
            56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
            70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
            84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
            98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
            110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
            121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
            132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
            143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153,
            154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
            165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
            176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186,
            187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
            198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
            209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
            220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
            231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241,
            242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252,
            253, 254, 255 end)())
  -- throws "function or expression needs too many registers" error

print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
      19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
      35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
      51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
      67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
      83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
      99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
      112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
      125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
      138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150,
      151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
      164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,
      177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
      190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202,
      203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215,
      216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228,
      229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241,
      242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
      255)
  -- throws "function or expression needs too many registers" error

There is, however, no limit on table literals, reinforcing the fact that the contents of table literals are not multivals, despite their similarity in appearance:

-- generate.lua
print("print(#{")
for i = 1, 1000 * 1000 do
    print(i ..",")
end
print("})")

lua generate.lua > generated.lua && lua generated.lua
# prints "1000000"

Multivals are Data Structures, Just Bad Ones

What all these different examples of multival quirks demonstrate is that multivals are just another data structure. (I find this case is made best by how tail call elimination doesn’t work when recursing within a multival, just like it wouldn’t work within a table literal.) They aren’t exceptions to the rules that data structures follow - in fact, they have extra rules that tables don’t. To sum up:

  • Multivals cannot be assigned to variables. They can only be referred to as literals, function call expressions, or the vararg ....
  • The vararg cannot be used outside the function that creates it (including within closures made within that function).
  • Multivals are cut off at the first value when inserting them into another multival before its end.
  • Unlike tables, multivals have a built-in length that is unrelated to the arrangement of nils within the multival. This length can be retrieved with select("#", ...).
  • When a multival contains a call to a zero-return-value function before the end of the multival, a nil is inserted where the function’s return value would go.
  • When a function makes a recursive call within a multival, tail call elimination is not applied. Recursing too many times within a multival will thus blow the stack.
  • Unpacking too many values from a table into a multival will result in an error.
  • Trying to call or return from a function with too many arguments will result in an error. This limit is much lower than the previously-mentioned limit on unpacking tables, being just below 255 items.

A Last Tiny Nitpick

Finally, one incredibly subjective thing that bugs me about multivals is the syntax used for the vararg. In my opinion, the fact that ... is valid Lua makes it unnecessarily hard to insert an easily-understood placeholder into example code.


While multivals remain a strange and sharp-edged corner of Lua, I hope that they’re a little bit easier to understand thanks to this post. While they’re rarely the best solution to a given problem, it’s still helpful for a Lua user to understand their strengths and limitations, even if it’s only to justify avoiding them.

If you’d like to leave a comment, please email

  1. Also known as Tail Call Optimization. JavaScriptCore, used in Safari, has Proper Tail Call Support, but most JS developers can’t make use of this, as it’s not available in popular browsers on most platforms.

  2. In Fennel, this syntactical confusion gets even worse! There, instead of having their own special syntax, multivals are represented with the values special. (values 1 2 3) in Fennel is the same as 1, 2, 3 in Lua. Unfortunately, this makes recursive functions that run into this scenario look even more like real tail-recursive functions. This is a gotcha to keep in mind whenever working with multivals and recursion in Fennel!

May 03, 2020

Ponylang (SeanTAllen)

Last Week in Pony - May 3, 2020 May 03, 2020 11:57 PM

Ponyc 0.34.0 has been released!

Derek Jones (derek-jones)

Enthusiasm on the Fortran standards committee May 03, 2020 09:58 PM

The Fortran language standards committee, SC22/WG5, has an unusual situation on its hands. Two people have put themselves forward to chair the committee, when the current chairman’s three year term ends. What is unusual is that it is often difficult to find anybody willing to do the job.

The two candidates are the outgoing chair (the person who invariably does the job, until they decide they have had enough, or can arm wrestle someone else to do it), and a scientist at Los Alamos; I don’t know either person.

SC22 (the ISO committee responsible for language standards), and INCITS (the US Standards body; the US is the Fortran committee secretariat) will work something out.

I had heard that the new guy was ruffling some feathers, and I thought good for him (committees could do with having their feathers ruffled every now and again). Then I read the running for convenor announcement; oh dear. Every committee has a way of working: the objectives listed in this announcement would go down really well with the C++ committee (which already does many of the points listed), but the Fortran committee don’t operate this way.

The language standards world appears to be very similar to the open source world, in that they are both driven by the people who do the work. One person can have a big impact in the open source world, simply by doing the work, but in the language standards world there is voting (people can vote in the open source world by using software or not). One person can write papers and propose lots of additions to a standard, but the agreement of committee members is needed before any wording is added to a draft standard, which eventually goes out for a round of voting by national bodies.

Over the years I have seen several people on a standards committee starting out very enthusiastic, writing proposals and expounding them at meetings; then after a year or two becoming despondent because nothing has happened. If committee members don’t like your proposal (or choose to spend their time on other proposals), they do nothing. A majority doing nothing is enough to stop something happening.

Once a language has become established, many of its users want the committee to move slowly. Compiler vendors don’t want to spend all their time keeping up with language updates (which rarely help sell more product), and commercial users don’t want the hassle of having to spend time working out how a new standard might impact them (having zero impact on existing code is a common aim of language committees).

The young, the enthusiastic, and magazines looking to sell clicks are excited by change. An ISO language committee is generally not the place to find it.

Bogdan Popa (bogdan)

Announcing racksnaps May 03, 2020 11:00 AM

Racket’s package manager doesn’t currently have the notion of locking package sets to specific versions per project so, as someone who operates a couple production Racket applications, I’ve been concerned about the possibility that new deployments could introduce bugs in production due to changing dependencies. To solve this problem, over the past weekend I’ve put together a service that creates daily snapshots of the official package catalog. You can find it at racksnaps.

May 02, 2020

Unrelenting Technology (myfreeweb)

Ported the Firefox Profiler to FreeBSD in order to investigate why May 02, 2020 10:01 PM

Ported the Firefox Profiler to FreeBSD in order to investigate why WebRender has some jank when scrolling some walls of text on my 4K/HiDPI setup.

The profiler code initially looked somewhat scary: some Google Breakpad code is used, a custom stack unwinder called LUL is used on Linux (which also partially derived from Breakpad code)…

Initially, I got it working with “pre-symbolication” (an option to build the goblin ELF parser into Firefox for this purpose) only, ifdef’ing any Breakpad code out.

Turns out:

  • the only part of Breakpad used is extracting build IDs from shared objects (and in fact the “base profiler”, a copy (for now) of the Gecko profiler used for profiling during the early startup phase, just copied all the relevant code);
  • devel/breakpad was there in FreeBSD Ports (but expires in like three days!), and its patches showed that it’s really trivial to get it all working.

So I got the main symbolication system working. Which, it turns out, runs WebAssembly-compiled goblin in a web worker! Fun stuff. Requires stripping libxul for now tho.

In the end, the patch turned out to be mostly just ifdef’s! The only meaningful parts are: thr_self/thr_kill2 instead of gettid/tgkill, supporting the different mcontext structs, and (for the pre-symbolication code path) ignoring symbol names returned by dladdr because they’re hilariously bad.

BTW, earlier in the dev-tools-on-FreeBSD space: heaptrack! I even used it to find a real memory leak in Wayfire.

May 01, 2020

Jeremy Morgan (JeremyMorgan)

The Best Editors for Go Programming May 01, 2020 10:33 PM

So I’m running a poll right now to see what editor Go programmers prefer. Vote for your favorite and I’ll post the results here. I write Go with several different editors, but here are some of my favorites for Go programming: Goland. Goland is by far my favorite editor, especially for larger projects. It’s tailor-made for Go programming and has some excellent features like built-in debugging, smart completion, inspections, and refactoring tools. It’s the best IDE I know of right now for serious Go development.

Patrick Louis (venam)

Time on Unix May 01, 2020 09:00 PM


What is time

  • Time is relative
  • Measuring time and standards
  • Coordinating time
  • Time zones
  • DST

Time, a word that is entangled in everything in our lives, something we’re intimately familiar with. Keeping track of it is important for many activities we do.

Over millennia, we’ve developed different ways to calculate it. Most prominently, we’ve relied on the position the sun appears to be at in the sky, what is called apparent solar time.

We’ve decided to split it as seasons pass, counting one full cycle of the 4 seasons as a year, a full rotation around the sun. We’ve also divided the passing of light to the lack thereof as days, a rotation of the earth on itself. Moving on to more precise clock divisions such as seconds, minutes, and hours, units that meant different things at different points in history. Ultimately, as travel got faster, the different ways of counting time that evolved in multiple places had to converge. People had to agree on what it all meant.

In physics, time is the progression of events, without events there’s no time. It is defined by its measurement, what changes at a specific interval can be considered a unit, though still infinitely divisible.
In physics there are two ways to view time. In classical physics, time is absolute, independent of the perceiver, synchronized for everyone. While in modern physics, we have Einstein’s special and general relativity that applies, things depend on a frame of reference, time can dilate or contract with the effect of gravity, we talk of space-time.
In physics, equations work equally well in both directions: the math holds up going forward and backward in time. However, the arrow of time in our universe seems to go in a unique direction. Peculiarly, we’ll see that time in computers, unlike in our universe, can actually go backward at specific events.

All of this to say that because of the importance of tracking time, we’ve created ultra-precise atomic clocks that have an error of 1 second every 30 million years. We can be categorically sure of the lapse that happens between two beats/oscillations, if there’s an error then it’s outside the human life-span, and we’ve got many of them to correct errors. Those clocks are our sources of truth, they give us the definition of the standard unit of second, SI second.

We have, on one hand, the atomic clocks counting time regardless of events happening around, and on the other hand, we have a moving planet in space that is subject to forces, where we’ve chosen the fact that one full orbit around the sun equals a year and that one full (approximate) rotation on itself is a solar day, the space between two transits of the sun (maximum height in the sky). Both of those values ought to diverge and differ eventually.
The earth, because of its unevenness and current position in its orbit, could rotate around the sun or itself faster or slower, its speed changing how long days and years are.

What we’ve done is used this standard definition of the SI second as our anchor. A day is now not defined by the apparent sun position but as the average number of standard unit seconds that make up an average solar day, currently around 86400.002 seconds. This uniform clock time is called the mean time. Mean solar time is the average length of a solar day over a year, that is, the sum of all solar days over n days, divided by n.

Thus, clocks that have a uniform fixed value, mean-time, will differ with the apparent sun time. This difference is called the equation of time (EOT) and it can vary as much as 15min (ahead 14 minutes near February 6, behind 16 minutes near November 3) but rebalances itself as the earth finishes its orbit around the sun. There are many simulations you can find online to understand this concept.

As for years, our calendars can only hold entire days but the actual number of days it takes to finish an orbit is fractional. And so we accumulate this fraction over multiple years and add an extra day to the year that follows making it a leap year, 366 days instead of 365. On a Julian calendar, a year is 365.25, however this is not precise, it is higher than the actual number of days it takes to form a year: 365.242199. The Gregorian calendar, which is the most common today, defines it more precisely as 365.2425, adding a leap year 97 out of 400 years.
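As a sketch, the Gregorian rule (a leap year every 4th year, except centuries, except again every 400th year) reduces to a short predicate:

```lua
-- Gregorian leap year rule: 97 leap years out of every 400.
local function is_leap(year)
  return year % 4 == 0 and (year % 100 ~= 0 or year % 400 == 0)
end

print(is_leap(2020), is_leap(1900), is_leap(2000))
  -- prints "true false true"
```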

But we still use 86400 seconds to define a day in our current lives, in our software, right? What about the rest of this complex system, how do we cope with these discrepancies, who chooses the mean time, how can we all sync on those values, who’s in charge? Noon where you are might not be noon where I am.

The local time that shows on our clock is chosen by our local authorities, we call it civil time.
And because we all live on the same planet, instead of each syncing in our local community with what appears to us as the mean time, we can choose a fixed geographical spot, create a rigorous standard time there, and for the rest of the world, derive from it. Anything further away in longitude from this meridian can add a delta time difference. That way we can all sync and make less of a mess in this interconnected world.

The first major standard was set at the Royal Observatory in Greenwich, London. The mean time recorded there was used as the one to derive your local civil time as an offset from, called Greenwich Mean Time, or GMT for short. However, it was not as precise as it could be and thus got replaced in 1967 by another standard called Coordinated Universal Time, UTC.

UTC is a version of the Universal Time standard. In this standard we also find UT1, which keeps track of the Earth’s rotation angle; it is the mean solar time at 0 degrees longitude, a better and more precise version of GMT.

UTC, instead of relying on the rotation of the earth, relies on International Atomic Time (TAI), which is the time we talked about that precisely defines the SI second using around 400 atomic clocks at multiple laboratories worldwide. Additionally, to keep count of the rotation of the Earth and stay in sync with UT1, the UTC authorities can add or remove a second in a day, a leap second. The difference between UTC and UT1 is called DUT1; basically, when DUT1 approaches one second we need a leap second.
So in UTC, a second is well defined, but the number of seconds in a minute can be 59, 60, or 61 if there was a leap second. Any unit bigger than the SI second reference can vary. Let’s also note that UTC uses the Gregorian calendar, as previously said.

As you could’ve guessed, introducing a leap second isn’t a decision we take instantaneously; it’s announced at least six months in advance in “Bulletin C” by the International Earth Rotation and Reference Systems Service (IERS), which is one of the authorities. There’s also involvement in the standard from the International Astronomical Union (IAU) and the International Telecommunication Union (ITU).

With this, we’re set, we have a clean standard. Now, how should we divide the world such that civil local time keeps in sync with the sun?

Time and longitude difference is what we need, so we split the world into 24 meridians, each 15 degrees apart; each meridian zone represents one hour of separation offset from UTC. Those are called time zones. They can go from UTC-12 to UTC+14, and can sometimes be referred to by their name, for example Western European Time, Central European Time, etc… However, countries don’t fall precisely on meridians, and thus local authorities can choose which section of the country follows which time zone as their civil local time. This difference doesn’t even have to be an integer number of hours; it could be offset by 15 or 30 minutes, for example.

Moreover, there’s a practice called daylight saving time (DST), or summer time, which is used in civil time to move the clock forward by one hour in spring and back one hour in autumn/fall. For example, in winter a region could be on UTC+2 (EET) and in summer on UTC+3 (EEST), creating a 23h day in late winter and a 25h day in autumn/fall. This practice is being reconsidered in the EU and planned to be removed by 2021.

So that’s it we’re all in sync!

Now, on computers, how do we write time, and how do we represent it textually?

Representing time

  • locale
  • tzdata

The easiest way I’ve found to test many formats is to use the date(1) command.

It can show the time both in human-readable format string and more machine-readable numeric formats.

Some formats include the timezone as numeric, others as the alphabetic time zone abbreviation. You can represent the date with or without the time zone, with it to make it more precise to the viewer.

Some formats prepend the time zone part with the ‘Z’ character, which has origins in the GMT standard but that now stands for zone description of zero hours, also sometimes called “Zulu time”.

We can see that the date command automatically knows our time zone and how to display the time in a way we can read it. How do we set this format, and where does it come from? And how do we set the time zone?

Let’s start with the formatting.

The date command relies on locales, an internationalization mechanism used by Unix-like operating systems. Locales are configurations that can be used to set the language, currency, and other representational values that can change by location. The libc on the system, and consequently the coreutils, should be aware of those locale values.

The specific category of locale we are interested in is the LC_TIME, which is used for the formatting of time and date information, spelling of weekdays, months, format of display, time zone, 24 vs 12h format, etc.

To inspect specific values in LC_TIME you can do the following (see locale(7) for more info):

$ locale -ck date_fmt
date_fmt="%a %b %e %H:%M:%S %Z %Y"

The available locales are usually found in:


Locales can also be set at the user level in $XDG_CONFIG_HOME/locale.conf, which generally resolves to $HOME/.config/locale.conf.

All of this works because of the way profiles are loaded in the shell, you can take a look at /etc/profile.d/

Now regarding the time zone.

The time zone information database is distributed by the IANA; it’s referred to as the tz database. Unix distributions download it when it is updated and install it in /usr/share/zoneinfo/ so that other libraries and programs can use it. In the tz data/zoneinfo db we can find all the information required to keep track of time in specific places. It’s distributed in such a way as to make it easier to choose a time zone by city name or location instead of by the exact drift/skew from UTC. That takes care of all the differences in civil time, all historic weirdness over time, daylight saving, and leap seconds.

To change the timezone globally we have to link /etc/localtime to an entry in /usr/share/zoneinfo/. For instance:

ln -s /usr/share/zoneinfo/America/New_York /etc/localtime

Many Unix-like systems provide helpers so you don’t have to link it manually. There’s timedatectl(1) from systemd and /etc/timezone on Debian, for instance.

The TZ POSIX environment variable can also be used to specify the zone in human-readable format on the command line, and the TZDIR to specify the location of the tzdata. That means separate users on a single system can have different time zones.


TZ='America/Los_Angeles' date

The format of the tz database aka tzdata is explained in details in the man tzfile(5). To create it you can use a textual format, and convert it using the command zic(1) (zone information compiler).

Example of creating your own tzdata:

> echo "Zone COOL +2:25 - COOL" > cool.zone  # "cool.zone" is an arbitrary filename
> mkdir ./zoneinfo
> zic -d ./zoneinfo cool.zone
> TZDIR=zoneinfo TZ=COOL date
# Should output something similar to
Sun 12 Apr 2020 11:44:00 AM COOL

So now programs that rely on the standard time.h header can be aware of the time zone info. You can also load your own dynamically using tzset(3) from the TZ env.

Where do we usually find time on Unix

  • POSIX time
  • Uptime
  • time(1)
  • Programming language and timestamp
  • File system atime, ctime, mtime, etc.
  • Cron

Time is used in so many places in our operating system. We’re going to list some common places where it is found and then build on them to approach more complex topics in the following sections.

Let’s start with the infamous POSIX time. POSIX time, or Unix time, or Epoch time, is the number of seconds that have elapsed since the Epoch, the 1st of January 1970 00:00:00 UTC, minus the leap seconds, so Unix time is the same everywhere. That means that in Unix time each day is considered to be exactly 86400 seconds, which raises a question: shouldn’t it then skew away from real UTC, and thus drift away from UT1 (mean time)?

To answer this: when a leap second is introduced in UTC, POSIX time repeats the same second or omits one, because its minutes always have exactly 60 seconds, unlike real UTC where a minute can occasionally have 61.
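To make this concrete, here is a quick look at Unix time from the shell; the conversion flags (-d, -u) used below are GNU date extensions:

```shell
# Current Unix time: seconds elapsed since the Epoch
date +%s

# Convert an epoch value back to a calendar date (GNU date)
date -u -d @0
# prints the Epoch: 1 Jan 1970 00:00:00 UTC
```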

Some rare systems use TAI (International Atomic Time) as a source instead of UTC, but they need a table of leap seconds on the system to know how to convert to civil time.

Because Unix time starts in 1970, dates before it need to be represented as negative numbers. However, keep in mind that the UTC standard may not have existed yet before that era, so it’s better to rely on something else to represent those times and dates accurately. Some real-time operating systems (RTOS), which we’ll see in a bit, like QNX, choose to represent Unix time as an unsigned integer, which means they cannot represent time before 1970.

So is Unix time a signed or an unsigned integer?

Historically, Unix time was a signed 32-bit integer that incremented at 60 Hz, the same rate as the system clock on the hardware of early Unix systems, that is 60 times per second.
The Epoch differed too: it was the 1st of January 1971 in the first edition of the Unix Programmer’s Manual. However, at 1/60th of a second precision, the 32-bit integer would have exhausted its range in only about 2.5 years. Thus, it was changed again to the Epoch of 1 January 1970 at a rate of 1 Hz, an increment of 1 every second. This gives about 68 years after 1970, and 68 years before 1970, using a 32-bit signed integer.

When the concept was conceived, there was no consideration of all the issues with leap seconds, time zones, and leap years that we’ve mentioned previously. It’s only in the 2001 edition of POSIX.1 that the faulty leap year rule in the definition of Unix time was rectified.

So what’s this Unix time used for?

Unix time is the value that Unix systems look at to keep track of their system time. Its value is kept in a time_t, a type defined vaguely in POSIX.1 as previously mentioned, available via the time.h header. POSIX only mandates that it be an integer; it doesn’t say whether it should be signed or not, which explains why QNX chose an unsigned integer.
For more precise time manipulation, the time.h header also includes other types such as struct timeval and struct timespec.

However, it is usually a single signed integer whose size is defined per system. For example on mine, <time.h> ends up including <bits/timesize.h>, which defines it as a 64-bit signed integer.

One issue, called the Year 2038 problem, Y2038, or Y2k38, arises on systems that chose to represent Unix time as a signed 32-bit value: the counter reaches the maximum of a 32-bit signed integer and overflows, which is undefined behavior.
This issue is completely avoided by using a 64-bit signed integer.
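We can poke at that boundary from the shell (again assuming GNU date for the @-epoch conversion):

```shell
# Largest value a signed 32-bit time_t can hold
echo $(( (1 << 31) - 1 ))

# That last representable second falls on 19 January 2038, 03:14:07 UTC;
# one second later, a 32-bit signed counter overflows
date -u -d @2147483647
```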

Let’s move to another topic, uptime.

The uptime of a machine is a measure of how long it has been running since the last reboot: the current time minus the time it booted.
Some systems may require high availability due to service level agreements, and uptime is one of the measures that can be looked at. However, a high uptime can also be a sign of negligence, and rebooting after a long time may lead to unexpected consequences, as some changes only take effect on reboot.

Most Unix-like OS come with the BSD uptime(1) command, which shows the current time, how long the system has been running, how many users are currently logged in, and the system load averages for the last 1, 5, and 15 minutes (though those values aren’t good metrics, see Brendan Gregg’s blog). The load average is the average number of runnable processes over that period. It’s the same information one can find in the first line of the w(1) command.

On Linux the uptime can also be found in the /proc filesystem, as /proc/uptime. The file contains 2 values: the first is the number of seconds elapsed since the last reboot and the second the total time the cores have spent idle, in seconds, both being indicators of system usage.
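For example (Linux-specific, since the /proc path is assumed):

```shell
# Two fields: seconds since boot, and total idle seconds summed over cores
cat /proc/uptime

# Convert the uptime to days
awk '{ printf "up %.1f days\n", $1 / 86400 }' /proc/uptime
```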

Another common command used as a metric is time(1). It is a simple command that reports the time a command has taken.
By default, it reports real time, user time, and sys time. Real time is wall-clock time: the total time for everything to execute from start to finish. User time is the amount of CPU time spent in user mode, while sys time is the amount of CPU time spent in kernel mode, in system calls and the like. CPU time is summed across CPUs, so on a multicore system you may have two cores executing for 0.1s in parallel in user mode, giving a total user CPU time of 0.2s. This shows that CPU time has no direct relation to the actual time elapsed from start to finish without knowing which cores executed what, but it gives an idea.

In fact, there exists a panoply of commands that can be used for benchmarking processes and what they use, how much time they spend in which particular section of their code. We’ll keep it at that.

That said, we often have to take snapshots of time in our programs, timestamps, recording that something happened at a specific time, attached as metadata. These timestamps can be stored in Unix time, in UTC, or in a specific time zone. However, time-stamping records with local time isn’t advised because of the issues that can arise with daylight saving; it’s much easier to recalculate an offset from UTC. Though in certain cases it is valuable to know in which time zone an event happened, such as when timestamping pictures.

Examples of metadata timestamps are the Unix atime, ctime, and mtime that are stored with file inodes on the file system.

Atime is the last access timestamp, updated whenever the file is read or executed, for instance. Ctime is the last status change timestamp, updated whenever the file is written to or its inode changes. Mtime is the last modification timestamp, updated whenever the file has data written to it, is created, truncated, etc.
An additional non-standard timestamp found on some systems is btime, the file creation/birth timestamp.

Additionally, some filesystems support mount options related to those timestamps, usually optimizations to avoid disk load, such as options that change the way atime is updated or that remove access time updates from directories.
These are the prevalent defaults on a lot of filesystems and thus give a false sense of the definition of those timestamps.

To have more information about when those timestamps are actually updated, have a look at man 7 inode, and check a file’s current values by calling the stat command or the related system call on it. You can also use touch(1) to arbitrarily change the timestamps of files.
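A small sketch of playing with those timestamps, assuming the GNU versions of touch and stat:

```shell
# Create a scratch file
f=$(mktemp)

# Set its modification time to a known date (GNU touch -d)
touch -d '2020-01-02 03:04:05 UTC' "$f"

# Print atime/mtime/ctime as Unix timestamps (GNU stat format flags)
stat -c 'atime=%X mtime=%Y ctime=%Z' "$f"

rm "$f"
```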

Let’s end this section with timers and chronometers that trigger events at specified times, Unix alarm clocks if you want: clock daemons.

The de facto implementation of this is cron, a clock daemon tool written by Ken Thompson that first appeared in Unix V6, named in reference to Chronos, the Greek word/prefix for time.

Cron specializes in scheduling the execution of programs periodically, at certain times. It registers the events in a table, the crontab.

It initializes its entries from files and directories containing the scripts: /etc/crontab, /etc/cron.*/, or /var/spool/cron; take a look at man 8 cron for more info on that.

It can also be managed from the command line, for instance:

crontab -e

Cron will by default execute the entries using sh, which means they are simple shell commands, and crontabs let you set environment variables and more.
In the crontab you specify when to repeat the execution of a command using a special syntax composed of five fields: minute, hour, day of month, month, and day of week.
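For instance, an entry with its five time fields followed by the command to run (the script paths here are hypothetical):

```
# m    h    dom  mon  dow   command
# Run a (hypothetical) backup script every Monday at 04:30
30     4    *    *    1     /usr/local/bin/backup.sh
# Run a (hypothetical) poller every 15 minutes
*/15   *    *    *    *     /usr/local/bin/poll.sh
```

Note that the `*/15` step syntax is a Vixie cron extension; a strictly POSIX crontab would list the minutes explicitly.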

Despite cron being the go-to solution for repeated scheduled execution, others have created new solutions. Namely, init systems and service managers have re-implemented cron their own way, integrating timers as a type of service and centralizing the management of timers along with services.

Prominently, systemd implements this function using systemd timers: systemd unit files that inherit all the facilities and security settings systemd provides for them.
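As an illustration, a minimal timer unit might look like this (the unit name and schedule are made up for the example); it would be paired with a cleanup.service unit containing the actual command:

```
# /etc/systemd/system/cleanup.timer (hypothetical example)
[Unit]
Description=Run cleanup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Persistent=true makes the timer catch up on runs missed while the machine was off, something plain cron cannot do without helpers such as anacron.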

You can list current running timers on systemd using:

sudo systemctl list-timers --all

man 5 systemd.timer provides all the info you need about systemd timer units.

System time, hardware time, internal timers

  • Hardware time vs system/local time
  • Clock hardware sources and configurations
  • Tickers, timers, and their usages

We’ve said previously that POSIX time was used by Unix-like OS to keep track of system time, but we’ve cut it short at that. There’s still a lot to add to this, like where does the system get its time to begin with, how does it store it when it’s not running, what in the OS triggers the count of seconds, etc.

In this section I’ll limit myself to Linux and FreeBSD as examples but the concepts should apply to any other Unix-like OS. I’ve chosen to do this because resources were scarce on this topic as most have chosen to not mention it and skip directly to the topic of NTP, Network Time Protocol, which we’ll see in the next section.

There are two types of clocks on our machines: the first type goes by the name of RTC (real time clock), CMOS clock, BIOS clock, or hardware clock; the other by the name of system clock, kernel clock, or software clock.
The system clock is the one we’ve mentioned before, which keeps track of time using a counter of seconds since the Epoch; the hardware clock is one we haven’t mentioned, which resides physically on the system and runs without any interaction.
Their usages differ: the system clock runs only when the system is on, and so is not aware of time when it’s off, while the hardware clock has the purpose of keeping time when the system is not running.

As you would have guessed, any two clocks are bound to drift apart eventually, those clocks will differ from each other and from the real time. However, there are many methods to keep them in sync and accurate without using external sources.

Hardware clocks are usually found on the PC motherboard and interfaced with using an IO bus. Because some of those are on standard architecture such as the ISA (Industry Standard Architecture), it can be easy to know how to query and modify them. However, it’s still hardware dependent and so can vary widely.

These clocks run independently of any control program and keep running when the machine is powered off, kept alive by their own power source, a small battery.
They are normally consulted only when the machine is turned on, to set the system time at boot. Unfortunately, they are known to be inaccurate, but inaccurate in a weirdly predictable way: gaining or losing the same amount of time each day, a monotonic, systematic drift.

Hardware clocks don’t have to store time as Unix time or UTC, and don’t have to be limited in precision to seconds. It’s up to the hardware implementation to decide what can be done and on the user to decide what to do with it. In theory these clocks have a frequency that varies between 2Hz and 8192Hz, from 0.5s to 0.1ms precision.

Let’s also note that there can be more than one hardware clock on a system.

Linux and FreeBSD come with drivers to interact with RTC.

On Linux for example, the RTC clocks are mapped to special device files /dev/rtc* backed by the driver. The star denoting the number of the clock if there are many, and /dev/rtc being the default RTC clock.
As with anything hardware, there could possibly be issues with the driver of the RTC and the clocks might not be mapped properly, especially if not following the industry standard. Linux has fallback mechanisms for other systems it wants to support.

On the other side, the only time that matters is the one you see when the system is running, and that is the system time.

As we said, system time is the number of seconds since the Epoch, stored and kept track of by the kernel. Internally, however, it may be more precise than seconds; it can go up to the precision offered by the architecture. We’ll come back to this topic of high precision soon; just keep the simple concept in mind.

The system time, when displayed to us, refers to the timezone information and files we’ve previously mentioned. It’s good to know that the Linux kernel also keeps its own timezone information for specific actions such as filesystem related activities, and this kernel timezone is updated at boot or via the utility hwclock(8) by issuing hwclock --systz.

When booting, the system clock is also initialized from the RTC that keeps running when the system is off. In some cases it can be initialized from external sources and not rely on the RTC.

Thus, when the system is running the hardware clock is pretty much useless, and we could do whatever we want with it. However, we have to beware of discrepancies on reboot.

The counter that the kernel uses to increment the system clock is usually based on a timer functionality offered by the Instruction Set Architecture of the CPU (ISA not to be confused with the other ISA we spoke about, the Industry Standard Architecture). In simple terms, that means that the CPU gets interrupted at known programmable intervals periodically, and when it’s interrupted it executes a timer service/event routine. The routine is responsible for incrementing/ticking the system time and do some housekeeping. We’re going to discuss this later.
Let’s note that the frequency of the interrupt can be configured for better precision.

To set the system time and date, we can rely on the date(1) command, which takes many formats via its --set option.

For example:

date --set "5 Aug 2012 12:54 IST"

We could also initialize the system time from a remote server using rdate(1), giving it a time server to query (time.nist.gov here as an example):

rdate -s time.nist.gov

Or even, on some systems, rely on the service manager. The infamous timedatectl(1) of systemd comes to mind, which can set and give information about pretty much everything we’re mentioning in this section.

Example of output:

               Local time: Fri 2020-04-17 12:40:00 EEST
           Universal time: Fri 2020-04-17 09:40:00 UTC 
                 RTC time: Fri 2020-04-17 09:40:01     
                Time zone: Asia/Beirut (EEST, +0300)   
System clock synchronized: yes                         
              NTP service: active                      
          RTC in local TZ: no                          

What’s this line about “RTC in local TZ”? Can the time on the hardware clock be stored with time zone info, and why would we do this? What is stored on this clock, is it UTC or local time?

The answer to this, like most things, is that it depends. The time on the RTC can be configured to be whatever the system wants it to be. Yet storing it in UTC is the best choice, as UTC doesn’t change with time zones and daylight saving. Having the RTC store the local civil time means that it would need to be aware of all the complications that implies, which most RTC clocks aren’t. If it’s in local time and the system has been down for a while, the RTC might differ from the actual local time. And even with clocks that have the ability to apply daylight saving themselves, the feature is mostly unused.

So it’s preferable to store the time of the RTC in UTC, but some systems still choose not to adhere to this. For instance, when dual-booting, the other operating system may expect the RTC to contain local time and update it accordingly. That creates a discrepancy, and the RTC has no way to indicate whether it is storing local time or UTC, hence the OS has to keep track of this information itself.
This is the kind of scenario that gives rise to the rule of never letting more than one program change the time of the RTC.

On FreeBSD, this information is given via the /etc/wall_cmos_clock file: if the file exists it means that the hardware clock keeps local time, otherwise the hardware clock is in UTC.
On Linux, this information is passed to the kernel at boot time via the persistent_clock_is_local kernel parameter/stanza (see notes timekeeping.c). The RTC can also be queried and set in localtime or UTC via the hwclock(8) options --localtime or --utc which indicate which timescale the hardware clock is set to, hwclock will store this info in /etc/adjtime.

Hence, we have to keep those clocks in sync. The best way to do this is to rely on the predictable inaccuracy/systematic drift/instrument bias of the hardware clock. We can measure its drift rate, and apply a correction factor in software.

On Linux there are two tools to perform this, hwclock(8) and adjtimex(8), while on FreeBSD there is adjkerntz(8).

hwclock(8), and its predecessor clock(8), let you query, calculate drift for, and adjust the hardware clock and the kernel/system clock in both directions. While with clock(8) you had to calculate the drift manually, hwclock(8) does it automatically.
It does so by keeping track, in an ASCII file called /etc/adjtime, of the historical information: records of how the clock drifts over time, and whether the hardware clock is in UTC or local time, as we said before.

Here are some example runs:

# adjust drift of RTC
> hwclock --adjust

# set RTC to the same time given by --date
> hwclock --set --date='19:30'

# set the RTC from system clock
# and update the drift at the same time
> hwclock --systohc --update-drift

Thus, it would be a good idea to call hwclock(8) periodically in a cron job to keep the hardware time in sync and calibrate the drift.

On FreeBSD, the utility adjkerntz(8) is used similarly but only for local time RTC. It’s called at system startup and shutdown with the -i option from the init system before any other daemon is started and sets the kernel clock from the RTC, managing the DST and timezone related configuration.

Taking a look at some hwclock(8) options gives us an idea about many RTC quirks.

We can select the clock device using these:

# if it's an ISA system
> hwclock --directisa

# and maybe specify the device file explicitly
> hwclock -f /dev/rtc1

We can set the Epoch as something other than 1970:

# get it
> hwclock --getepoch

# set it
> hwclock --setepoch --epoch=1952

However, this is only available on machines that support it.

We can specify if it’s a clock that has issues with years above 1999 (which I can’t find in the man page on my machine though):

# indicate the clock can't support years
# outside the 1994-1999 range
> hwclock --badyear

As for the adjtimex(8) tool on Linux: it doesn’t actually change the hardware clock at all, but specializes in the nitty-gritty details of the system/kernel clock and its relation to hardware.

It’s especially useful for manually readjusting the system clock based on the drift of the RTC and raw access to kernel settings related to system time.

For instance, it can be used to change the speed of the system clock, telling it how much to add to the time whenever it receives an interrupt. For example, if the system clock ticks faster than it’s supposed to, it can be made to tick slower, or each tick can be made to represent a smaller value to add to the time; both options are possible.
Those are done through the --frequency and --tick options respectively.

It can also be used to change the offset/drift, apply adjustments to the system time, and control what affects the hardware clock.

Interesting options are -c and -a, which keep comparing the system time and hardware clock time every 10s and print the tick and frequency offsets; this is useful for estimating the systematic drift and then storing it in /etc/adjtime, which -a actually does.

Example of a run:

                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1587136817       0.277212
1587136827       0.278389      117.7  10000  -1701056
1587136837       0.279261       87.2  10000  -1701056    9998   5690519
1587136847       0.280304      104.3  10000  -1701056    9998   4571769

So my system considers 10000 ticks to be equal to 10s, basically having 1K ticks a second, but it’s suggested that I use 9998 per 10s instead.

Note also the error_ppm; ppm stands for parts per million, meaning I’ve got a delta error of around 103 ticks per million that I need to slew forward.
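To get a feel for what such an error means, a back-of-the-envelope calculation in plain awk arithmetic: around 104 ppm over a 86400-second day amounts to roughly 9 seconds of drift:

```shell
# seconds of drift per day = error_ppm * 86400 / 1,000,000
awk 'BEGIN { printf "%.1f seconds/day\n", 104 * 86400 / 1000000 }'
# → 9.0 seconds/day
```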

The -p option prints the internal kernel parameters related to time ticking.

         mode: 0
       offset: -852985
    frequency: -1701056
     maxerror: 483000
     esterror: 0
       status: 8193
time_constant: 7
    precision: 1
    tolerance: 32768000
         tick: 10000
     raw time:  1587136914s 254051439us = 1587136914.254051439

The status is a bit mask that represents the following:

    1   PLL updates enabled
    2   PPS freq discipline enabled
    4   PPS time discipline enabled
    8   frequency-lock mode enabled
   16   inserting leap second
   32   deleting leap second
   64   clock unsynchronized
  128   holding frequency
  256   PPS signal present
  512   PPS signal jitter exceeded
 1024   PPS signal wander exceeded
 2048   PPS signal calibration error
 4096   clock hardware fault
 8192   Nanosecond resolution (0 is microsecond)
16384   Mode is FLL instead of PLL
32768   Clock source is B instead of A

PPS stands for Pulse Per Second, PLL for Phase-Locked Loop, and FLL for Frequency-Locked Loop. These are different clock circuitries/feedback loops, disciplines, and slewing techniques, basically different methods of adjusting frequency and ticks to match real time, each affected differently by the environment and with their own ups and downs.

What we can deduce is that my clock, with status 8193 (8192 + 1), has PLL updates enabled and nanosecond resolution.

That means that if 1000 ticks make up a second, a tick happens every millisecond. Can’t we have more precise ticks? Aren’t I supposed to get nanosecond precision? But wouldn’t that clog the CPU altogether, and can we get multiple timers too? We’ll see that with high resolution clocks later on; for now, again, let’s keep those questions and concepts in mind.

An interesting line in the output of adjtimex catches our attention: bit 6, or 64 in decimal, “clock unsynchronized”. What does it mean? It’s related to the “System clock synchronized: yes” line of systemd’s timedatectl output.

An inspection of the Linux kernel source code lets us know that there’s a mechanism in the kernel to automatically synchronize the hardware clock with the system clock. It goes under the name of the NTP “11 minute mode” because it adjusts the RTC every 11 minutes.

Many other Unix-like operating systems choose to do this: have the kernel be the only program that syncs hardware time to system time, so that other programs don’t have to worry about all the drifting and calculations. In this case we don’t need a cron job that adjusts the time, the kernel already does it for us. Sometimes, however, the kernel won’t record the drift anywhere while in this mode.

So it is synchronized by default on my system.

On Linux, the ways to turn it off are to stop the NTP daemon (network time protocol daemon, which we’ll see in the next section), to call any program that sets the system clock from the hardware clock, such as hwclock --hctosys, or to actually recompile the kernel without the related option, RTC_SYSTOHC.

The ntptime(1) command also shows the Linux kernel time_status value:

  status 0x2001 (PLL,NANO),

Let’s move to this high precision topic we’ve kept in mind.

So my system clock has nanosecond precision. Are there ways to get higher precision from it, and what does that mean from the timer interrupt perspective? How much time should we spend handling timer events instead of executing programs? How are the timers implemented in the instruction set, do I have a choice of clocks, and are there other instructions to call? How do I check all that?

We said briefly that system time is kept track of using interrupts generated at predefined times or at specific periodic intervals. Whenever they happen, the kernel needs to handle time-based events such as the scheduling of processes, statistics calculation, timekeeping (time of day), profiling, measurements, and more.
Different machines have different kinds of timer devices providing this functionality. The job of the OS is to provide a system that unifies them abstractly to handle timer events for specific usages, using the best type of timer for the type of event it’s handling. It does so by programming them to fire periodically or one-shot, and by keeping track of the event subscriptions it needs to handle.
Which hardware is available depends on many factors, but the most important one is the CPU and thus the architecture of the platform and its instruction set. Let’s see what sort of timers we can find in our systems today that we could choose from as clock event devices.

  • RTC - Real Time Clock

We could choose to actually rely on the RTC directly and nothing else. However, that comes at a cost: it’s quite slow, ticking somewhere between every 0.5s and every 0.1ms, to stay energy efficient. So let’s leave the RTC for boot time only, and not use it for timers.

  • TSC - Time Stamp Counter

The Time Stamp Counter is a 64-bit register called TSC, present on all x86 processors since the Pentium. It’s driven by the CLK input pin that also drives the CPU clock and thus ticks at the same frequency. For example, a 2GHz CPU makes this register tick every 0.5 nanosecond. The TSC register can be queried with the rdtsc (read TSC) instruction. It’s very useful: as it ticks along with the CPU, it can help us calculate time precisely, provided we know the frequency of the CPU. However, it’s not so reliable if the frequency can change over time.

  • PIT - Programmable Interrupt Timer

The Programmable Interrupt Timer (more properly, the Programmable Interval Timer, historically the Intel 8253/8254 chip) can be programmed to send global interrupts after a certain time has elapsed, one-shot or periodically. It has 16-bit counters and variable frequency rates that can be configured.

  • APIC - Advanced Programmable Interrupt Controller

Similar to the PIT in that it can issue one-shot or periodic interrupts. Its counter is 32 bits, and the interrupt is sent to the specific processor that requested it instead of globally (which the PIT would do). Its frequency is based on the bus clock signal and can also be controlled, though less flexibly than the PIT’s.

  • ACPI_PM - ACPI Power Management Timer

The ACPI Power Management Timer is part of ACPI-based motherboards and has quite a low frequency of 3.58MHz, ticking every 279ns. It isn’t nearly as accurate as the other timers, but it has the advantage of not being affected by power-management changes. It should be used as a last-resort clock source.

  • HPET - High Precision Event Timer

The High Precision Event Timer is a chip integrated into the southbridge. It provides multiple hardware timers, up to eight 32- or 64-bit independent counters, each having its own clock signal and a frequency of at least 10MHz (100ns). It is less precise than the TSC, but it has the advantage of being separate from the CPU and offering multiple clocks.

It’s always good to keep in mind that all those numbers about precision are best-case scenarios and that we may have overheads. We still have to remember that, for example, querying the TSC means first issuing the rdtsc (or rdtscp) instruction, which itself takes time to execute. Having a machine ticking at 0.5ns doesn’t mean we’ll be able to measure such intervals precisely.
Regarding the TSC, we can only use it as a real time counter when it is stable. If it changes with CPU frequency we can’t rely on it to calculate time properly, as the distance between ticks will vary. TSCs are categorized as “constant”, “invariant”, “non-stop”, or none. “Constant” means the rate doesn’t change with CPU frequency scaling, though the counter may stop in deep C-states, C-states referring to the low power modes of the CPU. “Non-stop” means it keeps counting in those C-states, and “invariant” means it’s both “constant” and “non-stop”.

On Linux you can check the features your CPU supports by consulting the flags in /proc/cpuinfo.


flags : tsc constant_tsc nonstop_tsc tsc_scale

NB: tsc_scale is used for virtualisation.

Before checking for the availability of the hardware timers and what is currently set for what on your system, let’s take a moment to understand where we can use these timers.

There are in general 3 uses for the timers we’ve seen: clock source, clock event, and clock scheduling.

The clock source is the one that provides the basic timeline; it should be a continuous, (ideally) non-stop, monotonic, uniform timer that tells you where you are in time. It’s used to provide the system time, the POSIX time counter; when issuing date, this is what is consulted.
So the clock source should have a high resolution, and its frequency should be as stable and correct as possible, otherwise it may require an external source to sync it properly.

Clock events are the reverse of the clock source: they pick points on the timeline and interrupt at those points, providing a higher resolution. They could in theory use the same hardware as the clock source, but they are not limited to it; they can use all the other hardware specialized in sending interrupts after a programmed time, to trigger the events on the system timeline. It’s also interesting to have events triggered per CPU so that they are handled independently, which makes the APIC especially useful here.

Clock scheduling is about how time affects the scheduling of processes on the system: what timeslice is used to run a process before switching to another. This could possibly be the same counter as the clock source; however it usually needs smaller intervals, as it has to be very fast but doesn’t have to be accurate.

The clock source keeps time as a counter we refer to as jiffies. Jiffies hold the number of ticks that have happened since the system booted; the counter is incremented by 1 at each timer interrupt. The number of ticks/interrupts in a second is denoted by a constant defined at compile time or as a kernel parameter called HZ, for Hertz; it’s named this way in most Unix-like OS.
That means there are HZ ticks in a second, thus HZ jiffies per second. So HZ represents the precision of our clock source, and thus of the system time. For example, if HZ=1000, the system time has a resolution of 1ms (1/HZ seconds).
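The relation between HZ and resolution is simple arithmetic; for a few common CONFIG_HZ values:

```shell
# tick interval in microseconds for common HZ values
for hz in 100 250 300 1000; do
    echo "HZ=$hz -> tick every $(( 1000000 / hz )) us"
done
```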

On Linux you can check that value using:

getconf CLK_TCK

However, it is deprecated and will always return 100 (10ms), regardless of the actual precision. The real value has to be set as a kernel configuration option, CONFIG_HZ.

Nonetheless, it isn’t such a good idea to go to a higher-precision HZ, because if scheduling relies on jiffies it can affect performance.

Now let’s check how we can see which devices we support and change the clocks.

On Linux, there aren’t many options regarding anything other than the clock source (system time). To check the ones available and the one currently in use, you can rely on the /sys filesystem.

> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

> cat /sys/devices/system/clocksource/clocksource0/current_clocksource

The clock source can be changed while the system is running by echoing the new clock to the same location:

> echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

For permanent changes you can recompile the kernel with different options or set the clock at boot by passing it as the clocksource option to the Linux kernel (kernel stanza) in grub or any other boot-manager.

linux    /boot/vmlinuz-linux root=UUID=12345678-1234-1234-1234-12345678 rw quiet clocksource=acpi_pm hpet=enable

Additionally, you can enable or disable hpet as the base timer for clock events.

As of today here are the relevant configurations and different clock sources for multiple CPU architectures:

clocksource=	Override the default clocksource
		Format: <string>
		Override the default clocksource and use the clocksource
		with the name specified.
		Some clocksource names to choose from, depending on
		the platform:
		[all] jiffies (this is the base, fallback clocksource)
		[ACPI] acpi_pm
		[ARM] imx_timer1,OSTS,netx_timer,mpu_timer2,
		[X86-32] pit,hpet,tsc;
			scx200_hrt on Geode; cyclone on IBM x440
		[PARISC] cr16
		[S390] tod
		[SH] SuperH
		[SPARC64] tick
		[X86-64] hpet,tsc

hpet=		[X86-32,HPET] option to control HPET usage
		Format: { enable (default) | disable | force |
			verbose }
		disable: disable HPET and use PIT instead
		force: allow force enabled of undocumented chips (ICH4,
			VIA, nVidia)
		verbose: show contents of HPET registers during setup

The process is quite similar on FreeBSD. By default, it is aware of the timers available on the system and automatically ranks and chooses the best possible ones.

It keeps three timekeeping clocks: one it calls hardclock, running at 1000HZ (1ms), which is the same as the clock source; one it calls statclock, used for statistics and scheduler events, with a frequency of 128HZ; and a last one called profclock, which is a bit higher in precision, 0.125ms. All of these can be tuned to preference.

To list them you can use sysctl:

> sysctl kern.eventtimer
# or
> sysctl -a | grep kern.eventtimer

This should return the list of possible timers in the kern.eventtimer.choice entry.

Example output:

kern.eventtimer.choice: HPET(550) LAPIC(400) i8254(100) RTC(0)

The currently selected timer is stored in the kern.eventtimer.timer entry.

The documentation about what the flags mean can be found in the eventtimers(4) manpage; they relate to what the clock supports (periodic or not, per-CPU or not). Those values can be changed in the /etc/sysctl.conf file or tuned via sysctl on the command line.

As with Linux, on FreeBSD hpet can be used for events if the driver is present and enabled; it's part of the ACPI subsystem. FreeBSD offers some beautiful documentation about it in the hpet(4) manpage, discussing the configuration too: for instance, whether it can be used to support event timer functionality, and how many timers on the HPET can be used per CPU.

So now we should be all set: if we call POSIX functions from <time.h> such as gettimeofday, we get the result in a structure that contains microseconds (0.001ms) if the precision allows it. POSIX 1003.1b actually goes further and defines interfaces with nanosecond precision.
There is also the POSIX clock_gettime() family of functions, which lets you specify which clock to get the time from, and clock_getres(), which lets you query the precision of the available clocks. The clocks you can pass to those functions are listed in the manpage and are useful for profiling, CLOCK_MONOTONIC being the best one to measure the time between two events.
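As a quick illustration, Python (3.3+) wraps these same POSIX calls in its time module; a small sketch querying clock resolutions and timing an interval with CLOCK_MONOTONIC:

```python
import time

# Query the resolution (precision) of two POSIX clocks,
# the Python equivalent of clock_getres().
print("realtime res:", time.clock_getres(time.CLOCK_REALTIME))
print("monotonic res:", time.clock_getres(time.CLOCK_MONOTONIC))

# CLOCK_MONOTONIC is the right choice to measure the time
# between two events: it never jumps backward.
start = time.clock_gettime(time.CLOCK_MONOTONIC)
time.sleep(0.01)
elapsed = time.clock_gettime(time.CLOCK_MONOTONIC) - start
print(f"elapsed: {elapsed:.6f}s")
```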

There used to be a time on Linux when all the timers on the system were coupled to jiffies; that isn't the case today. We now have a decoupled clock event subsystem that manages interrupt delegation, so the source device can be swapped without breaking everything. Linux also added a kernel configuration called CONFIG_HIGH_RES_TIMERS to allow high-resolution timers, which is now enabled almost everywhere.
This led to the concept of dynamic ticks: having the scheduling clock tick at different speeds without affecting the clock source timeline, which can be used to save energy/power.
This furthermore led to the idea of tickless systems, where the timeslice for scheduling is actually controlled by the scheduler instead of following HZ. The CONFIG_NO_HZ kernel option enables this, and it is on in most desktops today.

# CONFIG_NO_HZ_FULL is not set

On Linux, all the information about timers and their statistics is propagated to user space in /proc for advanced debugging.

For instance, /proc/timer_list gives us a list of the currently configured clocks and running timers. We can use it to check their precision:

Example output:

now at 294115539550 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     0 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
 event_handler:  hrtimer_interrupt

We can see that the .resolution is 1 nsecs and that the event_handler is hrtimer_interrupt instead of tick_handle_periodic, which would be for lower resolution timers.

/proc/timer_stats is an advanced debugging feature that can be enabled via the CONFIG_TIMER_STATS kernel option and that lets us gather statistics about all the timers on the system; you can turn it on and off whenever you want. It can tell us which routines in the kernel are using timers and how frequently they request them. The format is as follows:

<count>,  <pid> <command>   <start_func> (<expire_func>)
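A line in that format can be pulled apart with a small regular expression; the sample line below is made up to illustrate the format, not real timer_stats output:

```python
import re

# Parse a /proc/timer_stats line of the form:
#   <count>,  <pid> <command>   <start_func> (<expire_func>)
LINE = re.compile(
    r"\s*(?P<count>\d+),\s+(?P<pid>\d+)\s+(?P<command>\S+)"
    r"\s+(?P<start_func>\S+)\s+\((?P<expire_func>[^)]+)\)"
)

# Hypothetical sample line for illustration.
sample = "  150,     1 systemd          hrtimer_start_range_ns (tick_sched_timer)"
m = LINE.match(sample)
print(m.groupdict())
```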

Now let’s move to syncing the system time using an external source.

Syncing time with external sources

  • Ways to update time
  • Why use external sources
  • A primer on precision calculation and terms
  • Discipline/slewing and smearing
  • Leap second propagation
  • List of external sources
  • NTP
    • Protocol
    • Tools/clients
    • Pool and organization
    • List of public servers
  • Other implementations
  • Security issues
  • PTP

Before starting this section I’d like to point out three ways that time can be updated.
One is called stepping, and it consists of making one discontinuous change to time: a sudden instant jump from one value to another, backward or forward in time. This happens when something triggers the system time to be set to a specific value; external time sources can do this.
Another is called slewing, or sometimes disciplining, and it consists of making the clock frequency tick faster or slower, or changing the value that each tick represents; that is, adjusting the clock gradually over time. This is what we've seen in the previous section with tools such as adjtimex(8) for system time.
The last way to change time is actually a category of slewing called smearing, or fudging, and it consists of gradually applying parts of a larger chunk of time over a period. This leads to fewer disruptive changes: if, for example, we have to add 10s to our system time, we can split it across a whole day, about 0.12ms every second. Strictly speaking it fits in the slewing category, but we usually talk about smearing when we are forced to apply an expected change, such as a leap second.
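The arithmetic of that example is worth making concrete: the per-second slice and the equivalent rate change when smearing 10 seconds over a day:

```python
# Smearing a 10-second correction linearly over one day:
# each second of wall time absorbs a tiny slice of the offset.
offset = 10.0          # seconds to apply
window = 24 * 3600     # one day, in seconds

per_second = offset / window
print(f"{per_second * 1000:.4f} ms per second")   # ~0.1157 ms
print(f"{per_second * 1e6:.1f} ppm rate change")  # ~115.7 ppm
```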

We also have to keep in mind the time it takes to fetch the value of time from a source: the time for it to be transmitted and interpreted by our system. It won't be an instant transfer, and it can take long enough that we are forced to take it into consideration when calculating adjustments. We call this the delay, the time it takes to do the round trip. As we'll see, there are many other factors to take into consideration too.

So why should we rely on an external source of time, and why should we care about having precise time?

At human scale, I can rotate the hands of my watch a little and adjust it to whatever someone else has. Nothing horrible is going to happen because of it, right? The majority of people don't need precision better than a couple of minutes.
However, computers are different: you're not the one monitoring time, and their errors accumulate pretty fast if you forget about them; clocks drift, as we've repeatedly said.

So who needs millisecond accuracy or more? Who has a need for precise time?
And precise time according to what? Haven't we said in the previous section that our system clocks already have a pretty good calibration mechanism in place?
However, even with all the accuracy we can get, we’re still going to drift, no matter what, and we still need to be aware of UTC changes such as leap seconds.

So what does accurate mean? Do you need accuracy across your internal network of machines, each not drifting too far away from the others even though not in sync with UTC? Or do you need them all to be in sync with UTC or your time zone?
How much change in time does the software you are running tolerate? Does it expect monotonic time, can it handle jumps, is it an issue if it differs from real UTC time? Do you have to keep in sync with UTC because of the software itself, because of compliance with standards, or because otherwise the meaning of the timestamps is lost?

Here’s a list of systems that actually require accurate synchronization with UTC:

  • Security related software that verify certificates
  • Security related software to match timestamp with real life events such as CCTV
  • Similarly, intruder detection software
  • Similarly, any type of audit logs or timestamping based on real world events
  • Similarly, any network monitoring, measurement and control tool
  • Radio, Telecommunication, and TV programs
  • Any type of real time multimedia synchronization
  • Many types of distributed systems
  • Money related events, such as stock market
  • Aviation traffic control software

We have to add to this list all the machines with nasty system clocks that drift in unexpected ways. Such machines are better off syncing with an external source.

So, in which cases should we not synchronize time with an external source, then?

  • If the accuracy and adjustment that the system clock provides is enough
  • If we don’t want to worry about managing another daemon on our system
  • If the system has access neither to the internet nor to any physical external source of time.

Now that we know if we need an external source of time or not, let’s see how to calculate how precise and stable these sources are.

When using a clock as reference we want to make sure it’s stable and precise.

A clock's frequency stability is rated and measured by its ppm or ppb error: parts per million or parts per billion. The "part" can be either a number of ticks or a number of seconds drifting from the actual value (both work out the same). The smaller this ppm value, the more stable the clock.
The reasons why a clock drifts are environmental variations such as temperature, aging of the material, G-force, change in voltage, etc.

What does this mean?

Let's take as an example an HPET clock with a frequency of 10MHz (10,000,000Hz) in an environment between -40 and 80 degrees C, and say the clock manufacturer specifies a stability that varies between -7.5 and +7.5 ppm.

That means for every 1 million ticks, or 1 million seconds, there is a variation of plus or minus 7.5. Over a whole day the clock could drift by up to 0.648 seconds:

(7.5/1M) * 86400 = 0.648s
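The same calculation as a small helper, usable for any ppm figure and interval:

```python
def drift_seconds(ppm: float, interval_s: float) -> float:
    """Worst-case drift of a clock with the given ppm error
    over an interval, in seconds."""
    return (ppm / 1_000_000) * interval_s

# The HPET example from the text: +/-7.5 ppm over one day.
print(round(drift_seconds(7.5, 86_400), 3))  # 0.648
```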

Atomic clocks have tremendously tiny error variations, those can be between 0.0001 and 0.000001 ppb (part per billion). They are drifting by a second every ~300k to 30M years, which confirms what we’ve explored in the first section of this article.

Temperature plays such a big role in the stability of our clocks that to make them more stable we could either lock our machines in a temperature-controlled environment or come up with a way to automatically compensate for how much the temperature affects the clock.
While the first option is possible in data centers, it's not something most of us can do.

However, we could devise an experiment in which we find a formula that calculates how much the temperature affects the clock frequency and slew our clock appropriately.
We could monitor the temperature and feed it to the correction mechanism in whatever software or means we use to handle setting time. It all comes down to gathering data points pairing temperature with how much the clock drifts, plotting them on a graph, and finding an equation that passes through those points: simple math using polynomial interpolation.

Unfortunately, no solution is perfect and this could be overly optimistic; correlation doesn't equal causation. Still, such mechanisms are great for keeping the clock stable within a certain temperature range. Some experiments have found that temperature compensation reduces deviation by a factor of 3.5. Our earlier drift of 0.648s would be reduced to 0.185s, or 2.14 ppm instead of 7.5 ppm.
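A toy sketch of the idea, assuming made-up (temperature, drift) samples and a simple linear model fitted with ordinary least squares; real compensation would use more data and likely a higher-degree polynomial:

```python
# Fit drift_ppm = a * temp + b from (temperature, drift) samples
# with ordinary least squares, then predict the correction.
def fit_line(points):
    n = len(points)
    sx = sum(t for t, _ in points)
    sy = sum(d for _, d in points)
    sxx = sum(t * t for t, _ in points)
    sxy = sum(t * d for t, d in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# (temperature in C, measured drift in ppm), hypothetical values.
samples = [(20, 1.0), (30, 2.1), (40, 2.9), (50, 4.2)]
a, b = fit_line(samples)

def compensation_ppm(temp):
    # Slew the clock by the negated predicted drift.
    return -(a * temp + b)

print(round(compensation_ppm(35), 2))
```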

Let’s now define some important terms we need in our inventory to understand everything that is coming next about external clocks.

  • “Reference clock” or “ordinary clock”, meaning any machine that can be used to retrieve accurate time, usually in UTC so that it can be used by anyone. Those could range from cesium clocks, to GPS, to terrestrial broadcasts like radio clocks.

The time from the reference clock will be forwarded from one server to another until it reaches you. Thus, the reliability of the network, and how far you are from it, play a big role in how accurate the value from the reference clock will be.

We're using "servers connected to an external time source" and "external time sources" themselves interchangeably here unless explicitly mentioned.

  • “Delay”, a word we’ve seen before that means the time it takes to do a round trip. It’s normally calculated by timestamping on both ends and doing an estimate of the difference in transport and processing.

  • “Offset” or “Phase”, is the time difference/deviation between the clock on one end and the clock on another end, usually your clock and a reference clock. Phase referring to the oscillation rhythm difference, as in “out of phase”.

  • “Jitter” or “Dispersion”, the difference between successive time values across subsequent requests to a remote server. It's a great criterion to measure the stability of the network, namely how much the delay changes; if it varies a lot, the network isn't reliable. The term can be used as a measure of stability of any other repeatable action too.

  • “Clockhopping”, jumping from one server to another as the synchronization source, which results in less and less accuracy.

  • “Frequency error”, this is how much the reference clock or our local clock drifts over time, measured in ppm and ppb, as we’ve seen before.

  • “Stability”, the generic term to refer to how much we can trust a clock. It's also a term used in control theory to refer to how far we are from reaching a stable point (0).

  • “Accuracy”, also a generic term that means how far apart a machine’s time is away from UTC. The typical accuracy on the internet ranges from about 5ms to 100ms, varying with network delays.

  • “PPS”, or “Pulse Per Second”, a method of synchronizing two clocks based on a tick that happens every second.

  • “Watchdog timer”, is a timer that keeps the time since the last poll or update of time from the external source of time.

  • “Fudge”, a term I couldn't pin down a precise definition for, other than that it refers to any special way in which you can configure an external clock.

  • “Max Poll” and “Min Poll”, throttling parameters: the maximum and minimum amount of time that should pass before the remote server allows you to query it again. They are usually expressed as powers of 2; for example, 6 means 2^6, or 64 seconds.

  • “Stiffness” or “Update Interval” or “Time Constant” (τ tau), how much the clock is allowed to change in a specified amount of time, and the time between the updates. A small time constant (update interval) means a clock that is less stiff and slews quickly. It’s usually expressed like the max poll in powers of 2.
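The poll exponents convert to seconds with a simple power of two; the values in the comments below reflect ntpd's default minpoll and maxpoll:

```python
# NTP poll intervals are expressed as exponents of 2.
def poll_seconds(exponent: int) -> int:
    return 2 ** exponent

print(poll_seconds(6))   # 64 seconds, ntpd's default minpoll
print(poll_seconds(10))  # 1024 seconds, ntpd's default maxpoll
```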

When all those values differ a lot, we can't allow an abrupt jump of time on our end; that would disrupt local processes. So what we do is slew time, but that also means a slow calibration.
In an ideal world, all the reference clocks would agree; however, they don't. So what should we do if there's a big offset?

First off, if the offset is too big we don’t trust it until we have the same offset from multiple time sources.
If it’s small enough we go on with our slewing.
If the offset is still big, we have to set the clock anew, step it.

However, on boot we have to sync from the hardware clock, as we've seen before, which might be off; so we have to either slew the system time, which can take several hours, or make the updates less stiff to quickly reduce the offset (usually within 5 minutes with a less stiff PLL).

Moreover, we can't just trust any remote server or machine as a time source, so we ought to devise a mechanism, a sanity check, to filter which machines we trust and which we don't, maybe even combining multiple time sources in a quorum-like fashion.
We can evaluate remote machines for how stable they are by making them pass a statistical filter for their quality.

That also creates a trust issue at boot; what can be done is to send multiple quick requests to multiple external time servers to assess their reliability and get an estimate within about 10 seconds of booting.

As time goes on, our system clock should become more stable, and we should be requesting the remote servers less frequently.

This is possible through different feedback mechanisms that learn to adjust the system time appropriately. In a way, this is similar to the mechanism that fixes the hardware clock drifting but for the system clock which we haven’t tackled before.

Different Unix-like OS and software provide different means of adjusting the system clock according to external time. There are 4 mechanisms or system calls that can be used to implement the adjustment of the system clock.

The first method is through settimeofday(2), which is used to jump to a fixed place in time, to step it. This could be used for very big offsets.

The second method is through adjtime(2) which is used to slew the time by changing the frequency of the clock. You pass a delta to the function and if this is positive the clock will be sped up until it gains that delta and if negative the clock will slow down until it has lost this delta. This is intended to be used to make small adjustments to the system time and thus there’s a limit to how big the delta can be (plus or minus 2145 seconds).

The third method is through the hardpps() function that is internal to the kernel and handles an interrupt service that listens to a constant pulse that happens every second. The RFC 2783 defines how this API should behave, basically syncing the transition between pulses with the system clock.

The fourth and last method is the ntp_adjtime(2) function, an advanced mechanism to discipline the system clock. It is defined in RFC 1589, "A Kernel Model for Precision Timekeeping", and also goes under the name "kernel clock discipline". It was initially created as a better version of adjtime(2) that can be called by the software handling an external precision time source, as it accumulates successive precise corrections (possibly in the microsecond range).
This method of adjusting time is based on an algorithm that depends on multiple environmental factors and that can be tweaked as needed. From correcting frequency and offset, to enabling or disabling PPS events processing, to synchronization status, handling leap second, estimating error and frequency change tolerance, and more.
At the core of this kernel clock discipline algorithm lies a concept from the domain of control theory, a closed loop that accumulates successive corrections, an adaptive feedback loop mechanism that tries to minimize network overhead. Today, the algorithm uses two kinds of loops, one is a phase/offset locked loop (PLL), and the other is a frequency locked loop (FLL). We’ve hinted at those previously when checking the status bit of the adjtimex -p and ntptime commands.

> adjtimex -p

         mode: 0
       offset: -7431812
    frequency: -1677305
     maxerror: 2000
     esterror: 0
       status: 8193
time_constant: 7
    precision: 1
    tolerance: 32768000
         tick: 10000
     raw time:  1588007066s 608698606us = 1588007066.608698606
> ntptime

ntp_gettime() returns code 0 (OK)
  time e2518f79.db34a44c  Mon, Apr 27 2020 20:06:01.856, (.856272195),
  maximum error 49500 us, estimated error 0 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset -6172.114 us, frequency -25.594 ppm, interval 1 s,
  maximum error 49500 us, estimated error 0 us,
  status 0x2001 (PLL,NANO),
  time constant 7, precision 0.001 us, tolerance 500 ppm,

The main difference between phase-locked loops and frequency-locked loops is their predictor part, which outputs the value of the feedback loop. Both take the timestamp as input and compare it with local time, but what happens afterwards, whether they change the phase/offset or the frequency, depends on which one is chosen.

PLL is an offset discipline mode: its predictor is an integral of the offset over past updates, and it outputs the offset amortized over time in order to avoid setting the clock backward. It adjusts the clock gradually, by small increments or decrements, until the offset is gone. The time constant, aka update interval, is the rate at which it applies these updates: the smaller the time constant, the less stiff it is, and the faster it converges to an offset of 0 (stability in control theory).

FLL is a frequency based discipline mode, its predictor takes the offset and divides it by the time since the last update and adjusts the clock frequency such that at the next update the offset will be as small as possible.

In the most recent software, the two modes are used together and mixed. They are weighted according to the polling interval: when it is below the Allan intercept, which is 2048s by default (this can be changed), the phase-locked loop gets more weight; when the polling interval is higher, the frequency-locked loop weighs more.
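To make the two predictor styles concrete, here is a deliberately simplified toy, not the real kernel discipline algorithm: the PLL corrects a fraction of the offset per update (stiffer with a larger time constant), while the FLL turns the offset into a frequency correction:

```python
# Toy illustration of the two predictor styles.

def pll_step(offset, time_constant):
    # Apply only a fraction of the offset per update; a larger
    # time constant means a stiffer, slower-converging clock.
    return offset / (2 ** time_constant)

def fll_step(offset, interval):
    # Fractional frequency change that would cancel the offset
    # by the time of the next update.
    return offset / interval

offset = 0.010  # we are 10 ms ahead of the reference
for _ in range(10):
    offset -= pll_step(offset, time_constant=2)
print(f"residual offset after 10 PLL updates: {offset * 1000:.3f} ms")

print(f"FLL correction for 10 ms over a 64 s poll: {fll_step(0.010, 64):.2e}")
```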

When we're not fetching time from another machine connected to a reference clock, but have the reference clock connected directly to us, we'll require a hardware driver to interface with it.
These physical sources of time can implement their own clock discipline algorithm and synchronization protocol, so we have to adapt accordingly. If they do provide such a mechanism via their drivers, we let the external clock be in control and determine which discipline should be used; normally they will themselves call ntp_adjtime(2) with the parameters they know about. If that fails, we can fall back to the previous ways of adjusting time. Keep in mind that when an external clock takes care of system time adjustment, no other software can be aware of the error statistics and parameters it maintains.

Before moving on to what those devices could possibly be, let’s have a small note on leap second smearing.

There are two main ways to handle a leap second: we can step, that is stop the clock completely for a second (repeating a second) or skip an entire second, or we can slew the clock using the smooth kernel discipline we've seen.
This is what leap smearing is about. It's a standard pushed by Google to "smear" the leap second, that is slew it by running the clock slightly slower over 24h, from noon before the leap second to noon after it, so that the slewing happens linearly around the leap second.
The rate change for such a smear is about 11.6 ppm (roughly 0.0012%).

However, keep in mind that such a standard only has weight if everyone adheres to it; otherwise, during the leap second event, servers will disagree and dispersion will grow. That is why we shouldn't mix smearing and non-smearing time servers.

We've said we could connect to a physical precise time source to become a reference clock, so what can those be? We'll give as examples the two most popular categories.

In the first category we have terrestrial broadcasts, radio stations that broadcast time.
The most well known are CHU in Canada, WWV in Colorado, USA, and WWVH in Hawaii, USA.

CHU broadcasts at 3.33MHz, 7.85MHz, 14.67MHz since 1923, and WWV broadcasts on 2.5MHz, 5MHz, 10MHz, 15MHz, and 20MHz since 1945. They both get their time from other reliable sources such as atomic clocks and GPS.

As they are radio broadcasts, you need a radio receiver and a way to analyze the audio to be able to synchronize with them.

What they broadcast is a repetitive beep to sync on, pulses per second and per minute, some binary-coded decimal time code, and literally someone talking from time to time to say, in English (or French for the Canadian version), what the current UTC hour and minute is. So they alternate beeps, ticks, and voice announcements.
You can give those a listen by searching their names on Wikipedia or YouTube, or by actually tuning your radio to the right frequency.

Furthermore, there are also telephone numbers that you can call to get the time, similarly to the radio. One of them is provided by the same organization as the CHU, the NRC, National Research Council of Canada.

In the second category we have GPS, the Global Positioning System.
And let's be more explicit here: we're talking about the American NAVSTAR GPS, composed of multiple satellites in roughly 20,000 km orbits, with at least 4 satellites always visible from any point on Earth.

To sync time with a GPS you need a GPS receiver, some of those also come with a pulse per second feature for accuracy’s sake. The receiver catches the civilian band frequency that the GPS continuously broadcasts and decodes the signal to get the messages in it. This message contains a multitude of information, but we’re only interested in what is time related.

The GPS satellites include atomic clocks that are accurate to the nanosecond. However, we lose a bit of accuracy because of the delay between us and the satellite. You'd think that since they have atomic clocks they would follow TAI (International Atomic Time); instead they follow their own special time scale called GPST, the Global Positioning System Time.
GPST is similar to TAI in that it is constant and unaffected by the rotation of the Earth, but its epoch, its 0h, was set on 6 January 1980. Consequently, it includes all the leap seconds before that date but none of the ones after, so it currently differs from TAI by 19 seconds, and from UTC by even more. That is why newer GPS units include in their messages an 8-bit field containing the time difference between GPST and UTC, the number of leap seconds it has missed, so that you can easily recover UTC.

The time format that GPS stores and broadcasts doesn't use the year/month/day Gregorian calendar but expresses it as a week number and a seconds-into-week number. The week is a 10-bit field and thus rolls over every 1024 weeks, approximately 19.6 years. The first rollover happened on August 21, 1999 and the second one on April 6, 2019.
So to determine the Gregorian date you need to know which GPS epoch you are in. Future GPS versions update the standard to use a 13-bit field instead, rolling over every 157 years.
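A small sketch of the week arithmetic using the GPS epoch; the rollovers parameter is exactly the piece of context receivers got wrong, and leap seconds are ignored for simplicity:

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)

def gps_week_to_date(week: int, seconds_into_week: float, rollovers: int = 2):
    """Convert a 10-bit GPS week number plus seconds-into-week to a
    calendar date. Since the week field wraps every 1024 weeks, the
    caller must supply how many rollovers have occurred (2 as of 2019).
    Leap seconds are ignored for simplicity."""
    total_weeks = rollovers * 1024 + week
    return GPS_EPOCH + timedelta(weeks=total_weeks, seconds=seconds_into_week)

# Week 0 of the third GPS era began on April 7, 2019,
# right after the second rollover.
print(gps_week_to_date(0, 0))  # 2019-04-07 00:00:00
```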

This week-rollover phenomenon has been deemed the "Y2K" of GPS, because many device drivers didn't anticipate it and had hard-coded the GPS epoch.
A solution would be to derive the GPS epoch from the leap second data broadcast by the GPS together with a leap second table. Weirdly, a GPS vendor has a patent on a similar technique, so you can't use it exactly the same way. Sometimes shipped software is shipped software and nobody is going to touch or update it; beware.

Apart from NAVSTAR, plenty of other space agencies have launched satellite navigation systems:

  • Beidou, People's Republic of China
  • Galileo, European Union and other partner countries
  • GLONASS, Russia; peculiarly, its time scale is tied to UTC(SU), the Russian realization of UTC
  • NavIC, Indian Space Research Organisation
  • Michibiki, a regional navigation system receivable in the Asia-Oceania regions

A last nota bene: your position is derived from how far you are (the delay) from 4 satellites, by calculating the intersection.

To link this to the previous ideas: if you have a driver that supports one of these external clock hardware receivers, it should implement ntp_adjtime(2) or a custom discipline to take care of adjusting time itself. Be sure to check the list of drivers available for your solution.

Let's proceed from the abstract to the concrete: which protocols and standards can be used to implement time synchronization with external sources of time?

The most trivial protocol is the Time Protocol, defined in RFC 868. It's a simple client-server protocol where the server, upon receiving a request, directly replies with the time in seconds since midnight, January 1st 1900 GMT, as a 32-bit binary number.
The protocol runs on UDP and TCP on port 37, as /etc/services shows:

time               37/tcp
time               37/udp

Because it's based on a 32-bit value, it will roll over at some point in 2036, which will deprecate it unless the value is upgraded to 64 bits.

While it's simple, it doesn't take leap seconds or delays into consideration, is only precise to the second, and disregards all the considerations about time we've previously mentioned.
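Since the RFC 868 value counts seconds from the 1900 epoch while Unix time counts from 1970, converting is a single subtraction of a fixed constant. A minimal sketch, with the payload constructed locally rather than fetched from a real port 37 server:

```python
import struct

# RFC 868 replies with seconds since 1900-01-01 00:00 GMT as an
# unsigned 32-bit big-endian integer; Unix time starts in 1970.
SECONDS_1900_TO_1970 = 2_208_988_800

def rfc868_to_unix(payload: bytes) -> int:
    (since_1900,) = struct.unpack("!I", payload)
    return since_1900 - SECONDS_1900_TO_1970

# A server answering exactly at the Unix epoch would send:
payload = struct.pack("!I", SECONDS_1900_TO_1970)
print(rfc868_to_unix(payload))  # 0
```

A real client would open a TCP connection to port 37, recv(4) bytes, and feed them to this function; public servers still speaking this protocol are rare today.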

You can give the time protocol a try by testing it with the rdate utility.

rdate - get the time via the network

rdate connects to an RFC 868 time server over a TCP/IP network, printing
the returned time and/or setting the system clock.

The evolution of the Time Protocol is the Network Time Protocol, or NTP. It takes into consideration multiple things the former did not, including: leap seconds, a broadcasting mechanism, active/passive modes, security and digests, a hierarchical level of accuracy, polling mechanisms, more precision, versioning, consideration of delays, categorizing known clocks by reference identification, and much more.

The current protocol stands at version 4, NTPv4, which is documented in rfc 5905 but has additional addendum for extensions. It is backward compatible with its previous version, NTPv3, in rfc 1305.

NTP runs on UDP and TCP on port 123, as /etc/services shows:

ntp               123/tcp
ntp               123/udp

The timestamps that NTP sends and receives rely on UTC time, the timezone information is kept for local machines to decide. Additionally, NTP warns of any impending leap second adjustment.
Thus, in theory, all NTP servers should store the same UTC time up to a certain precision.

When an NTP client is running, we have to choose what to do with the hardware clock: do we sync it with system time? Many implementations either save the drift to a file so it can be used on the next boot, and/or rely on the kernel "11 minute mode" we talked about earlier. Moreover, if a network connection is available at boot time, there's the possibility of using NTP right away, removing the burden of relying on the RTC when the machine is offline.

NTP uses a hierarchy, a semi-layered division, to classify clocks that are available. It calls them strata.

The stratum, singular, is a measure of the synchronization distance to a reference clock. Remember, a reference clock is actual hardware that can be used to get precise time, like a GPS. The stratum is the number of servers we need to pass through to reach such a reference clock. Unlike jitter (dispersion) and delay, the stratum is a static measure: you don't gradually get further away from a reference clock.
So it’s preferable to use the closest (network distance) and lowest stratum possible NTP server.

The reference clock itself, the timekeeping device, is considered stratum 0 and the closest servers connected to it are at stratum 1. Thus, a server synchronized to a stratum n server will itself be considered stratum n+1.
The upper limit for the stratum is 15; in theory, above this the dispersion may grow too large to be reliable, though in practice it rarely goes above 5.

The stratum hierarchy helps spread the load and avoids cyclical clock dependencies, since the topology is a tree. A small number of servers give time to a large number of clients, which in turn can be servers to others. That implies that low-stratum servers, such as stratum 1 servers, should be highly available and well maintained to support the rest of the hierarchy.

In addition, the NTP message contains a reference identifier, the refid, which denotes which reference clock is used at stratum 0 on this path. So you can tell which source your time ultimately comes from.

Let’s also mention that NTP can be deployed locally, in a LAN. It’s possible to create your own hierarchy by acquiring a timekeeping device, such as a GPS, to avoid network delays and get a better precision.

NTP is not limited to the usual client/server architecture; it also includes a horizontal peering mode and a broadcast mechanism.

Horizontal peering is when multiple servers are coupled together in a group to synchronize time more accurately.

The broadcast mode works by having a server send the time to a broadcast address and having clients listen for NTP packets sent to that address. This mode is also useful for propagating a leap second, instead of announcing it only when a client connects.

On that note, on the day of a leap second event, the leap second can be propagated from a configuration file, a reference clock, or another NTP server. What happens next, how the leap second is applied, depends on the implementation: it can be a step (stop or skip) mechanism or leap second smearing, and it is applied at the level of the server.
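As an illustration, leap second smearing can be sketched as a linear slew: instead of stepping the clock, the extra second is spread over a window around the event. The 24-hour window and function names below are my own assumptions for illustration, not tied to any particular implementation:

```python
# Illustrative sketch of a linear leap second "smear": the one extra second
# is spread evenly over a fixed window instead of being applied as a step.
# The 24-hour window length here is an assumption, not a standard.

SMEAR_WINDOW = 24 * 60 * 60  # seconds over which to spread the leap second

def smear_offset(seconds_into_window: float) -> float:
    """Fraction of the leap second already applied at this point in the window."""
    if seconds_into_window <= 0:
        return 0.0
    if seconds_into_window >= SMEAR_WINDOW:
        return 1.0
    return seconds_into_window / SMEAR_WINDOW

def smeared_time(true_time: float, window_start: float) -> float:
    """Time reported to clients: true time slowed down by the smear so far."""
    return true_time - smear_offset(true_time - window_start)
```

Halfway through the window the server reports a clock that is half a second behind; by the end, the full second has been absorbed and no client ever saw a step.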

What does an NTP message look like?

In the early days of NTP, the timestamp in the message had the same issue as the Time Protocol: a single 32-bit value, and thus a rollover problem.

NTPv4 now uses a 128-bit date format split into two main parts: 64 bits for the seconds and 64 bits for the fractional seconds.
The seconds part is itself split in two: the most significant 32 bits hold the current era number (the number of rollovers), and the least significant 32 bits the number of seconds within that era. That removes all ambiguity, and the 64-bit fraction is fine enough to resolve the time it takes a photon to pass an electron at the speed of light, so it is very precise.
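As a sketch of the arithmetic, converting an NTP date (era number plus seconds within the era) to Unix time only requires the offset between the 1900 and 1970 epochs; the constant and function names below are my own:

```python
# Sketch: converting an NTP date (era number + seconds within the era) to
# Unix time. NTP's epoch is 1900-01-01, Unix's is 1970-01-01, and one era
# spans 2**32 seconds (era 0 rolls over in 2036).

NTP_TO_UNIX = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

def ntp_date_to_unix(era: int, era_offset: int) -> int:
    """era: 32-bit era number; era_offset: seconds within that era."""
    return era * 2**32 + era_offset - NTP_TO_UNIX
```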

An NTP message looks like this:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |LI | VN  |Mode |    Stratum     |     Poll      |  Precision   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Root Delay                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Root Dispersion                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Reference ID                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                     Reference Timestamp (64)                  +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Origin Timestamp (64)                    +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Receive Timestamp (64)                   +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Transmit Timestamp (64)                  +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      .                                                               .
      .                    Extension Field 1 (variable)               .
      .                                                               .
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      .                                                               .
      .                    Extension Field 2 (variable)               .
      .                                                               .
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Key Identifier                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                            dgst (128)                         |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 8: Packet Header Format

As you can see, NTP is much more advanced than the Time Protocol. For example, a minimal request only needs the version and mode fields filled in, client mode being 3, or 011 in binary. The usual unencrypted messages are 90 bytes long on the wire (76 bytes at the IP layer). A broadcast happens every 64 seconds, while a client/server exchange requires 2 packets per transaction, initially once per minute, backing off to about once every 17 minutes in normal conditions.
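To make that minimal request concrete, here is a sketch of an SNTP-style client in Python. The helper names are mine, and the commented-out usage requires network access to a server of your choosing:

```python
# Sketch of a minimal SNTP-style query: a 48-byte packet where only the
# first byte is filled in, with LI=0, VN=4, Mode=3 (client). A server
# replies with a packet of the same shape carrying its timestamps.
import socket
import struct

def build_request() -> bytes:
    li, vn, mode = 0, 4, 3
    first_byte = (li << 6) | (vn << 3) | mode
    return bytes([first_byte]) + bytes(47)  # header fields left zeroed

def transmit_timestamp(reply: bytes) -> float:
    """Extract the 64-bit transmit timestamp (seconds since 1900) from a reply."""
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds + fraction / 2**32

# Usage (needs network; pool.ntp.org is the public pool mentioned earlier):
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(build_request(), ("pool.ntp.org", 123))
# reply, _ = s.recvfrom(48)
# print(transmit_timestamp(reply) - 2_208_988_800)  # roughly Unix time
```

A real client would also use the origin/receive timestamps to estimate round-trip delay and offset; this sketch only reads the server's transmit time.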

Although such a protocol requires minimal bandwidth, the sheer number of clients means there needs to be a throttling system. The polling interval of a client depends on many factors, including its current precision and the maximum and minimum polling intervals allowed by the server.

NTP servers are viewed as a public utility of sorts and thus need help from the public, especially from people who are knowledgeable and have access to static public IP addresses. The pool of public NTP servers needs to keep growing to serve the increase in clients.

You can view a list of public NTP servers here:

But why rely on publicly available NTP servers instead of building our own NTP server hierarchy on a LAN? Haven’t we said this would offer more precision?

Not only would it offer more precision, thanks to stable bandwidth and short network distance, it would also be under our control and thus not throttled, and so more available. It would also mean more security and trust, as you could put the NTP server in a local demilitarized zone (DMZ), which is often required to pass security accreditations.
However, all of this has a cost: the cost of acquiring and maintaining a timekeeping device such as a GPS receiver, the setup fees, the additional equipment, and the training of the team. It’s all a question of money.

But let’s say you want to deploy your own NTP server: what’s available out there for you to use, what are the implementations?

The reference implementation of the protocol, the canonical open source one, is called ntpd. It has been continuously developed and maintained for over 25 years.

It comes with a sensible default configuration that fetches time from a pool of NTP servers on the internet. Most Unix-like distros have packages that are easy to set up.

What you can configure ranges from the location of the drift file controlling the local clock, to the location of the leap second file and how it’s applied, to clock discipline settings like the jitter rate, to security options, to log locations, and to hardware driver configuration for when you are setting up a stratum 1 server.
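As a sketch, a minimal ntpd configuration might look like this (the paths and pool names are illustrative, not a recommendation):

```
# Minimal /etc/ntp.conf sketch (paths and pool names are illustrative)
driftfile /var/lib/ntp/ntp.drift

# Fetch time from the public pool; iburst speeds up the initial sync
pool 0.pool.ntp.org iburst
pool 1.pool.ntp.org iburst

# Allow time queries but no remote reconfiguration
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
```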

The ntpq utility lets you manage an NTP server, be it local or remote, and query its status and configuration. Like openssl, it has both an interactive mode and a command-line argument mode.
For instance, the ntpq -p output is quite interesting.

Example output:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*time-A.timefreq .ACTS.           1 u  152 1024  377   43.527  -11.093   3.982
 …               …                2 u  230 1024  377   67.958   -7.729   0.071
 …               .ACTS.           1 u  323 1024  377   58.705  994.866 999.084

It displays the server name in the first column along with its state, a + meaning it’s a candidate server and a * meaning it’s the peer we are currently synchronized to. The refid column is the reference identifier we’ve mentioned. The st column is the stratum of the server. The when column shows the number of seconds since we last polled that server, and the poll column the number of seconds to wait between polls. The reach column is an octal bitmap of the results of the last 8 polls (377 means the last 8 all succeeded). The delay column shows the round trip in milliseconds, which, as we’ve said, varies with network stability and distance. The offset column is another term we’ve seen: the difference in milliseconds between your clock and the host’s. Finally, the jitter (or disp) column is the dispersion in milliseconds, the variation between queries to the same server, a measure of stability.
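Since the reach column trips up many people, here is a small sketch decoding that octal bitmap (the function name is mine):

```python
# Sketch: decoding ntpq's "reach" column, an octal bitmap of the last
# eight polls, with the most recent poll in the lowest bit.

def decode_reach(reach_octal: str) -> list[bool]:
    """Return success/failure of the last 8 polls, oldest first."""
    value = int(reach_octal, 8)
    return [bool(value & (1 << i)) for i in range(7, -1, -1)]
```

So 377 decodes to eight successes, while 376 means everything succeeded except the most recent poll.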

Another tool to test remote NTP servers is ntpdate. It can be used to initiate a sync of the local clock, but with the -d option it queries an NTP server without changing the system time.

Here’s a trace:

> ntpdate -d
30 Apr 19:04:27 ntpdate[72016]: ntpdate 4.2.8p14@1.3728-o Wed Mar 18 13:44:46 UTC 2020 (1)
Looking for host and service ntp reversed to
host found :

server, port 123
stratum 1, precision -29, leap 00, trust 000
refid [NIST], root delay 0.000244, root dispersion 0.000488
reference time:      e2557598.00000000  Thu, Apr 30 2020 19:04:40.000
originate timestamp: e255759a.e0bb5ccd  Thu, Apr 30 2020 19:04:42.877
transmit timestamp:  e255759a.c79bb883  Thu, Apr 30 2020 19:04:42.779
filter delay:  0.20827    0.23691    0.20261    0.21184   
               ----       ----       ----       ----      
filter offset: +0.006270  +0.003106  +0.010331  +0.005027 
               ----       ----       ----       ----      
delay 0.20261, dispersion 0.00426, offset +0.010331

30 Apr 19:04:42 ntpdate[72016]: adjust time server offset +0.010331 sec

By now you should be able to read the values and understand what they mean.

There are many other implementations of NTP servers besides the canonical ntpd. You can find multiple comparison tables online that show the differences between them. Here are a few things that often get mentioned:

  • The type of license
  • The programming language used
  • The size of the program
  • How well the codebase is cleaned up and maintained
  • The time sources supported and their numbers
  • Which reference clock drivers are supported
  • The NTP modes it supports
  • Which protocol versions it has implemented
  • If you can create clusters/pools
  • The way the clock discipline works and can be configured
  • If it supports temperature compensation
  • How it handles leap second correction and whether it is configurable
  • Security and authentication mechanisms
  • If it has rate limiting functionalities
  • The way it timestamps, is it kernel based or hardware based
  • If it has a way to manage the RTC or not
  • Monitoring related functionality

The canonical implementation, ntpd, fully supports the specs, as it is the reference implementation; it has been ported to the most operating systems, has the largest number of drivers, and is probably the most stable.

Chrony is another implementation of NTPv4. It was written from scratch and is known to be well maintained and secure.
Chrony’s biggest selling point is that it works remarkably well in environments where the external time source isn’t regularly available, such as a machine frequently disconnected from the internet. Though this begs the question of why use Chrony instead of relying on the built-in OS mechanisms we’ve seen in the previous section. You can even explicitly tell the daemon that you are about to go offline. Chrony’s other big advantage is that after an audit of multiple NTP implementations, it came out as the most secure among them. It is also thought to be easier to configure than ntpd.
Unfortunately, Chrony lags behind in driver, OS, and specification support.

systemd’s timesyncd is a network time daemon that implements an SNTP client, the Simple Network Time Protocol, defined in RFC 4330. SNTP is a simplified version of NTP that uses the same network packet format but a different approach to synchronization: it skips the full NTP complexity and focuses only on querying time and synchronizing the system clock.
Thus, there’s no hardware driver support in systemd’s timesyncd, but it’s very simple.

The advantage of having the time synced using a service manager is that it can be hooked to automatically start whenever the network is operational, whenever there’s connectivity.

The status of the clock can be requested using timedatectl status.

Example of output:

> timedatectl status
               Local time: Thu 2020-04-30 19:26:56 EEST
           Universal time: Thu 2020-04-30 16:26:56 UTC 
                 RTC time: Thu 2020-04-30 16:26:56     
                Time zone: Asia/Beirut (EEST, +0300)   
System clock synchronized: no                          
              NTP service: inactive                    
          RTC in local TZ: no     

The NTP synchronization status can also be checked using timedatectl timesync-status.

Example output:

> timedatectl timesync-status
       Server: (
Poll interval: 34min 8s (min: 32s; max 34min 8s)  
         Leap: normal                             
      Version: 4                                  
      Stratum: 3                                  
    Reference: A960804                            
    Precision: 1us (-25)                          
Root distance: 32.142ms (max: 5s)                 
       Offset: -9.034ms                           
        Delay: 49.905ms                           
       Jitter: 21.188ms                           
 Packet count: 418                                
    Frequency: -27.105ppm             

The client configuration lives in /etc/systemd/timesyncd.conf, with the format defined in the timesyncd.conf(5) manpage; the service itself is systemd-timesyncd.service.
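For illustration, a minimal timesyncd.conf sketch might look like this (the server names are placeholders, not a recommendation):

```
# /etc/systemd/timesyncd.conf sketch (server names are illustrative)
[Time]
NTP=0.pool.ntp.org 1.pool.ntp.org
FallbackNTP=time1.google.com
PollIntervalMinSec=32
PollIntervalMaxSec=2048
```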

> timedatectl show-timesync --all
PollIntervalMaxUSec=34min 8s
PollIntervalUSec=34min 8s
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=3, Precision=-25, RootDelay=62.911ms, RootDispersion=366us, Reference=A960804, OriginateTimestamp=Mon 2020-04-20 17:42:43 EEST, ReceiveTimestamp=Mon 2020-04-20 17:42:43 EEST, TransmitTimestamp=Mon 2020-04-20 17:42:43 EEST, DestinationTimestamp=Mon 2020-04-20 17:42:43 EEST, Ignored=no PacketCount=372, Jitter=15.678ms }

BusyBox also offers a compact built-in SNTP implementation. But beware that no driftfile is used.

clockspeed by D. J. Bernstein is an even simpler approach to SNTP that uses the TSC register to adjust the ticking speed.

Another approach, the Berkeley algorithm, works by polling the time from all the machines on a network and taking the average; all the machines then sync to this time.

Yet another interesting implementation is HTP, the HTTP Time Protocol. HTP relies on the Date header of HTTP, defined in the HTTP/1.1 RFC 2616. It uses statistical analysis to arrive at the most accurate time possible, so if you can access a webpage, you can sync time. This protocol is neither the most accurate nor the most secure, though.

That brings us to security and NTP. We know that anything on the network should be secure and trusted.

If, for example, an attacker carried out a man-in-the-middle attack, they would be able to change the time source for your machine. The security implication is that expired certificates and signatures that shouldn’t be trusted would be. It would also tamper with and mess up logs.

That’s one reason why browsers today show errors whenever the system clock is out of sync.

The NTP specifications have been around for so long that we’ve had plenty of years to find security issues and fix them, with the reference implementation as the testing ground. The codebase is constantly audited.
For example, the first versions of NTP were entirely clear text and thus had no protection against MITM attacks. The specifications later added authentication, checksums, and even encryption via symmetric and public/private keys in the latest addenda.

The IETF has moved to create an encryption overlay called Network Time Security (NTS). Cloudflare currently implements it, but not many others do.

Another project, NTPsec, forks the ntpd source and tries to remove complexity, clean up the code, and find vulnerabilities in it.

Audits are important; as we’ve said, Chrony came out as the most secure among multiple NTP implementations.

I quote:

A 2017 security audit of three NTP implementations, conducted on behalf of the Linux Foundation’s Core Infrastructure Initiative, by the security firm Cure53 suggested that both NTP reference implementation and NTPsec were more problematic than Chrony from a security standpoint.

On the other hand, there are other kinds of things to care about. We’ve discussed the polling issues and the shortage of public NTP servers in the pool, so it’s important for them to be able to withstand heavy loads. One class of attack abuses NTP for amplified denial of service (DDoS): the attacker sends a very small query that triggers a huge response aimed at the victim, similar to DNS amplification attacks. On the defensive side, servers can answer abusive clients with a Kiss-o’-Death packet, a rate-limiting reply telling them to back off.

Another security issue related to how much we rely on NTP as a public service, is about how some IoT devices have been found to hard-code the address of NTP servers. These kinds of assumptions are dangerous.

The last thing I want to address in this section is another protocol for syncing system time with an external time source called PTP, the Precision Time Protocol.

PTP is used when NTP doesn’t provide enough precision, for critical measurement and control systems such as financial transactions or mobile phone tower transmissions. It is specially crafted for the local network scenario, where a machine has a reference clock device connected to it.

PTP was originally defined by the IEEE in 2002 in the IEEE 1588-2002 standard, officially entitled “Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems”, then reviewed in IEEE 1588-2008, PTPv2, and again reviewed in 2019 in IEEE 1588-2019.

PTP is similar to NTP in that it synchronizes time between machines, but the difference is that it adds accurate network latency information using hardware timestamping.
Hardware-assisted timestamping is done at the lowest levels of the network stack, such as the MAC layer or the Ethernet transceiver, right before the packet is sent. Clocks can also be embedded in network equipment: as a message traverses switches and routers, they update the timestamp accurately. The kernel helps too, with features such as the SO_TIMESTAMPING socket option for packet timestamping.
So the timestamp offset is accurate and the delay precise and predictable, with low latency; that’s why PTP can achieve sub-microsecond accuracy.

PTP runs over UDP on ports 319 and 320, which carry event and general messages respectively; it can also be carried directly over Ethernet.

It uses the same epoch as Unix time; however, while Unix time is based on UTC and is subject to leap seconds, PTP is based on International Atomic Time (TAI).
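As a sketch, converting a PTP (TAI) timestamp to UTC is just a fixed offset equal to the accumulated leap seconds; 37 seconds has been the offset since the start of 2017, and the constant must be bumped whenever a new leap second is announced:

```python
# Sketch: TAI runs ahead of UTC by the accumulated leap seconds.
# 37 s has been the offset since 2017-01-01; this constant must be
# updated when a new leap second is announced.

TAI_UTC_OFFSET = 37  # seconds, valid since 2017-01-01

def ptp_tai_to_utc(tai_seconds: float) -> float:
    return tai_seconds - TAI_UTC_OFFSET
```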

The source of time in PTP is the grandmaster clock, typically an ordinary (single-port) clock with a timekeeping device such as a GPS receiver attached; it distributes time to boundary clocks, which in turn serve others. It is an architecture that integrates with the network segmentation.

Within a group, the system automatically decides which clock to elect as the master, the clock deemed the most accurate, via the best master clock algorithm.

Let’s end by saying that when push comes to shove, you can always buy a machine that comes with everything integrated instead of setting it up yourself.

In the next section we’ll see systems that rely on precise time in special ways.

What depends on time

  • Consequences of bad system clock
  • Distributed systems
  • RTOS and scheduling

We’ve hinted at and mentioned some examples of what could happen if system time misbehaves. What else could happen?

We rely on system time for anything related to communication with other humans: it creates the context of what is happening. It is also useful for sampling and data collection for statistical analysis.

With an out-of-whack system time, logs will be out of synchronization and events will be hard to debug; it becomes hard to correlate them with real life.
Database queries that rely on now() to get the current date and time will write wrong values.
Many backup scripts will misbehave because they don’t expect time to move backward or jump around.
Similarly, cron jobs that you expected to start at a specific time may start at an inappropriate one.
The make utility also relies on timestamps to know which files need to be recompiled; mishaps in system time may lead it to recompile the same files over and over.
Many concepts in security, such as certificate verification, one-time codes, and authentication protocols, rely on system time being synchronized.

Apart from the human perception issues, there are the countless overflows and rollovers we’ve mentioned, all of them caused by a data structure too small to store what is needed, leading to either a rollover or unpredictable behavior on overflow.

Then there are the problems related to time zones and daylight saving events. These can strike during the transition, when computers repeat an event for one extra hour or run it an hour short of the expected duration. This is tricky when those computers drive machinery or medical devices: it could harm lives or drive businesses into the ground.
They can also lead to miscommunication between places that are in different time zones or don’t apply the same daylight saving time.
Programs that don’t anticipate changes, such as email clients and calendars, may need upgrades.

To avoid most of these issues, it’s better not to use time zones and DST in such cases but to rely on UTC, leaving civil time for display only. But as you know, there’s still the issue of leap seconds.

Some domains need more synchronization than others, let’s discuss distributed systems and real-time operating systems as examples.

A distributed computer system consists of multiple software pieces running on different computers but that still try to act as a single system.

In a distributed system, it is important to keep clock synchronization, to ensure all the computers have the same notion of time so that the execution runs on the same timeline.
What we truly care about is event ordering, to know what happens before what, cause and consequences.

In general there are two ways to keep things in order: you either use wall clocks, which are all the timekeeping devices we’ve mentioned so far, or you use logical clocks, which are simple monotonic counters all moving in a single direction.
Examples of logical clocks include Lamport clocks and vector clocks.
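As a sketch of the idea, a Lamport clock is just a counter that is bumped on every local event, attached to outgoing messages, and merged (max + 1) on receipt, which yields a partial ordering of events without any wall clock (the class name is mine):

```python
# Sketch of a Lamport clock: a per-process counter that orders events
# causally. No wall clock is involved, so drift is irrelevant.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """A local event happened."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp: take the max, then advance."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If process A sends a message at Lamport time 2, the receiving process jumps ahead to at least 3, so every effect is timestamped after its cause.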

The complication in distributed systems is that system clocks on different machines will eventually drift, making it hard to keep a strict ordering. That’s not what’s expected of a consistent monotonic clock.
That is why logical clocks are favored: they resolve the chain of events and conflicts.

However, logical clocks are not always an option. Another approach is a central coordinator that timestamps every message passing through it, but that adds a bottleneck to the architecture and limits the availability of the system.
Yet again, you could go back to wall clocks but rely on atomic time along with an NTP server; it’s not a perfect solution, but it avoids time zones, leap seconds, and daylight saving time.

In distributed systems, virtual machines are often used, so a way to keep them in sync is to sync them with their host.

The other domain to explore is real-time operating systems, systems that are mission critical and require accuracy.

Real-time OSs, aka RTOSs, are similar to general-purpose OSs (GPOSs) in that they are responsible for managing computer resources and hosting applications, but they differ in that they are built for precise timing and high reliability. They are especially useful in machinery environments and embedded systems.

RTOSs are deterministic by design: they are made to meet deadlines associated with external events, to be responsive.
Their jitter, the measure of error in the timing of subsequent tasks, should be extremely low; tasks should execute with almost no delay, or at least a predictable one.
If a system doesn’t meet the maximum time allocated to perform a critical operation, if it doesn’t fulfill the timing constraint to act on an event, it isn’t considered real-time.

Depending on the type of events it can guarantee, real-time OSs are separated into two categories. If it must be truly deterministic and failing to adhere means a system failure, we categorize it as a hard real-time OS. If the system can only guarantee a certain set of events to happen in real time, and not adhering won’t lead to catastrophic consequences, we categorize it as a soft real-time OS.

So programs running on a real-time OS should run with consistent timing, but it doesn’t stop there: programmers should have full control over how tasks are prioritized and be able to check for deadlines. They should be able to dictate how scheduling takes place and expect it to be directly reflected.

The priority given to tasks is a parameter of the scheduling algorithm the OS uses to choose which task runs next. Scheduling is driven by interrupts; the time to handle the interrupt routine is variable in a general-purpose OS, but in a real-time OS its latency is bounded.
This is the same timer interrupt handler we’ve seen before, the one that ticks at specific intervals and allows the OS to schedule tasks, increment system time, call timer routines, do periodic tasks, etc. This is where the constraint is applied.

The scheduling is strict: no low-priority task can run before a high-priority one. Real-time OSs use rate-monotonic, earliest-deadline-first, and other preemptive scheduling algorithms, while general-purpose OSs use completely fair scheduling, round robin, and similar fairness-oriented algorithms.
Preemptive scheduling differs from cooperative scheduling in that with cooperative scheduling we trust the task to explicitly relinquish control once it’s done, while with preemptive scheduling the OS can forcibly suspend a task and replace it with another. This is what allows an RTOS to respond quickly to real-time events.
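To make one of those policies concrete, here is a toy sketch of earliest-deadline-first selection (the names are mine, not any RTOS API):

```python
# Toy sketch of earliest-deadline-first (EDF) selection: at every
# scheduling point, the ready task with the nearest absolute deadline
# runs, even if another task was already in progress (preemption).

def edf_pick(ready_tasks: list[tuple[str, float]]) -> str:
    """ready_tasks: (name, absolute_deadline) pairs; returns the task to run."""
    name, _deadline = min(ready_tasks, key=lambda t: t[1])
    return name
```

A real EDF scheduler would also track execution budgets and reject task sets whose total utilization exceeds capacity; this only shows the selection rule.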

There are many examples of real-time OSs, such as VxWorks, QNX, RTLinux, FreeRTOS, RIOT OS, and more. Some are open source, some are POSIX compliant, some are hard real-time, some are soft real-time.

On Linux specifically, to enable real-time behavior you have to apply the PREEMPT_RT patch. Though it’s arguably soft real-time, as it’s not mathematically provable to be real-time.

Another real-time Linux related project is the ELISA project, the Enabling Linux in Safety Application project, which aims at creating a standard to build mission critical Linux systems.

I quote:

Founding members of ELISA include Arm, BMW Car IT GmbH, KUKA, Linutronix, and Toyota. To be trusted, safety-critical systems must meet functional safety objectives for the overall safety of the system, including how it responds to actions such as user errors, hardware failures, and environmental changes. Companies must demonstrate that their software meets strict demands for reliability, quality assurance, risk management, development process, and documentation. Because there is no clear method for certifying Linux, it can be difficult for a company to demonstrate that their Linux-based system meets these safety objectives.

That just shows how much time in computers affects real humans lives.

Human perception of time - Conclusion

Machines may need accurate time, but for humans time is definitely subjective. We feel time as our emotions are swayed by life. This certainly matters.

We possess clocks within us, biological clocks in which sun time and memories play an integral part.

For example in emergencies and dangerous situations, time seems to slow down. And as we grow older we perceive time to move faster. It’s interesting to see time from a human perspective.

We live in such an interconnected world that greeting people in the morning on the internet has led to the Universal Greeting Time. Time in this interconnected world has also led to marketing campaigns and funny new standards such as the Beats.

I’ve written in depth about “internet time perception” in this other article.

I hope this colossal article has cleared up the topic of time, and time on Unix. It may not be concise or scientifically exact in every place, but it should still convey a good overview to readers. Please reach out to me with corrections or comments.




Jan van den Berg (j11g)

Volume 1: From Savoy Stompers to Clock Rockers – Andrew Hickey May 01, 2020 03:35 PM

One of my favorite podcasts is “A History of Rock Music in 500 Songs”. I’ve written about it before, it’s an absolutely terrific podcast.

But this post is not about the podcast but about the book!

After the first 50 episodes creator Andrew Hickey bundled the adapted episode transcripts into the first volume of a book series. And, of course, I had to get it, as an unmissable reference and to support the podcast.

Volume 1: From Savoy Stompers to Clock Rockers – Andrew Hickey (2019) – 551 pages

Here are some thoughts on the book’s look and feel as it arrived in the mail this morning. So this is not a book review!

  • I’ll start with what I don’t like (and what can’t be the author’s fault at all). This book is printed on demand, and your mileage may vary, but on my particular copy the cover has been cut off prematurely. So the letter “c” from the word “Music” is right on the edge of the cover. It bothers me a bit, and it’s a shame that such a wonderful book has to suffer this fate.
  • It’s quite a meaty book (I like that!). But I ordered the paperback and the postal service wasn’t too careful with it, so there are already some dents on the book. So you might want to get the hardcover.
This is a shame.
  • I was FULLY expecting the spine to have “Volume 1” or at least a number on it, but that is not the case. I say this because I intend to buy every volume and imagined the series, identifiable by their consecutive numbers, would look majestically encyclopedic on my bookshelf.
The spine (and flappy cover)
  • I love the black and white cover. It’s classy and timeless.
  • As stated, it is a meaty book. I love holding it, it has a very nice feel to it. And the paper is pleasant, not too bright or hard.
  • For a reference book the font is well chosen. I believe it’s Chord Symbol, which is fitting when you think about it. But more so, this font makes it easy to quickly skim and scan parts, which makes sense for a reference book (my intended use).
  • The “Contents” (chapters) section only has the song titles, not the artists. I can think of a few reasons: especially in the early days, some songs were often done by multiple people (even at the same time). And after all it is a podcast about SONGS. But still, the podcast does have artist names. So I don’t quite understand this distinction.
  • The chapters also have no numbers. Which is not a problem. But it seems the reference / link to the podcast has (deliberately?) been cut. The chapters seem to have no link to the podcast episodes.
  • The absolute best parts of this book are the song index and the regular index. These are indispensable. I absolutely love them and they will often be my starting point when I want to look up something. They are very well done and look exhaustive.
  • The page numbers are on the top of the page on the outside. Which is how I like it, this makes thumbing back and forth to the index easy.
  • I thought I couldn’t love Andrew Hickey’s work more than I already did, but then I read his acknowledgement to Donald Knuth! I cannot state how much I adore this. (Knuth holds a special place in my heart, and I even host a podcast RSS feed for a couple of his lectures).


My wish for this podcast is that it will become so famous that Andrew Hickey will get a regular book deal, and the nuisances that come with print-on-demand will become a thing of the past. Nonetheless, this book is already a spectacular body of work by someone truly passionate and gifted, and a book that will look good on any bookshelf.

I love that this fantastic podcast is now available in a format that can be picked up a hundred years from now and still be instantly accessible. Go buy it!

The post Volume 1: From Savoy Stompers to Clock Rockers – Andrew Hickey appeared first on Jan van den Berg.

April 29, 2020

Derek Jones (derek-jones)

Beta release of data analysis chapters: Evidence-based software engineering April 29, 2020 02:58 AM

When I started my evidence-based software engineering book, nobody had written a data analysis book for software developers, so I had to write one (in fact, a book on this topic has yet to be written). When I say “I had to write one”, what I mean is that the 200 pages in the second half of my evidence-based software engineering book contain a concentrated form of such a book.

These 200 pages are now on beta release (186 pages, if the bibliography is excluded): chapters 8 to 15 of the draft pdf. Originally I was going to wait until all the material was ready before making a beta release; the Coronavirus changed my plans.

Here is your chance to learn a new skill during the lockdown (yes, these are starting to end; my schedule has not changed, I’m just moving with the times).

All the code+data is available for you to try out any ideas you might have.

The software engineering material, the first half of the book, is also part of the current draft pdf, and the polished form should be available on beta release in about 6 weeks.

If you have a comment or find a problem, either email me or raise an issue on the book’s Github page.

Yes, a few figures and tables still bump into each other. I’m loath to do very fine-tuning because things will shuffle around a bit with minor changes to the words.

I’m thinking of running some online sessions around each chapter. Watch this space for information.

April 26, 2020

Ponylang (SeanTAllen)

Last Week in Pony - April 26, 2020 April 26, 2020 03:00 PM

Nightly builds of ponyc for FreeBSD 12.1 are available. pony-semver has moved into the ponylang organization. The past sync meeting discussed syntax changes for the call site of behaviours.

Gonçalo Valério (dethos)

Security.txt April 26, 2020 12:11 AM

Some days ago, while scrolling my Mastodon feed (for those who don’t know it: it’s like Twitter, but instead of being a single website, the whole network is composed of many different entities that interact with each other), I found the following message:

To server admins:

It is a good practice to provide contact details, so others can contact you in case of security vulnerabilities or questions regarding your privacy policy.

One upcoming but already widespread format is the security.txt file at https://your-server/.well-known/security.txt.

See the draft RFC and the project website for details.

It caught my attention because my personal domain didn’t have one at the time. I’ve added it to other projects in the past, but do I need one for a personal domain?

After some thought, I couldn’t find any reason why I shouldn’t add one in this particular case. So as you might already have guessed, this post is about the steps I took to add it to my domain.

What is it?

A small text file, just like robots.txt, placed in a well known location, containing details about procedures, contacts and other key information required for security professionals to properly disclose their findings.

Or in other words: Contact details in a text file.

security.txt isn’t an official standard yet (it’s still a draft), but it addresses a common issue that security researchers encounter in their day-to-day activity: sometimes it’s harder to report a problem than it is to find it. I always remember the case of a Portuguese citizen who spent ~5 months trying to contact someone who could fix some serious vulnerabilities in a governmental website.

Even though it isn’t an accepted standard yet, it’s already being used in the wild:

Need more examples? A quick search finds them for you, or you can read a small analysis of the current status across Alexa’s top 1000 websites.


So, to help the cause, I added one for this domain. It can be found at /.well-known/security.txt.

Below are the steps I took:

  1. Go to the generator website and fill in the required fields of the form.
  2. Fill the extra fields if they apply.
  3. Generate the text document.
  4. Sign the content using your PGP key
    gpg --clear-sign security.txt
  5. Publish the signed file on your domain under https://<domain>/.well-known/security.txt
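
For reference, the signed file ends up looking roughly like this (the contact address and key URL below are placeholders, not the real values from my domain):

```text
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Contact: mailto:security@example.com
Encryption: https://example.com/pgp-key.txt
Preferred-Languages: en
-----BEGIN PGP SIGNATURE-----
...
-----END PGP SIGNATURE-----
```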

As you can see, this is a very low-effort task, and it can generate very high returns if it leads to the disclosure of a serious vulnerability that would otherwise have gone unreported.

April 25, 2020

Simon Zelazny (pzel)

Newtype-like tagged tuples in Elixir April 25, 2020 10:00 PM

A thought (and code) experiment in type wrappers

Some time ago I stumbled upon a hacky way of generating something akin to Haskell's newtype declarations in Elixir. Here it is: defopaque.

A Motivating Example: Units of Measure

I'm going to assume you agree with the maxim that making illegal states unrepresentable is a desirable feature of software systems.

Let's say a part of your system deals with weights. It would be nice to ensure that programmers don't use plain numbers when dealing with units of measure. The consequences of doing so have been bad in the past. Let's define a Weight module with a kg constructor, wrapping any number. This will represent our unit of weight.

defmodule Weight do
  use Defopaque
  defopen(:kg, number())
end

Our intention here is to create a lightweight wrapper type whose role is primarily to document the meaning of the variable, and also to prevent accidental use of weight-related functions with 'plain' numbers.

The macro defopen gives us:

1) A kg() type, exported from Weight.

2) A kg(n) macro, which will generate a tuple containing the number n as its second element. The first element of the tuple will be an autogenerated atom (guaranteed to be stable for every {wrapper-atom, wrapped-subtype} pair). We can use this macro to generate new kg values and to pattern-match on existing values.
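
The tagging idea can be sketched in Python (a loose analogy I wrote for illustration, not Defopaque’s actual implementation; all names here are mine):

```python
import hashlib

def make_wrapper(module: str, tag: str):
    """Newtype-style wrapper: the tag is a stable hash of
    (module, tag), so code that never references the defining
    module cannot plausibly forge it."""
    digest = hashlib.sha1(f"{module}:{tag}".encode()).hexdigest()[:20]
    full_tag = f"{digest}-{tag}"

    def wrap(value):
        # Tag the value; analogous to the kg(n) constructor macro.
        return (full_tag, value)

    def unwrap(wrapped):
        # Pattern-match on the tag; raises on foreign values.
        got, value = wrapped
        if got != full_tag:
            raise ValueError(f"not a {tag} value")
        return value

    return wrap, unwrap

kg, un_kg = make_wrapper("Weight", "kg")
print(un_kg(kg(12)))  # 12
```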

Let's see some examples of the kg unit in use:

defmodule MyApp do
  import Weight

  @spec tell_weight(Weight.kg()) :: String.t()
  def tell_weight(w) do
    case w do
      kg(12) -> "twelve kilograms"
      kg(other) -> "#{other}kg"
      _ -> "invalid unit"
    end
  end
end

iex()> import Weight
iex()> MyApp.tell_weight(kg(12))
"twelve kilograms"
iex()> MyApp.tell_weight(kg(42))
"42kg"
iex()> MyApp.tell_weight(12)
"invalid unit"

You can also use pattern-match syntax to match on values inside the constructor:

defmodule Matches do
  import Weight

  def count(want_value) do
    weights = [kg(1), kg(3.9), kg(5.3), kg(10.1)]
    Enum.count(
      weights,
      fn kg(^want_value) -> true
         kg(_) -> false
      end
    )
  end
end

iex(19)> Matches.count(3.90)
1
iex(20)> Matches.count(3.91)
0

You can even pattern match in function heads:

defmodule Conversion do
  import Weight
  def kg_to_lb(kg(n)), do: n * 2.204623
end

iex()> import Weight
iex(6)> Conversion.kg_to_lb(kg(2))
4.409246
iex(7)> Conversion.kg_to_lb(kg(0.5))
1.1023115

But note! Now our function can only be called once:

Conversion.kg_to_lb(kg(0.5)) |> Conversion.kg_to_lb()
** (FunctionClauseError) no function clause matching in Conversion.kg_to_lb/1

    The following arguments were given to Conversion.kg_to_lb/1:

        # 1
        1.1023115

    iex:5: Conversion.kg_to_lb/1

This is precisely the behavior we wanted at the very beginning. Only a kg unit can be converted, not a plain number.

If we want to expand our system to deal with pounds as a first-class citizen, we are free to do so. It's very cheap to generate a new wrapper, and we can do it in the same module:

defmodule Weight do
  use Defopaque
  defopen(:kg, number())
  defopen(:lb, number())
end

defmodule Conversion do
  import Weight
  def kg_to_lb(kg(n)), do: lb(n * 2.204623)
  def lb_to_kg(lb(n)), do: kg(n * 0.4535924)
end

This will work as expected, only allowing for conversions in the right direction:

import Weight
iex()> kg(res) = Conversion.kg_to_lb(kg(1)) |> Conversion.lb_to_kg()
{:"f7ce213d444ac5216656-kg", 1.0000002376652002}
iex()> res
1.0000002376652002
iex()> kg(res) = Conversion.kg_to_lb(kg(1)) |> Conversion.kg_to_lb()
** (FunctionClauseError) no function clause matching in Conversion.kg_to_lb/1

    The following arguments were given to Conversion.kg_to_lb/1:

        # 1
        {:"be46b1adbd0d445032d6-lb", 2.204623}

    iex:25: Conversion.kg_to_lb/1

As you can see, we got a peek at how our wrapper tagging is implemented. It's not exactly pretty, but it prevents other modules from creating wrapped types without importing the module where these types are defined.

Why not Tagged Tuples?

Tagged tuples are the traditional way of handling this kind of problem, and there's nothing wrong with them. However, they don't prevent 'unauthorized' use of our types. For example, anyone can create the tuple {:kg, 2}.

So there's no reason why some module that should know nothing about weights couldn't match on our tagged tuple:

def an_allegedly_weight_agnostic_function({:kg, weight}) do (...)

With this method, code that does not reference our Weight module cannot in good faith create our tagging atom, since it's more-or-less gibberish.

Also, unique and opaque tags mean that dialyzer can be much more strict when checking our code.

Why not structs?

Structs are Elixir's killer feature and they are great for modeling composite data types. This project came out of an attempt to golf opaque structs, defined as internally nested modules with a single field. That's what I recommend doing in real production projects!

Why the name defopaque?

It's because the original intent behind this hack was to provide a quick-n-dirty way to define @opaque newtypes in a codebase. Later on I figured it would also be nice to provide non-opaque, destructurable newtypes. Hence two macros:

  1. defopaque -- Creates a wrapper and defines the resulting wrapped type as @opaque. The generated constructor and pattern-match macro can only be used in the module where the opaque type is defined.

  2. defopen -- Creates a wrapper and defines the resulting wrapped type as @type. The generated constructor can be used outside the module where it was defined.

(Edited on 2020-04-27: module name, truncated sentence!)

Jeremy Morgan (JeremyMorgan)

9 Courses You Can Take to Become a JavaScript Wizard April 25, 2020 09:19 PM

There are tons of front end frameworks to choose from, and getting good with them is no small task. But sharpening your core JavaScript skills can make you better at all front-end frameworks. By thoroughly understanding JavaScript at its core, you will write better programs, faster, with less struggle. If you aren’t sure where you stand, you can take a JavaScript Skills Test to find out! Here are nine great courses to help you become a JavaScript wizard.

April 24, 2020

Andreas Zwinkau (qznc)

Accidentally Turing-Complete April 24, 2020 12:00 AM

A list of things that were not supposed to be Turing-complete, but are.

Read full article!

April 22, 2020

Caius Durling (caius)

RSpec Given/When/Then with symbols April 22, 2020 05:30 PM

Having a need to write some BDD-esque tests without needing to put them in front of non-technical people, I was recently playing around with RSpec feature specs. Where I’ve used these previously, we’ve eventually run into curation issues where the specs are outdated, brittle, and require so much maintenance that we’ve generally ended up lobbing Cucumber into the project as a stopgap.

This is due to ending up with feature specs like the following, which lead you to having to parse the code mentally to work out what it’s testing:

RSpec.feature "Admin: Posts" do
  scenario "Authoring a post" do
    @user = create :user, :admin
    login_as @user

    visit new_admin_post_path
    fill_in "Title", with: "RSpec feature specs"
    fill_in "Body", with: "Some piffle about feature specs"
    click_on "Publish!"

    visit root_url
    expect(page).to have_content("RSpec feature specs")
  end
end

After some reading around, I eventually stumbled back across this idea from Future Learn where they lay out the above test by splitting it into private methods within the feature block, but leaving it more readable to future readers. I then found Made Tech’s take on this same idea, and riffing off the both of them ended up with the following instead:

RSpec.feature "Admin: Posts" do
  scenario "Authoring a post" do
    given_i_am_logged_in_as_an_admin
    when_i_publish_a_new_post
    then_i_see_the_post_on_the_homepage
  end

  def given_i_am_logged_in_as_an_admin
    @user = create :user, :admin
    login_as @user
  end

  def when_i_publish_a_new_post
    visit new_admin_post_path
    fill_in "Title", with: "RSpec feature specs"
    fill_in "Body", with: "Some piffle about feature specs"
    click_on "Publish!"
  end

  def then_i_see_the_post_on_the_homepage
    visit root_url
    expect(page).to have_content("RSpec feature specs")
  end
end

Now this is fine, but writing lots_of_names_with_underscores_in_is_a_trifle irritating. I remembered Jim Weirich1 showing off rspec-given at a conference a few years ago, and wondered if that would solve my problem here of wanting the runtime to warn me when my methods are misspelled or missing, without having_to_underscore_them.

Now rspec-given would let me do that, but I’d have to switch from calling them all in turn inside a scenario block to calling them inside context blocks and passing blocks to each of the Given, When, etc methods. I think it would be something like (warning, untested)

RSpec.feature "Admin: Posts" do
  Given { @user = create :user, :admin }
  Given { login_as @user }

  context "authoring a post" do
    When { visit new_admin_post_path }
    When { fill_in "Title", with: "RSpec feature specs" }
    When { fill_in "Body", with: "Some piffle about feature specs" }
    When { click_on "Publish!" }

    Then { visit root_url }
    And  { expect(page).to have_content("RSpec feature specs") }
  end
end

Now this didn’t quite fit with what I wanted. However, I did wonder if it was possible to go down the route of having a Given method that takes a token to identify the code it should call. (A method if you will.) It’s possible in ruby to call a method starting with a Capital letter, but convention dictates those are usually class/module names (constants) rather than methods.

A little bit of hacking later and this is what I ended up getting working:

RSpec.feature "Admin: Posts" do
  scenario "Authoring a post" do
    Given :"I am logged in as an admin"
    When :"I publish a new post"
    Then :"I see the post on the homepage"
  end

  def_Given :"I am logged in as an admin" do
    @user = create :user, :admin
    login_as @user
  end

  def_When :"I publish a new post" do
    visit new_admin_post_path
    fill_in "Title", with: "RSpec feature specs"
    fill_in "Body", with: "Some piffle about feature specs"
    click_on "Publish!"
  end

  def_Then :"I see the post on the homepage" do
    visit root_url
    expect(page).to have_content("RSpec feature specs")
  end
end

Now there are two extra things that make this easier for me to write than underscored methods. Ruby doesn’t only allow :foo as a symbol; it also allows :"foo bar". You can then define a method based on that, even though it has spaces in the method name.

My text editor2 also autocompletes ruby symbols from partial matches, which makes it easy to write out what I want in the scenario, run the spec and find out what methods need defining, then define the methods using autocomplete to save copy/pasting everything.

By using actual methods for these, we get a couple of other happy accidents along the way. Most Ruby installs now include did_you_mean out of the box, which suggests methods similar to the one you called if your call results in a NoMethodError. This works quite nicely; you end up with something like

undefined method `When I pblish a new post' for #<RSpec::ExampleGroups::AdminPosts:0x00007faf1f9fc4c0>
Did you mean? When I publish a new post
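
The same “did you mean” behaviour falls out of any registry of steps keyed by free-text names. A Python sketch of the idea (an analogy I wrote, not the post’s actual helper; names are mine):

```python
import difflib

class Steps:
    """Toy Given/When/Then step registry keyed by free-text names,
    with a did_you_mean-style hint on typos."""

    def __init__(self):
        self._steps = {}

    def define(self, name, fn):
        self._steps[name] = fn

    def run(self, name):
        if name not in self._steps:
            # Suggest the closest known step name, like Ruby's did_you_mean.
            hint = difflib.get_close_matches(name, self._steps, n=1)
            suffix = f" Did you mean? {hint[0]}" if hint else ""
            raise NameError(f"undefined step {name!r}.{suffix}")
        return self._steps[name]()

steps = Steps()
steps.define("I publish a new post", lambda: "published")
try:
    steps.run("I pblish a new post")
except NameError as e:
    print(e)  # undefined step 'I pblish a new post'. Did you mean? I publish a new post
```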

And then if you just run it without implementing any of the helper methods at all, you get a nice NoMethodError telling you exactly what you need to implement:

undefined method `Given I am logged in as an admin' for #<RSpec::ExampleGroups::AdminPosts:0x00007fbd06598498>

The magic that makes all this work is in spec/support/given_when_then.rb, which is not terrible, but also probably not a great idea. 🙃

  1. 😿 ↩︎

  2. TextMate 2 ↩︎

April 21, 2020

Pete Corey (petecorey)

Guitar Chord Voicings with Prolog April 21, 2020 12:00 AM

Generating guitar chords is my thing. Over the years I’ve written thousands of lines of code and even more words all dedicated to the process of algorithmically generating and recommending guitar chords. In the spirit of generation, let’s throw another language and a few hundred additional words onto that stack!

Prolog is a logic-based programming language that, as I’m learning (disclaimer: I’m very new to Prolog), excels at representing logical relationships between data. The guitar fretboard is a never-ending landscape of interesting relationships ripe for exploration, so Prolog seems like a valuable tool to have in our arsenal as fretboard explorers.

Let’s see how we can use it.

The Magic of Prolog

One of the most mind blowing aspects of Prolog, from the perspective of someone new to the language, is the fluidity and ambiguity of inputs and outputs to “predicates” (think of them as functions).

For example, we can ask the built-in member/2 predicate if 1 is a member of the list [1, 2, 3]:

member(1, [1, 2, 3]).

And Prolog will tell us that yes, 1 is a member of [1, 2, 3]:

true.
We can also bind the first argument of our call to member/2 to a variable, and Prolog will happily report all possible values of that variable for which the predicate holds true:

member(X, [1, 2, 3]).

X = 1 ;
X = 2 ;
X = 3.

When the second argument of member/2 is [1, 2, 3], the first argument can either be 1, 2, or 3.

But we can take things even further. We can bind the second argument of our call to the member/2 predicate to a variable and ask Prolog for all of the lists that contain our first argument, 1:

member(1, X).

X = [1|_5982] ;
X = [_5980, 1|_5988] ;
X = [_5980, _5986, 1|_5994] ;
X = [_5980, _5986, _5992, 1|_6000] ...

This implementation of the Prolog runtime (SWI-Prolog 8.0.3) represents unbound variables with leading underscores. So the first possible value of X is 1 prepended to any other list. Another possible value of X is some value prepended to 1, prepended to any other list. And so on, forever.

The member/2 predicate simply defines the relationship between its two arguments. If one of those arguments is omitted, it can be recovered by applying or reversing that relationship appropriately.

Is your mind blown yet?

Chordal Relationships

Let’s write a predicate that accepts a few arguments: a description of our guitar’s fretboard in terms of tuning and number of frets, the quality of the chord we’re looking for, and the notes of a specific chord voicing given as string/fret tuples. Our predicate will either confirm or deny that the notes given live within the bounds of the fretboard and accurately depict the desired chord quality.

For example, on a normal guitar tuned to standard tuning, we could ask if fret 3 played on string 1 (starting from the lowest string), fret 2 played on string 2 and the open fret played on string 3 constitute a C major ([0, 4, 7]) chord voicing:

voicing([[0,40], [1,45], [2,50], [3,55], [4,59], [5,64]],
        18,
        [0, 4, 7],
        [[1, 3], [2, 2], [3, 0]]).

And the answer is yes, they do:

true.
If we assume that both our Tuning array and the final Voicing array are sorted in terms of string number we can build our predicate with a simple walk across the strings, analyzing each note in the chord along the way.

For every string on the fretboard, we first check that a note in our Voicing lives on that String. If it does, we need to make sure that the Fret being played on that String is between 0 and the number of Frets on the fretboard (a job for between/3). Next, we calculate the Pitch of the fretted note and verify that it’s a member of the chord Quality we’re checking for. Lastly, we remove that pitch from the set of qualities, and recurse to check the rest of the strings and the remaining notes in our chord voicing:

voicing([[String,Open]|Tuning], Frets, Quality, [[String,Fret]|Voicing]) :-
  between(0, Frets, Fret),
  Pitch is (Open + Fret) mod 12,
  member(Pitch, Quality),
  subtract(Quality, [Pitch], RemainingQuality),
  voicing(Tuning, Frets, RemainingQuality, Voicing).

If a string isn’t being played as part of the given chord voicing, we can simply move on to check the next string on the fretboard:

voicing([_|Tuning], Frets, Quality, Voicing) :-
  voicing(Tuning, Frets, Quality, Voicing).

Eventually, we’ll run out of strings to check. In that case, if the remaining set of notes in the chord voicing and the remaining set of pitches in our chord quality are both empty, we can say with confidence that the given set of notes is a valid voicing of the specified chord quality:

voicing([], _, [], []).

If we run out of strings and we’re still looking for either notes in the voicing, or pitches in the quality, we know that something has gone wrong, and the chord we’re looking at isn’t a valid voicing.

Altogether, our complete voicing/4 predicate looks like this:

voicing([], _, [], []).

voicing([_|Tuning], Frets, Quality, Voicing) :-
  voicing(Tuning, Frets, Quality, Voicing).

voicing([[String,Open]|Tuning], Frets, Quality, [[String,Fret]|Voicing]) :-
  between(0, Frets, Fret),
  Pitch is (Open + Fret) mod 12,
  member(Pitch, Quality),
  subtract(Quality, [Pitch], RemainingQuality),
  voicing(Tuning, Frets, RemainingQuality, Voicing).

We can write a helper predicate that assumes an eighteen fret guitar in standard tuning:

voicing(Quality, Voicing) :-
  voicing([[0,40], [1,45], [2,50], [3,55], [4,59], [5,64]],
          18, Quality, Voicing).

We can use our new voicing/4 or voicing/2 predicates to ask whether a certain set of notes played on the fretboard are a valid C major voicing:

voicing([0, 4, 7], [[1, 3], [2, 2], [3, 0]]).

And Prolog happily tells us that it is a valid voicing!

true.

Reversing the Relationship

We’ve seen that we can use our voicing/4 or voicing/2 predicate to check if a given set of notes on the fretboard are a valid voicing for a given chord quality. For example, we can ask if the notes [[1, 5], [2, 5], [4, 4], [5, 6]] represent a G7 ([5, 9, 0, 3]) chord voicing, and our Prolog program will confirm that they do.

But what else can we do? We were promised exploration!

Our voicing/4 implementation didn’t explicitly lay out the steps for constructing a chord voicing of a given quality, but it did define the relationships between a fretboard configuration, the quality of the chord we’re looking for, and the notes in a given chord voicing. Just like we reversed the relationships in member/2 to construct all possible lists containing 1, we can reverse the relationships defined in voicing/4 and find all possible voicings of a given chord quality!

All we have to do is leave the Voicing argument unbound when we call our voicing/2 predicate, and Prolog will reverse the relationship and spit out every possible voicing of our G7 chord spread across our fretboard:

voicing([5, 9, 0, 3], Voicing).

Voicing = [[2, 1], [3, 2], [4, 1], [5, 1]] ;
Voicing = [[2, 1], [3, 2], [4, 1], [5, 13]] ;
Voicing = [[2, 1], [3, 2], [4, 6], [5, 8]] ...

Awesome! This is basically the heart of Glorious Voice Leader compressed into ten lines of code.
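
For intuition about what Prolog is doing, here is the same relationship brute-forced imperatively in Python (a sketch I wrote assuming the same tuning encoding; it enumerates rather than reverses relationships, so it has none of Prolog’s elegance):

```python
from itertools import combinations, product

# [string, open-string pitch] pairs, as in the Prolog tuning list.
TUNING = [(0, 40), (1, 45), (2, 50), (3, 55), (4, 59), (5, 64)]
FRETS = 18

def is_voicing(quality, notes, tuning=TUNING, frets=FRETS):
    """Check that notes ([(string, fret), ...]) cover each pitch
    class in quality exactly once, mirroring voicing/4."""
    opens = dict(tuning)
    remaining = list(quality)
    for string, fret in notes:
        if string not in opens or not 0 <= fret <= frets:
            return False
        pitch = (opens[string] + fret) % 12
        if pitch not in remaining:
            return False
        remaining.remove(pitch)
    return remaining == []

def voicings(quality, tuning=TUNING, frets=FRETS):
    """Enumerate voicings by trying every choice of strings and
    every fretting on them -- brute force, unlike Prolog's search."""
    for strings in combinations([s for s, _ in tuning], len(quality)):
        for fretting in product(range(frets + 1), repeat=len(strings)):
            notes = list(zip(strings, fretting))
            if is_voicing(quality, notes, tuning, frets):
                yield notes

print(is_voicing([0, 4, 7], [(1, 3), (2, 2), (3, 0)]))  # True
```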

Future Work

We should be able to dig deeper into these relationships. In theory, we should be able to leave the Quality off of our call to voicing/2 and Prolog should tell us all of the possible qualities a given set of notes could be interpreted as.

Similarly, we should be able to leave the Tuning argument unbound, and Prolog should give us all of the possible tunings that would give us the given type of chord with the given voicing.

Both of these types of query sound extremely useful and interesting for someone exploring their fretboard and trying to deepen their understanding of the guitar, but they’re infeasible with my current implementation of the voicing/4 predicate. If we try either of them, Prolog will think forever and never give us an answer. If we trace through the execution we’ll see an enormous amount of time being wasted on inevitably doomed partial solutions.

If I were a better Prologger, I’m sure I could implement a version of the voicing/4 predicate that could give us these answers, but I’m just not there yet. Consider it future work.

Robin Schroer (sulami)

The Grumpy Developer's Guide to Meetings April 21, 2020 12:00 AM

While everyone is writing about remote meetings these days, I do not believe that successful remote meetings are actually meaningfully different from successful in-person meetings. (I’m glossing over videoconferencing because I don’t believe it is a meaningful hurdle to clear.)

Like many other developers, I loathe meetings, especially when they seem like a waste of valuable time. My philosophy is “if you have to set up a meeting, at least do it right”. Consequently, here are some rules for successful meetings, both remote and in-person:

The Best Meeting Is No Meeting

It is important to understand the tradeoffs involved when deciding to schedule a meeting. Meetings are synchronous discussions, which makes them useful for knowledge exchanges and decision making processes, but they are also time-intensive and disruptive for everyone involved.

The alternative to organising a meeting is an asynchronous conversation, for example on a shared document. (I find that doing all the same work as for a meeting on a shared document, but just leaving out the actual meeting, is a very effective way of using an asynchronous process.)

This can be faster than a meeting overall, as it allows participants to contribute in their own time, instead of trying to find a slot that fits everyone. In addition, it is much less disruptive than scheduling a meeting, especially for knowledge workers.

The only good reason for a synchronous meeting is making decisions requiring knowledge exchange and/or consensus. A good example would be planning a complicated feature, or cleaning up a backlog.

The Most Important Work Happens Before the Meeting

Every meeting needs an organiser. The organiser needs to define the expected outcome of the meeting, as well as gather all required information, and share all of these in the meeting agenda. The outcome should be tangible, like a decision or a set of tasks to be done.

The agenda is crucial, because the participants will likely have to do some preparation in advance to the meeting. If any specific work needs to be done before the meeting, like research or data retrieval, make sure to clearly assign tasks.

I recommend writing out the full agenda before scheduling the meeting, and then sharing the agenda with the calendar invitation. This allows everyone to schedule their preparation in their own time.

The Actual Meeting

During the meeting it is important to keep notes, usually in a shared document. You can designate a scribe, or just agree to all contribute some bullet points whenever possible. (Both approaches have pros and cons: a dedicated scribe leads to more detailed and coherent notes, but requires a person who will not be able to take part in the conversation, at least not effectively.)

You want to write down any conclusions reached, important points made, and further questions and future work to be done. Do not care too much about form, the meeting organiser should rewrite the notes after the meeting anyway.

The meeting is prime talking time, and you should treat it as such. Do not spend time watching someone carry out a task. (There is really no point in everyone watching a single person manipulate a JIRA board during a meeting.)

If you find during the meeting that you require more information, write this down as a future task instead of derailing the meeting.

The organiser should always work towards the defined outcome, but it often happens that some other discussion emerges as a precondition to reaching the outcome. There is a balance to be struck between discussion necessary to get to a meaningful outcome and veering too far off-topic.

Keep in mind that the time slot scheduled is more a rough guideline than a rule. If you can finish early, do so. If you need more time, and everyone involved has more time, take it. (Though maybe take a 5 minute break.)

Do not try to force an outcome if you did not get there in time. If you find at the start of the meeting that some necessary precondition has not been met do not be afraid to reschedule or cancel altogether.

The Alternative: The Ad Hoc Huddle

The rules above do not mean that you cannot talk to one another without ritual. Especially for small discussions between 2-4 people, ad hoc huddles can be useful.

As a rule of thumb, these should usually be happening within the next 24 hours, so usually “later today” or “tomorrow morning”.

You still need to define a goal, and you will want to write down any outcomes in some form, even if less formal than meeting notes. Even a Slack message with some findings in a relevant channel counts.

April 19, 2020

Derek Jones (derek-jones)

Predicting the future with data+logistic regression April 19, 2020 10:35 PM

Predicting the peak of data fitted by a logistic equation is attracting a lot of attention at the moment. Let’s see how well we can predict the final size of a software system, in lines of code, using logistic regression (code+data).
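
The mechanics of such a fit are straightforward. Here is a stand-in Python sketch (not the post’s actual code+data, which are linked above; the numbers below are synthetic, invented purely to illustrate fitting a logistic equation):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Three-parameter logistic: K is the final size (the plateau),
    r the growth rate, t0 the inflection point."""
    return K / (1 + np.exp(-r * (t - t0)))

# Synthetic "LOC over time" data standing in for real system sizes.
t = np.linspace(0, 5000, 50)
loc = logistic(t, 25_000_000, 0.002, 3000) \
    + np.random.default_rng(0).normal(0, 50_000, t.size)

# Fit the logistic equation; K is the predicted final size.
(K, r, t0), _ = curve_fit(logistic, t, loc, p0=[loc.max(), 0.001, t.mean()])
print(round(K / 1e6, 1))  # estimated plateau, in millions of lines
```

As the post goes on to show, the fit dutifully produces a plateau whether or not the data justifies one; the fitting software does what it has been told to do.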

First up is the size of the GNU C library. This is not really a good test, since the peak (or rather a peak) has been reached.

Growth of glibc, in lines, with logistic regression fit

We need a system that has not yet reached an easily recognizable peak. The Linux kernel has been under development for many years, and lots of LOC counts are available. The plot below shows a logistic equation fitted to the kernel data, assuming that the only available data was up to day: 2,900, 3,650, 4,200, and 5,000+. Can you tell which fitted line corresponds to which number of days?

Number of lines in the Linux kernel, against days since release, and four fitted logistic regression models.

The underlying ‘problem’ is that we are telling the fitting software to fit a particular equation; the software does what it has been told to do, and fits a logistic equation (in this case).

A cubic polynomial is also a great fit to the existing kernel data (red line to the left of the blue line), and this fitted equation can be extended into future (to the right of the blue line); dotted lines are 95% confidence bounds. Do any readers believe the future size of the Linux kernel predicted by this cubic model?


Predicting the future requires lots of data on the underlying processes that drive events. Modeling events is an iterative process. Build a model, check against reality, adjust model, rinse and repeat.

If the COVID-19 experience trains people to be suspicious of future predictions made by models, it will have done something positive.

asrpo (asrp)

Flpc is now self-hosted April 19, 2020 03:27 PM

The Forth Lisp Python Continuum (Flpc) can now compile its own source code! Get it from Github.

So instead of

$ python file1.flpc file2.flpc file3.flpc > output.f

you can now run

$ ./flpc precompiled/compiler.f
> push: output.f init_g
> push: file1.flpc compile_file
> push: file2.flpc compile_file
> push: file3.flpc compile_file

Henry Robinson (henryr)

Gray Failures April 19, 2020 05:04 AM

Huang et al., HotOS 2017, “Gray Failure: The Achilles Heel of Cloud-Scale Systems”

Detecting faults in a large system is a surprisingly hard problem. First you have to decide what kind of thing you want to measure, or ‘observe’. Then you have to decide what pattern in that observation constitutes a sufficiently worrying situation (or ‘failure’) to require mitigation. Then you have to decide how to mitigate it!

Complicating this already difficult issue is the fact that the health of your system is in part a matter of perspective. Your service might be working wonderfully from inside your datacenter, where your probes are run, but all of that means nothing to your users who have been trying to get their RPCs through an overwhelmed firewall for the last hour.

That gap, between what your failure detectors observe, and what clients observe, is the subject of this paper on ‘Gray Failures’, which are the failure modes that happen when clients perceive an issue that is not yet detected by your internal systems. This is a good name for an old phenomenon (every failure detector I have built includes client-side mitigations to work around this exact issue).

April 18, 2020

Chris Double (doublec)

Fun Factor Libraries April 18, 2020 11:00 AM

Factor is a programming language I've written about before; in the early days of Factor development I wrote a number of libraries and contributed to development. It's been a while since I've contributed, but I still use Factor. The development environment has a very Smalltalk-like feel to it, and it includes full documentation and browsable source code for its libraries.

This post isn't about Factor the language, but about some of the neat, fun libraries people have written that show off the graphical development system a bit.


The first example is an implementation of the game Minesweeper in Factor. A blog post by the author explains the implementation. To run it inside Factor, do the following:

"minesweeper" run

A new window will open showing the game. Help can be shown with:

"minesweeper" help


Another fun example is displaying XKCD comics inside the Factor REPL. The implementation is explained by the author here.

USE: xkcd
...comic displayed here...
...comic displayed here...


If it seems like all the examples I'm using are from the excellent re-factor blog - well, most of them are. This blog post from re-factor shows pulling historical facts from Wikipedia:

USE: wikipedia
USE: calendar

today historical-events.
...a list of historical events from wikipedia for today...

yesterday historical-births.
...a list of historical births from wikipedia for yesterday...

5 weeks ago historical-deaths.
...a list of historical deaths from wikipedia for five weeks ago...

The items in the list are graphical elements that can be manipulated. Left clicking on the coloured words will open a URL in the default web browser. Right clicking allows you to push the element on the Factor stack and manipulate it.

The calendar vocab has a lot of interesting words that allow doing calculations like "5 weeks ago".

Hacker News

There's a hacker-news vocabulary that provides words to list current articles on the Hacker News website. Like the previous Wikipedia example, the graphical elements are clickable objects:

USE: hacker-news

...list of top articles...

...list articles related to showing projects...

CPU 8080 Emulator

A number of years ago I wrote a CPU 8080 emulator in Factor and used this to implement a Space Invaders Emulator and then emulators for a couple of other 8080 arcade games, Balloon Bomber and Lunar Rescue. These examples require the original arcade ROMs and instructions for using them are in the online help:

"" run
...opens in a new window...
"rom.balloon-bomber" run
...opens in a new window...
"rom.lunar-rescue" run
...opens in a new window...

"" help
...displays help...

Gopher Implementation

Another magical implementation from the re-factor blog, a Gopher server and a graphical Gopher Client. This is a video I made of the gopher client on YouTube:

I also did a video that shows some Factor development tools on the running Gopher client to show how everything is live in Factor:

And More

There's much more buried inside Factor. The list of articles and list of vocabularies from the online help is a good way to explore. This help system is also available offline in a Factor install. By default many libraries aren't loaded when Factor starts but you can force loading everything using load-all:

load-all
...all vocabs are loaded - prepare to wait for a while...
save
...saves the image so when factor is restarted the vocabs remain loaded...

The benefit of doing this while developing is that all the online help, source and words are available via the built-in tools like "apropos", "usage", etc.

April 17, 2020

Gonçalo Valério (dethos)

Django Friday Tips: Feature Flags April 17, 2020 07:49 PM

This time, as you can deduce from the title, I will address the topic of how to use feature flags on Django websites and applications. This is incredibly useful functionality to have, especially if you need to continuously roll new code to production environments that might not be ready to be released.

But first, what are feature flags? Wikipedia tells us this:

A feature toggle (also feature switch, feature flag, …) is a technique in software development that attempts to provide an alternative to maintaining multiple branches in source code (known as feature branches), such that a software feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during runtime.


That's a pretty clear explanation, and it gives us a glimpse of the potential of having this capability in a given project. Exploring the concept a bit more uncovers a nice set of possibilities and use cases, such as:

  • Canary Releases
  • Instant Rollbacks
  • AB Testing
  • Testing features with production data

To dive further into the concept I recommend starting by reading this article, which gives you a very detailed explanation of the overall idea.

In the rest of the post I will describe how this kind of functionality can easily be included in a standard Django application. Over time many packages were built to solve this problem; however, most aren't maintained anymore, so for this post I picked django-waffle, given it's one of the few still in active development.

As an example scenario, let's imagine a company that provides a suite of online office tools and is currently in the process of introducing a new product while redoing the main website's design. The team wants some trusted users and the developers to have access to the unfinished product in production, and a small group of random users to view the new design.

With the above scenario in mind, we start by installing the package and adding it to our project by following the instructions present in the official documentation.

Now, picking the /products page, which is supposed to display the list of existing products, we can implement it this way:

from django.shortcuts import render

from waffle import flag_is_active

def products(request):
    if flag_is_active(request, "new-design"):
        return render(request, "new-design/product_list.html")
    return render(request, "product_list.html")

# templates/products.html
{% load waffle_tags %}

<!DOCTYPE html>
<html>
  <head><title>Available Products</title></head>
  <body>
    <ul>
        <li><a href="/spreadsheets">Spreadsheet</a></li>
        <li><a href="/presentations">Presentation</a></li>
        <li><a href="/chat">Chat</a></li>
        <li><a href="/emails">Marketing emails</a></li>
        {% flag "document-manager" %}
            <li><a href="/documents">Document manager</a></li>
        {% endflag %}
    </ul>
  </body>
</html>

You can see above that two conditions are checked while processing a given request. These conditions are the flags: models in the database with certain criteria that are evaluated against the provided request to determine whether they are active.

Now, in the database, we can configure the behavior of this code by editing the flag objects. Here are the two objects that I created (retrieved using the dumpdata command):

[
  {
    "model": "waffle.flag",
    "pk": 1,
    "fields": {
      "name": "new-design",
      "everyone": null,
      "percent": "2.0",
      "testing": false,
      "superusers": false,
      "staff": false,
      "authenticated": false,
      "languages": "",
      "rollout": false,
      "note": "",
      "created": "2020-04-17T18:41:31Z",
      "modified": "2020-04-17T18:51:10.383Z",
      "groups": [],
      "users": []
    }
  },
  {
    "model": "waffle.flag",
    "pk": 2,
    "fields": {
      "name": "document-manager",
      "everyone": null,
      "percent": null,
      "testing": false,
      "superusers": true,
      "staff": false,
      "authenticated": false,
      "languages": "",
      "rollout": false,
      "note": "",
      "created": "2020-04-17T18:43:27Z",
      "modified": "2020-04-17T19:02:31.762Z",
      "groups": [
        1,  # Dev Team
        2   # Beta Customers
      ],
      "users": []
    }
  }
]

So in this case new-design is available to 2% of the users and document-manager only for the Dev Team and Beta Customers user groups.
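
django-waffle handles the bookkeeping, but the heart of a percent-based flag is small. Here is a toy sketch of deterministic percent bucketing (my own illustration, not waffle's actual implementation; the helper name is hypothetical), showing why a given user consistently sees the same variant:

```python
import hashlib

def percent_flag_active(flag_name, user_id, percent):
    """Toy percent-rollout check: hash (flag, user) into a bucket 0..9999
    and compare against the threshold. Deterministic, so a user never
    flips between variants on reload. Not django-waffle's real code."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10000
    return bucket < percent * 100  # percent expressed as e.g. 2.0 for 2%

# Roughly 2% of a large user population should see the flag as active.
active = sum(percent_flag_active("new-design", uid, 2.0) for uid in range(100_000))
print(active)
```

The hash makes the 2% bucket stable across requests without storing per-user state, which is the usual trade-off behind this kind of rollout.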

And for today this is it.

April 16, 2020

Jeremy Morgan (JeremyMorgan)

I Took a COBOL Course and I Liked It April 16, 2020 02:02 AM

COBOL is in the news again. Millions of people are filing unemployment claims nearly all at once, and the systems to process them are failing. Why? They need to scale to unprecedented levels, they’re written in COBOL, and… we don’t have enough COBOL programmers. Here’s a look at the increase in searches for “COBOL programmers”: Most COBOL programmers are retired. The pipeline of new COBOL programmers is nearly nonexistent. Many are coming out of retirement just to help.

Pete Corey (petecorey)

Clapping Music with TidalCycles April 16, 2020 12:00 AM

I’m always looking for ways of making music with my computer that feels natural to me and fits with my mental model of music creation. As part of that search, I decided to try learning TidalCycles. Tidal is a language embedded into Haskell and designed to write and manipulate patterns to create music.

After getting set up and tinkering for a bit, I decided to try using Tidal to create an intentional piece of music. Tidal’s focus on patterns made me think of another musician obsessed with patterns and repetition, Steve Reich. That connection planted a seed in my mind, and I decided to try recreating Steve Reich’s Clapping Music with Tidal:

The meat of my implementation is in these three statements that describe the shifting rhythm:

repeatCycles 4
$ iter "12"
$ n "0 0 0 ~ 0 0 ~ 0 ~ 0 0 ~"

We start with our base pattern (n "0 0 0 ~ 0 0 ~ 0 ~ 0 0 ~"). Next, we use iter to split that pattern into the twelve variations we'll play throughout the piece. However, we want to repeat each variation for some number of cycles before moving on to the next. It turns out that repeatCycles is a nice way of accomplishing this. We repeat each variation for four cycles before moving on to the next.
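
Outside of Tidal, the scheduling those three lines describe can be sketched in plain Python (the x/~ pattern strings are my own notation, not Tidal's): player one holds the base pattern while player two steps through all twelve rotations and back to unison, holding each for four cycles.

```python
BASE = "xxx~xx~x~xx~"  # Reich's clapping pattern: x = clap, ~ = rest

def rotate(pattern, k):
    """Shift the pattern left by k steps (what Tidal's iter walks through)."""
    return pattern[k:] + pattern[:k]

def clapping_music(repeats=4):
    """Yield (player1, player2) patterns per cycle: 13 positions
    (twelve shifts plus the return to unison), each held `repeats`
    cycles, for 13 * repeats cycles in total."""
    for k in range(13):
        for _ in range(repeats):
            yield BASE, rotate(BASE, k % 12)

cycles = list(clapping_music())
print(len(cycles))  # 52 == 13 * 4
```

This is only a model of the structure, of course; Tidal's version also handles the audio.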

Because we’re repeating each variation of our pattern for four cycles, we’ll want to use seqP to sequence 13 * 4 cycles of both rhythms to start and end our piece in unison:

seqP [
    13 * 4,
    13 * 4,

When using seqP, it’s important to resetCycles to make sure we start on cycle 0.

And that’s all there is to it! It’s simple in hindsight, but I spent quite a while figuring this out, and learned from many mistakes along the way. The end result was worth it. Rock on, Steve Reich.

April 15, 2020

Jan van den Berg (j11g)

Ten pieces of software that removed roadblocks April 15, 2020 02:40 PM

Successful software is not defined by the number of lines of code or number of clever algorithms. More often than not, successful software is defined by how many roadblocks it removes for the user.

Sounds obvious, right? But it usually takes a few iterations before software gains critical mass. And for a (critical) mass number of users, you need to remove roadblocks. Roadblocks that power-users or early adopters don’t mind dealing with, but for regular users make all the difference.

Here are some examples of software that were not always the first, but did remove the right roadblocks and cleared the road for the masses.

Netscape (Mosaic)

Netscape is probably the most classic example of this. You already had the internet and the World Wide Web. And you had Gopher, FTP and SMTP and the likes. But critical mass? You needed something much simpler! Something that didn’t require typing in difficult commands after connecting to some remote server. But a graphical user interface where you could just point and click*. That’s what really brought the masses to the World Wide Web.

(*You could argue that Windows 95 did exactly the same, eleven years after the Mac did it).

VLC

Remember when you had to download specific video codecs for your media player? I do, and trust me, you don't want to do that. VLC was like a breath of fresh air because it took care of all that stuff.

VLC was not the first (or last) desktop video player. But it was the first that bundled all codecs and made sure you could pretty much throw every imaginable video format at it, and it would just play it! It removed that roadblock.

YouTube

Remember emailing videos? Sure that might work. But how can you be sure the receiver has the right codec (see above)? Or that the receiving email provider won’t mark the video as spam or too big for email? YouTube completely removed all barriers for uploading, sharing and viewing videos online in one go. Just from the browser and without a subscription. A lot of roadblocks: gone.

Spotify

CDs were already a thing of the past. But downloading, paying for and managing individual songs was still a lot of work. Spotify managed to figure this one out, and it turned out this is actually what a lot of people wanted. Every song available, at all times for a fixed fee? Talk about removing roadblocks.

WhatsApp

WhatsApp was not the first or only IM/chat software, not even by a long shot. So why did it succeed (in most parts of the world) as the number one smartphone chat app? Because they removed multiple roadblocks.

Early on WhatsApp put a lot of time and effort in making sure their software worked on any kind of cellphone, and specifically older, less powerful phones. Remember they offered a Java ME version? Because they understood chat is not a one-way street. It only works when everyone involved has the same access. Founder Jan Koum learned this from personal experience when trying to chat with family on the other side of the world on shabby internet connections.

And he and co-founder Brian Acton even carried around old phones for a long time. For this exact reason.

Slack

I never had a need for Slack (I’ve been using IRC for over 20 years), but I can clearly see what they did: they removed roadblocks.

While still offering pretty much the same core functionality as IRC: persistent group chat (emphasis on persistent). But without the need to choose servers, set up proxies, or use intimidating software and all that other difficult stuff. They took care of all that. Oh, and you can share animated gifs.

iPhone

The iPhone is an amalgamation of hardware and software, but it probably belongs on this list for all the same reasons. It was not the first smartphone, but it was the first that did everything right and didn't feel second grade (hardware- and software-wise). Before the iPhone there were many different smartphones in every shape and form; after the iPhone, every smartphone looked like the iPhone. That should tell you something.

Zoom

I have personally never used Zoom, and from what I have learned I probably won't any time soon. But I can clearly see what's happening here. All the (dirty) tricks they pulled with the installer and audio stack: it is all about removing roadblocks. You can (and should) be critical of these kinds of tricks, but you can't deny they made Zoom the current go-to app for video group chat, leaving Skype and the likes in the dust.

(I also think they have the best/easiest to remember name. That probably also helps. I could see it becoming a verb.)

C programming language

I may be going out on a limb here, but I think C's portability is undeniably a large factor in the success of C (among other things). Because C was highly portable, it removed many roadblocks in the years ahead, when many different hardware platforms all needed a higher-level language but did not want to reinvent the wheel. C removed that roadblock and subsequently became a dominant language.

GPL

Entering dodgy terrain here. Not actual software, but a license. There are *many* licenses out there. But the GPL was one of the first that removed many important roadblocks around how to share and distribute software, paving the way for a whole lot of other things. And it caused an explosion of software in the 80s and 90s (GCC, GNU/Linux et al.)


These are just some examples, but I always like to hear others! What software do you think removed a bunch of roadblocks to pave the way for mass adoption?

The post Ten pieces of software that removed roadblocks appeared first on Jan van den Berg.

April 14, 2020

Ponylang (SeanTAllen)

Last Week in Pony - April 14, 2020 April 14, 2020 09:38 PM

We return after a long absence with some very sad news.

Frederic Cambus (fcambus)

Chinese BBSes and Unicode ANSi Art April 14, 2020 08:50 PM

After doing my series on Taiwanese BBSes (first part, second part), I also took some screenshots from two Chinese BBS systems, but only found those files again recently.

Those screens were captured in March 2013 and cover Lilac and NewSMTH systems. While I could not find much English information about Lilac, which seems to be located in Hong Kong, there is a Wikipedia page about SMTH which appears to have had a complicated history.

Lilac Login Screen:


Lilac Main Menu Screens:



Lilac Goodbye Screens:




NewSMTH Welcome Screens:



NewSMTH Login Screen:


NewSMTH Main Menu Screens:







NewSMTH Goodbye Screens:


April 13, 2020

Jan van den Berg (j11g)

String Theory – David Foster Wallace April 13, 2020 08:56 PM

If you read this blog, you know DFW is one of my favorite writers. I even named my book app, in part, after him. So I could be short about String Theory — it's an absolute, pure delight to read — but, of course, I won't.

String Theory – David Foster Wallace (2016) – 150 pages

String Theory is a collection of five DFW essays about tennis. It mostly covers 90s era tennis — Sampras and Agassi — but it closes with 2006 Federer. With DFW's untimely death in 2008, I find it rather pleasing that by attending the 2006 Wimbledon final, Wallace got to witness, and write about, the phenomenon that Federer is. And writing this in 2020, it is even more remarkable that Federer is still playing and competing with the best. Think about that for a second, will you?

That said, his piece on Federer is not the best in this collection. But with Wallace that doesn’t mean it’s bad, because for any other writer such an essay would still be the summit of their writing career.

Though it seems with Federer that Wallace was, understandably, genuinely awestruck and smitten in such a way that he finds it hard to describe what makes Federer so special. And that probably says more about Federer’s remarkable talent than it does about Wallace’s.

But it is not just that which sets this essay apart from the others for me; it is also that there is less of Wallace himself in this specific piece. His surprised, bemused and bewildered observations of sometimes unrelated random events or encounters, sprinkled through his essays, either in footnotes or the main body, are what make his writing so enjoyable. You can find this in most essays, but just a little bit less in the Federer one.

Take his complete letdown by the bland biography of famous tennis player Tracy Austin. I find it hilarious because it bothers him so much, even though that (hilarity) was not the goal.
Because, mind you: in the end, even from such a dull and uninspiring sports biography, Wallace manages to ask valid questions about genius and talent, and he lets you know the premise was not to be agitated and write amusingly about that, but to ask questions.

The essay about Michael Joyce might as well be the greatest thing ever written about tennis (or dare I say, sports in general?). It’s a complex and nuanced, highly technical, hyper personal but still general analysis of what constitutes greatness. He makes you see things with different eyes, while he is learning to see it for himself. Just amazing.

The lack of these personal observations in the Federer essay is a breeding ground for questions. Was this deliberate? Does this mean he was bored with this style? Was it a style? Questions you can endlessly debate.

Fact is: never has there been a greater collection of stories about the game of tennis than what you'll find in String Theory.

The post String Theory – David Foster Wallace appeared first on Jan van den Berg.

April 12, 2020

Derek Jones (derek-jones)

Motzkin paths and source code silhouettes April 12, 2020 10:25 PM

Consider a language that just contains assignments and if-statements (no else arm). Nesting level could be used to visualize programs written in such a language; an if represented by an Up step, an assignment by a Level step, and the if-terminator (e.g., the } token) by a Down step. Silhouettes for the nine possible four line programs are shown in the figure below (image courtesy of Wikipedia). I use the term silhouette because the obvious terms (e.g., path and trace) have other common usage meanings.

Number of distinct silhouettes for a function containing four statements

How many silhouettes are possible, for a function containing n statements? Motzkin numbers provide the answer; the number of silhouettes for functions containing from zero to 20 statements is: 1, 1, 2, 4, 9, 21, 51, 127, 323, 835, 2188, 5798, 15511, 41835, 113634, 310572, 853467, 2356779, 6536382, 18199284, 50852019. The recurrence relation for Motzkin numbers is (where n is the total number of steps, i.e., statements):

(n+2)m_n = (2n+1)m_{n-1}+3(n-1)m_{n-2}
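
The recurrence is easy to check numerically; a small Python sketch that reproduces the sequence listed above:

```python
def motzkin(n_max):
    """Motzkin numbers via the recurrence
    (n+2)*m[n] = (2n+1)*m[n-1] + 3*(n-1)*m[n-2], with m[0] = m[1] = 1.
    The division is exact at every step."""
    m = [1, 1]
    for n in range(2, n_max + 1):
        m.append(((2 * n + 1) * m[n - 1] + 3 * (n - 1) * m[n - 2]) // (n + 2))
    return m

print(motzkin(20))
```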

Human written code contains recurring patterns; the probability of encountering an if-statement, when reading code, is around 17% (at least for the C source of some desktop applications). What does an upward probability of 17% do to the Motzkin recurrence relation? For many years I have been keeping my eyes open for possible answers (solving the number theory involved is well above my mathematics pay grade). A few days ago I discovered weighted Motzkin paths.

A weighted Motzkin path is one where the Up, Level and Down steps each have distinct weights. The recurrence relationship for weighted Motzkin paths is expressed in terms of the number of colored steps, where: d is the number of possible colors for the Level steps, and c is the number of possible colors for the Down steps; Up steps are assumed to have a single color:

(n+2)m_n = d(2n+1)m_{n-1}+(4c-d^2)(n-1)m_{n-2}

setting: c=1 and d=1 (i.e., all kinds of step have one color) recovers the original relation.

The different colored Level steps might be interpreted as different kinds of non-nesting statement sequences, and the different colored Down steps might be interpreted as different ways of decreasing nesting by one (e.g., a goto statement).

The connection between weighted Motzkin paths and probability is that the colors can be treated as weights that add up to 1. Searching on “weighted Motzkin” returns the kind of information I had been looking for; it seems that researchers in other fields had already discovered weighted Motzkin paths, and produced some interesting results.

If an automatic source code generator outputs the start of an if statement (i.e., an Up step) with probability a, an assignment (i.e., a Level step) with probability b, and terminates the if (i.e., a Down step) with probability c, what is the probability that the function will contain at least n-1 statements? The answer, courtesy of applying Motzkin paths in research into clone cell distributions, is:

P_n = \sum_{i=0}^{\lfloor (n-2)/2 \rfloor} \binom{n-2}{2i} C_{2i}\, a^i b^{n-2-2i} c^{i+1}

where: C_{2i} is the 2i'th Catalan number, and \lfloor \cdot \rfloor denotes truncation (the floor function); code for an implementation in R.
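
The linked R code isn't reproduced here, but the formula, read exactly as stated (with C_{2i} the 2i'th Catalan number), translates directly to Python; a sketch, where a, b, c are the Up, Level and Down probabilities:

```python
from math import comb

def catalan(k):
    # k'th Catalan number: C_k = (2k choose k) / (k+1)
    return comb(2 * k, k) // (k + 1)

def p_at_least(n, a, b, c):
    """Probability, per the formula above, that a randomly generated
    function reaches at least n-1 statements, where an if (Up step) is
    emitted with probability a, an assignment (Level) with b, and an
    if-terminator (Down) with c."""
    return sum(
        comb(n - 2, 2 * i) * catalan(2 * i) * a**i * b**(n - 2 - 2 * i) * c**(i + 1)
        for i in range((n - 2) // 2 + 1)
    )
```

For n = 2 the sum collapses to a single term equal to c, which matches the intuition that a two-step path is just an immediate terminator.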

In human written code we know that a != c, because the number of statements in a compound-statement roughly has an exponential distribution (at least in C).

There has been some work looking at the number of peaks in a Motzkin path, with one formula for the total number of peaks in all Motzkin paths of length n. A formula for the number of paths of length n, having k peaks, would be interesting.

Motzkin numbers have been extended to what is called higher-rank, where Up steps and Level steps can be greater than one. There are statements that can reduce nesting level by more than one, e.g., breaking out of loops, but no constructs increase nesting by more than one (that I can think of). Perhaps the rather complicated relationship can be adapted to greater Down steps.

Other kinds of statements can increase nesting level, e.g., for-statements and while-statements. I have not yet spotted any papers dealing with the case where an Up step eventually has a corresponding Down step at the appropriate nesting level (needed to handle different kinds of nest-creating constructs). Pointers welcome. A related problem is handling if-statements containing else arms, here there is an associated increase in nesting.

What characteristics does human written code have that results in it having particular kinds of silhouettes? I have been thinking about this for a while, but have no good answers.

If you spot any Motzkin related papers that you think could be applied to source code analysis, please let me know.

April 11, 2020

Pierre Chapuis (catwell)

[Quora] Explaining classes to a 10 year old April 11, 2020 01:00 PM

Continuing my Quora answers series with this question which I answered on June 26, 2012:

How would you explain the concept of a "class" in Python to a 10 year old?

You cannot distinguish the concept of "class" from the concept of "instance". I think OOP is often taught the wrong way around, paradoxically because it is taught with languages that have support for class-based Object Oriented Programming, such as Python (or Java, for that matter). So excuse me, but I will use another language (Lua, but you don't need to know it to understand) to explain it. (Note: this will be bad Lua on purpose; the point is not to teach Lua, it is to explain OOP.)

Since you wrote "to a 10 year old" let's proceed with examples.

Imagine you come from another planet and you do not know what cats are. You encounter something small that purrs. You decide to name it Sam. Later on, you see something else very similar, except it is bigger, and you name it Max.

Let us describe Sam and Max in Lua.

sam = {
    name = "Sam",
    size = "small",
}

max = {
    name = "Max",
    size = "big",
}

Now we said that they purr, so let's define purring:

purr = function(self)
    print(self.name .. " purrs!")
end

purr is a function whose first argument, called self, represents the thing that purrs. For instance you could make Max purr like this:

purr(max)

Now you could stop here, or you could see purring as a property of Sam and Max. To represent that we could make purr a "method" of the "objects" Sam and Max:

sam.purr = purr
max.purr = purr

Now with the Lua syntax we could also make Max purr like this:

max:purr()

Which is a short way to write this:

max.purr(max)

Since we said that max.purr = purr it works as expected.

After some time on Earth you realize there are lots of things like Sam and Max. Moreover there are lots of things Sam, Max and their friends do, such as sleep on keyboards. They also have things in common such as the fact they have two eyes.

You grow tired of saying: "Max purrs. Max has two eyes. (...). Sam purrs. (...)". It would be much simpler to give a name to the set of Sam, Max and their friends (for instance "Cats") and say "Cats purr. Cats have two eyes.".

Note that "Cats" is nothing physical, it is an idea, a category you have created for Sam, Max and their friends in order to be able to express things about them in an easier way.

Let's switch to Python to show how this is done now:

class Cat:
    eyes = 2

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def purr(self):
        print(self.name + " purrs!")

And how you use it:

sam = Cat(name="Sam", size="small")
sam.purr()

Note that we have never said explicitly that Sam can purr. He can purr because he is a Cat.

In this example:

  • Cat is a class, i.e. a category of objects;

  • Sam and Max are instances of the Cat class;

  • purr is a method of the Cat class, i.e. not much more, conceptually, than a function that takes a Cat instance as its first argument, plus some syntactic sugar (i.e. special notation) to call it.
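
Put together, the Python half of the example runs end to end; a self-contained sketch (with Max bound to max_ only to avoid shadowing Python's builtin):

```python
class Cat:
    """A class: the category we invented for Sam, Max and their friends."""
    eyes = 2  # shared by every Cat

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def purr(self):
        # A method: conceptually a function taking a Cat instance first.
        print(self.name + " purrs!")

sam = Cat(name="Sam", size="small")
max_ = Cat(name="Max", size="big")

sam.purr()       # prints "Sam purrs!"
Cat.purr(max_)   # the same call without the syntactic sugar: prints "Max purrs!"
```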

By the way, this answer got one of the best comments I ever got on a Quora answer:

Sam and Max are not cats!


April 10, 2020

Jan van den Berg (j11g)

The Trial – Franz Kafka April 10, 2020 11:13 AM

Max Brod is probably the world's greatest publicist. He famously refused his writer friend's dying wish to destroy all his work after his passing.

This friend was of course, Franz Kafka. And against Kafka’s wishes Max Brod did publish his works and subsequently Kafka became known to the world as an absolute literary genius.

The Trial – Franz Kafka (1925) – 286 pages

The Trial

The only other Kafka I had read before this one was The Metamorphosis, which I liked a lot. So I hate to admit it, but I was a bit bored reading The Trial.

The Metamorphosis is more concise, and much more over the top. Which absolutely works. The Trial however, is much tamer.

Sure, I can see what's happening and what Kafka is trying to accomplish, and the ideas and underlying themes he's playing with. And I really like the dreamlike/nightmarish parallel world Kafka created for the main character. This is of course his well-known hallmark: creating these typical Kafkaesque surreal settings. And I thought the doorkeeper story within the story was maybe the most interesting part.

Maybe it was my stiff Dutch translation, but overall I had a hard time getting into it.


So mr. blogger, you dare to call Kafka overrated? Not exactly, but I cannot let go of the idea that part of the appeal is Kafka’s elusiveness.

A prolific perfectionist writer who does not want to be published? Who had a very troubled relationship with his abusive father? A writer who died young of malnutrition? A writer who only finds success after his death? A writer whose books are posthumously (sometimes) scraped together from bits and pieces of scrap paper, so you can forever fawn over the true meaning and interpretation of it?

This is all very much right up the literary world's alley.

No doubt The Trial has had great cultural and literary impact. But if you hand this book to someone unaware of all this, I think they might not enjoy it as much as critics tend to think.

Nonetheless these are two pretty good videos explaining what makes Kafka an interesting writer. And I am still interested in his other stories.

The post The Trial – Franz Kafka appeared first on Jan van den Berg.

April 09, 2020

Jan van den Berg (j11g)

Cruddiy: a no-code Bootstrap CRUD generator April 09, 2020 07:18 PM

Sometimes you need to give people access to a MySQL database to do some basic tasks. They should be able to Create, Read, Update or Delete database records. And you probably know the user (the preferred use-case), but you don’t want to give them access to phpMyAdmin, which is often too difficult, let alone command line access. But you also don’t want to handcode the same PHP CRUD pages again!

Now you can use Cruddiy (CRUD Do It Yourself). To quickly generate clean CRUD pages with zero code.

You’ve probably seen pages like this a thousand times before. And now you can make them with a few clicks.

Pages like this are generated without writing a single line of code. Proper titles, pagination, actions and sorting included.

I got tired of programming the same pages over and over again for some simple database actions. So in classic yakshaving fashion I decided to automate this, and built a generator that generates PHP CRUD pages.

Cruddiy is a no-code PHP generator that will generate PHP Bootstrap CRUD pages for your MySQL tables.

Cruddiy output is an /app folder with everything you need. You can move this folder anywhere and delete Cruddiy if you like.

Most MVC frameworks (e.g. Symfony, Django, Yii2) are also able to generate CRUD pages for you. I have used all of these. But more often than not you end up with 80MB of code (no joke), with all kinds of dependencies that you need to deploy and maintain, for just a couple of PHP pages! This rubs me the wrong way. Of course there are many PHP CRUD generators around, but they are either not libre, built on top of other, larger frameworks, or lacking something else I was looking for. So I first and foremost built Cruddiy for myself.

Goals and characteristics

  • Simple
    • No dependencies, just pure PHP (and Bootstrap from CDN).
    • Written in PHP and output in PHP. So when the generator runs correctly, your generated app will run correctly.
  • Clean
    • Just generate what’s needed, nothing else.
  • Small
    • If it wasn’t obvious from the above, the app it generates should be small. Kilobytes not megabytes.
  • Portable
    • Cruddiy generates everything in one single /app folder. You can move this folder anywhere. You don’t need Cruddiy after generating what you need.
  • Bootstrap
    • Bootstrap looks clean and is relatively simple and small. I use Bootstrap 3 because I like/know it better than 4.


Why PHP?

  • Love it or hate it: but PHP is ubiquitous. You can download Cruddiy on most webservers and you’re good to go. wget the zip -> unpack -> surf to the folder in your browser and follow instructions.
  • Cruddiy is of course a sort of templating engine. It stamps out code based on templates. And if PHP is anything, it is by default a template engine itself. So it’s a pretty good language for this kind of thing.

Cruddiy does not follow the MVC paradigm!

Yes, look elsewhere if you need this. This is not a framework.

Your code is full of dirty hacks

I might do quite a bit of array mangling and string replacement, but the PHP pages Cruddiy generates are as clean as they come. And when you’re done generating pages, you can just delete Cruddiy. It completely builds a self-contained portable app that will run on almost any webserver with PHP (i.e. most).

Your generated code is incomplete

At the moment what’s lacking is error checking on database inserts/updates (all fields are required, and it doesn’t check field types: integers vs. dates etc.). These will throw general errors or do nothing at all. I will probably improve this, but for most use-cases (see above) this should not be a problem. The generated code does use prepared statements and should not be vulnerable to SQLi. But hey, drop me a line if you find something!

Next features?

I might add these things:

  • Darkmode
  • Bootstrap 4 theme
  • Export to CSV or XLS (users probably want this more often than not)
  • Rearrange column order
  • Search records (at the top of the page)
  • User registration (simple table with username and password: .htaccess works for now)
  • Define table relations (use for cascading deletes etc.)
  • More specific field types (ENUM = drop-down list etc.)
  • More and better input validation
  • Catch more database errors
Cruddiy in action.

The post Cruddiy: a no-code Bootstrap CRUD generator appeared first on Jan van den Berg.

Use find (1) as a quick and dirty duplicate file finder April 09, 2020 01:24 PM

Run the following two commands in bash to get a listing of all duplicate files (from a directory or location). This can help you clean out duplicate files that sometimes accumulate over time.

The first command uses find to print all files (and specific attributes) from a specific location to a file, prefixing each file name with the file’s size. This way all files with the same name and the same size are grouped together, which is usually a strong indicator that the files are identical.

When you run the second command you will get a sorted list of all actual duplicates, grouped together. This way, you can quickly pick out similar files and manually choose which ones to keep or delete.

find . -type f -printf "%s-%f\t %f %c\t %p\n" >> /tmp/findcmd

for i in `sort -n /tmp/findcmd|awk '{print $1}'|uniq -cd|sort -n|awk '{print $2}'`; do grep -F "$i" /tmp/findcmd; done

The output will look something like this; you can instantly tell which files are duplicates, based on size, name and/or timestamp.

1067761-P4270521.JPG     P4270521.JPG Wed Apr 27 18:05:04.0000000000 2011        ./Backups Laptops/Ri-janne/2011 Diversen
1067761-P4270521.JPG     P4270521.JPG Wed Apr 27 18:05:04.0000000000 2011        ./Backups Laptops/Ri-janne/2011 camera
1067898-IMG_3418.JPG     IMG_3418.JPG Thu Aug 28 20:08:28.0000000000 2008        ./Piks/2008/Vakantie USA 2008/Dag 7 Louisville Shopping
1067898-IMG_3418.JPG     IMG_3418.JPG Thu Aug 28 19:08:28.0000000000 2008        ./Backups Laptops/Ri-janne/2008 USA
1067969-P9180184.JPG     P9180184.JPG Sat Sep 18 17:45:52.0000000000 2010        ./Backups Laptops/Ri-janne/2010 Diversen
1067969-P9180184.JPG     P9180184.JPG Sat Sep 18 17:45:52.0000000000 2010        ./Backups Laptops/Ri-janne/2010 uitzoeken
1068244-100_2962.jpg     100_2962.jpg Thu Jul 17 18:18:52.0000000000 2008        ./.Trash-1000/files/Mijn afbeeldingen/Italia 09/Greece '08
1068244-100_2962.jpg     100_2962.jpg Thu Jul 17 18:18:52.0000000000 2008        ./Backups Laptops/Jan/Mijn documenten/Mathea/Mijn afbeeldingen/Italia 09/Greece '08
1068284-DSC_7640.JPG     DSC_7640.JPG Sat Apr 26 14:47:58.0000000000 2014        ./Piks/2014/20140426 KDag
1068284-DSC_7640.JPG     DSC_7640.JPG Tue Apr 29 21:56:54.0000000000 2014        ./Piks/2014/20140426 Koningsdag
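The same grouping idea can also be sketched outside the shell. Here is a minimal Python version (the function name and structure are my own, not from the post) that groups files by (size, name), just like the find/sort pipeline:

```python
import os
from collections import defaultdict

def find_candidate_duplicates(root):
    """Group files under root by (size, name); any group with more
    than one entry is a likely set of duplicates."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or are unreadable
            groups[(size, name)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

Like the shell pipeline, this only flags candidates; comparing actual contents (e.g. by hashing) is still needed to be completely certain.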

The post Use find (1) as a quick and dirty duplicate file finder appeared first on Jan van den Berg.

Pierre Chapuis (catwell)

[Quora] What is the call metamethod in Lua? April 09, 2020 08:40 AM

From 2011 to 2014, I used to post answers on Quora. I don't anymore, because I don't really like what the website has become. I have a copy of some of my answers here but someone commented on one of my answers that it should be available more prominently on the Web, so I decided to repost a few of my answers here, starting with this one.

The original question was:

I'm really new to Lua and relatively new to programming, so kindly excuse me if I say something stupid.

I have a table named x and its metatable named y. When I have a __call method defined for the metatable y, then I can call x() but if I have a __call for x then I can not call x().

What is __call used for? How does it work, and what are some examples of usage?

I answered it on February 25, 2013.

__call is a metamethod, that means it is meant to be defined in a metatable. A __call field added to a regular table (x in your example) does nothing.

The role of __call is to make something that is not a function (usually a table) act like a function. There are a few reasons why you may want to do that. Here are two examples.

The first one is a memoizing factorial function. In Lua you could write a recursive factorial like this:

local function fact(n)
    if n == 0 then
        return 1
    else
        return n * fact(n - 1)
    end
end

Note: this is not a good way to write a recursive factorial because you are not taking advantage of tail calls, but it's enough for what I want to explain.

Now imagine your code uses that function to calculate the factorials of numbers from 1 to N. This would be very wasteful since you would calculate the factorial of N once, the factorial of N-1 twice, and so on. You would end up computing approximately N²/2 factorials.

Instead you could write that:

local fact
fact = setmetatable(
    {[0] = 1},
    {
        __call = function(t, n)
            if not t[n] then
                t[n] = n * fact(n - 1)
            end
            return t[n]
        end
    }
)

It is an implementation of factorial that memoizes the results it has already computed, which you can call like a function. You can use it exactly like the previous implementation of factorial and get linear complexity.
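For readers more familiar with Python, the same idea (a memoizing object that can be called like a function) maps onto Python's __call__ protocol. This parallel sketch is my own, not part of the original answer:

```python
class MemoFact:
    """A callable object that caches factorials it has already computed,
    analogous to the Lua table with a __call metamethod."""

    def __init__(self):
        self.cache = {0: 1}

    def __call__(self, n):
        if n not in self.cache:
            self.cache[n] = n * self(n - 1)
        return self.cache[n]

fact = MemoFact()
print(fact(5))  # 120
```

As in the Lua version, calling fact(N) fills the cache for every value up to N, so repeated calls run in linear rather than quadratic time.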

Another use case for __call is matrices. Imagine you have a matrix implementation that works like that:

local methods = {
    get = function(self, i, j)
        return self[i + 1][j + 1]
    end,
}

local mt = {__index = methods}

local new_matrix = function(t)
    return setmetatable(t, mt)
end

You can use it like that:

local M = new_matrix({ {1, 2}, {3, 4} })
local v = M:get(0, 1)
assert(v == 2)

However scientists would probably expect something like this:

local v = M(0, 1)
assert(v == 2)

You can achieve that thanks to __call:

local mt = {
    __index = methods,
    __call = function(self, i, j)
        return self:get(i, j)
    end,
}

I hope this gives you enough information to understand how you can use __call. A word of warning though: like most other metamethods, it is useful, but it is important not to abuse it. Simple code is better :)

April 07, 2020

Pierre Chapuis (catwell)

Changing the SSH port on Arch Linux April 07, 2020 10:40 AM

April 2020 update

This article is out of date. Arch Linux stopped shipping OpenSSH with socket activation due to the risk of DoS attack. Now you can just set Port in sshd_config as usual.

Original article

I often change the default SSH port from 22 to something else on servers I run. It is kind of a dangerous operation, especially when the only way you have to connect to that server is SSH.

The historical way to do this is editing sshd_config and setting the Port variable, but with recent versions of Arch Linux and the default configuration, this will not work.

The reason is that SSH is configured with systemd socket activation. So what you need to do is run sudo systemctl edit sshd.socket and set the contents of the file to:

[Socket]
ListenStream=
ListenStream=MY_PORT
where MY_PORT is the port number you want.

I hope this short post will avoid trouble for other people, at least it will be a reminder for me the next time I have to setup an Arch server...

Henry Robinson (henryr)

Availability in AWS' Physalia April 07, 2020 05:04 AM

Brooker et al., NSDI 2020, “Physalia: Millions of Tiny Databases”

Some notes on AWS’ latest systems publication, which continues and expands their thinking about reducing the effect of failures in very large distributed systems (see shuffle sharding as an earlier and complementary technique for the same kind of problem).

Physalia is a configuration store for AWS’ Elastic Block Store (i.e. network-attached disks). EBS disks are replicated using chain replication, but the configuration of the replication chain needs to be stored somewhere - enter Physalia.

April 05, 2020

Derek Jones (derek-jones)

Comments on the COVID-19 model source code from Imperial April 05, 2020 08:50 PM

At the end of March a paper modelling the impact of various scenarios on the spread of COVID-19 infections, by the MRC Centre for Global Infectious Disease Analysis at Imperial College, appears to have influenced the policy of the powers that be. This group recently started publishing their modelling code on Github (good for them).

Most of my professional life has been spent analyzing other people’s code, for one reason or another (mostly Fortran, then Pascal, and then C). I had heard that the Imperial software was written in C, but the released code is written in R (as of six hours ago there is the start of a Python version). Ok, I can work with R, but my comments will be general, since I don’t have lots of in depth experience reading R code.

The code comes from a research context, and is evolving, i.e., some amount of messiness is to be expected.

There is not a lot of code to talk about (248 lines setting things up, 111 lines for a Stan model, 371 lines of plotting code, and 85 lines of utility code). The analysis is performed by creating a model using the Stan statistical inference language (in which the high level structure of the problem is specified, compiled to a lower level form and then run; the Stan language is very similar to R). These days lots of problems are coded using a relatively small number of lines that call fancy libraries to do the heavy lifting. It is becoming rare to have to write tens of thousands of lines of code to solve a problem.

I have two points to make about the code, all designed to reduce the likelihood of mistakes being made by the person working on the source. These points mainly apply to the Stan code, because that is where the important stuff happens, but are equally applicable to all code.

  • Numeric literals are embedded in the code, values include: 2.4, 1.0, 0.5, 0.03, 1e-5, and 1e-9. These values obviously mean something to the person who wrote the code, and they can probably be interpreted by experts in the spread of virus infections. But why are they scattered about the code, rather than appearing together (as a sequence of assignments to variables with meaningful names)? Having all the constants in one place makes it easier to spot when a mistake has been made, e.g., one value has been changed without a corresponding change in another value; it also makes it easier for people new to the code to figure out what is going on,
  • when commenting out code, make it very obvious, e.g., have /********************** on its own line, and *****************************/ on its own line. Using just /* and */ makes it easy to miss that code has been commented out.

Why have they started a Python implementation? Perhaps somebody on the team is more comfortable working with Python (when deadlines loom, it is always best to go with what you know).

Having both an R and Python version is good, in that coding mistakes are likely to show up as inconsistencies in the results produced. It’s always good to have the output of two independently written programs to compare (apart from the fact it may cost twice as much).

The README mentions performance issues. I imagine that most of the execution time is spent in the Stan code, so R vs. Python is not a performance issue.

Any reader with expertise tuning Stan models for performance might like to check out the code. I’m sure the Imperial folk would be happy to hear about worthwhile speed-ups.


Related: the R source code of the EuroMOMO model, which aims to “… explain number of deaths by a baseline, influenza activity and extreme ambient temperature.”

Bogdan Popa (bogdan)

Converting byte arrays to UUIDs in Postgres April 05, 2020 04:00 PM

For a project that I’m working on, I have a custom flake id spec that allows me to generate unique, sortable identifiers across computers without any sort of synchronization. The ids themselves can be encoded down to 16 bytes and I wanted to store them in Postgres. A good way to do that is to leverage Postgres’ UUID data type, which lets you efficiently store any 16 byte quantity in a way that can be indexed reasonably well.
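The byte-to-UUID mapping on the client side can be sketched with Python's standard uuid module (the 16-byte value here is a stand-in, not the post's actual flake id format):

```python
import uuid

# Stand-in for a 16-byte flake id; the real encoding is application-specific.
raw = bytes(range(16))

# Any 16-byte string maps one-to-one onto a UUID, which Postgres can
# store and index natively in a uuid column.
flake_uuid = uuid.UUID(bytes=raw)
print(flake_uuid)  # 00010203-0405-0607-0809-0a0b0c0d0e0f

# The mapping is reversible, so the original bytes can always be recovered.
assert flake_uuid.bytes == raw
```

Because the flake ids are time-sortable, the resulting UUID column also sorts roughly by creation time, which plays nicely with index locality.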

Richard Kallos (rkallos)

What my choices told me about my priorities April 05, 2020 02:57 AM

This was a useful exercise for me. I think I’ll try and check in again sometime soon. Here were my thoughts from today.

I was prioritizing long-term storage

  • When I started using plain text files to store my notes as Org mode files (which I am trying to replace)
  • When I started using TiddlyWiki, which can be opened/reused wherever I have access to a browser that executes Javascript. It doesn’t necessarily have the longevity of plain text, but it’s close.

I was prioritizing (re-)reading and reflection

  • When I stopped transcribing my notes into Org mode files, where I would never happen across them unless I specifically went looking for them.
  • When I started transcribing my handwritten notes into TiddlyWiki, breaking my notes into smaller, more reusable chunks (tiddlers), organizing them all, and making small changes to TiddlyWiki’s interface to encourage exploration.

I am prioritizing focus

  • When I step away from the computer to think with pen and paper.
  • When I write notes by hand while reading papers or watching talks.
  • When I close Slack after realizing I’ve wasted too much time clicking on unread channels and not enough time working.

April 03, 2020

Pierre Chapuis (catwell)

Two-factor authentication with pass and oathtool April 03, 2020 04:20 PM

If you're like me, you don't want to depend on your phone to log into a website, and you wish your favorite password manager would support 2FA. Well, it can.

When asked to setup 2FA on a website, get a text code. If the website doesn't give you that option, just use zbar. For instance, with the QR code from the GitHub documentation:

$ zbarimg totp-click-enter-code.png
scanned 1 barcode symbols from 1 images in 0.03 seconds

Once you get the secret, put the command line to generate a code using oathtool in 2fa/github in pass like this:

oathtool --totp --base32 qmli3dwqm53vl7fy

Finally, add this to your .bashrc (or the equivalent for whatever shell you use):

2fa () { eval $(pass 2fa/$1) ; }

You can now get your 2FA codes like this:

$ 2fa github

All the tools used in that article are available as packages in the Arch Linux repositories.
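Under the hood, the code oathtool prints is just RFC 6238 TOTP: an HMAC-SHA1 of the current 30-second counter, dynamically truncated to a few decimal digits. A minimal Python sketch (my own illustration, not part of the original setup; a base32 secret like the one above would need base64.b32decode first):

```python
import hashlib
import hmac
import struct
import time

def totp(secret, for_time=None, step=30, digits=6):
    """RFC 6238 TOTP: HMAC-SHA1 over the time-step counter,
    dynamically truncated to a short decimal code."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890" at t=59
print(totp(b"12345678901234567890", for_time=59, digits=8))  # 94287082
```

Nothing here is secret sauce, which is exactly why storing the TOTP seed in pass works: whoever holds the seed can derive every code.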


It was a post on Lobsters that prompted me to post this. Someone from the comments and a former colleague on Twitter told me about a pass extension I didn't know about which does almost the same thing.

Also, some people think that putting 2FA codes in a password manager defeats the purpose. But in practice TOTP 2FA does not really add much more to the security of my accounts than the strong random passwords I generate with pass. The "second factor" part isn't really the true benefit.

One actual advantage is that nobody on the network can sniff all of my credentials (much like with digest-based password verification methods). Another, and I think this is the main one, is that the owner of the website has chosen part of the credentials and hence ensured some degree of strength. What I do preserves both of those properties, so I'm fine with it. By the way, note that password managers like 1Password do the same thing.

The one thing I could do to really improve the security of the whole thing is use 2FA to access pass by storing my GPG key in a Yubikey. I probably will, someday.

Pete Corey (petecorey)

Wolfram Style Cellular Automata with Vim Macros April 03, 2020 12:00 AM

It struck me while taking a break from a heavy refactoring session and browsing /r/cellular_automata that Wolfram-style cellular automata are really just glorified string substitutions. Suddenly I felt a tickling deep in my hindbrain. An idea was forming. Can you implement a Wolfram-style cellular automata, like Rule-30, entirely within Vim using only “normal” features like macros and substitutions?

In an effort to rid myself of the crushing weight of that question (and because it was Friday), I descended into the depths and came back with an answer. Yes, you can.

What is Rule 30?

Before we stick our hands into a big pile of Vim, let’s take a look at what we’re trying to accomplish. Wolfram-style cellular automata are a family of one-dimensional cellular automata where a cell’s state in the next generation is determined by its current state and the state of its immediate neighbors.

Rule 30, a specific type of Wolfram-style cellular automata, derives its rules from the binary representation of the number thirty, 00011110:

■■■ → □   ■■□ → □   ■□■ → □   ■□□ → ■
□■■ → ■   □■□ → ■   □□■ → ■   □□□ → □
As you can hopefully infer from the diagram above, an “on” cell with two “on” neighbors will turn off in the next generation. Similarly, an “off” cell with an “on” leftmost neighbor will turn on, and so on.

When computing the next generation, every cell with two neighbors is considered before any modifications are made. Let’s solidify our understanding by considering how the next generation of a longer initial sequence would be computed:

□□■□□
Looking at each cell and its immediate neighbors, we can use the rules described above to come up with the next generation of our sequence. The first three cells are □□■, which map to ■ in the next generation. The next three cells, □■□, map to ■. And finally, the last three cells, ■□□, map to ■.

This means the next generation of our initial sequence of cells (□□■□□) is ■■■. Notice that the next generation of our automata is two cells shorter than the previous generation. A simple way of avoiding this problem is to prepend and append “off” cells to the initial generation before computing the subsequent generation. If we had done that, we’d have ended up with □■■■□ as our next generation, rather than ■■■.

We can repeatedly apply this operation on successive generations to create many interesting patterns, balancing on the edge of chaos and order.
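Before reaching for Vim, the same rule set can be sketched in a few lines of Python (my own illustration, using the post's □/■ notation), which is handy for cross-checking the Vim output later:

```python
# The eight Rule 30 transitions as a lookup table (binary 00011110).
RULE30 = {
    "■■■": "□", "■■□": "□", "■□■": "□", "■□□": "■",
    "□■■": "■", "□■□": "■", "□□■": "■", "□□□": "□",
}

def next_gen(cells):
    """Pad with 'off' cells so the output keeps the input's length,
    then map every three-cell window through the rule table."""
    padded = "□" + cells + "□"
    return "".join(RULE30[padded[i:i + 3]] for i in range(len(padded) - 2))

print(next_gen("□□■□□"))  # □■■■□
```

Note how the padded version of the example above produces □■■■□, exactly as described.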

The Seeds of a Plan

Coming into this project, I had a high-level plan of attack for accomplishing what I was after. I knew I wouldn’t be able to process every iteration of the cellular automata in one pass. I’d have to break the process down into pieces.

My main strategy was to break each generation, represented as a line of on/off characters, into a paragraph of three-character lines, one for each grouping of neighbors. From there I could encode the cellular automata’s rules into substitution commands, and roll the resulting paragraph of one-character lines back up into a single line.

This current-line substitution was the foundation of my solution:

:.s/^\(..*\)\(\(..\).\)$/\1\3\r\2/
The idea behind this substitution is to use capture groups to replace a line with everything up until the last character of the line (\1\3), followed by a newline (\r), and the last three characters of the original line (\2). As an aside, I learned that you should use \r instead of \n when using newlines in substitutions.

Executed on a line with the text, □□■□□, we’d get the following result:

□□■□
■□□
Repeatedly applying this substitution would give us the following paragraph of lines:

□□■
□■□
■□□
From here, we can translate our eight Rule 30 rules into global substitutions:

:%s/^■■■$/□/g
:%s/^■■□$/□/g
:%s/^■□■$/□/g
:%s/^■□□$/■/g
:%s/^□■■$/■/g
:%s/^□■□$/■/g
:%s/^□□■$/■/g
:%s/^□□□$/□/g
Running those substitutions on our paragraph gives us a new paragraph made up of single character lines. Each of these characters is a cell in our next generation:

■
■
■
Next we just need to roll that paragraph back up into a single line. J is the tool for joining lines in Vim, but J insists on separating newly concatenated lines with a space. It turns out you can join lines directly with the gJ command:

■■■
And just like that we have our next generation.

Implementing Our Plan

The outline of our plan includes lots of hand waving, like “repeatedly applying this substitution,” which just won’t cut it for a fully automated cellular automata generation machine.

We need a way of turning our very hands-on algorithm into a hands-off, fully automated solution. An elegant way of doing this is using Vim’s built-in macro system. Vim macros are ridiculously powerful while being ridiculously simple in concept. They’re just stored keystrokes. This means there’s nothing stopping a macro (stored in, say, the q register) from invoking itself recursively (@q). This opens the door for quite a few interesting possibilities.

In the hope of building a fully automated solution, let’s implement our cellular automata generator as a script of keystrokes that can be executed in “batch mode” with the -s flag. Once finished, we’ll be able to invoke our generator on a file containing a seed, or an initial sequence of cells, with the following command:

vim -s script seed

We’ll start off our script by using setreg (instead of let) to define a macro b that will perform our first set of substitutions and build our paragraph of three-character lines:

:call setreg('b', ':.s/^\(..*\)\(\(..\).\)$/\1\3\r\2/|norm!``@b', 'l')

The |norm!`` at the end of our substitution returns the cursor to its starting place after each replacement.

Next, let’s define a macro to perform each of our Rule 30 replacements. I built these as eight separate macros, rather than one macro that performs eight substitutions, because any of these substitutions might not find what they’re looking for, which causes the active macro execution to fail. Breaking each substitution out into its own macro prevents its potential failure from affecting other rules:

:call setreg('1', ':%s/^■■■$/□/g', 'l')
:call setreg('2', ':%s/^■■□$/□/g', 'l')
:call setreg('3', ':%s/^■□■$/□/g', 'l')
:call setreg('4', ':%s/^■□□$/■/g', 'l')
:call setreg('5', ':%s/^□■■$/■/g', 'l')
:call setreg('6', ':%s/^□■□$/■/g', 'l')
:call setreg('7', ':%s/^□□■$/■/g', 'l')
:call setreg('8', ':%s/^□□□$/□/g', 'l')

Let’s write one more macro, c, to roll our paragraph of Rule 30 replacements back up into a single line:

:call setreg('c', '''aV}gJ', 'l')

Now we’ll tie everything together with a macro, a, that copies the current line and pastes it below, performs our recursive line destructing, carries out our Rule 30 replacements, and then rolls everything back together, producing the next generation of our cells right below the previous generation:

:call setreg('a', 'yypma:.s/^.*$/□\0□/|norm!``@b@1@2@3@4@5@6@7@8@c', 'l')

We can produce the next two generations of a given starting generation by calling a two times:

:norm 2@a

If we start with a bigger starting generation (fifty “off” cells on either side of a single “on” cell, in this case), we could go even further!

Fifty generations of Rule 30 generated entirely within Vim.

Final Thoughts

And with that, we can finally answer the question that has plagued us for so long. You can implement a Wolfram-style cellular automata, like Rule-30, entirely within Vim using only “normal” features like macros and substitutions. Rest easy, my tired brain.

Altogether our script looks like this, and produces fifty generations of a given seed:

:call setreg('a', 'yypma:.s/^.*$/□\0□/|norm!``@b@1@2@3@4@5@6@7@8@c', 'l')
:call setreg('b', ':.s/^\(..*\)\(\(..\).\)$/\1\3\r\2/|norm!``@b', 'l')
:call setreg('1', ':%s/^■■■$/□/g', 'l')
:call setreg('2', ':%s/^■■□$/□/g', 'l')
:call setreg('3', ':%s/^■□■$/□/g', 'l')
:call setreg('4', ':%s/^■□□$/■/g', 'l')
:call setreg('5', ':%s/^□■■$/■/g', 'l')
:call setreg('6', ':%s/^□■□$/■/g', 'l')
:call setreg('7', ':%s/^□□■$/■/g', 'l')
:call setreg('8', ':%s/^□□□$/□/g', 'l')
:call setreg('c', '''aV}gJ', 'l')
:norm 50@a

I’m no Vim expert by any stretch of the imagination, so if you think you can improve on this solution, either in terms of terseness or clarity, while staying in the bounds of only using normal Vim features (no Vimscript), please let me know!

As a closing remark, I want to be clear and say that this project served absolutely no purpose, and this is a terribly inefficient way of propagating cellular automata. That said, I learned quite a lot about my text editor of choice, which I consider to be invaluable.

April 02, 2020

Jeff Carpenter (jeffcarp)

Vanguard Funds vs ETFs April 02, 2020 08:23 PM

After researching Vanguard funds vs. ETFs I still haven’t found a good resource that lists in detail the benefits and downsides of each. A Vanguard mutual fund is provided and managed by Vanguard, and you can only buy Vanguard funds through Vanguard directly or over the phone. A Vanguard Exchange Traded Fund is packaged up like a stock, and its shares can be traded on any market with any brokerage account. This is my attempt to compile a comprehensive list of tradeoffs.

April 01, 2020

Mark J. Nelson (mjn)

Opening for a funded Masters student April 01, 2020 12:00 PM

I'm recruiting a funded Masters student to study how AI bots play games. The goal is to systematically understand the kinds of difficulty posed by games to AI algorithms, as well as the robustness of any conclusions. Some example experiments include: looking at how performance scales with parameters such as CPU time and problem size; how sensitive results are to rule variations, choice of algorithm parameters, etc.; and identification of games that maximally differentiate algorithm performance. Two previous papers of mine that give some flavor of this kind of research: [1], [2].

The primary desired skill is ability to run computational simulations, and to collect and analyze data from them. The available funding would pay for four semesters of full-ride Masters tuition, plus 15-20 hours/week of a work-study job during the academic year. The American University Game Lab offers three Masters-level degrees: the MS in Computer Science's Game & Computational Media track, the MA in Game Design, and the MFA in Games and Interactive Media.

The successful applicant would be funded on the National Science Foundation grant Characterizing Algorithm-Relative Difficulty of Agent Benchmarks. This does not have any citizenship/nationality requirements.

Anyone interested should both apply for the desired Masters program through the official application linked above (deadline July 1, though earlier is better), and email me to indicate that they would like to be considered for this scholarship. It's also fine to email me with inquiries before applying.

Robin Schroer (sulami)

Restarts in Common Lisp April 01, 2020 12:00 AM

Errata: An earlier version of this post was misrepresenting conditions as exceptions, which has been addressed.

I have been reading Practical Common Lisp by Peter Seibel over the weekend, which is an excellent introduction to Common Lisp, showcasing its power by writing real programs. If you are interested in Lisp or programming languages at all, I recommend at least skimming it, it is free to read online.

Writing a Lisp-descended language professionally, and also living inside Emacs, I had dabbled in Common Lisp before, but I still found something I was not aware of, restarts. I do not think that this is a particularly well known feature outside the Lisp world, so I would like to spread awareness, as I think it is a particularly interesting take on error handling.

The book explains restarts using a mocked parser, which I will slightly modify for my example. Imagine you are writing an interpreter/compiler for a language. On the lowest level you are parsing lines to some internal representation:

(define-condition invalid-line-error (error)
  ((line :initarg :line :reader line)))

(defun parse-line (line)
  (if (valid-line-p line)
      (to-ir line)
    (error 'invalid-line-error :line line)))

We define a condition, which is similar to an exception object with metadata in other languages. (A “condition” in Common Lisp, as has been explained to me by Michał “phoe” Herda, is a way of signalling arbitrary events up the stack to allow running of additional code, not just signalling errors. They’re comparable to hooks in Emacs, but dynamically scoped to the current call stack.)

We also define a function which attempts to parse a single line. (This is assuming, of course, that a line always represents a complete parsable entity, but this is only an example after all.)

If it turns out that the line is invalid, it signals a condition up the stack. We attach the line encountered, in case we want to use it for error reporting.

Now imagine your parser is used in two situations: there is a compiler, and a REPL. For the compiler, you would like to abort at the first invalid line you encounter, which is what we are currently set up to do. But for the REPL, you would like to ignore the line and just continue with the next one. (I’m not saying that is necessarily a good idea, but it is something some REPLs do, for example some Clojure REPLs.)

To ignore a line, we would have to either handle it at a low level, returning nil instead of signalling and filtering out the nil values up the stack, or handle the condition, which will not help us much, because by that point we have already lost our position in the file. Or have we?

The next layer up is parsing a collection of lines:

(defun parse-lines (lines)
  (loop for line in lines
        for entry = (restart-case
                     (parse-line line)
                     (skip-line () nil))
        when entry collect it))

This is where the magic begins. The loop construct loops over the lines, applies parse-line to every element of the list, and returns a list of all results which are not nil. The feature I am showcasing in this post is restart-case. Think of it this way: it does not handle a condition, but when the stack starts unwinding (technically not unwinding yet, at least not in Common Lisp) because we signalled a condition in parse-line, it registers a possible restart position. If the condition is handled at some point (if it isn't caught, you will get dropped into the debugger, which also gives you the option to restart), the signal handler can choose to restart at any restart point that has been registered down the stack.

Now let us have a look at the callers:

(defun parse-compile (lines)
  (handler-case
      (parse-lines lines)
    (invalid-line-error (e)
      (print-error e))))

(defun parse-repl (lines)
  (handler-bind ((invalid-line-error
                  #'(lambda (e)
                      (invoke-restart 'skip-line))))
    (parse-lines lines)))

There is a lot to unpack here. The compiler code is using handler-case, which is comparable to catch in other languages. It unwinds the stack to the current point and runs the signal handling code, in this case print-error.

Because we do not actually want to unwind the stack all the way, but resume execution inside the loop in parse-lines, we use a different construct, handler-bind, which automatically handles invalid-line-error and invokes the skip-line restart. If you scroll up to parse-lines now, you will see that the restart clause says, if we restart here, just return nil, and nil will be filtered on the very next line by when entry.

The elegance here is the split between signal handling code and decisions about which signal handling approach to take. You can register many different restart-case statements throughout the stack, and let the caller decide whether some signals are okay to ignore, without the caller needing intricate knowledge of the lower-level code. (It does need to know about the registered restart-case statements, though, at least by name.)

If you want to learn more about this, have a look at the book; it goes into much more detail than I did here.

March 29, 2020

Derek Jones (derek-jones)

Influential programming languages: some of the considerations March 29, 2020 10:38 PM

Which programming languages have been the most influential?

Let’s define an influential language as one that has had an impact on lots of developers. What impact might a programming language have on developers?

To have an impact a language needs to be used by lots of people, or at least have a big impact on a language that is used by lots of people.

Figuring out the possible impacts a language might have had is very difficult, requiring knowledge of different application domains, software history, and implementation techniques. The following discussion of specific languages illustrates some of the issues.

Simula is an example of a language used by a handful of people, but a few of the people under its influence went on to create Smalltalk and C++. Some people reacted against the complexity of Algol 68, creating much simpler languages (e.g., Pascal), while others thought some of its feature were neat and reused them (e.g., Bourne shell).

Cobol has been very influential, at least within business computing (those who have not worked in business computing complain about constructs handling uses it was not really designed for, rather than appreciating its strengths in doing what it was designed to do, e.g., reading/writing and converting a wide range of different numeric data formats). RPG may have been even more influential in this domain (all businesses have specific requirements for formatting reports).

I suspect that most people could not point to the major influence C has had on almost every language since. No, not the use of { and }: if a single character is going to be used as a compound-statement bracketing token, this pair is the only available choice. Almost every language now essentially uses C’s operator precedence (rather than Fortran‘s, which is slightly different; R follows Fortran).

Algol 60 has been very influential: until C came along it was the base template for many languages.

Fortran is still widely used in scientific and engineering software. Its impact on other languages may be unknown to those involved. The intricacies of floating-point arithmetic are difficult to get right, and much of that work was hammered out in the Fortran standards committees, WG5 (the ISO language committee; the original work was done by the ANSI committee, J3). Fortran code is often computationally intensive, and many optimization techniques started out optimizing Fortran (see “Optimizing Compilers for Modern Architectures” by Allen and Kennedy).

BASIC showed how it was possible to create a usable interactive language system. Its many, and varied, implementations were successful because they did not take up much storage and were immediately usable.

Forth has been influential in the embedded systems domain, and people also fall in love with threaded code as an implementation technique (see “Threaded Interpretive Languages” by Loeliger).

During the mid-1990s the growth of the Internet enabled a few new languages to become widely used, e.g., PHP and Javascript. It’s difficult to say whether these were more influenced by what their creators ate the night before or earlier languages. PHP and Javascript are widely used, and they have influenced the creation of many languages designed to fix their myriad of issues.

March 28, 2020

Patrick Louis (venam)

Software Distributions And Their Roles Today March 28, 2020 10:00 PM


NB: This is a repost on this blog of a post made on

What is a distribution

What are software distributions? You may think you know everything there is to know about the term, but take a moment to think about it; take a step back and try to see the big picture.

We often have in mind the thousands of Linux distributions when we hear it; however, the concept is far from limited to Linux: BSD, the Berkeley Software Distribution, has software distribution right in the name. Android and iOS are software distributions too.

Actually, it’s so prevalent that we may have stopped paying attention to the concept, and we find it hard to put a definition together.
There’s definitely the part about distributing software in it, software that may be commercial or not, open source or not.
To understand it better, it may help to investigate what problems software distributions address.

Let’s imagine a world before software distributions. Does that world exist? A world where software stays within boundaries, not shared with anyone outside.
Once we break these boundaries and want to share software, we find that we have to package it all together meaningfully, configure the pieces so that they work well together, add some glue in between when necessary, find the appropriate medium to distribute the bundle, get it all from one end to the other safely, make sure it installs properly, and follow up on it.

Thus, software distribution is about the mechanism and the community that takes the burden and decisions to build an assemblage of coherent software that can be shipped.

The operating system, or kernel if you like, can be, and often is, part of the collage offered: a piece of software just like the others.

The people behind it are called distribution maintainers, or package maintainers. Their roles vary widely: they could write the software that stores all the packages, called the repository; maintain a package manager with its format; maintain a full operating system installer; package and upload software they built, or that someone else built, on a specific time frame/life cycle; make sure no malicious code gets uploaded to the repository; follow up on the latest security issues and bug reports; fix third-party software to fit the distribution’s philosophical choices and configurations; and most importantly test, plan, and make sure everything holds together.
These maintainers are the source of trust of the distribution; they take responsibility for it. In fact, I think it’s more accurate to call them distributors.

Different ways to approach it

There are so many distributions it can make your head spin. The software world is booming, especially the open source one. For instance, we find bifurcations of distributions that get copied by new maintainers and diverge. This creates a tree-like aspect, a genealogy of common ancestors and/or influences in technical and philosophical choices.
Overall, we now have a vibrant ecosystem where a thing learned on a branch can help a completely unrelated leaf on another tree. There’s something for everyone.

Target and speciality

So what could be so different between all those software distributions? Why not have a single platform that everyone can build on?

One thing is specialization and differentiation. Each distro caters to a different audience and is built by a community with its philosophy.

Let’s go over some of them:

  • A distribution can support specific sets and combinations of hardware: from CPU ISAs to peripheral drivers
  • A distribution may be specifically optimized for a type of environment: be it desktop, portable mobile device, servers, warehouse-size computers, embedded devices, virtualised environments, etc.
  • A distribution can be commercially backed or not
  • A distribution can be designed for different levels of knowledge in a domain, professional or not. For instance: security research, scientific computing, music production, multimedia boxes, HUDs in cars, mobile device interfaces, etc.
  • A distribution might be certified to follow certain standards that need to be adhered to in professional settings, for example security standards and hardening
  • A distribution may have a single purpose on a commodity machine, providing specific functionality such as a firewall, a computer cluster, a router, etc.

That all comes down to the raison d’être, the philosophy of the distribution; it guides every decision the maintainers have to make. It guides how they configure every piece of software, and how they think about security, portability, and comprehensiveness.

For example, if a distribution cares about free software, it’s going to be strict about what software it includes and what licenses it allows in its repository, having tooling in the core to check the consistency of licenses.
Another example: if the goal is to target a desktop audience, then internationalization, ease of use, user-friendliness, and numerous packages are going to be prioritized. If the target is instead a real-time embedded device, the kernel is going to be small, configured and optimized for this purpose, with the packages limited to those that are appropriate and work in this environment. Or if it’s targeted at advanced users who love having control of their machines, the maintainers will choose to let the users make most of the decisions, providing as many packages as possible, at the latest versions possible, with a loose way to install the distribution and plenty of libraries and software development tools.

What this means is that a distribution does anything it can to provide sane defaults that fit its mindset. It composes and configures a layer of components, a stack of software.

The layering

Distribution maintainers often have at their disposal different building blocks and the ability to choose among them, stacking them to create the unit we call a software distribution. There’s a range of approaches to this: they could choose to include more, or less, in what they consider the core of the distribution and what is externally less important to it.
Moreover, sometimes they might even leave the core very small and loose, instead providing the glue software that makes it easy for users to choose and swap the blocks at specific stages: installation, run time, maintenance mode, etc.

So what are those blocks of interdependent components?

The first part is the method of installation, this is what everything hinges on, the starting point.

The second part is the kernel, the real core of all operating systems today. But that doesn’t mean the distribution has to enforce one. Some distributions go as far as providing multiple kernels specialised in different things, or none at all.

The third part is the filesystem and file hierarchy, the component that manages where and how files are spread out on the physical or virtual hardware. This could be a mix and match where sections of the file system tree are stored on separate filesystems.

The fourth part is the init system, PID 1. This choice has generated a lot of contention these days. PID 1 being the mother process of all other processes on the system. What role it has and what functionalities it should include is a subject of debate.

The fifth part is composed of the shell utilities, what we sometimes refer to as the userland or user space, as it’s the first layer the user can directly interface with to control the operating system, the place where processes run. Userland implementations on Unix-based systems usually try to follow the POSIX standard. There are many such implementations, also a subject of contention.

The sixth part is made up of services and their management: the daemons, long-running processes that keep the system in order. Many argue over whether the management functionality should be part of the init system or not.

The seventh part is documentation. Often it is forgotten but it is still very important.

The last part is about everything else, all the user interfaces and utilities a user can have and ways to manage them on the system.

Stable releases vs Rolling

There exists a spectrum on which distributions place themselves when it comes to keeping up to date with the versions of the software they provide. This most often applies to external third-party open source software.
The spectrum is the following: do we let users always have the latest version of every piece of software, running the risk of accidentally breaking their systems (what we call a bleeding edge or rolling distro), or do we take a more conservative approach and take the time to test every piece of software properly before allowing it into the repository, forgoing the latest updates, features, and optimizations (what we call a release-based distro)?

The extreme of the first scenario would be to let users download directly from the software vendor/creator’s source code repository, or the opposite, to let the software vendor/creator push directly to the distribution repository. Either could easily break or conflict with the user’s system, or lead to security vulnerabilities. We’ll come back to this later, as it could be avoided if the software runs in a containerized environment.

When it comes to release distributions, there is usually a long-term-support stable version that keeps receiving and syncing with the necessary security updates and bug fixes in the long run, alongside another version running a bit ahead, testing the future changes. At specific points in time, users can jump to the latest release of the distribution, which may involve a lot of changes in both configuration and software.
Some distributions decide they may break the ABI or API of the kernel upon major releases, which means that everything in the system needs to be rebuilt and reinstalled.

The release cycle, and the rate of updates is really a spectrum.

When it comes to updates, in both cases the distribution maintainers have to decide how to communicate and handle them: how to let users know what changed, and whether a user’s configuration was swapped for a new one, merged with the new one, or copied aside.
Communication is essential, be it through official channels, logging, mails, etc. Communication needs to be bi-directional: users report bugs, and maintainers post their decisions and whether users need to be involved in them. This is what creates the community around the distribution.

Rolling releases require intensive effort from package maintainers, as they constantly have to keep up with software developers, especially when it comes to the thousands of new libraries that come with recent programming languages and keep on increasing.

Various users want different things out of a system. Enterprise environments and mission-critical tasks will prefer stable releases, while software developers or normal end users may prefer the ability to use the latest software.

Interdistribution standard

With all this, can’t there be an interdistribution standard that creates order, and would we even want such a standard?

At the user level, the differences are not always noticeable; most of the time everything seems to work the way Unix systems are expected to work.
There’s no real standard between distributions other than that they more or less follow the POSIX standards.

Within the Linux ecosystem, the Free Standards Group tries to improve the interoperability of software by fixing a common Linux ABI, file system hierarchy, naming conventions, and more. But that’s just the tip of the iceberg when it comes to having something that works across distributions.

Furthermore, each part of the layering we’ve seen before could be said to have its own standards: There are desktop interoperability standards, filesystem standards, networking standards, security standards, etc.

The biggest player right now when it comes to this is systemd, in association with the freedesktop group; it tries to create (force) an interdistribution standard for Linux distributions.

But again, the big question: do we actually want such inter-distribution standards? Can’t we be happy with the mix and match we currently have? Would we profit from such a thing?

The package manager and packaging

Let’s now pay attention to the packages themselves: how we store them, how we give secure access to them, how we are able to search amongst them, download them, install them, and remove them, and anything related to their local management, versioning, and configuration.

Method of distribution

How do we distribute and share software, and what’s the front-end to this process?

First, where do we store this software.

Historically, and still today, software can be shared via physical media such as CD-ROMs, DVDs, USBs, etc. It is common for proprietary vendors to have the distribution come with a piece of hardware they are selling, and physical media are also common for procuring the initial installation image.
However, with today’s hectic software growth, a physical medium isn’t flexible. Sharing over the internet is more convenient, be it via FTP, HTTP, HTTPS, a publicly available svn or git repo, via central website hubs such as GitHub, or application stores such as the ones Apple and Google provide.

A requirement is that the storage, and the communication to it, should be secure, reliable against failures, and accessible from anywhere. Thus, replication is often done to avoid failures, but also to get a sort of edge-network speedup across the world: load balancing. Replication could be done in multiple ways; it could be a P2P distributed system, for instance.

How we store it and in what format is up to the repository maintainers. Usually, this is a file system with a software API users can interact with over the wire. Two main format strategies exist: source based repositories and binary repositories.

Second of all, who can upload to and manage the host of packages? Who has the right to replicate the repository?

As a source of truth for the users, it is important to make sure the packages have been verified and secured before being accepted on the repository.

Many distributions have the maintainers be the only ones able to do this, giving them cryptographic keys to sign packages and validate them.

Others have their own users build the packages, send them to a central hub for automatic or manual verification, and then upload them to the repository, each user having their own cryptographic key for signature verification.

This comes down to an issue of trust and stability. Having the users upload packages isn’t always feasible when using binary packages if the individual packages are not containerized properly.

There’s a third option, the road in between, having the two types, the core managed by the official distribution maintainers and the rest by its user community.

Finally, the packages reach the user.

How the user interacts with the repository, locally and remotely, depends on the package management choices. Do users cache a version of the remote repository, as is common with the BSD port tree system?
How flexible can it be in tracking updates, locking versions of software, and allowing downgrades? Can users download from different sources? Can users have multiple versions of the same software on their machine?


As we’ve said, there are two main philosophies of software sharing format: source code, port-style, and pre-built binary packages.

The software that manages those on the user side is called the package manager; it’s the link with the repository. (In a source-based repo I’m not sure we can call it that, but regardless I’ll still refer to it as such.)
Many distributions create their own or reuse a popular one. It does the searching, downloading, installing, updating, and removal of local software. It’s not a small task.

The rule of the book is that if software isn’t installed by the package manager, then the package manager won’t be aware of its existence. Note that distributions aren’t limited to a single package manager; there can be many.

Each package manager relies on a specific format and metadata to be able to manage software, be it source or binary formatted. This format can be composed of a group of files or a single binary file with specific information segments that together create recipes that help throughout its lifecycle. Some are easier to put together than others, incidentally allowing more user contributions.

Here’s a list of common information that the package manager needs:

  • The package name
  • The version
  • The description
  • The dependencies on other packages, along with their versions
  • The directory layout that needs to be created for the package
  • Along with the configuration files that it needs and if they should be overwritten or not
  • An integrity check on all files, such as a SHA256 checksum
  • Authenticity, to know that it comes from the trusted source, such as cryptographic signatures checked against a trusted store on the user’s machine
  • Whether this is a group of packages (a meta package) or a direct one
  • The actions to take on certain events: pre-installation, post-installation, pre-removal, and post removal
  • Any specific configuration flags or parameters to pass to the package manager upon installation
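
As a sketch, such a recipe and its integrity check might look something like this; the format, field names, and package are all invented for illustration and do not correspond to any real package manager:

```python
import hashlib

# Hypothetical package recipe, loosely following the fields listed above.
metadata = {
    "name": "example-tool",
    "version": "1.2.0",
    "description": "A made-up package for illustration",
    "depends": {"libfoo": ">=2.0", "libbar": ">=1.1"},
    # path inside the package -> expected SHA256 of the file's contents
    "files": {},
}

def record(path: str, contents: bytes) -> None:
    """Store the integrity checksum for a packaged file."""
    metadata["files"][path] = hashlib.sha256(contents).hexdigest()

def verify(path: str, contents: bytes) -> bool:
    """Re-hash a downloaded file and compare against the recipe."""
    return hashlib.sha256(contents).hexdigest() == metadata["files"][path]

payload = b"pretend this is the packaged binary"
record("usr/bin/example-tool", payload)
print(verify("usr/bin/example-tool", payload))         # True
print(verify("usr/bin/example-tool", payload + b"!"))  # False: tampered
```

Authenticity would be layered on top of this, with the checksums themselves signed by a key the user already trusts.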

So what’s the advantage of having pre-compiled binary packages instead of cloning the source code and compiling it ourselves? Wouldn’t the latter remove a burden from package maintainers?

One advantage is that pre-compiled packages are convenient: it’s easy to download them and run them instantly. It’s also hard, if not impossible, and energy intensive these days to compile huge software such as web browsers.
Another point is that proprietary software is often already distributed as binary packages, which would otherwise create a mix of source and binary packages.

Binary formats are also space efficient as the code is stored in a compressed archived format. For example: APK, Deb, Nix, ORB, PKG, RPM, Snap, pkg.tar.gz/xz, etc.
Some package managers may also choose to leave the choice of compression up to the user and dynamically discern from its configuration file how to decompress packages.

Let’s add that there exist tools, such as “Alien”, that facilitate the job of package maintainers by converting from one binary package format to another.

Conflict resolution & Dependencies management

Resolving dependencies

One of the hardest jobs of the package manager is resolving dependencies.

A package manager has to keep a list of all the packages, with their versions, that are currently installed on the system, along with their dependencies.
When the user wants to install a package, it takes as input the list of that package’s dependencies, compares it against the list it already has, and outputs a list of what needs to be installed, in an order that satisfies all dependencies.

This is a problem commonly encountered in the software development world with build automation utilities such as make. The tool creates a directed acyclic graph (DAG) and, using the power of graph theory and the acyclic dependency principle (ADP), tries to find the right order. If no solution is found, or if there are conflicts or cycles in the graph, the action should be aborted.
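
A minimal sketch of that idea, over a made-up dependency graph (no real package metadata is consulted): a depth-first topological sort that emits dependencies before their dependents and aborts on cycles.

```python
# Toy dependency resolver: topological sort with cycle detection.
def install_order(target, deps):
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            # A cycle means the graph is not a DAG: abort.
            raise RuntimeError(f"dependency cycle involving {pkg}")
        in_progress.add(pkg)
        for dep in deps.get(pkg, []):
            visit(dep)
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)  # dependencies first, then the package itself

    visit(target)
    return order

# Hypothetical packages: installing "app" must pull in its dependencies first.
deps = {"app": ["libui", "libnet"], "libui": ["libc"], "libnet": ["libc"]}
print(install_order("app", deps))  # ['libc', 'libui', 'libnet', 'app']
```

Real resolvers additionally juggle version constraints and alternatives, which can turn this into a much harder (even NP-hard) search problem.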

The same applies in reverse, upon removal of a package. We have to make a decision: do we remove all the other packages that were installed as dependencies of that single one? What if newer packages depend on those dependencies? Should we only allow the removal of unused dependencies?

This is a hard problem, indeed.


This problem gets harder when we add versioning to the mix, if we allow multiple versions of the same software to be installed on the system.

And if we don’t, but allow switching from one version to another, do we also switch all the other packages that depend on it?

Versioning applies everywhere, not only to packages but to release versions of the distribution too. A lot of distributions attach certain versions of packages to specific releases, and consequently releases may have different repositories.

The choice of naming conventions also plays a role; it should convey to users what packages are about and whether any changes happened.

Should the package maintainer follow the naming convention of the software developer, or use their own? What if the names of two pieces of software conflict with one another? That makes it impossible to have both in the repo unless some extra information is added.

Do we rely on semantic versioning (major, minor, patch), on names like so many distribution releases do (Toy Story characters, deserts, etc.), on the date of release, or maybe simply on an incremental number?
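
As a tiny illustration of why the choice matters, semantic versions have to be compared numerically rather than as strings; the version numbers below are made up:

```python
# Compare major.minor.patch versions numerically: "1.10.0" must sort
# after "1.9.2", which plain string comparison would get wrong.
def semver_key(v):
    return tuple(int(part) for part in v.split("."))

releases = ["1.9.2", "1.10.0", "2.0.0", "1.2.10"]
print(sorted(releases, key=semver_key))
# ['1.2.10', '1.9.2', '1.10.0', '2.0.0']
```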

All those convey meaning to the user when they search and update packages from the repository.

Static vs dynamic linking

One decision that may not apply to source-based distros is whether to build packages statically or dynamically linked against their libraries.

Dynamic linking is the process in which a program chooses not to include a library it depends upon in its executable, but only a reference to it, which is then resolved at run time by a dynamic linker that loads the shared object into memory upon usage. Static linking, on the opposite, means storing the libraries right inside the compiled executable program.

Dynamic linking is useful when a good deal of software relies on the same library: only a single instance of the library has to be in memory at a time. Executable sizes are also smaller, and when the library is updated all programs relying on it get the benefit (as long as the interfaces stay the same).

So what does this have to do with distributions and package management?

Package managers in dynamic linking environment have to take care of the versions of the libraries that are installed and which packages depend on them. This can create issues if different packages rely on different versions.
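
A toy sketch of the bookkeeping involved, with invented package names and bare major versions standing in for real soname and version constraints:

```python
# Invented packages declaring which major version of a shared library
# they need; a real package manager tracks sonames and richer constraints.
required = {
    "browser": {"libssl": "3"},
    "legacy-tool": {"libssl": "1"},
    "editor": {"libz": "1"},
}

def find_conflicts(required):
    wanted = {}     # library -> (major version, first package wanting it)
    conflicts = []
    for pkg, libs in required.items():
        for lib, major in libs.items():
            if lib in wanted and wanted[lib][0] != major:
                conflicts.append((lib, wanted[lib][1], pkg))
            else:
                wanted.setdefault(lib, (major, pkg))
    return conflicts

# browser and legacy-tool disagree on libssl; editor is fine.
print(find_conflicts(required))  # [('libssl', 'browser', 'legacy-tool')]
```

With dynamic linking, one of the conflicting packages cannot be satisfied unless the distro ships both library versions side by side; with static linking, each package would simply embed the version it was built against.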

For this reason, some distro communities have chosen to get rid of dynamic linking altogether and rely on static linking, at least for things that are not related to the core system.

Another incidental advantage of static linking is that the dynamic linker doesn’t have to resolve dependencies at load time, which gives a small boost in startup speed.

So static builds simplify the package management process: there’s no need for a complex DAG because everything is self-contained. Additionally, this allows multiple versions of the same software to be installed alongside one another without conflicts. Updates and rollbacks are not messy with static linking.

This gives rise to more containerized software, and continuing down this path leads to marketplace platforms such as Android and iOS, where distribution can be done by the individual software developers themselves, skipping the middleman altogether and giving increasingly impatient users the ability to always have the latest version that works for their current OS. Everything is self-packaged.
However, this relies heavily on trust in the repository/marketplace. There need to be many security mechanisms in place so that rogue software cannot be uploaded. We’ll talk more about this when we come back to containers.

This is great for users and, from a certain perspective, software developers too as they can directly distribute pre-built packages, especially when there’s a stable ABI for the base system.

All this breaks the classic distribution scheme we’re accustomed to on the desktop.

Is it all roses and butterflies, though?

As we’ve said, packages take much more space with static linking, thus wasting resources (storage, memory, power).
Moreover, because it’s a model where software developers push directly to users, it removes the filtering that distribution maintainers apply to the distro, and it encourages license uncertainties. There’s no longer an overall philosophy surrounding the distribution.
There’s also the issue of library updates: the weight is on the software developers to make sure there are no vulnerabilities or bugs in their code. And it adds a veil over which software uses what; all we see is the end products.

From the perspective of a software developer using this type of distribution, it adds the extra steps of downloading the source code of each library their software depends on and building each one individually, turning the system into a source-based distro.


Because package management has been getting messier over the past few years, a new trend has emerged to put back a sense of order into all this: reproducibility.

It has been inspired by the world of functional programming and the world of containers. Package managers that respect reproducibility have each of their builds asserted to always produce the same output (functionality-wise; there could be minor differences).
They allow packages of different versions to be installed alongside one another, each living in its own tree, and they allow normal users to install packages that only they can access. Thus, different users can have different packages.

They can be used as universal package managers, installed alongside any other package manager without conflict.

The most prominent examples are Nix and Guix, which use a purely functional deployment model where software is installed into unique directories generated through cryptographic hashes. The dependencies of each package are included within each hash, solving the problem of dependency hell. This approach to package management promises more reliable, reproducible, and portable packages.
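
The gist of that model can be sketched in a few lines; the /store layout and hash truncation below are illustrative, not Nix’s actual scheme:

```python
import hashlib

# A package's store path is derived from a hash over everything that went
# into building it: name, version, and the store paths of its dependencies.
# Different versions therefore get different paths and never collide.
def store_path(name, version, dep_paths):
    recipe = f"{name}-{version}:" + ":".join(sorted(dep_paths))
    digest = hashlib.sha256(recipe.encode()).hexdigest()[:12]
    return f"/store/{digest}-{name}-{version}"

libc = store_path("libc", "2.31", [])
old = store_path("libfoo", "1.0", [libc])
new = store_path("libfoo", "2.0", [libc])
print(old)
print(new)
assert old != new  # both versions can be installed side by side
```

Because the path is a pure function of the inputs, rebuilding the same recipe lands in the same place, which is what makes rollbacks and per-user package sets cheap.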

Stateless and verifiable systems

The discussion about trust, portability, and reproducibility can also be applied to the whole system itself.

When we talked about repositories as marketplaces, where software developers push directly to it and the users have instant access to the latest version, we said it was mandatory to have additional measures for security.

One of them is to containerize, to sandbox every piece of software, having each run in its own space without affecting the rest of the system’s resources. This removes the heavy burden of auditing and verifying each and every piece of software. Many solutions exist to achieve this sandboxing: Docker, chroot, jails, Firejail, SELinux, cgroups, etc.

We could also separate the users’ home directories, making them self-contained, never installing into or modifying the globally accessible places.

This would let us keep the core of the system verifiable, as it is never changed and stays pristine. Making sure it’s secure would be much easier.

The idea of having the user part of the distro be atomic, movable, and containerized, and the rest reproducible, is game-changing. But again, do we want to move to a world where every distro is interchangeable?

Do Distros matter with containers, virtualisation, and specific and universal package managers

It remains to be asked if distributions still have a role today with all the containers, virtualisation, and specific and universal package managers.

When it comes to containers, distributions are still very important, as they most often form the base of the stack that the other components build upon.

The distribution is made up of people who work together to build and distribute the software and make sure it works well. That isn’t the role of the person managing the container, and it is much more convenient for them to rely on a distribution.

Another point is that containers hide vulnerabilities: they aren’t checked after they are put together, while distribution maintainers, on the other hand, have the role of communicating and following up on security vulnerabilities and other bugs. Community is what solves daunting problems that everyone shares.
A system administrator building containers can’t possibly have the knowledge to manage and build hundreds of pieces of software and libraries and ensure they work well together.

If packages are self-contained

Do distributions matter if packages are self-contained?

To an extent they do, as in this ecosystem they could be the providers/distributors of such universal self-contained packages. And, as we’ve said, it is important to keep the philosophy of the distro and offer a tested toolbox that fits the use case.

What’s more probable is that we’ll move to a world with multiple package managers, each trusted for its specific space and purpose. Each with a different source of philosophical and technical truth.

Programming language package management specific

This phenomenon is already exploding in the world of programming language package management.

The speed and granularity at which software is built today is almost impossible to follow using the old method of packaging. The old software release life cycle has been thrown out the window. Thus, language-specific tools were developed, not limited to installing libraries but also software. We can now refer to the distribution offered package manager as system-level and others as application-level or specific package managers.

Consequently, the complexity and conflicts within a system have exploded, and distribution package managers are finding it pointless to manage and maintain anything that can already be installed via those tools. Vice versa, the makers of the specific tools are not interested in having what they provide included in distribution system-level package managers either.

Package managers that respect reproducibility, such as Nix, which we’ve mentioned, handle such cases more cleanly as they respect the idea of locality: everything resides within a directory tree that isn’t maintained by the system-level package manager.

Again, same conclusion here, we’re stuck with multiple package managers that have different roles.

Going distro-less

A popular topic in the container world is “distro-less”.

It’s about replacing everything provided by a distribution, removing its customization, or building an image from scratch, perhaps relying on universal package managers or on none at all.

The advantage of such containers is that they are tiny and targeted at a single purpose. This lets the sysadmin have full control over what happens on that box.

However, remember that there’s a huge cost to controlling everything, just as we mentioned earlier. This shifts the burden onto the sysadmin, who must manage and keep up with bugs and security updates instead of the distribution maintainers.


With everything we’ve presented about distributions, I hope we now have a clearer picture of what they are providing and their place in our current times.

What’s your opinion on this topic? Do you like the diversity? Which stack would you use to build a distribution? What’s your take on static builds, having users upload their own software to the repo? Do you have a solution to the trust issue? How do you see this evolve?

More discussion here:

EDIT: It has come to my attention that I’ve conflated the meaning of “reproducible” as in reproducing bit-for-bit identical software with “reproducible” as in recreating the functionality of an operating system. In this article I’ve taken it to mean anything whose functionality we are sure we could recreate without breaking it, and used “verifiable” for anything that is bit-for-bit identical. Guix and NixOS currently accomplish the functionality part.


Gonçalo Valério (dethos)

CSP headers using Cloudflare Workers March 28, 2020 12:35 PM

Last January I made a small post about setting up a “Content-Security-Policy” header for this blog. On that post I described the steps I took to reach a final result, that I thought was good enough given the “threats” this website faces.

This process usually isn’t hard if you develop the website’s software and have a high level of control over the development decisions; the end result ends up being a simple yet very strict policy. However, if you do not have that degree of control over the code (and do not want to break the functionality), the policy can end up more complex and lax than you were initially hoping for. That’s what happened in my case, since I currently use a standard installation of WordPress for the blog.

The end result was a different security policy for different routes and sections (this part was not included on the blog post), that made the web-server configuration quite messy.

(With this intro, you might have already noticed that I’m just making excuses to redo the initial and working implementation, in order to test some sort of new technology)

Given the blog is behind the Cloudflare CDN and they introduced their “serverless” product called “Workers” a while ago, I decided that I could try to manage the policy dynamically on their servers.

Browser <--> CF Edge Server <--> Web Server <--> App

The above line describes the current setup, so instead of adding the CSP header on the “App” or the “Web Server” stages, the header is now added to the response on the last stage before reaching the browser. Let me describe how I’ve done it.

Cloudflare Workers

First, a very small introduction to Workers; later you can find more detailed information on

So, first Cloudflare added the v8 engine to all edge servers that route the traffic of their clients, then more recently they started letting these users write small programs that can run on those servers inside the v8 sandbox.

The programs are built very similarly to how you would build a service worker (they use the same API), the main difference being where the code runs (browser vs edge server).

These “serverless” scripts can then be called directly through a specific endpoint provided by Cloudflare. In this case they should create and return a response to the requests.

Or you can instruct Cloudflare to execute them on specific routes of your website. This means the worker can generate the response, execute actions before the request reaches your website, or change the response that is returned.

This service is charged based on the number of requests handled by the “workers”.

The implementation

Going back to the original problem, and based on the above description, we can dynamically introduce or modify the “Content-Security-Policy” header for each request that goes through the worker, which gives us a high degree of flexibility.

So, for my case, a simple script like the one below did the job just fine.

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

/**
 * Forward the request and swap the response's CSP header
 * @param {Request} request
 */
async function handleRequest(request) {
  let policy = "<your-custom-policy-here>"
  let originalResponse = await fetch(request)
  let response = new Response(originalResponse.body, originalResponse)
  response.headers.set('Content-Security-Policy', policy)
  return response
}
The script listens for the fetch event and hands the request to a handler function, which forwards it to the origin server, grabs the response, replaces the CSP header with the defined policy, and returns the modified response.

If I needed something more complex, like making slight changes to the policy depending on the User-Agent to make sure different browsers behave as expected given the different implementations or compatibility issues, it would also be easy. This is something that would be harder to achieve in the config file of a regular web server (nginx, apache, etc).
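As a hypothetical sketch of that idea (the `choosePolicy` helper and the policy values below are my own illustration, not the blog’s real policy), the selection could be factored into a small function keyed on the User-Agent:

```javascript
// Hypothetical helper: pick a policy variant per browser family.
// The directives here are illustrative placeholders.
function choosePolicy(userAgent) {
  const basePolicy = "default-src 'self'"
  // Chrome's User-Agent string also contains "Safari", so exclude it
  // explicitly when targeting Safari-only quirks.
  if (userAgent.includes('Safari') && !userAgent.includes('Chrome')) {
    // Hand Safari a variant without directives it handles poorly.
    return basePolicy
  }
  return basePolicy + "; object-src 'none'"
}
```

The worker could then call something like `choosePolicy(request.headers.get('User-Agent'))` before setting the header on the response.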

Enabling the worker

Now that the script is done and the worker deployed, in order to make it run on certain requests to my blog, I just had to go to the Cloudflare dashboard of my domain, click on the “Workers” section, and add the routes where I want it to be executed:

Configuring the routes that will use the worker

The settings displayed in the picture above will run the worker on all requests to this blog, but it can be made more specific, and I can even have multiple workers for different routes.

Some sort of conclusion

Despite the use-case described in this post being very simple, there is potential in this new “serverless” offering from Cloudflare. It definitely helped me solve the problem of having different policies for different sections of the website without much trouble.

In the future I might come back to it, to explore other use-cases or implementation details.

Nikola Plejić (nikola)

Living and Coding In Times of Crises March 28, 2020 10:08 AM

Note: These are a few slightly melodramatic thoughts written in the aftermath of several somewhat traumatic weeks involving a pandemic, an earthquake, and a substantial change in the everyday. The few links that are scattered around I find interesting and important; the rest is here as a permanent reminder to myself. Hope everyone's well. Wash your hands.

I had hoped I would be able to write the word "crisis" in singular in the title of this post, but life tends to be a prescriptivist bastard. Writing this in the middle of a pandemic, after a non-catastrophic, yet devastating earthquake in Zagreb on Sunday, feels surreal. The apartment I just recently moved into roughed up, parts of it possibly significantly damaged, unable to socialize even in the most mundane of ways, exams canceled. Two events tearing into the fabric of what one considers the most social and the most personal, at the same time... and I'm lucky to be healthy, safe, and sound.

It's interesting how disruptive events quickly redefine the "normal". Just a few weeks ago, no one could imagine being quarantined indefinitely, yet here we are in week one of being unable to leave our place of residence. Redefining normal also tends to give way to malice, as we witness more and more calls to increase surveillance and invade people's privacy in order to (at least formally) keep track of the spread and unwanted public gatherings. Once again the political emerges as inseparable from the technological to the detriment of the techno-elite. Go figure.

Science, with all of its problems, gives us hope. Both of these events have been properly assessed: epidemiologists and infectious disease experts have been warning us of the possible consequences of SARS-CoV-2 weeks before the situation got out of control outside of China, and seismologists have been fairly clear that a strong earthquake around Zagreb is imminent. The problem with complex systems is that they're hard to grasp even for experts, and our intuition for statistics is slim to none. Even when we do manage to become aware of some aspects of the risks involved, this tends to fall apart as quickly as it materializes. Coupled with the fear of "collapsing markets" and their existential consequences, this often means we're more comfortable with the status quo than with taking action. Informed apathy can be as dangerous as uninformed sympathy.

As an aside, existential terror is a powerful weapon, and it was interesting to observe its effects on one's thoughts and approaches to life. I am well-aware that there are no reliable methods of predicting earthquakes to any precise extent, yet when on the day of the earthquake someone blurted out a hoax that "there's a stronger one coming at 8:45am", for a few moments it seemed like the most inevitable and obvious thing in the world. I think I've revisited that highly instructive moment more often than the earthquake itself.

Even though it's impossible not to be affected by this, people in IT, and primarily in software, seem to be in a unique position to handle this situation better than the average worker. There will be consequences, but the vast amount of money in tech means there's less chance of being fired, especially in times of enormous reliance on the tech infrastructure. The level of technological literacy gives us an enormous advantage in navigating the inevitable mess during the initial period of adjustment. A lot of us have been able to seamlessly switch to working from home, if we haven't already been doing so.

I believe this gives us a fair deal of responsibility, too: it's increasingly important to use our skills to help the ones in need. Large-scale tragedies take a psychological and organizational toll, and we have tools to help. This means being aware of the fact that the technology we produce is not as obvious as it might seem, and that people often need assistance and support which isn't a sign of weakness or lack of competence, but rather a natural consequence of using a new tool in an unfamiliar setting. It also means using the free time that isolation gives us to help people and organizations by providing them with necessary technological infrastructure and assistance.

More in-depth political analyses have been written by more eloquent and better-read comrades over at Pirate Care, focused on COVID-19 in particular, and by Tomislav, focused on the pandemic in combination with the earthquake. I can just echo that I find it important not to let our imagination atrophy and our empathy to subside, and that I hope the lessons we learn here will be the ones of solidarity and care. ☭

March 27, 2020

Jan van den Berg (j11g)

Dylan Thomas – Sidney Michaels March 27, 2020 09:23 AM

This book is a play from 1965, based on several accounts of the infamous travels Welsh poet Dylan Thomas made in the early 1950s to the US. If you know anything about Dylan Thomas you probably know he died young (39), and that he was an alcoholic.

Dylan Thomas – Sidney Michaels (1965) – 111 pages

This play captures the last two or three years of his life rather vividly. It’s an alcoholic mess, and it details an explosive marriage. His tumultuous life and classic ‘poets-die-young’ death only deepened the existing legend. So much so that a young fellow named Robert Zimmerman based his stage name on the famous poet a few years later. And that’s why I picked up this book, and learned a little bit more.

The post Dylan Thomas – Sidney Michaels appeared first on Jan van den Berg.

Pete Corey (petecorey)

Glorious Voice Leader Reborn March 27, 2020 12:00 AM

I recently finished a major visual overhaul of my current pet project, Glorious Voice Leader. The redesign was primarily intended to give musicians a (hopefully) more natural interface for working with the tool.

Originally, Glorious Voice Leader displayed its chord suggestions as a series of horizontal fretboards stacked vertically:

The old Glorious Voice Leader.

This gave the guitarist lots of control, but it wasn’t a natural way for them to interact with or consume a chord progression.

Instead, I decided to transition to a more familiar chart-based design where every chord in the progression is represented by a small chord chart that should be familiar to any guitarist. The chord currently being added or modified is also displayed on a full fretboard laid out vertically on the side of the page:

The newly redesigned Glorious Voice Leader.

The redesign was largely a cosmetic overhaul. A few new minor features were added in the process, such as progressive chord naming, and better keyboard controls, but the meat of Glorious Voice Leader is all still there. Your old saved URLs will still work!

Check out Glorious Voice Leader now! And while I’ve got you hooked, be sure to read more about the project and its roots in my other projects.

March 24, 2020

Patrick Louis (venam)

The Self, Metaperceptions, and Self-Transformation March 24, 2020 10:00 PM

infinite reflections

How would you describe yourself?
How do you usually talk about yourself?
Do you feel like you are the writer of your own narrative?
Who are you?

We all stand on a balance of being perceived and perceiving, of having a visible and owning an invisible part, and of having control over and being controlled by. It is amongst all this that we can find the nebulous definition of who we are, what Locke calls “the sameness of a rational being”.
In view of this, we are both passengers and conductors of our narrative. So how do we drive this narrative forward? Is it possible to have more agency in it than we currently have? And if we are our narrative, can we, as the narrative, choose another narrative without self-annihilation?
Metacognition can be dizzying.

I’ve previously discussed the topic of what we are and now I’d like to focus on the self, its formation, its transformation, and its actualization.

Who one is, over time, is created by the amalgamation of the historical events, physical aspects, and external and internal reflections, that get incorporated into one’s identity. The self is this element that sits in the middle, taking in and taking out, what makes sense to us and for us.

From an external point of view, we could define ourselves in reaction to the roles we play for others, the way we interact with them, eventually adjusting our selves to the labels we’ve been given or have chosen.
It is helpful to have others act as calibration for our internal system when we have nothing else to base our definition upon, especially when we are starting our self-exploration in our teenage years. We aren’t brains in a vat; a self doesn’t exist without a world. Yet, if we overly emphasize this sort of self-definition, the other becomes our worst nightmare and our only way of finding meaning and salvation. It leads to interpreting the world with a heavy filter, judging it the way we think it judges us: harshly, and frequently inaccurately, because our judgment is shaped by our individual self-concept and personal biases. This is what we call metaperception, the idea we have of how other people view us.

Metaperception can be destructive if not handled properly. For instance, someone who fixates on it may act in a self-centered way, imagining that everyone is watching and evaluating their every move, that they are the center of social interaction. They’ll shut themselves off, limit their spontaneity, and have an increasingly fragile ego. All the while, not considering the unbridgeable gap that exists between selves.
Being overwhelmed by the other makes it difficult to accept criticism, to interpret someone else’s response; everything becomes emotionally charged in a frenzied uncontrollable internal state.

From an internal point of view, we could define ourselves as the main character of our lives, the maker of the story. We could move in the world in relation to what we perceive we’re doing to it.
We are our own persons, with our own choices, so why not make the world what we want of it? Yet, if we overly emphasize this sort of self-definition, we become an actor, the protagonist, reading the main script, trying to take the center of the stage, while everyone around plays a minor role. It leads to clashes in narratives, cognitive dissonance, an illusion of superiority, and an egocentric bias. Just like over-metaperception creates lenses, the self-made story does too. We cannot deny that, by analogical thinking, others exist and that they have their own selves.

Therefore, that’s where the balance lies: knowing that we can be the masters of our destinies, and knowing we are creatures living in a limited social and physical world.
How do we learn to be comfortable with the ambiguity of the self/other boundary and get a better life experience? Why and how do we change our selves?

Change is hard. It’s an arduous task we’d rather let happen by itself gradually, let it pass by, barely noticing it after it has happened. Unfortunately, life is riddled with issues and dissatisfactions.
At first, we may pretend they are benign, non-existent, trivial. We dismiss them and move on to do activities that take our focus away from them…
Until they are not trivial anymore, until our behavioral pattern becomes destructive, until they become unmanageable, until we become intimately aware of them.
Then a realization emerges: Those problems are created by our sense of self, they are the product of our definition, part of the narrative. Rejecting them would mean rejecting one self. And this is what we do, we build immunity to change, we protect our self consistence, we cocoon ourselves away from the unknown changes. This is what we are, and we feel stuck with it.

The years pass by and nothing seems to change. — Henry David Thoreau

Gradually, we may build constant feelings of guilt, shame, anxiety, and regret. Desperate for change we see as unattainable, seeing everything as an unfulfilling experience. Are we to forever remain haunted by what might have been?

To cope with such emotions, we could rely on our old friend: self-suppressive escapism. Namely, anything that is numbing, numbing to the critical evaluation of the self, a cognitive narrowing, a cognitive detachment from the disturbing elements of the self. All of this being the easiest way to avoid the source of despair. Moreover, in themselves, these kinds of actions could be blamed for our current state. We can blame our inability to take productive actions to change on our anxiety, depression, fear, or lack of confidence in our abilities. Additionally, we may even believe that we have to first get rid of such feelings before moving on to change, we’ll try meditation and introspection. Or we may believe that we’ve wasted too much time, that it’s too late, and be overwhelmed by intense feelings of guilt and regret. However, the negative emotions are not the results of those, but they are inherent to the way we define ourselves and our fear of change.

Any obstruction of the natural processes of development …or getting stuck on a level unsuited to one’s age, takes its revenge, if not immediately, then later at the onset of the second half of life, in the form of serious crises, nervous breakdowns, and all manner of physical and psychic sufferings. Mostly they are accompanied by vague feelings of guilt, by tormenting pangs of conscience, often not understood, in face of which the individual is helpless. He knows he is not guilty of any bad deed, he has not given way to an illicit impulse, and yet he is plagued by uncertainty, discontent, despair, and above all by anxiety — a constant, indefinable anxiety. And in truth he must usually be pronounced “guilty”. His guilt does not lie in the fact that he has a neurosis, but in the fact that, knowing he has one, he does nothing to set about curing it. — Jolande Jacobi, The Way of Individuation

We cannot change anything unless we accept it. — Carl Jung

Thus, we should find the courage to tackle personal growth. If we don’t accept what has been, we can’t move to what will be. The feelings of dissatisfaction should be the catalyst of change, they should be welcomed as stimuli in the struggle for the development of personality.

Neurotic symptoms such as these are a direct result of an inadequate approach to life and act as signals communicating the necessity of change. — Carl Jung

Small changes are great and accumulate, but when we’ve reached a point where each step forward gets repelled by all our insecurities, we need more assurance; we need to know which exact self-induced changes are the most useful.
And this is what we need to do: we need to break our immunity to change, we need to remove the shield of our self consistence, we need to face the unknown changes head-on. This is the step where we need to find the courage to sacrifice our selves to be reborn.

Sacrifice always means the renunciation of a valuable part of oneself, and through it the sacrificer escapes being devoured. Difficult but necessary step to abandon an aspect of ourselves in order to pave the way for the emergence of the new. The sacrifice is critical in the process of rebirth because what keeps us locked in our problem is the inability to recognize that ways of life that served us in our past may morph from promoters of our well-being to the acute cause of our suffering. — Carl Jung

… The dying of one attitude or need may be the other side of the birth of something new. One can choose to kill a neurotic strategy, a dependency, a clinging, and then find that he can choose to live as a freer self… A “dying” of part of one’s self is often followed by a heightened awareness of self, a heightened sense of possibility. — Rollo May

Unsurprisingly, any sudden unnatural change involves risks, especially when already deep into the abyss. Such change may lead to disorder if, by removing part of ourselves, we have nothing else to fill it with. This may take us to the path of chaos and psychological breakdown.

[This] …is similar in principle to a psychotic disturbance; that is, it differs from the initial stage of mental illness only by the fact that it leads in the end to greater health, while the latter leads to yet greater destruction. — Carl Jung

Enters a labyrinth, and multiplies a thousandfold the dangers that life in itself brings with it — of which not the least is that nobody can see how and where he loses his way, becomes solitary, and is torn to pieces by some cave-Minotaur of conscience. — Nietzsche

So now that we’re aware of our situation and have the courage to leap and let ourselves go, how do we direct the change and get out of the loop we’re currently stuck in? How do we stop being an immovable pillar of the past?

Just like we’ve accepted that parts of the self can be sacrificed, we have to accept its ambiguity and its chaotic nature. Accept that there’s a world within us that we may not currently understand, that there’s a depth in each of us that remains to be discovered. We have to be aware of the reality of our psyche.
Indeed, a lot of the reasons why it remains hidden are related to the self-suppression we externally exert on elements of our personality that we think run counter to the moral system of our days. We are blinded by the unquestioned social beliefs and standards.

All psychology so far has got hung up on moral prejudices and fears: it has not dared to descend into the depths. — Nietzsche

Similarly, how can we know ourselves if we are actors in a play, wearers of the masks of society? How can we know ourselves if all we strive for is fake perfectionism, a fetishism for perfection and a repulsion for anything that isn’t? Perfection hinders our development.
Consequently, we have to accept that perfection is not a thing to aim for, that it is non-existent. For how can one know what perfection is if it is not complete to begin with?

One should never think that man can reach perfection, he can only aim at completion — not to be perfect but to be complete. That would be the necessity and the indispensable condition if there were only question of perfection at all. For how can you perfect a thing if it is not complete?

Make it complete first and see what it is then. But to make it complete is already a mountain of a task, and by the time you arrive at absolute completion, you find that you are already dead, so you never reach that preliminary condition for perfecting yourself. — Carl Jung

Completeness means grasping the wholeness of the self, the inner foundation of one’s mind, how to impose form and harmony on the chaos that is the totality of the self. We are chaos.
We are composed of a multitude, and we need to make those parts emerge, to bring them to light. The more we dig, the more the elements of consciousness become apparent. We should do all we can to promote this growth; we should be heroes and explorers of the chaos, not passive observers controlled by its forces. The myriad chunks are then seen from afar, for what they are, none taking authority over the whole.
Just like the body is coordinated, there can be a master to the chaos of the mind. We can have an organizing idea that drives the rest. But what is this drive, how do we cultivate it, and how can we muster its power to train the body it’s composed of?

If we keep experimenting and spend time bringing forward the chaotic parts to understand them, it becomes clear: our drives don’t come out of thin air. Analogously to the reframing of the definition of the self, we can reframe our connection to space and time.
Our history, our dreams, our will to achieve, our unconscious thoughts, our past cultures, our institutions, our traditions, our physical predispositions, our limits, our sense of imminent death, and our primitive drives all constitute our location in space-time. They are also expressions of our personal and collective unconscious. Altogether, they create a common symbolic language: archetypes, universal elements, original forms that are part of the specification of human nature.
When one becomes conscious of their link with archetypes, they gain knowledge of the timeless “pattern of human life”, which provides a link with humanity. Additionally, this type of learning dissolves the feeling that everything is absurd and provides a sense of being rooted.
Considering this, we need to actively learn about basic prehistorical drives and how we can take power over them, we need to learn about history and how it repeats itself, we need to be familiar with the vestiges of the past, feel them flow within us instead of being ashamed and repressing them. The ruling passion should drive it all, sculpting our own heroic meaning to life.

For that, we can choose to study examples of archetypes, anyone and anything that we find excels in life, be it an imaginary being or not. Like an artist, we can be active makers, using the same method that gave birth to our self in the first place: imitation and emulation of others. We can create a second self based on the traits and characteristics of a role model, and on how they handle adversity and challenges, while still respecting what we know we can’t change in ourselves, our innate strengths and weaknesses.
This mechanism, of having an alter-ego archetype as a feedback mechanism that we slowly dissolve into, makes escapism easier and directed. Instead of reaching for the numbing actions we should reach for the second self we’ve created. Importantly keeping in mind that one of those numbing action is to divulge to others that we are simple “acting it out”, informing others would only be a sign of the anxiety we are having when facing this novel situation and that we are looking for the recalibration of our social norms via the others.
We should not pretend, we should act as if we already were and remind ourselves constantly, when we fall back to our old habits, that the past was the actual acting and that the present is the reality.

All of this continues until we engender the shift in our mindset. We’ll finally be a new self, looking at the previous one from a distance. It’s only by taking a distance that we can understand the whole.
Thus we can continue our self-discovery, peeling away more and more, revealing potential we didn’t know we had. At this point we can rediscover our internal and external worlds like never before. Introspection, retrospection, discussion, connection of minds: these take on a different perspective, what we call psychological mindedness.

The field of developmental psychology has long been enamored with such encapsulated forms of growth, the subject/object shift, where, as you move along, the previous self becomes the object of the current self, or where you radically change your perspective on life. Abraham Maslow, Jean Piaget, Erik Erikson, and Robert Kegan are psychologists who used such models to convey cognitive and personality development.

Maslow talks about self-actualization, “man’s tendency to actualize himself, to become his potentialities”, “the desire for self-fulfillment”. This can be achieved by having something to aim at, not for external rewards or for the goal itself, but because the transformation of the self compels us. It is the type of behavior that requires self-discipline and skill, and that is constructive.
Piaget and Kegan focus on subject-object relations and inter-relations when it comes to defining the self, giving sense to the self in a world that is nebulous and has roots in history.
Erikson prefers to emphasize what is important at every stage of life, the “ego identities”, from childhood to the senior years, all together encompassing a stable self and putting our common humanity in perspective.

Finally, maybe eventually, with all this, we can know who we are and how to describe ourselves.


  • Gianni Crestani / CC0

Jeremy Morgan (JeremyMorgan)

Stay Home and Learn JavaScript March 24, 2020 08:32 PM

If you’re quarantined at home and always wanted to learn to code, now’s your chance. I’ll be creating a series of tutorials designed to take you from “zero to hero” as a React developer. Before you start with React you should know some JavaScript. Unlike many front-end frameworks/libraries, React uses JavaScript patterns extensively, so we’ll cover some JavaScript basics. In this tutorial you will learn how to get your first webpage set up, how to write text to the browser, and how to write text to the console. We will cover some of the basics of JavaScript but won’t get too deep.

March 23, 2020

Joe Nelson (begriffs)

Concurrent programming, with examples March 23, 2020 12:00 AM

Mention concurrency and you’re bound to get two kinds of unsolicited advice: first that it’s a nightmarish problem which will melt your brain, and second that there’s a magical programming language or niche paradigm which will make all your problems disappear.

We won’t run to either extreme here. Instead we’ll cover the production workhorses for concurrent software – threading and locking – and learn about them through a series of interesting programs. By the end of this article you’ll know the terminology and patterns used by POSIX threads (pthreads).

This is an introduction rather than a reference. Plenty of reference material exists for pthreads – whole books in fact. I won’t dwell on all the options of the API, but will briskly give you the big picture. None of the examples contain error handling because it would merely clutter them.

Table of contents

Concurrency vs parallelism

First it’s important to distinguish concurrency vs parallelism. Concurrency is the ability of parts of a program to work correctly when executed out of order. For instance, imagine tasks A and B. One way to execute them is sequentially, meaning doing all steps for A, then all for B:


Concurrent execution, on the other hand, alternates doing a little of each task until both are all complete:

Concurrency allows a program to make progress even when certain parts are blocked. For instance, when one task is waiting for user input, the system can switch to another task and do calculations.

When tasks don’t just interleave, but run at the same time, that’s called parallelism. Multiple CPU cores can run instructions simultaneously:


When a program – even without hardware parallelism – switches rapidly enough from one task to another, it can feel to the user that tasks are executing at the same time. You could say it provides the “illusion of parallelism.” However, true parallelism has the potential for greater processor throughput for problems that can be broken into independent subtasks. Some ways of dealing with concurrency, such as multi-threaded programming, can exploit hardware parallelism automatically when available.

Some languages (or more accurately, some language implementations) are unable to achieve true multi-threaded parallelism. Ruby MRI and CPython for instance use a global interpreter lock (GIL) to simplify their implementation. The GIL prevents more than one thread from running at once. Programs in these interpreters can benefit from I/O concurrency, but not extra computational power.
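
As a quick check of the hardware parallelism actually available, most Unix-like systems report the number of online processors through sysconf. A small sketch (note that _SC_NPROCESSORS_ONLN is a widespread extension on Linux, the BSDs, and macOS, rather than core POSIX):

```c
#include <unistd.h>

/* how many cores can threads really run on at once?
   returns -1 if the system cannot say */
long online_cores(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}
```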

Our first concurrent program

Languages and libraries offer different ways to add concurrency to a program. UNIX for instance has a bunch of disjointed mechanisms like signals, asynchronous I/O (AIO), select, poll, and setjmp/longjmp. Using these mechanisms can complicate program structure and make programs harder to read than sequential code.

Threads offer a cleaner and more consistent way to address these motivations. For I/O they’re usually clearer than polling or callbacks, and for processing they are more efficient than Unix processes.

Crazy bankers

Let’s get started by adding concurrency to a program to simulate a bunch of crazy bankers sending random amounts of money from one bank account to another. The bankers don’t communicate with one another, so this is a demonstration of concurrency without synchronization.

Adding concurrency is the easy part. The real work is in making threads wait for one another to ensure a correct result. We’ll see a number of mechanisms and patterns for synchronization later, but for now let’s see what goes wrong without synchronization.

/* banker.c */

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

#define N_ACCOUNTS 10
#define N_THREADS  20
#define N_ROUNDS   10000

/* 10 accounts with $100 apiece means there's $1,000
   in the system. Let's hope it stays that way...  */
#define INIT_BALANCE 100

/* making a struct here for the benefit of future
   versions of this program */
struct account
{
	long balance;
} accts[N_ACCOUNTS];

/* Helper for bankers to choose an account and amount at
   random. It came from Steve Summit's excellent C FAQ */
int rand_range(int N)
{
	return (int)((double)rand() / ((double)RAND_MAX + 1) * N);
}

/* each banker will run this function concurrently. The
   weird signature is required for a thread function */
void *disburse(void *arg)
{
	size_t i, from, to;
	long payment;

	/* idiom to tell compiler arg is unused */
	(void)arg;

	for (i = 0; i < N_ROUNDS; i++)
	{
		/* pick distinct 'from' and 'to' accounts */
		from = rand_range(N_ACCOUNTS);
		do {
			to = rand_range(N_ACCOUNTS);
		} while (to == from);

		/* go nuts sending money, try not to overdraft */
		if (accts[from].balance > 0)
		{
			payment = 1 + rand_range(accts[from].balance);
			accts[from].balance -= payment;
			accts[to].balance   += payment;
		}
	}
	return NULL;
}

int main(void)
{
	size_t i;
	long total;
	pthread_t ts[N_THREADS];

	srand(time(NULL));

	for (i = 0; i < N_ACCOUNTS; i++)
		accts[i].balance = INIT_BALANCE;

	printf("Initial money in system: %d\n",
	       N_ACCOUNTS * INIT_BALANCE);

	/* start the threads, using whatever parallelism the
	   system happens to offer. Note that pthread_create
	   is the *only* function that creates concurrency */
	for (i = 0; i < N_THREADS; i++)
		pthread_create(&ts[i], NULL, disburse, NULL);

	/* wait for the threads to all finish, using the
	   pthread_t handles pthread_create gave us */
	for (i = 0; i < N_THREADS; i++)
		pthread_join(ts[i], NULL);

	for (total = 0, i = 0; i < N_ACCOUNTS; i++)
		total += accts[i].balance;

	printf("Final money in system: %ld\n", total);

	return 0;
}

The following simple Makefile can be used to compile all the programs in this article:

CFLAGS = -std=c99 -pedantic -D_POSIX_C_SOURCE=200809L -Wall -Wextra
LDLIBS = -lpthread

.c:
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $< $(LDLIBS)

We’re overriding make’s default single-suffix rule for .c so that -lpthread comes after the source input file. This Makefile will work with any of the programs in this article: if you have foo.c, you can simply run make foo, and make knows what to do without your needing to add any foo-specific rule to the Makefile.

Data races

Try compiling and running banker.c. Notice anything strange?

Threads share memory directly. Each thread can read and write variables in shared memory without any overhead. However when threads simultaneously read and write the same data it’s called a data race and generally causes problems.

In particular, threads in banker.c have data races when they read and write account balances. The bankers program moves money between accounts, yet the total amount of money in the system does not remain constant. The books don’t balance. Exactly how the program behaves depends on the thread scheduling policies of the operating system. On OpenBSD the total money seldom stays at $1,000. Sometimes money gets duplicated, sometimes it vanishes. On macOS the result is generally that all the money disappears, or even becomes negative!

The property that money is neither created nor destroyed in a bank is an example of a program invariant, and it gets violated by data races. Note that parallelism is not required for a race, only concurrency.

Here’s the problematic code in the disburse() function:

payment = 1 + rand_range(accts[from].balance);
accts[from].balance -= payment;
accts[to].balance   += payment;

The threads running this code can be paused or interleaved at any time. Not just between any of the statements, but partway through arithmetic operations which may not execute atomically on the hardware. Never rely on “thread inertia,” which is the mistaken feeling that the thread will finish a group of statements without interference.

Let’s examine exactly how statements can interleave between banker threads, and the resulting problems. The columns of the table below are threads, and the rows are moments in time.

Here’s a timeline where two threads read the same account balance when planning how much money to transfer. It can cause an overdraft.

Overdraft
A: payment = 1 + rand_range(accts[from].balance);
B: payment = 1 + rand_range(accts[from].balance);
   At this point, thread B’s payment-to-be may be in excess of the true
   balance, because thread A has already earmarked some of the money
   unbeknownst to B.
A: accts[from].balance -= payment;
B: accts[from].balance -= payment;
   Some of the same dollars could be transferred twice, and the originating
   account could even go negative if the overlap of the payments is big
   enough.

Here’s a timeline where the debit made by one thread can be undone by that made by another.

Lost debit
A: accts[from].balance -= payment;
B: accts[from].balance -= payment;    (at the same moment)
If -= is not atomic, the threads might switch execution after reading the balance and doing the arithmetic, but before the assignment. Thus one assignment would be overwritten by the other. The “lost update” creates extra money in the system.

Similar problems can occur when bankers have a data race in destination accounts. Races in the destination account would tend to decrease total money supply. (To learn more about concurrency problems, see my article Practical Guide to SQL Transaction Isolation).
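
The lost-update interleaving can even be reproduced deterministically in a single thread, by performing the two threads’ read-modify-write steps by hand (a simulation of one possible schedule, not real threading):

```c
/* one possible interleaving of two -= operations on a balance */
long simulate_lost_update(void)
{
	long balance = 100;

	long a_temp = balance - 10;  /* thread A reads 100, computes 90 */
	long b_temp = balance - 20;  /* thread B reads 100 too, computes 80 */

	balance = a_temp;            /* A writes 90 */
	balance = b_temp;            /* B overwrites it with 80 */

	return balance;              /* 80, though both debits should leave 70 */
}
```

A’s $10 debit is lost, so the system ends up $10 richer than it should be.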

Locks and deadlock

In the example above, we found that a certain section of code was vulnerable to data races. Such tricky parts of a program are called critical sections. We must ensure each thread gets all the way through the section before another thread is allowed to enter it.

To give threads mutually exclusive access to a critical section, pthreads provides the mutually exclusive lock (mutex for short). The pattern is:


pthread_mutex_lock(&mtx);

/* ... do things in the critical section ... */

pthread_mutex_unlock(&mtx);

Any thread calling pthread_mutex_lock on a previously locked mutex will go to sleep and not be scheduled until the mutex is unlocked (and any other threads already waiting on the mutex have gone first).

Another way to look at mutexes is that their job is to preserve program invariants. The critical section between locking and unlocking is a place where a certain invariant may be temporarily broken, as long as it is restored by the end. Some people recommend adding an assert() statement before unlocking, to help document the invariant. If an invariant is difficult to specify in an assertion, a comment can be useful instead.

A function is called thread-safe if multiple invocations can safely run concurrently. A cheap, but inefficient, way to make any function thread-safe is to give it its own mutex and lock it right away:

/* inefficient but effective way to protect a function */

pthread_mutex_t foo_mtx = PTHREAD_MUTEX_INITIALIZER;

void foo(/* some arguments */)
{
	pthread_mutex_lock(&foo_mtx);

	/* we're safe in here, but it's a bottleneck */

	pthread_mutex_unlock(&foo_mtx);
}

To see why this is inefficient, imagine if foo() was designed to output characters to a file specified in its arguments. Because the function takes a global lock, no two threads could run it at once, even if they wanted to write to different files. Writing to different files should be independent activities, and what we really want to protect against are two threads concurrently writing the same file.

The amount of data that a mutex protects is called its granularity, and smaller granularity can often be more efficient. In our foo() example, we could store a mutex for every file we write, and have the function choose and lock the appropriate mutex. Multi-threaded programs typically add a mutex as a member variable to data structures, to associate the lock with its data.
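
A sketch of what finer granularity could look like for the file-writing example, pairing each output file with its own mutex (the struct and function names here are hypothetical):

```c
#include <pthread.h>
#include <stdio.h>

/* hypothetical: one lock per file rather than one global lock */
struct logfile
{
	FILE *f;
	pthread_mutex_t mtx;
};

void logfile_write(struct logfile *lf, const char *msg)
{
	/* writers to *different* files proceed in parallel;
	   only writers to the same file serialize */
	pthread_mutex_lock(&lf->mtx);
	fputs(msg, lf->f);
	pthread_mutex_unlock(&lf->mtx);
}
```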

Let’s update the banker program to keep a mutex in each account and prevent data races.

/* banker_lock.c */

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

#define N_ACCOUNTS 10
#define N_THREADS  100
#define N_ROUNDS   10000

struct account
{
	long balance;
	/* add a mutex to prevent races on balance */
	pthread_mutex_t mtx;
} accts[N_ACCOUNTS];

int rand_range(int N)
{
	return (int)((double)rand() / ((double)RAND_MAX + 1) * N);
}

void *disburse(void *arg)
{
	size_t i, from, to;
	long payment;

	(void)arg;

	for (i = 0; i < N_ROUNDS; i++)
	{
		from = rand_range(N_ACCOUNTS);
		do {
			to = rand_range(N_ACCOUNTS);
		} while (to == from);

		/* get an exclusive lock on both balances before
		   updating (there's a problem with this, see below) */
		pthread_mutex_lock(&accts[from].mtx);
		pthread_mutex_lock(&accts[to].mtx);
		if (accts[from].balance > 0)
		{
			payment = 1 + rand_range(accts[from].balance);
			accts[from].balance -= payment;
			accts[to].balance   += payment;
		}
		pthread_mutex_unlock(&accts[to].mtx);
		pthread_mutex_unlock(&accts[from].mtx);
	}
	return NULL;
}

int main(void)
{
	size_t i;
	long total;
	pthread_t ts[N_THREADS];

	srand(time(NULL));

	/* set the initial balance, but also create a
	   new mutex for each account */
	for (i = 0; i < N_ACCOUNTS; i++)
		accts[i] = (struct account)
			{100, PTHREAD_MUTEX_INITIALIZER};

	for (i = 0; i < N_THREADS; i++)
		pthread_create(&ts[i], NULL, disburse, NULL);

	puts("(This program will probably deadlock, "
	     "and need to be manually terminated...)");

	for (i = 0; i < N_THREADS; i++)
		pthread_join(ts[i], NULL);

	for (total = 0, i = 0; i < N_ACCOUNTS; i++)
		total += accts[i].balance;

	printf("Total money in system: %ld\n", total);

	return 0;
}

Now everything should be safe. No money being created or destroyed, just perfect exchanges between the accounts. The invariant is that the total balance of the source and destination accounts is the same before we transfer the money as after. It’s broken only inside the critical section.

As a side note, at this point you might think it would be more efficient to take a single lock at a time, like this:

  • lock the source account
  • withdraw money into a thread local variable
  • unlock the source account
  • (danger zone!)
  • lock the destination account
  • deposit the money
  • unlock the destination account

This would not be safe. During the time between unlocking the source account and locking the destination, the invariant does not hold, yet another thread could observe this state. For instance a report running in another thread just at that time could read the balance of both accounts and observe money missing from the system.

We do need to lock both accounts during the transfer. However the way we’re doing it causes a different problem. Try to run the program. It gets stuck forever and never prints the final balance! Its threads are deadlocked.

Deadlock is the second villain of concurrent programming, and happens when threads wait on each other’s locks, but no thread unlocks for any other. The case of the bankers is a classic simple form called the deadly embrace. Here’s how it plays out:

Deadly embrace
A: lock account 1
B: lock account 2
A: lock account 2
   At this point thread A is blocked, because thread B already holds a lock
   on account 2.
B: lock account 1
   Now thread B is blocked, because thread A holds a lock on account 1.
   However thread A will never unlock account 1, because thread A is
   blocked!

The problem happens because threads lock resources in different orders, and because they refuse to give locks up. We can solve the problem by addressing either of these causes.

The first approach to preventing deadlock is to enforce a locking hierarchy. This means the programmer comes up with an arbitrary order for locks, and always takes “earlier” locks before “later” ones. The terminology comes from locks in hierarchical data structures like trees, but it really amounts to using any kind of consistent locking order.

In our case of the banker program we store all the accounts in an array, so we can use the array index as the lock order. Let’s compare.

/* the original way to lock mutexes, which caused deadlock */

pthread_mutex_lock(&accts[from].mtx);
pthread_mutex_lock(&accts[to].mtx);
/* move money */
pthread_mutex_unlock(&accts[to].mtx);
pthread_mutex_unlock(&accts[from].mtx);

Here’s a safe way, enforcing a locking hierarchy:

/* lock mutexes in earlier accounts first */

#define MIN(a,b) ((a) < (b) ? (a) : (b))
#define MAX(a,b) ((a) < (b) ? (b) : (a))

pthread_mutex_lock(&accts[MIN(from, to)].mtx);
pthread_mutex_lock(&accts[MAX(from, to)].mtx);
/* move money */
pthread_mutex_unlock(&accts[MAX(from, to)].mtx);
pthread_mutex_unlock(&accts[MIN(from, to)].mtx);

/* notice we unlock in opposite order */

A locking hierarchy is the most efficient way to prevent deadlock, but it isn’t always easy to contrive. It also creates a potentially undocumented coupling between different parts of a program, which need to collaborate in the convention.

Backoff is a different way to prevent deadlock which works for locks taken in any order. It takes a lock, but then checks whether the next is obtainable. If not, it unlocks the first to allow another thread to make progress, and tries again.

/* using pthread_mutex_trylock to dodge deadlock */

while (1)
{
	pthread_mutex_lock(&accts[from].mtx);
	if (pthread_mutex_trylock(&accts[to].mtx) == 0)
		break; /* got both locks */

	/* didn't get the second one, so unlock the first */
	pthread_mutex_unlock(&accts[from].mtx);
	/* force a sleep so another thread can try --
	   include <sched.h> for this function */
	sched_yield();
}
/* move money */
pthread_mutex_unlock(&accts[to].mtx);
pthread_mutex_unlock(&accts[from].mtx);

One tricky part is the call to sched_yield(). Without it the loop would immediately try to grab the lock again, competing as hard as it can with other threads that could make more productive use of the lock. This causes livelock, where threads fight for access to the locks. The sched_yield() relinquishes the processor, sending the calling thread to the back of the scheduler’s run queue.

Despite its flexibility, backoff is definitely less efficient than a locking hierarchy because it can make wasted calls to lock and unlock mutexes. Try modifying the banker program with these approaches and measure how fast they run.
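
For the measurement, clock_gettime with CLOCK_MONOTONIC gives a wall-clock timer unaffected by system clock adjustments. A sketch of how the comparison could be instrumented:

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* seconds between two timestamps */
double elapsed(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

void timed_run(void)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	/* ... pthread_create and pthread_join the banker threads ... */
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("%f seconds\n", elapsed(start, end));
}
```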

Condition variables

After safely getting access to a shared variable with a mutex, a thread may discover that the value of the variable is not yet suitable for the thread to act upon. For instance, if the thread was looking for an item to process in a shared queue, but found the queue was empty. The thread could poll the value, but this is inefficient. Pthreads provides condition variables to allow threads to wait for events of interest or notify other threads when these events happen.

Condition variables are not themselves locks, nor do they hold any value of their own. They are merely events with a programmer-assigned meaning. For example, a structure representing a queue could have a mutex for safely accessing the data, plus some condition variables. One to represent the event of the queue becoming empty, and another to announce when a new item is added.

Before getting deeper into how condition variables work, let’s see one in action with our banker program. We’ll measure contention between the bankers. First we’ll increase the number of threads and accounts, and keep statistics about how many bankers manage to get inside the disburse() critical section at once. Any time the max score is broken, we’ll signal a condition variable. A dedicated thread will wait on it and update a scoreboard.

/* banker_stats.c */

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

/* increase the accounts and threads, but make sure there are
 * "too many" threads so they tend to block each other */
#define N_ACCOUNTS 50
#define N_THREADS  100
#define N_ROUNDS   10000

#define MIN(a,b) ((a) < (b) ? (a) : (b))
#define MAX(a,b) ((a) < (b) ? (b) : (a))

struct account
{
	long balance;
	pthread_mutex_t mtx;
} accts[N_ACCOUNTS];

int rand_range(int N)
{
	return (int)((double)rand() / ((double)RAND_MAX + 1) * N);
}

/* keep a special mutex and condition variable
 * reserved for just the stats */
pthread_mutex_t stats_mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  stats_cnd = PTHREAD_COND_INITIALIZER;
int stats_curr = 0, stats_best = 0;

/* use this interface to modify the stats */
void stats_change(int delta)
{
	pthread_mutex_lock(&stats_mtx);
	stats_curr += delta;
	if (stats_curr > stats_best)
	{
		stats_best = stats_curr;
		/* signal new high score */
		pthread_cond_broadcast(&stats_cnd);
	}
	pthread_mutex_unlock(&stats_mtx);
}

/* a dedicated thread to update the scoreboard UI */
void *stats_print(void *arg)
{
	int prev_best;

	(void)arg;

	/* we never return, nobody needs to
	 * pthread_join() with us */

	while (1)
	{
		pthread_mutex_lock(&stats_mtx);
		prev_best = stats_best;
		/* go to sleep until stats change, and always
		 * check that they actually have changed */
		while (prev_best == stats_best)
			pthread_cond_wait(
				&stats_cnd, &stats_mtx);

		/* overwrite current line with new score */
		printf("\r%2d", stats_best);
		fflush(stdout);
		pthread_mutex_unlock(&stats_mtx);
	}
	return NULL;
}

void *disburse(void *arg)
{
	size_t i, from, to;
	long payment;

	(void)arg;

	for (i = 0; i < N_ROUNDS; i++)
	{
		from = rand_range(N_ACCOUNTS);
		do {
			to = rand_range(N_ACCOUNTS);
		} while (to == from);

		pthread_mutex_lock(&accts[MIN(from, to)].mtx);
		pthread_mutex_lock(&accts[MAX(from, to)].mtx);

		/* notice we still have a lock hierarchy, because
		 * we call stats_change() after locking all account
		 * mutexes (stats_mtx comes last) */
		stats_change(1); /* another banker in crit sec */
		if (accts[from].balance > 0)
		{
			payment = 1 + rand_range(accts[from].balance);
			accts[from].balance -= payment;
			accts[to].balance   += payment;
		}
		stats_change(-1); /* leaving crit sec */

		pthread_mutex_unlock(&accts[MAX(from, to)].mtx);
		pthread_mutex_unlock(&accts[MIN(from, to)].mtx);
	}
	return NULL;
}

int main(void)
{
	size_t i;
	long total;
	pthread_t ts[N_THREADS], stats;

	srand(time(NULL));

	for (i = 0; i < N_ACCOUNTS; i++)
		accts[i] = (struct account)
			{100, PTHREAD_MUTEX_INITIALIZER};

	for (i = 0; i < N_THREADS; i++)
		pthread_create(&ts[i], NULL, disburse, NULL);

	/* start thread to update the user on how many bankers
	 * are in the disburse() critical section at once */
	pthread_create(&stats, NULL, stats_print, NULL);

	for (i = 0; i < N_THREADS; i++)
		pthread_join(ts[i], NULL);

	/* not joining with the thread running stats_print,
	 * we'll let it disappear when main exits */

	for (total = 0, i = 0; i < N_ACCOUNTS; i++)
		total += accts[i].balance;

	printf("\nTotal money in system: %ld\n", total);

	return 0;
}

With fifty accounts and a hundred threads, not all threads will be able to be in the critical section of disburse() at once. It varies between runs. Run the program and see how well it does on your machine. (One complication is that making all threads synchronize on stats_mtx may throw off the measurement, because there are threads who could have executed independently but now must interact.)

Let’s look at how to properly use condition variables. We notified threads of a new event with pthread_cond_broadcast(&stats_cnd). This function marks all threads waiting on stats_cnd as ready to run.

Sometimes multiple threads are waiting on a single cond var. A broadcast will wake them all, but sometimes the event source knows that only one thread will be able to do any work. For instance if only one item is added to a shared queue. In that case the pthread_cond_signal function is better than pthread_cond_broadcast. Unnecessarily waking multiple threads causes overhead. In our case we know that only one thread is waiting on the cond var, so it really makes no difference.

Remember that it’s never wrong to use a broadcast, whereas in some cases it might be wrong to use a signal. Signal is just an optimized broadcast.

The waiting side of a cond var ought always to have this pattern:

pthread_mutex_lock(&mutex);
while (!PREDICATE)
	pthread_cond_wait(&cond_var, &mutex);
pthread_mutex_unlock(&mutex);

Condition variables are always associated with a predicate, and the association is implicit in the programmer’s head. You shouldn’t reuse a condition variable for multiple predicates. The intention is that code will signal the cond var when the predicate becomes true.

Before testing the predicate we lock a mutex that covers the data being tested. That way no other thread can change the data immediately after we test it (also pthread_cond_wait() requires a locked mutex). If the predicate is already true we needn’t wait on the cond var, so the loop falls through, otherwise the thread begins to wait.

Condition variables allow you to make this series of events atomic: unlock a mutex, register our interest in the event, and block. Without that atomicity another thread might awaken to take our lock and broadcast before we’ve registered ourselves as interested. Without the atomicity we could be blocked forever.

When pthread_cond_wait() returns, the calling thread awakens and atomically gets its mutex back. It’s all set to check the predicate again in the loop. But why check the predicate? Wasn’t the cond var signaled because the predicate was true, and isn’t the relevant data protected by a mutex? There are three reasons to check:

  1. If the condition variable had been broadcast, other threads might have been listening, and another might have been scheduled first and might have done our job. The loop tests for that interception.
  2. On some multiprocessor systems, making condition variable wakeup completely predictable might substantially slow down all cond var operations. Such systems allow spurious wakeups, and threads need to be prepared to check if they were woken appropriately.
  3. It can be convenient to signal on a loose predicate. Threads can signal the variable when the event seems likely, or even mistakenly signal, and the program will still work. For instance, we signal when stats_best gets a new high score, but we could have chosen to signal at every invocation of stats_change().

Given that we have to pass a locked mutex to pthread_cond_wait(), which we had to create, why don’t cond vars come with their own built-in mutex? The reason is flexibility. Although you should use only one mutex with a cond var, there can be multiple cond vars for the same mutex. Think of the example of the mutex protecting a queue, and the different events that can happen in the queue.
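
To make the queue example concrete, here is a sketch of a bounded queue guarded by one mutex and two condition variables, one per event; the names and capacity are illustrative:

```c
#include <pthread.h>

#define QCAP 8

struct queue
{
	int items[QCAP];
	int head, count;
	pthread_mutex_t mtx;
	pthread_cond_t nonempty;  /* predicate: count > 0    */
	pthread_cond_t nonfull;   /* predicate: count < QCAP */
};

void enqueue(struct queue *q, int item)
{
	pthread_mutex_lock(&q->mtx);
	while (q->count == QCAP)  /* wait for a free slot */
		pthread_cond_wait(&q->nonfull, &q->mtx);
	q->items[(q->head + q->count) % QCAP] = item;
	q->count++;
	/* only one consumer can use this item, so signal, not broadcast */
	pthread_cond_signal(&q->nonempty);
	pthread_mutex_unlock(&q->mtx);
}

int dequeue(struct queue *q)
{
	int item;

	pthread_mutex_lock(&q->mtx);
	while (q->count == 0)     /* wait for an item */
		pthread_cond_wait(&q->nonempty, &q->mtx);
	item = q->items[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	pthread_cond_signal(&q->nonfull);
	pthread_mutex_unlock(&q->mtx);
	return item;
}
```

Both waits follow the while-loop pattern above, and both condition variables share the queue’s single mutex.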

Other synchronization primitives

Barriers


It’s time to bid farewell to the banker programs, and turn to something more lively: Conway’s Game of Life! The game has a set of rules operating on a grid of cells that determines which cells live or die based on how many living neighbors each has.

The game can take advantage of multiple processors, using each processor to operate on a different part of the grid in parallel. It’s a so-called embarrassingly parallel problem because each section of the grid can be processed in isolation, without needing results from other sections.

Barriers ensure that all threads have reached a particular stage in a parallel computation before allowing any to proceed to the next stage. Each thread calls pthread_barrier_wait() to rendezvous with the others. One of the threads, chosen arbitrarily by the implementation, will see the PTHREAD_BARRIER_SERIAL_THREAD return value, which nominates that thread to do any cleanup or preparation between stages.

/* life.c */

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* mandatory in POSIX.1-2008, but check laggards like macOS */
#include <unistd.h>
#if !defined(_POSIX_BARRIERS) || _POSIX_BARRIERS < 0
#error your OS lacks POSIX barrier support
#endif

/* dimensions of board */
#define ROWS 32
#define COLS 78
/* how long to pause between rounds */
#define FRAME_MS 100
#define THREADS 4

/* proper modulus (in C, '%' is merely remainder) */
#define MOD(x,N) (((x) < 0) ? ((x) % (N) + (N)) : ((x) % (N)))

bool alive[ROWS][COLS], alive_next[ROWS][COLS];
pthread_barrier_t tick;

/* Should a cell live or die? Using ssize_t because we have
   to deal with signed arithmetic like row-1 when row=0 */
bool fate(ssize_t row, ssize_t col)
{
	ssize_t i, j;
	short neighbors = 0;

	assert(0 <= row && row < ROWS);
	assert(0 <= col && col < COLS);

	/* joined edges form a torus */
	for (i = row-1; i <= row+1; i++)
		for (j = col-1; j <= col+1; j++)
			neighbors += alive[MOD(i, ROWS)][MOD(j, COLS)];
	/* don't count self as a neighbor */
	neighbors -= alive[row][col];

	return neighbors == 3 ||
		(neighbors == 2 && alive[row][col]);
}

/* overwrite the board on screen */
void draw(void)
{
	ssize_t i, j;

	/* clear screen (non portable, requires ANSI terminal) */
	fputs("\033[2J\033[1;1H", stdout);

	flockfile(stdout);
	for (i = 0; i < ROWS; i++)
	{
		/* putchar_unlocked is thread safe when stdout is locked,
		   and it's as fast as single-threaded putchar */
		for (j = 0; j < COLS; j++)
			putchar_unlocked(alive[i][j] ? 'X' : ' ');
		putchar_unlocked('\n');
	}
	funlockfile(stdout);
}

void *update_strip(void *arg)
{
	ssize_t offset = *(ssize_t*)arg, i, j;
	struct timespec t;

	t.tv_sec = 0;
	t.tv_nsec = FRAME_MS * 1000000;

	while (1)
	{
		if (pthread_barrier_wait(&tick) ==
		    PTHREAD_BARRIER_SERIAL_THREAD)
		{
			/* we drew the short straw, so we're on graphics duty */

			/* could have used pointers to multidimensional
			 * arrays and swapped them rather than memcpy'ing
			 * the array contents, but it makes the code a
			 * little more complicated with dereferences */
			memcpy(alive, alive_next, sizeof alive);
			draw();
			nanosleep(&t, NULL);
		}

		/* rejoin at another barrier to avoid data race on
		   the game board while it's copied and drawn */
		pthread_barrier_wait(&tick);

		for (i = offset; i < offset + (ROWS / THREADS); i++)
			for (j = 0; j < COLS; j++)
				alive_next[i][j] = fate(i, j);
	}

	return NULL;
}

int main(void)
{
	pthread_t *workers;
	ssize_t *offsets;
	size_t i, j;

	srand(time(NULL));

	assert(ROWS % THREADS == 0);
	/* main counts as a thread, so need only THREADS-1 more */
	workers = malloc(sizeof(*workers) * (THREADS-1));
	offsets = malloc(sizeof(*offsets) * THREADS);

	for (i = 0; i < ROWS; i++)
		for (j = 0; j < COLS; j++)
			alive_next[i][j] = rand() < (int)((RAND_MAX+1u) / 3);

	pthread_barrier_init(&tick, NULL, THREADS);
	for (i = 0; i < THREADS-1; i++)
	{
		offsets[i] = i * ROWS / THREADS;
		pthread_create(&workers[i], NULL, update_strip, &offsets[i]);
	}

	/* use current thread as a worker too */
	offsets[i] = i * ROWS / THREADS;
	update_strip(&offsets[i]);

	/* shouldn't ever get here */
	return 0;
}

It’s a fun example although slightly contrived. We’re adding a sleep between rounds to slow down the animation, so it’s unnecessary to chase parallelism. Also there’s a memoized algorithm called hashlife we should be using if pure speed is the goal. However our code illustrates a natural use for barriers.

Notice how we wait at the barrier twice in rapid succession. After emerging from the first barrier, one of the threads (chosen at random) copies the new state to the board and draws it. The other threads run ahead to the next barrier and wait there so they don’t cause a data race writing to the board. Once the drawing thread arrives at the barrier with them, then all can proceed to calculate cells’ fate for the next round.

Barriers are guaranteed to be present in POSIX.1-2008, but are optional in earlier versions of the standard. Notably macOS is stuck at an old version of POSIX. Presumably they’re too busy “innovating” with their keyboard touchbar to invest in operating system fundamentals.


Spinlocks

Spinlocks are implementations of mutexes optimized for fine-grained locking. Often used in low-level code like drivers or operating systems, spinlocks are designed to be the most primitive and fastest sync mechanism available. They’re generally not appropriate for application programming. They are truly necessary only in situations like interrupt handlers, when a thread is not allowed to go to sleep for any reason.

Aside from that scenario, it’s better to just use a mutex, since mutexes are pretty efficient these days. A modern mutex often tries a short-lived internal spinlock first and falls back to heavier techniques only as needed. On Linux, mutexes are built on the futex (“fast userspace mutex”) facility, which takes an uncontended lock entirely in user space and makes a system call only when threads actually have to wait.

When attempting to lock a spinlock, a thread runs a tight loop repeatedly checking a value in shared memory for a sign it’s safe to proceed. Spinlock implementations use special atomic assembly language instructions to test that the value is unlocked and lock it. The particular instructions vary per architecture, and can be performed in user space to avoid the overhead of a system call.

While waiting for a lock, the loop doesn’t block the thread, but instead keeps running and burns CPU time. The technique works only on a true multi-processor system, or on a uniprocessor system with preemptive scheduling. On a uniprocessor system with cooperative threading the loop can never be interrupted, and the program will livelock.

In POSIX.1-2008 spinlock support is mandatory. In previous versions the presence of this feature was indicated by the _POSIX_SPIN_LOCKS macro. Spinlock functions start with pthread_spin_.

Reader-writer locks

Whereas a mutex enforces mutual exclusion, a reader-writer lock allows concurrent read access. Multiple threads can read in parallel, but all block when a thread takes the lock for writing. The increased concurrency can improve application performance. However, blindly replacing mutexes with reader-writer locks “for performance” doesn’t work. Our earlier banker program, for instance, could suffer from duplicate withdrawals if it allowed multiple readers in an account at once.

Below is an rwlock example. It’s a password cracker I call 5dm (md5 backwards). It aims for maximum parallelism searching for a preimage of an MD5 hash. Worker threads periodically poll whether one among them has found an answer, and they use a reader-writer lock to avoid blocking on each other when doing so.

The example is slightly contrived, in that the difficulty of brute forcing passwords increases exponentially with their length. Using multiple threads reduces the time by only a constant factor – but 4x faster is still 4x faster on a four core computer!

The example below uses MD5() from OpenSSL. To build it, include this in our previous Makefile:

CFLAGS  += `pkg-config --cflags libcrypto`
LDFLAGS += `pkg-config --libs-only-L libcrypto`
LDLIBS  += `pkg-config --libs-only-l libcrypto`

To run it, pass in an MD5 hash and max preimage search length. Note the -n in echo to suppress the newline, since newline is not in our search alphabet:

$ time ./5dm $(echo -n 'fun' | md5) 5

real  0m0.067s
user  0m0.205s
sys	  0m0.007s

Notice how 0.2 seconds of CPU time elapsed in parallel, but the user got their answer in 0.067 seconds.

On to the code:

/* 5dm.c */

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <openssl/md5.h>
#include <pthread.h>

/* build arbitrary words from the ascii between ' ' and '~' */
#define ASCII_FIRST ' '
#define ASCII_LAST  '~'
#define N_ALPHA     (ASCII_LAST - ASCII_FIRST + 1)
/* refuse to search beyond this astronomical length */
#define LONGEST_PREIMAGE 50

#define MAX(x,y) ((x)<(y) ? (y) : (x))

/* a fast way to enumerate words, operating on an array in-place */
unsigned word_advance(char *word, unsigned delta)
{
	if (delta == 0)
		return 0;
	if (*word == '\0')
	{
		*word++ = ASCII_FIRST + delta - 1;
		*word = '\0';
	}
	else
	{
		char c = *word - ASCII_FIRST;
		*word = ASCII_FIRST + ((c + delta) % N_ALPHA);
		if (c + delta >= N_ALPHA)
			return 1 + word_advance(word+1, 1 /* not delta */);
	}
	return 1;
}

/* pack each pair of ASCII hex digits into single bytes */
bool hex2md5(const char *hex, unsigned char *b)
{
	int offset = 0;
	if (strlen(hex) != MD5_DIGEST_LENGTH*2)
		return false;
	while (offset < MD5_DIGEST_LENGTH*2)
	{
		if (sscanf(hex+offset, "%2hhx", b++) == 1)
			offset += 2;
		else
			return false;
	}
	return true;
}

/* random things a worker will need, since thread
 * functions receive only one argument */
struct goal
{
	/* input */
	pthread_t *workers;
	size_t n_workers;
	size_t max_len;
	unsigned char hash[MD5_DIGEST_LENGTH];

	/* output */
	pthread_rwlock_t lock;
	char preimage[LONGEST_PREIMAGE];
	bool success;
};

/* custom starting word for each worker, but shared goal */
struct task
{
	struct goal *goal;
	char initial_preimage[LONGEST_PREIMAGE];
};

void *crack_thread(void *arg)
{
	struct task *t = arg;
	unsigned len, changed;
	unsigned char hashed[MD5_DIGEST_LENGTH];
	char preimage[LONGEST_PREIMAGE];
	int iterations = 0;

	strcpy(preimage, t->initial_preimage);
	len = strlen(preimage);

	while (len <= t->goal->max_len)
	{
		MD5((const unsigned char*)preimage, len, hashed);
		if (memcmp(hashed, t->goal->hash, MD5_DIGEST_LENGTH) == 0)
		{
			/* success -- tell others to call it off */
			pthread_rwlock_wrlock(&t->goal->lock);
			t->goal->success = true;
			strcpy(t->goal->preimage, preimage);
			pthread_rwlock_unlock(&t->goal->lock);

			return NULL;
		}
		/* each worker jumps ahead n_workers words, and all workers
		   started at an offset, so all words are covered */
		changed = word_advance(preimage, t->goal->n_workers);
		len = MAX(len, changed);

		/* check if another worker has succeeded, but only every
		   thousandth iteration, since taking the lock adds overhead */
		if (iterations++ % 1000 == 0)
		{
			/* in the overwhelming majority of cases workers only read,
			   so an rwlock allows them to continue in parallel */
			pthread_rwlock_rdlock(&t->goal->lock);
			int success = t->goal->success;
			pthread_rwlock_unlock(&t->goal->lock);
			if (success)
				return NULL;
		}
	}
	return NULL;
}

/* launch a parallel search for an md5 preimage */
bool crack(const unsigned char *md5, size_t max_len,
           unsigned threads, char *result)
{
	struct goal g =
	{
		.workers   = malloc(threads * sizeof(pthread_t)),
		.n_workers = threads,
		.max_len   = max_len,
		.lock      = PTHREAD_RWLOCK_INITIALIZER,
		.success   = false,
	};
	memcpy(g.hash, md5, MD5_DIGEST_LENGTH);

	struct task *tasks = malloc(threads * sizeof(struct task));

	for (size_t i = 0; i < threads; i++)
	{
		tasks[i].goal = &g;
		tasks[i].initial_preimage[0] = '\0';
		/* offset the starting word for each worker by i */
		word_advance(tasks[i].initial_preimage, i);
		pthread_create(g.workers+i, NULL, crack_thread, tasks+i);
	}

	/* if one worker finds the answer, others will abort */
	for (size_t i = 0; i < threads; i++)
		pthread_join(g.workers[i], NULL);

	if (g.success)
		strcpy(result, g.preimage);

	free(g.workers);
	free(tasks);
	return g.success;
}

int main(int argc, char **argv)
{
	char preimage[LONGEST_PREIMAGE];
	int max_len = 4;
	unsigned char md5[MD5_DIGEST_LENGTH];

	if (argc != 2 && argc != 3)
	{
		fprintf(stderr,
		        "Usage: %s md5-string [search-depth]\n",
		        argv[0]);
		return EXIT_FAILURE;
	}

	if (!hex2md5(argv[1], md5))
	{
		fprintf(stderr,
		       "Could not parse as md5: %s\n", argv[1]);
		return EXIT_FAILURE;
	}

	if (argc > 2 && strtol(argv[2], NULL, 10))
	{
		if ((max_len = strtol(argv[2], NULL, 10)) > LONGEST_PREIMAGE)
		{
			fprintf(stderr,
					"Preimages limited to %d characters\n",
					LONGEST_PREIMAGE);
			return EXIT_FAILURE;
		}
	}

	if (crack(md5, max_len, 4, preimage))
	{
		puts(preimage);
		return EXIT_SUCCESS;
	}

	fprintf(stderr,
			"Could not find result in strings up to length %d\n",
			max_len);
	return EXIT_FAILURE;
}

Although read-write locks can be implemented in terms of mutexes and condition variables, such implementations are significantly less efficient than is possible. Therefore, this synchronization primitive is included in POSIX.1-2008 for the purpose of allowing more efficient implementations in multi-processor systems.

The final thing to be aware of is that an rwlock implementation can choose either reader-preference or writer-preference. When readers and writers are contending for a lock, the preference determines who gets to skip the queue and go first. When there is a lot of reader activity with a reader-preference, then a writer will continually get moved to the end of the line and experience starvation, where it never gets to write. I noticed writer starvation on Linux (glibc) when running four threads on a little 1-core virtual machine. Glibc provides the nonportable pthread_rwlockattr_setkind_np() function to specify a preference.

You may have noticed that workers in our password cracker use polling to see whether the solution has been found, and whether they should give up. We’ll examine a more explicit method of cancellation in a later section.


Semaphores

Semaphores keep count of, in the abstract, an amount of resource “units” available. Threads can safely add or remove a unit without causing a data race. When a thread requests a unit but there are none, it blocks.

A semaphore is like a mix between a lock and a condition variable. Unlike mutexes, semaphores have no concept of an owner. Any thread may release threads blocked on a semaphore, whereas with a mutex the lock holder must unlock it. Unlike a condition variable, a semaphore operates independently of a predicate.

An example of a problem uniquely suited for semaphores would be to ensure that exactly two threads run at once on a task. You would initialize the semaphore to the value two, and allow a bunch of threads to wait on the semaphore. After two get past, the rest will block. When each thread is done, it posts one unit back to the semaphore, which allows another thread to take its place.

In reality, if you’ve got pthreads, you only need semaphores for asynchronous signal handlers. You can use them in other situations, but this is the only place they are needed. Mutexes aren’t async signal safe. Making them so would be much slower than an implementation that isn’t async signal safe, and would slow down ordinary mutex operation.

Here’s an example of posting a semaphore from a signal handler:

/* sem_tickler.c */

#include <semaphore.h>
#include <signal.h>
#include <stdio.h>

#include <unistd.h>

#if !defined(_POSIX_SEMAPHORES) || _POSIX_SEMAPHORES < 0
#error your OS lacks POSIX semaphore support
#endif

sem_t tickler;

void int_catch(int sig)
{
	(void) sig;

	signal(SIGINT, &int_catch);
	sem_post(&tickler); /* async signal safe */
}

int main(void)
{
	sem_init(&tickler, 0, 0);
	signal(SIGINT, &int_catch);

	for (int i = 0; i < 3; i++)
	{
		sem_wait(&tickler);
		puts("That tickles!");
	}
	puts("(Died from overtickling)");
	return 0;
}

Semaphores aren’t even necessary for proper signal handling. It’s easier to have a thread simply sigwait() than it is to set up an asynchronous handler. In the example below, the main thread waits, but you can spawn a dedicated thread for this in a real application.

/* sigwait_tickler.c */

#include <signal.h>
#include <stdio.h>

int main(void)
{
	sigset_t set;
	int which;

	sigemptyset(&set);
	sigaddset(&set, SIGINT);
	/* block SIGINT so it's delivered only through sigwait() */
	sigprocmask(SIG_BLOCK, &set, NULL);

	for (int i = 0; i < 3; i++)
	{
		sigwait(&set, &which);
		puts("That tickles!");
	}
	puts("(Died from overtickling)");
	return 0;
}

So don’t feel dependent on semaphores. In fact your system may not have them. The POSIX semaphore API works with pthreads and is present in POSIX.1-2008, but is an optional part of POSIX.1b in earlier versions. Apple, for one, decided to punt, so the semaphore functions on macOS are stubbed to return error codes.


Cancellation

Thread cancellation is generally used when you have threads doing long-running tasks and there’s a way for a user to abort through the UI or console. Another common scenario is when multiple threads set off to explore a search space and one finds the answer first.

Our previous reader-writer lock example was the second scenario, where the threads explored a search space. It was an example of do-it-yourself cancellation through polling. However sometimes threads aren’t able to poll, such as when they are blocked on I/O or a lock. Pthreads offers an API to cancel threads even in those situations.

By default a cancelled thread isn’t immediately blown away, because it may have a mutex locked, be holding resources, or have a potentially broken invariant. The canceller wouldn’t know how to repair that invariant without some complicated logic. The thread to be canceled needs to be written to do cleanup and unlock mutexes.

For each thread, cancellation can be enabled or disabled, and if enabled, may be in deferred or asynchronous mode. The default is enabled and deferred, which allows a cancelled thread to survive until it reaches the next cancellation point, such as waiting on a condition variable or blocking on I/O (see the full list). In a purely computational section of code you can add your own cancellation points with pthread_testcancel().

Let’s see how to modify our previous MD5 cracking example using standard pthread cancellation. Three of the functions are the same as before: word_advance(), hex2md5(), and main(). But we now use a condition variable to alert crack() whenever a crack_thread() returns.

/* 5dm-testcancel.c */

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <openssl/md5.h>
#include <pthread.h>

#define ASCII_FIRST ' '
#define ASCII_LAST  '~'
#define N_ALPHA     (ASCII_LAST - ASCII_FIRST + 1)
#define LONGEST_PREIMAGE 50

#define MAX(x,y) ((x)<(y) ? (y) : (x))

unsigned word_advance(char *word, unsigned delta)
{
	if (delta == 0)
		return 0;
	if (*word == '\0')
	{
		*word++ = ASCII_FIRST + delta - 1;
		*word = '\0';
	}
	else
	{
		char c = *word - ASCII_FIRST;
		*word = ASCII_FIRST + ((c + delta) % N_ALPHA);
		if (c + delta >= N_ALPHA)
			return 1 + word_advance(word+1, 1 /* not delta */);
	}
	return 1;
}

bool hex2md5(const char *hex, unsigned char *b)
{
	int offset = 0;
	if (strlen(hex) != MD5_DIGEST_LENGTH*2)
		return false;
	while (offset < MD5_DIGEST_LENGTH*2)
	{
		if (sscanf(hex+offset, "%2hhx", b++) == 1)
			offset += 2;
		else
			return false;
	}
	return true;
}

struct goal
{
	/* input */
	pthread_t *workers;
	size_t n_workers;
	size_t max_len;
	unsigned char hash[MD5_DIGEST_LENGTH];

	/* output */
	pthread_mutex_t lock;
	pthread_cond_t returning;
	unsigned n_done;
	char preimage[LONGEST_PREIMAGE];
	bool success;
};

struct task
{
	struct goal *goal;
	char initial_preimage[LONGEST_PREIMAGE];
};

void *crack_thread(void *arg)
{
	struct task *t = arg;
	unsigned len, changed;
	unsigned char hashed[MD5_DIGEST_LENGTH];
	char preimage[LONGEST_PREIMAGE];
	int iterations = 0;

	strcpy(preimage, t->initial_preimage);
	len = strlen(preimage);

	while (len <= t->goal->max_len)
	{
		MD5((const unsigned char*)preimage, len, hashed);
		if (memcmp(hashed, t->goal->hash, MD5_DIGEST_LENGTH) == 0)
		{
			pthread_mutex_lock(&t->goal->lock);
			t->goal->success = true;
			strcpy(t->goal->preimage, preimage);
			t->goal->n_done++;
			pthread_mutex_unlock(&t->goal->lock);

			/* alert the boss that another worker is done */
			pthread_cond_signal(&t->goal->returning);
			return NULL;
		}
		changed = word_advance(preimage, t->goal->n_workers);
		len = MAX(len, changed);

		if (iterations++ % 1000 == 0)
			pthread_testcancel(); /* add a cancellation point */
	}

	pthread_mutex_lock(&t->goal->lock);
	t->goal->n_done++;
	pthread_mutex_unlock(&t->goal->lock);

	/* alert the boss that another worker is done */
	pthread_cond_signal(&t->goal->returning);
	return NULL;
}

/* cancellation cleanup function that we also call
 * during regular exit from the crack() function */
void crack_cleanup(void *arg)
{
	struct task *tasks = arg;
	struct goal *g = tasks[0].goal;

	/* this mutex unlock pairs with the lock in the crack() function */
	pthread_mutex_unlock(&g->lock);

	for (size_t i = 0; i < g->n_workers; i++)
	{
		pthread_cancel(g->workers[i]);
		/* must wait for each to terminate, so that freeing
		 * their shared memory is safe */
		pthread_join(g->workers[i], NULL);
	}
	/* now it's safe to free memory */
	free(g->workers);
	free(tasks);
}

bool crack(const unsigned char *md5, size_t max_len,
           unsigned threads, char *result)
{
	struct goal g =
	{
		.workers   = malloc(threads * sizeof(pthread_t)),
		.n_workers = threads,
		.max_len   = max_len,
		.lock      = PTHREAD_MUTEX_INITIALIZER,
		.returning = PTHREAD_COND_INITIALIZER,
		.success   = false,
		.n_done    = 0,
	};
	memcpy(g.hash, md5, MD5_DIGEST_LENGTH);

	struct task *tasks = malloc(threads * sizeof(struct task));

	for (size_t i = 0; i < threads; i++)
	{
		tasks[i].goal = &g;
		tasks[i].initial_preimage[0] = '\0';
		word_advance(tasks[i].initial_preimage, i);
		pthread_create(g.workers+i, NULL, crack_thread, tasks+i);
	}

	pthread_mutex_lock(&g.lock);

	/* coming up to cancellation points, so establish
	 * a cleanup handler */
	pthread_cleanup_push(crack_cleanup, tasks);

	/* We can't join() on all the workers now because it's up to
	 * us to cancel them after one finds the answer. We have to
	 * remain responsive and not block on any particular worker */
	while (!g.success && g.n_done < threads)
		pthread_cond_wait(&g.returning, &g.lock);
	/* at this point either a thread succeeded or all have given up */
	if (g.success)
		strcpy(result, g.preimage);
	/* mutex unlocked in the cleanup handler */

	/* Use the same cleanup handler for normal exit too. The "1"
	 * argument says to execute the function we had previously pushed */
	pthread_cleanup_pop(1);
	return g.success;
}

int main(int argc, char **argv)
{
	char preimage[LONGEST_PREIMAGE];
	int max_len = 4;
	unsigned char md5[MD5_DIGEST_LENGTH];

	if (argc != 2 && argc != 3)
	{
		fprintf(stderr,
		        "Usage: %s md5-string [search-depth]\n",
		        argv[0]);
		return EXIT_FAILURE;
	}

	if (!hex2md5(argv[1], md5))
	{
		fprintf(stderr,
		       "Could not parse as md5: %s\n", argv[1]);
		return EXIT_FAILURE;
	}

	if (argc > 2 && strtol(argv[2], NULL, 10))
	{
		if ((max_len = strtol(argv[2], NULL, 10)) > LONGEST_PREIMAGE)
		{
			fprintf(stderr,
					"Preimages limited to %d characters\n",
					LONGEST_PREIMAGE);
			return EXIT_FAILURE;
		}
	}

	if (crack(md5, max_len, 4, preimage))
	{
		puts(preimage);
		return EXIT_SUCCESS;
	}

	fprintf(stderr,
			"Could not find result in strings up to length %d\n",
			max_len);
	return EXIT_FAILURE;
}

Using cancellation is actually a little more flexible than our rwlock implementation in 5dm. If the crack() function is running in its own thread, the whole thing can now be cancelled. The cancellation handler will “pass along” the cancellation to each of the worker threads.

Writing general purpose library code that works with threads requires some care. It should handle deferred cancellation gracefully, including disabling cancellation when appropriate and always using cleanup handlers.

For cleanup handlers, notice the pattern of how we pthread_cleanup_push() the cancellation handler, and later pthread_cleanup_pop() it for regular (non-cancel) cleanup too. Using the same cleanup procedure in all situations makes the code more reliable.

Also notice how the boss thread now cancels workers, rather than the winning worker cancelling the others. You can join a canceled thread, but you can’t cancel an already joined (or detached) thread. If you want to both cancel and join a thread it ought to be done in one place.

Let’s turn our attention to the new worker threads. They are still polling for cancellation, as they did with the reader-writer locks, but now they do it with a new function:

if (iterations++ % 1000 == 0)
	pthread_testcancel();
Admittedly polling every thousandth iteration adds a little overhead, both with the rwlock and with testcancel. It also adds latency between the cancellation request and the thread quitting, since the loop can run up to 999 more times in between. A more efficient but dangerous method is to enable asynchronous cancellation, meaning the thread dies immediately when cancelled.

Async cancellation is dangerous because code is seldom async-cancel-safe. Anything that uses locks or works with shared state even slightly can break badly. Async-cancel-safe code can call very few functions, since those functions may not be safe. This includes calling libraries that use something as innocent as malloc(), since stopping malloc part way through could corrupt the heap.

Our crack_thread() function should be async-cancel-safe, at least during its calculation and not when taking locks. The MD5() function from OpenSSL also appears to be safe. Here’s how we can rewrite our function (notice how we disable cancellation before taking a lock):

/* rewritten to use async cancellation */

void *crack_thread(void *arg)
{
	struct task *t = arg;
	unsigned len, changed;
	unsigned char hashed[MD5_DIGEST_LENGTH];
	char preimage[LONGEST_PREIMAGE];
	int cancel_type, cancel_state;

	strcpy(preimage, t->initial_preimage);
	len = strlen(preimage);

	/* async so we don't have to pthread_testcancel() */
	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &cancel_type);

	while (len <= t->goal->max_len)
	{
		MD5((const unsigned char*)preimage, len, hashed);
		if (memcmp(hashed, t->goal->hash, MD5_DIGEST_LENGTH) == 0)
		{
			/* protect the mutex against async cancellation */
			pthread_setcancelstate(
					PTHREAD_CANCEL_DISABLE, &cancel_state);

			pthread_mutex_lock(&t->goal->lock);
			t->goal->success = true;
			strcpy(t->goal->preimage, preimage);
			t->goal->n_done++;
			pthread_mutex_unlock(&t->goal->lock);

			pthread_cond_signal(&t->goal->returning);
			return NULL;
		}
		changed = word_advance(preimage, t->goal->n_workers);
		len = MAX(len, changed);
	}

	/* restore original cancellation type */
	pthread_setcanceltype(cancel_type, &cancel_type);

	pthread_mutex_lock(&t->goal->lock);
	t->goal->n_done++;
	pthread_mutex_unlock(&t->goal->lock);
	pthread_cond_signal(&t->goal->returning);

	return NULL;
}

Asynchronous cancellation does not appear to work on macOS, but as we’ve seen that’s par for the course on that operating system.

Development tools

Valgrind DRD and helgrind

DRD and Helgrind are Valgrind tools for detecting errors in multithreaded C and C++ programs. The tools work for any program that uses the POSIX threading primitives or that uses threading concepts built on top of the POSIX threading primitives.

The tools have overlapping abilities like detecting data races and improper use of the pthreads API. Additionally, Helgrind can detect locking hierarchy violations, and DRD can alert when there is lock contention.

Both tools pinpoint the lines of code where problems arise. For example, we can run DRD on our first crazy bankers program:

valgrind --tool=drd ./banker

Here is a characteristic example of an error it emits:

==8524== Thread 3:
==8524== Conflicting load by thread 3 at 0x003090b0 size 8
==8524==    at 0x1088BD: disburse (banker.c:48)
==8524==    by 0x4C324F3: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
==8524==    by 0x4E514A3: start_thread (pthread_create.c:456)
==8524== Allocation context: BSS section of /home/admin/banker
==8524== Other segment start (thread 2)
==8524==    at 0x514FD01: clone (clone.S:80)
==8524== Other segment end (thread 2)
==8524==    at 0x509D820: rand (rand.c:26)
==8524==    by 0x108857: rand_range (banker.c:26)
==8524==    by 0x1088A0: disburse (banker.c:42)
==8524==    by 0x4C324F3: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
==8524==    by 0x4E514A3: start_thread (pthread_create.c:456)

It finds conflicting loads and stores from lines 48, 51, and 52.

48: if (accts[from].balance > 0)
49: {
50:		payment = 1 + rand_range(accts[from].balance);
51:		accts[from].balance -= payment;
52:		accts[to].balance   += payment;
53: }

Helgrind can identify the lock hierarchy violation in our example of deadlocking bankers:

valgrind --tool=helgrind ./banker_lock
==8989== Thread #4: lock order "0x3091F8 before 0x3090D8" violated
==8989== Observed (incorrect) order is: acquisition of lock at 0x3090D8
==8989==    at 0x4C3010C: mutex_lock_WRK (hg_intercepts.c:904)
==8989==    by 0x1089B9: disburse (banker_lock.c:38)
==8989==    by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==8989==    by 0x4E454A3: start_thread (pthread_create.c:456)
==8989==  followed by a later acquisition of lock at 0x3091F8
==8989==    at 0x4C3010C: mutex_lock_WRK (hg_intercepts.c:904)
==8989==    by 0x1089D1: disburse (banker_lock.c:39)
==8989==    by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==8989==    by 0x4E454A3: start_thread (pthread_create.c:456)

To identify when there is too much contention for a lock, we can ask DRD to alert us when a thread blocks for more than n milliseconds on a mutex:

valgrind --tool=drd --exclusive-threshold=2 ./banker_lock_hierarchy

Since we throw too many threads at a small number of accounts, we see wait times that cross the threshold, like this one that waited seven ms:

==7565== Acquired at:
==7565==    at 0x483F428: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:888)
==7565==    by 0x483F428: pthread_mutex_lock (drd_pthread_intercepts.c:898)
==7565==    by 0x109280: disburse (banker_lock_hierarchy.c:40)
==7565==    by 0x483C114: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
==7565==    by 0x4863FA2: start_thread (pthread_create.c:486)
==7565==    by 0x49764CE: clone (clone.S:95)
==7565== Lock on mutex 0x10c258 was held during 7 ms (threshold: 2 ms).
==7565==    at 0x4840478: pthread_mutex_unlock_intercept (drd_pthread_intercepts.c:978)
==7565==    by 0x4840478: pthread_mutex_unlock (drd_pthread_intercepts.c:991)
==7565==    by 0x109395: disburse (banker_lock_hierarchy.c:47)
==7565==    by 0x483C114: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
==7565==    by 0x4863FA2: start_thread (pthread_create.c:486)
==7565==    by 0x49764CE: clone (clone.S:95)
==7565== mutex 0x10c258 was first observed at:
==7565==    at 0x483F368: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:885)
==7565==    by 0x483F368: pthread_mutex_lock (drd_pthread_intercepts.c:898)
==7565==    by 0x109280: disburse (banker_lock_hierarchy.c:40)
==7565==    by 0x483C114: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
==7565==    by 0x4863FA2: start_thread (pthread_create.c:486)
==7565==    by 0x49764CE: clone (clone.S:95)

Clang ThreadSanitizer (TSan)

ThreadSanitizer is a clang instrumentation module. To use it, choose CC = clang and add -fsanitize=thread to CFLAGS. Then when you build programs, they will be modified to detect data races and print statistics to stderr.

Here’s a portion of the output when running the bankers program:

WARNING: ThreadSanitizer: data race (pid=11312)
  Read of size 8 at 0x0000014aeeb0 by thread T2:
    #0 disburse /home/admin/banker.c:48 (banker+0x0000004a4372)

  Previous write of size 8 at 0x0000014aeeb0 by thread T1:
    #0 disburse /home/admin/banker.c:52 (banker+0x0000004a43ba)

TSan can also detect lock hierarchy violations, such as in banker_lock:

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=10095)
  Cycle in lock order graph: M1 (0x0000014aef78) => M2 (0x0000014aeeb8) => M1

  Mutex M2 acquired here while holding mutex M1 in thread T1:
    #0 pthread_mutex_lock <null> (banker_lock+0x000000439a10)
    #1 disburse /home/admin/banker_lock.c:39 (banker_lock+0x0000004a4398)

    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message

  Mutex M1 acquired here while holding mutex M2 in thread T9:
    #0 pthread_mutex_lock <null> (banker_lock+0x000000439a10)
    #1 disburse /home/admin/banker_lock.c:39 (banker_lock+0x0000004a4398)


plockstat and mutrace

While Valgrind DRD can identify highly contended locks, it virtualizes the execution of the program under test, and skews the numbers. Other utilities use software probes to get this information from a test running at full speed. In BSD land there is the plockstat provider for DTrace, and on Linux there is the specially-written mutrace. I had a lot of trouble trying to get plockstat to work on FreeBSD, so here’s an example of using mutrace to analyze our banker program.

mutrace ./banker_lock_hierarchy
mutrace: Showing 10 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       0   200211   153664    95985      991.349        0.005        0.267 M-.--.
       1   200552   142173    61902      641.963        0.003        0.170 M-.--.
       2   199657   140837    47723      476.737        0.002        0.125 M-.--.
       3   199566   140863    39268      371.451        0.002        0.108 M-.--.
       4   199936   141381    33243      295.909        0.001        0.090 M-.--.
       5   199548   141297    28193      232.647        0.001        0.084 M-.--.
       6   200329   142027    24230      183.301        0.001        0.066 M-.--.
       7   199951   142338    21018      142.494        0.001        0.057 M-.--.
       8   200145   142990    18201      107.692        0.001        0.052 M-.--.
       9   200105   143794    15713       76.231        0.000        0.028 M-.--.
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1896.903 ms.

mutrace: Results for SMP with 4 processors.

Off-CPU profiling

Typical profilers measure the amount of CPU time spent in each function. However when a thread is blocked by I/O, a lock, or a condition variable, then it isn’t using CPU time. To determine where functions spend the most “wall clock time,” we need to sample the call stack for all threads at intervals, and count how frequently we see each entry. When a thread is off-CPU its call stack stays unchanged.

The pstack program is traditionally the way to get a snapshot of a running program’s stack. It exists on old Unices, and used to be on Linux until Linux made a breaking change. The most portable way to get stack snapshots is using gdb with an awk wrapper, as documented in the Poor Man’s Profiler.

Remember our early condition variable example that measured how many threads entered the critical section in disburse() at once? We asked whether synchronization on stats_mtx threw off the measurement. With off-CPU profiling we can look for clues.

Here’s a script based on the Poor Man’s Profiler:

./banker_stats &
pid=$!

while kill -0 $pid 2>/dev/null
  do
    gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
  done | \
awk '
  BEGIN { s = ""; }
  /^Thread/ { print s; s = ""; }
  /^\#/ { if (s != "" ) { s = s "," $4} else { s = $4 } }
  END { print s }' | \
sort | uniq -c | sort -r -n -k 1,1

It outputs limited information, but we can see that waiting for locks in disburse() takes the majority of program time, appearing in 872 of our samples. By contrast, waiting for the stats_mtx lock in stats_update() doesn’t appear in our samples at all. It must have had very little effect on our parallelism.

    872 at,__GI___pthread_mutex_lock,disburse,start_thread,clone
     11 at,__random,rand,rand_range,disburse,start_thread,clone
      9 expected=0,,mutex=0x562533c3f0c0,<stats_cnd>,,stats_print,start_thread,clone
      9 __GI___pthread_timedjoin_ex,main
      5 at,__pthread_mutex_unlock_usercnt,disburse,start_thread,clone
      1 at,__pthread_mutex_unlock_usercnt,stats_change,disburse,start_thread,clone
      1 at,__GI___pthread_mutex_lock,stats_change,disburse,start_thread,clone
      1 __random,rand,rand_range,disburse,start_thread,clone

macOS Instruments

Although Mac’s POSIX thread support is pretty weak, its XCode tooling does include a nice profiler. From the Instruments application, choose the profiling template called “System Trace.” It adds a GUI on top of DTrace to display thread states (among other things). I modified our banker program to use only five threads and recorded its run. The Instruments app visualizes every event that happens, including threads blocking and being interrupted:

[Screenshot: thread states in Instruments]

Within Instruments you can zoom into the timeline and hover over events for more information.

perf c2c

Perf is a Linux tool to measure hardware performance counters during the execution of a program. Joe Mario created a Perf feature called c2c which detects false sharing of variables between CPUs.

In a NUMA multi-core computer, each CPU has its own set of caches, and all CPUs share main memory. Memory is divided into fixed-size blocks (often 64 bytes) called cache lines. Any time a CPU reads or writes memory, it must fetch or store the entire cache line surrounding the desired address. If one CPU has already cached a line, and another CPU writes to that area of memory, the system has to perform an expensive operation to keep the caches coherent.

When two unrelated variables in a program are stored close enough together in memory to share a cache line, it can cause a performance problem in multi-threaded programs. If threads running on separate CPUs access the unrelated variables, it causes a tug of war over the underlying cache line, which is called false sharing.

For instance, our Game of Life simulator could potentially have false sharing at the edges of each section of the board accessed by each thread. To verify this, I attempted to run perf c2c on an Amazon EC2 instance (since I lack a physical computer running Linux), but got an error that memory events are not supported on the virtual machine. I was running kernel 4.19.0 on Intel Xeon Platinum 8124M CPUs, so I assume this was a security restriction from Amazon.

If you are able to run c2c and detect false sharing in a multi-threaded program, the solution is to align the variables more aggressively. POSIX provides the posix_memalign() function to allocate memory aligned on a desired boundary. In our Life example, we could have used an array of pointers to dynamically allocated rows rather than a contiguous two-dimensional array.

Intel VTune Profiler

The VTune Profiler is available for free (with registration) on Linux, macOS, and Windows. It works only on x86 hardware, of course. I haven’t used it, but their marketing page shows some nice pictures. The tool can visually identify the granularity of locks, present a prioritized list of synchronization objects that hurt performance, and visualize lock contention.

Further reading

March 22, 2020

Derek Jones (derek-jones)

Coronavirus: a silver lining for evidence-based software engineering? March 22, 2020 09:39 PM

People rarely measure things in software engineering, and when they do they rarely hang onto the measurements; this might also be true in many other work disciplines.

When I worked on optimizing compilers, I used to spend time comparing code size and performance. It surprised me that many others in the field did not; they seemed to think that if they implemented an optimization, things would get better and that was that. Customers would buy optimizers without knowing how long their programs took to do a task; they just wanted things to go faster, and they had some money to spend on buying stuff to make them feel that things had gotten faster. I quickly learned to stop asking too many questions, like “how fast does your code currently run?” or “how fast would you like it to run?” Sell them something to make them feel better; don’t spoil things by pointing out that their code might already be fast enough.

In one very embarrassing incident, the potential customer was interested in measuring performance, and my optimizer made their important program go slower! As best I could tell, the existing code only just fitted in memory, and optimizing for performance made it larger; the system started thrashing and went a lot slower.

What questions did potential customers ask? They usually asked whether particular optimizations were implemented (because they had read about them someplace). Some of these optimizations were likely to make very little difference to performance, but they were easy to understand and short enough to write articles about. And, yes, I always made sure to implement these ‘minor’ optimizations, purely to keep customers happy (and increase the chances of making a sale).

Now I work on evidence-based software engineering, and developers rarely measure things, and when they do they rarely hang onto the measurements. So many people have said I could have their data, if they had it!

Will the Coronavirus change things? With everybody working from home, management won’t be able to walk up to developers and ask what they have been doing. Perhaps stuff will start getting recorded more often, and some of it might be kept.

A year from now it might be a lot easier to find information about what developers do. I will let you know next year.

Marc Brooker (mjb)

Two Years With Rust March 22, 2020 12:00 AM

Two Years With Rust

I like it. I hope it's going to be big.

It's been just over two years since I started learning Rust. Since then, I've used it heavily at my day job, including work in the Firecracker code base, and a number of other projects. Rust is a great fit for the systems-level work I've been doing over the last few years: often performance- and density-sensitive, always security-sensitive. I find the type system, object life cycle, and threading model both well-suited to this kind of work and fairly intuitive. Like most people, I still fight with the compiler from time to time, but we mostly get on now.

Rust has also mostly replaced Go as my go-to language for writing small performance-sensitive programs, like the numerical simulators I use a lot. Go replaced C in that role for me, and joined R and Python as my day-to-day go-to tools. I've found that I still spend more time writing a program in Rust than I would in Go, and more than in C (except where C is held back by its lack of sane data structures and string handling). I've also found that my programs seem more likely to work on their first run, but I haven't made any effort to quantify that.

Over my career, I've done for-pay work in C, C++, Java, Python, Ruby, Go, Rust, Scheme, Basic, Perl, Bash, TLA+, Delphi, Matlab, ARM and x86 assembly, and R (probably forgetting a few). There's likely some of my code in each of those languages still running somewhere. I've also learned a bunch of other languages, because it's something I enjoy doing. Recently, for example, I've been loving playing with Frink. I don't tend to be highly opinionated about languages.

However, in some cases I steer colleagues and teams away from particular choices. C and C++, for example, seem to be difficult and expensive to use in a way that avoids dangerous memory-safety bugs, and users need to be willing to invest deeply in their code if these bugs matter to them. It's possible to write great safe C, but the path there requires a challenging blend of tools and humility. Rust isn't a panacea, but it is a really nice alternative in a space where the options were fairly thin before. I find myself recommending and choosing it more and more often for small command-line programs, high-performance services, and system-level code.

Why I like Rust There are a lot of good programming languages in the world. There are even several that fit Rust's broad description and occupy its place in the ecosystem. This is a very good place to be, with real problems to solve. I'm not convinced that Rust is necessarily technically superior to its nearest neighbors, but there are some things it seems to do particularly well.

I like how friendly and helpful the compiler's error messages are. The free book and standard library documentation are all very good. The type system is nice to work with. The built-in tooling (rustup, cargo, and friends) is easy to use and powerful. A standard formatting tool goes a long way to keeping code-bases tidy and bikesheds unpainted. Static linking and cross-compiling are built-in. The smattering of functional idioms seems to add a good amount of power and expressiveness. Features that actively lead to obtuse code (like macros) are discouraged. Out-of-the-box performance is pretty great. Fearless Concurrency actually delivers.

There's a lot more, too.

What might make Rust unsuccessful? There are also some things I don't particularly like about Rust. Some of those are short-term. Learning how to write async networking code in Rust during the year or so before async and await were stabilized was a frustrating mess of inconsistent documentation and broken APIs. The compiler isn't as smart about optimizations like loop unrolling and autovectorization as C compilers tend to be (even where it does a great job eliding the safety checks, and other Rust-specific overhead). Some parts of the specification, like aliasing rules and the exact definitions of atomic memory orderings, are still a little fuzzier than I would like. Static analysis tooling has a way to go. Allocating aligned memory is tricky, especially if you still want to use some of the standard data structures. And so on.

In each of these cases, and more like them, the situation seems to have improved every time I look at it in detail. The community seems to be making great progress. async and await were particularly big wins.

The biggest long-term issue in my mind is unsafe. Rust makes what seems like a very reasonable decision to allow sections of code to be marked as unsafe, which allows one to color outside the lines of the memory and life cycle guarantees. As the name implies, unsafe code tends to be unsafe. The big problem with unsafe code isn't that the code inside the block is unsafe; it's that it can break the safety properties of safe code in subtle and non-obvious ways, even safe code that's thousands of lines away. This kind of action-at-a-distance can make it difficult to reason about the properties of any code base that contains unsafe code. For low-level systems code, that's probably all of them.

This isn't a surprise to the community. The Rust community is very realistic about the costs and benefits of unsafe. Sometimes that debate goes too far (as Steve Klabnik has written about), but mostly the debate and culture seems healthy to me as a relative outsider.

The problem is that this spooky behavior of unsafe tends not to be obvious to new Rust programmers. The mental model I've seen nearly everybody start with, including myself, is that unsafe blocks can break things inside them, and so care needs to be taken with the code written there. Unfortunately, that's not sufficient.

Better static and dynamic analysis tooling could help here, as well as some better help from the compiler, and alternatives to some uses of unsafe. I suspect that the long-term success of Rust as a systems language is going to depend on how well the community and tools handle unsafe. A lot of the value of Rust lies in its safety, and it's still too easy to break that safety without knowing it.

Another long-term risk is the size of the language. It's been over 10 years since I last worked with C++ every day, and I'm nowhere near being a competent C++ programmer anymore. Part of that is because C++ has evolved, which is a very good thing. Part of it is because C++ is huge. From a decade away, it seems hard to be a competent part-time C++ programmer: you need to be fully immersed, or you'll never fit the whole thing in your head. Rust could go that way too, and it would be a pity.