Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

October 19, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Instruction encoding interlude October 19, 2020 12:00 AM


Welcome back to the Compiling a Lisp series. In this thrilling new update, we will learn a little bit more about x86-64 instruction encoding instead of allocating more interesting things on the heap or adding procedure calls.

I am writing this interlude because I changed one register in my compiler code (kRbp to kRsp) and all hell broke loose — the resulting program was crashing, rasm2/Cutter were decoding wacky instructions when fed my binary, etc. Over the span of two very interesting but very frustrating hours, I learned why I had these problems and how to resolve them. You should learn, too.

State of the instruction encoder

Recall that I introduced at least 10 functions that looked vaguely like this:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  Buffer_write8(buf, 0xc0 + dst);
  Buffer_write32(buf, src);
}

These functions all purport to encode x86-64 instructions. They do, most of the time, but they do not tell the whole story. This function is supposed to encode an instruction of the form mov reg64, imm32. How does it do it? I don’t know!

They have all these magic numbers in them! What is a kRexPrefix? Well, it’s 0x48. Does that mean anything to us? No! It gets worse. What are 0xc7 and 0xc0 doing there? Why are we adding dst to 0xc0? Before this debugging and reading extravaganza, I could not have told you. Remember how somewhere in a previous post I mentioned I was getting these hex bytes from reading the compiled output on the Compiler Explorer? Yeah.

As it turns out, this is not a robust development strategy, at least with x86-64. It might be okay for some more regular or predictable instruction sets, but not this one.

Big scary documentation

So where do we go from here? How do we turn these mystical hexes and incantations into something that better maps to the hardware? Well, we once again drag Tom[1] into a debugging session and pull out the big ol’ Intel Software Developer Manual.

This is an enormous 26MB, 5000-page manual comprising four volumes. It’s very intimidating. This is exactly why I didn’t want to pull it out earlier and do this properly from the beginning… but here we are, eventually needing to do it properly.

I will not pretend to understand all of this manual, nor will this post be a guide to the manual. I will just explain what sections and diagrams I found useful in understanding how this stuff works.

I only ever opened Volume 2, the instruction set reference. In that honking 2300-page volume are descriptions of every Intel x86-64 instruction and how they are encoded. The instructions are listed alphabetically and split into sections based on the first letter of each instruction name.

Let’s take a look at Chapter 3, specifically at the MOV instruction on page 1209. For those following along who do not want to download a massive PDF, this website has a bunch of the same data in HTML form. Here’s the page for MOV.

This page has every variant of the MOV instruction. There are other instructions that begin with MOV, like MOVAPD, MOVAPS, etc., but they are different enough that they are separate instructions.

It has six columns:

  • Opcode, which describes the layout of the bytes in the instruction stream. This describes how we’ll encode instructions.
  • Instruction, which gives a text-assembly-esque representation of the instruction. This is useful for figuring out which one we actually want to encode.
  • Op/En, which stands for “Operand Encoding” and as far as I can tell describes the operand order with a symbol that is explained further in the “Instruction Operand Encoding” table on the following page.
  • 64-Bit Mode, which tells you if the instruction can be used in 64-bit mode (“Valid”) or not (something else, I guess).
  • Compat/Leg Mode, which tells you if the instruction can be used in some other mode, which I imagine is 32-bit mode or 16-bit mode. I don’t know. But it’s not relevant for us.
  • Description, which provides a “plain English” description of the opcode, for some definition of the words “plain” and “English”.

Other instructions have slightly different table layouts, so you’ll have to work out what the other columns mean.

Here’s a preview of some rows from the table, with HTML courtesy of Felix Cloutier’s aforementioned web docs:

Opcode            Instruction          Op/En  64-Bit Mode  Compat/Leg Mode  Description
88 /r             MOV r/m8, r8         MR     Valid        Valid            Move r8 to r/m8.
REX + 88 /r       MOV r/m8***, r8***   MR     Valid        N.E.             Move r8 to r/m8.
89 /r             MOV r/m16, r16       MR     Valid        Valid            Move r16 to r/m16.
89 /r             MOV r/m32, r32       MR     Valid        Valid            Move r32 to r/m32.
C7 /0 id          MOV r/m32, imm32     MI     Valid        Valid            Move imm32 to r/m32.
REX.W + C7 /0 id  MOV r/m64, imm32     MI     Valid        N.E.             Move imm32 sign extended to 64-bits to r/m64.

If you take a look at the last entry in the table, you’ll see REX.W + C7 /0 id. Does that look familiar? Maybe, if you squint a little?

It turns out, that’s the description for encoding the instruction we originally wanted, and had a bad encoder for. Let’s try and figure out how to use this to make our encoder better. In order to do that, we’ll need to first understand a general layout for Intel instructions.

Instruction encoding, big picture

All Intel x86-64 instructions follow this general format:

  • optional instruction prefix (1 byte)
  • opcode (1, 2, or 3 bytes)
  • if required, Mod-Reg/Opcode-R/M, also known as ModR/M (1 byte)
  • if required, Scale-Index-Base, also known as SIB (1 byte)
  • displacement (1, 2, or 4 bytes, or none)
  • immediate data (1, 2, or 4 bytes, or none)
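To make that ordering concrete, here is a minimal sketch, not code from this series: a hypothetical `InstrParts` struct that holds each component, and a hypothetical `assemble` function that writes whichever components are present, in the order listed above. It assumes at most one prefix byte (real instructions can carry several) and little-endian displacement and immediate bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical container for one instruction's components, in encoding order.
// A has_* flag of 0 or a length of 0 means "this component is absent".
typedef struct {
  uint8_t prefix;      // e.g. a REX byte
  int has_prefix;
  uint8_t opcode[3];   // 1 to 3 opcode bytes
  size_t opcode_len;
  uint8_t modrm;
  int has_modrm;
  uint8_t sib;
  int has_sib;
  uint8_t disp[4];     // little-endian displacement bytes
  size_t disp_len;     // 0, 1, 2, or 4
  uint8_t imm[4];      // little-endian immediate bytes
  size_t imm_len;      // 0, 1, 2, or 4
} InstrParts;

// Writes the present components into out in manual order and returns the
// number of bytes written. out should hold 15 bytes, the x86 maximum
// instruction length (this struct can produce at most 14).
size_t assemble(const InstrParts *p, uint8_t *out) {
  size_t n = 0;
  assert(p->opcode_len >= 1 && p->opcode_len <= 3);
  if (p->has_prefix) out[n++] = p->prefix;
  memcpy(out + n, p->opcode, p->opcode_len);
  n += p->opcode_len;
  if (p->has_modrm) out[n++] = p->modrm;
  if (p->has_sib) out[n++] = p->sib;
  memcpy(out + n, p->disp, p->disp_len);
  n += p->disp_len;
  memcpy(out + n, p->imm, p->imm_len);
  n += p->imm_len;
  return n;
}
```

Filling in prefix 0x48, opcode 0xc7, ModR/M 0xc0, and a 4-byte immediate of 100 gives the seven bytes of `mov rax, 100` that the rest of this post derives piece by piece.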

I found this information at the very beginning of Volume 2, Chapter 2 (page 527) in a section called “Instruction format for protected mode, real-address mode, and virtual-8086 mode”.

You, like me, may be wondering about the difference between “optional”, “if required”, and “…, or none”. I have no explanation, sorry.

I’m going to briefly explain each component here, followed up with a piece-by-piece dissection of the particular MOV instruction we want, so we get some hands-on practice.

Instruction prefixes

There are a couple of kinds of instruction prefixes, like REX (Section 2.2.1) and VEX (Section 2.3). We’re going to focus on REX prefixes, since they are needed for many (most?) x86-64 instructions, and we’re not emitting vector instructions.

The REX prefixes are used to indicate that an instruction, which might normally refer to a 32-bit register, should instead refer to a 64-bit register. They also do some other things, like giving access to the extra registers R8 through R15, but we’re mostly concerned with register sizes.

Opcode

Take a look at Section 2.1.2 (page 529) for a brief explanation of opcodes. The gist is that the opcode is the meat of the instruction. It’s what makes a MOV a MOV and not a HALT. The other fields all modify the meaning given by this field.

ModR/M and SIB

Take a look at Section 2.1.3 (page 529) for a brief explanation of ModR/M and SIB bytes. The gist is that they encode what register sources and destinations to use.

Displacement and immediates

Take a look at Section 2.1.4 (page 529) for a brief explanation of displacement and immediate bytes. The gist is that they encode literal numbers used in the instructions that don’t encode registers or anything.

If you’re confused, that’s okay. It should maybe get clearer once we get our hands dirty. Reading all of this information in a vacuum is moderately useless if it’s your first time dealing with assembly like this, but I included this section first to help explain how to use the reference.

Encoding, piece by piece

Got all that? Maybe? No? Yeah, me neither. But let’s forge ahead anyway. Here’s the instruction we’re going to encode: REX.W + C7 /0 id.

REX.W

First, let’s figure out REX.W. According to Section 2.2.1, which explains REX prefixes in some detail, there are a couple of different prefixes. There’s a helpful table (Table 2-4, page 535) documenting them. Here’s a bit diagram with the same information:

REX prefix byte (bit 7 on the left, bit 0 on the right):

  0 1 0 0 W R X B

In English, and zero-indexed:

  • Bits 7-4 are always 0b0100.
  • Bit 3 is the W prefix. If it’s 1, it means the operands are 64 bits. If it’s 0, “operand size [is] determined by CS.D”. Not sure what that means.
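  • Bits 2, 1, and 0 are the R, X, and B bits. They extend the ModR/M reg field, the SIB index field, and the ModR/M rm field (or SIB base) respectively, so those fields can name registers R8 through R15. We may not end up using them, so I am omitting the details here. Please read further in the manual if you are curious!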
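Those bits can be assembled with a small helper. This is a minimal sketch; the names `rex` and `is_bit` are hypothetical and not from the series. With only W set it produces 0x48, the value behind `kRexPrefix`.

```c
#include <assert.h>
#include <stdint.h>

// Returns 1 if v is a single bit (0 or 1).
static int is_bit(int v) { return v == 0 || v == 1; }

// Builds a REX prefix: fixed 0100 in bits 7-4, then W (bit 3), R (bit 2),
// X (bit 1), and B (bit 0).
uint8_t rex(int w, int r, int x, int b) {
  assert(is_bit(w) && is_bit(r) && is_bit(x) && is_bit(b));
  return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}
```

For example, `rex(1, 0, 0, 0)` is 0x48 and `rex(1, 0, 0, 1)` is 0x49.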

This MOV instruction calls for REX.W, which means this byte will look like 0b01001000, also known as our friend 0x48. Mystery number one, solved!

C7

This is a hexadecimal literal 0xc7. It is the opcode. There are a couple of other entries with the opcode C7, modified by other bytes in the instruction (ModR/M, SIB, REX, …). Write it to the instruction stream. Mystery number two, solved!

/0

There’s a snippet in Section 2.1.5 that explains this notation:

If the instruction does not require a second operand, then the Reg/Opcode field may be used as an opcode extension. This use is represented by the sixth row in the tables (labeled “/digit (Opcode)”). Note that values in row six are represented in decimal form.

This is a little confusing because this operation clearly does have a second operand, denoted by the “MI” in the table, which shows Operand 1 being ModRM:r/m (w) and Operand 2 being imm8/16/32/64. I think it’s because it doesn’t have a second register operand that this space is free — the immediate is in a different place in the instruction.

In any case, this means that we have to make sure to put decimal 0 in the reg part of the ModR/M byte. We’ll see what the ModR/M byte looks like in greater detail shortly.

id

id refers to an immediate double word (32 bits). It’s called a double word because a word (iw) is 16 bits. In increasing order of size, we have:

  • ib, byte (1 byte)
  • iw, word (2 bytes)
  • id, double word (4 bytes)
  • io, quad word (8 bytes)

This means we have to write our 32-bit value out to the instruction stream. These notations and encodings are explained further in the section on page 596.
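One detail worth spelling out: x86-64 stores immediates (and displacements) little-endian, least significant byte first. That is why 100 shows up as 0x64 0x00 0x00 0x00 in the byte layout below. A minimal sketch, with a hypothetical `write_imm32` standing in for whatever the series' `Buffer_write32` does internally:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes a 32-bit immediate ("id") into out in little-endian order.
// Returns the number of bytes written.
size_t write_imm32(uint8_t *out, int32_t value) {
  uint32_t v = (uint32_t)value;  // reinterpret so the shifts are well-defined
  for (size_t i = 0; i < 4; i++) {
    out[i] = (uint8_t)(v >> (8 * i));  // lowest byte first
  }
  return 4;
}
```

Negative values come out in two's complement, so -8 becomes 0xf8 0xff 0xff 0xff.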

Overall, that means that this instruction will have the following form:

Instruction layout (byte offsets):

  0: REX | 1: Opcode | 2: ModR/M | 3-6: Immediate

If we were to try and encode the particular instruction mov rax, 100, it would look like this:

  0: REX 0x48 | 1: Opcode 0xc7 | 2: ModR/M 0xc0 | 3-6: Immediate 0x64 0x00 0x00 0x00

This is how you read the table! Slowly, piece by piece, and with a nice cup of tea to help you in trying times. Now that we’ve read the table, let’s go on and write some code.

Encoding, programmatically

While writing code, you will often need to reference two tables beyond the ones we have looked at so far: Table 2-2 “32-Bit Addressing Forms with the ModR/M Byte” (page 532) and Table 2-3 “32-Bit Addressing Forms with the SIB Byte” (page 533). Although the tables describe 32-bit registers, with the REX.W prefix the E-prefixed register names (EAX, ECX, and so on) become their R-prefixed 64-bit counterparts (RAX, RCX, and so on), so the same tables describe 64-bit operands.

These tables are super helpful when figuring out how to put together ModR/M and SIB bytes.

Let’s start the encoding process by revisiting Emit_mov_reg_imm32/REX.W + C7 /0 id:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  // ...
}

Given a register dst and an immediate 32-bit integer src, we’re going to encode this instruction. Let’s do all the steps in order.

REX prefix

Since the instruction calls for REX.W, we can keep the first line the same as before:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  // ...
}

Opcode

This opcode is 0xc7, so we’ll write that directly:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  // ...
}

Also the same as before. Nice.

ModR/M byte

ModR/M bytes are where the code gets a little different. We want an abstraction to build them for us, instead of manually slinging integers like some kind of animal.

To do that, we should know how they are put together. ModR/M bytes are made up of:

  • mod (high 2 bits), which describes what big row to use in the ModR/M table
  • reg (middle 3 bits), which either describes the second register operand or an opcode extension (like /0 above)
  • rm (low 3 bits), which describes the first operand

This means we can write a function modrm that puts these values together for us:

byte modrm(byte mod, byte rm, byte reg) {
  return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
}

The order of the parameters is a little different than the order of the bits. I did this because it looks a little more natural when calling the function from its callers. Maybe I’ll change it later because it’s too confusing.

For this instruction, we’re going to:

  • pass 0b11 (3) as mod, because we want to move directly into a 64-bit register, as opposed to [reg], which would dereference the pointer held in the register
  • pass the destination register dst as rm, since it’s the first operand
  • pass 0b000 (0) as reg, since the /0 above told us to
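This also explains the old `0xc0 + dst` magic. With mod = 0b11 and reg = 0, the top five bits are always 11000, so the byte is 0xc0 | dst, which equals 0xc0 + dst for every register number 0 through 7. A small check, reusing the post's `modrm` with a hypothetical `old_encoding_matches` function:

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t byte;

// Same bit layout as the post's modrm(): mod in bits 7-6, reg in 5-3, rm in 2-0.
byte modrm(byte mod, byte rm, byte reg) {
  return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
}

// Returns 1 if the old hand-rolled 0xc0 + dst matches modrm(3, dst, 0)
// for all eight legacy register numbers (0 through 7).
int old_encoding_matches(void) {
  for (byte dst = 0; dst < 8; dst++) {
    if (modrm(/*direct*/ 3, dst, /*reg=/0*/ 0) != (byte)(0xc0 + dst)) return 0;
  }
  return 1;
}
```

So the old encoder was right for this instruction by construction, not by luck; it only breaks for instructions whose mod or reg fields differ.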

That ends up looking like this:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  Buffer_write8(buf, modrm(/*direct*/ 3, dst, 0));
  // ...
}

For the instruction mov rax, 100 above, this produces a ModR/M byte with the following layout:

ModR/M: mod 11 (direct) | reg 000 (/0) | rm 000 (RAX)

I haven’t put a datatype for mods together because I don’t know if I’d be able to express it well. So for now I just added a comment.
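If you do want a datatype, one possible shape is a small enum naming the four mod values, which correspond to the four big row groups of Table 2-2. The names `Mod`, `kModDirect`, and so on, and the `modrm_typed` wrapper, are hypothetical; only the numeric values come from the encoding.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names for the ModR/M mod field values.
typedef enum {
  kModIndirect = 0,  // [reg], no displacement (rm 101 means RIP-relative in 64-bit mode)
  kModDisp8 = 1,     // [reg + disp8]
  kModDisp32 = 2,    // [reg + disp32]
  kModDirect = 3,    // the register itself, no memory access
} Mod;

uint8_t modrm_typed(Mod mod, uint8_t rm, uint8_t reg) {
  return (uint8_t)(((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7));
}
```

The call site then reads `modrm_typed(kModDirect, dst, 0)` instead of relying on a `/*direct*/` comment.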

Immediate value

Last, we have the immediate value. As I said above, all this entails is writing out a 32-bit quantity as we have always done:

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  Buffer_write8(buf, modrm(/*direct*/ 3, dst, 0));
  Buffer_write32(buf, src);
}

And there you have it! It took us 2500 words to get to these measly seven bytes. The real success is the friends we made along the way.
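To check the finished function end to end, here is a self-contained sketch. The `Buffer`, `Register`, `kRexPrefix`, and `Buffer_write*` definitions are minimal stand-ins written for this example (a fixed-size array and little-endian writes), not the series' actual implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte;

// Stand-in Buffer: a fixed array plus a length.
typedef struct {
  byte data[64];
  size_t len;
} Buffer;

// Register numbers as used in ModR/M and SIB fields.
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;

static const byte kRexPrefix = 0x48;  // REX.W

void Buffer_write8(Buffer *buf, byte b) {
  assert(buf->len < sizeof buf->data);
  buf->data[buf->len++] = b;
}

// Little-endian, like every x86-64 immediate.
void Buffer_write32(Buffer *buf, int32_t value) {
  uint32_t v = (uint32_t)value;
  for (int i = 0; i < 4; i++) Buffer_write8(buf, (byte)(v >> (8 * i)));
}

byte modrm(byte mod, byte rm, byte reg) {
  return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
}

// REX.W + C7 /0 id: MOV r/m64, imm32 (immediate sign-extended to 64 bits).
void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  Buffer_write8(buf, modrm(/*direct*/ 3, dst, 0));
  Buffer_write32(buf, src);
}
```

Encoding `mov rax, 100` yields 48 c7 c0 64 00 00 00. Passing `kRsp` yields a ModR/M byte of 0xc4 with no SIB, because mod is 0b11.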

Further instructions

“But Max,” you say, “this produces literally the same output as before with all cases! Why go to all this trouble? What gives?”

Well, dear reader, having a mod of 3 (direct) means that there is no special-case escape hatch when dst is RSP. This is unlike the other mods, where the table has this [--][--] symbol where RSP should be. That funky symbol indicates that there must be a Scale-Index-Base (SIB) byte following the ModR/M byte. This means that an instruction with an RSP-based memory operand and an 8-bit displacement has the following layout:

  0: REX | 1: Opcode | 2: ModR/M | 3: SIB | 4: Disp

If you’re trying to encode mov [rsp-8], rax, for example, the values should look like this:

  0: REX 0x48 | 1: Opcode 0x89 | 2: ModR/M 0x44 | 3: SIB 0x24 | 4: Disp 0xf8

This is where an instruction like Emit_store_reg_indirect (mov [REG+disp], src) goes horribly awry with the homebrew encoding scheme I cooked up. When the dst in that instruction is RSP, it’s expected that the next byte is the SIB. And when you output other data instead (say, an immediate 8-bit displacement), you get really funky addressing modes. Like what the heck is this?

mov qword [rsp + rax*2 - 8], rax

This is actual disassembled assembly that I got from running my binary code through rasm2. Our compiler definitely does not emit anything that complicated, which is how I found out things were wrong.
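One way to see what went wrong: whenever mod is not 0b11 and rm is 0b100, the CPU (and the disassembler) treats the next byte as a SIB and splits it into fields, whatever you meant it to be. A small decoder, written only for this illustration (`SibFields` and `decode_sib` are hypothetical names), makes that visible. It ignores REX.X and REX.B, so it only names the eight legacy registers.

```c
#include <assert.h>
#include <stdint.h>

// Fields of a SIB byte: scale (bits 7-6), index (bits 5-3), base (bits 2-0).
typedef struct {
  int scale;  // 1, 2, 4, or 8
  int index;  // register number; 4 means "no index" when REX.X is clear
  int base;   // register number
} SibFields;

SibFields decode_sib(uint8_t b) {
  SibFields f;
  f.scale = 1 << (b >> 6);
  f.index = (b >> 3) & 0x7;
  f.base = b & 0x7;
  return f;
}
```

For example, if a disp8 of -8 (0xf8) lands where the SIB belongs, it decodes as scale 8, index 7 (RDI), base 0 (RAX), and the following byte is then consumed as the displacement. The intended SIB, 0x24, decodes as scale 1, no index, base RSP.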

Okay, so it’s wrong. We can’t just blindly multiply and add things. So what do we do?

The SIB byte

Take a look at Table 2-2 (page 532) again. Notice that using RSP as the base of a memory operand (mod 0b00, 0b01, or 0b10) always requires the SIB.
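That rule fits in a one-line predicate. This is a minimal sketch; the name `needs_sib` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// A ModR/M byte with mod != 0b11 and rm == 0b100 means "a SIB byte follows".
// RSP's register number is 4 (0b100), so every RSP-based memory operand takes
// this path. R12 does too, because only the low three bits of rm are checked.
int needs_sib(uint8_t mod, uint8_t rm) {
  return mod != 3 && (rm & 0x7) == 4;
}
```

An encoder can call this once per memory operand instead of sprinkling `dst.reg == kRsp` checks around.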

Now take a look at Table 2-3 (page 533) again. We’ll use this to put together the SIB.

We know from Section 2.1.3 that the SIB, like the ModR/M, is comprised of three fields:

  • scale (high 2 bits), specifies the scale factor
  • index (middle 3 bits), specifies the register number of the index register
  • base (low 3 bits), specifies the register number of the base register

Intel’s language is not so clear and is kind of circular. Let’s take a look at a sample instruction to clear things up:

mov [base + index*scale + disp], src

Note that while index and base refer to registers, scale refers to one of 1, 2, 4, or 8, and disp is some immediate value.

This is a compact way of specifying a memory offset. It’s convenient for reading from and writing to arrays and structs. It’s also going to be necessary for us if we want to write to and read from random offsets from the stack pointer, RSP.
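The arithmetic the CPU performs for such an operand is simple. Here is a sketch of it; `effective_address` and its parameters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Computes base + index*scale + disp, the address that a
// [base + index*scale + disp] operand refers to. Pass has_index = 0 for the
// "none" index, which is what [rsp + disp] uses. Arithmetic wraps modulo
// 2^64, as it does in hardware.
uint64_t effective_address(uint64_t base, int has_index, uint64_t index,
                           unsigned scale, int32_t disp) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  uint64_t ea = base + (uint64_t)(int64_t)disp;  // sign-extend the displacement
  if (has_index) ea += index * scale;
  return ea;
}
```

With RSP = 0x1000 and no index, `[rsp - 8]` is 0xff8. With index 3 and scale 8 (the third 8-byte slot of an array), `[rsp + rax*8 + 4]` is 0x101c.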

So let’s try and encode that Emit_store_reg_indirect.

Encoding the indirect mov

Let’s start by going back to the table enumerating all the kinds of MOV instructions (page 1209). The specific opcode we’re looking for is REX.W + 89 /r, or MOV r/m64, r64.

We already know what REX.W means:

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  // ...
}

And next up is the literal 0x89, so we can write that straight out:

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  // ...
}

So far, so good. Looking familiar. Now that we have both the instruction prefix and the opcode, it’s time to write the ModR/M byte. Our ModR/M will contain the following information:

  • mod of 1, since we want an 8-bit displacement
  • reg of whatever register the second operand is, since we have two register operands (the opcode field says /r)
  • rm of whatever register the first operand is

Alright, let’s put that together with our handy-dandy ModR/M function.

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  // Wrong!
  Buffer_write8(buf, modrm(/*disp8*/ 1, dst.reg, src));
  // ...
}

But no, this is wrong. As it turns out, you still have to do this special thing when dst.reg is RSP, as I keep mentioning. In that case, rm must be 0b100, the special “none” value that the table marks with [--][--]. Then you also have to write a SIB byte.

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  if (dst.reg == kRsp) {
    Buffer_write8(buf, modrm(/*disp8*/ 1, kIndexNone, src));
    // ...
  } else {
    Buffer_write8(buf, modrm(/*disp8*/ 1, dst.reg, src));
  }
  // ...
}

Astute readers will know that kRsp and kIndexNone have the same integral value of 4. I don’t know if this was intentional on the part of the Intel designers. Maybe it’s supposed to be like that so encoding is easier and doesn’t require a special case for both ModR/M and SIB. Maybe it’s coincidental. Either way, I found it very subtle and wanted to call it out explicitly.
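Here is that coincidence spelled out as compile-time checks. This sketch assumes the conventional RAX..RDI numbering 0..7 and uses a standalone `kIndexNone` constant; the post's own `Index` enum may define it differently.

```c
#include <assert.h>

// Register numbers of the eight legacy 64-bit registers, as they appear in
// the 3-bit ModR/M and SIB fields (with REX bits clear).
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;

// SIB index value meaning "no index register" (hypothetical standalone name).
enum { kIndexNone = 4 };

// C11 compile-time checks of the shared encoding.
_Static_assert(kRsp == 4, "RSP is register number 4 (0b100)");
_Static_assert(kRsp == kIndexNone, "RSP shares its encoding with 'no index'");
```

Because the two values are equal, passing `kIndexNone` or `kRsp` as the rm argument produces the same ModR/M byte; naming it `kIndexNone` just documents the intent.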

For an instruction like mov [rsp-8], rax, our modrm byte will look like this:

ModR/M: mod 01 (disp8) | reg 000 (RAX) | rm 100 (none: SIB follows)

Let’s go ahead and write that SIB byte. I made a sib helper function like modrm, with two small differences: the parameters are in order of low to high bit, and the parameters have their own special types instead of just being bytes.

typedef enum {
  Scale1 = 0,
} Scale;

typedef enum {
  kIndexRax = 0,
} Index;

byte sib(Register base, Index index, Scale scale) {
  return ((scale & 0x3) << 6) | ((index & 0x7) << 3) | (base & 0x7);
}

I made all these datatypes to help readability, but you don’t have to use them if you don’t want to. The Index one is the only one that has a small gotcha: where kIndexRsp should be is kIndexNone because you can’t use RSP as an index register.
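As a sanity check, here is the helper with the enums filled out. Only `Scale1`, `kIndexRax`, and `kIndexNone` appear in the post; the other enumerator names are assumptions, though their values follow the hardware encoding. The `[rsp + disp]` case comes out as 0x24, the fourth byte of `mov [rsp-8], rax` (48 89 44 24 f8).

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t byte;

typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;
typedef enum { Scale1 = 0, Scale2, Scale4, Scale8 } Scale;  // x1, x2, x4, x8
typedef enum {
  kIndexRax = 0, kIndexRcx, kIndexRdx, kIndexRbx,
  kIndexNone,  // where kIndexRsp would be: RSP cannot be an index register
  kIndexRbp, kIndexRsi, kIndexRdi,
} Index;

// Same layout as the post's sib(): scale in bits 7-6, index 5-3, base 2-0.
byte sib(Register base, Index index, Scale scale) {
  return ((scale & 0x3) << 6) | ((index & 0x7) << 3) | (base & 0x7);
}
```

A full array access like `[rax + rcx*8]` would be `sib(kRax, kIndexRcx, Scale8)`, which is 0xc8.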

Let’s use this function to write a SIB byte in Emit_store_reg_indirect:

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  if (dst.reg == kRsp) {
    Buffer_write8(buf, modrm(/*disp8*/ 1, kIndexNone, src));
    Buffer_write8(buf, sib(kRsp, kIndexNone, Scale1));
  } else {
    Buffer_write8(buf, modrm(/*disp8*/ 1, dst.reg, src));
  }
  // ...
}

If you get it right, the SIB byte will have the following layout:

SIB: scale 00 (x1) | index 100 (none) | base 100 (RSP)

This is a very verbose way of saying [rsp+DISP], but it’ll do. All that’s left now is to encode that displacement. To do that, we’ll just write it out:

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  if (dst.reg == kRsp) {
    Buffer_write8(buf, modrm(/*disp8*/ 1, kIndexNone, src));
    Buffer_write8(buf, sib(kRsp, kIndexNone, Scale1));
  } else {
    Buffer_write8(buf, modrm(/*disp8*/ 1, dst.reg, src));
  }
  // The 8-bit displacement comes last, after ModR/M (and SIB, if any).
  Buffer_write8(buf, disp8(dst.disp));
}

Very nice. Now it’s your turn to go forth and convert the rest of the assembly functions in your compiler! I found it very helpful to extract the modrm/sib/disp8 calls into a helper function, because they’re mostly the same and very repetitive.
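Here is one possible shape for that helper, as a self-contained sketch. `Emit_indirect_operand` and `Emit_load_reg_indirect` are hypothetical names, and `Buffer`, `Indirect` (assumed here to be a register plus an `int8_t` displacement), and the other stand-ins are written for this example, not taken from the series. Since mod is always 0b01 here, RBP needs no special case; only RSP does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte;

typedef struct { byte data[64]; size_t len; } Buffer;
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;
typedef struct { Register reg; int8_t disp; } Indirect;  // [reg + disp8]
enum { kIndexNone = 4 };
enum { Scale1 = 0 };

static void Buffer_write8(Buffer *buf, byte b) {
  assert(buf->len < sizeof buf->data);
  buf->data[buf->len++] = b;
}

static byte modrm(byte mod, byte rm, byte reg) {
  return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
}

static byte sib(byte base, byte index, byte scale) {
  return ((scale & 0x3) << 6) | ((index & 0x7) << 3) | (base & 0x7);
}

// Emits ModR/M (mod = disp8), a SIB when the base is RSP, then the disp8.
static void Emit_indirect_operand(Buffer *buf, Indirect mem, byte reg) {
  if (mem.reg == kRsp) {
    Buffer_write8(buf, modrm(/*disp8*/ 1, kIndexNone, reg));
    Buffer_write8(buf, sib(kRsp, kIndexNone, Scale1));
  } else {
    Buffer_write8(buf, modrm(/*disp8*/ 1, mem.reg, reg));
  }
  Buffer_write8(buf, (byte)mem.disp);
}

// REX.W + 89 /r: MOV r/m64, r64
void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, 0x48);  // REX.W
  Buffer_write8(buf, 0x89);
  Emit_indirect_operand(buf, dst, src);
}

// REX.W + 8B /r: MOV r64, r/m64
void Emit_load_reg_indirect(Buffer *buf, Register dst, Indirect src) {
  Buffer_write8(buf, 0x48);  // REX.W
  Buffer_write8(buf, 0x8b);
  Emit_indirect_operand(buf, src, dst);
}
```

With this, `mov [rsp-8], rax` encodes as 48 89 44 24 f8, `mov [rbp-8], rax` as 48 89 45 f8, and `mov rcx, [rsp+16]` as 48 8b 4c 24 10.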

What did we learn?

This was a very long post. The longest post in the whole series so far, even. We should probably have some concrete takeaways.

If you read this post through, you should have gleaned some facts and lessons about:

  • Intel x86-64 instruction encoding terminology and details, and
  • how to read dense tables in the Intel Developers Manual
  • maybe some third thing, too, I dunno — this post was kind of a lot

Hopefully you enjoyed it. I’m going to go try and get a good night’s sleep. Until next time, when we’ll implement procedure calls!

Here’s a fun composite diagram for the road:

[Figure: a composite of all the instruction encoding diagrams in the post.]


  1. If you are an avid reader of this blog (Do those people exist? Please reach out to me. I would love to chat.), you may notice that Tom gets pulled into shenanigans a lot. This is because Tom is the best debugger I have ever encountered, he’s good at reverse engineering, and he knows a lot about low-level things. I think right now he’s working on improving open-source tooling for a RISC-V board for fun. But also he’s very kind and helpful and generally interested in whatever ridiculous situation I’ve gotten myself into. Maybe I should add a list of the Tom Chronicles somewhere on this website. Anyway, everyone needs a Tom. 

October 18, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Human cognition chapter of my book October 18, 2020 09:37 PM

What useful, practical things might professional software developers learn from the Human cognition chapter in my evidence-based software engineering book (an updated beta was released this week)?

Last week I checked the human cognition chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

I had spent a lot of time learning about cognition when writing my C book; for this chapter I was catching up on what had happened in the last 10 years, which included: building executable models has become more popular, sample sizes have gotten larger (mostly thanks to Mechanical Turk), more researchers are making their data available on the web, and a few new theories (but mostly refinements of existing ideas).

Software is created by people, and it always seemed obvious to me that human cognition was a major topic in software engineering. But most researchers in computing departments joined the field because of their interest in maths, computers or software. The lack of interest in the human element means that the topic is rarely researched. There is a psychology of programming interest group, but most of those involved don’t appear to have read any psychology textbooks (I went to a couple of their annual workshops, and while writing the C book I was active on their mailing list for a few years).

What might readers learn from the chapter?

Visual processing: the rationale given for many code layout recommendations is plain daft; people need to learn something about how the brain processes images.

Models of reading. Existing readability claims are a joke (or bad marketing, take your pick). Since the 1960s, researchers have been using eye trackers to figure out what actually happens when people read text, and various models have been built. Market researchers have been using eye trackers for decades to work out where best to place products on shelves, to maximise sales. In the last 10 years software researchers have started using eye trackers to study how people read code; next they need to learn about some of the existing models of how people read text. This chapter contains some handy discussion and references.

Learning and forgetting: it takes time to become proficient; going on a course is the start of the learning process, not the end.

One practical takeaway for readers of this chapter is being able to give good reasons why other people’s proposals, which are claimed to be based on how the brain operates, won’t work as claimed, because that is not how the brain works. Actually, most of the time it is not possible to figure out whether something will work as advertised (this is why user interface testing is such a prolonged, and expensive, process), but the speaker with the most convincing techno-babble often wins the argument :-)

Readers might have a completely different learning experience from reading the human cognition chapter. What useful things did you learn from the human cognition chapter?

Bogdan Popa (bogdan)

Web Development with Koyo October 18, 2020 06:00 PM

Inspired by Brian Adkins’ RacketCon talk from yesterday, I decided to record a screencast on what it’s like to write a little web application using my not-quite-a-web-framework, koyo. You can watch it over on YouTube and you can find the resulting code on GitHub. It’s unscripted and I don’t go too deep on how everything works, but hopefully it’s easy enough to follow. I’ve left in the various mistakes I made, since it’s usually helpful to watch someone get out of a tricky situation, so look forward to those if you watch it!

Carlos Fenollosa (carlesfe)

You may be using Mastodon wrong October 18, 2020 05:13 PM

I'm sure you have already heard about Mastodon, typically marketed as a Twitter alternative.

I will try to convince you that the word alternative doesn't mean here what you think it means, and why you may be using Mastodon wrong if you find it boring.

An alternative community

You should not expect to "migrate from Twitter to Mastodon."

Forget about the privacy angle for now. Mastodon is an alternative community, where people behave differently.

It's your chance to make new internet friends.

There may be some people for whom Mastodon is a safe haven. Yes, some users really do migrate there to avoid censorship or bullying but, for most of us, that will not be the case.

Let's put it this way: Mastodon is to Twitter what Linux is to Windows.

Linux is libre software. But that's not why most people use it. Linux users mostly want to get their work done, and Linux is an excellent platform. There is no Microsoft Word, no Adobe Photoshop, no Starcraft. If you need to use these tools, honestly, you'd better stick with Windows. You can use emulation, in the same way that there are utilities to post to Twitter from Mastodon, but that would miss the point.

The bottom line is, you can perform the same tasks, but the process will be different. You can post toots on Mastodon, upload gifs, send DMs... but it's not Twitter, and that is fine.

The Local Timeline is Mastodon's greatest invention

The problem most people have with Mastodon is that they "get bored" with it quickly. I've seen it a lot, and it means one thing: the person created their account on the wrong server.

"But," they say, "isn't Mastodon federated? Can't I chat with everybody, regardless of their server?" Yes, of course. But discoverability works differently on Mastodon.

Twitter has only two discoverability layers: your network and the whole world. Either a small group of contacts, or everybody in the whole world. That's crazy.

They try very hard to show you tweets from outside your network so you can discover new people. And, at the same time, they show your tweets to third parties, so you can get new followers. This is the way that they try to keep you engaged once your network is more or less stable and starts getting stale.

Mastodon, instead, has an extra layer between your network and the whole world: messages from people on your server. This is called the local timeline.

The local timeline is the key to enjoying Mastodon.

How long has it been since you made a new internet friend?

If you're of a certain age you may remember BBSs, Usenet, IRC, or early internet forums. Do you recall how exciting it was to log into the unknown and realize that there were people all around the world who shared your interests?

It was an amazing feeling which got lost on the modern internet. Now you have a chance to relive it.

The local timeline dynamics are very different. There are a lot of respectful interactions among total strangers, because there is this feeling of community, of being in a neighborhood. Twitter is just the opposite: strangers shouting at each other.

Furthermore, since the local timeline is more or less limited in the number of users, you have the chance to recognize usernames and to be recognized. You start interacting with strangers, mentioning them, sending them links they may like. You discover new websites, rabbit holes, new approaches to your hobbies.

I've made quite a few new internet friends on my Mastodon server, and I don't mean followers or contacts. I'm talking about human beings who I have never met in person but feel close to.

People are humble and respectful. And, for less nice users, admins enforce codes of conduct and, in extreme cases, users may get kicked off a server. But they are not being banned by a faceless corporation due to mass reports; everybody is given a chance.

How to choose the right server

The problem with "generalist" Mastodon servers like is that their users have interests and backgrounds that are just too diverse. Therefore, there is no community feeling. For some people, that may be exactly what they're looking for. But, for most of us, there is more value in the smaller servers.

So, how can you choose the right server? Fortunately, you can do a bit of research. There is an official directory of Mastodon servers categorized by interests and regions.

Since you're reading my blog, start by taking a look at these:

And the regionals

There are many more. Simply search online for "mastodon server MY_FAVORITE_HOBBY." And believe me, servers between 500 and 5,000 people are the best.

Final tips

Before clicking on "sign up", always browse the local timeline, the about page, and the most active users list. You will get a pretty good idea of the kind of people who chat there. Once you feel right at home you can continue your adventure and start following users from other servers.

Mastodon has an option to only display toots in specific languages. It can be very useful to avoid being flooded by toots that you just have no chance of understanding or even getting what they're about.

You can also filter your notifications by types: replies, mentions, favorites, reposts, and more. This makes catching up much more manageable than on Twitter.

Finally, Mastodon has a built-in "Content Warning" feature. It allows you to hide text behind a short explanation, in case you want to talk about sensitive topics or just discuss a recent movie without spoiling it.

Good luck with your search, and see you on the Fediverse! I'm at

Tags: internet


Sevan Janiyan (sevan)

LFS, round #1 October 18, 2020 02:07 AM

Following on from the previous blog post, I started down the path of building a Linux From Scratch distribution. The project offers two paths, one using traditional SysV init and the other using systemd. I opted for the systemd route and followed the guide; it was all very straightforward. Essentially you fetch a bunch …

October 16, 2020

Patrick Louis (venam)

October 2020 Projects October 16, 2020 09:00 PM


Seven long and perilous months have gone by since my previous article, what feels like an eternity, and yet feels like a day — Nothing and everything has happened.
All I can add to the situation in my country, that I’ve already drawn countless times, is that my expectations weren’t fulfilled. Indeed, after a governmental void and a horrific explosion engulfing a tremendous part of the capital, I’m not sure any words can express the conflicting feelings and anger I have. Today marks exactly 1 year since the people started revolting.
Sentimentalities aside, let’s get to what I’ve been up to.

Psychology, Philosophy & Books


Language: brains
Explanation: My reading routine has been split between heavy technical books and leisure books.
On the leisure side, I’ve finished the following books:

  • The Better Angels of Our Nature: Why Violence Has Declined - Steven Pinker
  • The Book of M - Peng Shepherd
  • The Gene - Siddhartha Mukherjee
  • The Second Sex - Simone de Beauvoir
  • Aphorisms on Love and Hate - Nietzsche
  • How To Use Your Enemies - Baltasar Gracián
  • Zadig ou La Destinée - Voltaire
  • Great Expectations - Charles Dickens [ongoing]

While on the technical side, I’ve finished these bricks:

  • Computer Architecture: A Quantitative Approach - Hennessy & Patterson
  • Beyond Software Architecture: Creating and Sustaining Winning Solutions - Luke Hohmann
  • Compilers (the dragon book) - Aho, Lam, Sethi & Ullman
  • Operating System Concepts - Silberschatz [ongoing]

Obviously, I still want to build a bookshelf, however the current situation has postponed this project.

As far as podcasts go, I’ve cut down on them and only listen when exercising, which doesn’t amount to much compared to when I was commuting to work.

Life Refresh & Growth


Language: growth
Explanation: As you might have already noticed, I’ve redesigned my blog. I tried to give it more personality and to be more reflective of who I am as a person.
That involved reviewing the typography, adding meta-tags and previews, adding relevant pictures for every article, including general and particular descriptions for sections of the blog, and more.

Additionally, I’ve polished my online presence on StackOverflow and LinkedIn. This is especially important these days, when I’m in need of new opportunities.


As far as software architecture goes, I’m still on the learning path which consists of reading articles, watching videos, and trying to apply the topics to real scenarios. Recently, I’ve started following Mark Richards’ and Neil Ford’s Foundations Friday Forum, which is a monthly webinar on software architecture.

When it comes to articles, that’s where I’ve put the most energy. Here’s the list of new ones.

  • The Self, Metaperceptions, and Self-Transformation: One of my favorite articles about the self and growth. It has been influenced by theories from Carl Jung and Nietzsche.
  • Software Distributions And Their Roles Today: This is an article I had in mind for a long while but didn’t get to write. It was initially supposed to be a group discussion as a podcast but I ended up writing it as an article, and then recording a podcast too.
  • Time on Unix: My biggest and most complete article to date. I consider it an achievement, and it has been well received by readers. It’s now the go-to article when it comes to time.
  • Domain Driven Design Presentation: The transcript of a talk I’ve given for the MENA-Devs community.
  • Evolutionary Software Architecture: An article where I apply my knowledge of software architecture to explain a trending topic.
  • D-Bus and Polkit, No More Mysticism and Confusion: There’s a lot of confusion and hate about D-Bus and everything around it. I personally had no idea what these technologies involved, so I wrote an article to find out for myself whether the hate was justified.
  • Computer Architecture Takeaways: An article reviewing my knowledge on computer architecture after reading a book on it.
  • Notes About Compilers: Another article reviewing my knowledge on compilers after finishing the dragon book and other related content.
  • Did You Know Fonts Could Do All This?: Fonts are a very deep and complex topic; you can talk endlessly about them. In this particular article, I had a go at different settings and how they affect the rendering of fonts.
  • Corruption Is Attractive!: I’m fascinated by glitch art, and so I wrote an article about it, trying to sum up different techniques and give my personal view of what it consists of.

Recently, I’ve also had an interview with my friend Oday on his YouTube channel. We had an interesting talk.

Now on the programming language side, I’m hopping on the bandwagon and learning Rust. I’m still taking baby steps.

When it comes to personal fun, I’ve stopped my Elevate subscription because of the spending restrictions my country has implemented. Instead, I’ve started using a word-of-the-day app.



Language: mycelium
Explanation: Finally, I’ve pushed my research about mushrooms in Lebanon online. It’s here and is composed of a map with information about each specimen.
I hope to soon go hike and discover new ones.

Unix, Nixers, 2bwm


Language: Unix
Explanation: There were a lot of ups and downs in the nixers community over the past few months. We had to detach ourselves from the people who previously managed IRC because of their unprofessional and unacceptable behavior. Soon after, I created our own room on freenode, and since then things have gone smoothly. That was until hell happened around me and I decided to close the forums. However, the community made it clear that they wanted to help keep it alive and well. Thus, I retracted my decision and started implementing mechanisms to make the forums more active, such as a thread of the week, a gopher server, a screenshots display, fixed DMARC for emails, and a fixed mobile view for the forums.
I’ve also uploaded all the previous years’ video compilations to YouTube.

When it comes to 2bwm, we finally added support for separate workspaces per monitor.



Language: Security
Explanation: A friend of mine recently invited me to be part of his CTF team for the national competition. Lo and behold, a couple of months later we won the national competition, took 2nd place in the b01lers CTF, and took 1st place in the Arab & Africa CyberTalents regional CTF.
In the coming months I’ll train on topics I haven’t dealt with before.

Ascii Art & Art


Language: ASCII
Explanation: I haven’t drawn too many pieces recently, however I’d rather emphasize quality over quantity. You can check my pieces here.

Additionally, I’ve tried to make my RedBubble Shop more attractive, maybe it’ll help in these hard economic times. Otherwise, it’s always nice to have it around.
Other than that, I’ve joined the small community, which also has a love for ascii art.

Life and other hobbies


Language: life
Explanation: Life has been a bit harsh recently but I’ve tried to make the most of it.

My father and I started gardening; we planted sunflowers, cucumbers, tomatoes, rocca, parsley, cilantro, zucchinis, eggplants, bell peppers, hot peppers, ginger, green beans, garlic, onions, and more.
After the tomato harvest, we made our own tomato sauce and pasteurized it — It was heaven.

Recently, I’ve visited a local farm called Gout Blanc, it was a fun experience, but marked in time by the reality of the economic crisis we are in. The owners were wonderful and friendly.

Like anyone in this lockdown, I’m going through a bread-making phase. I quite enjoy ciabatta bread with halloum:

Homemade Ciabatta Bread

When the initial lockdown started, I ordered some joysticks to play retro games with my brother; little did I know that I would only get them 5 months later. By that time, my brother had left Lebanon to continue his studies in France, but I still got a retro-gaming setup.
Like many people close to me, he left for better pastures…

An anecdote: I’ve started to be hassled by Google and YouTube. I simply cannot open a video these days without being asked to fill in captchas, so I’ve gotten quite good at finding cars, traffic lights, and other trifles in random pictures. More than that, the ads I’ve been getting are of the weirdest kind. Just take a look.

ad 1 ad 2 ad 3 ad 4 ad 5


What’s in store for tomorrow… I’m not sure anymore. There has never been more of a need for change.

This is it!
As usual… If you want something done, no one’s gonna do it for you, use your own hands, even if it’s not much.
And let’s go for a beer together sometime, or just chill.


  • Internet Archive Book Images, No restrictions, via Wikimedia Commons
  • Claud Field, Public domain, via Wikimedia Commons

October 15, 2020

Caius Durling (caius)

Let's Peek: A tale of finding "Waypoint" October 15, 2020 07:00 PM

Following a product launch at work earlier this year, I theorised that if someone was watching the published lists of SSL certificates, they could potentially sneak a peek at things before they were publicised. There is probably far too much noise to monitor continuously, but as a potential hint towards the naming of things in a more targeted search it might be useful. Sites like and make these logs searchable and queryable.

Fast forward to this week, where at HashiConf Digital HashiCorp are announcing two new products, which they’ve been teasing for a month or so. Watching Boundary get announced in the HashiConf opening keynote I then wondered what the second project might be called.

I’ve spent a chunk of the last month looking at various HashiCorp documentation for their projects, and I noticed they have a pattern recently of using <name> as the product websites. The newly announced Boundary also fits this pattern.

🤔 Could I figure out the second product name 24 hours before public release? Amazingly, yes! 🎉

Searching at random for all certificates issued for * was probably going to be a bit futile, so to narrow the search space slightly I started by looking at when had its certificate issued, and who by. The list of things I spotted were:

  • Common name is “”
  • Issued by LetsEncrypt (no real surprise there)
  • Issued on 2020-09-23
  • Leaf certificate
  • Not yet expired (still trusted)
  • No alternate names in the certificate

Loading up and building a query for this resulted in a regexp lookup against the common name, and an issued-at date range of 10 days, from just before to a week after the Boundary certificate's issue date.

parsed.subject.common_name:/[a-z]+project\.io/ AND
parsed.issuer.organization.raw:"Let's Encrypt" AND
parsed.validity.start:["2020-09-20" TO "2020-09-30"] AND
tags.raw:"leaf" AND

(Run the search yourself)
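The common-name clause of that query is a plain regular expression, so it can be sanity-checked locally before spending time on a search engine. Below is a minimal sketch using POSIX regcomp/regexec. It assumes the whole common name has to match, so the pattern is anchored on both ends; looks_like_product_site is a hypothetical helper, not part of any search tool.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

// Does `name` look like a "<letters>project.io" site? The pattern is anchored
// with ^ and $ on the assumption that the whole common name must match.
static bool looks_like_product_site(const char *name) {
  regex_t re;
  if (regcomp(&re, "^[a-z]+project\\.io$", REG_EXTENDED | REG_NOSUB) != 0) {
    return false;  // pattern failed to compile
  }
  bool matched = regexec(&re, name, 0, NULL, 0) == 0;
  regfree(&re);
  return matched;
}
```

Because `[a-z]+` cannot consume a dot, names with an extra subdomain (such as a `www.` prefix) are rejected, which mirrors the "single name in the certificate" filtering done by eye.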

Searching brought back a couple of pages of results. I scanned them by eye and copied out the ones that only had the single name in the certificate, which resulted in the following shortlist:


We already know about Boundary, so the fact that it is in our list suggests the query might have captured the new product site too. Loading all these sites in a web browser showed that some had password protection on them (ooh!), some just plain didn’t load (ooh!), and some others were blatantly other things (boo!). Removing the latter ones left us with a much shorter list:

  • udproject.io [1]

All domains on the internet have to point somewhere, using DNS records. On a hunch I looked up a couple of the existing HashiCorp websites to see if they happened to all point at the same IP Address(es).

$ host has address
$ host has address
$ host | head -1 has address

Ah ha, now I wonder if any of the shortlist also points to 🤔 [2]

$ host | head -1 has address
$ host | head -1 has address
$ host has address

🎉 Excellent, was a password protected site pointed at HashiCorp’s IP address 🎉

I then wondered if I could verify this somehow ahead of waiting for the second keynote. I first tweeted about it but didn’t name Waypoint explicitly, just hid “way” and “point” in the tweet. I got a reply from @ksatirli which suggested it was correct (and later @mitchellh confirmed it [3]).

HashiCorp also does a lot in public, and all the source code and related materials are on GitHub, so perhaps some of their commit messages or marketing sites will contain references to Waypoint. One GitHub search later across their organisation: and I’d discovered a commit in the newly public hashicorp/boundary-ui repo which references Waypoint: 346f76404

chore: tweak colors to match waypoint and for a11y

Good enough for me, now to wait and see what the project is for. Given it’s now all announced and live, you can just visit to find out! (It’s so much cooler/useful than I’d hoped for.)

  1. I so hope whoever registered this was going for UDP in the name, rather than UD Project. ↩︎

  2. I’m a massive fan of IP address related quirks. Facebook’s IPv6 address contains face:b00c for example. A nice repeating is almost IPv4 art somehow. ↩︎

  3. Secrets are more fun when they are kept secret. 🥳 ↩︎

Jeremy Morgan (JeremyMorgan)

Building a Go Web API with the Digital Ocean App Platform October 15, 2020 05:03 AM

Recently, Digital Ocean announced they’re entering the PaaS market with their new application platform. They’ve hosted virtual machines (droplets) and Kubernetes-based services for years, but now they’re creating a platform where a simple point and click gets an application up and running. So I decided to try it. In this tutorial, we’re going to build an application on that platform. I will use Go to make a small web API and have it backed by a SQLite database.

October 14, 2020

Gokberk Yaltirakli (gkbrk)

Status update, October 2020 October 14, 2020 09:00 PM

To nobody’s surprise, the consistency of status updates has been less than perfect. But still, here I am with another catch-up post. Since the last update was a while back, this one might end up slightly longer.

First off, let me start with a career update. I have received my undergraduate degree and I am now officially a Software Engineer. Recently I’ve started working with a company that does mobile network optimization. I’m now a part of their Integration team, and I get to work with a lot of internals of mobile networks. This is exciting for me because of my interest in radio communications, as I get to work on non-toy problems now.

I migrated my personal finances from basic CSV files to double-entry bookkeeping. I decided to go with a homebrew solution, so I published It has a syntax that roughly resembles ledger-cli and beancount, but is currently not compatible with either.

I have also written a few throw-away scripts that can read both my previous budget CSV and exports from my previous bank, so I get to backfill a lot of historical data.

I started working on a networking stack, along with a custom packet routing algorithm. There is no name for the project yet, and it is not quite ready for a fancy public release, but I am occasionally publishing code dumps on gkbrk/network01. I am testing this network in a sort of closed-alpha with a small group of friends.

The network is intended to work with a topology where nodes don’t have direct links to other nodes. This is different from the so-called overlay networks. While most links between nodes go through the internet via our ISPs right now, we are intending to add radio links between some nodes in order to reduce our reliance on ISPs. There is nothing in the network design that prevents different kinds of links from being used.

As of now, the network can find paths between nodes, can recover and discover new paths in case some links fail, and can route packets between all nodes. We have done some trivial tests including private messaging and a few extremely choppy voice calls.

I am intending to work more on this project and even write some blog posts about it if I manage to stay interested.

As I have moved countries, I have a lot of paperwork to do, and some of this paperwork involves grabbing difficult-to-get appointments. I had the joy of automating this work and keeping myself up to date using Selenium and the SMS API from AWS.

I initially thought I would go with Twilio, but to my disappointment things weren’t too smooth with them. Everything went smoothly at first: I started to integrate their APIs, and then it was time to put some credits in my account. While I looked completely normal to their automated systems, they decided to block me seconds after charging my card. Apparently paying for services is suspicious these days. And of course, there has been no reply to my support tickets and currently no refund in sight.

That’s all for this month, thanks for reading!

October 13, 2020

Mark Fischer (flyingfisch)

Sharpie Stainless Steel Pen Refill Update October 13, 2020 01:14 PM

I have finally tried another refill with my Sharpie pen that I think is much closer to the original. The refill is a Schmidt 6040 FineLiner Fiber Tip Metal Refill M. I wish they made a Fine version, but this is close enough. It doesn’t bleed, and it feels similar to the original pen.

October 12, 2020

Noon van der Silk (silky)

New job; moving to Cambridge! October 12, 2020 12:00 AM

Posted on October 12, 2020 by Noon van der Silk

So, I’m very excited to share that we’re moving overseas! We’re headed to Cambridge in the UK at the end of this month.

I’m very lucky to be starting work with a very cool quantum computing company: Riverlane.

Leaving Melbourne

It’s going to be interesting leaving Melbourne. I’ve lived here all my life basically, and I’ve found a really nice group of friends. I’m going to miss everyone.

I owe a big thanks to everyone that’s helped me in my career and life over the years here. I won’t list all of you, but thanks :) I wouldn’t be where I am if I hadn’t had your help and support. ❤️ 💖

In particular the meetup community has been a great place for me and somewhere I was able to forge some really strong friendships. Specifically, I’ve had a great time at the Melbourne Maths and Science Meetup, the Haskell Meetup, back in the day I loved MXUG, and of course I have to thank my friend David Kemp for being the only consistent attendee of the Quantum Lunch, started way back in the day! I’ll also miss hanging out with the cool people who’ve come to events I’ve helped organise, such as compose conference and post-prediction conference. I love Melbourne for the really nice connection you get between different communities, and it’s been some of my favourite times meeting people from outside my standard little bubble.

I also have to thank my friend Charles Hill from Melbourne University, who taught me so much about quantum computing, is an amazing researcher and very kind and generous person.

Thanks also to the other people in the tech community here that have supported me and helped me first move into new jobs and learn interesting and fun things. I’ve been very lucky.

Of course, thanks again to all the people that helped out with Braneshop whom I’ve already mentioned over there.


We’re totally new to Cambridge, so if you’ve got suggestions/connections I’d love to hear about it! Definitely keen to meet some people and get into the community over there.

Reach out if you like!

I’ve benefited a lot from emailing random people and just asking questions, so please feel free to reach out to me if you think I can help with anything!

October 11, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Cognitive capitalism chapter of my book October 11, 2020 10:19 PM

What useful, practical things might professional software developers learn from the Cognitive capitalism chapter in my evidence-based software engineering book?

This week I checked the cognitive capitalism chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

Software systems are the product of cognitive capitalism (more commonly known as economics).

My experience is that most software developers don’t know anything about economics, so everything in this chapter is likely to be new to them. The chapter is more tutorial-like than the other chapters.

Various investment models are discussed. The problem with these kinds of models is obtaining reliable data. But, hopefully the modelling ideas will prove useful.

Things I learned about when writing the chapter include social learning, group learning, and the fact that Open source licensing is a mess.

Building software systems usually requires many of the individuals involved to do lots of learning. How do people decide what to learn, e.g., copy others or strike out on their own? This problem is not software specific; in fact, social learning appears to be one of the major cognitive abilities that separates us from other apes.

Organizational learning and forgetting is much talked about, and it was good to find some data dealing with this. Probably not applicable to most people.

Open source licensing is a mess in that software containing a variety of possibly incompatible licenses often gets mixed together. What future lawsuits await?

For me, potentially the most immediately useful material was group learning; there are some interesting models for how this sometimes works.

Readers might have a completely different learning experience from reading the cognitive capitalism chapter. What useful things did you learn from the cognitive capitalism chapter?

Andreas Zwinkau (qznc)

Pondering Amazon's Manyrepo Build System October 11, 2020 12:00 AM

Amazon's build system provides valuable insights for manyrepo environments.


Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Heap allocation October 11, 2020 12:00 AM


Welcome back to the “Compiling a Lisp” series. Last time we added support for if expressions. This time we’re going to add support for basic heap allocation.

Heap allocation comes in a couple of forms, but the one we care about right now is the cons primitive. Much like AST_new_pair in the compiler, cons should:

  • allocate some space on the heap,
  • set the car and cdr, and
  • tag the pointer appropriately.

Once we have that pair, we’ll want to poke at its data. This means we should probably also implement car and cdr primitive functions today.

What a pair looks like in memory

In order to generate code for packing and pulling apart pairs, we should probably know how they are laid out in memory.

Pairs contain two elements, side by side — kind of like a two-element array. The first element (pair[0]) is the car and the second one (pair[1]) is the cdr.

...| car | cdr |...

The untagged pointer points to the address of the first element, and the tagged pointer has some extra information (kPairTag == 1) that we need to get rid of to inspect the elements. If we don’t, we’ll try and read from one byte after the pointer, somewhere in the middle of the car. This will give us bad data.

To make things more concrete, imagine our pair is allocated at 0x10000. Our car lives at *(0x10000) (using C notation) and our cdr lives at *(0x10000 + kWordSize). The tagged pointer in this case would be 0x10001 and kWordSize is 8.
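The same arithmetic can be written as a small host-side C sketch. It is not part of the compiler; it assumes kPairTag == 1 and kWordSize == 8 as in the text, and pair_tag, pair_untag, pair_car_address, and pair_cdr_address are hypothetical helpers named just for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t uword;

enum { kPairTag = 1, kWordSize = 8 };  // values taken from the text

// Tag a word-aligned address as a pair by setting its low bit.
static uword pair_tag(uword address) {
  assert((address & (kWordSize - 1)) == 0);  // the three low bits must be free
  return address | kPairTag;
}

// Strip the tag to get back the address of the car.
static uword pair_untag(uword tagged) { return tagged - kPairTag; }

// The car lives at the untagged address; the cdr is one word later.
static uword pair_car_address(uword tagged) { return pair_untag(tagged); }
static uword pair_cdr_address(uword tagged) {
  return pair_untag(tagged) + kWordSize;
}
```

With the example from the text, pair_tag(0x10000) gives 0x10001, and the car and cdr addresses recovered from 0x10001 are 0x10000 and 0x10008.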

Allocating some memory

We could make a call to malloc whenever we need a new object. This has a couple of drawbacks, notably that malloc does a lot of internal bookkeeping that we really don’t need, and that there’s no good way to keep track of what memory we have allocated and needs garbage collecting (which we’ll handle later). It also has the unfortunate property of requiring C function call infrastructure, which we don’t have yet.

What we’re going to do instead is allocate a big slab of memory at the beginning of our process. That will be our heap. Then, to keep track of what memory we have used so far, we’re going to bump the pointer every time we allocate. So here’s what the heap looks like before we allocate a pair:

|     |     |     |     |     |     |     |     |...

The empty cells aren’t necessarily empty, but they are unused and they are garbage data.

Here is what it looks like after we allocate a pair:

| car | cdr |     |     |     |     |     |     |...
^           ^
pair        heap

Notice how the heap pointer has been moved over 2 words, and the pair pointer is the returned cons cell. Although we’ll tag the pair pointer, I am pointing it at the beginning of the car for clarity in the diagram.
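The bump-pointer scheme in the diagrams can be modeled in ordinary C, where a struct field plays the role of the heap register. This is a sketch under that assumption: BumpHeap and bump_allocate are hypothetical names, and, like the compiled code described here, it performs no out-of-memory check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t uword;

enum { kWordSize = sizeof(uword), kPairSize = 2 * kWordSize };

// Hypothetical host-side model of the heap register (rsi in the compiler).
typedef struct {
  uword *next;  // next free word; always word-aligned
} BumpHeap;

// Hand out `size` bytes at the current heap pointer, then bump the pointer.
// There is deliberately no bounds check, matching the compiled code.
static void *bump_allocate(BumpHeap *heap, size_t size) {
  assert(size % kWordSize == 0);  // keep the word-alignment invariant
  void *result = heap->next;
  heap->next += size / kWordSize;
  return result;
}
```

The slab itself comes from malloc in the outside C code, as the text describes; each pair allocation then just returns the old pointer and moves it forward by two words.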

In order to get this big slab of memory in the first place, we’ll have the outside C code (right now, that’s our test handler) call malloc.

You’re probably wondering what we’re going to do when we run out of memory. At some point in this series we’ll have a garbage collector that can reclaim some space for us. Right now, though, we’re just going to do … nothing. That’s right, we won’t even raise some kind of “out of memory” error. Remember, we don’t yet have error reporting facilities! Instead, we’ll use tools like Valgrind and AddressSanitizer to make sure we’re not overrunning our allocated buffer.

Implementation strategy

In order to make allocation from that big buffer fast and easy, we’re going to keep the heap pointer in a register. Our compiler emits instructions that use rbp, rsp, and rax, so we’ll have to pick another one. Ghuloum uses rsi, so we’ll use that as well.

In order to get the heap pointer in rsi in the first place, we’ll have to capture it from the outside C code. To do this, we’ll add a parameter to our entrypoint by modifying the function prologue.

Remember JitFunction? This is what the C code uses to understand how to call our mmap-ed function. We’re going to need to modify this first.

// Before:
typedef uword (*JitFunction)();

// After:
typedef uword (*JitFunction)(uword *heap);

That’s going to need to take a new parameter now — a pointer to the heap. This means that our kFunctionPrologue will need to expect that in the first parameter register in the calling convention, and store it somewhere safe. This register is rdi, so we can emit a mov rsi, rdi to store our heap pointer away.

Now, for the lifetime of the Lisp entrypoint, we can refer to the heap by the name rsi and modify it accordingly. We’ll keep an internal convention that rsi always points to the next available chunk of memory.

Want to allocate memory? Copy the current heap pointer into rax and update the heap pointer with add rsi, AllocationSize. We’ll need to add a new instruction for moving data between registers. Honestly, I am kind of surprised we haven’t needed that yet.

Want to store your car and cdr in your new pair? Write to offset 0 and kWordSize of rax, respectively. We’ll reuse our indirect store instruction.

Want to tag your pointer? add rax, Tag or or rax, Tag. These two instructions are equivalent here because heap allocations are word-aligned, so the three low bits where the tag goes are always zero and the addition can never carry.
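The add/or equivalence depends entirely on the alignment invariant. A tiny C check makes that explicit; add_and_or_agree is a hypothetical helper, and the 3-bit tag mask assumes 8-byte words as elsewhere in this post.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t uword;

enum { kTagMask = 0x7, kPairTag = 1 };

// For an aligned address the low three bits are zero, so adding a small tag
// and OR-ing it in set exactly the same bits: there is no carry.
static int add_and_or_agree(uword aligned_address, uword tag) {
  assert((aligned_address & kTagMask) == 0 && tag <= kTagMask);
  return (aligned_address + tag) == (aligned_address | tag);
}
```

For contrast, on a misaligned address such as 0x10001, adding 1 gives 0x10002 while OR-ing 1 leaves 0x10001, which is why the invariant matters.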

This word-alignment is easy to maintain now because all pairs will be size 16, which is a multiple of 8. Later on, when we add symbols and strings and other data types that have non-object data in them, we’ll have to insert padding between allocations to keep the alignment invariant.
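Inserting that padding amounts to rounding each allocation size up to a multiple of kWordSize. A minimal sketch, assuming kWordSize is 8 (a power of two); round_up_to_word is a hypothetical helper, not something the compiler has yet.

```c
#include <assert.h>
#include <stddef.h>

enum { kWordSize = 8 };

// Round `size` up to the next multiple of kWordSize. The mask trick only
// works because kWordSize is a power of two.
static size_t round_up_to_word(size_t size) {
  assert((kWordSize & (kWordSize - 1)) == 0);
  return (size + kWordSize - 1) & ~(size_t)(kWordSize - 1);
}
```

Pairs are already 16 bytes, so rounding leaves them unchanged; a 13-byte string payload would be padded to 16.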

Allocating pairs is kind of useless unless we can also poke at their elements.

To implement car, we’ll remove the tag from the pointer and read from the memory pointed to by the register: mov rax, [Ptr+Car-Tag]. You can also do this with a sub rax, Tag and then a mov.

Implementing cdr is very similar, except we’ll be doing mov rax, [Ptr+Cdr-Tag].
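Folding the tag removal into the displacement works the same way in C: read through the tagged pointer at offset Car-Tag or Cdr-Tag. This is a host-side sketch that assumes kCarOffset == 0 and kCdrOffset == one word (consistent with the offsets in the test below); load_word, pair_car, and pair_cdr are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t uword;
typedef intptr_t word;

enum { kPairTag = 1, kCarOffset = 0, kCdrOffset = sizeof(uword) };

// Read a word at base + displacement, like mov rax, [base+disp].
static uword load_word(uword base, word displacement) {
  uword value;
  memcpy(&value, (const void *)(base + displacement), sizeof value);
  return value;
}

// The tagged pointer is used directly; the -kPairTag in the displacement
// cancels the tag, so no separate sub instruction is needed.
static uword pair_car(uword tagged) {
  assert((tagged & (sizeof(uword) - 1)) == kPairTag);
  return load_word(tagged, kCarOffset - kPairTag);
}
static uword pair_cdr(uword tagged) {
  assert((tagged & (sizeof(uword) - 1)) == kPairTag);
  return load_word(tagged, kCdrOffset - kPairTag);
}
```

With 8-byte words the displacements are -1 and 7, which fit in the single signed byte of a disp8 addressing mode.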

Brass tacks

Now that we’ve gotten our minds around the abstract solution to the problem, we should write some code.

First off, here is the addition to the prologue I mentioned earlier:

const byte kEntryPrologue[] = {
  // Save the heap in rsi, our global heap pointer
  // mov rsi, rdi
  kRexPrefix, 0x89, 0xfe,
  // ... the rest of the existing prologue bytes are unchanged ...
};

Let’s once more add an entry to Compile_call.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index, Env *varenv) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "cons")) {
      return Compile_cons(buf, /*car=*/operand1(args), /*cdr=*/operand2(args),
                          stack_index, varenv);
    }
    // ...
  }
  // ...
}

We don’t really need to add a whole new function for cons since we’re not doing structural recursion on the parameters or anything, but Compile_call just keeps getting bigger and this helps keep it smaller.

Compile_cons is pretty much exactly what I described above. I pulled out rsi into kHeapPointer so that we can change it later if we need to.

const Register kHeapPointer = kRsi;

int Compile_cons(Buffer *buf, ASTNode *car, ASTNode *cdr,
                 word stack_index, Env *varenv) {
  // Compile and store car on the stack
  _(Compile_expr(buf, car, stack_index, varenv));
  Emit_store_reg_indirect(buf,
                          /*dst=*/Ind(kRbp, stack_index),
                          /*src=*/kRax);
  // Compile and store cdr (one slot further down so car is not clobbered)
  _(Compile_expr(buf, cdr, stack_index - kWordSize, varenv));
  Emit_store_reg_indirect(buf, /*dst=*/Ind(kHeapPointer, kCdrOffset),
                          /*src=*/kRax);
  // Fetch car and store in the heap
  Emit_load_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, stack_index));
  Emit_store_reg_indirect(buf, /*dst=*/Ind(kHeapPointer, kCarOffset),
                          /*src=*/kRax);
  // Store tagged pointer in rax
  Emit_mov_reg_reg(buf, /*dst=*/kRax, /*src=*/kHeapPointer);
  Emit_or_reg_imm8(buf, /*dst=*/kRax, kPairTag);
  // Bump the heap pointer
  Emit_add_reg_imm32(buf, /*dst=*/kHeapPointer, kPairSize);
  return 0;
}

In an earlier version of this post, I claimed that even though we’re compiling two expressions one right after another, we didn’t need to bump stack_index or anything, because we were storing the results not on the stack but in the pair.

As it turns out, we do need to store one of the intermediates on the stack because otherwise we risk overwriting random data in the heap. As Leonard Schütz pointed out to me, the previous version of this code would fail if either the car or cdr expressions modified the heap pointer. Thank you for the correction!
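The ordering problem can be modeled in plain C. In this hypothetical sketch, a global array stands in for the heap, hp stands in for rsi, and each operand expression is a function that may itself allocate. cons_old and cons_new are illustrative names for the two orderings, not functions from the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t uword;

enum { kPairTag = 1, kHeapWords = 16 };

static uword heap[kHeapWords];
static uword *hp = heap;  // models the heap register, rsi

static uword tag(uword *pair) { return (uword)pair | kPairTag; }
static uword *untag(uword obj) { return (uword *)(obj - kPairTag); }

static void reset_heap(void) {
  memset(heap, 0, sizeof heap);
  hp = heap;
}

// Stand-in for "run the compiled code for a subexpression".
typedef uword (*Expr)(void);

// Old ordering: car goes straight into the heap before cdr is evaluated.
static uword cons_old(Expr car, Expr cdr) {
  uword car_value = car();
  hp[0] = car_value;
  uword cdr_value = cdr();  // a nested cons overwrites hp[0..1] and bumps hp
  hp[1] = cdr_value;        // ...so this lands in the wrong place
  uword result = tag(hp);
  hp += 2;
  return result;
}

// New ordering: keep car aside (on the stack in the real compiler) until
// both operands are evaluated, then write both into the heap.
static uword cons_new(Expr car, Expr cdr) {
  uword car_value = car();
  uword cdr_value = cdr();
  hp[0] = car_value;
  hp[1] = cdr_value;
  uword result = tag(hp);
  hp += 2;
  return result;
}

static uword four(void) { return 4; }
static uword eight(void) { return 8; }
static uword inner_pair(void) { return cons_new(eight, eight); }
```

With cons_old, evaluating a nested cons as the cdr overwrites the outer car and leaves the outer pair pointing at a garbage slot; with cons_new, the outer car survives.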

As promised, here is the new instruction to move data between registers:

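void Emit_mov_reg_reg(Buffer *buf, Register dst, Register src) {
  // REX.W prefix: 64-bit operand size
  Buffer_write8(buf, kRexPrefix);
  // Opcode 0x89: MOV r/m64, r64
  Buffer_write8(buf, 0x89);
  // ModRM: mod=11 (register direct), reg=src, rm=dst.
  // Only valid for rax..rdi; r8-r15 would need the REX.R/REX.B bits.
  Buffer_write8(buf, 0xc0 + src * 8 + dst);
}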
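The magic 0xc0 + src * 8 + dst is a ModRM byte with mod=11 (register direct), reg=src, and rm=dst. A standalone sketch that spells out the bit fields, assuming the usual x86-64 numbering of the first eight registers; modrm and encode_mov_reg_reg are hypothetical helpers writing into a plain byte array rather than the compiler's Buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte;

// Register numbers as they appear in ModRM fields (low 8 registers only).
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;

enum { kRexW = 0x48, kMovRm64R64 = 0x89 };

// ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits).
static byte modrm(byte mod, byte reg, byte rm) {
  return (byte)((mod << 6) | (reg << 3) | rm);
}

// Encode mov dst, src into `out` and return the number of bytes written.
// Opcode 0x89 is MOV r/m64, r64, so the source goes in reg, the dest in rm.
static size_t encode_mov_reg_reg(byte *out, Register dst, Register src) {
  assert(dst <= kRdi && src <= kRdi);  // r8-r15 need REX.R/REX.B
  out[0] = kRexW;
  out[1] = kMovRm64R64;
  out[2] = modrm(/*mod=*/3, /*reg=*/src, /*rm=*/dst);
  return 3;
}
```

This reproduces both encodings used in this post: mov rax, rsi is 48 89 f0 and the prologue's mov rsi, rdi is 48 89 fe.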

Alright, that’s cons. Let’s implement car and cdr. These are extraordinarily short implementations:

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index, Env *varenv) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "car")) {
      _(Compile_expr(buf, operand1(args), stack_index, varenv));
      Emit_load_reg_indirect(buf, /*dst=*/kRax,
                             /*src=*/Ind(kRax, kCarOffset - kPairTag));
      return 0;
    }
    if (AST_symbol_matches(callable, "cdr")) {
      _(Compile_expr(buf, operand1(args), stack_index, varenv));
      Emit_load_reg_indirect(buf, /*dst=*/kRax,
                             /*src=*/Ind(kRax, kCdrOffset - kPairTag));
      return 0;
    }
    // ...
  }
  // ...
}

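Both car and cdr compile their argument, which leaves a tagged pair pointer in rax, and then load from that address. Subtracting kPairTag in the displacement means the load reads from the untagged address without a separate instruction to clear the tag.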
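To see why the displacement is `kCarOffset - kPairTag`, here is a small sketch that does the same address arithmetic in plain C. It assumes, as the test bytes in this post suggest, that kPairTag is 0x1, that the car lives at offset 0 and the cdr at offset 8, and that pairs are 8-byte aligned so the low bits are free for the tag. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t uword;

/* Assumed layout and tag, matching the test bytes in this post. */
enum { kPairTag = 0x1, kCarOffset = 0, kCdrOffset = 8 };

/* Tag an 8-byte-aligned pair address. */
static uword tag_pair(const uword *address) {
  uword raw = (uword)(uintptr_t)address;
  assert((raw & 0x7) == 0); /* low three bits must be free for the tag */
  return raw | kPairTag;
}

/* Load the word at (tagged + disp), like `mov rax, [rax + disp]`. */
static uword load_with_disp(uword tagged, int64_t disp) {
  const void *address = (const void *)(uintptr_t)(tagged + (uword)disp);
  uword value;
  memcpy(&value, address, sizeof value);
  return value;
}

static uword pair_car(uword pair) { return load_with_disp(pair, kCarOffset - kPairTag); }
static uword pair_cdr(uword pair) { return load_with_disp(pair, kCdrOffset - kPairTag); }
```

Because the tag is folded into the displacement, reading the car costs exactly one load.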

That’s it. That’s the whole implementation! It’s kind of nice that, now that we have these building blocks, adding new features is not so hard.


I’ve written a couple of tests for this implementation. In order to make this testing painless, I’ve also added a new type of test harness that passes the tests a buffer and a heap. I call it — wait for it — RUN_HEAP_TEST.

Anyway, here’s a test that checks we can allocate pairs. To fully test it, I’ve added some helpers for poking at object internals: Object_pair_car and Object_pair_cdr. Note that these may be, but are not necessarily, the same as the corresponding AST functions. The C compiler could hypothetically re-order struct elements, I think. Joker_vD on Hacker News points out that C compilers are not permitted to re-order elements, but may insert padding for alignment.
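If you do want a C struct for poking at pairs, a cheap safety net is to assert the layout at compile time instead of relying on what the compiler happens to do. A sketch, with a hypothetical `Pair` type and accessors, assuming the car at offset 0, the cdr at offset 8, and a pair tag of 0x1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t uword;

/* Hypothetical C-side view of a heap-allocated pair. */
typedef struct {
  uword car;
  uword cdr;
} Pair;

/* C guarantees the first member is at offset 0 and that members appear in
   declaration order, but not that there is no padding between them, so
   check the offsets the emitted code relies on at compile time. */
static_assert(offsetof(Pair, car) == 0, "car must be at offset 0");
static_assert(offsetof(Pair, cdr) == sizeof(uword), "cdr must be at offset 8");

/* Accessors for tagged pair pointers, assuming a pair tag of 0x1. */
static uword pair_car_via_struct(uword tagged) {
  return ((const Pair *)(uintptr_t)(tagged - 0x1))->car;
}

static uword pair_cdr_via_struct(uword tagged) {
  return ((const Pair *)(uintptr_t)(tagged - 0x1))->cdr;
}
```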

TEST compile_cons(Buffer *buf, uword *heap) {
  ASTNode *node = Reader_read("(cons 1 2)");
  int compile_result = Compile_entry(buf, node);
  ASSERT_EQ(compile_result, 0);
  // clang-format off
  byte expected[] = {
      // mov rax, 0x4
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // mov [rbp-8], rax
      0x48, 0x89, 0x45, 0xf8,
      // mov rax, 0x8
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00,
      // mov [rsi+Cdr], rax
      0x48, 0x89, 0x46, 0x08,
      // mov rax, [rbp-8]
      0x48, 0x8b, 0x45, 0xf8,
      // mov [rsi+Car], rax
      0x48, 0x89, 0x46, 0x00,
      // mov rax, rsi
      0x48, 0x89, 0xf0,
      // or rax, kPairTag
      0x48, 0x83, 0xc8, 0x01,
      // add rsi, 2*kWordSize
      0x48, 0x81, 0xc6, 0x10, 0x00, 0x00, 0x00,
  };
  // clang-format on
  uword result = Testing_execute_entry(buf, heap);
  ASSERT_EQ_FMT(Object_encode_integer(1), Object_pair_car(result), "0x%lx");
  ASSERT_EQ_FMT(Object_encode_integer(2), Object_pair_cdr(result), "0x%lx");
  PASS();
}

Here is a test for that tricky nested cons case that messed me up originally:

TEST compile_nested_cons(Buffer *buf, uword *heap) {
  ASTNode *node = Reader_read("(cons (cons 1 2) (cons 3 4))");
  int compile_result = Compile_entry(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_entry(buf, heap);
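  ASSERT_EQ_FMT(Object_encode_integer(1),
                Object_pair_car(Object_pair_car(result)), "0x%lx");
  ASSERT_EQ_FMT(Object_encode_integer(2),
                Object_pair_cdr(Object_pair_car(result)), "0x%lx");
  ASSERT_EQ_FMT(Object_encode_integer(3),
                Object_pair_car(Object_pair_cdr(result)), "0x%lx");
  ASSERT_EQ_FMT(Object_encode_integer(4),
                Object_pair_cdr(Object_pair_cdr(result)), "0x%lx");
  PASS();
}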

Here’s a test for reading the car of a pair. The test for cdr is so similar I will not include it here.

TEST compile_car(Buffer *buf, uword *heap) {
  ASTNode *node = Reader_read("(car (cons 1 2))");
  int compile_result = Compile_entry(buf, node);
  ASSERT_EQ(compile_result, 0);
  // clang-format off
  byte expected[] = {
      // mov rax, 0x4
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // mov [rbp-8], rax
      0x48, 0x89, 0x45, 0xf8,
      // mov rax, 0x8
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00,
      // mov [rsi+Cdr], rax
      0x48, 0x89, 0x46, 0x08,
      // mov rax, [rbp-8]
      0x48, 0x8b, 0x45, 0xf8,
      // mov [rsi+Car], rax
      0x48, 0x89, 0x46, 0x00,
      // mov rax, rsi
      0x48, 0x89, 0xf0,
      // or rax, kPairTag
      0x48, 0x83, 0xc8, 0x01,
      // add rsi, 2*kWordSize
      0x48, 0x81, 0xc6, 0x10, 0x00, 0x00, 0x00,
      // mov rax, [rax-1]
      0x48, 0x8b, 0x40, 0xff,
  };
  // clang-format on
  uword result = Testing_execute_entry(buf, heap);
  ASSERT_EQ_FMT(Object_encode_integer(1), result, "0x%lx");
  PASS();
}

Other objects

I didn’t cover variable-length objects in this post because I wanted to focus on the basics of allocating and poking at allocated data structures. Next time, we’ll add symbols and strings, and we’ll learn about instruction encoding.


Robin Schroer (sulami)

Testing Hexagonal Architecture October 11, 2020 12:00 AM

Hexagonal Architecture, also known as Ports and Adapters, was first conceived by Cockburn in 2005, and popularised by Freeman & Pryce’s Growing Object-Oriented Software, Guided by Tests in 2009. For those unfamiliar, it describes an application architecture entirely comprised of ports, which are interfaces, and adaptors, which are implementations for those interfaces. The adaptors can depend on other ports, but not on other adaptors. A system is then constructed by selecting a full set of adaptors, depending on the requirements, and composing them using dependency injection.

A port can represent an external resource or service, but also a logical component of the system, like an HTTP server or a queue handler.

An Example Port & Adaptor

A simple example for a port could be blob storage. I will be using Clojure in this post, but no prior knowledge is required for understanding. (This also allows me to gloss over types, or specs, which I would normally add in various places. As an aside, Clojure in particular is not great at this, as record methods cannot be defined by specs, requiring function wrappers.)

Also, the example ports & adaptors in this post are modelled after Stuart Sierra’s component library.

A port in this case is a protocol, which we define like so:

(defprotocol BlobStoragePort
  (store-object [this loc obj]
    "Store `obj` at `loc`.")
  (retrieve-object [this loc]
    "Retrieve the object at `loc`.
    Returns `nil` if not found."))

Now that we have a port with an interface in the form of abstract method declarations, we can implement an adaptor, for example using S3:

(defrecord S3StorageAdaptor [bucket-loc]
  BlobStoragePort
  (store-object [this loc obj]
    (s3/put-object :bucket-loc bucket-loc
                   :key loc
                   :file obj))
  (retrieve-object [this loc]
    (s3/get-object :bucket-loc bucket-loc
                   :key loc)))

(defn new-s3-blob-storage-adaptor [bucket-loc]
  (s3/create-bucket bucket-loc)
  (->S3StorageAdaptor bucket-loc))

During tests, we would like to use a blob storage that is much faster and not dependent on external state, so we can use a simple map in an atom. (For those not familiar with Clojure, an atom is a reference type that allows us to essentially implement a shared, safely mutable value among Clojure’s normally immutable values. Think of it as a pointer whose updates are applied atomically, using compare-and-swap rather than locks.)

(defrecord MemoryBlobStorageAdaptor [storage-map]
  BlobStoragePort
  (store-object [this loc obj]
    (swap! storage-map assoc loc obj))
  (retrieve-object [this loc]
    (get @storage-map loc)))

(defn new-memory-blob-storage-adaptor []
  (->MemoryBlobStorageAdaptor (atom {})))

Testing the Port

It has long been known that a direct mapping of tests to internal methods is an anti-pattern to be avoided. (Again, Growing Object-Oriented Software has a sub-chapter devoted to this, Unit-Test Behavior, Not Methods. It highlights the difference in ease of understanding, but another factor is ease of refactoring, which is significantly higher if the internal method hierarchy is not married to the test suite.)

As such, we will prefer testing at the port level over testing at the adaptor level. In practice that means we assert a certain set of behaviours about every adaptor for a given port by using only the public port methods in our tests, and using the same tests for all adaptors.

;; Abstract port test suite

(defn- store-and-retrieve-test [adaptor]
  (testing "store and retrieve returns the object"
    (let [loc "store-and-retrieve"
          obj "test-object"]
      (store-object adaptor loc obj)
      (is (= obj
             (retrieve-object adaptor loc))))))

(defn- not-found-test [adaptor]
  (testing "returns nil for nonexistent objects"
    (is (nil? (retrieve-object adaptor "not-found")))))

;; Specific adaptor tests

(deftest blob-storage-adaptor-test
  (let [adaptors [(new-memory-blob-storage-adaptor)
                  (new-s3-blob-storage-adaptor "test")]]
    (doseq [adaptor adaptors]
      (store-and-retrieve-test adaptor)
      (not-found-test adaptor))))

This has the advantage of establishing a consistent set of behaviours across all adaptors and keeping them in sync. One might wonder about intended behavioural differences between adaptors for the same port, but I would argue that from the outside, all adaptors for a given port should exhibit the same behaviour. (If you really need different behaviour in some situations, I would recommend adding a flag or switch controlling this behaviour across all adaptors.)

Because we are only using the public interface for testing, any internal differences are conveniently hidden from us.
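The port-level testing idea is not specific to Clojure. As a rough C analogue, a struct of function pointers can stand in for the protocol, with one in-memory adaptor and the two port-level tests from above. Every name here is hypothetical, and the fixed-capacity storage is only there to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The port: a table of functions every blob storage adaptor provides. */
typedef struct BlobStoragePort {
  void (*store_object)(struct BlobStoragePort *self, const char *loc,
                       const char *obj);
  /* Returns NULL when nothing is stored at `loc`. */
  const char *(*retrieve_object)(struct BlobStoragePort *self, const char *loc);
} BlobStoragePort;

/* A hypothetical in-memory adaptor with a small fixed capacity. */
enum { kMaxObjects = 16 };

typedef struct {
  BlobStoragePort port; /* first member, so a port pointer can be cast back */
  const char *locs[kMaxObjects];
  const char *objs[kMaxObjects];
  size_t count;
} MemoryBlobStorage;

static void memory_store(BlobStoragePort *self, const char *loc,
                         const char *obj) {
  MemoryBlobStorage *m = (MemoryBlobStorage *)self;
  for (size_t i = 0; i < m->count; i++) {
    if (strcmp(m->locs[i], loc) == 0) {
      m->objs[i] = obj; /* overwrite an existing entry */
      return;
    }
  }
  assert(m->count < kMaxObjects);
  m->locs[m->count] = loc;
  m->objs[m->count] = obj;
  m->count++;
}

static const char *memory_retrieve(BlobStoragePort *self, const char *loc) {
  MemoryBlobStorage *m = (MemoryBlobStorage *)self;
  for (size_t i = 0; i < m->count; i++) {
    if (strcmp(m->locs[i], loc) == 0) return m->objs[i];
  }
  return NULL;
}

static void memory_blob_storage_init(MemoryBlobStorage *m) {
  m->port.store_object = memory_store;
  m->port.retrieve_object = memory_retrieve;
  m->count = 0;
}

/* Port-level tests: they only call through the port, so they can run
   unchanged against every adaptor. Each returns 1 on success. */
static int store_and_retrieve_test(BlobStoragePort *port) {
  port->store_object(port, "store-and-retrieve", "test-object");
  const char *got = port->retrieve_object(port, "store-and-retrieve");
  return got != NULL && strcmp(got, "test-object") == 0;
}

static int not_found_test(BlobStoragePort *port) {
  return port->retrieve_object(port, "not-found") == NULL;
}
```

An S3-backed adaptor would fill in the same two function pointers, and the list of adaptors under test would simply gain another entry.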

The Rest of the System

Now that we have established a port, as well as some adaptors, we can build on top of them. Blob storage is a lower-level port in our system, and we are going to add a higher-level port that implements some kind of business logic which requires blob storage.

;; Port definition omitted for brevity.

(defrecord BusinessLogicAdaptor [blob-storage-adaptor]
  (retrieve-double [this loc]
    (* 2 (retrieve-object blob-storage-adaptor loc))))

We are free to use different blob storage adaptors for different systems, for example production, staging, CI, or local development. The business logic adaptor is oblivious to the actual blob storage implementation injected.

On Mocks & Stubs

The careful reader might have noticed that the dependency injection of different adaptors looks a lot like mocking, and this is very much true. While mocking has been considered more and more problematic in recent years, the fact that we assert the same set of behaviours for our mocks as we assert for the “real components” leads us to much more fully featured and realistic mocks, compared to the ones which are written for specific tests and then rarely touched after.

If the difference in behaviour between different adaptors leads to problems which are not caught by the test suite, the problem is not mocking, but an incomplete behaviour specification for the adaptor in question.

eta (eta)

Strict COVID-19 restrictions in universities are irresponsible October 11, 2020 12:00 AM

This post is about mostly personal circumstances / issues, as well as current affairs. If that’s not what you want, turn back now.

At the start of the current coronavirus disease 2019 (COVID-19) pandemic, we were told that “flattening the curve” was a good idea – i.e. attempting to limit the spread of the disease by staying at home, wearing face coverings, etc. was a necessary step we should all take in order to prevent the national health services from getting overwhelmed (leading to an excess of deaths of people who could otherwise be helped).

A significant number of months have passed since March, and a new wave of unsuspecting secondary school graduates have descended on the UK’s universities1 – but, obviously, since there’s still a pandemic going on, things are different from the way they used to be. Pretty much all universities have new precautions to limit the spread of the disease, including things like

  • grouping students into (logical) “households”, and restricting interaction between said households
  • enforcing social distancing requirements
  • enforcing face covering usage
  • limiting the number of students that can be in the same place at one time (in line with the nationwide “rule of six”)
  • getting rid of all face-to-face tuition, and moving everything online
  • adding a curfew to, or closing, pubs and social spaces

Some of these precautions involve more sacrifices on the part of the students than others; wearing face coverings is relatively zero-cost, and has been shown to limit the spread of the disease quite significantly2. However, the goal of the overwhelming majority of the restrictions is clear: limit social interaction as far as practicable. (This ‘makes sense’, because social interaction is how the virus is spread.)

The point I want to express here is that having that as a goal in the context of universities is somewhat irresponsible, and seems to completely ignore the mental health concerns of an entire year’s worth of students at university right now3. Most students have left the (hopefully relatively comfortable) environment of secondary school to come to university – sometimes in an entirely new city, or indeed country. These students typically don’t have many people they can talk to once they arrive, having left the vast majority of their friends behind from school; instead, they must somehow discover new people, usually by having a lot of spontaneous interactions until they’re able to bed in and start to establish some friendships.

It doesn’t take a genius to realise that this process is not compatible with the above stated goal of not having much social interaction.

However, what I think is particularly irresponsible is the lack of discussion surrounding the consequences of not letting this process play out as normal. The need for students to socialize and make friends is invariant; the feeling of loneliness is inherent to being human and isn’t going away any time soon, so people will (attempt to) socialize to feel less lonely, especially when placed in an unfriendly new environment. Examples of consequences arising from a lack of social interactions among students include

  • greater incidences of mental health problems, as loneliness creates new or exacerbates existing issues
  • a reduced ability to even notice and help with such problems, as remote learning can mask all sorts of issues that are more easily recognizable in person
  • reduced academic performance and ability, due to previously mentioned mental health problems
  • a greater dropout rate, leading to reduced income for universities (some of which are already struggling to stay afloat)
  • in the extreme case, greater incidences of suicides

It’s also the case that not everyone is perfectly rule-abiding. While more meek students might follow restrictions and suffer the associated consequences, others will flagrantly disobey them, a fact which has consequences of its own:

  • instead of socializing in ‘controlled’ environments, under the purview of (e.g.) student wellbeing officers, students will socialize elsewhere (e.g. a random park)
    • in these ‘uncontrolled’ environments, a greater prevalence of dangerous behaviours (excessive drinking, drug use, etc.) would be expected
    • …but since these are the only opportunities available to undersocialized students, more students might end up taking unwise risks than would otherwise
    • there is already evidence to suggest more students are taking drugs and dying from it through precisely this mechanism4
  • from the perspective of the virus, the replacements for the now-banned opportunities to socialize are likely a lot worse, increasing net transmission

A lot of the problems here tie into greater issues with the discussion of the pandemic in the media and elsewhere; a lot of people seem to think that the worrying graph of growing cases is unquestionably something that must be dealt with immediately (perhaps with a lockdown, which is even worse for students). Don’t get me wrong – COVID-19 is a deadly disease, and must not be underestimated. Letting the disease run completely unchecked throughout the population, without any restrictions whatsoever, is a terrible idea and would kill many people unnecessarily; a very contested document called the Great Barrington Declaration calls for something akin to that (albeit with protections in place for vulnerable members of society).

The reality is that it’s very difficult to come to a decision, and neither extremist view is correct; making everyone sit on their hands until a vaccine is available is stupid, but so is letting the disease run wild. There’s much we don’t know about the impacts of the virus, including whether or not it has long-term health implications for certain groups (and the conditions under which such long-term complications might arise) – but sensationalizing (e.g. evocative news headlines that attempt to instil fear as to the deadliness of the disease) does not help us come to a reasoned conclusion about risk.

To conclude, then, I believe the evidence to support strict COVID-19 restrictions in UK universities is questionable, and a re-think about the rationale for, and the consequences of, such strict restrictions is sorely required. It’s really unclear whether the benefits conferred by severely limiting social interaction (at least, imposing rules that attempt to achieve such) are worth the consequences of doing so – heck, it’s even unclear whether people even follow the rules enough to limit transmission at all (and the recent outbreaks in universities across the nation confirm that).

A lack of humane thinking seems to be the case amongst those who impose said restrictions; the problem cannot be viewed as a simple mathematical calculation of how to reduce cases (if reducing cases is even something worth attempting to do!), but one that leads to significant human suffering for those affected. With the world being more divided and polarised than ever, it’s worth trying to be empathetic – to both see the fear on the part of those pushing for a lockdown and limitation of cases, and to recognize the crushing impact restrictions have on the restricted.

  1. I’m one of these, of course, which is why I’m writing this. 

  2. Even if you disagree with the evidence here, face coverings are still basically zero-cost – you really don’t sacrifice much by wearing one! 

  3. If you disagree with me, please read the whole article first before getting angry. 

  4. I can’t find a citation for this, so take this claim with a pinch of salt. 

October 09, 2020

Sevan Janiyan (sevan)

How to open source: going from NetBSD to Linux October 09, 2020 01:30 AM

TL;DR: a BSD user tries something else and wonders why things are different. This post has sat in draft form for quite some time. At first it was written with the aim of highlighting the NetBSD project, and I started thinking about revisiting it recently due to frustration with running a mainstream Linux distribution when investigating …

October 07, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: If October 07, 2020 11:00 AM


Welcome back to the “Compiling a Lisp” series. Last time we added support for let expressions. This time we’re going to compile if expressions.

Compiling if expressions will allow us to write code that performs decision making. For example, we can write code that does something based on the result of some imaginary function coin-flip:

(if (= (coin-flip) heads)
    123
    456)

If the call to coin-flip returns heads, then this whole if-expression will evaluate to 123. Otherwise, it will evaluate to 456. To determine if an expression is truthy, we’ll check if it is not equal to #f.
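In terms of encoded objects, that check is a single comparison. The sketch below assumes the encodings visible in this post’s test bytes (#f is 0x1f, #t is 0x9f, and integers are shifted left by 2); the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t uword;

/* Encodings as they appear in this post's test bytes (assumed). */
enum { kFalse = 0x1f, kTrue = 0x9f };

/* Integers are shifted left by 2, so they must fit in 62 bits. */
static uword encode_integer(int64_t value) {
  assert(value >= INT64_MIN / 4 && value <= INT64_MAX / 4);
  return (uword)value << 2;
}

/* Mirrors `cmp rax, #f; je alternate`: everything except #f is truthy. */
static bool is_truthy(uword object) { return object != kFalse; }
```

Note that the integer 0 (encoded as 0x0) is truthy: only #f selects the alternate.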

Note that the iftrue and iffalse expressions (consequent and alternate, respectively) are only evaluated if their branch is reached.

Implementation strategy

People normally compile if expressions by taking the following structure in Lisp:

(if condition
    consequent
    alternate)

and rewriting it to the following pseudo-assembly (where ...compile(X) is replaced with compiled code from the expressions):

  ...compile(condition)
  compare result, #f
  jump-if-equal alternate
  ...compile(consequent)
  jump end
alternate:
  ...compile(alternate)
end:

This will evaluate the condition expression. If it’s falsey, jump to the code for the alternate expression. Otherwise, continue on to the code for the consequent expression. So that the program does not also execute the code for the alternate, jump over it.

This transformation requires a couple of new pieces of infrastructure.

Implementation infrastructure

First, we’ll need two types of jump instructions! We have a conditional jump (jcc in x86-64) and an unconditional jump (jmp in x86-64). These are relatively straightforward to emit.

Somewhat more complicated are the targets of those jump instructions. We’ll need to supply each of the instructions with some sort of destination code address.

When emitting text assembly, this is not so hard: make up names for your labels (as with alternate and end above), and keep the names consistent between the jump instruction and the jump target. Sure, you need to generate unique labels, but the assembler will at least do address resolution for you. This address resolution transparently handles backward jumps (where the label is already known) and forward jumps (where the label comes after the jump instruction).

Since we’re not emitting text assembly, we’ll need to calculate both forward and backward jump offsets by hand. This ends up not being so bad in practice once we come up with an ergonomic way to do it. Let’s take a look at some production-grade assemblers for inspiration.

How Big Kid compilers do this

I read some source code for assemblers like the Dart assembler. Dart is a language runtime developed by Google and part of their infrastructure includes a Just-In-Time compiler, sort of like what we’re making here. Part of their assembler is some slick C++-y RAII infrastructure for emitting code and doing cleanup. Their implementation of compiling if expressions might look something like:

// Made-up APIs to make the Dart code look like our code
int Compile_if(Buffer *buf, ASTNode *cond, ASTNode *consequent,
               ASTNode *alternate) {
   Label alternate_label;
   Label end_label;
   compile(buf, cond);
   buf->cmp(kRax, Object_false());
   buf->jcc(kEqual, &alternate_label);
   compile(buf, consequent);
   buf->jmp(&end_label);
   buf->bind(&alternate_label);
   compile(buf, alternate);
   buf->bind(&end_label);
   return 0;
}

Their Label objects store information about where in the emitted machine code they are bound with bind. If they are bound before they are used by jcc or jmp or something, then the emitter will just emit the destination address. If they are not yet bound, however, then the Label will keep track of where it has to go back and patch the machine code once the label is bound to a location.

When the labels are destructed — meaning they can no longer be referenced by C++ code — their destructors have code to go back and patch all the instructions that referenced the label before it was bound.

While x86-64 has multiple jump widths available (the 8-bit forms are shorter, which saves code size), it is a little tricky to use them for forward jumps. Because we don’t know in advance how long the intermediate code will be, we’ll just stick to generating 32-bit relative jumps always.
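For a backward jump the displacement is already known, so an emitter could pick the short form when it fits. The sketch below, with hypothetical names, shows both Jcc encodings: `70+cc rel8` (2 bytes) and `0F 80+cc rel32` (6 bytes), with the displacement measured from the end of whichever form is chosen. For a forward jump that choice cannot be made until the target is known, and shrinking a jump afterwards would shift every byte after it, which is why sticking to rel32 keeps things simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 condition-code numbers, as used in the low nibble of Jcc opcodes. */
typedef enum {
  kOverflow = 0x0, kNotOverflow = 0x1, kBelow = 0x2, kAboveEqual = 0x3,
  kEqual = 0x4, kNotEqual = 0x5, kBelowEqual = 0x6, kAbove = 0x7,
  kSign = 0x8, kNotSign = 0x9, kParity = 0xa, kNotParity = 0xb,
  kLess = 0xc, kGreaterEqual = 0xd, kLessEqual = 0xe, kGreater = 0xf,
} Condition;

/* Encode a Jcc whose target is `target` bytes from the start of the jump
   (negative for backward jumps). Uses the 2-byte rel8 form when the
   displacement fits, otherwise the 6-byte rel32 form. Returns the length. */
static size_t encode_jcc(uint8_t out[6], Condition cc, int64_t target) {
  int64_t rel8 = target - 2; /* relative to the end of the short form */
  if (rel8 >= INT8_MIN && rel8 <= INT8_MAX) {
    out[0] = (uint8_t)(0x70 + cc);
    out[1] = (uint8_t)(int8_t)rel8;
    return 2;
  }
  int64_t rel32 = target - 6; /* relative to the end of the long form */
  assert(rel32 >= INT32_MIN && rel32 <= INT32_MAX);
  uint32_t bits = (uint32_t)(int32_t)rel32;
  out[0] = 0x0f;
  out[1] = (uint8_t)(0x80 + cc);
  for (int i = 0; i < 4; i++) out[2 + i] = (uint8_t)(bits >> (8 * i));
  return 6;
}
```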

Virtual machines like ART, OpenJDK HotSpot, SpiderMonkey, V8, HHVM, and Strongtalk also use this approach. So do the VM-agnostic AsmJit and GNU lightning assemblers. If I didn’t link an implementation, it’s either because I found it too complicated to reproduce or couldn’t quickly track it down. Or maybe I don’t know about it!

Basically what I am trying to tell you is that this bind-and-backpatch approach is tried and tested and that we’re going to implement it in C. I hope you enjoyed the whirlwind tour of assemblers in various other runtimes along the way.

Compiling if-expressions, finally

Alright, so we finally get the big idea about how to do this transformation. Let’s put it into practice.

First, as with let, we’re going to need to handle the if case in Compile_call.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index, Env *varenv) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "if")) {
      return Compile_if(buf, /*condition=*/operand1(args),
                        /*consequent=*/operand2(args),
                        /*alternate=*/operand3(args), stack_index, varenv);
    }
  }
  // ...
}

As usual, we’ll pull apart the expression so Compile_if has less work to do. Since we now have more than two operands (!), I’ve added operand3. It works just like you would think it does.

For Compile_if, we’re going to largely replicate the pseudocode C++ from above. I think you’ll find that if you squint it looks similar enough.

int Compile_if(Buffer *buf, ASTNode *cond, ASTNode *consequent,
               ASTNode *alternate, word stack_index, Env *varenv) {
  _(Compile_expr(buf, cond, stack_index, varenv));
  Emit_cmp_reg_imm32(buf, kRax, Object_false());
  word alternate_pos = Emit_jcc(buf, kEqual, kLabelPlaceholder); // je alternate
  _(Compile_expr(buf, consequent, stack_index, varenv));
  word end_pos = Emit_jmp(buf, kLabelPlaceholder); // jmp end
  Emit_backpatch_imm32(buf, alternate_pos);        // alternate:
  _(Compile_expr(buf, alternate, stack_index, varenv));
  Emit_backpatch_imm32(buf, end_pos); // end:
  return 0;
}

Instead of having a Label struct, though, I opted to just have a function to backpatch forward jumps explicitly. If you prefer to port Label to C, be my guest. I found it very finicky1.

Also, instead of bind, I opted for a more explicit backpatch. This makes it clearer what is happening, I think.
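For comparison, here is roughly what a Label could look like in C. It is a hypothetical sketch, not the code used in this series: a label remembers the rel32 fields of forward jumps that reference it and patches them all when it is bound, and a jump to an already-bound label is patched immediately. Without destructors, nothing forces the caller to bind every label, which is part of what makes this finicky in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kMaxPatches = 8 };

/* Hypothetical C port of a Dart-style Label. */
typedef struct {
  int bound;                   /* 1 once label_bind has run */
  size_t pos;                  /* bound position in the code buffer */
  size_t patches[kMaxPatches]; /* rel32 fields waiting for label_bind */
  size_t num_patches;
} Label;

/* Write a little-endian rel32 at `field` that jumps to `target`; the
   displacement is relative to the end of the 4-byte field. */
static void put_rel32(uint8_t *code, size_t field, size_t target) {
  int32_t rel = (int32_t)((int64_t)target - (int64_t)(field + 4));
  uint32_t bits = (uint32_t)rel;
  for (int i = 0; i < 4; i++) code[field + i] = (uint8_t)(bits >> (8 * i));
}

/* Called by a jump emitter after it reserves a rel32 field at `field`. */
static void label_use(Label *label, uint8_t *code, size_t field) {
  if (label->bound) {
    put_rel32(code, field, label->pos); /* backward jump: patch now */
  } else {
    assert(label->num_patches < kMaxPatches);
    label->patches[label->num_patches++] = field; /* forward jump: remember */
  }
}

/* Bind the label to `pos` and patch every forward jump recorded so far. */
static void label_bind(Label *label, uint8_t *code, size_t pos) {
  assert(!label->bound);
  label->bound = 1;
  label->pos = pos;
  for (size_t i = 0; i < label->num_patches; i++) {
    put_rel32(code, label->patches[i], pos);
  }
  label->num_patches = 0;
}
```

A caller would emit a jump opcode, reserve four bytes, call `label_use` with the position of those bytes, and call `label_bind` once the target position is known.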

This explicit backpatch approach requires manually tracking the offsets (like alternate_pos and end_pos) inside the jump instructions. We’ll need those offsets to backpatch them later. This means functions like Emit_jcc and Emit_jmp should return the offsets inside buf where they write placeholder offsets.

Let’s take a look inside these helper functions’ internals.

jcc and jmp implementations

The implementations for jcc and jmp are pretty similar, so I will only reproduce jcc here.

word Emit_jcc(Buffer *buf, Condition cond, int32_t offset) {
  Buffer_write8(buf, 0x0f);
  Buffer_write8(buf, 0x80 + cond);
  word pos = Buffer_len(buf);
  Buffer_write32(buf, disp32(offset));
  return pos;
}

This function is like many other Emit functions except for its return value. It returns the start location of the 32-bit offset for use in patching forward jumps. In the case of backward jumps, we can ignore this, since there’s no need to patch it after-the-fact.

Backpatching implementation

Here is the implementation of Emit_backpatch_imm32. I’ll walk through it and explain.

void Emit_backpatch_imm32(Buffer *buf, int32_t target_pos) {
  word current_pos = Buffer_len(buf);
  word relative_pos = current_pos - target_pos - sizeof(int32_t);
  Buffer_at_put32(buf, target_pos, disp32(relative_pos));
}

The input target_pos is the location inside the jmp (or similar) instruction that needs to be patched. Since we need to patch it with a relative offset, we compute the distance between the current position and the target position. We also need to subtract 4 bytes (sizeof(int32_t)) because the jump offset is relative to the end of the jmp instruction (the beginning of the next instruction).

Then, we write that value in. Buffer_at_put32 and disp32 are similar to their 8-bit equivalents.
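To double-check the arithmetic, the sketch below re-implements just enough of an emitter to compile `(if #t 1 2)` with already-encoded constants, backpatching the same way `Emit_backpatch_imm32` does. Every function and type in it is a stand-in invented for this example; the bytes it should produce are the ones in the `compile_if_with_true_cond` test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the post's Buffer (hypothetical). */
typedef struct {
  uint8_t bytes[64];
  size_t len;
} Buf;

static void write8(Buf *b, uint8_t value) {
  assert(b->len < sizeof b->bytes);
  b->bytes[b->len++] = value;
}

/* Little-endian 32-bit store at an arbitrary position. */
static void put32(Buf *b, size_t pos, int32_t value) {
  uint32_t bits = (uint32_t)value;
  for (int i = 0; i < 4; i++) b->bytes[pos + i] = (uint8_t)(bits >> (8 * i));
}

static void write32(Buf *b, int32_t value) {
  assert(b->len + 4 <= sizeof b->bytes);
  put32(b, b->len, value);
  b->len += 4;
}

static void emit_mov_rax_imm32(Buf *b, int32_t imm) { /* REX.W C7 /0 id */
  write8(b, 0x48); write8(b, 0xc7); write8(b, 0xc0); write32(b, imm);
}

static void emit_cmp_rax_imm32(Buf *b, int32_t imm) { /* REX.W 3D id */
  write8(b, 0x48); write8(b, 0x3d); write32(b, imm);
}

/* Both jump emitters return the position of their placeholder rel32. */
static size_t emit_je(Buf *b) { /* 0F 84 cd */
  write8(b, 0x0f); write8(b, 0x84);
  size_t pos = b->len;
  write32(b, 0);
  return pos;
}

static size_t emit_jmp(Buf *b) { /* E9 cd */
  write8(b, 0xe9);
  size_t pos = b->len;
  write32(b, 0);
  return pos;
}

/* Same arithmetic as Emit_backpatch_imm32: distance from the end of the
   4-byte field to the current end of the buffer. */
static void backpatch(Buf *b, size_t pos) {
  int64_t relative = (int64_t)b->len - (int64_t)(pos + sizeof(int32_t));
  put32(b, pos, (int32_t)relative);
}

/* (if cond conseq alt) where all three are already-encoded constants. */
static void compile_if_constants(Buf *b, int32_t cond, int32_t false_obj,
                                 int32_t conseq, int32_t alt) {
  emit_mov_rax_imm32(b, cond);
  emit_cmp_rax_imm32(b, false_obj);
  size_t alternate_pos = emit_je(b);
  emit_mov_rax_imm32(b, conseq);
  size_t end_pos = emit_jmp(b);
  backpatch(b, alternate_pos); /* alternate: */
  emit_mov_rax_imm32(b, alt);
  backpatch(b, end_pos);       /* end: */
}
```

The `je` ends up with 0x0c (12 bytes: the 7-byte `mov` plus the 5-byte `jmp`), and the `jmp` with 0x07 (the final 7-byte `mov`).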

Congratulations! You have implemented if.

A fun diagram

Radare2 has a tool called Cutter for reverse engineering and binary analysis. I decided to use it on the compiled output of a function containing an if expression. It produced this pretty graph!

Fig. 1 - Call graph as produced by Cutter.

It’s prettier in the tool, trust me.


I added two trivial tests for the condition being true and the condition being false. I also added a nested if case as a smoke test but I did not foresee that being troublesome with our handy recursive approach.

TEST compile_if_with_true_cond(Buffer *buf) {
  ASTNode *node = Reader_read("(if #t 1 2)");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // mov rax, 0x9f
      0x48, 0xc7, 0xc0, 0x9f, 0x00, 0x00, 0x00,
      // cmp rax, 0x1f
      0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00,
      // je alternate
      0x0f, 0x84, 0x0c, 0x00, 0x00, 0x00,
      // mov rax, compile(1)
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // jmp end
      0xe9, 0x07, 0x00, 0x00, 0x00,
      // alternate:
      // mov rax, compile(2)
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00
      // end:
  };
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(1), result, "0x%lx");
  PASS();
}

TEST compile_if_with_false_cond(Buffer *buf) {
  ASTNode *node = Reader_read("(if #f 1 2)");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // mov rax, 0x1f
      0x48, 0xc7, 0xc0, 0x1f, 0x00, 0x00, 0x00,
      // cmp rax, 0x1f
      0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00,
      // je alternate
      0x0f, 0x84, 0x0c, 0x00, 0x00, 0x00,
      // mov rax, compile(1)
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // jmp end
      0xe9, 0x07, 0x00, 0x00, 0x00,
      // alternate:
      // mov rax, compile(2)
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00
      // end:
  };
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(2), result, "0x%lx");
  PASS();
}

I made sure to test the generated code because we added some new instructions and also because I had trouble getting the offset computations perfectly right initially.

Anyway, that’s all for today. This post was made possible by contributions2 to my blog from Viewers Like You. Thank you.

Next time on PBS, heap allocation.


  1. Maybe it would be less finicky with __attribute__((cleanup)), but that is non-standard. This StackOverflow question and associated answers have some good information.

  2. By “contributions” I mean thoughtful comments, questions, and appreciation. Feel free to chime in on Twitter, HN, Reddit, the mailing list… 

October 05, 2020

Ponylang (SeanTAllen)

Last Week in Pony - October 4, 2020 October 05, 2020 12:30 AM

There’s a new meeting URL for the weekly Pony developer sync meeting.

Marc Brooker (mjb)

Consensus is Harder Than It Looks October 05, 2020 12:00 AM

Consensus is Harder Than It Looks

And it looks pretty hard.

In his classic paper How to Build a Highly Available System Using Consensus Butler Lampson laid out a pattern that's become very popular in the design of large-scale highly-available systems. Consensus is used to deal with unusual situations like host failures (Lampson says it is reserved for emergencies), and leases (time-limited locks) provide efficient normal operation. The paper lays out a roadmap for implementing systems of this kind, leaving just the implementation details to the reader.

The core algorithm behind this paper, Paxos, is famous for its complexity and subtlety. Lampson, like many who came after him1, tries to build a framework of specific implementation details around it to make it more approachable. It's effective, but incomplete. The challenge is that Paxos's subtlety is only one of the hard parts of building a consensus system. There are three categories of challenges that I see people completely overlook.


"How can we arrange for each replica to do the same thing? Adopting a scheme first proposed by Lamport, we build each replica as a deterministic state machine; this means that the transition relation is a function from (state, input) to (new state, output). It is customary to call one of these replicas a ‘process’. Several processes that start in the same state and see the same sequence of inputs will do the same thing, that is, end up in the same state and produce the same outputs" - Butler Lampson (from How to Build a Highly Available System Using Consensus).

Conceptually, that's really easy. We start with a couple of replicas with state, feed them input, and they all end up with new state. Same inputs in, same state out. Realistically, it's hard. Here are just some of the challenges:

  • Concurrency. Typical runtimes and operating systems use more than just your program's state to schedule threads, which means that code that uses multiple threads, multiple processes, remote calls, or even just IO, can end up with non-deterministic results. The simple fix is to be resolutely single-threaded, but that has severe performance implications2.
  • Floating Point. Trivial floating-point calculations are deterministic. Complex floating-point calculations, especially where different replicas run on different CPUs or have code built with different compilers, may not be3. In Physalia we didn't support floating point, because this was too hard to think about.
  • Bug fixes. Say the code that turns state and input into new state has a bug. How do you fix it? You can't just change it and then roll it out incrementally to different replicas. You don't want to deploy all your replicas at once (we're trying to build an HA system, remember?) So you need to come up with a migration strategy. Maybe a flag sequence number. Or complex migration code that changes buggy new state into good new state. Possible, but hard.
  • Code updates. Are you sure that version N+1 produces exactly the same output as version N for all inputs? You shouldn't be, because even in the well-specified world of cryptography that's not always true.
  • Corruption. In reality, input isn't just input, it's also a constant stream of failing components, thermal noise, cosmic rays, and other similar assaults on the castle of determinism. Can you survive them all?

And more. There's always more.
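The Lampson quote above can be written down as a replica whose only job is a deterministic `(state, input) -> (new state, output)` function. The sketch below is a toy under that assumption: `Replica`, its three-entry command log, and the `put`/`add` commands are hypothetical. Its last line shows one small way floating point undermines "same inputs, same outputs": addition is not associative, so two replicas that add the same numbers in a different order (for example, iterating a set in hash order) can end up with different bits. CPU and compiler differences are a separate source of divergence.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A toy replicated state machine whose state is a dict."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        """Deterministic transition: (state, input) -> (new state, output)."""
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown op {op!r}")
        return self.state[key]

log = [("put", "x", 1), ("add", "x", 41), ("put", "y", 7)]
a, b = Replica(), Replica()
outputs_a = [a.apply(cmd) for cmd in log]
outputs_b = [b.apply(cmd) for cmd in log]
print(a.state == b.state, outputs_a == outputs_b)  # True True

# Same three numbers, different order of addition, different result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```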

Some people will tell you that you can solve these problems by using Byzantine consensus protocols. Those people are right, of course. They're also the kind of people who solved their rodent problem by keeping a leopard in their house. Other people will tell you that you can solve these problems with blockchain. Those people are best ignored.

Monitoring and Control

Although using a single, centralized, server is the simplest way to implement a service, the resulting service can only be as fault tolerant as the processor executing that server. If this level of fault tolerance is unacceptable, then multiple servers that fail independently must be used. - Fred Schneider (from Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial)

The whole point of building a highly-available distributed system is to exceed the availability of a single system. If you can't do that, you've added a bunch of complexity for nothing.

Complex systems run in degraded mode. - Richard Cook (from How Complex Systems Fail)

Depending on what you mean by failed, distributed systems of f+1, 2f+1 or 3f+1 nodes can entirely hide the failure of f nodes from their clients. This, combined with a process of repairing failed nodes, allows us to build highly-available systems even in the face of significant failure rates. It also leads directly to one of the traps of building a distributed system: clients can't tell the difference between the case where an outage is f failures away, and where it's just one failure away. If a system can tolerate f failures, then f-1 failures may look completely healthy.
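The arithmetic behind that trap is small. The sketch below assumes two failure models, majority-quorum crash tolerance (n ≥ 2f+1) and Byzantine tolerance (n ≥ 3f+1), and leaves out the f+1 case. `tolerated_failures` and `headroom` are hypothetical helpers. The point is that clients see the same healthy system whether the headroom is 2 or 0, so only monitoring that can see `failed` from inside the system can tell the difference.

```python
def tolerated_failures(n: int, model: str) -> int:
    """Largest f such that n nodes can mask f failed nodes.

    "crash" assumes majority quorums (n >= 2f + 1); "byzantine" assumes
    n >= 3f + 1.
    """
    if model == "crash":
        return (n - 1) // 2
    if model == "byzantine":
        return (n - 1) // 3
    raise ValueError(f"unknown failure model {model!r}")

def headroom(n: int, failed: int, model: str) -> int:
    """Further failures that can still be masked; negative means clients see an outage."""
    return tolerated_failures(n, model) - failed

# A five-node crash-tolerant cluster looks identical to clients at 0, 1 or 2 failures.
for failed in range(4):
    h = headroom(5, failed, "crash")
    print(failed, h, "healthy to clients" if h >= 0 else "outage")
```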

Consensus systems cannot be monitored entirely from the outside (see Why Must Systems Be Operated?). Instead, monitoring needs to be deeply aware of the implementation details of the system, so it can know when nodes are healthy, and when they can be replaced. If it chooses the wrong nodes to replace, disaster will strike.

Control planes provide much of the power of the cloud, but their privileged position also means that they have to act safely, responsibly, and carefully to avoid introducing additional failures. - Brooker, Chen, and Ping (from Millions of Tiny Databases)

Do You Really Need Strong Consistency?

It is possible to provide high availability and partition tolerance, if atomic consistency is not required. - Gilbert and Lynch

The typical state-machine implementation of consensus provides a strong consistency property called linearizability. In exchange, it can't be available for all clients during a network partition. That's probably why you chose it.

Is that why you chose it? Do you need linearizability? Or would something else, like causality, be enough?[4] Using consensus when its properties aren't really needed is a mistake a lot of folks seem to make. Service discovery, configuration distribution, and similar problems can all be handled adequately without strong consistency, and using strongly consistent tools to solve them makes systems less reliable rather than more. Strong consistency is not better consistency.
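To make "causality" a little more concrete: causal consistency only requires that writes related by happens-before are seen in that order everywhere, while concurrent writes may be seen in different orders on different replicas. Vector clocks are one common way to track that relation. The sketch below is illustrative only: `compare` is a hypothetical helper, and clocks are plain dicts from node name to counter, with missing entries counting as zero.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks (node -> counter; missing entries count as 0)."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither write saw the other

w1 = {"A": 1}              # write on node A
w2 = {"A": 1, "B": 1}      # write on node B after it saw w1
w3 = {"C": 1}              # independent write on node C
print(compare(w1, w2), compare(w2, w3))  # before concurrent
```

A causally consistent store must show w1 before w2 to every reader, but it is free to show w2 and w3 in either order. No global agreement is needed for that.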


Despite these challenges, consensus is an important building block for highly-available systems. Distribution makes building HA systems easier. It's a tool, not a solution.

Think of using consensus in your system like getting a puppy: it may bring you a lot of joy, but with that joy comes challenges, and ongoing responsibilities. There's a lot more to dog ownership than just getting a dog. There's a lot more to high availability than picking up a Raft library off GitHub.


  1. Including Raft, which has become famous for being a more understandable consensus algorithm. Virtual Synchrony is less famous, but no less a contribution.
  2. There are some nice patterns for building deterministic high-performance systems, but the general problem is still an open area of research. For a good primer on determinism and non-determinism in database systems, check out The Case for Determinism in Database Systems by Thomson and Abadi.
  3. Bruce Dawson has an excellent blog post on the various issues and challenges.
  4. Bailis et al.'s Highly Available Transactions: Virtues and Limitations paper contains a nice breakdown of the options here, and Aphyr's post on Strong Consistency Models is a very approachable breakdown of the topic. If you really want to go deep, check out Dziuma et al.'s Survey on Consistency Conditions.

October 04, 2020

Derek Jones (derek-jones)

Memory capacity growth: a major contributor to the success of computers October 04, 2020 09:32 PM

The growth in memory capacity is the unsung hero of the computer revolution. Intel’s multi-decade, annual, billion-dollar marketing spend has ensured that cpu clock frequency dominates our attention (a lot of people don’t know that memory is available at different frequencies, and this can have a larger impact on performance than cpu frequency).

In many ways memory capacity is more important than clock frequency: a program won’t run unless enough memory is available but people can wait for a slow cpu.

The growth in memory capacity of customer computers changed the structure of the software business.

When memory capacity was limited by a 16-bit address space (i.e., 64k), commercially saleable applications could be created by one or two very capable developers working flat out for a year. There was no point hiring a large team, because the resulting application would be too large to run on a typical customer computer. Very large applications were written, but these were bespoke systems consisting of many small programs that ran one after the other.

Once the memory capacity of a typical customer computer started to regularly increase it became practical, and eventually necessary, to create and sell applications offering ever more functionality. A successful application written by one developer became rarer and rarer.

Microsoft Windows is the poster child application that grew in complexity as computer memory capacity grew. Microsoft’s MS-DOS had lots of potential competitors because it was small (it was created in an era when 64k was a lot of memory). In the 1990s the increasing memory capacity enabled Microsoft to create a moat around their products, by offering an increasingly wide variety of functionality that required a large team of developers to build and then support.

GCC’s rise to dominance was possible for the same reason as Microsoft Windows. In the late 1980s gcc was just another one-man compiler project; others could not make significant contributions because the resulting compiler would not run on a typical developer computer. Once memory capacity took off, it was possible for gcc to grow from the contributions of many, something that other one-man compilers could not do (without hiring lots of developers).

How fast did the memory capacity of computers owned by potential customers grow?

One source of information is the adverts in Byte (the magazine); lots of pdfs are available, and perhaps one day a student with some time will extract the information.

Wikipedia has plenty of articles detailing cpu performance, e.g., Macintosh models by cpu type (a comparison of Macintosh models does include memory capacity). The impact of Intel’s marketing dollars on the perception of computer systems is a PhD thesis waiting to be written.

The SPEC benchmarks have been around since 1988, recording system memory capacity since 1994, and SPEC make their detailed data public :-) Hardware vendors are more likely to submit SPEC results for their high-end systems than for their run-of-the-mill systems. However, if we are looking at rate of growth, rather than absolute memory capacity, the results may be representative of typical customer systems.

The plot below shows memory capacity against date of reported benchmarking (which I assume is close to the date a system first became available). The lines are fitted using quantile regression, with 95% of systems being above the lower line (i.e., these systems all have more memory than those below this line), and 50% are above the upper line (code+data):

Memory reported in systems running the SPEC benchmark on a given date.

The fitted models show the memory capacity doubling every 845 or 825 days. The blue circles are memory that comes installed with various Macintosh systems, at time of launch (memory doubling time is 730 days).
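A doubling time is easier to compare with other figures once it is turned into a yearly growth rate. The sketch below assumes the fitted lines are straight on a log scale, that is, exponential growth. On that assumption, 845 and 825 days work out at roughly 35% to 36% growth a year, and the Macintosh figure of 730 days at about 41%. The helper names are hypothetical.

```python
def annual_growth_factor(doubling_days: float) -> float:
    """Growth over 365 days implied by a constant doubling time."""
    return 2 ** (365 / doubling_days)

def doubling_days_from_slope(slope_log2_per_day: float) -> float:
    """If log2(memory) = a + slope * day, memory doubles every 1 / slope days."""
    return 1 / slope_log2_per_day

for days in (845, 825, 730):
    print(days, round(annual_growth_factor(days), 2))
# 845 1.35
# 825 1.36
# 730 1.41
```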

How did applications’ minimum required memory grow over time? I have patchy data for a smattering of products, extracted from Wikipedia. Some vendors probably required customers to have a fairly beefy machine, while others went for a wider customer base. Data on the memory requirements of the various versions of products launched in the 1990s is very hard to find. Pointers very welcome.

Patrick Louis (venam)

Corruption Is Attractive! October 04, 2020 09:00 PM

Chaos, an important theme in hermetism

We live in a world that is gradually and incessantly attracted by over-rationality and order. In this article we’ll burst the enchanted bubble and embrace corruption and chaos: we’re going to discuss the topic of image glitch art.

w̸h̸a̷t̴’̶s̴ ̶a̴ ̷g̷l̸i̷t̴c̵h̵

Welcome to the land of creative destruction: image glitch art. Our story starts with a simple idea: glitching a wallpaper to create a slideshow of corrupted pictures.
The unfortunate victim of our crime is the world, shown here while it’s still in its pristine form:

World Map, nominal case

Before we begin, let’s attempt to define what we’re trying to do: What is glitch art?
As with any art movement, words can barely express the essence behind the meaning; they are but fleeting and nebulous. Regardless, I’ll be an infidel and valiantly express what I think glitch art is.

A glitch is a perturbation, a minor malfunction, a spurious signal. In computers, glitches are predominantly accidental events that are undesirable and could possibly corrupt data.
Glitch art started as people developed a liking for such unusual events and the effects glitches had on the media they were perturbing. Some started to collect these glitches that happened naturally in the wild, and others started to intentionally appropriate the effects by manually performing them.
In the art scene, some started using image processing to “fake” true glitching effects.

Glitches happen all the time and everywhere; information is never as durable and reliable as we might like it to be, and living in a physical world makes it even less so. You’ve probably encountered or heard of the effect of putting a magnet next to anything electronic that hasn’t been ruggedized to withstand such a scenario.
That’s why many techniques have been put in place to avoid glitches, at all layers, from the hardware storage, to the software reading and interpreting it. Be it error correcting codes (ECC) or error detection algorithms, they are all enemies of glitch art and the chaos we like.
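Here is what one of those enemies looks like in practice. PNG, for instance, stores a CRC-32 with every chunk, and a CRC-32 catches any single flipped bit. The sketch below uses Python's `zlib.crc32`. `flip_bit` is a hypothetical helper that simulates the glitch.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted: a simulated glitch."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

chunk = b"pixels" * 100
stored_crc = zlib.crc32(chunk)        # computed when the file was written
glitched = flip_bit(chunk, 1234)
print(zlib.crc32(glitched) == stored_crc)  # False: the reader notices the damage
```

A decoder that checks the CRC can refuse the chunk instead of rendering the glitch, which is one reason the techniques below favour raw, checksum-free data.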

However, this niche aesthetic is more than a fun pastime for computer aficionados; there is a bigger picture. Like painters with brushes on a canvas, we are offered a material, an object to work with: a material made of bits and formatted in a specific way.
Like any object, our medium has a form and meaning, it can move, it has a size, it can be transferred, and interpreted — information theory is the field interested in this.
Like any object, our medium can be subjected to, and react to, deformations, forces, and stressors. How it flows is what the field of rheology is interested in (not to be confused with computational rheology, the field of fluid simulation). The medium’s fluidity can be studied to answer questions such as: is it elastic, solid, viscous, or oily? How does it respond, within the bounds of information theory, to different types of applied forces?

Here are some words you may encounter and that you definitely want to know:

  • Misregistration: When a physical medium is misread because of damage caused by scratches, dirt, smudges, gamma rays, or any other treasures the universe throws at us.

  • Datamoshing, Photomosh, Imagemosh: Abusing the format of a medium, normally its compression artefacts, to create glitches. For example, video compression often uses I-frames for full images and P-frames for the movement/transition of pixels relative to that image. Removing I-frames is a common glitching method.

  • Databending: An idea taken from circuit bending, which rewires the circuit board of a toy to generate weird sounds. Databending is about bending the medium into another unrelated one, reinterpreting it as something it is not meant to be.

Let me add that glitch art is vast and fascinating; this article is but a glimpse into this space. If you’re as captivated as I am, please take a look at and Rosa Menkman’s Beyond Resolution. Images can be pleasantly destroyed in a great number of ways to create masterpieces.

I̷m̷a̷g̴e̴ ̸G̸l̴i̴t̴c̵h̸ ̴A̶r̵t̵

Before starting, here is some advice:

  • Back up your precious files before corrupting them.
  • Glitching techniques can be combined and/or applied multiple times.
  • Sometimes too little has no effect, and sometimes too much can destroy the file.
  • It’s all about trial and error, especially the errors that result in glitches.

̷H̷o̵w̶ ̵T̸o̴ ̶I̷n̶d̸u̷c̶e̵ ̶A̸ ̶G̸l̵i̷t̶c̸h̴

Now it’s time to think about how we can apply our mischievous little stimulus: its size, the level or layer at which it’ll be applied, and the methodological recipe we’ll concoct to poison our images.

Glitch artist Benjamin Berg classifies glitches into 3 categories:

  • Incorrect Editing: Editing a file using software that wasn’t made to edit that kind of file, like editing an image file as if it were a text file.
  • Reinterpretation aka Databending: Converting or reading a file as if it were another type of medium, like listening to an image file as if it were an audio file (aka sonification).
  • Forced errors, Datamoshing, and Misregistration: Using a software or hardware bug to force specific errors in the file. This can be about the corruption of specific bytes in the file to induce glitches, or something happening accidentally, like a device turning off while saving a file.

So let’s get to work!

M̷a̵s̵h̸i̶n̶g̷ ̴T̶h̷e̷ ̷D̶a̸t̵a̸ ̸R̷a̸n̶d̶o̸m̴l̵y̷

The easiest, but roughest, way to glitch a file is to put on our monkey suit and overwrite or add random bytes in our image. As you might have guessed, this isn’t very efficient, but half the time it does the trick and forces errors.

This technique is better suited for stronger materials like images in raw format — without metadata and headers. We’ll understand why in a bit.
To convert the file to raw format, open it in GIMP, select Export As, select the file by extension, and choose the raw type. For now, it doesn’t matter if you pick pixel-ordered or planar, but we’ll come back to this choice later because it’s an important one.

GIMP process to save image as raw

# Targa image data - Map (771-3) 771 x 259 x 1 - 1-bit alpha "\003\003\003\003\003\003\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001\001"

You should also note the width and height of the image, as the raw file no longer contains this information and we’ll need them to reopen it in GIMP. In our case it is 2000x1479.
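Our monkey can be hired as a short Python script. This is a sketch, not the tool used for the images here, and `mash_bytes`, `mash_file`, and the file names are hypothetical. It overwrites bytes rather than inserting them, so the file keeps its size and can be reopened with the same width and height.

```python
import random

def mash_bytes(data: bytes, count: int, skip: int = 0, seed=None) -> bytes:
    """Overwrite `count` randomly chosen bytes with random values.

    `skip` protects the first bytes (e.g. a header); it is 0 for a
    header-less raw file.
    """
    rng = random.Random(seed)
    buf = bytearray(data)
    for _ in range(count):
        buf[rng.randrange(skip, len(buf))] = rng.randrange(256)
    return bytes(buf)

def mash_file(src: str, dst: str, count: int, skip: int = 0, seed=None) -> None:
    """Read src, mash it, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(mash_bytes(data, count, skip, seed))
```

For example, `mash_file("world_map.data", "world_map_mashed.data", count=50)` would rough up a raw export. Passing a `seed` makes a glitch you like reproducible.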

We now proceed to hand over the file to our least favorite staff member and let them throw a tantrum at it. So what does it look like? Let’s take a look at what our monkey did:

World Map, monkey have been randomly mashing the

Not bad at all for something random, but we can do better.

C̸o̶m̸p̷r̶e̷s̸s̵i̸o̶n̴ ̶D̵e̵f̶o̶r̴m̷a̶t̷i̵o̸n̷

Some media are more malleable when squished properly, and in different ways. The image sheds a lot of information and only the essence stays. That’s a form of databending.
For example, increasing the compression of JPEG images can open the path for glitches to happen more frequently. This is a key asset, especially when trying to create errors related to the compression parameters within the format of the file.

convert -quality 2 world_map.jpg world_map_compressed.jpg

World Map, compressed to extract its

Keep this in your toolbox to use along with other techniques.

G̵e̴t̵t̶i̸n̵g̷ ̴I̸n̵t̷i̵m̷a̴t̸e̴ ̸W̶i̵t̸h̶ ̴T̵h̸e̸ ̴F̷o̴r̶m̸a̴t̷

We want to corrupt in the most efficient way possible, to create attractive chaos from the smallest change possible. To do that we have to get intimate with the medium, to understand its deepest secrets, tickle the image in the right places. This is what we previously referred to as imagemoshing.

There’s a panoply of image formats, and they all are special in their own ways. However, there’s still some commonality:

  • Header, Footer, and Metadata: Whether the format contains this extra information, whether it is extraneous or essential, what it represents, and how it affects the rest of the image.
  • Compression: The format can either be compressed or not. When it is compressed, there can be extra bits of information to help other software uncompress the image data.
  • How the data is laid out: Usually, the image color information is decomposed into its components, such as HSL, RGB, or others. These components then need to be represented in the image data, either in an interleaved or a planar manner. Planar refers to writing components independently in the data (ex: all R, then all G, then all B), while interleaved refers to alternating them pixel by pixel (ex: RGB, then RGB, then RGB..).
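The two layouts are easy to convert between. The sketch below assumes 8-bit samples and three channels (RGB without alpha). On that assumption, the 2000x1479 world map would be a raw file of 8,874,000 bytes. The function names are hypothetical.

```python
def interleaved_to_planar(data: bytes, channels: int = 3) -> bytes:
    """RGBRGBRGB... -> RRR...GGG...BBB..."""
    if len(data) % channels:
        raise ValueError("length is not a multiple of the channel count")
    return b"".join(data[c::channels] for c in range(channels))

def planar_to_interleaved(data: bytes, channels: int = 3) -> bytes:
    """RRR...GGG...BBB... -> RGBRGBRGB..."""
    if len(data) % channels:
        raise ValueError("length is not a multiple of the channel count")
    plane = len(data) // channels
    out = bytearray(len(data))
    for c in range(channels):
        out[c::channels] = data[c * plane:(c + 1) * plane]
    return bytes(out)

def raw_size(width: int, height: int, channels: int = 3, bytes_per_sample: int = 1) -> int:
    """Expected size of a header-less raw image."""
    return width * height * channels * bytes_per_sample

print(interleaved_to_planar(bytes([1, 2, 3, 4, 5, 6])).hex())  # 010402050306
print(raw_size(2000, 1479))  # 8874000
```

This is why the layout matters for glitching. In a planar file, a run of damaged bytes stays inside one color channel. In an interleaved file, the same run spreads across all three.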

Manipulating these to our advantage can lead to wonderful glitches. For example, in our previous raw image example (an image bare of header and footer, and without compression), the pixels were interleaved, which gave rise to the effect we’ve seen, namely shifts and changes in some colors. Having them in planar form would’ve led to different glitches in separate color channels.

R̵e̷i̷n̴t̶e̷r̶p̸r̴e̸t̸a̷t̶i̷o̵n̴ ̵A̸s̵ ̵R̸i̷c̵h̸ ̸T̷e̵x̵t̴ ̴A̴K̷A̸ ̷W̶o̴r̵d̴P̴a̸d̵ ̷E̵f̸f̴e̶c̶t̶

Let’s give this a try with the well-known WordPad effect, which is about databending an image into rich text: opening the image in WordPad and saving it.
Keep in mind that this only works with raw images, as it’s highly destructive and would otherwise break fragile key info in the header and footer. So let’s reuse our interleaved raw image from earlier, and also export a planar one.

These are our results for interleaved:

World Map, WordPad effect interleaved

And for planar:

World Map, WordPad effect planar

Technically, during the bending and reinterpretation as rich text, some bytes are inserted in some places and others are replaced. Namely, carriage returns (0x0D aka \r) and line feeds (0x0A aka \n) must come in pairs, so if one is missing, WordPad adds it. It also replaces other characters, such as 0x07 (the bell character, \a), with 0x20 (a space), but that replacement doesn’t affect the image much.
You can find code simulating this bending here.
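As a rough stand-in (not the linked code, and not WordPad's exact rules), the two rules described above can be written in a few lines. `wordpad_bend` is a hypothetical name.

```python
import re

def wordpad_bend(data: bytes) -> bytes:
    """Rough approximation of the WordPad effect: every lone CR or LF
    becomes a CR LF pair, and 0x07 becomes a space."""
    data = re.sub(rb"\r\n|\r|\n", b"\r\n", data)
    return data.replace(b"\x07", b" ")

raw = bytes([0x10, 0x0A, 0x20, 0x0D, 0x30, 0x0D, 0x0A, 0x07])
print(wordpad_bend(raw).hex(" "))  # 10 0d 0a 20 0d 0a 30 0d 0a 20
```

Each inserted byte shifts all of the data after it. In a raw image, that pushes every following pixel, and in an interleaved file every following channel, out of alignment.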

R̸e̵i̶n̴t̵e̷r̴p̶r̷e̵t̶a̷t̴i̶o̴n̴ ̶A̵s̷ ̷A̸u̵d̴i̷o̵ ̴A̴K̶A̴ ̶S̸o̸n̵i̷f̴i̸c̷a̸t̴i̵o̴n̴

Another popular form of databending is sonification, which is about converting non-audio information into auditory information, something extremely useful for visually impaired people.
In our case, we’ll use an image as audio content, edit it as if it were sound, and visualize it again as an image. As with most databending, this is almost impossible to do with any format other than a raw uncompressed one, so I hope you haven’t thrown away the two originals from the previous section.

We’ll opt for Audacity as our audio editor. Launch it, select: File > Import > Raw Data. Then pick either your planar or interleaved image data and you’ll be presented with this screen:

Audacity raw import options

Don’t freak out! What you pick doesn’t really matter, but as far as I’m concerned, better results come from encodings such as U-Law or A-Law, big-endian byte order, and a mono channel.
When you are done with the editing, go to File > Export > Export Audio. At the bottom right, pick “Other uncompressed files”, then select RAW (header-less) along with the encoding you chose when importing the file (NB: adding the extension .data makes it easier to open in GIMP later). Don’t fill in anything on the next screen asking for metadata.
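Audacity does all of this interactively. To see what an effect does to the bytes, here is a simplified echo in Python. It treats the bytes as unsigned 8-bit PCM (128 is silence) rather than U-Law or A-Law, and applies a single echo with no feedback. `echo_u8` is a hypothetical name.

```python
def echo_u8(samples: bytes, delay: int, decay: float = 0.5) -> bytes:
    """Single-tap echo on unsigned 8-bit PCM (128 is silence).

    Each output sample adds a scaled copy of the input `delay` samples
    earlier, clamped to 0..255.
    """
    out = bytearray(len(samples))
    for i, x in enumerate(samples):
        value = x - 128
        if i >= delay:
            value += decay * (samples[i - delay] - 128)
        out[i] = max(0, min(255, round(value) + 128))
    return bytes(out)

print(list(echo_u8(bytes([228, 128, 128, 128]), delay=2)))  # [228, 128, 178, 128]
```

The delay sets the geometry of the glitch. In an interleaved 8-bit RGB file that is `width` pixels wide, `delay = width * 3` bytes echoes each row onto the row below it.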

You can now have fun applying different types of audio filters on sections of your image. Here are some examples of my favorite songs.

Interleaved image, reverb filter applied:

World Map, Sonification reverb effect interleaved

And the same reverb filter applied to the planar version:

World Map, Sonification reverb effect planar

Interleaved image, echo filter applied:

World Map, Sonification echo effect interleaved

And the same echo filter applied to the planar version:

World Map, Sonification echo effect planar

Reverse some sections with others in a planar image:

World Map, Sonification reverse effect planar

Tremolo effect on an interleaved image:

World Map, Sonification tremolo effect interleaved

Wahwah effect on a planar image:

World Map, Sonification wahwah effect planar

Overall, most audio effects work pretty well in both interleaved and planar formats. One that also works on compressed media is cut-paste-and-reverse, as it is less destructive than other techniques; we’ll give it a shot in the next section.

C̵a̵s̸e̸ ̷S̸t̴u̵d̷y̸:̷ ̵J̵P̸E̶G̷

Let’s get acquainted with one of the most popular compressed image formats: JPEG.
We’ll peel its layers, spread them apart, get more intimate, and understand its deepest feelings.

Like most binary formats, JPEG is composed of TLV (Type-Length-Value) segments, which, as the name implies, have a well-defined tag for the type, followed by the length of the value associated with that tag (the length also counts its own bytes, 2 in the case of JPEG).
All tags in the JPEG standard (you’ll find multiple links in the further reading section) are 2 bytes long and always start with 0xFF. A JPEG file starts with the appropriately named “Start of Image” or SoI tag (0xFFD8) and ends with the also appropriately named “End of Image” or EoI tag (0xFFD9). These two tags stand alone, with no length or value.

Anything that starts with 0xFF is considered a tag if it isn’t within the value part of another tag.
There’s one exception to this: the actual content of the image, the Entropy Coded Segment (ECS), doesn’t have a fixed size and is read until another tag is found. That is why, if 0xFF needs to appear in it, it is stuffed with a 0x00 afterward to signal that it’s really meant to be 0xFF.

Concretely, a JPEG is formed of a header, which dictates how to decompress the image data. This data is then found in a series of loops/scans that each encode a different type of information about the image, be it lighting, tints of red, or others. This loop goes as follows:

JPEG scan loop

The name Entropy Coded Segment comes from the fact that it is entropy-coded, usually compressed using a Huffman table. As JPEG can contain different tables to compress different color components, the table itself has to be pointed to by the information read before the ECS comes along, namely the Start of Scan or SoS section (“Scan” in the diagram above).
So each iteration of the loop tells us how to decode the information in the ECS.

That’s the gist of it, we don’t really care about understanding precisely the full scope of the JPEG specifications, but just enough to be dangerous with it.
Let’s write a script to split up the components of the JPEG so that we can manipulate them independently, then recompose it and admire the result. You can find such a script here.
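The linked script isn't reproduced here. A sketch of the same idea in Python walks the tags and skips each tagged segment by its length. After a Start of Scan, it reads entropy-coded bytes until it finds a 0xFF that is followed by neither a stuffed 0x00 nor a restart marker. Restart markers run from 0xFFD0 to 0xFFD7 and may appear inside the ECS. The names `split_jpeg` and `write_parts` are hypothetical, and the code assumes a well-formed file.

```python
import struct

EOI, SOS = 0xD9, 0xDA
RST = range(0xD0, 0xD8)  # restart markers may appear inside entropy-coded data

def split_jpeg(data: bytes):
    """Split a JPEG into (kind, bytes) parts: header, scan/data pairs, end.

    Joining the parts gives back the original bytes. Table segments found
    between two scans (common in progressive JPEGs) go with the next scan.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing Start of Image")
    parts, start, pos = [], 0, 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a tag at offset {pos}")
        if data[pos + 1] == 0xFF:            # fill byte before a tag
            pos += 1
            continue
        tag = data[pos + 1]
        if tag == EOI:
            parts.append(("end", data[start:]))
            return parts
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        seg_end = pos + 2 + length           # the length counts its own 2 bytes
        if tag != SOS:
            pos = seg_end
            continue
        if not parts:                        # everything before the first scan
            parts.append(("header", data[:pos]))
            start = pos
        parts.append(("scan", data[start:seg_end]))
        # The ECS runs until a 0xFF that is not stuffing (0x00) or a restart marker.
        i = seg_end
        while i + 1 < len(data) and not (
            data[i] == 0xFF and data[i + 1] != 0x00 and data[i + 1] not in RST
        ):
            i += 1
        parts.append(("data", data[seg_end:i]))
        start = pos = i
    raise ValueError("missing End of Image")

def write_parts(parts, prefix=""):
    """Write parts as 01_header.jpg, 02_scan.jpg, ... in file order."""
    for n, (kind, chunk) in enumerate(parts, start=1):
        with open(f"{prefix}{n:02d}_{kind}.jpg", "wb") as f:
            f.write(chunk)
```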

These are the parts I get after running the script on the world — we divide and conquer.

tree -L 1
├── 01_header.jpg
├── 02_scan.jpg
├── 03_data.jpg
├── 04_scan.jpg
├── 05_data.jpg
├── 06_scan.jpg
├── 07_data.jpg
├── 08_scan.jpg
├── 09_data.jpg
└── 10_end.jpg

0 directories, 10 files

We could definitely hire another monkey to mash the data part of the image and make a mess of its internal structure. However, JPEG is a bit more sensitive to changes. Still, you can get pretty good results; this is what happens when we have a go at 09_data.jpg and 03_data.jpg.
We can get back our image by doing:

cat [0-9]*.jpg > reconstituted.jpg

World Map, mash jpeg

Even though the changes were minimal, the effect is radical.

Let’s see if we can employ cut-paste-reverse on the ECS data, just like we did with the raw image for sonification. Let’s get a clean version and open it in Audacity.

World Map, cut-paste-reverse jpeg

Cutting and pasting definitely works, but reversing most of the time doesn’t, and the data is extremely sensitive.

Each section affects a different feature of the image. The glitches above are caused by editing 03_data.jpg and 09_data.jpg.
If we want to know what each scan adds to the image, we can cut the image short by removing other scans or inserting the End of Image early on. This is what each of the JPEG layers does:

World Map, layer 1 jpeg World Map, layer 2 jpeg World Map, layer 3 jpeg World Map, layer 4 jpeg

It’s interesting to notice that the first layer is the most colorful but the smallest, while the last layer is the biggest but has more minutiae and fewer colors.
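The "End of Image early" trick is easy to script once the file is in parts like the numbered files listed above. In the sketch below, `first_scans` is a hypothetical helper. It assumes a list of `(kind, bytes)` pairs in file order, for example the numbered files read back in order and tagged with the word in their name.

```python
def first_scans(parts, k):
    """Rebuild a JPEG from the header and its first k scans, then End of Image.

    `parts` is a list of (kind, bytes) in file order, with kinds "header",
    "scan", "data" and "end".
    """
    out, scans = [], 0
    for kind, chunk in parts:
        if kind == "scan":
            scans += 1
            if scans > k:
                break
        if kind == "end":
            break
        out.append(chunk)
    return b"".join(out) + b"\xff\xd9"
```

Writing `first_scans(parts, k)` to a file for k = 1, 2, 3, 4 gives one image per layer.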

Now we’re ready to dive deeper and to mess with the header parts of the image, fire up your hex editor and get ready. I won’t bother you with the details, but it all comes down to editing specific bits in the headers. Let’s explain by example.

The first entries in the quantization and Huffman tables carry more weight than the last ones. That comes from the way they are laid out: the quantization table is stored in zig-zag order, from low to high frequencies, while the Huffman table describes a tree whose shortest codes, for the most frequent symbols, come first. Both are ordered by importance. Also note that each Huffman and quantization table is used for different parts of the image, the ones we’ve shown above.
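To see why position matters in the quantization table: its 64 entries are stored in zig-zag order. The first bytes therefore belong to the top-left, lowest-frequency DCT coefficients, which carry most of what we see. The helper below (a hypothetical name) generates that order for an n×n block. It walks the anti-diagonals, alternating direction.

```python
def zigzag_order(n: int = 8):
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    def key(pos):
        r, c = pos
        d = r + c                                # which anti-diagonal
        return (d, c if d % 2 == 0 else r)       # even diagonals run up-right, odd down-left
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

order = zigzag_order()
print([r * 8 + c for r, c in order[:10]])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

So byte 0 of an 8×8 table quantizes the DC coefficient (the block's average), and byte 63 quantizes the finest detail in the bottom-right corner.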

In the header there are 3 quantization tables (0xFFDB); let’s edit one of them and see what it does. In our image they all seem to be for luminance. Editing a single byte in the third quantization table results in this, a true glitch of the kind a single bit flip would cause:

World Map, jpeg header dct single bit flip corruption

What about manipulating one of the 3 Huffman tables (0xFFC4)? It’s a bit more tricky but feasible: we just have to edit the symbols in it, which start 17 bytes after the length field. Let’s swap two of these symbols at the beginning.

World Map, jpeg header huffman swap bytes corruption

Pretty impressive. What about symbols in the middle of the table?

World Map, jpeg header huffman swap bytes middle corruption

The effect is less pernicious, barely noticeable, because the less significant symbols are at the end of the table and are represented by longer codes.
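Here's a sketch of how the symbols in a Huffman table map to codes, following the JPEG table layout. The layout starts with one byte holding the table class and id, then 16 bytes counting how many codes there are of each length from 1 to 16 bits. That is why the symbols start 17 bytes in. The symbols come last, shortest codes first. Swapping two symbols swaps their codes, and the symbols near the end get the longest codes. `parse_dht` is a hypothetical name, and it only reads the first table of a segment.

```python
def parse_dht(payload: bytes):
    """Parse one table from a DHT payload (the bytes after the 2-byte length).

    Returns (table_class, table_id, [(symbol, code_bits), ...]).
    """
    tc_th = payload[0]
    counts = payload[1:17]                       # number of codes of each length 1..16
    symbols = payload[17:17 + sum(counts)]
    table, code, k = [], 0, 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            table.append((symbols[k], format(code, f"0{length}b")))
            code += 1
            k += 1
        code <<= 1                               # canonical Huffman: next length
    return tc_th >> 4, tc_th & 0x0F, table

payload = bytes([0x10]) + bytes([0, 2, 1] + [0] * 13) + bytes([1, 2, 3])
print(parse_dht(payload))  # (1, 0, [(1, '00'), (2, '01'), (3, '100')])
```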

Instead of manipulating the huffman table in the header we can manipulate which one is pointed to by each part of the image scan, in the Start of Scan aka SoS section 0xFFDA.
This section encodes a lot of interesting things: not only which Huffman table should be used, but also what it should be used for, the coefficients of the values in the DCT, the first and last coefficient to use, and much more.
Because the first part is more colorful, let’s play with its SoS, open 02_scan.jpg in your favorite hex editor.

Hex editor

We can see there are 3 components, which means the ECS encodes 3 different color components and builds each one using two tables: one for the DC coefficient and one for the AC coefficients of the DCT.
Anyway, let’s change a table number, and, obviously, that glitches the file.

World Map, jpeg header SoS huffman table corruption

We can also change the way these coefficients are applied by modifying the last byte of the SoS, the successive approximation; the result is dazzling.

World Map, jpeg header SoS successive factor corruption

We can also edit the spectral selection, which is the first and last DCT coefficient used in the zig-zag order. The effect depends on the segment it is applied to, the following was done on the second and third data segments.

World Map, jpeg header SoS spectral selection corruption
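The SoS fields edited above can be located with a small parser. The layout follows the scan header in the JPEG standard: the component count, then one component id and one DC/AC table byte per component, then the spectral selection start and end, then the successive-approximation byte. The example bytes are a typical baseline three-component scan header, not the world map's, and the helper names are hypothetical.

```python
def parse_sos(segment: bytes) -> dict:
    """Parse a Start of Scan segment, starting at its 0xFF 0xDA tag."""
    if segment[:2] != b"\xff\xda":
        raise ValueError("not a Start of Scan segment")
    count = segment[4]
    components = []
    for i in range(count):
        component_id, tables = segment[5 + 2 * i], segment[6 + 2 * i]
        components.append((component_id, tables >> 4, tables & 0x0F))  # (id, DC table, AC table)
    p = 5 + 2 * count
    return {
        "length": int.from_bytes(segment[2:4], "big"),
        "components": components,
        "spectral_selection": (segment[p], segment[p + 1]),            # first, last coefficient
        "successive_approx": (segment[p + 2] >> 4, segment[p + 2] & 0x0F),  # (Ah, Al)
    }

def with_tables(segment: bytes, index: int, dc: int, ac: int) -> bytes:
    """Return a copy with component `index` pointed at different Huffman tables."""
    buf = bytearray(segment)
    buf[6 + 2 * index] = (dc << 4) | ac
    return bytes(buf)

sos = bytes.fromhex("ffda000c03010002110311003f00")
print(parse_sos(sos)["components"])  # [(1, 0, 0), (2, 1, 1), (3, 1, 1)]
```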

This is it for our destructive love of the JPEG format, now let’s move to other things.

I̴m̷a̷g̴e̵ ̶P̸r̷o̷c̸e̷s̵s̸i̴n̵g̵

If you dare to mention image processing in the glitch art community, you’ll bring hell upon yourself. This will be called art that looks glitchy, art glitch and not glitch art.
Technically, there isn’t much difference between databending the pixel components in a raw image and manipulating the pixels in memory via an image processing algorithm. However, I can agree that the soul and essence of the art will be missing. Meanwhile, the imagemoshing we’ve done with JPEG was closer to what glitch art is about.
You’re going to ask “who cares, and who’ll know the difference”, well I will, and I’ll be watching you along with my evil datamashing monkeys.

Image processing that looks like glitching works by using an image manipulation library, such as Python’s Pillow, and manipulating the color components of the image to make them look glitchy. If that sounds similar to raw images, it’s because it is.

Let’s mention some techniques I find fun. Let’s start with Pixel Sorting.
Pixel sorting is about selecting pixels that pass a certain criterion, gathering them in an array, and ordering them according to that same criterion. You can apply it horizontally, vertically, or in blocks, and according to lightness, darkness, hue, or anything that suits you.
Here, you’ll find a script that does just that, and the result looks like this:

World Map, image processing sorting, not corruption
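Here is a minimal horizontal pixel sort on one row of `(R, G, B)` tuples, however the rows are loaded. It finds each contiguous run of pixels brighter than a threshold and sorts that run by brightness. The Rec. 601 luma weights, the threshold of 100, and the function names are all assumptions of this sketch, not the linked script.

```python
def brightness(pixel):
    """Approximate perceived brightness using Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def sort_row(row, threshold=100, key=brightness):
    """Sort each contiguous run of pixels whose key exceeds threshold."""
    out = list(row)
    i = 0
    while i < len(out):
        if key(out[i]) > threshold:
            j = i
            while j < len(out) and key(out[j]) > threshold:
                j += 1
            out[i:j] = sorted(out[i:j], key=key)   # only this run is reordered
            i = j
        else:
            i += 1
    return out

row = [(10, 10, 10), (250, 250, 250), (120, 120, 120), (200, 200, 200), (0, 0, 0), (255, 0, 0)]
print(sort_row(row))
# [(10, 10, 10), (120, 120, 120), (200, 200, 200), (250, 250, 250), (0, 0, 0), (255, 0, 0)]
```

Sorting columns instead of rows, or passing a hue function as `key`, gives the other variants mentioned above.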

Another technique is called Channel Shifting. It is about storing each color component in a different array and shifting them on the X or Y axis. Doesn’t that remind you of planar raw images? It should: it’s the same technique!
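A channel shift on rows of `(R, G, B)` tuples takes only a few lines. This sketch (`shift_channel` is a hypothetical name) moves one channel horizontally by `dx` pixels and wraps around at the row edges. The other channels stay put.

```python
def shift_channel(rows, channel, dx):
    """Shift one color channel right by dx pixels (wrapping), leaving the others."""
    out = []
    for row in rows:
        width = len(row)
        new_row = []
        for x, pixel in enumerate(row):
            p = list(pixel)
            p[channel] = row[(x - dx) % width][channel]   # take the value from dx pixels left
            new_row.append(tuple(p))
        out.append(new_row)
    return out

rows = [[(1, 9, 9), (2, 9, 9), (3, 9, 9)]]
print(shift_channel(rows, channel=0, dx=1))  # [[(3, 9, 9), (1, 9, 9), (2, 9, 9)]]
```

In a planar raw file, the same effect comes from rotating the bytes of one plane.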

There are a lot of image processing effects that can be done, as this website shows; however, the result isn’t always as appealing as a real glitch.
Yes, we’re now true glitch amateurs!

We can even go as far as mixing these by doing an overlay/composite of them using ImageMagick.

A̸n̵i̷m̶a̷t̴e̵d̶ ̵W̸a̷l̴l̷p̵a̴p̵e̶r̸

Back to our initial goal: Create a wallpaper carousel of madness.

I’ve tested different approaches, from gifview to mplayer, and the best way I’ve found is the simplest way: A basic script that loops through the images, sleeps, and sets them as wallpaper.
The script can be found here.


Chaos, an important theme in hermetism

Congratulations, you know enough corruption to start a career in politics! You’ll thank me later when you’re rich.
Now go have fun with what you’ve learned, and enjoy your day.

Further Reading


  • S. Michelspacher, Cabala, Augsburg, 1616
  • Homer B. Sprague, Milton's Cosmography, Boston, 1889

Sevan Janiyan (sevan)

Trying to operate macOS in single user mode October 04, 2020 07:52 PM

On Wednesday at lunchtime, I opened up my laptop, and in the middle of writing an email my machine froze and, after a few seconds, rebooted. Uh oh: the system sat at the grey screen for a few seconds, and then the dreaded folder with a question mark began flashing, which means there was no bootable …

Gustaf Erikson (gerikson)

Two more novels by Paul McAuley October 04, 2020 05:20 PM


  • War of the Maps
  • Austral

McAuley has a wide range. These books were read in reverse publication order.

War of the Maps is a far-future SF story. After our sun has become a white dwarf, post-modern humans construct a Dyson sphere around it and seed it with humans and Earth life. According to the internal legends, they play around a bit then buzz off, leaving the rest of the environment to bumble along as best they can.

The tech level is more or less Victorian, but people contend with unique challenges, such as a severe lack of metallic iron and malevolent AIs buried here and there.

Austral is a near-future crime story. A genetically modified young woman gets dragged into a kidnapping plot in a post-AGW Antarctica.

Both are well worth reading!

Andreas Zwinkau (qznc)

Notes with TiddlyWiki October 04, 2020 12:00 AM

Describing my note taking system inspired by Zettelkasten


October 01, 2020

Eric Faehnrich (faehnrich)

Plus-Minus Operator in C October 01, 2020 07:16 PM

The C language has some features that try to achieve orthogonality. For instance, there's both an increment operator ++ and decrement operator --.

I don't think many C programmers realize that this is fully orthogonal, with the plus-minus operator +- and minus-plus operator -+, which are combinations of the increment and decrement operators, and result in an unchanged value.


#include <stdio.h>

int main() {
   int i = 3;
   printf("i: %d\n", i);  // 3
   ++i;
   printf("++i: %d\n", i);// 4
   --i;
   printf("--i: %d\n", i);// 3
   +-i;
   printf("+-i: %d\n", i);// 3
   -+i;
   printf("-+i: %d\n", i);// 3
}

Of course I'm kidding, but not exactly how you might think.

I'm not kidding about this code working as I claim: it does compile, and +- and -+ really do leave i unchanged.

I'm kidding about +- and -+ being whole operators on their own. This is similar to the "goes to" operator -->. They're not operators on their own, but a combination of multiple operators, in this case the negation operator - and the unary plus operator +.

The unary plus operator may be the part that makes this trick work, because I don't think it's that well known. Here's how it works: the negation and unary plus operators don't modify their operand, so in the above code each operator is applied and the resulting value is simply dropped. You'll see an unused-value warning if you turn warnings on.
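One detail is worth spelling out: as expressions, +-i and -+i both evaluate to -i. Only the variable stays untouched, because neither unary operator stores anything back into i. A small sketch with made-up function names shows both halves of that:

```c
/* `+-x` tokenizes as `+ (-x)`: there is no `+-` token in C, so this is
 * unary plus applied to the negation of x. */
int plus_minus(int x) { return +-x; }

/* `-+x` tokenizes as `- (+x)`: negation applied to unary plus. */
int minus_plus(int x) { return -+x; }

/* Evaluating `+-i` as a statement computes -i and throws it away,
 * so i itself is unchanged. The (void) cast only silences the
 * unused-value warning. */
int discard_plus_minus(int i) {
    (void)+-i;
    return i;
}
```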

If you thought the reason I'm kidding is because they wouldn't have added operators that do nothing just for the sake of symmetry, well that's where you're wrong. Because the unary plus operator is just that. It's like negation, but not negating, so it gives you the same value. (Ok, it doesn't do "nothing", but really it is just the same value.)

And it really was added just so we have something matching the negation operator. From K&R second edition:

The unary + is new with the ANSI standard. It was added for symmetry with the unary -.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Let October 01, 2020 04:00 PM


Welcome back to the “Compiling a Lisp” series. Last time we added a reader (also known as a parser) to our compiler. This time we’re going to compile a new form: let expressions.

Let expressions are a way to bind variables to values in a particular scope. For example:

(let ((a 1) (b 2))
  (+ a b))

Binds a to 1 and b to 2, but only for the body of the let — the rest of the S-expression — and then executes the body.

This is similar to this very rough translated code in C:

int result;
{
  int a = 1;
  int b = 2;
  result = a + b;
}

Unlike the C version, though, let expressions do not make previous binding names available to the expressions being bound. For example, the following program should fail because it cannot find the name a:

(let ((a 1) (b a))
  a)

There is a form that makes bindings available serially, but that is called let* and we are not implementing that today.

For completeness’ sake, there is also letrec, which makes names available to all bindings, including within the same binding. This is useful for binding recursive or mutually recursive functions. Again, we are not implementing that today.

Name binding implementation strategy

You’ll notice two new things about let expressions:

  1. They introduce ways to bind names to values, something we have to figure out how to keep track of
  2. In order to use those names we have to figure out how to look up what the name means

In more technical terms, we have to add environments to our compiler. We can then use those environments to map names to stack locations.

“Environment” is just a fancy word for “look-up table”. In order to implement this table, we’re going to make an association list.

An association list is a list of (key value) pairs. Adding a pair means tacking it on at the end (or beginning) of the list. Searching through the table involves a linear scan, checking if keys match.

You may be wondering why we’re using this data structure to implement environments. Didn’t I even take a data structures course in college? Shouldn’t I know that linear equals slow and that I should obviously use a hash table?

Well, hash tables have costs too. They are hard to implement right; they have high overhead despite being technically constant time; they incur higher space cost per entry.

For a compiler as small as this, a tuned hash table implementation could easily be as many lines of code as the rest of the compiler. Since we’re also compiling small programs, we’ll worry about time complexity later. It is only an implementation detail.

In order to do this, we’ll first draw up an association list. We’ll use a linked list, just like cons cells:

// Env

typedef struct Env {
  const char *name;
  word value;
  struct Env *prev;
} Env;

I’ve done the usual thing and overloaded Env to mean both “a node in the environment” and “a whole environment”. While one little Env struct holds only one name and one value, it also points to the rest of them, eventually ending with NULL.

This Env will map names (symbols) to stack offsets. This is because we’re going to continue our strategy of not doing register allocation.

To manipulate this data structure, we will also have two functions[1]:

Env Env_bind(const char *name, word value, Env *prev);
bool Env_find(Env *env, const char *key, word *result);

Env_bind creates a new node from the given name and value, borrowing a reference to the name, and prepends it to prev. Instead of returning an Env*, it returns a whole struct. We’ll learn more about why later, but the “TL;DR” is that I think it requires less manual cleanup.

Env_find takes an Env* and searches through the linked list for a name matching the given key. If it finds a match, it returns true and stores the value in *result. Otherwise, it returns false.

We can stop at the first match because Lisp allows name shadowing. Shadowing occurs when a binding at an inner scope has the same name as a binding at an outer scope. The inner binding takes precedence:

(let ((a 1))
  (let ((a 2))
    a))
; => 2

Let’s learn about how these functions are implemented.

Name binding implementation

Env_bind is a little silly looking, but it’s equivalent to prepending a node onto a chain of linked-list nodes. It returns a struct Env containing the parameters passed to the function. I opted not to return a heap pointer (allocated with malloc, etc) so that this can be easily stored in a stack-allocated variable.

Env Env_bind(const char *name, word value, Env *prev) {
  return (Env){.name = name, .value = value, .prev = prev};
}

Note that we’re prepending, not appending, so that names we add deeper in a let chain shadow names from outside.

Env_find does a recursive linear search through the linked list nodes. It may look familiar to you if you’ve already written such a function in your life.

bool Env_find(Env *env, const char *key, word *result) {
  if (env == NULL)
    return false;
  if (strcmp(env->name, key) == 0) {
    *result = env->value;
    return true;
  }
  return Env_find(env->prev, key, result);
}

We search for the node with the string key and return the stack offset associated with it.
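To see the prepend-and-shadow behaviour in isolation, here is a self-contained version of the two functions. It assumes word is a signed 64-bit integer (int64_t stands in for the compiler's own typedef), and the env_demo_shadowing helper is hypothetical: it stores the bound values directly rather than stack offsets and mirrors the nested let example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int64_t word; /* stand-in for the compiler's machine-word type */

typedef struct Env {
  const char *name;
  word value;
  struct Env *prev;
} Env;

Env Env_bind(const char *name, word value, Env *prev) {
  return (Env){.name = name, .value = value, .prev = prev};
}

bool Env_find(Env *env, const char *key, word *result) {
  if (env == NULL)
    return false;
  if (strcmp(env->name, key) == 0) {
    *result = env->value;
    return true;
  }
  return Env_find(env->prev, key, result);
}

/* Mirrors (let ((a 1)) (let ((a 2)) a)): both nodes are plain stack
 * variables, and the inner one is prepended, so it is found first. */
word env_demo_shadowing(void) {
  Env outer = Env_bind("a", 1, NULL);
  Env inner = Env_bind("a", 2, &outer);
  word value = 0;
  Env_find(&inner, "a", &value);
  return value; /* 2: the inner binding shadows the outer one */
}
```

Because every node lives on the C stack, nothing needs to be freed: when the function returns, both bindings disappear with it.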

Alright, now we’ve got names and data structures. Let’s implement some name resolution and name binding.

Compiling name resolution

Up until now, Compile_expr could only compile integers, characters, booleans, nil, and some primitive call expressions (via Compile_call). Now we’re going to add a new case: symbols.

When a symbol is compiled, the compiler will look up its stack offset in the current environment and emit a load. The emitter for this, Emit_load_reg_indirect, is very similar to Emit_add_reg_indirect, which we implemented for primitive binary functions.

int Compile_expr(Buffer *buf, ASTNode *node, word stack_index,
                 Env *varenv) {
  // ...
  if (AST_is_symbol(node)) {
    const char *symbol = AST_symbol_cstr(node);
    word value;
    if (Env_find(varenv, symbol, &value)) {
      Emit_load_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, value));
      return 0;
    }
    return -1;
  }
  assert(0 && "unexpected node type");
}

If the variable is not in the environment, this is a compiler error and we return -1 to signal that. This is not a tremendously helpful signal. Maybe soon we will add more helpful error messages.

Ah, yes, varenv. You will, like I had to, go and add an Env* parameter to all relevant Compile_XYZ functions and then plumb it through the recursive calls. Have fun!

Compiling let, finally

Now that we can resolve the names, let’s go ahead and compile the expressions that bind them.

We’ll have to add a case in Compile_expr. We could add it in the body of Compile_expr itself, but there is some helpful setup in Compile_call already. It’s a bit of a misnomer, since it’s not a call, but oh well.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index, Env *varenv) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "let")) {
      return Compile_let(buf, /*bindings=*/operand1(args),
                         /*body=*/operand2(args), stack_index,
                         /*binding_env=*/varenv,
                         /*body_env=*/varenv);
    }
  }
  assert(0 && "unexpected call type");
}

We have two cases to handle: no bindings and some bindings. We’ll tackle these recursively, with no bindings being the base case. For that reason, I added a helper function Compile_let[2].

As with all of the other compiler functions, we pass it a machine code buffer, a stack index, and an environment. Unlike the other functions, though, we pass it two expressions and two environments.

I split up the bindings and the body so we can more easily recurse on the bindings as we go through them. When we get to the end (the base case), the bindings will be nil and we can just compile the body.

We have two environments for the reason I mentioned above: when we’re evaluating the expressions that we’re binding the names to, we can’t add bindings iteratively. We have to evaluate them in the parent environment. It’ll become clearer in a moment how that works.

We’ll tackle the simple case first — no bindings:

int Compile_let(Buffer *buf, ASTNode *bindings, ASTNode *body,
                word stack_index, Env *binding_env, Env *body_env) {
  if (AST_is_nil(bindings)) {
    // Base case: no bindings. Compile the body
    _(Compile_expr(buf, body, stack_index, body_env));
    return 0;
  }
  // ...

In that case, we compile the body using the body_env as the environment. This is the environment that we will have added all of the bindings to.

In the case where we do have bindings, we can take the first one off and pull it apart:

  // ...
  // Get the next binding
  ASTNode *binding = AST_pair_car(bindings);
  ASTNode *name = AST_pair_car(binding);
  ASTNode *binding_expr = AST_pair_car(AST_pair_cdr(binding));
  // ...

Once we have the binding_expr, we should compile it. The result will end up in rax, per our internal compiler convention. We’ll then store it in the next available stack location:

  // ...
  // Compile the binding expression
  _(Compile_expr(buf, binding_expr, stack_index, binding_env));
  Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                          /*src=*/kRax);
  // ...

We’re compiling this binding expression in binding_env, the parent environment, because we don’t want the previous bindings to be visible.

Once we’ve generated code to store it on the stack, we should register that stack location with the binding name in the environment:

  // ...
  // Bind the name
  Env entry = Env_bind(AST_symbol_cstr(name), stack_index, body_env);
  // ...

Note that we’re binding it in the body_env because we want this to be available to the body, but not the other bindings.

Also note that since this new binding is created in a way that does not modify body_env (entry only points to body_env), it will automatically be cleaned up at the end of this invocation of Compile_let. This is a little subtle in C but it’s clearer in more functional languages.

At this point we’ve done all the work required for one binding. All that’s left to do is emit a recursive call to handle the rest – the cdr of bindings. We’ll decrement the stack_index since we just used the current stack_index.

  // ...
  _(Compile_let(buf, AST_pair_cdr(bindings), body, stack_index - kWordSize,
                /*binding_env=*/binding_env, /*body_env=*/&entry));
  return 0;
}

That’s it. That’s let, compiled, in five steps:

  1. If in the base case, compile the body
  2. Pick apart the binding
  3. Compile the first binding expression
  4. Store it in the environment
  5. Recurse

Well done!
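The five steps can be exercised without an assembler. The following is a hypothetical model, not the series' code: instead of emitting machine code it only resolves names and assigns stack slots, printing the same kind of debug line as the REPL session below. Binding expressions are restricted to an integer literal (ref == NULL) or a single variable name, the body is a single variable name, and kWordSize is assumed to be 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int64_t word;
enum { kWordSize = 8 };

typedef struct Env {
  const char *name;
  word value; /* stack offset relative to rbp */
  const struct Env *prev;
} Env;

static bool env_find(const Env *env, const char *key, word *result) {
  for (; env != NULL; env = env->prev) {
    if (strcmp(env->name, key) == 0) {
      *result = env->value;
      return true;
    }
  }
  return false;
}

/* A binding whose expression is an integer literal (ref == NULL)
 * or a reference to one variable name. */
typedef struct {
  const char *name;
  const char *ref;
} Binding;

/* Models Compile_let: resolve each binding expression in binding_env,
 * give it the next stack slot, bind it in body_env, then recurse.
 * Returns 0 on success and -1 if a name cannot be resolved. */
static int model_let(const Binding *bindings, size_t count, word stack_index,
                     const Env *binding_env, const Env *body_env,
                     const char *body_ref) {
  if (count == 0) {
    /* Base case: "compile" the body, here a single variable reference. */
    word offset;
    return env_find(body_env, body_ref, &offset) ? 0 : -1;
  }
  word offset;
  if (bindings[0].ref != NULL &&
      !env_find(binding_env, bindings[0].ref, &offset))
    return -1; /* earlier bindings are invisible to binding expressions */
  printf("binding '%s' at [rbp%+lld]\n", bindings[0].name,
         (long long)stack_index);
  Env entry = {bindings[0].name, stack_index, body_env};
  return model_let(bindings + 1, count - 1, stack_index - kWordSize,
                   binding_env, &entry, body_ref);
}
```

Calling it on the bindings of (let ((a 1) (b 2)) b) with stack_index -8 prints the rbp-8 and rbp-16 lines, while (let ((a 1) (b a)) a) returns -1 because b's expression is resolved in binding_env, where a was never added.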

Internal state and debugging

It’s hard to write the above code without really proving to yourself that it does something reasonable. For that, we can add some debug print statements to our compiler that print out at what stack offsets it is storing variables.

sequoia% ./bin/compiling-let --repl-eval
lisp> (let () (+ 1 2))
lisp> (let ((a 1)) (+ a 2))
binding 'a' at [rbp-8]
lisp> (let ((a 1) (b 2)) (+ a b))
binding 'a' at [rbp-8]
binding 'b' at [rbp-16]
lisp> (let ((a 1) (b 2)) (let ((c 3)) (+ a (+ b c))))
binding 'a' at [rbp-8]
binding 'b' at [rbp-16]
binding 'c' at [rbp-24]

This shows us that everything looks like it is working as intended! Variables all get sequential locations on the stack.

Compiling let* and modifications

A thought exercise for the reader: what would it mean to compile let*? What modifications would you make to the Compile_let function? Take a look at footnote 3 if you want to double-check your answer. I’m not going to implement it in my compiler, though. Too lazy.


As usual, we have a testing section. There are a couple of checks that a reasonable compiler should do to reject bad programs that we’ve left on the table, so we won’t test:

  • let expressions that bind a name twice
  • poorly formed binding lists
  • poorly formed let bodies

I suppose we expect programmers to write well-formed programs. You’re more than welcome to add informative error messages and helpful return values, though.

Here are some tests that I added for let. One for the base case:

TEST compile_let_with_no_bindings(Buffer *buf) {
  ASTNode *node = Reader_read("(let () (+ 1 2))");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(3), result, "0x%lx");
  PASS();
}

One for let with one binding:

TEST compile_let_with_one_binding(Buffer *buf) {
  ASTNode *node = Reader_read("(let ((a 1)) (+ a 2))");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(3), result, "0x%lx");
  PASS();
}

and for multiple bindings:

TEST compile_let_with_multiple_bindings(Buffer *buf) {
  ASTNode *node = Reader_read("(let ((a 1) (b 2)) (+ a b))");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(3), result, "0x%lx");
  PASS();
}

Last, and most interestingly, we have a test that let is not actually let* in disguise. We check this by compiling a let expression with bindings that expect to be able to refer to one another. I wrote this test after realizing that I had accidentally written let* in the first place:

TEST compile_let_is_not_let_star(Buffer *buf) {
  ASTNode *node = Reader_read("(let ((a 1) (b a)) a)");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, -1);
  PASS();
}

Next time

That’s a wrap, folks. Time to let go. Har har har. Next time we’ll add if-expressions, so our programs can make decisions! Have a great day. Don’t forget to tell your friends you love them.

Mini Table of Contents

  1. While I am very pleased with the bind/find symmetry, I am less pleased with the Env/bool asymmetry. Maybe I should have gone for Node

  2. If you’re a seasoned Lisper, you may be wondering why I don’t rewrite let to lambda and use my implementation of closures to solve this problem. Well, right now we don’t have support for closures because I’m following the Ghuloum tutorial and that requires a lot of to-be-implemented machinery.

    Even if we did have that machinery and rewrote let to lambda, the compiler would generate unnecessarily slow code. Without an optimizer to transform the lambdas back into lets, the naïve implementation would output call instructions. And if we had the optimizer, well, we’d be back where we started with our let implementation. 

  3. To compile let*, you could do one of two things: you could remove the second environment parameter and compile the bindings in the same environment as you compile the body. Alternatively, you could get fancy and make an AST rewriter that rewrites (let* ((a 1) (b a)) xyz) to (let ((a 1)) (let ((b a)) xyz)). The nested let will have the same effect. 

Gustaf Erikson (gerikson)

Noon van der Silk (silky)

Privileges October 01, 2020 12:00 AM

Posted on October 1, 2020 by Noon van der Silk

Inspired by the recent paper Towards decolonising computational sciences, I thought it might be a nice idea to write down a list of the privileges I can identify that I have. I’ll probably update this list as I think of more.

My list:

  • White, Male, Cisgendered: I don’t often feel out of place.
  • Healthy: Nothing has prevented me from finding work.
  • Able-bodied: Getting around is easy.
  • Somewhat extroverted: Able to engage and network somewhat easily when I want to.
  • No dependants: Freedom to move around and spend time as I wish.
  • Financial stability from a young age: Able to take (moderate) risks.
  • Supportive parent: I know I’ll have a place to go.
  • Supportive partner: Financial/emotional support to experiment.
  • Live close to the city: Access to events, convenience.
  • Good mental health: I’m able to focus when I want; am most of the time moderately calm.
  • Language: I speak English natively.
  • Exposed to tech from a young age: I had a computer in the house since I was young, and was able to learn programming on my own as a result.
  • Well-educated: I went to a private school, and was eventually able to attend good universities and get access to experts and knowledge in this way.

These things have helped me get jobs in the past. Some of them are traits that you can improve, while others are baseline things that are out of my control to some degree (e.g. mental health: it can certainly be improved, but it’s not all my doing).

I also found this: Privilege Checklist from this Social Justice Training program, which is a very good starting point for building a comprehensive list. This feels like a great exercise for teams to get together and think about.

What’s your list?

September 30, 2020

Jan van den Berg (j11g)

Working 101 September 30, 2020 10:05 AM

Do you struggle to organise your work because it seems everybody wants something from you? Or do you often wonder whether you’re doing the right things? This post helps you answer those questions.

Here are the six basic responsibilities you have as a professional in the modern workplace. Follow these and you are on the right path.

I wrote these down as a reminder to myself and to pass on to others, because it is easy to lose sight of your basic responsibilities. I have also noticed that a lot of young professionals struggle with what is asked of them.

Regardless of your specific job — whether you are a manager or engineer — just starting out or a seasoned professional, the following basics will always apply:

  • Know thy time
  • Add value
  • Use your leverage
  • Manage expectations
  • Track your tasks
  • Prepare for a different job

I distilled these from my experiences as an engineer, Engineering Manager and CTO for a tech company. And all of them were shaped or sharpened by reading and applying what I read.

The basics are presented as instructions. The key action per item is bold and at the bottom of each item are the book references in cursive.

Know thy time

Time is totally perishable and cannot be stored. Yesterday’s time is gone forever and will never come back.

Peter F. Drucker

This is the most important thing you can learn about the most valuable asset you have: your time. Every second is unique and you can only spend it once.

Know where your time goes and demand that your time is used wisely.


Measure your time. There are many ways to do this. Here is the most basic one: during the day, write down what you did, and then every morning (and this is key) take a moment to reflect on the previous day.

  • What worked, what didn’t, what are you happy about, what not?
  • What would you have liked to do differently?

Do this every day and you will see patterns emerge, and you will learn more about yourself and your talents and your future (more on these two things later).

If you have never tracked your time: this is the most basic thing you can do. As you become more skilled in this, weekly, monthly or yearly reflections provide even more insight. As will discussing and reflecting on your accomplishments with an accountability partner or coach.


You value your time, and you will value it even more once you start writing down your daily accomplishments.
You should also demand that people take your time seriously. Examples:

  • Skip meetings if you don’t feel you can add value or if you think you can add more value somewhere else.
  • Shorten your meetings. Someone sends an invite for an hour? That happens all the time in organisations. Reply to make it half an hour.

Companies and teams tend to follow Parkinson’s law. Study it: this is a real thing organisations and teams struggle with. Be on the lookout for it.

If you cannot manage your time you can’t manage anything else.

Read on and you will see that this specific instruction permeates all other instructions.

Read Drucker and Aurelius to know more about this responsibility.

Add value

You are paid to add value. You are not paid for your time; for simply clocking in 8 hours every day. If you are, your company is doing it wrong and you are on the wrong path.

If you are an engineer there are only two things that add value, and only two:

Creating things and solving problems.

That’s it.

You help your company by creating things and solving problems that their customers pay for. This is your contribution. Everything else is a byproduct of the above. And if it isn’t, stop doing it immediately.
You are by no means paid to have meetings; they can be a necessity, a means to an end, but never the end itself.

Meetings are also arrangements for people to socialize. This is fine and has its purpose (team bonding, building trust or just fun). But again, the real purpose and goal is always to add value.

Yes, but I am a manager?

Make no mistake: as a manager you are paid for the exact same two things. However, as a product, customer or team manager, your work is often less tangible or more indirectly related to the above. But if you drill down, your responsibilities as a manager are:

  • Decide priorities of things that need to be created or solved
  • Keep track of projects and commitments
  • Communicate within team and with other teams
  • Help team members grow

These four duties as a manager (or senior engineer) are to ensure the team is still doing either one of these two things: creating the right things or solving the right problems. There is no difference in responsibilities, just different tasks.

Read Grove, Drucker and Evans to know more about this responsibility.

Use your leverage

If you combine the above two instructions (Know thy time and Add Value) it leads to this: you are always trying to spend your time to add as much value as possible.

Whether you are an engineer or manager: you have unique talents. This is your leverage, this is what enables you to add value, this is why you were hired. Use your talents as leverage to always try to add the most value.
You know your talents. And if you don’t, start writing down what you did the day before, reflect on it and I assure you your talents will soon emerge (Know thy time). And with this knowledge:

Always ask: where can I at this moment add the most value?

Is sitting in a meeting with junior engineers to train them the best use of my time? Or should I try to finish building this database cluster? Or should I call this supplier and discuss their proposal? Different tasks that ask for different talents. And the answer is never straightforward and depends on many things. You have to decide.

But the rule of thumb is: always pick the activity where your unique talents can have the most impact on the added value of the team or company at that moment.

Read Grove to know more about this responsibility.

Manage expectations

In trying to reach the goals of either creating things or solving problems there are only two outcomes.

  1. You reach the result: you created what was expected or you solved the problem. Great!
  2. You communicate early that things weren’t going as planned. Not great, but this happens all the time.

It is your responsibility to manage expectations and try to eliminate surprises.

For companies: surprises are bad, avoid them. A job is not a birthday surprise party. Your coworker does not like surprises, nor does your manager and I can assure you his / her manager even less (unless it is their actual birthday of course).
The way to avoid surprises is to communicate often and early. And sometimes this is the only tool you have to manage things that are beyond your control (suppliers, illness etc.).

Of course always focus first on the first outcome (reach the result), but don’t wait to communicate when commitments or expectations are on the verge of being broken.

Read Allen, Drucker and Crucial Conversations to know more about this responsibility.

Track your tasks

You cannot slay the dragon until you can see it.

Cal Newport

If you have a job where you don’t have to write down what your team, manager, customers, third parties need from you, you are either a genius or your job cannot be very satisfying. Let’s assume you are not a genius and that you have a challenging job. You need to start writing things down. You need a system to keep track of everything that is in your head, to get it out of your head and actively work on it.

Clear your head by writing everything down. Please don’t use precious brain cycles to keep track of what needs to be done. I repeat: you are not paid to keep track of things. You are paid to add value. You do not add value by keeping track of things, you add value by creating things or solving problems.

At the simplest form this is the opposite of Know thy time where you write down and reflect on what you did the previous day (backwards). Track your tasks is, at the most basic level, a list of what you will do today (forward). You can combine these two activities in one sitting, every morning. It will only take a couple of minutes.

Write down what you want to achieve today.

This not only gives you a reflection point for the next morning (Know Thy Time) but it will also structure your day and give you a good guideline of when you need to demand your time to be taken seriously (“Sorry I can’t work on that right now because..” etc.).

It will also ensure that you add value and it will be an invaluable resource in deciding whether you need to manage expectations.

Of course there are all sorts of ways to structure this to prioritize or specify your tasks. Here are the three main ones:

  • First things first
  • Start with the end in mind
  • Do one thing at a time

You can debate these at length, but see them as a starting point. The key thing here is: you need a system: pen and paper, a computer file or specialized tools. It does not matter what system you use. But please: clear your head.

Read Allen, Grove, Drucker and Covey to know more about this responsibility.

Prepare for a different job

This is not your last job. You will need to find another job. Prepare for this. This is your responsibility. Now. Not when you find yourself looking for a new job.

Know where your time goes and you will know your talents. You will also need to know or find out if your talents apply to other areas.

Actively prepare for a different job by always testing your talents against other jobs.

Know the difference between skill and talent. A good employer will look for talent more than skill. Say you are masterfully skilled in the custom, specialized CRM of your current employer. Your next employer will not have this CRM. This skill is useless. Your talent however could be you are very quick in picking up working with CRMs in general.

See training as a continuous process and not an event. You should always be trying to learn new things. Look to train for things that are generally applicable.

Don’t know what to do next? Write your own eulogy, be candid. What would you want people to say or remember about you? This is not some morbid experiment but one that will reveal your true desires. See if they line up with your talents. Where is the gap? Actively try to close this gap.

Read Drucker, Kotter, Covey and Johnson to know more about this responsibility.


This post is a summary of every responsibility you have as a professional. It presents a coherent model of six principles that can sharpen your views on your professional responsibilities.

This post also offers a variety of literary references as a starting point for you to dig deeper into the mentioned subjects. Because every subject here is, of course, much broader and deeper than will fit into a blogpost.


This question was left unanswered. And for all intents and purposes it could just as well have been at the top of the post. Why indeed a summary of principles and instructions?

Simple: you spend about half your waking life at your job. This is time you can only spend once. So this is extremely valuable time.

Your time is valuable and important, and you want to spend it on something that is both satisfying and fulfilling. You don’t want to spend your days propped up behind a screen doing things until you can clock out, right? This is a dead end. And you know it. I believe that a job that is satisfying and fulfilling provides meaning and leads to a richer life. And I am sure these instructions can help you achieve that.

This post is also available in Dutch 🇳🇱.

The post Working 101 appeared first on Jan van den Berg.

September 28, 2020

Phil Hagelberg (technomancy)

in which many rays are cast September 28, 2020 01:06 PM

The Lisp Game Jam is a semiannual tradition I enjoy participating in. This time around I created Spilljackers, a 3D cyberspace heist game with my friend Emma Bukacek. Rather than use Fennel with the LÖVE framework (my usual choice) we went back to TIC-80 which we had used previously on our last game collaboration.

3D brightly-colored maze with menacing enemy approaching

There's a lot to say about the style and feel of the game; Emma's punchy writing and catchy tunes brought so much to the table, but for this post I want to focus on the technical aspects. This was the first time I tried writing a 3D game. Instead of doing "proper 3D", which requires a lot of matrix math, I took a much simpler approach and built it using raycasting, which imposes some limitations but results in much simpler and faster[1] code.

Raycasting (not to be confused with ray-tracing) works by making each column of pixels on the screen cast a "ray" out to see what walls it encounters, and we use the information about walls to draw a column of pixels representing the wall. Some raycasting engines (like that of the famous Wolfenstein 3D) force all walls to be the same height, which means you can stop tracing when you hit the first wall, but we want to allow walls of various heights, so it traces the ray out to the distance limit, then tracks back and draws all the lines back-to-front so that the nearer lines cover the further ones.

diagram of casting rays

The final jam version of Spilljackers was 1093 lines, but I had the basic rendering of the map nailed down after the first evening of coding, and in fact the core of the algorithm can be demoed in 43 lines of code. There is some trig used, but if you've ever built a 2D game that used trig to do movement or collision detection, the math here is no more complex than that. Let's take a look!

We start by defining some constants and variables, including screen size, player characteristics (speed, turn speed, width, and height), and position/rotation:

(local (W MW H MH tau) (values 240 120 136 68 (* math.pi 2))) ; 240x136 screen and its midpoints
(local (spd tspd pw ph) (values 0.7 0.06 0.7 0.5))
(var (x y rotation) (values 12 12 0))

Next we have the movement code. This looks a lot like it would in a 2D TIC game—when we move, we check all four corners of the player's bounding box to see if the new position is valid based on whether the map coordinates for that position show an empty tile (zero) or not. In a real game we would have some momentum and sliding across walls, but that's omitted for clarity.

(fn ok? [x y] (= 0 (mget (// x 8) (// y 8))))

(fn move [spd]
  (let [nx (+ x (* spd (math.cos rotation))) ny (+ y (* spd (math.sin rotation)))]
    (when (and (ok? (- nx pw) (- ny pw)) (ok? (- nx pw) (+ ny pw))
               (ok? (+ nx pw) (- ny pw)) (ok? (+ nx pw) (+ ny pw)))
      (set (x y) (values nx ny)))))

(fn input-update []
  (when (btn 0) (move spd))
  (when (btn 1) (move (- spd)))
  (when (btn 2) (set rotation (% (- rotation tspd) tau)))
  (when (btn 3) (set rotation (% (+ rotation tspd) tau))))

Now before we get to the actual raycasting, let's take a look at the map data. In TIC-80 each map position has a tile number in it which corresponds to a location on the sprite sheet. Rather than encoding complex tables of tile numbers to the visual properties of the map cells they describe, we encode properties about the tile in its sprite sheet position.

grid of colored sprites

The color of the cell is determined by the sprite's column, and the height of the cell is determined by its row. In this image tile #34 is selected. Since the sprites are arranged in a 16x16 grid, we calculate its column (and therefore its color) by taking the tile number modulo 16, getting 2. Likewise 34 divided by 16 (integer division) is 2, which gives us our height multiplier.

Back to the code: let's jump to the outermost function and work our way inwards. The TIC global function is called sixty times per second and ties everything together: reading input, updating state, and drawing. The for loop here steps through every column to call draw-ray after precalculating a few things.

(fn _G.TIC []
  (for [col 0 W] ; draw one column of the screen at a time
    (let [lens-r (math.atan (/ (- col MW) 100))]
      (draw-ray (math.sin (+ rotation lens-r)) (math.cos (+ rotation lens-r))
                (math.cos lens-r) col x y x y 16))))
fisheye effect

If we calculate all distances as being from the single x,y point representing the player's position (as is the case in the video here), then columns at the player's peripheral vision will look further away than columns near the center of the screen, resulting in a fisheye lens effect. The lens-r value above counteracts that: it is the angle between the current column's ray and the center of the screen, and draw-ray multiplies each distance by its cosine. We also precalculate cos and sin once as an optimization to avoid having to do it repeatedly within draw-ray.

The draw-ray function below starts by calling the cast helper function to see where the ray will intersect with the next tile, and what tile number that is. The height of the wall is calculated based on the distance of that intersection point from the player, with the lens-factor applied as mentioned above to counteract the fisheye effect. Once we have the height factor, it's used to calculate the top and bottom of the "wall slice" line by offsetting it from MH (the vertical midpoint of the screen), the ph height of the player, and (// tile 16), which tells us which row in the sprite sheet we're looking at.

Because we have to draw some walls behind other walls, the draw-ray function must be recursive. The limit argument tells us how far to recurse; if we haven't hit our limit, keep casting the ray before calling line to actually render the wall we've just calculated. This ensures that more distant walls are drawn behind closer walls. Finally we only call line if the tile is nonzero, because the zeroth tile indicates empty space. The color of the line is (% tile 16) since as per above, the column in the 16-tile-wide sprite sheet determines wall color.

(fn draw-ray [sin cos lens-factor col rx ry x y limit]
  (let [(hit-x hit-y tile) (cast rx ry cos sin) ; where and what tile is hit?
        dist (math.sqrt (+ (math.pow (- hit-x x) 2) (math.pow (- hit-y y) 2)))
        height-factor (/ 800 (* dist lens-factor))
        top (- MH (* height-factor (+ (// tile 16) (- 1 ph))))
        bottom (+ MH (* height-factor ph))]
    (when (< 0 limit) ; draw behind the current wall first
      (draw-ray sin cos lens-factor col hit-x hit-y x y (- limit 1)))
    (when (not= tile 0) ; only draw nonzero tiles
      (line col top col bottom (% tile 16)))))

In order to determine which tile a ray hits next, the cast function must check whether the ray will hit a horizontal edge of the next map cell or a vertical edge. Once it determines this it can use the precalculated cos and sin values to pinpoint the coordinates at which the next cell is hit, and call mget to identify the tile of the cell.

(fn cast-n [n d]
  (- (* 8 (if (< 0 d) (+ 1 (// n 8)) (- (math.ceil (/ n 8)) 1) )) n))

(fn ray-hits-x? [nx ny nxy nyx]
  (< (+ (* nx nx) (* nxy nxy)) (+ (* ny ny) (* nyx nyx))))

(fn cast [x y cos sin]
  (let [nx (cast-n x cos) nxy (/ (* nx sin) cos)
        ny (cast-n y sin) nyx (/ (* ny cos) sin)]
    (if (ray-hits-x? nx ny nxy nyx)
        (let [cx (+ x nx) cy (+ y nxy)]
          (values cx cy (mget (// (+ cx cos) 8) (// cy 8))))
        (let [cx (+ x nyx) cy (+ y ny)]
          (values cx cy (mget (// cx 8) (// (+ cy sin) 8)))))))

And that's it! That's all you need[2] for a minimal raycasting game in TIC-80. Below is an embedded HTML export of the game so you can try it out for yourself! Pressing ESC and clicking "close game" will bring you to the TIC console where you can press ESC again to see the code and map editors. Making changes to the code or map and entering RUN in the console will show you the effects of your changes!



I learned about raycasting from reading and modifying the source to the game Portal Caster, which creates some neat puzzles using portals. I also found this write-up of FPS-80, a somewhat more elaborate TIC-80 raycaster that includes some impressive lighting effects. In the final version of Spilljackers, I used the interlaced rendering strategy from FPS-80: every even tick, you render the even columns, and every odd tick you render the odd columns. This results in some "fuzzy" visuals, but it improves performance to the point where the game runs smoothly with a long render distance, even on an old ThinkPad from 2008.

[1] In this context, the reason raycasting is faster is that the platform I'm using (TIC-80) does not have any access to the GPU and does all its rendering on the CPU. If you have a GPU then things are different!

[2] You can see the full code on its own in a text file here. If you have TIC-80 downloaded, you can run tic80 mini.fnl to load up the game locally; the data for the map and palette are encoded as comments in the bottom of the text file.

September 27, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Ecosystems chapter of my book September 27, 2020 09:35 PM

What useful, practical things might professional software developers learn from the Ecosystems chapter in my evidence-based software engineering book?

This week I checked the ecosystems chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

A casual reader would conclude that software engineering ecosystems involved lots of topics, with little or no theory connecting them. I had great plans for the connecting theories, but lack of detailed data, time and inspiration means the plans remain in my head (e.g., modelling the interaction between the growth of source code written in a particular language and the number of developers actively using that language).

For managers, the usefulness of this chapter is the strategic perspective it provides. How does what they and others are doing relate to everything else, and what patterns of evolution are to be expected?

Software people like to think that everything about software is unique. Software is unique, but the activities around it follow patterns that have been followed by other unique technologies, e.g., the automobile and jet engines. There is useful stuff to be learned from non-software ecosystems, and the chapter discusses some similarities I have learned about.

There is lots more evidence of the finite lifetime of software related items: lifetime of products, Linux distributions, packages, APIs and software careers.

Some readers might be surprised by the amount of discussion about what is now historical hardware. Software needs hardware to execute it, and the characteristics of the hardware of the day can have a significant impact on the characteristics of the software that gets written. I suspect that most of this discussion will not be that useful to most readers, but it provides some context around why things are the way they are today.

Readers with a wide knowledge of software ecosystems will notice that several major ecosystems barely get a mention. Embedded systems is a huge market, as is Microsoft Windows, and very many professional developers use C++. However, to date the focus of most research has been around Linux and Android (because of its use of Java, a language often taught in academia), and languages that have a major package repository. So the ecosystems chapter presents a rather blinkered view of software engineering ecosystems.

What did I learn from this chapter?

Software ecosystems are bigger and more complicated than I had originally thought.

Readers might have a completely different learning experience from reading the ecosystems chapter. What useful things did you learn from the ecosystems chapter?

Ponylang (SeanTAllen)

Last Week in Pony - September 27, 2020 September 27, 2020 02:55 PM

Ponyc 0.38.1 has been released. Support for prebuilt “generic glibc Linux” ponyc binaries is being dropped in favor of prebuilt images for specific Linux distributions. We are also pleased to announce Jason Carr, AKA @jasoncarr0, is now a Pony committer!

September 24, 2020

Unrelenting Technology (myfreeweb)

Noticed the mgb driver for Microchip LAN7430 September 24, 2020 05:31 PM

Noticed the mgb driver for Microchip LAN7430 (/31) NIC in FreeBSD commit logs. Huh, interesting stuff: Microchip publishes so much documentation, including a “Programmer’s Guide” PDF with lots of driver pseudocode, and even evaluation board design files!

September 23, 2020

Sevan Janiyan (sevan)

Book review: BPF Performance Tools: Linux System and Application Observability September 23, 2020 07:21 PM

It’s more than 11 years since the "shouting in the data centre" video landed, and in 2020 I still manage to surprise folks who have never seen it with what is possible. The idea that such transparency is a reality in some circles comes as a shock. Without the facility to be able to dynamically instrument …

September 22, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Reader September 22, 2020 05:00 AM


Welcome back to the “Compiling a Lisp” series. This time I want to take a break from compiling and finally add a reader. I’m getting frustrated with manually entering increasingly complicated ASTs, so I figure it is time. After this post, we’ll be able to type in programs like:

(< (+ 1 2) (- 4 3))

and have our compiler make ASTs for us! Magic. This will also add some nice debugging tools for us. For example, imagine an interactive command line utility in which we can enter Lisp expressions and the compiler prints out human-readable assembly (and hex? maybe?). It could even run the code, too. Check out this imaginary demo:

lisp> 1
; mov rax, 0x4
=> 1
lisp> (add1 1)
; mov rax, 0x4
; add rax, 0x4
=> 2

Wow, what a thought.

The Reader interface

To make this interface as simple and testable as possible, I want the reader interface to take in a C string and return an ASTNode *:

ASTNode *Reader_read(char *input);

We can add interfaces later to support reading from a FILE* or file descriptor or something, but for now we’ll just use strings and line-based input.

On success, we’ll return a fully-formed ASTNode*. But on error, well, hold on. We can’t just return NULL. On many platforms, NULL is defined to be 0, which is how we encode the integer 0. On others, it could be defined to be 0x55555555[1] or something equally silly. Regardless, its value might overlap with our type encoding scheme in some unintended way.

This means that we have to go ahead and add another immediate object: an Error object. We have some open immediate tag bits, so sure, why not. We can also use this to signal runtime errors and other fun things. It’ll probably be useful.

The Error object

Back to the object tag diagram. Below I have reproduced the tag diagram from previous posts, but now with a new entry (denoted by <-). This new entry shows the encoding for the canonical Error object.

High                                                         Low
0000000000000000000000000000000000000000000000000XXXXXXX00001111  Character
00000000000000000000000000000000000000000000000000000000X0011111  Boolean
0000000000000000000000000000000000000000000000000000000000101111  Nil
0000000000000000000000000000000000000000000000000000000000111111  Error <-

If we wanted to, we could even add additional tag bits to the (currently all 0) payload, to signal different kinds of errors. Maybe later. For now, we add a tag constant and associated Object and AST functions:

const unsigned int kErrorTag = 0x3f; // 0b111111
uword Object_error() { return kErrorTag; }

bool AST_is_error(ASTNode *node) { return (uword)node == Object_error(); }
ASTNode *AST_error() { return (ASTNode *)Object_error(); }

That should be enough to get us going for now. Perhaps we could even convert our Compile_ suite of functions to use this object instead of an int. It would certainly be more informative. Maybe in a future post.

Language syntax

Let’s get back to business and think about what we want our language to look like. This is a Lisp series but really you could adapt your reader to read any sort of syntax. No need for parentheses if you’re allergic.

I’m going to use this simple Lisp reader because it’s short and simple, so we’ll have some parens.

First, our integers will look like integers in most languages — 0, 123, -123.

You can add support for other bases if you like, but I don’t plan on it here.

Second, our characters will look like C characters: 'a', 'b', etc. Some implementations opt for #\a, but that has always looked funky to me.

Third, our booleans will be #t and #f. You’re also welcome to go ahead and use symbols to represent the names, avoid special syntax, and have those symbols evaluate to truthy and falsey values.

Fourth, the nil object will be (). We can also later bind the symbol nil to mean (), too.

I’m going to skip error objects, because they don’t have any sort of user-land meaning yet; they’re just used in compiler infrastructure right now.

Fifth, pairs will look like (1 2 3), meaning (cons 1 (cons 2 (cons 3 nil))). I don’t plan on adding support for dotted pair syntax. Whitespace will be insignificant.

Sixth, symbols will look like any old ASCII identifier: hello, world, fooBar. I’ll also include some punctuation in there, too, so we can use + and - as symbols, for example. Or we could even go full Lisp and use train-case identifiers.

I’m going to skip closures, since they don’t have a syntactic representation — they are just objects known to the runtime. Vectors and strings don’t have any implementation right now so we’ll add those to the reader later.

That’s it! Key points are: mind your plus and minus signs since they can appear in both integers and symbols; don’t read off the end; have fun.

The Reader implementation

Now that we’ve rather informally specified what our language looks like, we can write a small reader. We’ll start with the Reader_read function from above.

This function will just be a shell around an internal function with some more parameters.

ASTNode *Reader_read(char *input) {
  word pos = 0;
  return read_rec(input, &pos);
}

This is because we need to carry around some more state to read through this string. We need to know how far into the string we are. I chose to use an additional word for the index. Some might prefer a char** instead. Up to you.

With any recursive reader invocation, we should advance through all the whitespace, because it doesn’t mean anything to us. For this reason, we have a handy-dandy skip_whitespace function that reads through all the whitespace and then returns the next non-whitespace character.

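void advance(word *pos) { ++*pos; }

// Move past the current character and return the new one.
char next(char *input, word *pos) {
  advance(pos);
  return input[*pos];
}

char skip_whitespace(char *input, word *pos) {
  char c = '\0';
  for (c = input[*pos]; isspace(c); c = next(input, pos)) {
    // Nothing to do here: next() advances past each whitespace character.
  }
  return c;
}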

We can use skip_whitespace in the read_rec function to fetch the next non-whitespace character. Then we’ll use that character (and sometimes the following one, too) to determine what structure we’re about to read.

bool starts_symbol(char c) {
  switch (c) {
  case '+':
  case '-':
  case '*':
  case '>':
  case '=':
  case '?':
    return true;
  default:
    return isalpha(c);
  }
}

ASTNode *read_rec(char *input, word *pos) {
  char c = skip_whitespace(input, pos);
  if (isdigit(c)) {
    return read_integer(input, pos, /*sign=*/1);
  }
  if (c == '+' && isdigit(input[*pos + 1])) {
    advance(pos); // skip '+'
    return read_integer(input, pos, /*sign=*/1);
  }
  if (c == '-' && isdigit(input[*pos + 1])) {
    advance(pos); // skip '-'
    return read_integer(input, pos, /*sign=*/-1);
  }
  if (starts_symbol(c)) {
    return read_symbol(input, pos);
  }
  if (c == '\'') {
    advance(pos); // skip '\''
    return read_char(input, pos);
  }
  if (c == '#' && input[*pos + 1] == 't') {
    advance(pos); // skip '#'
    advance(pos); // skip 't'
    return AST_new_bool(true);
  }
  if (c == '#' && input[*pos + 1] == 'f') {
    advance(pos); // skip '#'
    advance(pos); // skip 'f'
    return AST_new_bool(false);
  }
  if (c == '(') {
    advance(pos); // skip '('
    return read_list(input, pos);
  }
  return AST_error();
}

Note that I put the integer cases above the symbol case because we want to catch -123 as an integer instead of a symbol, and -a123 as a symbol instead of an integer.

We’ll probably add more entries to starts_symbol later, but those should cover the names we’ve used so far.

For each type of subcase (integer, symbol, list), the basic idea is the same: while we’re still inside the subcase, add on to it.

For integers, this means multiplying and adding (concatenating digits, so to speak):

ASTNode *read_integer(char *input, word *pos, int sign) {
  word result = 0;
  // next() advances, so the loop stops on the first non-digit.
  for (char c = input[*pos]; isdigit(c); c = next(input, pos)) {
    result *= 10;
    result += c - '0';
  }
  return AST_new_integer(sign * result);
}

It also takes a sign parameter so if we see an explicit -, we can negate the integer.

For symbols, this means reading characters into a C string buffer:

const word ATOM_MAX = 32;

bool is_symbol_char(char c) {
  return starts_symbol(c) || isdigit(c);
}

ASTNode *read_symbol(char *input, word *pos) {
  char buf[ATOM_MAX + 1]; // +1 for NUL
  word length = 0;
  for (length = 0; length < ATOM_MAX && is_symbol_char(input[*pos]); length++) {
    buf[length] = input[*pos];
    advance(pos);
  }
  buf[length] = '\0';
  return AST_new_symbol(buf);
}

For simplicity’s sake, I avoided dynamic resizing, so symbols are limited to 32 characters. Oh well.

Note that symbols can also have trailing numbers in them, just not at the front — like add1.

For characters, we only have three potential input characters to look at: quote, char, quote. No need for a loop:

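ASTNode *read_char(char *input, word *pos) {
  char c = input[*pos];
  if (c == '\'') {
    return AST_error(); // '' has no character in it
  }
  advance(pos); // skip the character itself
  if (input[*pos] != '\'') {
    return AST_error(); // missing closing quote, as in 'aa'
  }
  advance(pos); // skip the closing '\''
  return AST_new_char(c);
}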

This means that input like '' or 'aa' will be an error.

For booleans, we can tackle those inline because there are only two cases and they’re both trivial. Check for #t and #f. Done.

And last, for lists, it means we recursively build up pairs until we get to nil:

ASTNode *read_list(char *input, word *pos) {
  char c = skip_whitespace(input, pos);
  if (c == ')') {
    advance(pos); // skip ')' so an enclosing list doesn't see it again
    return AST_nil();
  }
  ASTNode *car = read_rec(input, pos);
  assert(car != AST_error());
  ASTNode *cdr = read_list(input, pos);
  assert(cdr != AST_error());
  return AST_new_pair(car, cdr);
}

Note that we still have to skip whitespace in the beginning so that we catch cases that have space either right after an opening parenthesis or right before a closing parenthesis. Or both!

That’s it — that’s the whole parser. Now let’s write some tests.


I added a new suite for reader tests. I figure it’s nice to have them grouped. Here are some of the trickier tests from that suite that originally tripped me up one way or another.

Negative integers originally parsed as symbols until I figured out I had to flip the case order:

TEST read_with_negative_integer_returns_integer(void) {
  char *input = "-1234";
  ASTNode *node = Reader_read(input);
  ASSERT_IS_INT_EQ(node, -1234);
  PASS();
}

Oh, and the ASSERT_IS_INT_EQ and upcoming ASSERT_IS_SYM_EQ macros are helpers that assert the type and value are as expected.

I also forgot about leading whitespace for a while:

TEST read_with_leading_whitespace_ignores_whitespace(void) {
  char *input = "   \t   \n  1234";
  ASTNode *node = Reader_read(input);
  ASSERT_IS_INT_EQ(node, 1234);
  PASS();
}

And also whitespace in lists:

TEST read_with_list_returns_list(void) {
  char *input = "( 1 2 0 )";
  ASTNode *node = Reader_read(input);
  ASSERT_IS_INT_EQ(AST_pair_car(node), 1);
  ASSERT_IS_INT_EQ(AST_pair_car(AST_pair_cdr(node)), 2);
  ASSERT_IS_INT_EQ(AST_pair_car(AST_pair_cdr(AST_pair_cdr(node))), 0);
  PASS();
}

And here’s a goofy symbol to make sure all these symbol characters work:

TEST read_with_symbol_returns_symbol(void) {
  char *input = "hello?+-*=>";
  ASTNode *node = Reader_read(input);
  ASSERT_IS_SYM_EQ(node, "hello?+-*=>");
  PASS();
}

And to make sure trailing digits in symbol names work:

TEST read_with_symbol_with_trailing_digits(void) {
  char *input = "add1 1";
  ASTNode *node = Reader_read(input);
  ASSERT_IS_SYM_EQ(node, "add1");
  PASS();
}


Some extras

Now, we could wrap up with the tests, but I did mention some fun features like a REPL. Here’s a function repl that you can call from your main function instead of running the tests.

int repl() {
  do {
    // Read a line
    fprintf(stdout, "lisp> ");
    char *line = NULL;
    size_t size = 0;
    ssize_t nchars = getline(&line, &size, stdin);
    if (nchars < 0) {
      fprintf(stderr, "Goodbye.\n");
      free(line);
      break;
    }

    // Parse the line (symbols are copied, so the line can be freed after)
    ASTNode *node = Reader_read(line);
    free(line);
    if (AST_is_error(node)) {
      fprintf(stderr, "Parse error.\n");
      continue;
    }

    // Compile the line
    Buffer buf;
    Buffer_init(&buf, 1);
    int result = Compile_expr(&buf, node, /*stack_index=*/-kWordSize);
    if (result < 0) {
      fprintf(stderr, "Compile error.\n");
      continue;
    }

    // Print the assembled code
    for (size_t i = 0; i < buf.len; i++) {
      fprintf(stderr, "%.02x ", buf.address[i]);
    }
    fprintf(stderr, "\n");
  } while (true);
  return 0;
}

And we can trigger this mode by passing --repl-assembly:

int run_tests(int argc, char **argv) {
  // Runs the test suites; the body isn't shown in this post.
}

int main(int argc, char **argv) {
  if (argc == 2 && strcmp(argv[1], "--repl-assembly") == 0) {
    return repl();
  }
  return run_tests(argc, argv);
}

It uses all the machinery from the last couple posts and then prints out the results in hex pairs. Interactions look like this:

sequoia% ./bin/compiling-reader --repl-assembly
lisp> 1
48 c7 c0 04 00 00 00 
lisp> (add1 1)
48 c7 c0 04 00 00 00 48 05 04 00 00 00 
lisp> 'a'
48 c7 c0 0f 61 00 00
lisp> Goodbye.

Excellent. A fun exercise for the reader might be going further and executing the compiled code and printing the result, as above. The trickiest part will probably be printing the result, because we don’t have infrastructure for that yet.

Another fun exercise might be adding a mode to the compiler to print text assembly to the screen, like a debugging trace. This should be straightforward enough since we already have very specific opcode implementations.

Anyway, thanks for reading. Next time we’ll get back to compiling and tackle let-expressions.


  1. See this series of Tweets by Kate about changing the value of NULL in the TenDRA compiler. 

September 20, 2020

Ponylang (SeanTAllen)

Last Week in Pony - September 20, 2020 September 20, 2020 11:38 PM

Sean T. Allen has released version 0.0.1 of the lori TCP library.

Derek Jones (derek-jones)

Learning useful stuff from the Projects chapter of my book September 20, 2020 09:24 PM

What useful, practical things might professional software developers learn from the Projects chapter in my evidence-based software engineering book?

This week I checked the projects chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

There turned out to be around three to four times more data publicly available than I had first thought. This is good, but there is a trap for the unwary. For many topics there is only one data set, and that one data set may not be representative. What is needed is a selection of data from various sources, all relating to a given topic.

Some data is better than no data, provided small data sets are treated with caution.

Estimation is a popular research topic: how long will a project take and how much will it cost.

After reading all the papers I learned that existing estimation models are even more unreliable than I had thought, and what is more, there are plenty of published benchmarks showing how unreliable the models really are (these papers never seem to get cited).

Models that include lines of code in the estimation process (i.e., the majority of models) need a good estimate of the likely number of lines in the final software system. One issue that nobody had considered was the impact of developer variability on the number of lines written to implement the same functionality, which turns out to be large. Oops.

Machine learning has infested effort estimation research. What the machine learning models actually do is estimate adjustment, i.e., they do not create their own estimate but adjust one passed in as input to the model. Most estimation data sets are tiny, and only contain a few different variables; unless the estimate is included in the training phase, the generated model produces laughable results. Oops.

The good news is that there appear to be lots of recurring patterns in the project data. This is good news because recurring patterns are something to be explained by a theory of software project development (apparent randomness is bad news, from the perspective of coming up with a model of what is going on). I think we are still a long way from having workable theories, but seeing patterns is a good sign that one or more theories will be possible.

I think that the main takeaway from this chapter is that software often has a short lifetime. People in industry probably have a vague feeling that this is true, from experience with short-lived projects. It is not cost effective to approach commercial software development from the perspective that the code will live a long time; some code does live a long time, but most dies young. I see the implications of this reality being a major source of contention with those in academia who have spent too long babbling away in front of teenagers (teaching the creation of idealized software that lives on forever), and little or no time building software systems.

A lot of software is written by teams of people; however, there is not a lot of data available on teams (software or otherwise). Given the difficulty of hiring developers, companies have to make do with what they have, so a theory of software teams might not be that useful in practice.

Readers might have a completely different learning experience from reading the projects chapter. What useful things did you learn from the projects chapter?

September 19, 2020

Gustaf Erikson (gerikson)

Re-reading Dune and Heretics of Dune September 19, 2020 07:58 PM

I’ve re-read Frank Herbert’s 1965 novel Dune, partly inspired by the upcoming movie.

Based on my memories, I first read it in 1988 or so. The first novel in the series I read was actually Heretics of Dune (published in 1984), which I borrowed from the library in Halmstad. This must have been in 1986 or ’87. I’ve long realized that it’s not a huge deal to read some novel series out of order, especially ones as self-contained as the Dune novels. Heretics takes place 5,000 years after Dune, after all.

Anyway, if you’re only going to read one Dune novel, the first one is the best. It has all the goodies - the worldbuilding, the Hero’s Journey, the tight plotting and good use of language. Even the 1960s elements have aged well - while standards like telepathy are there they’re only mentioned in passing, and the central idea of prescience is part of the plot and well handled there.

I wonder what the movie will do with the implicit connection of the Fremen with modern-day inhabitants of the Middle East. While using terms like jihad was merely a frisson in the original, they take on a darker tone in today’s climate - at least among the less enlightened. I suspect the projected 2-parter will not emphasize the jihad Paul foresees throughout the novel and instead focus on the thrilling twists and turns.

After Dune I decided to re-read Heretics. There are almost 20 years between the novels, and it’s clear that Herbert has picked up a lot of contemporary SF tropes in the meantime. The tech in Dune is almost indistinguishable from magic - devices such as suspensors and personal shields are never explained, but are added to impart flavor - and to enforce the quasi-medieval setting of the universe.

Heretics is much more explicit in its descriptions of space travel, weapons and other technology, but not in a way that feels dated. However, the novel is marred by long stretches of interior dialogue, where the protagonists muse about religion, history, and fate in excruciating detail. While I admire Herbert for bringing in female protagonists (in the form of the Bene Gesserit sisterhood), they’re really not that interesting as characters.

I consider Dune a bona-fide SF classic and anyone interested in the genre should read it. But don’t feel pressured to read more from Herbert’s universe.

September 18, 2020

Gonçalo Valério (dethos)

Django Friday Tips: Inspecting ORM queries September 18, 2020 07:01 PM

Today let’s look at the tools Django provides out of the box to debug the queries made to the database through the ORM.

This isn’t an uncommon task. Almost everyone who works on a non-trivial Django application faces situations where the ORM does not return the correct data or a particular operation is taking too long.

The best way to understand what is happening behind the scenes when you build database queries using your defined models, managers and querysets is to look at the resulting SQL.

The standard way of doing this is to set the logging configuration to print all queries done by the ORM to the console. This way when you browse your website you can check them in real time. Here is an example config:

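LOGGING = {
    'version': 1,
    'filters': {'require_debug_true': {'()': 'django.utils.log.RequireDebugTrue'}},
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console', ],
        },
    },
}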

The result will be something like this:

web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = ''::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet(''), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = ''::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet(''), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | Bad Request: /users/login/
web_1     | [18/Sep/2020 18:43:20] "POST /users/login/ HTTP/1.1" 400 2687

Note: the console output will get a bit noisy.

Now let’s suppose this logging config is turned off by default (for example, on a staging server). You are manually debugging your app using the Django shell and running some queries to inspect the resulting data. In this case, str(queryset.query) is very helpful for checking whether the query you have built is the one you intended. Here’s an example:

>>> box_qs = Box.objects.filter(
>>> str(box_qs.query)
'SELECT "boxes_box"."id", "boxes_box"."name", "boxes_box"."description", "boxes_box"."uuid", "boxes_box"."owner_id", "boxes_box"."created_at", "boxes_box"."updated_at", "boxes_box"."expires_at", "boxes_box"."status", "boxes_box"."max_messages", "boxes_box"."last_sent_at" FROM "boxes_box" WHERE ("boxes_box"."expires_at" > 2020-09-18 18:06:25.535802+00:00 AND NOT ("boxes_box"."owner_id" = 10))'

If the problem is related to performance, you can check the query plan to see if it hits the right indexes using the .explain() method, like you would normally do in SQL.

>>> print(box_qs.explain(verbose=True))
Seq Scan on public.boxes_box  (cost=0.00..13.00 rows=66 width=370)
  Output: id, name, description, uuid, owner_id, created_at, updated_at, expires_at, status, max_messages, last_sent_at
  Filter: ((boxes_box.expires_at > '2020-09-18 18:06:25.535802+00'::timestamp with time zone) AND (boxes_box.owner_id <> 10))

This is it, I hope you find it useful.

September 17, 2020

Gustaf Erikson (gerikson)

Six months since WFH began September 17, 2020 02:57 PM

September 16, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Primitive binary functions September 16, 2020 05:00 AM


Welcome back to the “Compiling a Lisp” series. Last time, we added some primitive unary instructions like add1 and integer->char. This time, we’re going to add some primitive binary functions like + and <. After this post, we’ll be able to compile programs like:

(< (+ 1 2) (- 4 3))

Note that these expressions may look like function calls but, as in the last chapter, they do not open new stack frames (which I’ll explain more about later). Instead, the compiler will recognize that the programmer is directly applying the symbol + and generate special code. You can think about this kind of like an inlined function call.

It’s important to remember that the compiler has a certain internal contract: the result of any given compiled expression is stored in rax. This isn’t some intrinsic property of all compilers, but it’s one we’ve kept so far in this series.

This is similar to but not the same as the calling convention that I mentioned earlier, where function results are stored in rax. That calling convention is for interacting with other people’s code. Within your own generated code, there are no rules. So we could pick any other register, really, for storing intermediate results.

Now that we’re building primitive functions that can take two arguments, you might notice a problem: our strategy of storing the result in rax won’t work on its own. If we were to naïvely write something like the following to implement +, then rax would get overwritten in the code generated by compiling operand1(args):

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args)));
      // The result of this is stored in rax ^
      _(Compile_expr(buf, operand1(args)));
      // Oops, we just overwrote rax ^
      Emit_add_something(buf, /*dst=*/kRax);
      return 0;
    }
    // ...
  }
  // ...
}

We could try to work around this by adding some kind of register allocation algorithm and taking advantage of rcx, rdx, etc. Or we could allocate all intermediate values on the stack and move on with our lives. I prefer the latter because it’s simpler.

Stack background info

Since we can’t yet save our compiled programs to disk, there’s some amount of setup that has to happen before they’re run. Right now, the C programs I’m providing along with this series compile to binaries that just run the test suites for the compiler. They don’t actually run full programs. For this reason, there are already some call frames on the stack by the time our generated code is run.

Let’s take a look at the stack at the moment we enter a compiled Lisp program:

|                  | High addresses
|  main            |
+------------------+ |
|  ~ some data ~   | |
|  ~ some data ~   | |
+------------------+ |
|  compile_test    | |
+------------------+ |
|  ~ some data ~   | |
|  ~ some data ~   | v
|  Testing_exe...  | rsp (stack pointer)
|                  | <-- Our frame!
|                  | Low addresses

In this diagram, we have the C program’s main function, which has its own local variables and so on. Then the main function calls the compile_test unit suite. This in turn calls this Testing_execute_expr function (abbreviated in the diagram), which is responsible for calling into our generated code. Every call stores the return address (some place to find the next instruction to execute) on the stack and adjusts rsp down.

Refresher: the call stack grows down. Why? Check out this StackOverflow answer that quotes an architect on the Intel 4004 and 8080 architectures. It’s stayed the same ever since.

In this diagram, rsp points at a return address into the function Testing_execute_expr, since that’s what called our Lisp entrypoint. We have some data “above” (higher addresses) rsp that we’re not allowed to poke at, and we have this empty space “below” (lower addresses) rsp that is in our current stack frame. I say “empty” because we haven’t yet stored anything there, not because it’s necessarily zeroed out. I don’t think there are any guarantees about the values in this stack frame.

We can use our stack frame to write and read values for our current Lisp program. With every recursive subexpression, we can allocate a little more stack space to keep track of the values. When I say “allocate”, I mean “subtract from the stack pointer”, because the stack is already a contiguous space in memory allocated for us. For example, here is how we can write to the stack:

mov QWORD PTR [rsp-8], 0x4

This puts the integer 4 at displacement -8 from rsp. On the stack diagram above, it would be at the slot labeled “Our frame”. It’s also possible to read with a positive or zero displacement, but those point to previous stack frames and the return address, respectively. So let’s avoid manipulating those.

Note that I used a multiple of 8. Not every store has to be to an address that is a multiple of 8, but it is natural, and I think also faster, to store 8-byte-sized things at aligned addresses.

Let’s walk through a real example to get more hands-on experience with this stack storage idea. We’ll use the program (+ 1 2). The compiled version of that program should:

  • Move compile(2) to rax
  • Move rax into [rsp-8]
  • Move compile(1) to rax
  • Add [rsp-8] to rax

So after compiling that, the stack will look like this:

|                  | High addresses
|  Testing_exe...  | RSP
|  0x8             | RSP-8 (result of compile(2))
|                  | Low addresses

And the result will be in rax, per our internal compiler contract.

This is all well and good, but at some point we’ll need our compiled programs to emit the push instruction or make function calls of their own. Both of these modify the stack pointer. push writes to the stack and decrements rsp. call is roughly equivalent to push followed by jmp.

For that reason, x86-64 comes with another register, rbp, designed to hold the Base Pointer. While the stack pointer is supposed to track the “top” (low address) of the stack, the base pointer is meant to keep a pointer around to the “bottom” (high address) of our current stack frame.

This is why in a lot of compiled code you see the following instructions repeated (see footnote 1):

push rbp
mov rbp, rsp
sub rsp, N  ; optional; allocate stack space for locals
; ... function body ...
mov rsp, rbp  ; required if you subtracted from rsp above
pop rbp
ret

The first three instructions, called the prologue, save the caller’s rbp on the stack, set rbp to the current stack pointer, and optionally reserve stack space for locals. From then on, variable locations on the stack can be referenced at fixed offsets from rbp even as rsp changes. Yes, the compiler could instead adjust its internal table of references every time it emits code that modifies rsp, but that sounds much harder.

The instructions at the end, called the epilogue, restore rsp from rbp, pop the old rbp that we saved on the stack back into rbp, and then return from the call with ret.

To confirm this for yourself, check out this sample compiled C code. Look at the disassembly following the label square. Prologue, code, epilogue.

Stack allocation infrastructure

Until now, we haven’t needed to keep track of much as we recursively traverse expression trees. Now, in order to keep track of how much space on the stack any given compiled code will need, we have to add more state to our compiler. We’ll call this state the stack_index — Ghuloum calls it si — and we’ll pass it around as a parameter. Whatever it’s called, it points to the first writable (unused) index in the stack at any given point.

In compiled functions, the first writable index is -kWordSize (-8), since [rbp+0] already holds the caller’s saved base pointer.

int Compile_function(Buffer *buf, ASTNode *node) {
  Buffer_write_arr(buf, kFunctionPrologue, sizeof kFunctionPrologue);
  _(Compile_expr(buf, node, -kWordSize));
  Buffer_write_arr(buf, kFunctionEpilogue, sizeof kFunctionEpilogue);
  return 0;
}

I’ve also gone ahead and added the prologue and epilogue. They’re stored in static arrays. This makes them easier to modify, and also makes them accessible to testing helpers. The testing helpers can use these arrays to make testing easier for us — we can check if our expected code is book-ended by this code.

static const byte kFunctionPrologue[] = {
    // push rbp
    0x55,
    // mov rbp, rsp
    kRexPrefix, 0x89, 0xe5,
};

static const byte kFunctionEpilogue[] = {
    // pop rbp
    0x5d,
    // ret
    0xc3,
};

For Compile_expr, we just pass this new stack index through.

int Compile_expr(Buffer *buf, ASTNode *node, word stack_index) {
  // ...
  if (AST_is_pair(node)) {
    return Compile_call(buf, AST_pair_car(node), AST_pair_cdr(node),
                        stack_index);
  }
  // ...
}

And for Compile_call, we actually get to use it. Let’s look back at our stack storage strategy for compiling (+ 1 2) (now replacing rsp with rbp):

  • Move compile(2) to rax
  • Move rax into [rbp-8]
  • Move compile(1) to rax
  • Add [rbp-8] to rax

For binary functions, this can be generalized to:

  • Compile arg2 (stored in rax)
  • Move rax to [rbp+stack_index] (stack_index is negative)
  • Compile arg1 (stored in rax)
  • Do something with the results (in [rbp+stack_index] and rax)

The key is this: for the first recursive call to Compile_expr, the compiler is allowed to emit code that can use the current stack_index and anything below that on the stack. For the second recursive call to Compile_expr, the compiler has to bump stack_index, since we’ve stored the result of the first compiled call at stack_index.

Take a look at our implementation of binary add:

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args), stack_index));
      Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                              /*src=*/kRax);
      _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
      Emit_add_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, stack_index));
      return 0;
    }
    // ...
  }
  // ...
}

In this snippet, Ind stands for “indirect” and is a constructor for a struct. This is an easy and readable way to represent (register, displacement) pairs for reading from and writing to memory. We’ll cover this in more detail when we get to the instruction encoding.

To prove to ourselves that this approach works, we’ll add some tests later.

Other binary functions

Subtraction, multiplication, and division are much the same as addition. We’re also going to completely ignore overflow, underflow, etc.

Equality is different in that it does some comparisons after the fact (see Primitive unary functions). To check if two values are equal, we compare their pointers:

    if (AST_symbol_matches(callable, "=")) {
      _(Compile_expr(buf, operand2(args), stack_index));
      Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                              /*src=*/kRax);
      _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
      Emit_cmp_reg_indirect(buf, kRax, Ind(kRbp, stack_index));
      Emit_mov_reg_imm32(buf, kRax, 0);
      Emit_setcc_imm8(buf, kEqual, kAl);
      Emit_shl_reg_imm8(buf, kRax, kBoolShift);
      Emit_or_reg_imm8(buf, kRax, kBoolTag);
      return 0;
    }

It uses a new comparison opcode that compares a register with some memory. This is why we can’t use the Compile_compare_imm32 helper function.

The less-than operator (<) is very similar to equality, but we use setcc with the kLess flag instead of the kEqual flag.

New opcodes

We used some new opcodes today, so let’s take a look at the implementations. First, here is the indirection implementation I mentioned earlier:

typedef struct Indirect {
  Register reg;
  int8_t disp;
} Indirect;

Indirect Ind(Register reg, int8_t disp) {
  return (Indirect){.reg = reg, .disp = disp};
}
I would have used the same name in the struct and the constructor but unfortunately that’s not allowed.

Here’s an implementation of an opcode that uses this Indirect type. This emits code for instructions of the form mov [reg+disp], src.

uint8_t disp8(int8_t disp) { return disp >= 0 ? disp : 0x100 + disp; }

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  Buffer_write8(buf, 0x40 + src * 8 + dst.reg);
  Buffer_write8(buf, disp8(dst.disp));
}

The disp8 function is a helper that encodes negative numbers.

The opcodes for add, sub, and cmp are similar enough to this one, except src and dst are swapped. mul is a little funky because it doesn’t take two parameters. It assumes that one of the operands is always in rax.


As usual, we’ll close with some snippets of tests.

Here’s a test for +. I’m trying to see if inlining the text assembly with the hex makes it more readable. Thanks Kartik for the suggestion.

TEST compile_binary_plus(Buffer *buf) {
  ASTNode *node = new_binary_call("+", AST_new_integer(5), AST_new_integer(8));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // 0:  48 c7 c0 20 00 00 00    mov    rax,0x20
      0x48, 0xc7, 0xc0, 0x20, 0x00, 0x00, 0x00,
      // 7:  48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
      0x48, 0x89, 0x45, 0xf8,
      // b:  48 c7 c0 14 00 00 00    mov    rax,0x14
      0x48, 0xc7, 0xc0, 0x14, 0x00, 0x00, 0x00,
      // 12: 48 03 45 f8             add    rax,QWORD PTR [rbp-0x8]
      0x48, 0x03, 0x45, 0xf8};
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(13));
  PASS();
}

Here’s a test for <.

TEST compile_binary_lt_with_left_greater_than_right_returns_false(Buffer *buf) {
  ASTNode *node = new_binary_call("<", AST_new_integer(6), AST_new_integer(5));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_false(), result, "0x%lx");
  PASS();
}

There are more tests in the implementation, as usual. Take a look if you like.

This has been a more complicated post than the previous ones, I think. The stack allocation may not make sense immediately. It might take some time to sink in. Try writing some of the code yourself and see if that helps.

Next time we’ll add the ability to bind variables using let, or perhaps a parser so we can input expressions more easily. See you then!

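  1. You may also see an enter instruction paired with a leave instruction. These are roughly equivalent to the prologue and epilogue above: enter does the push/mov/sub setup, and leave does mov rsp, rbp followed by pop rbp. Read more here.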

September 15, 2020

Kevin Burke (kb)

How to Get a Human Operator on the California EDD Paid Family Leave line September 15, 2020 08:23 PM

The California EDD Paid Family Leave phone tree is like a choose your own adventure book, where almost every option leaves you with no option to contact a human. This can be frustrating. But you can reach a human if you know the right buttons to press!

Here is how to reach a human:

  • Call the EDD Paid Family Leave number at 877-238-4373.

  • Press '1' for "benefit information."

  • Follow the prompts to enter your SSN, zip code, date of birth, and weekly benefit amount.

  • The computer will read you an automated list of information about your claim.

The computer will then read a list of prompts. Wait!! After the computer asks if you want to go back to the main menu it will say "press 0 to speak to a human." Press 0 and then wait and you should get a human!

Unrelenting Technology (myfreeweb)

Burstable Graviton2 inst... September 15, 2020 08:23 PM

Burstable Graviton2 instances are now a thing. Cool! Changed the instance type for this website from a1.medium to t4g.micro so that Jeff Bezos gets less of my money :P (Basically no money until the end of this year, even — there’s a free trial for t4g.micro for all AWS accounts!)

Eric Faehnrich (faehnrich)

T Puzzle September 15, 2020 01:57 AM

The T puzzle has four T pieces, and you have to fit them all into a frame so that they lie flat without overlapping. Each side has a frame: one is a little larger, so it's easier, and the other is smaller, so it's harder.

Here is the file I made to laser cut it.

Spoiler: it's set up as the solution to the harder puzzle. That's because the pieces have to be more compact to fit in the harder puzzle.

Also, since the Ts are more compact, they are placed next to each other. But the way I made the file, the edges where the Ts touch are just stacked so they're doubled up. This means a bunch of places on the Ts, the laser goes over again.

It prints just fine. In fact, the extra passes shave a little more off the edges of the Ts, which adds a bit of tolerance and makes the puzzle easier to put together, but as far as I could tell it doesn't let you cheat the puzzle. I also found that if the laser doesn't run over the edges these extra few times, the Ts are so tight that they don't fit in the puzzle.

September 14, 2020

Ponylang (SeanTAllen)

Last Week in Pony - September 13, 2020 September 14, 2020 12:20 AM

A Pony talk given by Sophia Drossopoulou is now available on InfoQ.

September 13, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Reliability chapter of my book September 13, 2020 09:38 PM

What useful, practical things might professional software developers learn from my evidence-based software engineering book?

Once the book is officially released I need to have good answers to this question (saying: “Well, I decided to collect all the publicly available software engineering data and say something about it”, is not going to motivate people to read the book).

This week I checked the reliability chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

A casual reader skimming the chapter would conclude that little was known about software reliability, and they would be right (I already knew this, but I learned that we know even less than I thought was known), and many researchers continue to dig in unproductive holes.

A reader with some familiarity with reliability research would be surprised to see that some ‘major’ topics are not discussed.

The train wreck that is machine learning has been avoided (not forgetting that the data used is mostly worthless), mutation testing gets mentioned because of some interesting data (the underlying problem is that mutation testing assumes that coding mistakes are local to one line, but in practice coding mistakes often involve multiple lines), and the theory discussions don’t mention non-homogeneous Poisson process as the basis for software fault models (because this process is not capable of solving the questions asked).

What did I learn? My highlights include:

  • Anne Chao’s work on population estimation. The takeaway from this work is that if people want to estimate the number of remaining fault experiences, based on previous experienced faults, then every occurrence (i.e., not just the first) of a fault needs to be counted,
  • Phyllis Nagel and Janet Dunham’s top read work on software testing,
  • the variability in the numeric percentage that people assign to probability terms (e.g., almost all, likely, unlikely) is much wider than I would have thought,
  • the impact of the distribution of input values on fault experiences may be detectable,
  • really a lowlight, but there is a lot less publicly available data than I had expected (for the other chapters there was more data than I had expected).

The last decade has seen fuzzing grow to dominate the headlines around software reliability and testing, and provide data for people who write evidence-based books. I don’t have much of a feel for how widely used it is in industry, but it is a very useful tool for reliability researchers.

Readers might have a completely different learning experience from reading the reliability chapter. What useful things did you learn from the reliability chapter?

Patrick Louis (venam)

Did You Know Fonts Could Do All This? September 13, 2020 09:00 PM

Confusing Mexican Calendar, at least for those not in the know

Freetype, included in the font stack on Unix, is quite complex. There are so many layers to get it to do what it does that it’s easy to get lost. From finding the font, to actually rendering it, and everything in between.
Like most of the world, I use a rather low screen resolution (1366x768 at 96 dpi) and a rather old-ish laptop, unlike some font designers who live in a filter bubble where everyone has the latest MacBook. Thus, good and legible font rendering is important.
Let’s play with the lesser-known toggles available to us when it comes to font rendering, see what they do, and have fun exploring the possibilities.

A General Picture

Generally, to make a font look better on screens, which are arrays of pixels, we use a combination of these three:

  • Antialiasing: Applying a light shade around the glyph. It is useful at small scale, when you don’t have enough pixels, but it makes most glyphs look bolder.

Font anti-alias example

  • Subpixel rendering: A technique similar to antialiasing but using subpixels, the color components inside the pixels. By applying a small amount of color on the sides you can reach more granular precision. However, if applied clumsily, or if you simply move the window containing the text, these colored subpixels become apparent, an artifact we call fringe.

Font sub-pixel rendering example

  • Hinting: Pixels are blocks but text is made of curves, that means these curves will never match exactly with screen pixels. Hinting is about repositioning or selecting the closest pixels while trying as much as possible to keep the shape of the glyph intact. There are multiple levels of hinting, hinting information provided by the font itself (bytecode interpreter hinting), and hinting provided by the rendering library (auto-hinting).

Font hinting example

NB: “It’s just text”… This article is yet another that shows how fonts aren’t as easy as they look. For more info about the font stack, please visit my previous article on the topic, and if you want an idea of what it means to draw them on the screen, take a look at this article.

What is applied, when, how to control all of this, can we see what they do, and should we even care?

Freetype and fontconfig default rendering these days is pretty good, so there shouldn’t be anything to worry about, until there’s something to worry about, like a font not looking the way you want.
Our first stop will be something that intrigued me because I haven’t heard many people talk about it: the Freetype driver’s properties.
The Freetype driver is used whenever hinting is needed, so this is the part it actually changes — how hinting is applied.

Getting The Right Tools For The Task

Let’s start with arming ourselves with ways to easily test all this.
The Freetype2 demo utilities are a must; you can clone them here or fetch them from your package repository, for example on Debian and Arch Linux.
These will give you a bunch of useful tools such as ftdiff, ftview, ftstring, ftgrid, fttimer, ftbench, and others. The most important ones for us are ftdiff and ftgrid.

Example usage:

ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf
ftgrid -r 96 -f 20 10 ~/.local/share/fonts/times.ttf
ftstring -r 96 -m 'Hello World!' 10 ~/.local/share/fonts/times.ttf

Additionally, you can install pango-view from pango-tools to later test if fontconfig applies your configurations properly. It can be used by preparing a file written in pango markup and displaying it using pango-view --markup file.pangpang.
You can raise the fontconfig debug level to see which font is actually loaded by setting the FC_DEBUG environment variable to something like 4096: FC_DEBUG=4096.

More values can be found here. We’ll use them later to see if our fontconfig settings are applied properly:

Name         Value    Meaning
MATCH            1    Brief information about font matching
MATCHV           2    Extensive font matching information
EDIT             4    Monitor match/test/edit execution
FONTSET          8    Track loading of font information at startup
CACHE           16    Watch cache files being written
CACHEV          32    Extensive cache file writing information
PARSE           64    (no longer in use)
SCAN           128    Watch font files being scanned to build caches
SCANV          256    Verbose font file scanning information
MEMORY         512    Monitor fontconfig memory usage
CONFIG        1024    Monitor which config files are loaded
LANGSET       2048    Dump char sets used to construct lang values
MATCH2        4096    Display font-matching transformation in patterns

Yet another way is to test directly in your browser URL bar:

data:text/html,<meta charset="utf8"><p style="font-family: Times New Roman;">Hello World</p>

The Freetype2 Drivers Properties

So let’s get back to our testing of Freetype2 drivers.
On this documentation page, the ft (freetype) properties are listed and are said to affect the behavior of the drivers, each property touching a different driver. They are set by modifying the FREETYPE_PROPERTIES environment variable, normally loaded from /etc/profile.d/.
However, most of the ones listed are targeted at the CFF, Type 1, and CID fonts driver and not at TrueType fonts, so they do nothing if you don’t have these font types. The only toggle available for TrueType is the interpreter-version which controls the bytecode interpreter, the rasterizer, and thus how the outline gets hinted.

The options available to us are the following:

  • 35 — For classic mode GDI (Win 98/2000)
  • 38 — GDI+ old (Vista, Win 7), Infinality, considered slow
  • 40 — For minimal mode (stripped down Infinality, this is the default) (After Win 7)

Kind of weird that we jump from 35 to 38, where did 36 and the rest go? The answer is that it’s a choice from the Freetype devs to only include those and not the ones in between.

And the differences look as follows, notice the native hinter in the left column:

  • v35
FREETYPE_PROPERTIES="truetype:interpreter-version=35" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v35

FREETYPE_PROPERTIES="truetype:interpreter-version=35" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v35

  • v38
FREETYPE_PROPERTIES="truetype:interpreter-version=38" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v38

FREETYPE_PROPERTIES="truetype:interpreter-version=38" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v38

  • v40
FREETYPE_PROPERTIES="truetype:interpreter-version=40" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v40

FREETYPE_PROPERTIES="truetype:interpreter-version=40" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v40

We can also test using pango-view (remember again that this should be a font that has native hinting enabled but not the auto-hinter):

<span font_family="Times New Roman" font="10" foreground="black" alpha="83%">
Lorem ipsum dolor sit amet, c
onsectetur adipiscing elit, s
ed do eiusmod tempor incididu
nt ut labore et dolore magna 
aliqua. Ut enim ad minim venia
m, quis nostrud exercitation u
llamco laboris nisi ut aliquip
ex ea commodo consequat. Duis 
aute irure dolor in reprehende
rit in voluptate velit esse ci
llum dolore eu fugiat nulla pa
riatur. Excepteur sint occaeca
t cupidatat non proident, sunt
in culpa qui officia deserunt 
mollit anim id est laborum.
</span>

You can also change the font via the --font= argument of pango-view.

FREETYPE_PROPERTIES="truetype:interpreter-version=35" pango-view --markup text.bangarang
  • v35

pango interpreter v35

  • v38

pango interpreter v38

  • v40

pango interpreter v40

So, clearly, older interpreter versions were rougher with hinting: much bolder, and they could deform the glyphs. The newer ones are more minimal with it. We also notice that the auto-hinter isn’t that bad and that avoiding hinting can help. I took the specific case of the Windows font ‘Times New Roman’ because it has a reputation for rendering badly with Freetype, mostly because of the job the interpreter does. Applying very light hinting, or none at all, helps tremendously, even at very small point sizes, as you can see in the next comparison. The hinting does help legibility at this scale, but the font’s shape and personality are completely destroyed.

From left to right: v35, v38, v40.

pango interpreter small point comparison

How Fontconfig Works

We’re not done with hinting yet; there are many levels of hinting that can be applied. But let’s first take a detour and learn a bit about fontconfig and how to use it.

Fontconfig is the layer in the font stack responsible for loading the font along with the configurations that tell the next layer how to find the font file and what changes to apply when rendering it. It is usually composed of a library, a preset of configuration files, and a bunch of helpful tools all starting with the prefix fc- such as: fc-cache, fc-query, fc-match, and fc-conflist, to name a few.

The configuration files are usually found in /etc/fonts/ and split into the available presets in /etc/fonts/conf.avail and the chosen presets in /etc/fonts/conf.d, which are symbolic links to the former.
The precedence of the rules is alphanumerical, on a first-come, first-served principle, so 01-custom-rule.conf will be loaded before 99-not-important-rule.conf. Local user configurations, in the user’s $XDG_CONFIG_HOME/fontconfig directory, are loaded by whichever of these configurations contains the corresponding include statement. On my machine it is 50-user.conf, so its precedence is lower than anything loaded before it. This isn’t practical when testing rules, so rename this file to something like 01-user.conf. Now anything you put in $XDG_CONFIG_HOME/fontconfig/conf.d or $XDG_CONFIG_HOME/fontconfig/fonts.conf should have priority.
You can make sure the order and configurations are loaded properly by using the fc-conflist command. It lists the configurations found in order of precedence: the ones starting with a + are loaded, and the ones with - are not.
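To make the ordering concrete, here is a small Python sketch that mimics it. It sorts conf.d entries by name, assuming only files ending in .conf are read. It also splits fc-conflist-style output into loaded and skipped files. The parsed line format (`+ path: description`) is an assumption based on the description above, so check it against your own fc-conflist output. Both function names are hypothetical.

```python
def load_order(file_names):
    """Order conf.d entries the way described above: *.conf files, sorted by name."""
    return sorted(name for name in file_names if name.endswith(".conf"))


def parse_conflist(output):
    """Split fc-conflist-style output into (loaded, not_loaded) lists of paths.

    Assumes each relevant line looks like "+ /path/to/file.conf: description".
    """
    loaded, not_loaded = [], []
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line[0] not in "+-":
            continue
        path = line[1:].strip().split(":", 1)[0]
        (loaded if line[0] == "+" else not_loaded).append(path)
    return loaded, not_loaded


if __name__ == "__main__":
    before = ["99-not-important-rule.conf", "50-user.conf", "01-custom-rule.conf", "README"]
    after = [name.replace("50-user", "01-user") for name in before]
    print(load_order(before))  # the user file sits between 01- and 99-
    print(load_order(after))   # after renaming, it moves up next to 01-custom-rule.conf
```

Note that 01-custom-rule.conf still sorts before 01-user.conf, because "c" comes before "u". A lower prefix such as 00- would be needed to come strictly first.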

These files are composed of mainly 4 components:

  • Match rules: If something matches, then edit the properties mentioned. There are tons of matching and editing rules, even including things like the name of the program currently trying to load the fonts, and custom ones. You can also match at different times: when looking for a pattern/font, after finding the font, or when scanning the font.
  • Aliases creation: An alias is a font name shorthand, it’s useful when querying generic family names such as “monospace”.
  • Inclusion of other configurations: There can be so many configuration files that it’s good practice to split them.
  • Where to look for settings and fonts, and if some fonts should be skipped entirely (like if they aren’t scalable — bitmap): You may think that the location of fonts is a constant value, but it’s not. For example, on my machine it’s set in /etc/fonts/fonts.conf as:
<!-- Font directory list -->
<dir prefix="xdg">fonts</dir>
<!-- the following element will be removed in the future -->
<dir>~/.fonts</dir>
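As a rough illustration of how these four components sit side by side in one file, the Python sketch below parses a small, made-up fontconfig file with the standard library's xml.etree and counts its top-level elements. The DejaVu Sans Mono alias and the bitmap-rejecting `<selectfont>` rule are illustrative assumptions, not part of the configuration discussed in this post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Top-level elements corresponding to the four components described above.
COMPONENTS = {
    "match": "match rule",
    "alias": "alias",
    "include": "include",
    "dir": "font directory",
    "selectfont": "font selection",
}

# Hypothetical example file; only its structure matters here.
SAMPLE = """<fontconfig>
  <dir prefix="xdg">fonts</dir>
  <include ignore_missing="yes">conf.d</include>
  <alias>
    <family>monospace</family>
    <prefer><family>DejaVu Sans Mono</family></prefer>
  </alias>
  <match target="font">
    <edit name="antialias" mode="assign"><bool>true</bool></edit>
  </match>
  <selectfont>
    <rejectfont>
      <pattern><patelt name="scalable"><bool>false</bool></patelt></pattern>
    </rejectfont>
  </selectfont>
</fontconfig>"""


def summarize(xml_text):
    """Count the top-level components of a fontconfig file by kind."""
    root = ET.fromstring(xml_text)
    return Counter(COMPONENTS.get(child.tag, child.tag) for child in root)


if __name__ == "__main__":
    for kind, count in summarize(SAMPLE).items():
        print(f"{kind}: {count}")
```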

Editing XML files is cumbersome. Unfortunately, there aren’t many GUIs or simpler tools to set these today. I’ve found only one to date, fontweak, and it isn’t complete.
It’s a shame, because it’s rare to find people who have a clue about how to actually set up font configuration nicely.

If you want more info, you can consult man 5 fonts-conf. It’s heavy content and can be confusing content, but still great content.

NB: Fontconfig is not enough to configure every graphical program; some programs load font settings in a simpler way through Xresources, the RESOURCE_MANAGER of X.

Testing Different Hinting

Let’s close this parenthesis and get back to hinting.
Fontconfig has 4 settings related to it, of which one is a matching criterion and the other three are edit rules. They are the following.

  • fonthashint: Matching test to check if the font has built-in hints, namely bytecode interpreter hinting.
  • hinting: If set to true, it tells the next phase, the rasterizer, that hinting in general will be applied.
  • autohint: Use the autohinter instead of the normal hinter. This will skip entirely the bytecode interpreter.
  • hintstyle: The harshness of the hinting that will be applied. It could either be hintnone, hintslight, hintmedium, or hintfull. It needs to be mentioned that these will use a mix of the autohinter and bytecode interpreter if the font has hints. For example, hintslight will snap on the vertical grid only but hintmedium and hintfull will snap harder on the horizontal grid too.
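These settings are usually written together, so a small generator helps when trying combinations like the ones tested in this section. The sketch below is a hypothetical Python helper; `hinting_match` is not a fontconfig tool. It emits a `<match target="font">` rule with the three edit rules, optionally guarded by a fonthashint test.

```python
import xml.etree.ElementTree as ET

HINTSTYLES = ("hintnone", "hintslight", "hintmedium", "hintfull")


def _edit(parent, name, value_tag, value_text):
    """Append <edit name=... mode="assign"><value_tag>value_text</value_tag></edit>."""
    edit = ET.SubElement(parent, "edit", name=name, mode="assign")
    ET.SubElement(edit, value_tag).text = value_text


def hinting_match(hinting, autohint, hintstyle, only_hinted_fonts=False):
    """Build a <match target="font"> rule combining the hinting settings above."""
    if hintstyle not in HINTSTYLES:
        raise ValueError(f"unknown hintstyle: {hintstyle}")
    match = ET.Element("match", target="font")
    if only_hinted_fonts:
        # Optional test: only apply the rule to fonts that ship bytecode hints.
        test = ET.SubElement(match, "test", name="fonthashint", compare="eq")
        ET.SubElement(test, "bool").text = "true"
    _edit(match, "hinting", "bool", "true" if hinting else "false")
    _edit(match, "autohint", "bool", "true" if autohint else "false")
    _edit(match, "hintstyle", "const", hintstyle)
    return ET.tostring(match, encoding="unicode")


if __name__ == "__main__":
    # Hinting enabled, autohint enabled, full grid snapping.
    print(hinting_match(True, True, "hintfull"))
```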

Practically, what does it mean? Let’s show what a font looks like with a combination of these hinting configurations.
Remember that if you’re having trouble applying these configurations in your user fontconfig file, you can set the FC_DEBUG environment variable we mentioned before. Always make sure everything loads properly by checking fc-conflist and the currently applied match rules via fc-match --verbose YourFontSearchHere.
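The relevant lines are easy to miss in the long fc-match --verbose output. Below is a hedged Python sketch that pulls out only the rendering properties. It assumes each property line looks like `name: value(binding)`, as in the iswindowsfont output shown later in this post. The numeric hintstyle form in the sample is also an assumption.

```python
import re
import subprocess

# Assumed shape of a property line, e.g. "\thinting: True(w)" or "\thintstyle: 1(i)(w)".
PROPERTY_LINE = re.compile(r"^\s*(\w+):\s*(.*?)\s*$")
BINDING_MARK = re.compile(r"\([a-z]\)")

WANTED = ("hinting", "autohint", "hintstyle", "antialias", "rgba", "lcdfilter")

# Made-up output used for illustration only.
SAMPLE_OUTPUT = "Pattern has 3 elts (size 16)\n\thinting: True(w)\n\tautohint: False(s)\n\thintstyle: 1(i)(w)\n"


def parse_verbose(output, wanted=WANTED):
    """Extract selected properties, dropping the (i)/(s)/(w) style suffixes."""
    found = {}
    for line in output.splitlines():
        m = PROPERTY_LINE.match(line)
        if m and m.group(1) in wanted:
            found[m.group(1)] = BINDING_MARK.sub("", m.group(2))
    return found


def query(pattern):
    """Run fc-match --verbose for a pattern and return the parsed properties."""
    result = subprocess.run(
        ["fc-match", "--verbose", pattern], capture_output=True, text=True, check=True
    )
    return parse_verbose(result.stdout)


if __name__ == "__main__":
    print(parse_verbose(SAMPLE_OUTPUT))  # use query("Times New Roman") on a real system
```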

Let’s test hinting enabled, autohint enabled, and full on grid snapping.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>true</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintfull</const></edit>

Test Hinting autohint+hintfull

What about disabling autohint while keeping full grid snapping?

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>false</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintfull</const></edit>

Test Hinting no-autohint+hintfull

Not so pretty. Maybe snapping only vertically is better; let’s try no autohinter and slight hinting.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>false</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>

Test Hinting no-autohint+hintslight

Better, but it still looks too bold. Let’s try the autohinter again, this time with softer hinting.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>true</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>

Test Hinting autohint+hintslight

It looks very similar to the full hinting. Let’s test without hinting at all.

<edit mode="assign" name="hinting"><bool>false</bool></edit>

Test Hinting disabled

It seems like the auto-hinter is doing a good job at aligning the letters vertically in a subtle way. When zoomed in, you can clearly see how the letters seem a bit more compressed with the auto-hinter turned on.

Test Hinting vs No-Hinting

Overall, for the specific font I tested, “Times New Roman”, no hinting at all or slight auto-hinting are the best on my display.

Subpixel Rendering

Let’s move to subpixel rendering.
Fontconfig offers a few presets for how harshly subpixel rendering is filtered. lcddefault is color-balanced and normalized. lcdlegacy is neither normalized nor color-balanced and uses any sub-pixels it can find. lcdlight is similar to lcddefault but applies a lighter filter to the surrounding pixels. Finally, lcdnone disables it.
There are also ways to enable Microsoft’s ClearType subpixel rendering by recompiling Freetype (disabled by default because of patents), and ways to tweak the subpixel rendering matrix by manually editing the Freetype code. But why go through the hassle?

Before testing these, you should find out the subpixel geometry of your screen by consulting this page, and set it as the rgba property. Normally, preset files such as 10-sub-pixel-rgb.conf already come installed, so you simply have to symlink them into the /etc/fonts/conf.d directory.
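The rgba and lcdfilter choices can also be written as a single rule of your own. Below is a hypothetical Python helper that validates the constant names mentioned above and emits the corresponding edits. Using target="font" with mode="assign" is an assumption; the shipped preset files may use a different target or mode.

```python
import xml.etree.ElementTree as ET

RGBA = ("rgb", "bgr", "vrgb", "vbgr", "none")
LCD_FILTERS = ("lcdnone", "lcddefault", "lcdlight", "lcdlegacy")


def subpixel_match(rgba, lcdfilter="lcddefault"):
    """Build a rule setting the subpixel geometry and the LCD filter preset."""
    if rgba not in RGBA:
        raise ValueError(f"unknown rgba value: {rgba}")
    if lcdfilter not in LCD_FILTERS:
        raise ValueError(f"unknown lcdfilter value: {lcdfilter}")
    match = ET.Element("match", target="font")
    for name, value in (("rgba", rgba), ("lcdfilter", lcdfilter)):
        edit = ET.SubElement(match, "edit", name=name, mode="assign")
        ET.SubElement(edit, "const").text = value
    return ET.tostring(match, encoding="unicode")


if __name__ == "__main__":
    # An rgb screen with the default, color-balanced filter.
    print(subpixel_match("rgb"))
```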

NB: These tests don’t seem to show differences with pango-view but starting any other graphical program should be enough.
NB: Fringes are more apparent with white text on black background.

Here’s the result of the comparison: you can clearly see the fringes when the wrong subpixel geometry is chosen (my screen has rgb geometry). Also, no subpixel rendering at all seems like a very good choice for bitmap fonts; keep this in mind.

Test Subpixel geometry comparison

I’ve tried to notice the differences between lcddefault, lcdlight, and lcdlegacy but it’s so minimal that it isn’t worth mentioning. So lcddefault should be fine in most cases. Someone made a comparison on this website if you want to check.

NB: It is rare, but if fonts look deformed on your screen it might be because your DPI isn’t detected properly by fontconfig. Find it on X11 by doing xdpyinfo | grep -B 2 resolution and set it with the following match:

<match target="pattern">
	<edit name="dpi" mode="assign"><double>96</double></edit> <!-- example value: use the one xdpyinfo reports -->
</match>


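If you'd rather not copy the number by hand, the sketch below reads the xdpyinfo output and produces the dpi rule. The regular expression assumes the usual `resolution: 96x96 dots per inch` line. Only the horizontal value is used. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed xdpyinfo line shape: "  resolution:    96x96 dots per inch"
RESOLUTION = re.compile(r"resolution:\s*(\d+)x(\d+)\s+dots per inch")


def dpi_from_xdpyinfo(output):
    """Return the horizontal DPI reported by xdpyinfo."""
    m = RESOLUTION.search(output)
    if m is None:
        raise ValueError("no resolution line found")
    return int(m.group(1))


def dpi_match(dpi):
    """Build the <match target="pattern"> rule that assigns a fixed DPI."""
    match = ET.Element("match", target="pattern")
    edit = ET.SubElement(match, "edit", name="dpi", mode="assign")
    ET.SubElement(edit, "double").text = f"{dpi:g}"
    return ET.tostring(match, encoding="unicode")


if __name__ == "__main__":
    # Made-up xdpyinfo excerpt for illustration.
    sample = (
        "screen #0:\n"
        "  dimensions:    1920x1080 pixels (508x285 millimeters)\n"
        "  resolution:    96x96 dots per inch\n"
    )
    print(dpi_match(dpi_from_xdpyinfo(sample)))
```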
Antialiasing is the setting you should almost never turn off, unless your font is bitmap/non-scalable.
This picture clearly shows the advantage of antialias on scalable fonts. On the right is the non-antialiased version.

Test Anti-Alias comparison

Weird things happen when the 10-scale-bitmap-fonts.conf preset is present. The following image shows a bitmap font without hinting and antialias on the left and on the right with them. Removing this file should fix the font and show it as crisp as possible.

Test Anti-Alias bitmap

NB: If you want to convert bitmap/pcf/bdf fonts to be supported by Pango see this thread on the forums.

Applying What We’ve Learned

Some fonts are known to render badly with Freetype, such as Windows fonts. So let’s test what we’ve learned to make them look better.

You can get a copy of the Windows fonts from a Windows machine; they are in the C:\Windows\Fonts\* directory (PS: I do not take responsibility if you do this, for legal reasons).
You should now have the fonts, put them in either $XDG_DATA_HOME/fonts (usually $HOME/.local/share/fonts) or $XDG_DATA_DIRS/fonts (usually /usr/share/fonts).
Be sure to have followed the previous advice of renaming 50-user.conf to 01-user.conf, and confirm that your local font configuration is the first by executing fc-conflist.

Now let’s list the names of all the Windows fonts we got:

fc-query --format='%{family}\n' * | sort | uniq
  • Arial
  • Arial Black
  • Calibri
  • Calibri Light
  • Cambria
  • Cambria Math
  • Comic Sans MS
  • Consolas
  • Georgia
  • Impact
  • Javanese Text
  • Segoe Print
  • Segoe Script
  • Segoe UI
  • Segoe UI Emoji
  • Segoe UI Historic
  • Segoe UI Black
  • Segoe UI Light
  • Segoe UI Semibold
  • Segoe UI Semilight
  • Segoe UI Symbol
  • Tahoma
  • Times New Roman
  • Trebuchet MS
  • Verdana
  • Webdings
  • Wingdings

And let’s add some rules to our fontconfig file as follows:

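<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
	<description>Make Windows Font Look Good</description>

	<match target="font">
		<test qual="any" name="family" compare="eq">
			<string>Arial</string>
			<string>Arial Black</string>
			<string>Calibri</string>
			<string>Calibri Light</string>
			<string>Cambria</string>
			<string>Cambria Math</string>
			<string>Comic Sans MS</string>
			<string>Consolas</string>
			<string>Georgia</string>
			<string>Impact</string>
			<string>Javanese Text</string>
			<string>Segoe Print</string>
			<string>Segoe Script</string>
			<string>Segoe UI</string>
			<string>Segoe UI Emoji</string>
			<string>Segoe UI Historic</string>
			<string>Segoe UI Black</string>
			<string>Segoe UI Light</string>
			<string>Segoe UI Semibold</string>
			<string>Segoe UI Semilight</string>
			<string>Segoe UI Symbol</string>
			<string>Tahoma</string>
			<string>Times New Roman</string>
			<string>Trebuchet MS</string>
			<string>Verdana</string>
			<string>Webdings</string>
			<string>Wingdings</string>
		</test>
		<edit name="iswindowsfont" mode="assign">
			<bool>true</bool>
		</edit>
	</match>

	<match target="font">
		<test name="iswindowsfont" compare="eq">
			<bool>true</bool>
		</test>
		<edit mode="assign" name="hinting"><bool>true</bool></edit>
		<edit name="autohint" mode="assign"><bool>true</bool></edit>
		<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>
		<edit mode="assign" name="antialias"><bool>true</bool></edit>
		<edit name="embeddedbitmap" mode="assign"><bool>false</bool></edit>
	</match>
</fontconfig>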


File also hosted here

This may look like a big script, and it might be your first time seeing someone write such a script for fontconfig, but don’t worry. It’s pretty simple overall. It checks the family name of the font and, if it matches, sets a custom property, iswindowsfont, to true. Then, if this property is set, it configures the values we want for this group of fonts. You can play with the values if you aren’t satisfied; the grouping should help.
You shouldn’t even have to run fc-cache, this should take effect as soon as you restart an application that uses fontconfig.
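The same two-step pattern (tag a group of families, then configure the group) can also be generated instead of typed. A minimal Python sketch follows; the function names are hypothetical. The values in SETTINGS are an assumption, chosen to match the slight auto-hinting that looked best earlier. Also, ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Assumed rendering values (slight auto-hinting, no embedded bitmaps); adjust to taste.
SETTINGS = (
    ("hinting", "bool", "true"),
    ("autohint", "bool", "true"),
    ("hintstyle", "const", "hintslight"),
    ("antialias", "bool", "true"),
    ("embeddedbitmap", "bool", "false"),
)


def build_config(families, settings=SETTINGS, flag="iswindowsfont"):
    """Return a <fontconfig> element that tags `families` with `flag`, then styles them."""
    root = ET.Element("fontconfig")
    ET.SubElement(root, "description").text = "Make Windows Font Look Good"

    # Rule 1: if any family name of the font matches, set the custom boolean property.
    tag_rule = ET.SubElement(root, "match", target="font")
    test = ET.SubElement(tag_rule, "test", qual="any", name="family", compare="eq")
    for family in families:
        ET.SubElement(test, "string").text = family
    edit = ET.SubElement(tag_rule, "edit", name=flag, mode="assign")
    ET.SubElement(edit, "bool").text = "true"

    # Rule 2: apply the rendering settings to every font carrying that property.
    style_rule = ET.SubElement(root, "match", target="font")
    test = ET.SubElement(style_rule, "test", name=flag, compare="eq")
    ET.SubElement(test, "bool").text = "true"
    for name, value_tag, value in settings:
        edit = ET.SubElement(style_rule, "edit", name=name, mode="assign")
        ET.SubElement(edit, value_tag).text = value
    return root


def to_file_text(root):
    """Serialize with the XML declaration and DOCTYPE that fontconfig files use."""
    ET.indent(root, space="\t")  # requires Python 3.9+
    header = '<?xml version="1.0"?>\n<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
    return header + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(to_file_text(build_config(["Arial", "Calibri", "Cambria", "Times New Roman"])))
```

You could save that output as a .conf file in $XDG_CONFIG_HOME/fontconfig/conf.d instead of writing the rules by hand. Afterwards, run fc-match --verbose to confirm the flag shows up.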

fc-match --verbose 'Cambria' | grep iswindowsfont
# iswindowsfont: True(w)


This is it for this post.
I hope you’ve learned a thing or two about font configurations with Freetype and Fontconfig and were surprised by at least one of them.

If you’ve enjoyed my article, have comments, suggestions, or simply want to say thanks, please leave a comment.



  • Internet Archive Book Images / No restrictions

Gonçalo Valério (dethos)

The app I’ve used for the longest period of time September 13, 2020 03:18 PM

What is the piece of software (app) you have used continuously for the longest period of time?

This is an interesting question. More than 2 decades have passed since I got my first computer. Throughout all this time my usage of computers evolved dramatically; most of the software I installed at the time no longer exists or is so outdated that there is no point in using it.

Even the “type” of software changed: before, I didn’t rely on the many web apps and SaaS (Software as a Service) products that dominate the market nowadays.

The devices we use to run the software also changed, now it’s common for people to spend more time on certain mobile apps than their desktop counterparts.

In the last 2 decades, not just user needs changed but also the communication protocols on the internet, the multimedia codecs and the main “algorithms” for certain tasks.

It is true that many things changed; however, others haven’t. There are apps that were relevant at the time, that are still in use, and I expect that they will still be around for many years.

I spent some time thinking about my answer to the question, given I have a few strong contenders.

One of them is Firefox. However my usage of the browser was split by periods when I tried other alternatives. I installed it when it was initially launched and I still use it nowadays, but the continuous usage time doesn’t take it to the first place.

I used Windows for 12/13 straight years before switching to Linux, but it is still not enough (I also don’t think operating systems should be taken into account for this question, since for most people the answer would be Windows).

VLC is another contender, but as happened with Firefox, I started using it early and then kept switching back and forth between it and other media players over the years. The same applies to the “office” suite.

The final answer seems to be Thunderbird. I’ve been using it daily since 2004, which means 16 years and counting. At the time I was fighting the ridiculously small storage limit I had for my “webmail” inbox, so I started using it to download the messages to my computer in order to save space. I still use it today for totally different reasons.

And you, what is the piece of software or app you have continuously used for the longest period of time?

September 11, 2020

Bit Cannon (wezm)

Finding an Alternative to iOS September 11, 2020 11:20 PM

I've used iPhones since 2008, adding thousands of dollars to Apple's giant pile of cash. Much like my move from macOS to Linux more than 3 years ago, Apple's recent behaviour has prompted me to consider iPhone/iOS alternatives. Join me on this journey into the world of Android and the lack of real choice that smartphones present in 2020.


For about 12 years I've owned iPhones, most bought outright, totalling thousands of dollars. I've held on to my most recent iPhone, an iPhone X, longer than all the others. Contrary to claims of planned obsolescence it still works well. I like technology though, and was planning to replace it this year and pass it on to my father.

Apple have recently ramped up their hostility towards the developers that make iOS the desirable platform it is. App Store horror stories are nothing new, but lately Apple seems increasingly determined to extract money from every developer's business, despite being one of the richest companies in the world. They seemingly do so without regard for whether the end-user experience is actually better for it.

Recent events, perhaps starting with the Hey saga and continuing with the ongoing battle with Epic, have not reflected well on Apple. Apple appears to see developers as owing them for the privilege of being in their store and using their APIs. This is despite app development requiring a yearly membership fee of AU$149, and the purchase of Mac hardware for development.

We understand that Basecamp has developed a number of apps and many subsequent versions for the App Store for many years, and that the App Store has distributed millions of these apps to iOS users. These apps do not offer in-app purchase — and, consequently, have not contributed any revenue to the App Store over the last eight years.

Apple App Review Board

Epic decided that it would like to reap the benefits of the App Store without paying anything for them.

— Apple legal submission, via Marco Arment

Apple: Epic only looking for a free ride

Epic, according to Apple, has given Apple $257,000,000 in commission fees in two years over in-app purchases that Apple has no hand, act, part in, doesn't host on their servers, just for the privilege of existing on their OS. ‘Free ride’.

Steve Troughton-Smith

To take just one example, Epic has for years used Apple's groundbreaking graphics technology, Metal. [..] Apple doesn't charge anything beyond its standard commission for the use of Metal or any of the other tools that Epic has used to develop great games on iOS.

— Philip Schiller, via Steve Troughton-Smith

The only alternative to Metal is OpenGL and Apple have deprecated that!

Anyway, whether you agree with Apple or not this whole thing has me (a developer by trade, and a past contributor to the App Store) feeling offside. Additionally, since I now use Linux full-time there are other sources of friction:

  • iPhones work best when paired with a Mac (or even a PC running Windows).
  • Apple only support building apps on Macs, so if I want to cobble together an app for my phone it's no longer possible.

My only real recourse as a consumer is voting with my wallet and perhaps sharing my reasoning on this blog, so here we are. If enough people do this maybe they will take notice, maybe they won't, but I feel I at least need to try. Just like last time, when I sought a replacement for Mac OS X and switched to Linux, I have been evaluating alternatives to iOS.

It's worth noting at this point that I really dislike Google. I distanced myself from all of their services about 8 years ago. The only Google service I use regularly is YouTube. I use Fastmail for email, DuckDuckGo for search, Apple + Flickr for photos, Mattermost, iMessage, Matrix, and Telegram for chat.

Evaluating Alternatives

Initial research turned up the following candidates. Almost all were immediately written-off due to lacking apps or being too immature:

  • Android as shipped on a mainstream phone
    • Full of apps and services dependent on Google.
  • LineageOS
  • LineageOS for microG
    • LineageOS with microG compatibility library to allow running apps that rely on Google APIs, without using Google services.
  • postmarketOS
    • Good in theory: An Alpine Linux based OS for your phone. However, it notes, "Beta version. Calls don't work on most phones yet", on the home page.
  • Librem 5 + PureOS
    • By all accounts the expensive hardware is still not great quality and the software is still being built.
  • LuneOS (WebOS)
    • Very small ecosystem.
  • Sailfish OS
    • Bills itself as, "the mobile OS solution for corporations and governments", right on the front page. I am neither of these things.
  • Give up on a smartphone
    • Get a basic phone for calls and texts and do everything else on a real computer, possibly an ultra compact like the GPD Pocket 2.
    • A friend who has never owned a smartphone talked me out of this. It's possible but very inconvenient. Especially due to some things only being possible with a smartphone like ride sharing.

Turns out duopolies suck: you can choose some modicum of respect for privacy with developer hostile Apple, or get a bit more freedom with surveillance capitalist Google. The candidates that seem most viable for me are LineageOS, and LineageOS for microG. To test out this theory I purchased the cheapest phone supported by LineageOS that I could get new: a Redmi 7 by Xiaomi for AU$175.

For the price I was honestly expecting this phone to be hot garbage. It is in fact much better than I expected. However, this was just a platform for testing the software ecosystem; I won't be reviewing the hardware or letting it colour my impressions of Android. If this experiment goes well my plan would be to buy a higher-quality phone to replace my iPhone.


I spent a small amount of time with the stock ROM1 (MIUI) that the Redmi comes with to get a bit of a baseline. It worked well enough and was fairly aesthetically pleasing, but the ads and tracking were truly horrifying. Just take a look at this post describing the steps required to disable data collection and ads — and this is just what you can turn off. Who knows what else it's doing behind the scenes.

LineageOS + Open GApps

I quickly nuked MIUI and installed LineageOS + Open GApps (nano). Open GApps gives you access to some of Google's closed-source apps and libraries, crucially the Google Play Store. The "Open" part of the name refers to the open-source scripts the project publishes for the generation of up-to-date Google Apps packages.

This ROM provides a decent balance between open-source Android and access to the breadth of the Google Play Store. In hindsight, the nano version of Open GApps includes more Google than I actually want. I think the ideal for me is the pico package, which is just what's needed to run the Google Play Store.

With this install I attempted to replace the apps that I use most on iOS. For the following apps I just used the Android version:

  • Authy
  • Deliveroo
  • Discord
  • Element (Matrix)
  • Fastmail
  • Firefox
  • Firefox Focus
  • Instagram
  • Mattermost
  • Reddit
  • Slack
  • Telegram
  • Up
  • YouTube

For these apps I found a replacement that I was mostly happy with:

For these apps I wasn't able to/have not yet found a replacement that I was happy with (please don't send me recommendations):

In general I don't find Android apps to be as nice, or as polished as iOS apps. John and Ben recently discussed this on the 9 September episode of Dithering, which matched my experience. I also really dislike the visual style and slow animations of the Material design language. Especially the circular animation on tap. The apps I like the most are the ones that shun the Material style for their own.

Something I learnt from my move to Linux, though, was to embrace the platform's conventions rather than trying to reproduce the system you're moving from as closely as possible. So I will put my dislike of Material aside.

Screenshot of the emoji keyboard on LineageOS

So Ugly

One thing I'm not sure I can put aside is the use of the super ugly Noto Color Emoji font for emoji on Android. On Linux my system emoji font is JoyPixels and I go to certain lengths to avoid seeing Noto Color Emoji. Almost any other widely available emoji font would be preferable to me. I did try side-loading a JoyPixels package when flashing the ROM but couldn't get it to stick. Apparently something changed in Android 10.

I could "root" the phone and swap out the font file but in the same way I've never jail-broken an iPhone this is not a path I want to go down right now. If worst comes to worst I could actually build LineageOS from source and swap out Noto Color Emoji — what a concept!

LineageOS for microG

microG is a library that implements various APIs provided by Google closed-source libraries in order to be able to run more apps, such as those that depend on Google's mapping APIs. The microG versions of the APIs don't rely on Google servers. Critically, going down this path you lose access to Google's push notification servers. Some apps like Telegram work around this, but for the most part you lose notifications.

LineageOS for microG gives a familiar LineageOS experience initially. Instead of the Google Play Store though, it uses F-Droid, a repository of strictly free and open-source software. As expected there are far fewer apps available on F-Droid. Most of the big names are missing.

I think if you were especially principled, were happy to use web apps for many things (like Twitter), and didn't use a smartphone all that much, LineageOS for microG could work. After spending some time with it though, it's just too limited for me.

Picking a New Phone

The experiment so far showed that I could probably get by with LineageOS + Open GApps. I started looking into what real phone to get as opposed to the Redmi 7 test phone. I had these requirements:

  1. I want a phone around the size of my iPhone X (5.8" display).
    • I find larger phones, like the iPhone 6 Plus I owned and the Redmi 7, uncomfortable in my pocket, especially when sitting.
  2. If I go with Android I want it to run LineageOS or similar (as little Google as possible).
  3. Available in Australia.

That basically only leaves Google Pixel 3 and 4 phones. Pixel 4 seems to have been a bit of a dud. It was discontinued after 9 months and the unreleased successor is rumoured to revert a bunch of the changes it introduced: back to fingerprint sensor, removal of radar gesture sensor. Pixel 3 (from 2018) seemed like it could be viable… but then I looked at GeekBench benchmarks:

  • Pixel 3 — 468 single core, 1833 multi-core
  • Pixel 4 — 610 single core, 2210 multi-core
  • iPhone X — 916 single core, 2334 multi-core

At the time of writing no Android phone is faster in single core performance than my iPhone X from 2017. The OnePlus 8 is at the top with a score of 900. It seems they caught up on multi-core ~last year (by having more cores).

So if the Pixel 3 is my main option I'd be spending money to upgrade to a significantly slower phone made by Google to escape Apple's restrictive, developer hostile, albeit more privacy respecting ecosystem… this is not immediately compelling.

Closing Thoughts

I'm really torn. The upcoming Pixel 5 would likely be a good option if it were possible to strip out most of the Google dependencies. If past releases are anything to go by, it's likely to be another year or so before LineageOS is available for the Pixel 5.

I don't like the idea of buying a Pixel 3 given that it's a step backwards performance-wise. After 3 years with the iPhone X I kind of want the replacement to perform close to it. Sadly, modern web pages and fake native apps (apps built with web tech) demand fast performance. For example, the Redmi 7 has a really hard time with long Medium articles.

Another option would be to just keep using the iPhone X. It still performs well, battery capacity is still 89% of new, it's still getting major iOS updates. And I'm still voting with my wallet by not giving Apple more money. I did however tell my Dad to hold off buying a new phone earlier in the year because he could have mine when I replace it. So I kind of need a new phone one way or another.

For now I'm going to wait for the Pixel 5 and new iPhones to be released later this year and continue to follow Apple's behaviour towards developers. It's not uncommon for them to actually listen to their customers eventually — often it takes longer than it feels it should though (*cough* butterfly keyboard). As usual subscribe to the feed, or follow me on Twitter or the Fediverse for future updates.


I'm using "ROM" (Read Only Memory) here knowing that it's incorrect, since that's the typical language for alternate OSes for Android phones.

Jan van den Berg (j11g)

Moby-Dick – Herman Melville September 11, 2020 06:20 PM

I suspect Moby-Dick, the quintessential Great American Novel, has the curious accolade of being one of the most famous books ever while also being one of the least read. Its reputation greatly exceeds its appeal. Nonetheless, I had always wanted to read this extraordinary 170-year-old book. And now that I have, I think I understand its reputation as well as its incongruous appeal.

Moby-Dick stats

Moby-Dick clocks in around 650+ pages and 212,000 words. It’s not a small book but it’s also not the biggest book I ever read. But it was definitely one of the hardest, and one that demanded a dedicated and focused effort to finish.

Long story short: reading Moby-Dick is hard work and it’s not exactly the most riveting thing I ever read.

It doesn’t keep you on the edge of your seat. Surprisingly very little happens for such a big book. You can summarize the entire thing in one sentence (yes, I’ll get to the allegories later).

That is not to say that this is not a smart book. Herman Melville’s IQ probably bordered on genius and he pulled out all the stops with Moby-Dick. However, those two things don’t necessarily make for a good book. Why is it, then, that Moby-Dick is so revered? I can think of a few things.

Moby-Dick – Herman Melville (1851) – 656 pages. Don’t mind the sticker.

Words, just so.many.different.words

Melville’s dictionary must be the most abused book ever. Because if there was an Olympics for using the most different words, Herman Melville would win first, second and third place. This is actually a scientific fact: “About 44% of the distinct set of words in this novel occur only once”

Read that again: 44% of the distinct words in Moby-Dick are used only once.

If you don’t believe me just open this book on any page and you can tell this right away. Moby-Dick is not like any other book.

It is divided into 135 small chapters (and one very important epilogue), each dealing with a dedicated subject. And it seems Melville took it as an exercise to fill each chapter with as many different words as he could. Not only that, he likes to use long, rambling, half-page sentences. There is also an enormous variation in style per chapter: from dialogue to scientific descriptions to inner thoughts to poetic, philosophical or almost theatrical treatises. And to top it all off, this is all done in English from 170 years ago. That should give you an idea of what a chore it is to read.

All of these things are reasons Moby-Dick stands out among other books. Another is that it’s about whaling.


Whaling in the 19th century was an astoundingly difficult and fantastical venture. If I hadn’t known about it and you explained it to me, I wouldn’t believe you. People actually set out on wooden ships for three or four years and just randomly sailed around the world until they found some whales?! Whales that are actual leviathans and that can kill any man in an instant? And when they did spot these whales, they set out in even smaller wooden boats to try to harpoon these 100-foot creatures, BY HAND?! Surely this is all made up! This cannot be real! But it is.

Whaling is an absolutely insane endeavour. And this makes it a terrific backdrop for a story.

I would argue that no man before or after has known more about whaling than Melville. He not only writes from his own experience as a whaler, he had also probably read everything ever written (at that point) about whaling and whales. And he uses all this knowledge to bombard the reader with more facts than your brain can handle about whaling, whales and whalers.

He also shares detailed glimpses of 19th century Nantucket life, which makes this book a time capsule of the American spirit. These are reasons this book is so revered in the English-speaking world, so much so that it is regarded as the definitive Great American Novel.

This is despite the book suffering greatly from negative reviews and criticism about alleged blasphemy: it wasn’t until a good 70 years later that Moby-Dick started to be regarded as the classic we all know now. (But that is a story by itself.)

Without the book cover. Gorgeous.


On to the good parts. Moby-Dick is not really about the demonstration of Melville’s mastery of language or even about whaling. These two things make it unique, but what makes it good is what is under the surface (see what I did there?).

This book is absolutely brimming with allegories, allusions and metaphors. Some are small, some encapsulate the entire plot, and some are even displayed by the book’s structure.

The most clear-cut one is, of course, that the whale Moby-Dick represents fate itself. But there are many more, philosophical or contemplative of nature. You could discuss and debate them endlessly.


There is one meta-allegory I particularly like. In Moby-Dick we read about a whaling captain, Ahab, who sets out to kill the mythical monster Moby-Dick, a sperm whale that previously took his leg. As readers we slowly get to experience how Ahab goes maniacally insane and takes his crew with him, until they all go under.
In a sense this is about Melville himself and his experience and difficulty writing this book! And we, the readers, are the crew.

This is just one take. But there are many more direct allegories, in names, stories and references. In particular, the ships and captains that Ahab and Ishmael meet along the way are loaded with biblical references and meaning. I am sure I missed a whole bunch too. Melville uses these narrative devices to deal with many different themes, and it is exactly this that sets Moby-Dick apart from other books. There are scores of things that aren’t said, only implied.

My copy of the book ends with a couple of letters from Melville about his book and his struggles in getting it published. Right after the letters the book, oddly enough, shares a couple of very negative reviews from the time of publishing. I am not sure why they are in there. Maybe to demonstrate that people did not recognize the genius at once? Or how remarkable it is that this book still became a classic? I am not sure.


All in all, Moby-Dick is a distinctive and unique reading experience detailing a story about a very specific time and endeavour. I can now boast “I read Moby-Dick”, and I am glad I did, but I will also say I didn’t really enjoy reading it all that much.

I think I understand what Melville set out to do and I admire his genius. I also think I understand the appeal of this book 170 years later. This book makes you work and that is not a problem, but there were times that I really had to force myself, and that does not happen to books that are favorites of mine.

Melville was a genius wordsmith and put many ideas in this book for people to contemplate for generations to come. But as is the case with music: I don’t care how many different notes a guitar player can hit on his guitar in 1 minute; that is not music, that is a demonstration of mastery. In the end it is about what songs this mastery produces. And in this case, I think I wanted to have liked the song more.

The post Moby-Dick – Herman Melville appeared first on Jan van den Berg.

September 09, 2020

Kevin Burke (kb)

Let employees sell their equity September 09, 2020 10:27 PM

Sometimes people choose to work for one company over another for reasons related to the work environment, for example what the company does, and whether the other employees create a place that's pleasant to work at. But a major factor is compensation. If Company A and Company B are largely comparable, but Company A offers $30,000 more in base pay per year than Company B, most people will choose Company A.

At tech companies, compensation usually breaks down into four components: company stock, benefits, cash salary, and bonus. When you get an offer from a company, these are the four areas that the recruiter will walk you through. The equity component is a key part of the compensation at startups: small startups hope that the potential for a large payoff makes up for a few years of smaller base pay.

If you join a small startup and you get stock, you generally can't sell it until an "exit event" - an IPO or acquisition - even if your entire stock grant has vested. Generally, any stock sale before an exit event will require approval of the board, and the boards generally frown on stock sales, for reasons I will get into. So while you may own something that is worth a lot of money, you can't convert it into cash you can actually spend for a half decade or more.

By contrast, if you join a public company, your compensation includes equity that you can sell basically immediately after it vests, because it trades on a public exchange. There are hundreds of people who will compete to offer the best price for your shares every trading day between 9:30am and 4pm Eastern.

As an employee, how should you think about the equity component of your offer? One reason to take a big equity stake is to bet on yourself. If you have a great idea about how you can make the company 10%, 50%, or 200% more valuable, and you think you can execute it, you should take an equity stake! After you implement the changes, your equity will be massively more valuable. Broadly speaking this is what "activist investors" try to do; they have a theory about how to improve companies, they buy a stake and hope the value changes in line with the theory.

One problem with this is that you are much more likely to be in a position to make these changes if you are someone important like a C-level executive or a distinguished engineer. However, most tech employees are not C-level executives. If you are an engineer on the fraud team, and you try really, really hard at your job for a year, maybe you can increase the value of the company by 1% or 2%. You are just not in a position, scope wise, to drastically alter the trajectory of the company by yourself.

Rationally speaking, it does not make much sense for you, an engineer on the fraud team, to double or triple your effort just to make your equity stake worth 1% more. There might be other reasons to do it - you could really buy into the mission, or you hate being yelled at or whatever - but just looking at the compensation, whether you, personally, work really hard or slack off, your stock is probably going to be worth about the same. Unless you are the CEO or other C-level executive, at which point you have a big enough lever that your level of effort matters.

Another way to think about it is, imagine you have invested your money in a broad range of stocks and bonds, and then someone asked you to sell 30% of it and place it all in a single tech stock. Modern portfolio theory would suggest that that is a bad thing to do. You could gain a lot if the stock does well, but on the other hand, if the company's accountant was embezzling funds, or the company lost a lawsuit, or the company lost a database or had the factory struck by lightning or something, you could lose a ton of money that you wouldn't if you were better diversified. It's not worth the risk.

All this goes to say that employees should value their equity substantially less than an equivalent amount of cash. Outside of the C-level, you can't do much to make the equity more valuable, and an extra dollar worth of equity takes your portfolio further away from an ideal portfolio that you could buy if you just had cash. (For more on this topic you should read Lisa Meulbroek (hi, Professor Meulbroek), whose CV is criminally underrated.)

(On the flip side, if your company is small and valuable, it may have its pick of investors to take money from, and be able to dictate investment terms. Holding equity in a company like this is a way to approximate the "deal flow" of a good Silicon Valley investor - as an employee you are getting the chance to buy and hold stock in a company at prices that would not be accessible to you otherwise. This may be true of small, hot startups but it gets less and less true the bigger a company gets and the more fundraising rounds it goes through.)

One implication is that you should prefer to work at public companies. At a public company, you can take your equity compensation and immediately sell it and buy VT (or even QQQ) or whatever and be much better off because you are diversified. You can't do that at a private startup.

Another problem is that public companies tend to have better equity packages. I went through a round of interviews recently and I was stunned at how paltry the equity offers were from private, Series A-C companies. For most of the offers I received, the company valuation would need to increase by 8-20x for the yearly compensation to achieve parity with the first-year offer from a public SF-based company, let alone to exceed it. Even if they did achieve 4 doublings of their valuation (a 16x increase), you might not be able to sell the private company stock, so you're still behind the public company.

I expect larger companies to have better compensation; it's part of the deal. But that large a differential, plus the premium for being able to sell instantly, makes it foolish to turn down the public company offer.[1]

So how can you compete if you're a smaller company? The obvious answers are what they've always been: recruit people with backgrounds that bigger companies overlook, give people wild amounts of responsibility, sell people on the vision, commit to "not being evil" and actually follow through on it.

But you can also try to eliminate an advantage that public companies have by letting your employees sell their equity. Not just, like, one time, at a huge discount before you go public, or when you get to Stripe's size and want to appease your employees. But routinely; because your employees want to boost their cash base, or buy the stock market, or buy a vacation, or whatever.

There are some objections. Having more than 500 shareholders triggers SEC disclosure requirements, which can be a pain to deal with. So require employees to sell to other employees or existing investors. Cashing out entirely might send the wrong signals, so limit sales to 10-20% of your stake per calendar year. A liquid market might require repricing stock options constantly. So implement quarterly trading windows.

Executives might not want to see what the market value of your stock is at a given time. That's tougher. But a high day-to-day price might convince people to join when they otherwise wouldn't. A low price might convince you to change direction faster than waiting for the next fundraising round.

There are also huge benefits. Employees can cash in earlier in ways that are generally only available to executives. They can take some risk off the table. People who want to double up on their equity position can do so.

Finally, you might be able to attract employees you might not otherwise be able to. A lot of folks who are turned off by the illiquidity of an equity offer might turn their heads when you describe how they can sell a portion at market value every year.

Big companies have big moats. One of them - the ability to convert stock to cash instantly - doesn't need to be one.

Thanks to Dan Luu and Alan Shreve for reading drafts of this post.

[1] You may think they were lowballing me, but this was after negotiating with each. Another possibility is that I performed differently on each set of interviews, and the smaller companies offered me lower packages because they thought I did worse. I think I did about equally well on the interviews for each.

Patrick Louis (venam)

Notes About Compilers September 09, 2020 09:00 PM

Architect style wall, nothing really related but it looks good and gives a vibe

Compilers, these wonderful and intricate pieces of software that do so much and that so many know little of. Similar to the previous article about computer architecture, I’ll take a look at another essential, but lesser known, CS topic: Compilers.
I won’t dive into much detail; I’ll keep it short and limited to my notes, definitions, and what I found intriguing and helpful.

General schema of a compiler's pieces

A compiler is divided into a frontend and a backend. The frontend’s role is to parse the textual program, or whatever format the programmer uses to input the code, verify it, and turn it into a representation that’s easier to work with: an IR, or intermediate representation.
Everything after this intermediate representation, which is usually either a tree or three-address code, is the backend, whose role is to optimize the code and generate an output. This output could be anything from another programming language (what’s called a transpiler) to specific machine code instructions.
These days many programming languages rely on helpful tools to make these steps easier. For example, a compiler can use Yacc and Lex to generate its frontend and LLVM as its backend. LLVM IR can, in theory, be emitted by any compiler frontend, so any compiler that targets it benefits from the optimizations LLVM performs on that IR.

Personally, I’ve found that the most interesting parts were in the backend. While the frontend consists of gruesome parsing, things become fascinating when you realize everything can be turned into three-address code: instructions with at most three operands, a single target on the left side of the assignment, and at most one operator on the right.
From this point on, you can apply all kinds of optimizations, for example checking whether the addresses accessed by loops over arrays can be represented as linear functions of the loop indices, whether data dependences allow code to be reordered, or whether tracking the lifetimes of values helps. In the backend you can manage what the process will look like in memory, and you can also implement garbage collection.

Overall, learning a bit about compilers doesn’t hurt. It gives insights into the workings of the languages we use every day, removing the magic around them but keeping the awe and amazement.
So here are my rough notes and definitions I took while learning about compilers, I hope these help someone going on the same path as there’s a lot of jargon involved.


  • Terminals: Basic symbols from which strings are formed, also called token names.

  • Nonterminals: Syntactic variables that denote sets of strings. They help define the language generated by the grammar, imposing a hierarchical structure on the language that is key to syntax analysis and translation.

  • Production: What nonterminals produce: the manner in which terminals and nonterminals can be combined to form strings. A production has a head/left side and a body/right side, separated by -> or sometimes ::=

  • Grammar: The combination of terminal symbols, nonterminal symbols, and productions (what the nonterminals produce).

  • Context free grammar: It has 4 components: terminal symbols/tokens, nonterminal symbols/syntactic variables (each denoting a set of strings of terminals), productions (a nonterminal called the head or left side, an arrow, and a sequence of terminals and/or nonterminals called the body or right side), and the designation of one nonterminal as the start symbol.

  • The language: The strings that we can derive from the grammar.

  • Parse Tree: A tree that shows how the start symbol of the grammar derives (yields) a string in the language.

  • Parsing: The process of finding a parse tree for a given string of terminals.

  • Ambiguous grammar: A grammar in which some string has more than one parse tree.

  • Associativity: Decides which operator an operand belongs to when it has the same operator on both sides, as 5 does in 9-5-2. A left-associative operator groups it as (9-5)-2; a right-associative one, like = in a=b=c, groups it as a=(b=c). Together with precedence, it resolves how operators group.

  • Syntax-directed translation scheme: Attaching rules (semantic rules) or program fragments to productions in a grammar. The output is the translated program.

A schema representing simple syntax-directed translation

  • Attributes: Any quantity associated with a programming construct.

  • Syntax tree: The tree generated from a syntax-directed translation.

  • Synthesized attributes: We can associate attributes with terminals and nonterminals, then also attach rules that dictate how to fill these attributes. This can be done in syntax-directed translation.

  • Semantic rules: When displaying a syntax-directed grammar, the semantic rules are the attached actions that need to be done to synthesized attributes (other than the usual production).

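  • Tree traversal: The order in which we visit the nodes of a tree. It could be depth-first, which fully explores one child's subtree before moving to the next child (visiting a node before its children is preorder, after them postorder), or breadth-first, which visits the tree level by level starting at the root.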

  • Translation schemes: executing program fragments, semantic actions, instead of concatenating strings.

  • Top-down parsing: Start at the root, labeled with the starting nonterminal, and repeatedly: select a production at the current node and construct its children, then find the next node at which a subtree is to be constructed. The selection may involve trial and error.

  • lookahead symbol: The current or future terminal being scanned in the input. Typically, the leftmost terminal of the input string.

  • Recursive-descent parsing: a top-down method of syntax analysis in which you recursively try to process the input. There’s a set of procedures, one for each nonterminal.

void A() {
	Choose an A-production, A -> X1 X2 ... Xk;
	for (i = 1 to k) {
		if (Xi is a nonterminal) {
			call procedure Xi();
		} else if (Xi equals the current input symbol a) {
			advance the input to the next symbol;
		} else {
			/* an error has occurred */
		}
	}
}

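
The pseudocode above leaves the grammar abstract. As a minimal, runnable sketch in C, here is a predictive recursive-descent translator for a small textbook-style grammar that is assumed for illustration: expr -> term rest, rest -> + term rest | - term rest | ε, term -> digit (already free of left recursion). Its semantic actions print each operator after its operands, so it translates infix to postfix. The names infix_to_postfix, expr, rest, term, and match are illustrative, and the global state makes it non-reentrant.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Predictive recursive-descent translator (infix -> postfix) for:
 *   expr -> term rest
 *   rest -> '+' term rest | '-' term rest | ε
 *   term -> digit
 * One procedure per nonterminal; the lookahead symbol selects the production. */

static const char *lookahead; /* current input symbol */
static char *out;             /* next free slot in the output buffer */
static int syntax_error;      /* set to 1 on a syntax error */

static void match(char t) {
    if (*lookahead == t) lookahead++; /* advance the input to the next symbol */
    else syntax_error = 1;
}

static void term(void) {
    if (isdigit((unsigned char)*lookahead)) {
        *out++ = *lookahead;          /* semantic action: print(digit) */
        match(*lookahead);
    } else {
        syntax_error = 1;
    }
}

static void rest(void) {
    if (*lookahead == '+' || *lookahead == '-') {
        char op = *lookahead;
        match(op);
        term();
        *out++ = op;                  /* semantic action: print(op) after its operands */
        rest();
    }
    /* otherwise use rest -> ε and consume nothing */
}

static void expr(void) {
    term();
    rest();
}

/* Writes the postfix form of `in` into `buf`, which must hold strlen(in) + 1
 * chars. Returns 0 on success and -1 on a syntax error. */
int infix_to_postfix(const char *in, char *buf) {
    assert(in != NULL && buf != NULL);
    lookahead = in;
    out = buf;
    syntax_error = 0;
    expr();
    if (*lookahead != '\0') syntax_error = 1; /* unconsumed trailing input */
    *out = '\0';
    return syntax_error ? -1 : 0;
}
```

For example, infix_to_postfix("9-5+2", buf) should return 0 and leave "95-2+" in buf. Note how rest() picks a production by peeking at the lookahead symbol and falls back to the ε-production when the lookahead is neither + nor -.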
  • Backtracking: Going back in the input to scan it again, using another production as the new choice.

  • Predictive parsing: A form of recursive-descent parsing in which the lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal. This implicitly defines a parse tree for the input, and can also be used to build an explicit one. The procedure for a nonterminal does two things: it decides which production A -> a to use by checking whether the lookahead symbol is in FIRST(a) (an ε-production is used by default when no other production applies), and then it mimics the body of the chosen production, calling the procedure for each nonterminal and matching each terminal against the lookahead symbol.

  • FIRST(a): Function to return the set of terminals that appear as the first symbols of one or more strings of terminals generated from a.

    1. If X is a terminal, then FIRST(X) = {X}.
    2. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1), which means Y1...Yi-1 can derive ε. If ε is in FIRST(Yj) for every j = 1, ..., k, add ε to FIRST(X).
    3. If X -> ε is a production, then add ε to FIRST(X).

  • FOLLOW(A): Function to return the set of terminals that can appear immediately to the right of the nonterminal A in some sentential form.
    1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
    2. If there is a production A-> aBb then everything in FIRST(b) except ε is in FOLLOW(B), so in sum any terminal that follows B
    3. If there is a production A -> aB, or a production A -> aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
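
Both FIRST and FOLLOW are fixed-point computations: apply the rules repeatedly until no set changes. The C sketch below does this for an assumed example grammar, E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id, with single-character symbols (e and t stand for E' and T', i for id) and each set stored as a bitmask. The names and representation are illustrative, not a general grammar tool.

```c
#include <assert.h>
#include <string.h>

/* Symbols are single chars: E e T t F are nonterminals (e = E', t = T');
 * everything else is a terminal ('i' stands for id). "" is an ε-body. */
typedef struct { char head; const char *body; } Prod;

static const Prod G[] = {
    {'E', "Te"}, {'e', "+Te"}, {'e', ""},
    {'T', "Ft"}, {'t', "*Ft"}, {'t', ""},
    {'F', "(E)"}, {'F', "i"},
};
enum { NPROD = sizeof G / sizeof G[0] };

static const char NONTERMS[] = "EeTtF"; /* index 0 is the start symbol */
static const char TERMS[] = "+*()i$";
#define EPS (1u << 6)                    /* bits 0..5 follow TERMS; bit 6 is ε */

unsigned first_set[5], follow_set[5];

static int nt_index(char c) {
    const char *p = c ? strchr(NONTERMS, c) : NULL;
    return p ? (int)(p - NONTERMS) : -1; /* -1 means "terminal" */
}

static unsigned term_bit(char c) {
    const char *p = c ? strchr(TERMS, c) : NULL;
    assert(p != NULL);
    return 1u << (p - TERMS);
}

/* FIRST of a symbol string, using the current approximations in first_set. */
static unsigned first_of(const char *s) {
    unsigned result = 0;
    for (; *s; s++) {
        int n = nt_index(*s);
        unsigned f = (n < 0) ? term_bit(*s) : first_set[n];
        result |= f & ~EPS;
        if (!(f & EPS)) return result;   /* this symbol cannot vanish */
    }
    return result | EPS;                 /* every symbol (or none) derives ε */
}

void compute_first_follow(void) {
    int changed = 1;
    memset(first_set, 0, sizeof first_set);
    memset(follow_set, 0, sizeof follow_set);
    while (changed) {                    /* FIRST rules 1-3 until nothing changes */
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int h = nt_index(G[p].head);
            unsigned old = first_set[h];
            first_set[h] |= first_of(G[p].body);
            changed |= first_set[h] != old;
        }
    }
    follow_set[0] = term_bit('$');       /* FOLLOW rule 1 */
    changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            const char *b = G[p].body;
            for (int i = 0; b[i]; i++) {
                int n = nt_index(b[i]);
                if (n < 0) continue;
                unsigned f = first_of(b + i + 1);  /* FIRST of what follows B */
                unsigned old = follow_set[n];
                follow_set[n] |= f & ~EPS;         /* rule 2 */
                if (f & EPS)                       /* rule 3 */
                    follow_set[n] |= follow_set[nt_index(G[p].head)];
                changed |= follow_set[n] != old;
            }
        }
    }
}
```

After compute_first_follow(), FIRST(E) should be { (, id }, FIRST(E') should be { +, ε }, and FOLLOW(F) should be { +, *, ), $ }.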

  • Left recursion: With a left-recursive production such as A -> Aa, a recursive-descent parser can loop forever, because procedure A() calls itself without consuming any input. Left recursion can be eliminated by rewriting the offending productions: for example, A -> Aa | B, which is left recursive, can be rewritten as A -> BR, R -> aR | ε.
    Algorithm to remove left recursion:
arrange the nonterminals in some order A1, A2, ..., An
for (each i from 1 to n) {
	for (each j from 1 to i-1) {
		replace each production of the form Ai -> Aj y by the
		productions Ai -> d1 y | d2 y | ... | dk y, where
		Aj -> d1 | d2 | ... | dk are all current Aj-productions
	}
	eliminate the immediate left recursion among the Ai-productions
}

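
The inner step, eliminating immediate left recursion, is mechanical: A -> Aa1 | ... | Aam | b1 | ... | bn becomes A -> b1R | ... | bnR and R -> a1R | ... | amR | ε. Below is a minimal C sketch of just that step, assuming single-character symbols, "" for ε, and a caller-chosen fresh nonterminal; the function name and fixed-size buffers are illustrative. It does not implement the outer substitution loop, and the full algorithm above assumes the grammar has no cycles (A =>+ A) or ε-productions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXALT 8  /* max alternatives per nonterminal (sketch limit) */
#define MAXLEN 32 /* max body length including '\0' (sketch limit) */

/* Eliminates immediate left recursion for one nonterminal:
 *   head -> head a1 | ... | head am | b1 | ... | bn
 * becomes
 *   head  -> b1 fresh | ... | bn fresh          (written to a_out, count *na)
 *   fresh -> a1 fresh | ... | am fresh | ε      (written to r_out, count *nr)
 * Symbols are single chars and "" stands for ε. Returns 1 if the
 * productions were left recursive, 0 if they were copied unchanged. */
int eliminate_left_recursion(char head, const char *bodies[], int n, char fresh,
                             char a_out[][MAXLEN], int *na,
                             char r_out[][MAXLEN], int *nr) {
    int recursive = 0;
    assert(n < MAXALT); /* leaves room for the extra ε alternative */
    for (int i = 0; i < n; i++) recursive |= bodies[i][0] == head;
    *na = *nr = 0;
    for (int i = 0; i < n; i++) {
        const char *b = bodies[i];
        assert(strlen(b) + 2 <= MAXLEN); /* room for `fresh` and '\0' */
        if (!recursive)
            snprintf(a_out[(*na)++], MAXLEN, "%s", b);
        else if (b[0] == head)            /* head -> head alpha  =>  fresh -> alpha fresh */
            snprintf(r_out[(*nr)++], MAXLEN, "%s%c", b + 1, fresh);
        else                              /* head -> beta  =>  head -> beta fresh */
            snprintf(a_out[(*na)++], MAXLEN, "%s%c", b, fresh);
    }
    if (recursive) r_out[(*nr)++][0] = '\0'; /* fresh -> ε */
    return recursive;
}
```

For example, E -> E+T | T comes out as E -> TR and R -> +TR | ε.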
  • Left Factoring: When it is unclear which production to select in A -> aB1 | aB2, we can defer the decision by factoring it into A -> aA1 and A1 -> B1 | B2. We factor out the longest common prefix.

  • Abstract syntax tree, or syntax tree: A tree in which interior nodes represent operators and their children represent the operands of those operators. Syntax trees differ from parse trees in that their interior nodes represent programming constructs rather than nonterminals.

  • Token: A terminal with additional information: a token name and an optional attribute value. The name is an abstract symbol representing a kind of lexical unit, be it a keyword, an identifier, etc.

  • Lexeme: A sequence of characters in the source program that matches the pattern for a token; it is an instance of that token.

  • Pattern: A description of the form that the lexemes of a token may take. For a keyword, the pattern is just the sequence of characters that form the keyword; for identifiers and some other tokens, it is a more complex structure that is matched by many strings.

  • Lexical analysis/analyzer: A lexical analyzer reads characters from the input and groups them into “token objects”; basically, it creates the tokens. It can be split into two parts: scanning, which performs simple processing such as deleting comments and compacting consecutive whitespace, and lexical analysis proper, the more complex part that produces the tokens.

Interaction between lexical analyzer and parser

  • Reading Ahead: It’s useful to read future characters to decide if they are part of the same lexeme. A technique is to use an input buffer or a peek variable that holds the next character.

  • Input buffering: A common technique is to use a buffer pair: two buffers, each the size of a disk block, so that reading is more efficient. We keep a lexemeBegin pointer at the start of the current lexeme and a forward pointer that scans ahead. To avoid checking on every character whether forward has run off the end of a buffer, we can place a “sentinel” at the end of each buffer: a special character that cannot be part of the source program, such as EOF. When forward hits the sentinel, we check whether it is at the end of a buffer (reload the other buffer) or anywhere else (the real end of input).

  • Keywords: character strings, lexeme, that identify constructs such as if, for, do, etc.

  • Identifier: Also a character string (lexeme), one that identifies a named value.

  • Symbol Tables: Data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases and used by the synthesis phases to generate the target code. Entries contain information about an identifier such as its character string (lexeme), its type, its position in storage, and any other relevant information. Each scope usually has its own symbol table. The table is filled during the analysis phase by semantic actions; later, for example in a production factor -> id, the semantic action looks up the id token's entry in the table to find what was declared.
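
One common way to give each scope its own table is to chain them: every table points to the table of the enclosing scope, and a lookup walks outward until it finds the name. The C sketch below assumes that design; the Scope and Entry types and the scope_* functions are hypothetical, and an entry only records a type name, whereas a real compiler would store much more.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry per declared identifier (sketch: lexeme and type name only). */
typedef struct Entry {
    const char *lexeme;
    const char *type;
    struct Entry *next;
} Entry;

/* One table per scope, linked to the table of the enclosing scope. */
typedef struct Scope {
    Entry *entries;
    struct Scope *parent;
} Scope;

Scope *scope_push(Scope *parent) {
    Scope *s = calloc(1, sizeof *s);
    assert(s != NULL);
    s->parent = parent;
    return s;
}

/* Frees the innermost scope and returns the enclosing one. */
Scope *scope_pop(Scope *s) {
    Scope *parent = s->parent;
    for (Entry *e = s->entries, *next; e; e = next) {
        next = e->next;
        free(e);
    }
    free(s);
    return parent;
}

/* Declares a name in scope s; the caller keeps the strings alive. */
void scope_put(Scope *s, const char *lexeme, const char *type) {
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->lexeme = lexeme;
    e->type = type;
    e->next = s->entries;
    s->entries = e;
}

/* Looks a name up in the innermost scope first, then in enclosing ones. */
const Entry *scope_get(const Scope *s, const char *lexeme) {
    for (; s; s = s->parent)
        for (const Entry *e = s->entries; e; e = e->next)
            if (strcmp(e->lexeme, lexeme) == 0) return e;
    return NULL;
}
```

With x declared as int in an outer scope and as float in an inner one, looking up x from the inner scope finds the float entry, and after popping the inner scope it finds the int one.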

  • Intermediate Representations: The frontend generates an intermediate representation of the source program so that the backend can generate the target program. The two most important are: Trees (parse trees and abstract syntax trees), and linear representations (such as three-address code).

  • Static checking: The process of checking whether the program follows the syntactic and semantic rules of the source language. It catches errors early and helps ensure the program can be compiled successfully. It includes syntactic checking beyond what the grammar captures (e.g. an identifier is declared at most once in a scope, a break statement has an enclosing loop or switch) and type checking.

  • Type Checking: Ensures that an operator or function is applied to the right number and type of operands; it also handles conversions where necessary, aka “coercion”.

  • Strings and languages: A string is a synonym for word or sentence: a finite sequence of symbols drawn from an alphabet, with its length written |s|. ε, or e, is the string of length 0, |ε| = 0. Strings can be concatenated; concatenating with ε leaves a string unchanged. We can define exponentiation of strings: s**0 = ε, s**1 = s, s**2 = ss, and s**n = s**(n-1) s. A language is any countable set of strings over an alphabet (e.g. over { x, y, z }); both the empty set ∅ and { ε }, the set containing only the empty string, are languages.

  • Operations over languages: The most important operations are union, concatenation, and closure. The union of two languages is the same as in set theory. Their concatenation is the set of all strings formed by taking a string from the first language and appending a string from the second. The Kleene closure of L, written L*, is the set of strings you get by concatenating L with itself zero or more times; L+, the positive closure, is one or more times.

Closure operation

DFA NFA NDFA NFA Example Subset construction Transition table for conversion

  • Regular expressions, aka regex: A notation that combines the operations over languages (union, concatenation, and closure) in a compact, expressive way. Precedence rules: all three operators are left associative; * has the highest precedence, then concatenation, then union |. A language that can be defined by a regular expression is called a regular set.

Algebraic laws for regular expressions

  • Regular definitions: Like variables holding a regex for later use, to make it more readable.

Summary of lexer 1 Summary of lexer 2 Summary of lexer 3

Notational convention 1 Notational convention 2

  • Aho-Corasick algorithm: An algorithm that finds occurrences of any keyword from a set of keywords in a string. It builds a trie, a tree-structured transition diagram of the keywords, and defines for every state s of that trie a failure function f(s): the state for the longest proper suffix of the string spelled out by s that is also a prefix of some keyword. On a mismatch at state s, the matcher falls back to state f(s) instead of rescanning the input. For a single keyword b1 b2 ... bn this is the Knuth-Morris-Pratt (KMP) algorithm: after a mismatch at state s, the current input character is compared against b(f(s)+1).

Pseudo code for failure function:

t = 0;
f(1) = 0;
for (s = 1; s < n; s++) {
	while (t > 0 && b(s+1) != b(t+1)) t = f(t);
	if (b(s+1) == b(t+1)) {
		t = t + 1;
		f(s+1) = t;
	} else {
		f(s+1) = 0;
	}
}

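
For a single keyword, the failure function above is the core of the Knuth-Morris-Pratt algorithm. The C sketch below is a 0-indexed translation of that pseudocode (f[s] here plays the role of f(s+1) above) plus a search loop that uses it; the function names and the 64-character keyword limit are assumptions made for the sketch. Aho-Corasick generalizes the same idea to a trie of several keywords.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* f[s] = length of the longest proper prefix of kw[0..s] that is also a
 * suffix of kw[0..s] (0-indexed version of the pseudocode's f(s+1)). */
void kmp_failure(const char *kw, size_t n, size_t f[]) {
    size_t t = 0;
    if (n == 0) return;
    f[0] = 0;
    for (size_t s = 1; s < n; s++) {
        while (t > 0 && kw[s] != kw[t]) t = f[t - 1]; /* fall back */
        if (kw[s] == kw[t]) t++;
        f[s] = t;
    }
}

/* Returns the index of the first occurrence of kw in text, or -1. */
long kmp_search(const char *text, const char *kw) {
    size_t n = strlen(kw), f[64], t = 0;
    assert(n > 0 && n <= 64); /* sketch limit on keyword length */
    kmp_failure(kw, n, f);
    for (size_t i = 0; text[i]; i++) {
        while (t > 0 && text[i] != kw[t]) t = f[t - 1]; /* never re-read text */
        if (text[i] == kw[t]) t++;
        if (t == n) return (long)(i + 1 - n);
    }
    return -1;
}
```

For the keyword ababaa, kmp_failure should produce 0, 0, 1, 2, 3, 1, and kmp_search("abababaab", "ababaa") should return 2. The text pointer never moves backward, so the search runs in time linear in the length of the text plus the keyword.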
  • Conflict resolution in Lex:
    1. Always prefer a longer prefix to a shorter prefix.
    2. If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program.
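
A hand-written scanner can follow the same two rules. In the C sketch below (illustrative token kinds and names, not Lex output), identifiers are scanned as far as possible before checking for the keyword if, so ifx is a single identifier, and < looks one character ahead to decide between < and <=. The keyword wins over the identifier pattern only when both match the same lexeme, mirroring rule 2.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { TOK_IF, TOK_ID, TOK_NUM, TOK_LT, TOK_LE, TOK_ASSIGN,
               TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* where the lexeme begins (lexemeBegin) */
    size_t len;
} Token;

/* Returns the next token starting at *p and advances *p past it. */
Token next_token(const char **p) {
    assert(p != NULL && *p != NULL);
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;
    Token tok = { TOK_EOF, s, 0 };
    const char *fwd = s; /* the forward pointer */
    if (*s == '\0') {
        /* end of input: TOK_EOF with length 0 */
    } else if (isalpha((unsigned char)*s)) {
        while (isalnum((unsigned char)*fwd)) fwd++;        /* rule 1: longest match */
        tok.kind = (fwd - s == 2 && strncmp(s, "if", 2) == 0)
                       ? TOK_IF : TOK_ID;                  /* rule 2: keyword first */
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*fwd)) fwd++;
        tok.kind = TOK_NUM;
    } else if (*s == '<') {
        fwd++;
        if (*fwd == '=') { fwd++; tok.kind = TOK_LE; }     /* one char of lookahead */
        else tok.kind = TOK_LT;
    } else if (*s == '=') {
        fwd++;
        tok.kind = TOK_ASSIGN;
    } else {
        fwd++;
        tok.kind = TOK_ERROR;
    }
    tok.len = (size_t)(fwd - s);
    *p = fwd;
    return tok;
}
```

Scanning "if ifx <= 42" should yield TOK_IF, TOK_ID (length 3), TOK_LE, TOK_NUM, and then TOK_EOF.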

Position of Parser in compiler model

  • Constructing parse tree through derivation: Begin with the start symbol and then, at each step, replace a nonterminal by the body of one of its productions. This is a top-down construction of a parse tree. We use => to denote “derives” (and =>* for “derives in zero or more steps”), so a derivation shows that a particular string can be obtained, in some number of steps, from the start symbol. If S =>* a, where S is the start symbol, then a is a sentential form of the grammar; a sentential form with no nonterminals is a sentence, and the language of a grammar is its set of sentences. The same sentence can be reached by different derivations; two standard ones are the leftmost derivation, which always replaces the leftmost nonterminal, and the rightmost (canonical) derivation.

  • LL(1) grammars: L for left to right scanning, L for a leftmost derivation, 1 for using one input symbol of lookahead at each step to make parsing action decisions.

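  • Constructing parse trees through reduction: Reduction, or bottom-up parsing, is the inverse of derivation: starting from the string of terminals, it repeatedly replaces a substring matching the body of a production by that production's head, until only the start symbol remains. A “handle” is a substring that matches the body of a production and whose reduction is one step of the rightmost derivation in reverse; it is replaced by the head of that production.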

  • LR(k) parsing: L for left to right scanning of the input, R for constructing the rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions.

  • Items in LR(0): States represent sets of “items”. An item is a production of the grammar G with a dot at some position of the body, indicating how much of the production we have seen so far in parsing. For example: A -> X . Y Z

  • Augmented expression grammar in LR(0): The grammar augmented with a new start symbol S' and the production S' -> S; the parser accepts when it reduces by S' -> S, i.e. once everything has been reduced to S'.

  • CLOSURE and GOTO in LR(0): The CLOSURE of a set of items I for a grammar G is constructed as follows: add every item in I to CLOSURE(I); then, if A -> a.Bb is in CLOSURE(I) and B -> y is a production, add the item B -> .y to CLOSURE(I) if it is not already there. Repeat until no more items can be added. We call the added items (with the dot at the left end) nonkernel items, and the initial ones kernel items.
    The GOTO(I, X), where I is the set of items and X a grammar symbol, defines the closure of the set of all items [ A -> a X. b ] such that [A -> a .X b] is in I. It defines the transitions in the LR(0) automaton for a grammar on input X.
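
CLOSURE and GOTO translate directly into code if an item is stored as a pair (production number, dot position). The C sketch below assumes the usual augmented expression grammar S' -> E, E -> E+T | T, T -> T*F | F, F -> (E) | id, with single-character symbols (S for S', i for id); the types, names, and fixed capacity are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Augmented grammar; S stands for S', 'i' for id. */
typedef struct { char head; const char *body; } Prod;
static const Prod G[] = {
    {'S', "E"},
    {'E', "E+T"}, {'E', "T"},
    {'T', "T*F"}, {'T', "F"},
    {'F', "(E)"}, {'F', "i"},
};
enum { NPROD = sizeof G / sizeof G[0], MAXITEMS = 32 };

typedef struct { int prod, dot; } Item;            /* A -> a.b as (production, dot) */
typedef struct { Item items[MAXITEMS]; int n; } ItemSet;

static bool is_nonterminal(char c) { return c != '\0' && strchr("SETF", c) != NULL; }

static void add_item(ItemSet *I, int prod, int dot) {
    for (int k = 0; k < I->n; k++)
        if (I->items[k].prod == prod && I->items[k].dot == dot) return; /* already there */
    assert(I->n < MAXITEMS);
    I->items[I->n++] = (Item){ prod, dot };
}

/* CLOSURE(I): for each item A -> a.Bb and each production B -> y, add B -> .y.
 * Scanning the array while it grows reaches the fixed point. */
void closure(ItemSet *I) {
    for (int k = 0; k < I->n; k++) {
        char next = G[I->items[k].prod].body[I->items[k].dot]; /* symbol after the dot */
        if (!is_nonterminal(next)) continue;
        for (int p = 0; p < NPROD; p++)
            if (G[p].head == next) add_item(I, p, 0);
    }
}

/* GOTO(I, X): move the dot over X in every item where X follows the dot. */
ItemSet goto_set(const ItemSet *I, char X) {
    ItemSet J = { .n = 0 };
    for (int k = 0; k < I->n; k++) {
        const Item *it = &I->items[k];
        if (G[it->prod].body[it->dot] == X) add_item(&J, it->prod, it->dot + 1);
    }
    closure(&J);
    return J;
}
```

Starting from { S' -> .E }, closure() should yield the 7 items of the initial state; goto_set on E from that state gives the 2 items S' -> E. and E -> E.+T, and goto_set on ( gives F -> (.E) plus its 6 closure items.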

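  • CLOSURE and GOTO in LR(1): LR(1) is similar to LR(0), but each item carries one lookahead terminal: an item has the form [A -> a.b, c], where c is the lookahead. The lookahead only matters once the dot reaches the end: an item [A -> a., c] calls for a reduction by A -> a only if the next input symbol is c.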

SetOfItems CLOSURE(I) {
	repeat
		for (each item [A -> a.Bb, a] in I)
			for (each production B -> y in G')
				for (each terminal c in FIRST(ba))
					add [B -> .y, c] to set I;
	until no more items are added to I;
	return I;
}

SetOfItems GOTO(I, X) {
	initialize J to be the empty set;
	for (each item [A -> a.Xb, a] in I)
		add item [A -> aX.b, a] to set J;
	return CLOSURE(J);
}

void items(G') {
	initialize C to {CLOSURE({[S' -> .S, $]})};
	repeat
		for (each set of items I in C)
			for (each grammar symbol X)
				if (GOTO(I, X) is not empty and not in C)
					add GOTO(I, X) to C;
	until no new sets of items are added to C;
}

LR(1) example 1 LR(1) example 2 LR(1) example 3 LR(1) example 4 LR(1) example 5

Parser Summary 1 Parser Summary 2 Parser Summary 3 Parser Summary 4 Parser Summary 5 Parser Summary 6

  • L-attributed translations: Class of syntax-directed translations (L for left-to-right), which encompass virtually all translations that can be performed during parsing.

  • SDD, syntax-directed definition: Context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. If X is a symbol, X.a shows a as an attribute of X.

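  • Synthesized and Inherited attributes: A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule of the production at N, using only attribute values at N's children and at N itself. An inherited attribute at N is defined by a semantic rule of the production at N's parent, using attribute values at the parent, at N itself, and at N's siblings. Terminals can have synthesized attributes but not inherited ones.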

  • S-attributed SDD: A syntax-directed definition in which every attribute is synthesized, so the attributes of a head are computed only from attributes of its production body at node N (never from the parent).

  • L-attributed SDD: An SDD in which each inherited attribute of a body symbol Xi depends only on inherited attributes of the head and on attributes of the symbols X1 ... Xi-1 to its left, so attribute information flows left to right.

  • Attribute grammar: An SDD without side effects.

  • Annotated parse tree: A parse tree showing the value(s) of its attribute(s).

  • Dependency graph: A graph with an edge from each attribute instance to every attribute instance whose value depends on it. It applies to both synthesized and inherited attributes.

  • Topological sort: An ordering of the nodes of the dependency graph in which every attribute comes after all the attributes it depends on, i.e. an order in which the attributes can be evaluated. When the dependency graph has a cycle, no topological sort exists.
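
Kahn's algorithm is one standard way to compute a topological sort: repeatedly evaluate a node with no remaining prerequisites and remove its outgoing edges. The C sketch below assumes a small dependency graph stored as an adjacency matrix, with nodes numbered 0 to n-1 standing for attribute instances; the function name and size limit are illustrative. If some nodes are never freed, they lie on a cycle and no evaluation order exists.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16 /* sketch limit on the number of nodes */

/* dep[u][v] is true when node v depends on node u (edge u -> v).
 * Writes an evaluation order into order[] and returns true, or returns
 * false when a cycle makes a topological order impossible. */
bool topo_sort(int n, bool dep[][MAXN], int order[]) {
    int indegree[MAXN] = {0}, queue[MAXN], head = 0, tail = 0, count = 0;
    assert(n <= MAXN);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (dep[u][v]) indegree[v]++;
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0) queue[tail++] = v;  /* no prerequisites */
    while (head < tail) {
        int u = queue[head++];
        order[count++] = u;                       /* u can be evaluated now */
        for (int v = 0; v < n; v++)
            if (dep[u][v] && --indegree[v] == 0) queue[tail++] = v;
    }
    return count == n; /* leftover nodes lie on a cycle */
}
```

With edges 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, one valid order is 0, 1, 2, 3; adding an edge 3 -> 0 creates a cycle, and topo_sort returns false.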

  • Syntax-directed translation (SDT) for L-attributed definitions: A syntax-directed translation scheme where we put the action computing an inherited attribute of a nonterminal immediately before that nonterminal in the production body, and the action computing the synthesized attribute of the head at the end of the body.

Syntax directed definition summary 1 Syntax directed definition summary 2 Syntax directed definition summary 3

  • Directed acyclic graph (DAG): A way to convert a syntax-directed definition into a graph where leaves are unique/atomic operands and interior nodes correspond to operators. A node can have many parents, which is how common subexpressions are shared. It expresses the syntax tree more succinctly and can be used to generate efficient code to evaluate expressions. Nodes can be stored in an array of records, where each row represents one node: leaves have a field for the lexical value, and interior nodes have two fields for the left and right children.
    1 | id  | (pointer to symbol-table entry for i)
    2 | num | 10
    3 | +   | 1 | 2
    4 | =   | 1 | 3
    5 | ...

        =
       / \
      /   +
     |   / \
      \ /   \
       i     10
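The array-of-records layout suggests the value-number method: before creating a node, look for an existing record with the same operator and children and reuse it, which is how a shared node such as i arises. A minimal sketch, assuming a linear search (a real implementation would typically hash on the node's signature) and value numbers counted from 0 rather than 1; Node, value_number, leaf, and interior are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64

/* One record per DAG node. op is "id", "num", or an operator such as "+".
 * Leaves use lexval (a name or constant); interior nodes use left/right,
 * which are the value numbers (indices) of earlier records. */
typedef struct {
  const char *op;
  const char *lexval;
  int left, right;
} Node;

static Node nodes[MAX_NODES];
static int node_count = 0;

/* String equality that also treats two NULLs as equal. */
static int same_str(const char *a, const char *b) {
  return (a == NULL && b == NULL) || (a && b && strcmp(a, b) == 0);
}

/* Return the value number of an identical existing node, or append one. */
int value_number(const char *op, const char *lexval, int left, int right) {
  for (int i = 0; i < node_count; i++) {
    if (same_str(nodes[i].op, op) && same_str(nodes[i].lexval, lexval) &&
        nodes[i].left == left && nodes[i].right == right)
      return i; /* reuse: common subexpression */
  }
  assert(node_count < MAX_NODES);
  nodes[node_count] = (Node){op, lexval, left, right};
  return node_count++;
}

int leaf(const char *op, const char *lexval) {
  return value_number(op, lexval, -1, -1);
}

int interior(const char *op, int left, int right) {
  return value_number(op, NULL, left, right);
}
```

Building i = i + 10 with these helpers creates four records; building i + 10 a second time returns the existing + node instead of adding a fifth.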

[Figures: Intermediate representation position in compiler 1-2]

  • Three-address code: A linear representation of a syntax tree or DAG in which each instruction has at most one operator on the right side, and explicit names correspond to the interior nodes of the graph. Three-address code is built from addresses and instructions; an address is a name, a constant, or a compiler-generated temporary. Common instruction forms are: assignments x = y op z; unary assignments x = op y; copies x = y; unconditional jumps goto L; conditional jumps if x goto L and ifFalse x goto L; relational jumps if x relop y goto L, where relop is a relational operator; procedure calls, written param x for each parameter followed by call p, n for a procedure or y = call p, n for a function, where n is the number of arguments; returns return y, where y is the returned value; indexed copies x = y[i] and x[i] = y; and address and pointer assignments x = &y, x = *y, and *x = y.

[Figures: if-else to three-address code 1-4]

  • Quadruples (in the context of three-address code): A table with four columns: op, arg1, arg2, and result. Unary operators leave arg2 empty, param uses neither arg2 nor result, and conditional and unconditional jumps put the target label in result.
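To make the quadruple layout concrete, the sketch below stores the three-address code for the expression a = b * - c + b * - c as a table. The Quad struct, the op name minus for unary negation, and print_quads are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One quadruple: op, arg1, arg2, result. Unused fields are NULL. */
typedef struct {
  const char *op, *arg1, *arg2, *result;
} Quad;

/* Three-address code for a = b * - c + b * - c:
 * t1 = minus c; t2 = b * t1; t3 = minus c; t4 = b * t3; t5 = t2 + t4; a = t5 */
static const Quad quads[] = {
    {"minus", "c", NULL, "t1"}, /* unary operator: arg2 unused */
    {"*", "b", "t1", "t2"},
    {"minus", "c", NULL, "t3"},
    {"*", "b", "t3", "t4"},
    {"+", "t2", "t4", "t5"},
    {"=", "t5", NULL, "a"}, /* copy instruction: arg2 unused */
};

static const int quad_count = sizeof quads / sizeof quads[0];

/* Print the table, one row per quadruple. */
void print_quads(FILE *out) {
  for (int i = 0; i < quad_count; i++) {
    const Quad *q = &quads[i];
    fprintf(out, "%d: %-5s %-3s %-3s %s\n", i, q->op, q->arg1,
            q->arg2 ? q->arg2 : "", q->result);
  }
}
```

Each temporary appears explicitly in the result column; a triple representation of the same code would drop that column and refer to, say, t1 by its row number instead.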

  • Triples (in the context of three-address code): A table with three columns, like quadruples but without result; the result of an operation is referred to by its position. Triples correspond closely to the nodes of the expression's syntax tree or DAG. Indirect triples add a separate list of pointers to triples in execution order, so an optimizer can move instructions by reordering that list without changing the triples themselves.

  • Static single-assignment form (SSA): An intermediate representation similar to three-address code, but in which every assignment is to a variable with a distinct name. Where control-flow paths merge, a φ-function combines the definitions of the same variable: it returns the value from the assignment on the control-flow path actually taken. For example, after if (flag) x1 = -1; else x2 = 1; the join point gets x3 = φ(x1, x2).

  • Translation applications: From the type of a name, the compiler can determine the storage layout needed for that name at run time. Type information is also used to compute addresses, for example of array elements: in a one-dimensional array, A[i] is at base + (i - low) * w, where base is the address of the array, low is the lower bound of the index, and w is the element width.
    Multidimensional arrays are laid out in either row-major or column-major order. Some widths depend on the target architecture, so they can be left as symbolic widths in the intermediate representation.
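A sketch of that address formula, extended to two dimensions in both layouts (row-major keeps each row contiguous, column-major keeps each column contiguous). The function names are hypothetical, widths are in bytes, and base is treated as a plain integer address.

```c
#include <assert.h>
#include <stddef.h>

/* Address of A[i] in a one-dimensional array: base + (i - low) * w. */
size_t addr_1d(size_t base, long i, long low, size_t w) {
  assert(i >= low);
  return base + (size_t)(i - low) * w;
}

/* Row-major A[i1][i2]: rows indexed from low1, columns from low2,
 * each row holding n2 elements of width w. */
size_t addr_2d_row_major(size_t base, long i1, long low1, long i2, long low2,
                         size_t n2, size_t w) {
  assert(i1 >= low1 && i2 >= low2);
  return base + ((size_t)(i1 - low1) * n2 + (size_t)(i2 - low2)) * w;
}

/* Column-major A[i1][i2]: each column is contiguous and holds n1 elements. */
size_t addr_2d_col_major(size_t base, long i1, long low1, long i2, long low2,
                         size_t n1, size_t w) {
  assert(i1 >= low1 && i2 >= low2);
  return base + ((size_t)(i2 - low2) * n1 + (size_t)(i1 - low1)) * w;
}
```

For a 2 x 3 array of 4-byte elements at address 1000 with zero lower bounds, A[1][0] is at 1012 in row-major order but at 1004 in column-major order.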

  • Type checking: Using a type system, the compiler assigns a type expression to each component of the source program and checks that these types are used consistently, which catches inadvertent errors and some kinds of malicious misbehavior. A language is strongly typed if the compiler can guarantee that the programs it accepts will run without type errors.
    Type checking takes two forms: synthesis and inference. Type synthesis builds up the type of an expression from the types of its subexpressions, and requires names to be declared before they are used. For example, if f has type s -> t and x has type s, then the expression f(x) has type t. Type inference determines the type of a language construct from the way it is used. For example, if f(x) is an expression, then for some types a and b, f has type a -> b and x has type a.

  • Implicit and explicit type conversion: An implicit conversion (coercion) is done automatically by the compiler, usually when widening a type; an explicit conversion (cast) happens only because the programmer wrote something to request it. Checking E -> E1 + E2 uses two helper functions: max(t1, t2) returns the wider of two types in the widening hierarchy, and widen(a, t, w) generates, if needed, code that converts the value at address a from type t to type w.

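/* Addr, Type, new_temp(), gen(), and error() are assumed compiler helpers. */
Addr widen(Addr a, Type t, Type w) {
	if (t == w) return a;                 /* already the target type */
	else if (t == TYPE_INTEGER && w == TYPE_FLOAT) {
		Addr temp = new_temp();           /* fresh compiler temporary */
		gen(temp, "=", "(float)", a);     /* emit: temp = (float) a */
		return temp;
	} else {
		error("cannot widen");            /* any other case is a type error */
		return a;
	}
}
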
  • Polymorphic function: A function whose type expression contains type variables (such as α) that stand “for any type”, so the function can be applied to arguments of many types. Each time a polymorphic function is applied, its bound type variables can denote a different type.

  • Unification: The problem of determining whether two expressions s and t can be made identical by substituting expressions for the variables in s and t.
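Unification is easiest to see on type expressions. The sketch below unifies types built from int, float, type variables, and function types s -> t, binding variables as it goes and refusing to bind a variable to a type that contains it (the occurs check). The index-based pool and all names are assumptions made for illustration, not a particular compiler's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Type expressions: base types, type variables, and functions from -> to. */
typedef enum { T_INT, T_FLOAT, T_VAR, T_FUN } Kind;

typedef struct {
  Kind kind;
  int from, to; /* T_FUN: argument and result types (pool indices) */
  int binding;  /* T_VAR: index of the bound type, or -1 if unbound */
} Type;

#define MAX_TYPES 64
static Type pool[MAX_TYPES];
static int pool_size = 0;

static int make(Kind kind, int from, int to) {
  assert(pool_size < MAX_TYPES);
  pool[pool_size] = (Type){kind, from, to, -1};
  return pool_size++;
}
int t_int(void) { return make(T_INT, -1, -1); }
int t_float(void) { return make(T_FLOAT, -1, -1); }
int t_var(void) { return make(T_VAR, -1, -1); }
int t_fun(int from, int to) { return make(T_FUN, from, to); }

/* Follow variable bindings to the representative type. */
int resolve(int t) {
  while (pool[t].kind == T_VAR && pool[t].binding != -1) t = pool[t].binding;
  return t;
}

/* Occurs check: does variable v appear inside type t? */
static bool occurs(int v, int t) {
  t = resolve(t);
  if (t == v) return true;
  if (pool[t].kind == T_FUN)
    return occurs(v, pool[t].from) || occurs(v, pool[t].to);
  return false;
}

/* Unify s and t, binding variables as needed; returns false on failure.
 * Bindings made before a failure are not undone in this sketch. */
bool unify(int s, int t) {
  s = resolve(s);
  t = resolve(t);
  if (s == t) return true;
  if (pool[s].kind == T_VAR) {
    if (occurs(s, t)) return false;
    pool[s].binding = t;
    return true;
  }
  if (pool[t].kind == T_VAR) return unify(t, s);
  if (pool[s].kind != pool[t].kind) return false;
  if (pool[s].kind == T_FUN)
    return unify(pool[s].from, pool[t].from) && unify(pool[s].to, pool[t].to);
  return true; /* the same base type */
}
```

Unifying a -> b with int -> float binds a to int and b to float, while c and c -> int cannot be unified because c occurs inside c -> int.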

  • Boolean expressions: Used either to alter the flow of control or to compute logical values.

B -> B || B | B && B | !B | (B) | E rel E | true | false

We can short-circuit boolean operators, translating them into jumps:

if (x < 100 || x > 200 && x != y) x = 0;

equivalent to:

    if x < 100 goto L2
    ifFalse x > 200 goto L1
    ifFalse x != y goto L1
L2: x = 0
L1:
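A quick way to convince yourself that the jump code matches the source condition is to transcribe it into C with goto and compare both versions over a range of inputs. This is a testing sketch: the function names are made up, and L1 is taken to label the code that follows x = 0.

```c
#include <assert.h>

/* Source form: if (x < 100 || x > 200 && x != y) x = 0;
 * (&& binds tighter than ||, made explicit with parentheses). */
int with_operators(int x, int y) {
  if (x < 100 || (x > 200 && x != y)) x = 0;
  return x;
}

/* The short-circuit jump code, transcribed with goto. */
int with_jumps(int x, int y) {
  if (x < 100) goto L2;
  if (!(x > 200)) goto L1; /* ifFalse x > 200 goto L1 */
  if (!(x != y)) goto L1;  /* ifFalse x != y goto L1 */
L2:
  x = 0;
L1:
  return x;
}
```

Note that once x < 100 holds, the jump to L2 skips the remaining tests entirely, which is exactly the short-circuit behavior of ||.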
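  • Backpatching: A technique for generating jump code for boolean expressions and flow-of-control statements (e.g. if (B) S) in a single pass: jumps are emitted with their targets left unfilled and kept on lists carried as synthesized attributes, and the targets are filled in once the proper label is known.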
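The lists used in backpatching can be sketched with three operations, makelist, merge, and backpatch, over an array of jump targets in which -1 means "not yet filled". The fixed-size List type and emit_jump are assumptions for illustration.

```c
#include <assert.h>

#define MAX_INSTRS 64
#define MAX_LIST 16

/* Target label of each emitted jump; -1 while still unknown. */
static int target[MAX_INSTRS];
static int next_instr = 0;

/* Indices of jump instructions that will all receive the same target. */
typedef struct {
  int items[MAX_LIST];
  int len;
} List;

/* Emit a jump whose target is not yet known; return its index. */
int emit_jump(void) {
  assert(next_instr < MAX_INSTRS);
  target[next_instr] = -1;
  return next_instr++;
}

/* A new list containing only instruction i. */
List makelist(int i) {
  List l = {.items = {i}, .len = 1};
  return l;
}

/* Concatenate two lists. */
List merge(List a, List b) {
  assert(a.len + b.len <= MAX_LIST);
  for (int k = 0; k < b.len; k++) a.items[a.len++] = b.items[k];
  return a;
}

/* Fill in label as the target of every jump on the list. */
void backpatch(List l, int label) {
  for (int k = 0; k < l.len; k++) target[l.items[k]] = label;
}
```

For B -> B1 || M B2, the usual translation backpatches B1's false list with the index of B2's first instruction (M.instr), merges the two true lists into B's true list, and uses B2's false list as B's false list.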

[Figures: Intermediate representation summary 1-2]

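  • Run-time environment: The environment the compiler sets up and manages, with support from the operating system, so that the compiled program can run: the layout and allocation of storage, the mechanisms for accessing variables, the linkage between procedures, parameter passing, and so on. Typically, run-time memory is divided into code, static data, heap, free memory, and stack areas, with the heap and the stack growing toward each other into the free memory.

[Figure: General activation record]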

  • Stack vs heap: Stack storage holds names local to a procedure; heap storage holds data that may outlive the call to the procedure that created it. Both are areas of the program's virtual memory.

  • Memory manager: A subsystem that allocates and deallocates space within the heap, serving as the interface between application programs and the operating system. Its two basic functions are allocation and deallocation.
    A memory manager should be space-efficient, minimizing the total heap space a program needs; program-efficient, making good use of the memory subsystem so the program runs faster; and low-overhead, because allocation and deallocation are frequent in many programs.

  • Garbage collector: Code that reclaims chunks of storage that can no longer be accessed.
    Things to consider: overall execution time, space usage, pause time, and program locality.
    A collector either catches the transition when objects become unreachable (as reference counting does), or periodically locates all the reachable objects and infers that all other objects are unreachable (trace-based collection).

  • Mutator: The user program, which modifies the collection of objects in the heap. Four kinds of mutator operations change the set of reachable objects: object allocation, parameter passing and return values, reference assignments, and procedure returns.

  • Root set: All the data that can be accessed directly by a program, without having to dereference any pointer.
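The trace-based approach and the root set fit together in a mark-and-sweep collector: mark everything reachable from the roots, then sweep away whatever is left unmarked. Here is a minimal sketch over a fixed array of toy objects, each holding up to two references; the Object layout and collect are hypothetical. Note that it reclaims an unreachable cycle, which plain reference counting cannot do.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_SIZE 8

/* A toy heap object with up to two references (-1 means no reference). */
typedef struct {
  bool allocated;
  bool marked;
  int ref[2];
} Object;

static Object heap[HEAP_SIZE];

/* Mark phase: everything reachable from obj is live. */
static void mark(int obj) {
  if (obj < 0 || heap[obj].marked) return;
  heap[obj].marked = true;
  mark(heap[obj].ref[0]);
  mark(heap[obj].ref[1]);
}

/* Trace from the root set, then free every allocated object left unmarked.
 * Returns the number of objects reclaimed. */
int collect(const int *roots, int nroots) {
  int freed = 0;
  for (int i = 0; i < nroots; i++) mark(roots[i]);
  for (int i = 0; i < HEAP_SIZE; i++) {
    if (heap[i].allocated && !heap[i].marked) {
      heap[i].allocated = false; /* unreachable: reclaim it */
      freed++;
    }
    heap[i].marked = false; /* reset for the next collection */
  }
  return freed;
}
```

With object 0 as the only root pointing to object 1, and objects 2 and 3 pointing at each other, a collection frees 2 and 3 and keeps 0 and 1.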

  • Code generation: The process of producing the target program (machine code, assembly, or some other target language) from an intermediate representation.

  • Addresses in target code: The executable target code lives in a statically determined Code area; the Static area holds global constants and other data the compiler generates; the Heap is a dynamically managed area for data objects allocated and freed during execution; and the Stack is a dynamically managed area holding activation records as they are created and destroyed during calls and returns.

[Figures: Environment summary 1-5]

[Figure: Position of code generator in compiler]

  • Basic blocks and flow graphs: The code is divided into maximal sequences of consecutive instructions called basic blocks, such that control can enter a block only through its first instruction (there are no jumps into the middle of the block), and control leaves the block without halting or branching, except possibly at its last instruction. The basic blocks become the nodes of a flow graph.

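  • Live variables and next use: A variable is live at a point if its current value may be used later, possibly in another basic block. If statement i assigns x and statement j uses x, with no assignment to x on the way from i to j, then j is the next use of x after i. Next-use information tells the code generator when a register holding x can be reused.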
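Next-use information can be computed by scanning a basic block backward. The sketch below follows that approach: at each statement x = y op z it first records the current liveness and next use of x, y, and z, then marks x dead and marks y and z live with their next use at this statement (in that order, so x = x + y leaves x live). Single-letter variable names, the Stmt and UseInfo types, and compute_next_use are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NVARS 26 /* variables are single letters 'a'..'z' in this sketch */
#define NO_USE -1

/* x = y op z, with each variable named by a lowercase letter. */
typedef struct { char x, y, z; } Stmt;

/* For one statement: liveness and next use of x, y, z just after it. */
typedef struct {
  bool live[3];
  int next_use[3];
} UseInfo;

/* Backward scan of a block; live_on_exit says which variables are live
 * when the block ends. info[i] receives the data for statement i. */
void compute_next_use(const Stmt *code, int n, const bool live_on_exit[NVARS],
                      UseInfo *info) {
  bool live[NVARS];
  int next_use[NVARS];
  for (int v = 0; v < NVARS; v++) {
    live[v] = live_on_exit[v];
    next_use[v] = NO_USE;
  }
  for (int i = n - 1; i >= 0; i--) {
    char vars[3] = {code[i].x, code[i].y, code[i].z};
    /* 1. Attach the current (post-statement) state to statement i. */
    for (int k = 0; k < 3; k++) {
      info[i].live[k] = live[vars[k] - 'a'];
      info[i].next_use[k] = next_use[vars[k] - 'a'];
    }
    /* 2. x is assigned here, so just before this statement it is dead. */
    live[vars[0] - 'a'] = false;
    next_use[vars[0] - 'a'] = NO_USE;
    /* 3. y and z are used here. */
    for (int k = 1; k < 3; k++) {
      live[vars[k] - 'a'] = true;
      next_use[vars[k] - 'a'] = i;
    }
  }
}
```

For the block t = a + b; u = t + c; a = u + t with only a live on exit, t is live after statement 0 with its next use at statement 1, while a is dead after statement 0 because statement 2 overwrites it.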

  • Optimizing the code: Optimization takes several things into account, including the cost of instructions. Typical transformations eliminate local common subexpressions, eliminate dead code, reorder statements that do not depend on one another, and apply algebraic laws to reorder the operands of three-address instructions and sometimes simplify the computation.

  • DAG for a basic block: A basic block can itself be represented by a DAG, with operators as interior nodes and the initial values of operands as leaves. This is used to simplify the block's code, and it can represent array references too.

  • Managing register and address descriptors: Registers are scarce, so the code generator uses a function, getReg(), to choose registers for each instruction. It relies on two structures: a register descriptor, which tracks which variables each register currently holds, and an address descriptor, which tracks the locations (registers, memory addresses, or both) where the current value of each variable can be found.

  • Register spill: When no register is free to hold an operand of an instruction, the value in some register is stored to its own memory location so that the register can be reused.

[Figures: Code generation summary 1-3]

  • Peephole optimization: Improving the target code by examining a short sliding window of instructions, the peephole, and replacing instruction sequences within it by a shorter or faster sequence. It usually consists of several passes. Examples: redundant-instruction elimination, flow-of-control optimizations, algebraic simplifications, and use of machine idioms.

  • Data-flow analysis: Techniques that derive information about the flow of values along the paths of a program's flow graph, used to optimize it. An iterative data-flow framework consists of a domain of values forming a semi-lattice (with top and bottom elements), a direction (forward or backward), a family of transfer functions over that domain, a boundary condition, a meet operator ∧ (which induces the partial order ≤), the data-flow equations, and an initialization. Examples include reaching definitions, live variables, available expressions, constant propagation, and partial-redundancy elimination.
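As a concrete instance of an iterative framework, here is a live-variable analysis sketch: a backward problem whose meet is set union, with IN[B] = use[B] ∪ (OUT[B] − def[B]) and OUT[B] equal to the union of IN over B's successors, starting from empty sets and iterating until nothing changes. Variables are bits in a 32-bit word; the Block type and live_variables are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* A set of variables as a bit vector: bit v stands for variable v. */
typedef uint32_t Set;

typedef struct {
  Set use;             /* variables read before any write in the block */
  Set def;             /* variables written in the block */
  int succ[MAX_SUCCS]; /* successor block indices, -1 for none */
  Set in, out;         /* results of the analysis */
} Block;

/* Iterative live-variable analysis (backward, meet = union).
 * Starts from empty sets and iterates to a fixed point. */
void live_variables(Block *b, int n) {
  for (int i = 0; i < n; i++) b[i].in = b[i].out = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    /* Visiting blocks in reverse order tends to converge faster. */
    for (int i = n - 1; i >= 0; i--) {
      Set out = 0;
      for (int k = 0; k < MAX_SUCCS; k++)
        if (b[i].succ[k] >= 0) out |= b[b[i].succ[k]].in;
      Set in = b[i].use | (out & ~b[i].def);
      if (in != b[i].in || out != b[i].out) changed = true;
      b[i].in = in;
      b[i].out = out;
    }
  }
}
```

For a loop in which B0 sets x and y, B1 computes x = x + y and branches back to itself or on to B2, and B2 returns x, the analysis finds x and y live on entry to B1 and nothing live on entry to B0.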

[Figure: Reaching definition]

  • Monotonicity: A function f on a partial order is monotonic if: if x ≤ y then f(x) ≤ f(y)

[Figures: Data flow summary 1-7]

  • MOP (meet-over-all-paths solution): The “best” possible solution to a data-flow problem at node n is obtained by computing the data-flow information along every possible path from the entry to n and then combining the results with the meet operator ∧. In general there can be infinitely many paths to n, so the MOP solution cannot always be computed directly.

  • Very busy expressions: An expression e is very busy at point p if, on every path from p, e is evaluated before its value changes (that is, before any of its operands is redefined).

  • Natural loop: A natural loop must have a single entry node, called the header, which dominates all nodes in the loop, and there must be a back edge that enters the header. Otherwise, control could not flow back to the header directly from the “loop”.

[Figures: ILP summary 1-4]

  • Region-based analysis: Instead of iterating, the analysis starts from small regions, computes a transfer function summarizing each one, and combines them into ever larger regions until the whole program is covered.

  • Hardware vs software ILP: Machines that let the software manage parallelism are called VLIW (very long instruction word) machines, and those in which the hardware manages it are called superscalar machines. See the computer architecture article.

  • Array affine optimization: When the array indices can be expressed as affine functions of the loop indices, optimizations based on time partitioning and space partitioning of the computation can be applied.

[Figures: Basic matrix multiplication 1-2]

[Figures: Array access with matrix vector 1-2]

[Figures: Hardware optimization summary 1-4]


September 07, 2020

Frederic Cambus (fcambus)

Playing with Kore JSON API September 07, 2020 03:15 PM

Kore 4.0.0 was released a few days ago and features a brand-new JSON API that makes it easy to parse and serialize JSON objects.

During the last couple of years, I have been using Kore for various projects, including exposing hardware sensor values over the network via very simple APIs. In this article, I would like to present a generalization of this concept and show how easy it is to expose system information with Kore.

This small API example makes it possible to identify hosts over the network and has been tested on Linux, OpenBSD, NetBSD, and macOS (thanks Joris!).

After creating a new project:

kodev create identify

Populate src/identify.c with the following code snippet:

#include <sys/utsname.h>

#include <string.h>

#include <kore/kore.h>
#include <kore/http.h>

#if defined(__linux__)
#include <kore/seccomp.h>

/* Allow the worker processes to call uname(2) under the seccomp sandbox. */
KORE_SECCOMP_FILTER("app",
	KORE_SYSCALL_ALLOW(uname)
);
#endif

int		page(struct http_request *);

int
page(struct http_request *req)
{
	char *answer;

	struct utsname u;

	struct kore_buf buf;
	struct kore_json_item *json;

	if (uname(&u) == -1) {
		http_response(req, HTTP_STATUS_INTERNAL_ERROR, NULL, 0);
		return (KORE_RESULT_OK);
	}

	kore_buf_init(&buf, 1024);
	json = kore_json_create_object(NULL, NULL);

	kore_json_create_string(json, "system", u.sysname);
	kore_json_create_string(json, "hostname", u.nodename);
	kore_json_create_string(json, "release", u.release);
	kore_json_create_string(json, "version", u.version);
	kore_json_create_string(json, "machine", u.machine);

	/* Serialize the JSON object into buf and send it. */
	kore_json_item_tobuf(json, &buf);

	answer = kore_buf_stringify(&buf, NULL);
	http_response(req, 200, answer, strlen(answer));

	/* Release the buffer and the JSON tree. */
	kore_buf_cleanup(&buf);
	kore_json_item_free(json);

	return (KORE_RESULT_OK);
}

And finally launch the project:

kodev run

The kodev tool will build and run the project, and we can now query the API to identify hosts:

{
  "system": "OpenBSD",
  "hostname": "",
  "release": "6.8",
  "version": "GENERIC.MP#56",
  "machine": "amd64"
}

Wesley Moore (wezm)

Slowing Down Read Rust Posting September 07, 2020 12:00 AM

After nearly 3 years and more than 3200 posts I'm going to slow down the posting frequency on Read Rust. I hope this will free up some spare time and make it easier to take breaks from social media. I aim to share all of the #rust2021 posts I can find, but after that I'll probably only share posts that seem particularly noteworthy or interesting.

I started Read Rust in January 2018 to track the posts being shared as part of the inaugural call for blog posts. When I started there were only a handful of new posts each day to triage. Now there are many more and unless I triage and publish daily they quickly pile up.

Also, I've kind of built a reflex of trying to "complete the Internet" each day by ensuring that I read my whole Twitter feed, and new posts on /r/rust. I would like to break this habit and be able to take breaks from these things, without feeling like I might miss an important post.

Whilst I think there is value in the curation and archiving of posts on Read Rust, the website doesn't see a lot of use. I think most of the value for people is following the Twitter, Mastodon, and Facebook accounts. However, there's a fair amount of overlap between posts shared on /r/rust, @rustlang, and This Week in Rust. So, I think that if folks keep an eye on one or more of those they will still see most posts of note.

If you're not into social media, the full list of more than 450 Rust RSS feeds I subscribe to is available via an OPML file on the site. So, feel free to use that to subscribe to a bunch of feeds instead. Rust blogs OPML.

It's been fun to build and rebuild the website and surrounding tooling over the years. Read Rust was initially just an RSS feed, but after requests for an actual web page I built a small site with the Cobalt static site compiler. In late 2019, in an effort to streamline the sharing of posts, I rebuilt the site as a dynamic web app. In early 2020 I added full text search.

As mentioned in the introduction, from here I plan to share #rust2021 posts and after that posting will be much less frequent. Thanks for reading, and happy coding 🦀.

Frequently Anticipated Questions

Q. What about getting others to help share posts?

I considered this, and it was actually part of the motivation for the rebuild in 2019. However, Rust is now large enough, and still growing, that it has become less and less feasible to curate the entire firehose of Rust content.

Q. What about making it a sort of RSS powered Rust planet?

I think there's value in curation. Rust is popular enough now that there's a lot of low effort posts, or repetitious getting started posts. Also, people rightly have diverse interests and their blog may not solely contain Rust posts. So, I'd prefer to keep the archive in the focussed state it's in now.

Q. What will happen to the site and social media accounts now?

I plan to keep the site up and running indefinitely. I am a strong believer in not breaking links on the web, and I think I have a pretty decent track record. For example, this site has been online for 13 years and I still have redirects in place from the very first version of it. I may still share the occasional post but in general I hope to free up a bit of time to work on other things.

September 06, 2020

Derek Jones (derek-jones)

Impact of function size on number of reported faults September 06, 2020 09:55 PM

Are longer functions more likely to contain more coding mistakes than shorter functions?

Well, yes. Longer functions contain more code, and the more code developers write the more mistakes they are likely to make.

But wait, the evidence shows that most reported faults occur in short functions.

This is true, at least in Java. It is also true that most of a Java program’s code appears in short methods (in C 50% of the code is contained in functions containing 114 or fewer lines, while in Java 50% of code is contained in methods containing 4 or fewer lines). It is to be expected that most reported faults appear in short functions. The plot below shows, left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines (code+data):

left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines.

Does percentage of program source really explain all those reported faults in short methods/functions? Or are shorter functions more likely to contain more coding mistakes per line of code, than longer functions?

Reported faults per line of code is often referred to as: defect density.

If defect density was independent of function length, the plot of reported faults against function length (in lines of code) would be horizontal; red line below. If every function contained the same number of reported faults, the plotted line would have the form of the blue line below.


Two things need to occur for a fault to be experienced. A mistake has to appear in the code, and the code has to be executed with the ‘right’ input values.

Code that is never executed will never result in any fault reports.

If, in a function containing 100 lines of executable source code, say 30 lines are rarely executed, those 30 lines will not contribute as much to the final total of reported faults as the other 70 lines.

How does the average percentage of executed LOC, in a function, vary with its length? I have been rummaging around looking for data to help answer this question, but so far without any luck (the llvm code coverage report is over all tests, rather than per test case). Pointers to such data very welcome.

Statement execution is controlled by if-statements, and around 17% of C source statements are if-statements. For functions containing between 1 and 10 executable statements, the percentage that don’t contain an if-statement is expected to be, respectively: 83, 69, 57, 47, 39, 33, 27, 23, 19, 16. Statements contained in shorter functions are more likely to be executed, providing more opportunities for any mistakes they contain to be triggered, generating a fault experience.
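These percentages appear to follow from treating each statement as independently being an if-statement with probability 0.17, so that a function with n statements contains none with probability 0.83^n. A quick check of that reading (the independence model is an assumption about how the figures were derived; percent_without_if is a hypothetical name):

```c
#include <assert.h>

/* Percentage of functions with n statements that contain no if-statement,
 * assuming each statement is independently an if-statement with
 * probability p. Computes 100 * (1 - p)^n with a plain loop. */
double percent_without_if(int n, double p) {
  assert(n >= 0 && p >= 0.0 && p <= 1.0);
  double q = 1.0;
  for (int i = 0; i < n; i++) q *= 1.0 - p;
  return 100.0 * q;
}
```

Rounded to whole percentages, p = 0.17 reproduces the sequence 83, 69, 57, 47, 39, 33, 27, 23, 19, 16 for n = 1 to 10.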

Longer functions contain more dependencies between the statements within the body, than shorter functions (I don’t have any data showing how much more). Dependencies create opportunities for making mistakes (there is data showing dependencies between files and classes is a source of mistakes).

The previous analysis makes a large assumption, that the mistake generating a fault experience is contained in one function. This is true for 70% of reported faults (in AspectJ).

What is the distribution of reported faults against function/method size? I don’t have this data (pointers to such data very welcome).

The plot below shows number of reported faults in C++ classes (not methods) containing a given number of lines (from a paper by Koru, Eman and Mathew; code+data):

Number of reported faults in C++ classes (not methods) containing a given number of lines.

It’s tempting to think that those three curved lines are each classes containing the same number of methods.

What is the conclusion? There is one good reason why shorter functions should have more reported faults, and another good’ish reason why longer functions should have more reported faults. Perhaps length is not important. We need more data before an answer is possible.

Ponylang (SeanTAllen)

Last Week in Pony - September 6, 2020 September 06, 2020 07:19 PM

We have a new RFC proposing syntax to extend automatic receiver recovery. The shared-docker shellcheck image is being deprecated.

Gonçalo Valério (dethos)

Giving a new life to old phones September 06, 2020 12:18 PM

Nowadays, in some “developed” countries, it is very common for people to have a bunch of old phones stored somewhere in a drawer. Ten years have passed since smartphones became ubiquitous and those devices tend to become unusable very quickly, at least for their primary purpose. Either a small component breaks, the vendor stops providing updates, newer apps don’t support those older versions, etc.

The thing is, these phones are still powerful computers. It would be great if we could give them another life once they are no longer fit for regular day to day use or the owner just wants to try a shiny new device.

I never had many smartphones (mine tend to last many years), but I still have one or two lying around. Recently I started thinking of new uses for them, to make them work instead of just gathering dust. A quick search on the internet tells me that many people already had the same idea (I’m quite late to the party) and have been working on cool things to do with these devices.

However, most of these articles just throw the idea at you, without telling you how to do it. Others assume that your device is relatively recent.

Of course, the difficulty increases with the age of the phone: in my case, software that runs on a 10-year-old Samsung Galaxy S will not be as easy to find as software for a device that is only one or two years old.

Below is a list of posts I found online with cool things you can do with your old phones. What sets this list apart from other results is that none of the items are just ideas: each contains step-by-step instructions for achieving the end result.

You don’t have to follow the provided instructions rigorously and you should introduce some variations that are more appropriate to your use case.

Have fun and reuse your old devices.

September 05, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Primitive unary functions September 05, 2020 09:00 PM


Welcome back to the “Compiling a Lisp” series. Last time, we finished adding the rest of the constants as tagged pointer immediates. Since it’s not very useful to have only values (no way to operate on them), we’re going to add some primitive unary functions.

“Primitive” means here that they are built into the compiler, so we won’t actually compile the call to an assembly procedure call. This is also called a compiler intrinsic. “Unary” means the functions will take only one argument. “Function” is a bit of a misnomer because these functions won’t be real values that you can pass around as variables. You’ll only be able to use them as literal names in calls.

Though we’re still not adding a reader/parser, we can imagine the syntax for this looks like the following:

(integer? (integer->char (add1 96)))

Today we also tackle nested function calls and subexpressions.

Adding function calls will require a new compiler data structure, an addition to the AST, but nothing new in the compiled code. The compiled code will still only know about the immediate types.

Ghuloum proposes we add the following functions:

  • add1, which takes an integer and adds 1 to it
  • sub1, which takes an integer and subtracts 1 from it
  • integer->char, which takes an integer and converts it into a character (like chr in Python)
  • char->integer, which takes a character and converts it into an integer (like ord in Python)
  • null?, which takes an object and returns true if it is nil and false otherwise
  • zero?, which takes an object and returns true if it is 0 and false otherwise
  • not, which takes an object and returns true if it is false and false otherwise
  • integer?, which takes an object and returns true if it is an integer and false otherwise
  • bool?, which takes an object and returns true if it is a boolean and false otherwise

The functions add1, sub1, and the char/integer conversion functions will be our first real experience dealing with object encoding in the compiled code. What fun!

The implementations for null?, zero?, not, integer?, and bool? are so similar that I am only going to reproduce one or two in this post. The rest will be visible at assets/code/lisp/compiling-unary.c.

In order to implement these functions, we’ll also need some more instructions than mov and ret. Today we’ll add:

  • add
  • sub
  • shl
  • shr
  • or
  • and
  • cmp
  • setcc

Because the implementations of shl, shr, or, and and are so straightforward — just like mov, really — I’ll also omit them from the post. The implementations of add, sub, cmp, and setcc are more interesting.

The fundamental data structure of Lisp

Pairs, also called cons cells, two-tuples, and probably other things too, are the fundamental data structure of Lisp. At least the original Lisp. Nowadays we have fancy structures like vectors, too.

Pairs are a container for precisely two other objects. I’ll call them car and cdr for historical¹ and consistency reasons, but you can call them whatever you like. Regardless of name, they could be represented as a C struct like this:

typedef struct Pair {
  ASTNode *car;
  ASTNode *cdr;
} Pair;

This is useful for holding pairs of objects (think coordinates, complex numbers, …) but it is also incredibly useful for making linked lists. A linked list in Lisp is a pair whose car holds an object and whose cdr holds another list. Eventually the last cdr holds nil, signifying the end of the list. Take a look at this handy diagram.

Fig. 1 - Cons cell list, courtesy of Wikipedia.

This represents the list (list 42 69 613), which can also be denoted (cons 42 (cons 69 (cons 613 nil))).

We’ll use these lists to represent the syntax trees for Lisp, so we’ll need to implement pairs to compile Lisp programs.

Implementing pairs

In previous posts we implemented the immediate types the same way in the compiler and in the compiled code. I originally wrote this post doing the same thing: manually laying out object offsets myself, reading and writing from objects manually. The motivation was to get you familiar with the memory layout in the compiled code, but ultimately it ended up being too much content too fast. We’ll get to memory layouts when we start allocating pairs in the compiled code.

In the compiler we’re going to use C structs instead of manual memory layout. This makes the code a little bit easier to read. We’ll still tag the pointers, though.

const unsigned int kPairTag = 0x1;        // 0b001
const uword kHeapTagMask = ((uword)0x7);  // 0b000...0111
const uword kHeapPtrMask = ~kHeapTagMask; // 0b1111...1000

This adds the pair tag and some masks. As we noted in the previous posts, the heap object tags are all in the lowest three bits of the pointer. We can mask those out using this handy utility function.

uword Object_address(void *obj) { return (uword)obj & kHeapPtrMask; }
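To see the tagging scheme in isolation, here is a standalone sketch of the round trip: OR the tag into the low bits of an 8-byte-aligned address, read it back with the tag mask, and strip it with the inverse mask. It assumes allocations are at least 8-byte aligned (as malloc and calloc are on common 64-bit platforms) and defines uword as uintptr_t; tag_pointer, untag_pointer, and pointer_tag are hypothetical helpers, not part of the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t uword; /* assumption: pointer-sized unsigned integer */

static const unsigned int kPairTag = 0x1;      /* 0b001 */
static const uword kHeapTagMask = (uword)0x7;  /* low three bits */
static const uword kHeapPtrMask = ~(uword)0x7; /* everything else */

/* Tag an 8-byte-aligned address by OR-ing the tag into its low bits. */
uword tag_pointer(void *ptr, unsigned int tag) {
  uword address = (uword)ptr;
  assert((address & kHeapTagMask) == 0); /* needs 8-byte alignment */
  assert(tag <= kHeapTagMask);
  return address | tag;
}

/* Read the tag back out of the low three bits. */
unsigned int pointer_tag(uword tagged) {
  return (unsigned int)(tagged & kHeapTagMask);
}

/* Strip the tag to recover the original address. */
void *untag_pointer(uword tagged) { return (void *)(tagged & kHeapPtrMask); }
```

Because the allocator never hands out an address with any of the low three bits set, those bits are free to carry the type tag, and masking them off always yields a pointer that can be dereferenced or passed back to free.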

We’ll need to use this whenever we want to actually access a struct member. Speaking of struct members, here’s the definition of Pair:

typedef struct Pair {
  ASTNode *car;
  ASTNode *cdr;
} Pair;

And here are some functions for allocating and manipulating the Pair struct, to keep the implementation details hidden:

#include <assert.h> // assert
#include <stdlib.h> // calloc

ASTNode *AST_heap_alloc(unsigned char tag, uword size) {
  // Initialize to 0
  uword address = (uword)calloc(size, 1);
  return (ASTNode *)(address | tag);
}

void AST_pair_set_car(ASTNode *node, ASTNode *car);
void AST_pair_set_cdr(ASTNode *node, ASTNode *cdr);

ASTNode *AST_new_pair(ASTNode *car, ASTNode *cdr) {
  ASTNode *node = AST_heap_alloc(kPairTag, sizeof(Pair));
  AST_pair_set_car(node, car);
  AST_pair_set_cdr(node, cdr);
  return node;
}

bool AST_is_pair(ASTNode *node) {
  return ((uword)node & kHeapTagMask) == kPairTag;
}

Pair *AST_as_pair(ASTNode *node) {
  assert(AST_is_pair(node)); // catch invalid uses
  return (Pair *)Object_address(node);
}

ASTNode *AST_pair_car(ASTNode *node) { return AST_as_pair(node)->car; }

void AST_pair_set_car(ASTNode *node, ASTNode *car) {
  AST_as_pair(node)->car = car;
}

ASTNode *AST_pair_cdr(ASTNode *node) { return AST_as_pair(node)->cdr; }

void AST_pair_set_cdr(ASTNode *node, ASTNode *cdr) {
  AST_as_pair(node)->cdr = cdr;
}

There are a few important things to note.

First, AST_heap_alloc very intentionally zeroes out the memory it allocates. If the members were left uninitialized, it might be possible to read off a struct member that had an invalid pointer in car or cdr. If we zero-initialize it, the member pointers represent the object 0 by default. Nothing will crash.

Second, we keep moving our ASTNode pointers through AST_as_pair. This function has two purposes: catch invalid uses (via the assert that the object is indeed a Pair) and also mask out the lower bits. Otherwise we’d have to do the masking in every operation individually.

Third, I abstracted out the AST_heap_alloc so we don’t expose the calloc function everywhere. This allows us to later swap out the allocator for something more intelligent, like a bump allocator, an arena allocator, etc.

And since memory allocated must eventually be freed, there’s a freeing function too:

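void AST_heap_free(ASTNode *node) {
  if (!AST_is_heap_object(node)) {
    return; // immediates own no heap memory
  }
  if (AST_is_pair(node)) {
    // A pair owns its car and cdr, so free them first
    AST_heap_free(AST_pair_car(node));
    AST_heap_free(AST_pair_cdr(node));
  }
  free((void *)Object_address(node));
}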

This assumes that each ASTNode* owns the references to all of its members. So don’t borrow references to share between objects. If you need to store a reference to an object, make sure you own it. Otherwise you’ll get a double free. In practice this shouldn’t bite us too much because each program is one big tree.

Implementing symbols

We also need symbols! I mean, we could try mapping all the functions we need to integers, but that wouldn’t be very fun. Who wants to try and debug a program crashing on function#67? Not me. So let’s add a datatype that can represent names of things.

As above, we’ll need to tag the pointers.

const unsigned int kSymbolTag = 0x5;      // 0b101

And then our struct definition.

typedef struct Symbol {
  word length;
  char cstr[];
} Symbol;

I’ve chosen this variable-length object representation because it’s similar to how we’re going to allocate symbols in assembly and the mechanism in C isn’t so gnarly. This struct indicates that the memory layout of a Symbol is a length field immediately followed by that number of bytes in memory. Note that having this variable array in a struct is a C99 feature.

If you don’t have C99 or don’t like this implementation, that’s fine. Just store a char* and allocate another object for that string.

You could also opt to not store the length at all and instead NUL-terminate it. This has the advantage of not dealing with variable-length arrays (it’s just a tagged char*) but has the disadvantage of an O(n) length lookup.

Now we can add our Symbol allocator:

Symbol *AST_as_symbol(ASTNode *node);

ASTNode *AST_new_symbol(const char *str) {
  word data_length = strlen(str) + 1; // for NUL
  ASTNode *node = AST_heap_alloc(kSymbolTag, sizeof(Symbol) + data_length);
  Symbol *s = AST_as_symbol(node);
  s->length = data_length;
  memcpy(s->cstr, str, data_length);
  return node;
}

See how we have to manually specify the size we want. It’s a little fussy, but it works.

Storing the NUL byte or not is up to you. It saves one byte per string if you don’t, but it makes printing out strings in the debugger a bit of a pain since you can’t just treat them like normal C strings.

Some Lisp implementations use a symbol table to ensure that symbols allocated with equivalent C-string values return the same pointer. This allows the implementations to test for symbol equality by testing pointer equality. I think we can sacrifice a bit of memory and runtime speed for implementation simplicity, so I’m not going to do that.
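For comparison, a minimal interning table might look like the sketch below. It is hypothetical and deliberately simple: it uses a linked list where a real implementation would likely use a hash table. It shows why interning lets symbol equality become pointer equality, because equal names always come back as the same pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical intern table: one canonical copy of each distinct name.
typedef struct InternEntry {
  struct InternEntry *next;
  char *name;
} InternEntry;

typedef struct {
  InternEntry *head;
} InternTable;

const char *intern(InternTable *table, const char *name) {
  for (InternEntry *e = table->head; e != NULL; e = e->next) {
    if (strcmp(e->name, name) == 0) return e->name; // already interned
  }
  InternEntry *e = malloc(sizeof *e);
  assert(e != NULL);
  size_t len = strlen(name) + 1;
  e->name = malloc(len);
  assert(e->name != NULL);
  memcpy(e->name, name, len);
  e->next = table->head;
  table->head = e;
  return e->name;
}

void intern_table_free(InternTable *table) {
  InternEntry *e = table->head;
  while (e != NULL) {
    InternEntry *next = e->next;
    free(e->name);
    free(e);
    e = next;
  }
  table->head = NULL;
}
```

With a table like this, the strcmp in a function such as AST_symbol_matches could become a pointer comparison. The tradeoff is the extra lookup on every symbol allocation and a table that must be kept alive.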

Let’s add the rest of the utility functions:

bool AST_is_symbol(ASTNode *node) {
  return ((uword)node & kHeapTagMask) == kSymbolTag;
}

Symbol *AST_as_symbol(ASTNode *node) {
  return (Symbol *)Object_address(node);
}

const char *AST_symbol_cstr(ASTNode *node) {
  return (const char *)AST_as_symbol(node)->cstr;
}

bool AST_symbol_matches(ASTNode *node, const char *cstr) {
  return strcmp(AST_symbol_cstr(node), cstr) == 0;
}

Now we can represent names.

Representing function calls

We’re going to represent function calls as lists. That means that the following program:

(add1 5)

can be represented by the following C program:

ASTNode *args = AST_new_pair(AST_new_integer(5), AST_nil());
ASTNode *program = AST_new_pair(AST_new_symbol("add1"), args);

This is a little wordy. We can make some utilities to trim the length down.

ASTNode *list1(ASTNode *item0) {
  return AST_new_pair(item0, AST_nil());
}

ASTNode *list2(ASTNode *item0, ASTNode *item1) {
  return AST_new_pair(item0, list1(item1));
}

ASTNode *new_unary_call(const char *name, ASTNode *arg) {
  return list2(AST_new_symbol(name), arg);
}

And now we can represent the program as:

list2(AST_new_symbol("add1"), AST_new_integer(5));
// or, shorter,
new_unary_call("add1", AST_new_integer(5));

This is great news because we’ll be adding many tests today.

Compiling primitive unary function calls

Whew. We’ve built up all these data structures and tagged pointers and whatnot but haven’t actually done anything with them yet. Let’s get to the compilers part of the compilers series, please!

First, we have to revisit Compile_expr and add another case. If we see a pair in an expression, then that indicates a call.

int Compile_expr(Buffer *buf, ASTNode *node) {
  // Tests for the immediates ...
  if (AST_is_pair(node)) {
    return Compile_call(buf, AST_pair_car(node), AST_pair_cdr(node));
  }
  assert(0 && "unexpected node type");
}

I took the liberty of separating out the callable and the args so that the Compile_call function has less to deal with.

We’re only supporting primitive unary function calls today, which means that we have a very limited pattern of what is accepted by the compiler. (add1 5) is ok. (add1 (add1 5)) is ok. (blargle 5) is not, because the blargle isn’t on the list above. ((foo) 1) is not, because the thing being called is not a symbol.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args) {
  assert(AST_pair_cdr(args) == AST_nil() &&
         "only unary function calls supported");
  if (AST_is_symbol(callable)) {
    // Switch on the different primitives here...
  }
  assert(0 && "unexpected call type");
}

Compile_call should look at what symbol it is, and depending on which symbol it is, emit different code. The overall pattern will look like this, though:

  • Compile the argument — the result is stored in rax
  • Do something to rax

Let’s start with add1 since it’s the most straightforward.

    if (AST_symbol_matches(callable, "add1")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_add_reg_imm32(buf, kRax, Object_encode_integer(1));
      return 0;
    }

If we see add1, compile the argument (as above). Then, add 1 to rax. Note that we’re not just adding the literal 1, though. We’re adding the object representation of 1, i.e. 1 << 2. Think about why! When you have an idea, check footnote 2.

If you’re wondering what the underscore (_) function is, it’s a macro that I made to check the return value of a compile call and return early if there was an error. We don’t have any non-aborting error cases just yet, but I got tired of writing if (result != 0) return result; over and over again.
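The `_` macro itself isn’t shown in this post. A plausible shape is below, offered as a guess rather than the actual definition. It wraps the check in do/while (0) so that it behaves like a single statement, even inside an unbraced if. sketch_step and sketch_compile_two are hypothetical stand-ins for compile calls.

```c
#include <assert.h>

// Guess at the macro's shape: evaluate an int-returning expression and
// propagate any nonzero (error) result to the caller.
#define _(exp)                                                                 \
  do {                                                                         \
    int result_ = (exp);                                                       \
    if (result_ != 0) return result_;                                          \
  } while (0)

// Hypothetical stand-in for a compile step that returns a status code.
static int sketch_step(int status) { return status; }

// Returns the first failing status, or 0 if every step succeeds.
int sketch_compile_two(int first, int second) {
  _(sketch_step(first));
  _(sketch_step(second));
  return 0;
}
```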

Note that there is no runtime error checking. Our compiler will allow (add1 nil) to slip through and mangle the pointer. This isn’t ideal, but we don’t have the facilities for error reporting just yet.

sub1 is similar to add1, except it uses the sub instruction. You could also just use add with the immediate representation of -1.
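Here is a small C model of the arithmetic that add1 and sub1 perform on tagged integers. It assumes, as in this series, that integers are encoded as value << 2 with a 0b00 tag. The sketch_* helpers are hypothetical. The last test line also shows what adding a raw 1 would do to the tag bits.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers; integers are assumed encoded as value << 2.
typedef intptr_t word;
typedef uintptr_t uword;

word sketch_encode_integer(word value) {
  // Shift through an unsigned type: left-shifting a negative signed value
  // is undefined in C.
  return (word)((uword)value << 2);
}

// Models `add rax, imm32` with the encoded 1 (that is, 4) as the immediate.
word sketch_add1(word encoded) { return encoded + sketch_encode_integer(1); }

// Models `sub rax, imm32` with the encoded 1; adding encoded -1 is equivalent.
word sketch_sub1(word encoded) { return encoded - sketch_encode_integer(1); }
```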

integer->char is different. We have to change the tag of the object. In order to do that, we shift the integer left and then drop the character tag onto it. This is made simple by integers having a 0b00 tag (nothing to mask out).

Here’s a small diagram showing the transitions when converting 97 to 'a':

High                                                           Low
0000000000000000000000000000000000000000000000000000000[1100001]00  Integer
0000000000000000000000000000000000000000000000000[1100001]00000000  Shifted
0000000000000000000000000000000000000000000000000[1100001]00001111  Character

where the number enclosed in [brackets] is 97. And here’s the code to emit assembly that does just that:

    if (AST_symbol_matches(callable, "integer->char")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_shl_reg_imm8(buf, kRax, kCharShift - kIntegerShift);
      Emit_or_reg_imm8(buf, kRax, kCharTag);
      return 0;
    }

Note that we’re not shifting left by the full amount. We’re only shifting by the difference, since integers are already two bits shifted.

char->integer is similar, except it’s just a shr. Once the value is shifted right, the char tag gets dropped off the end, so there’s no need to mask it out.
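To check the shift arithmetic for both integer->char and char->integer, here is a standalone C model of what the emitted shl/or and shr do to the bits. The constants mirror the ones in this series (integer shift 2, char shift 8, char tag 0x0f). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t word;
typedef uintptr_t uword;

// Mirrors the series' layout: 2-bit 0b00 integer tag; chars shifted left 8
// with a 0x0f tag.
enum { kSketchIntegerShift = 2, kSketchCharShift = 8, kSketchCharTag = 0x0f };

word sketch_encode_integer(word v) {
  return (word)((uword)v << kSketchIntegerShift);
}

// integer->char: shl by the *difference* in shifts (6), then or in the tag.
word sketch_integer_to_char(word encoded_int) {
  uword shifted = (uword)encoded_int << (kSketchCharShift - kSketchIntegerShift);
  return (word)(shifted | kSketchCharTag);
}

// char->integer: one shr by the same difference. The low 6 bits (the 0x0f
// tag plus two zero bits) fall off the end, so no mask is needed.
word sketch_char_to_integer(word encoded_char) {
  return (word)((uword)encoded_char >> (kSketchCharShift - kSketchIntegerShift));
}
```

For 97, the encoded integer is 0x184. Shifting left by 6 gives 0x6100, and or-ing in the tag gives 0x610f, which is exactly the encoding of 'a'. Shifting right by 6 gets 0x184 back.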

nil? is our first primitive with ~ exciting assembly instructions ~. We get to use cmp and setcc. The basic idea is:

  • Compare (this means do a subtraction) what’s in rax and nil
  • Set rax to 0
  • If they’re equal (this means the result was 0), set al to 1
  • Shift left and tag it with the bool tag

al is the name for the lower 8 bits of rax. There’s also ah (for the next 8 bits, but not the highest bits), cl/ch, etc.

    if (AST_symbol_matches(callable, "nil?")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_cmp_reg_imm32(buf, kRax, Object_nil());
      Emit_mov_reg_imm32(buf, kRax, 0);
      Emit_setcc_imm8(buf, kEqual, kAl);
      Emit_shl_reg_imm8(buf, kRax, kBoolShift);
      Emit_or_reg_imm8(buf, kRax, kBoolTag);
      return 0;
    }

The cmp leaves a bit set (ZF) in the flags register, which setcc then checks. setcc, by the way, is the name for the group of instructions that set a register if some condition happened. It took me a long time to realize that since people normally write sete or setnz or something. And cc means “condition code”.

If you want to simplify your life (we’re going to do a lot of comparisons today), we can extract that into a function that compares rax with some immediate value, and then refactor Compile_call to call that.

void Compile_compare_imm32(Buffer *buf, int32_t value) {
  Emit_cmp_reg_imm32(buf, kRax, value);
  Emit_mov_reg_imm32(buf, kRax, 0);
  Emit_setcc_imm8(buf, kEqual, kAl);
  Emit_shl_reg_imm8(buf, kRax, kBoolShift);
  Emit_or_reg_imm8(buf, kRax, kBoolTag);
}
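It can help to model the emitted sequence in C. The sketch below mirrors the five instructions of Compile_compare_imm32, using a plain integer to stand in for rax. The names are hypothetical. One detail is worth noting: mov does not modify the flags, so clearing rax with mov between cmp and sete leaves ZF intact. An xor would overwrite it.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t uword;

// Bool encoding from this series: payload bit shifted left 7, tag 0x1f.
enum { kSketchBoolShift = 7, kSketchBoolTag = 0x1f };

// C model (not generated code) of:
//   cmp rax, value   ; ZF = (rax == value)
//   mov rax, 0       ; mov leaves the flags alone
//   sete al          ; al = ZF
//   shl rax, 7
//   or  rax, 0x1f
uword sketch_compare_imm32(uword rax, uword value) {
  int zf = (rax == value); // cmp
  rax = 0;                 // mov rax, 0
  rax = (uword)zf;         // sete al (upper bits already zero)
  rax <<= kSketchBoolShift;
  rax |= kSketchBoolTag;
  return rax;
}
```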

Let’s also poke at the implementations of cmp and setcc, since they involve some fun instruction encoding.

cmp, as it turns out, has a short-path when the register it’s comparing against is rax. This means we get to save one (1) whole byte if we want to!

void Emit_cmp_reg_imm32(Buffer *buf, Register left, int32_t right) {
  Buffer_write8(buf, kRexPrefix);
  if (left == kRax) {
    // Optimization: cmp rax, {imm32} can either be encoded as 3d {imm32} or 81
    // f8 {imm32}.
    Buffer_write8(buf, 0x3d);
  } else {
    Buffer_write8(buf, 0x81);
    Buffer_write8(buf, 0xf8 + left);
  }
  Buffer_write32(buf, right);
}

If you don’t want to, just use the 81 f8+ pattern.
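Here is a standalone version of the two cmp encodings. It writes into a byte array instead of the post’s Buffer. The function name is hypothetical, and it only handles the first eight registers (rax through rdi), because r8 to r15 would also need the REX.B bit. In the 81 form, the ModRM byte carries 0b11 in mod, 7 (the "cmp" extension) in reg, and the register in r/m. That gives 0xf8 + reg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Registers numbered in x86 encoding order: rax = 0, rcx = 1, ... rdi = 7.
// Returns the number of bytes written to out (at most 7).
size_t sketch_encode_cmp_reg_imm32(uint8_t *out, int reg, int32_t imm) {
  assert(reg >= 0 && reg < 8 && "r8-r15 would need REX.B");
  size_t n = 0;
  out[n++] = 0x48; // REX.W: 64-bit operand size
  if (reg == 0) {
    out[n++] = 0x3d; // short form: cmp rax, imm32
  } else {
    out[n++] = 0x81;                  // cmp r/m64, imm32 (81 /7)
    out[n++] = (uint8_t)(0xf8 + reg); // ModRM: mod=11, reg=7, rm=reg
  }
  uint32_t u = (uint32_t)imm;
  for (int i = 0; i < 4; i++) { // imm32, little-endian
    out[n++] = (uint8_t)(u >> (8 * i));
  }
  return n;
}
```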

For setcc, we have to define this new notion of “partial registers” so that we can encode the instruction properly. We can’t re-use Register because there are two partial registers for rax. So we add a PartialRegister.

typedef enum {
  kAl = 0,
} PartialRegister;

And then we can use those in the setcc implementation:

void Emit_setcc_imm8(Buffer *buf, Condition cond, PartialRegister dst) {
  Buffer_write8(buf, 0x0f);
  Buffer_write8(buf, 0x90 + cond);
  Buffer_write8(buf, 0xc0 + dst);
}

Again, I didn’t come up with this encoding. This is Intel’s design.
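The setcc encoding can be sketched the same way. The condition numbers below are the standard x86 condition-code values (4 for equal, 5 for not-equal, and so on). The enum and function names are hypothetical. This also shows why a separate PartialRegister type is needed. Without a REX prefix, 8-bit register numbers 0 to 3 mean al/cl/dl/bl, but 4 to 7 mean ah/ch/dh/bh. Past rbx, the numbering no longer lines up with the 64-bit Register enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Standard x86 condition-code numbers (a subset).
typedef enum {
  kSketchEqual = 0x4,    // sete / setz
  kSketchNotEqual = 0x5, // setne / setnz
  kSketchLess = 0xc,     // setl
  kSketchGreater = 0xf,  // setg
} SketchCondition;

// 0f, 90 + cc, then ModRM with mod = 11 and the 8-bit register in r/m.
// No REX prefix is emitted, so reg8 0-3 = al/cl/dl/bl, 4-7 = ah/ch/dh/bh.
size_t sketch_encode_setcc(uint8_t *out, SketchCondition cond, int reg8) {
  assert(reg8 >= 0 && reg8 < 8);
  out[0] = 0x0f;
  out[1] = (uint8_t)(0x90 + cond);
  out[2] = (uint8_t)(0xc0 + reg8);
  return 3;
}
```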

The zero? primitive is much the same as nil?, and we can re-use that Compile_compare_imm32 function.

    if (AST_symbol_matches(callable, "zero?")) {
      _(Compile_expr(buf, operand1(args)));
      Compile_compare_imm32(buf, Object_encode_integer(0));
      return 0;
    }

not is more of the same — compare against false.

Now we get to integer?. This is similar, but different enough that I’ll reproduce the implementation below. Instead of comparing the whole number in rax, we only want to look at the lowest 2 bits. This can be accomplished by masking out the other bits, and then doing the comparison. For that, we emit an and first and compare against the tag.

    if (AST_symbol_matches(callable, "integer?")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_and_reg_imm8(buf, kRax, kIntegerTagMask);
      Compile_compare_imm32(buf, kIntegerTag);
      return 0;
    }

It’s possible to shorten the implementation a little bit because and sets the zero flag. This means we can skip the cmp. But it’s only one instruction and I’m lazy so I’m reusing the existing infrastructure.
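The and shortcut works because the integer tag is 0b00. The masked value is zero exactly when the word is an integer, so the zero flag set by and already holds the answer. Below is a C model of both variants, using hypothetical names. Note that the shortcut would not carry over to boolean?, whose tag 0x1f is nonzero, so that check still needs the cmp.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t uword;

enum { kSketchIntegerTagMask = 0x3, kSketchIntegerTag = 0x0 };

// The post's version: and, then cmp against the tag.
int sketch_is_integer_via_cmp(uword rax) {
  return (rax & kSketchIntegerTagMask) == kSketchIntegerTag;
}

// The shortcut: and alone sets ZF when the masked result is zero,
// which is what sete would read.
int sketch_is_integer_via_zf(uword rax) {
  uword masked = rax & kSketchIntegerTagMask;
  return masked == 0;
}
```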

Last, boolean? is almost the same as integer?.

Boom! Compilers! Let’s check our work.


I’ll only include a few tests here, since the new tests are a total of 283 lines added and are a little bit repetitive.

First, the simplest test for add1.

TEST compile_unary_add1(Buffer *buf) {
  ASTNode *node = new_unary_call("add1", AST_new_integer(123));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); add rax, imm(1); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00,
                     0x48, 0x05, 0x04, 0x00, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(124));
}

Second, a test of nested expressions:

TEST compile_unary_add1_nested(Buffer *buf) {
  ASTNode *node = new_unary_call(
      "add1", new_unary_call("add1", AST_new_integer(123)));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); add rax, imm(1); add rax, imm(1); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00,
                     0x48, 0x05, 0x04, 0x00, 0x00, 0x00, 0x48,
                     0x05, 0x04, 0x00, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(125));
}

Third, the test for boolean?.

TEST compile_unary_booleanp_with_non_boolean_returns_false(Buffer *buf) {
  ASTNode *node = new_unary_call("boolean?", AST_new_integer(5));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // 0:  48 c7 c0 14 00 00 00    mov    rax,0x14
  // 7:  48 83 e0 3f             and    rax,0x3f
  // b:  48 3d 1f 00 00 00       cmp    rax,0x0000001f
  // 11: 48 c7 c0 00 00 00 00    mov    rax,0x0
  // 18: 0f 94 c0                sete   al
  // 1b: 48 c1 e0 07             shl    rax,0x7
  // 1f: 48 83 c8 1f             or     rax,0x1f
  byte expected[] = {0x48, 0xc7, 0xc0, 0x14, 0x00, 0x00, 0x00, 0x48, 0x83,
                     0xe0, 0x3f, 0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00, 0x48,
                     0xc7, 0xc0, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x94, 0xc0,
                     0x48, 0xc1, 0xe0, 0x07, 0x48, 0x83, 0xc8, 0x1f};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_false());
}

I include the fancy disassembly in the comments because it makes the tests easier for me to read and reason about later. You just have to make sure the text and the binary representations in the test don’t go out of sync, because that can be very confusing…

Anyway, that’s a wrap for today. Send your comments on the elist! Next time, binary primitives.


  1. There’s a long-running dispute about what to call these two objects. Lisp was first implemented on the IBM 704, and the names car and cdr come from that machine’s word layout (“contents of the address part of register” and “contents of the decrement part of register”). Nobody uses this hardware anymore, so the names are historical. Some people call them first/fst and second/snd. Others call them head/hd and tail/tl. Some people have other ideas.

  2. If you said “to preserve the tag” or “adding 1 would make it a pair” or some variant on that, you’re correct! Otherwise, I recommend going back to the diagram from the last couple of posts and then writing down binary representations of a couple of numbers by hand on a piece of paper. 

September 04, 2020

Kevin Burke (kb)

Building a better home network September 04, 2020 09:28 PM

I finally got my home network in a place where I am happy with it. I wanted to share my setup and what I learned about it. There has never been a better time to set up a great home network; there are several new tools that have made this easier and better than in the past. Hopefully this will help you set up your home network!

My house

My house is two stories on a standard 25 x 100 foot San Francisco lot. The ground floor looks roughly like this:

|               |                      |
|               |         |   Office   |
|    Garage     | Mudroom |            |
|               |         |-------------
|                           | | | | | |

Upstairs looks like this:

|    ___________                       |
|               |        Living Room   |
|    Bedroom    | Kitchen              |
|               |         -------------
|               |           | | | | | |

We have a Roku in the living room. My goals for home internet were:

  • Good wireless connection in every room
  • Ethernet connections in the office
  • Ethernet connection to the Roku
  • Synology network attached storage (NAS) and other external hard drives reachable from anywhere in the house

We are lucky to have Sonic Fiber internet service. Sonic's line comes into a box in the garage, and an Ethernet line runs from there to the mudroom. None of the other rooms have Ethernet connections.

Initial setup

Sonic really wants to push Eero routers to everyone.1 Eero is fairly easy to set up, and Sonic collects a small fee from renting the router to you. You can extend your home network by adding more Eeros into a mesh network.

If you have a small apartment, an Eero is probably going to be a good fit. However, the mesh network was not great for achieving any of the goals I had in mind. The repeaters (Eero beacon) do not have any Ethernet out ports. It was difficult to extend the network from the mudroom to the bedroom without renting two extenders, which added about $100 per year, increased latency and lowered speeds. Further, clients on the network kept connecting to an Eero that was further away, instead of the closest one.


(NB: please don't stop reading here, as I don't recommend this.) My next step was to replace the Eeros with a traditional Netgear wireless router in the mudroom. This also could not reach the bedroom. So I bought a powerline adapter and plugged one end in near the router and the other end in the bedroom.

Powerline adapters send signal via electric current in your house. They don't offer great speeds. Devices on your network that use a lot of electricity, like laundry machines or the microwave, can render the powerline connection unusable.

There are probably better solutions for you than powerline adapters in 2020.

Extending Ethernet to more rooms

I called a cabling company about the possibility of running Ethernet to more rooms in the house. We decided the bedroom would be very easy since it's directly above the garage. It took a two-person team two hours to drill a hole in the garage, run a cable up the side of the house to the bedroom, and install an Ethernet port in the bedroom. This cost about $200.

We looked at running Ethernet to other rooms but the geography of the stairs made this really tricky.

Side note: future proofing cabling

Our house has coax cables - the traditional method of getting e.g. cable TV service - running from the garage to four rooms in the house, but it doesn't have Ethernet set up. This is disappointing since it was built within the last decade.

There are two things you can do to future proof cable runs in your house, and ensure that cables can be replaced/swapped out if mice eat them or whatever. I highly recommend you implement them any time you are running cable. One is to leave a pull cable in the wall next to whatever cable you are installing. If you need to run a new cable, you can attach it to the pull cable, and then pull it all the way through from one end to the other.

Normally cables will be stapled to the wall interior, which makes them impossible to pull through. The other option is to leave cables unstapled. This will let you use the coax/other cable directly as the pull cable. In general though it's better to just leave a second pull line in the wall behind the port.

Without either of these solutions in place, running new cables is going to be messy. You can either try to hide it by running it along the exterior walls or ceiling of your house, or drill holes in the wall every few feet, pass a new cable through, and then patch up the holes.

Side note: cat 5 vs. cat 6

Your internet speed will be bottlenecked by the slowest link in the network. Be careful it isn't your cables!

There are two common flavors of Ethernet cable. Plain Category 5 is cheaper, but it is only rated for 100 Mbps. Category 6 is slightly more expensive, and it (or Category 5e) is what you need to get full gigabit speeds.

The Ethernet cables that come with the products you buy may be Cat 5 or 6. Be careful to check which one you are using (it should be written in small print on the outside of the cable).


To load Google, your computer looks up Google's IP address and sends packets to it. So far so good, but how do Google's replies find their way back to your computer? Each client on the network needs a unique local IP address. The router keeps track of which open port belongs to which client. For example, it might send your request out from port 44982, and then forward packets it receives from the broader Internet on port 44982 to your client's local IP address.
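The port bookkeeping described above is called NAT (network address translation). A toy model might look like the sketch below. The table layout, sizes, and names are all hypothetical, and real routers also track the remote address, the protocol, and timeouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy NAT table: external port -> local client.
typedef struct {
  uint16_t external_port;
  uint32_t local_ip; // IPv4 address as a 32-bit number
  uint16_t local_port;
  int in_use;
} NatEntry;

#define kNatTableSize 16

typedef struct {
  NatEntry entries[kNatTableSize];
  uint16_t next_port; // next external port to hand out
} NatTable;

// Outgoing connection: pick a fresh external port and remember its owner.
// Returns 0 if the table is full.
uint16_t nat_open(NatTable *t, uint32_t local_ip, uint16_t local_port) {
  for (int i = 0; i < kNatTableSize; i++) {
    if (!t->entries[i].in_use) {
      NatEntry *e = &t->entries[i];
      e->external_port = t->next_port++;
      e->local_ip = local_ip;
      e->local_port = local_port;
      e->in_use = 1;
      return e->external_port;
    }
  }
  return 0;
}

// Incoming packet: find the local client that owns the external port.
// NULL means nobody does, and the packet would be dropped.
const NatEntry *nat_lookup(const NatTable *t, uint16_t external_port) {
  for (int i = 0; i < kNatTableSize; i++) {
    const NatEntry *e = &t->entries[i];
    if (e->in_use && e->external_port == external_port) return e;
  }
  return NULL;
}
```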

What happens if two clients on your network try to claim the same local IP address? That would be bad. Generally you set up a DHCP server to figure this out. When your phone connects to a wifi network it sends out a packet that says basically "I need an IP address." If a DHCP server is listening anywhere on the network it will find an empty IP address slot and send it back to the phone.2 The phone can then use that IP address.
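Similarly, here is a toy model of the core bookkeeping a DHCP server does: it never gives the same address to two clients, and it gives a returning client its existing lease. The pool layout and names are hypothetical, and the actual protocol exchange and lease expiry are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical toy address pool keyed by client MAC address.
#define kPoolSize 8

typedef struct {
  uint8_t mac[6];
  int leased;
} Lease;

typedef struct {
  uint32_t first_ip; // first address in the pool
  Lease leases[kPoolSize];
} DhcpPool;

// Returns the assigned address, or 0 if the pool is exhausted.
uint32_t dhcp_assign(DhcpPool *pool, const uint8_t mac[6]) {
  // A client that already holds a lease gets the same address back.
  for (int i = 0; i < kPoolSize; i++) {
    if (pool->leases[i].leased && memcmp(pool->leases[i].mac, mac, 6) == 0) {
      return pool->first_ip + (uint32_t)i;
    }
  }
  // Otherwise, hand out the first free slot.
  for (int i = 0; i < kPoolSize; i++) {
    if (!pool->leases[i].leased) {
      memcpy(pool->leases[i].mac, mac, 6);
      pool->leases[i].leased = 1;
      return pool->first_ip + (uint32_t)i;
    }
  }
  return 0;
}
```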

Generally speaking, a consumer wireless router has three components:

  • wireless radios, that broadcast a network SSID and send packets to and from wireless clients.
  • an Ethernet switch that can split an incoming Internet connection into four or more out ports. Generally this has one WAN port (that connects to your modem/ISP) and four LAN ports (that connect to local devices on your network)
  • a DHCP server.

You can buy products that offer each of these independently. A four-port switch without a radio or DHCP server will cost you about $15. But a router is a convenient bundle for home networks.

If your network contains multiple switches or multiple routers you need to think about which of these devices will be giving out DHCP.

Two Routers, Too Furious

At this point my network had one router in the bedroom and one router upstairs in the living room, via an ungainly cable up the stairs. So I had good coverage in every room, and the Roku hooked up via Ethernet to the living room router, but this setup still had a few problems. I didn't have the office wired up, and the NAS only worked when you were connected to the living room router.

Furthermore, I kept running into issues where I would walk from the living room to the bedroom or vice versa but my phone/laptop would stay connected to the router in the room I was just in. Because that router was outside its normal "range", I would get more latency and dropped packets than usual, which was frustrating.

How to diagnose and measure this problem

On a Mac, hold down Option when you click the Wi-Fi icon in the menu bar, and you'll get an extended menu with details about your current connection.

The key value there is RSSI (received signal strength indicator), which measures the strength of the router's signal at your client, in dBm. Mine read -46, a quite good value. Below -65, your connection quality will start to get dicey: you will see lower bandwidth, higher latency, and dropped packets.

Apple devices will hang on to the router they are currently connected to until the RSSI gets to -75 or worse, which is a very low value. This is explained in gory detail on this page. Because router coverage areas are supposed to overlap a little bit, this means the connection will have to get very bad before your phone or laptop will start looking for a new radio.

Adjust the power

Generally this means that you don't want the coverage area for the router to reach to the center of the coverage area for the other router, if you can help it. If the coverage areas don't overlap that much, clients will roam to the closest router, which will improve the connection.

You can adjust the coverage area either by physically moving the router or by lowering the power for the radios (which you may be able to do in the admin panel for the router).

If neither of these works, as a last ditch attempt you can give your routers different network names. But this makes it more difficult to keep a connection when you roam from one router to the other.

Ethernet Over... Coax?

I had not managed to get a fixed connection to the office, which would have required snaking an Ethernet cable over at least two doorways and three walls. However, I recently heard about a technology called MoCA (from the Multimedia over Coax Alliance), which makes it possible to send an Ethernet signal over the coax line from the garage to the office. I bought a MoCA adapter for each end of the connection (about $160 in total), wired it up, and... it worked like a charm!


The latency is slightly higher than traditional Ethernet, but only by a few milliseconds, and the bandwidth is not as high as a normal wired connection but it's fine - I am still glad to be able to avoid a wireless connection in that room.

This change let me move my NAS into the office as well, which I'm quite happy about.

Letting Everything Talk to Each Other

At this point I had a $15 unmanaged switch in the garage that received a connection from the Sonic Fiber router, and sent it to three places - the bedroom, the living room and my office. However, the fact that it was unmanaged meant that each location requested a public IP address and DHCP from Sonic. Sonic was not happy with this arrangement - there is a limit of 8 devices per account that are stored in a table mapping a MAC address to an IP address, and after this you need to call in to have the table cleared out. This design also meant that the clients on my network couldn't talk to each other - I couldn't access the NAS unless I was connected to the living room router.

The solution was to upgrade to a "managed" switch in the garage that could give out DHCP. You can buy one that is essentially a wifi router without the radio for about $60. This has the same dashboard interface as your router does and can give out DHCP.

Once this switch was in place, I needed to update the routers to stop giving out DHCP (or put them in "pass-through mode") so that only a single device on the network was assigning IP addresses. I watched the routers and NAS connect, then assigned each one a static IP on the local network. It's important to do this before you put them in pass-through mode, so you can still access them and tweak their settings.

You should be able to find instructions on pass-through mode or "disable DHCP" for your router online. You may need to change the IP address for the router to match the static IP you gave out in the previous paragraph.

That's it

I finally have a network that supports everything I want to do with it! I can never move now.


I hope this post was helpful. I think the most important thing to realize is that if you haven't done this in a few years, or your only experience is with consumer grade routers, there are other tools/products you can buy to make your network better.

If you are interested in this space, or interested in improving your office network along these lines, I'm working with a company that is making this drop dead easy to accomplish. Get in touch!

1. I posted on the forums to get help several times. Dane Jasper, the Sonic CEO who's active on the forums, responded to most of my questions with "you should just use Eero." I love that he is on the forums but Eero is just not great for what I'm trying to do.

2. I'm simplifying - there are two roundtrips, not one - but the details are really not that important.

Jeremy Morgan (JeremyMorgan)

Optimizing String Comparisons in Go September 04, 2020 07:07 PM

Want your Go programs to run faster? Optimizing string comparisons in Go can improve your application’s response time and help scalability. Comparing two strings to see if they’re equal takes processing power, but not all comparisons are the same. In a previous article, we looked at How to compare strings in Go and did some benchmarking. We’re going to expand on that here. It may seem like a small thing, but as all great optimizers know, it’s the little things that add up.

September 02, 2020

Eric Faehnrich (faehnrich)

Booting a 486 From Floppy with the Most Up-to-Date Stable Linux Kernel September 02, 2020 06:04 PM

A pretty cool, simple writeup on building a floppy that boots a modern Linux kernel on a 486.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Booleans, characters, nil September 02, 2020 07:45 AM


Welcome back to the “Compiling a Lisp” series. Last time, we compiled integer literals. In today’s relatively short post, we’ll add the rest of the immediate types. Our programs will look like this:

  • 'a'
  • true
  • false
  • nil or ()

In addition, since we’re not adding too much exciting stuff today, I made writing tests a little bit easier by adding fixtures. Now, if we want, we can get a pre-made Buffer object passed into the test, and then have it destroyed afterward.


Since we’re coming back to the pointer tagging scheme, I’ve reproduced the “pointer templates” (I don’t think that’s a real term) diagram from the last post below.

High                                                         Low
0000000000000000000000000000000000000000000000000XXXXXXX00001111  Character
00000000000000000000000000000000000000000000000000000000X0011111  Boolean
0000000000000000000000000000000000000000000000000000000000101111  Nil

Notice that we have a pattern among the other immediates (character, boolean, and nil) – the lower four bits are all the same, and that sets them apart from other pointer types.

Also notice that among those immediates, they can be discriminated by the two bits just above those four:

High                                                             Low
0000000000000000000000000000000000000000000000000XXXXXXX00[00][1111]  Character
00000000000000000000000000000000000000000000000000000000X0[01][1111]  Boolean
0000000000000000000000000000000000000000000000000000000000[10][1111]  Nil

So a lower four bits of 0b1111 means immediate, and from there 0b00 means character, 0b01 means boolean, and 0b10 means nil. There’s even room to add another immediate tag pattern (0b11) if we like.
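This tag scheme can be decoded with a couple of masks and a shift. The sketch below is hypothetical; the series itself tests each type with kImmediateTagMask and an equality check. Heap objects are lumped into "other" here.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t uword;

typedef enum {
  kKindInteger,
  kKindChar,
  kKindBool,
  kKindNil,
  kKindOther,
} SketchKind;

SketchKind sketch_classify(uword w) {
  if ((w & 0x3) == 0x0) return kKindInteger; // integers end in 0b00
  if ((w & 0xf) != 0xf) return kKindOther;   // not an immediate
  switch ((w >> 4) & 0x3) {                  // the two discriminating bits
    case 0x0: return kKindChar;
    case 0x1: return kKindBool;
    case 0x2: return kKindNil;
    default:  return kKindOther; // 0b11: free for a future immediate type
  }
}
```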

Let’s add some of the symbolic constants for bit manipulation.

const unsigned int kImmediateTagMask = 0x3f;

const unsigned int kCharTag = 0xf;   // 0b00001111
const unsigned int kCharMask = 0xff; // 0b11111111
const unsigned int kCharShift = 8;

const unsigned int kBoolTag = 0x1f;  // 0b00011111
const unsigned int kBoolMask = 0x80; // 0b10000000
const unsigned int kBoolShift = 7;

Notice that we don’t have any for nil. That’s because nil is a singleton and has no payload at all. It’s just a solitary 0x2f.

For the others, we need to put the payload alongside the tag, and that requires a shift and a bitwise or. The first operation, the shift, moves the payload left enough that there’s space for a tag, and the or adds the tag.

word Object_encode_char(char value) {
  return ((word)value << kCharShift) | kCharTag;
}

char Object_decode_char(word value) {
  return (value >> kCharShift) & kCharMask;
}

word Object_encode_bool(bool value) {
  return ((word)value << kBoolShift) | kBoolTag;
}

bool Object_decode_bool(word value) { return value & kBoolMask; }

word Object_true() { return Object_encode_bool(true); }

word Object_false() { return Object_encode_bool(false); }

word Object_nil() { return 0x2f; }

For bool, we’ve done a little trick. Since we only care if the value is true or false, instead of doing both a shift and mask to decode, we can turn off the tag bits. The resulting value will be either 0b00000000 for false or 0b10000000 for true. Since any non-zero value is truthy in C, we can “cast” that to a C bool by just returning it.
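To see the encodings and the bool decoding trick round-trip, here is a standalone copy that uses the same shifts, tags, and masks. The sketch_* names are hypothetical. Unlike the post’s version, it widens the char through unsigned char, so that a negative char value can’t cause undefined behavior in the shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef intptr_t word;

// Same constants as the series.
enum {
  kSketchCharTag = 0x0f, kSketchCharMask = 0xff, kSketchCharShift = 8,
  kSketchBoolTag = 0x1f, kSketchBoolMask = 0x80, kSketchBoolShift = 7,
};

word sketch_encode_char(char v) {
  return ((word)(unsigned char)v << kSketchCharShift) | kSketchCharTag;
}

char sketch_decode_char(word w) {
  return (char)((w >> kSketchCharShift) & kSketchCharMask);
}

word sketch_encode_bool(bool v) {
  return ((word)v << kSketchBoolShift) | kSketchBoolTag;
}

// The trick: keep only bit 7. The result is 0x80 or 0, and converting
// any nonzero value to bool yields true.
bool sketch_decode_bool(word w) { return w & kSketchBoolMask; }
```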

Note that the cast from char and bool to word is necessary because (as I learned the hard way, several months ago) shifting a value left by at least as many bits as its type has is undefined behavior in C. The shift count must be less than the width of the promoted left operand. The situation went sideways and left me scratching my head.

I added Object_true and Object_false because I thought they might come in handy at some point, but we don’t have a use for them now. If you are strongly against including dead weight code, then feel free to omit them.

Now let’s add some more AST utility functions before we move on to compiling:

bool AST_is_char(ASTNode *node) {
  return ((word)node & kImmediateTagMask) == kCharTag;
}

char AST_get_char(ASTNode *node) { return Object_decode_char((word)node); }

ASTNode *AST_new_char(char value) {
  return (ASTNode *)Object_encode_char(value);
}

bool AST_is_bool(ASTNode *node) {
  return ((word)node & kImmediateTagMask) == kBoolTag;
}

bool AST_get_bool(ASTNode *node) { return Object_decode_bool((word)node); }

ASTNode *AST_new_bool(bool value) {
  return (ASTNode *)Object_encode_bool(value);
}

bool AST_is_nil(ASTNode *node) { return (word)node == Object_nil(); }

ASTNode *AST_nil() { return (ASTNode *)Object_nil(); }

Enough talk about object encoding. Let’s compile some immediates.


The implementation is much the same as for integers. Check the type, pull out the payload, move to rax.

int Compile_expr(Buffer *buf, ASTNode *node) {
  if (AST_is_integer(node)) {
    word value = AST_get_integer(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_integer(value));
    return 0;
  }
  if (AST_is_char(node)) {
    char value = AST_get_char(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_char(value));
    return 0;
  }
  if (AST_is_bool(node)) {
    bool value = AST_get_bool(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_bool(value));
    return 0;
  }
  if (AST_is_nil(node)) {
    Emit_mov_reg_imm32(buf, kRax, Object_nil());
    return 0;
  }
  assert(0 && "unexpected node type");
}

I suppose we could coalesce these by checking if the node is any sort of immediate and then writing the address immediately back with Emit_mov_reg_imm32… but that would be breaking abstractions or something.
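To make the coalescing idea concrete: because AST nodes already use the runtime encoding, every immediate compiles to the same byte template, and only the 32-bit payload changes. The sketch below is hypothetical (emit_mov_rax_immediate is not a function from the series) and writes into a plain array instead of a Buffer so that it stands alone.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;

// Hypothetical helper: writes "mov rax, imm32" for an already-encoded
// immediate into out[0..6]. The node's bits are emitted unchanged, whatever
// immediate type they represent.
void emit_mov_rax_immediate(unsigned char out[7], word encoded) {
  // The imm32 is sign-extended to 64 bits, so the value must fit in int32_t.
  assert(encoded >= INT32_MIN && encoded <= INT32_MAX);
  uint32_t imm = (uint32_t)(int32_t)encoded;
  out[0] = 0x48; // REX prefix with W set (64-bit operand size)
  out[1] = 0xc7; // mov r/m64, imm32
  out[2] = 0xc0; // ModR/M: register-direct, destination rax
  for (int i = 0; i < 4; i++) {
    out[3 + i] = (unsigned char)(imm >> (8 * i)); // little-endian payload
  }
}
```

Feeding it 0x610f reproduces the first seven bytes of the compile_char expectation. The assert reflects a real limit of this instruction form: an encoded value outside the int32_t range would need a different instruction (such as the 64-bit immediate form of mov).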


The tests are so similar that I’ll only include the one for compiling characters. The other code is available from assets/code/lisp if you would like a reference.

TEST compile_char(Buffer *buf) {
  char value = 'a';
  ASTNode *node = AST_new_char(value);
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm('a'); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0x0f, 0x61, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  Buffer_make_executable(buf);
  word result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_char(value));
  PASS();
}

You’ll notice that instead of void, the function now takes Buffer*. This is part of the new testing fixtures setup that I mentioned earlier. The implementation is a macro that uses greatest.h’s “pass a parameter to your test” feature. Running a test looks much the same:


Anyway, that’s a wrap for today. Next time we’ll add some unary primitives for querying and manipulating the objects we have already.

Mini Table of Contents

Marc Brooker (mjb)

Focus on the Good Parts September 02, 2020 12:00 AM

Focus on the Good Parts

Skepticism and cynicism can get in your way.

Back in May, I wrote Reading Research: A Guide for Software Engineers, answering common questions I get about why and how to read research papers. In that post, I wrote about three modes of reading: solution finding, discovery, and curiosity. In subsequent conversations, I've realized there's another common issue that gets in engineers' way when they read research, especially in the discovery and curiosity modes: too much skepticism.

The chief deficiency I see in the skeptical movement is its polarization: Us vs. Them — the sense that we have a monopoly on the truth; that those other people who believe in all these stupid doctrines are morons; that if you're sensible, you'll listen to us; and if not, to hell with you. (from Carl Sagan's The Demon Haunted World)

I could blame it on comment thread culture, racing to make that top comment pointing out errors in the paper. I could blame it on the low signal-to-noise ratio of content in general. I could blame it on poor research, poor writing, or incorrect data. But whatever is to blame, many readers approach technical content with their first goal being to find errors and mistakes, gaps in logic, or incomplete justifications of statements. When a mistake is found, the reader is justified in throwing out the whole piece of writing (unreliable!), the authors (sloppy!), their institutions (clueless!), or even the whole field (substandard!). It's also a perfect opportunity to write that comment or tweet pointing out the problems. After all, if you found the author's mistake, doesn't that make you smarter and better than the author?

This approach gets in the way of your ability to learn from reading. I'd encourage you to take a different one: read with the goal of finding the good stuff. Dig for the ideas, the insights, the analyses and the data points that provide value. Look for what you can learn.

I'm not suggesting that you don't carefully approach what you read. You absolutely should make sure what you believe is well-supported. Don't waste your life reading crap. Your time is too valuable for that.

The flip side of this is relying too much on social proof. If you open the comment thread first, you'll find that the piece you're about to read is great or it's crap or it's another piece of junk published by those people (you know, them, the incompetent ones). Then, when you finally read the paper, you'll be less smart. You'll be biased towards confirming the opinions of others, rather than reading and understanding the material. I'm not against comment threads, but I never read them first.

Again, you can go too far in this direction. A lot of academic publishing is an exercise in social proof. Almost all the filtering we use to reduce the firehose of content down to a manageable stream depends on social proof. We use these tools because they're powerful and scalable. But remember that popularity with Hacker News commenters, and even publication in a prestigious conference or journal, is only weak evidence of quality. Unpopularity, and rejection, are weak evidence of a lack of quality.

An Example

Fox and Brewer's classic paper Harvest, Yield, and Scalable Tolerant Systems contains many great ideas. The framing of Harvest and Yield is very useful, and I've found it's had a big influence on the way that I have approached system design over the years. The first time I read it, though, I put it down. The parts describing CAP (Sections 2 and 3) are confusing at best and wrong at worst (as I've blogged about before). I couldn't get past them.

It was only after being encouraged by a colleague that I read the whole thing. Taken as a whole, it's full of great ideas. If I had kept tripping over my skepticism, and getting stuck on the bad parts, I never would have been able to learn from it.

August 31, 2020

Frederic Cambus (fcambus)

Modernizing the OpenBSD console August 31, 2020 06:30 PM

In the beginning were text mode consoles. Traditionally, *BSD and Linux on i386 and amd64 used text mode consoles which by default provided 25 rows of 80 columns, the "80x25 mode". This mode uses an 8x16 font stored in the VGA BIOS (which can be slightly different across vendors).

OpenBSD uses the wscons(4) console framework, inherited from NetBSD.

CRT monitors allowed setting the resolution you wanted, so on bigger monitors an 80x25 console in text mode was fairly large but not blurry.

Framebuffer consoles allowed taking advantage of larger monitor sizes, to fit more columns and rows. With the switch to LCD monitors, also in part driven by the decreasing costs of laptops, the fixed size panels became a problem as the text mode resolution needed to be stretched, leading to distortion and blurriness.

One thing some people might not realize is the huge discrepancy between text mode and framebuffer consoles regarding the amount of data you have to write to cover the whole screen. In text mode, we only need to write 2 bytes per character: 1 byte for the ASCII code and 1 byte for attributes. So in 80x25 text mode, we only need to write 80 * 25 * 2 = 4000 bytes of data, and the VGA card itself takes care of plotting characters to the screen. With a framebuffer, however, filling a 4K UHD-1 (3840x2160) screen in 32bpp mode means sending 3840 * 2160 * 4 = 33177600 bytes of data (approximately 33 MB).
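The arithmetic above, as a tiny C sketch (the function names are illustrative and do not come from any OpenBSD code):

```c
#include <assert.h>
#include <stdint.h>

// Text mode: one character code byte plus one attribute byte per cell.
uint64_t text_mode_bytes(uint64_t cols, uint64_t rows) {
  return cols * rows * 2;
}

// Linear framebuffer: every pixel is written, bpp bits per pixel.
uint64_t framebuffer_bytes(uint64_t width, uint64_t height, uint64_t bpp) {
  return width * height * (bpp / 8);
}
```

That works out to roughly 8,300 times as much data for a full-screen update.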

On framebuffer consoles, OpenBSD uses the rasops(9) subsystem (raster operations), imported from NetBSD in 2001.

While they had been used for a while on platforms without VGA cards, framebuffer consoles were only enabled on i386 and amd64 in 2013 for inteldrm(4) and radeondrm(4).

In recent years, rasops(9) itself and framebuffer drivers have seen some improvements:

General improvements:

  • Add and enable efifb(4), EFI framebuffer driver (yasuoka@, August 2015)
  • Implement counter-clockwise rotation (kettenis@, August 2017)
  • Implement scrollback in rasops(9) (jcs@, April 2018)

Performance related improvements:

  • Make it possible to use RI_WRONLY during early boot (kettenis@, September 2015)
  • Introduce rasops_wronly_do_cursor() (kettenis@, August 2018)
  • Remap EFI framebuffer early to use write combining (kettenis@, September 2018)
  • Do PAT setup earlier, so mapping the framebuffer WC actually works (kettenis@, December 2018)
  • Fast conditional console scrolling (John Carmack, June 2020)
  • Optimize character rendering in 32bpp mode (John Carmack, June 2020)

Console fonts improvements:

There is an article about Spleen in the OpenBSD Journal with more information, notably on the font selection mechanism relative to screen resolution.

And work slowly continues to make framebuffer consoles more usable.

It is interesting to note that while NetBSD has been adding a lot of features to rasops(9) over the years, OpenBSD has taken a more conservative approach. There is however one major feature that NetBSD currently has which would be beneficial: the capability for loading fonts of different metrics and subsequently resizing screens.

Looking forward, performance of various operations could likely still be improved, possibly by leveraging the new OpenBSD dynamic tracing mechanism to analyze bottlenecks.

Another open question is UTF-8 support: Miod Vallat started work in this direction back in 2013, but there are still a few things missing. I have plans to implement support for sparse font files in the future, at least so one can take advantage of box drawing and possibly block element characters.

Lastly, a major pain point has been the lack of larger fonts in RAMDISK kernels, making installations and upgrades very difficult and error-prone on large DPI monitors as the text is basically unreadable. There is no technical blocker to make this happen, which ironically makes it the most difficult kind of issue to tackle.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Integers August 31, 2020 04:46 PM


Welcome back to the “Compiling a Lisp” series. Last time we made a small code execution demo. Today we’re going to add the first part of our language: integer literals. Our programs will look like this:

  • 123
  • -10
  • 0

But we’re not going to put a parser in. That can come later when it gets harder to manually construct syntax trees.

Also, since implementing full big number support is pretty tricky, we’re only going to support fixed-width numbers. It’s entirely possible to then implement big number support in Lisp after we build out some more features.

Pointer tagging scheme

Since the integers are always small (less than 64 bits), and we’re targeting x86-64, we can represent the integers as tagged pointers. To read a little more about that, check out the “Pointer tagging” section of my Programming languages resources page. Since we’ll also represent some other types of objects as tagged pointers, I’ll sketch out a tagging scheme up front. That way it’s easier to reason about than if I draw it out post-by-post.

High                                                         Low
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX00  Integer
0000000000000000000000000000000000000000000000000XXXXXXX00001111  Character
00000000000000000000000000000000000000000000000000000000X0011111  Boolean
0000000000000000000000000000000000000000000000000000000000101111  Nil

In this diagram, we have some pointer templates composed of 0s, 1s, and Xs. 0 refers to a 0 bit and 1 refers to a 1 bit.

X is a placeholder that refers to payload data for that value. For immediate values — values whose data are part of the pointer itself — the Xs refer to the data. For heap-allocated objects, it is the pointer address.

It’s important to note that we can only accomplish this tagging scheme because on modern computer systems the lower 3 bits of heap-allocated pointers are 0: allocations are word-aligned, meaning that all heap pointers are multiples of 8. This lets us a) differentiate real pointers from fake pointers and b) stuff some additional data there in the real pointers.

These tags let us quickly distinguish objects from one another. Just check the lower bits:

  • Lower 2 bits 00 means integer
  • Lower 3 bits 111 means one of the other immediate value types; check the lower 7 bits to tell them apart
  • For any of the other types, there’s a one-to-one mapping of bit pattern in the lower 3 bits to the type
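These rules translate directly into a type dispatch. Here is a minimal sketch, assuming only the three immediate tags shown in the diagram. The Kind enum and Object_kind are hypothetical names, and every non-immediate bit pattern is lumped together as a heap pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;

typedef enum {
  kKindInteger,
  kKindChar,
  kKindBool,
  kKindNil,
  kKindHeap,
  kKindInvalid,
} Kind;

// Hypothetical classifier following the lower-bits rules above.
Kind Object_kind(word value) {
  if ((value & 0x3) == 0x0) {
    return kKindInteger; // lower 2 bits 00
  }
  if ((value & 0x7) == 0x7) {
    // Lower 3 bits 111: one of the other immediates; lower 7 bits decide.
    switch (value & 0x7f) {
      case 0x0f: return kKindChar; // ...0001111
      case 0x1f: return kKindBool; // ...0011111 (bit 7 is the payload)
      case 0x2f: return kKindNil;  // ...0101111
      default: return kKindInvalid;
    }
  }
  return kKindHeap; // remaining 3-bit patterns tag heap pointers
}
```

Integers get the short two-bit tag, so they occupy both 000 and 100 in the low three bits, which leaves 001, 010, 011, 101 and 110 for heap-allocated types.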

This is a choice that Ghuloum made when drawing up the compiler paper. It’s entirely possible to pick your own encoding as long as your encoding also has the property that it’s possible to distinguish the type based on the pointer.[1]

We’re going to be a little clever and use the same encoding scheme inside the compiler to represent Abstract Syntax Tree (AST) nodes as we are going to use in the compiled code. I mean, why not? We’re going to have to build the encoding and decoding tools anyway.

Pointer tagging in practice

We’ll start off with integer encoding, since we don’t have any other types yet.

#include <assert.h>   // for assert
#include <stdbool.h>  // for bool
#include <stddef.h>   // for NULL
#include <stdint.h>   // for int32_t, etc
#include <string.h>   // for memcpy
#include <sys/mman.h> // for mmap

#include "greatest.h"

// Objects

typedef int64_t word;
typedef uint64_t uword;

const int kBitsPerByte = 8;                        // bits
const int kWordSize = sizeof(word);                // bytes
const int kBitsPerWord = kWordSize * kBitsPerByte; // bits

Ignore greatest.h — that is a header-only library I use for lightweight testing.

word and uword are type aliases that I will use throughout the codebase to refer to types of values that fit in registers. It saves us a bunch of typing and helps keep types consistent.

To avoid some mysterious magical constants, I’ve also defined helpful names for the number of bits in a byte (standard C also provides this as CHAR_BIT in <limits.h>), the number of bytes in a word, and the number of bits in a word.

const unsigned int kIntegerTag = 0x0;
const unsigned int kIntegerTagMask = 0x3;
const unsigned int kIntegerShift = 2;
const unsigned int kIntegerBits = kBitsPerWord - kIntegerShift;
const word kIntegerMax = (1LL << (kIntegerBits - 1)) - 1;
const word kIntegerMin = -(1LL << (kIntegerBits - 1));

word Object_encode_integer(word value) {
  assert(value <= kIntegerMax && "too big");
  assert(value >= kIntegerMin && "too small");
  // Shift as unsigned: left-shifting a negative signed value is undefined.
  return (word)((uword)value << kIntegerShift);
}

// End Objects

As we saw above, integers can be fit inside pointers by shifting them two bits to the left. We have this handy-dandy function, Object_encode_integer, for that.

I’ve added some bounds checks to make sure we don’t accidentally mangle the values coming in. If the number we’re trying to encode is too big or too small, shifting it left by 2 bits will chop off the left end.

This function is pretty low-level. It doesn’t add any new type information (it returns a word, just as it takes a word). It’s meant to be a utility function inside the compiler. We’ll add another function in a moment that builds on top of this one to make ASTs.
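Here is a standalone sketch of the encode/decode round trip at the edges of the range. It uses inclusive bounds checks, so kIntegerMax and kIntegerMin themselves are accepted, and it shifts on uword, because left-shifting a negative signed value is undefined behavior in C. The decoder mirrors AST_get_integer and relies on arithmetic right shift, which gcc and clang provide but C leaves implementation-defined. encode_integer and decode_integer are hypothetical stand-ins for the series' functions.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;
typedef uint64_t uword;

enum { kIntegerShift = 2, kIntegerBits = 64 - kIntegerShift };
static const word kIntegerMax = (INT64_C(1) << (kIntegerBits - 1)) - 1;
static const word kIntegerMin = -(INT64_C(1) << (kIntegerBits - 1));

word encode_integer(word value) {
  // Inclusive bounds: anything in [kIntegerMin, kIntegerMax] survives the shift.
  assert(value >= kIntegerMin && value <= kIntegerMax);
  return (word)((uword)value << kIntegerShift);
}

word decode_integer(word encoded) {
  // Arithmetic shift restores the sign (implementation-defined in C).
  return encoded >> kIntegerShift;
}
```

Both extremes round-trip, and -1 encodes to -4 (0xfffffffffffffffc), matching the encode_negative_integer test.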

Syntax trees

While we could pass around words all day and try really hard to keep the boundary between integral values and pointer values straight, I don’t much fancy that. I like my type information, thank you very much. So we’re going to add a thin veneer over the object encoding that both gives us some nicer type APIs and gives the C compiler some hints about when we’ve already encoded an object.

// AST

struct ASTNode;
typedef struct ASTNode ASTNode;

ASTNode *AST_new_integer(word value) {
  return (ASTNode *)Object_encode_integer(value);
}

bool AST_is_integer(ASTNode *node) {
  return ((word)node & kIntegerTagMask) == kIntegerTag;
}

word AST_get_integer(ASTNode *node) { return (word)node >> kIntegerShift; }

// End AST

We’ll use these functions pretty heavily in the compiler, especially as we add more datatypes.

An expandable byte buffer

Now that we can manually build programs, let’s get cracking writing our buffers. We have to emit the machine code to somewhere, after all. Remember the mmap/memcpy stuff from last time? We’re going to wrap those in some easier-to-remember APIs.

// Buffer

typedef unsigned char byte;

typedef enum {
  kWritable,
  kExecutable,
} BufferState;

typedef struct {
  byte *address;
  BufferState state;
  size_t len;
  size_t capacity;
} Buffer;

byte *Buffer_alloc_writable(size_t capacity) {
  byte *result = mmap(/*addr=*/NULL, /*length=*/capacity,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  assert(result != MAP_FAILED);
  return result;
}

void Buffer_init(Buffer *result, size_t capacity) {
  result->address = Buffer_alloc_writable(capacity);
  assert(result->address != MAP_FAILED);
  result->state = kWritable;
  result->len = 0;
  result->capacity = capacity;
}

void Buffer_deinit(Buffer *buf) {
  munmap(buf->address, buf->capacity);
  buf->address = NULL;
  buf->len = 0;
  buf->capacity = 0;
}

int Buffer_make_executable(Buffer *buf) {
  int result = mprotect(buf->address, buf->len, PROT_EXEC);
  buf->state = kExecutable;
  return result;
}

These functions are good building blocks for creating and destroying buffers. They abstract away some of the fiddly parameters and add runtime checks.

We still need to write into the buffer at some point, though, and we’re not going to memcpy whole blocks in. So let’s add some APIs for incremental writing.

byte Buffer_at8(Buffer *buf, size_t pos) { return buf->address[pos]; }

void Buffer_at_put8(Buffer *buf, size_t pos, byte b) { buf->address[pos] = b; }

This Buffer_at_put8 is the building block of the rest of the compiler. Every write will go through this function. But notice that it is pretty low-level; it does not do any bounds checks and it does not advance the current position in the buffer. So let’s add some more functions to do that…

word max(word left, word right) { return left > right ? left : right; }

void Buffer_ensure_capacity(Buffer *buf, word additional_capacity) {
  if (buf->len + additional_capacity <= buf->capacity) {
    return;
  }
  word new_capacity =
      max(buf->capacity * 2, buf->capacity + additional_capacity);
  byte *address = Buffer_alloc_writable(new_capacity);
  memcpy(address, buf->address, buf->len);
  int result = munmap(buf->address, buf->capacity);
  assert(result == 0 && "munmap failed");
  buf->address = address;
  buf->capacity = new_capacity;
}

void Buffer_write8(Buffer *buf, byte b) {
  Buffer_ensure_capacity(buf, sizeof b);
  Buffer_at_put8(buf, buf->len++, b);
}

void Buffer_write32(Buffer *buf, int32_t value) {
  for (size_t i = 0; i < sizeof value; i++) {
    Buffer_write8(buf, (value >> (i * kBitsPerByte)) & 0xff);
  }
}

// End Buffer

With the addition of Buffer_ensure_capacity, Buffer_write8, and Buffer_write32, we can start putting together functions to emit x86-64 instructions. I added both write8 and write32 because we’ll need to emit both single bytes and 32-bit immediate integer values. The helper function ensures that we don’t need to think about endianness every single time we emit a 32-bit value.
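The growth policy in Buffer_ensure_capacity (grow to the larger of double the current capacity and the current capacity plus the request) and the byte-at-a-time little-endian write don’t depend on mmap. As a point of comparison, here is a hypothetical malloc/realloc version. ByteVec and its functions are illustrative names, not part of the series. Memory from malloc is generally not executable, which is why the compiler itself sticks with mmap and mprotect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  unsigned char *data;
  size_t len;
  size_t capacity;
} ByteVec;

void ByteVec_init(ByteVec *v, size_t capacity) {
  v->data = malloc(capacity);
  assert(v->data != NULL);
  v->len = 0;
  v->capacity = capacity;
}

static void ByteVec_ensure_capacity(ByteVec *v, size_t additional) {
  if (v->len + additional <= v->capacity) {
    return;
  }
  // Same policy as Buffer_ensure_capacity: max(2 * cap, cap + additional).
  size_t doubled = v->capacity * 2;
  size_t grown = v->capacity + additional;
  size_t new_capacity = doubled > grown ? doubled : grown;
  unsigned char *data = realloc(v->data, new_capacity);
  assert(data != NULL);
  v->data = data;
  v->capacity = new_capacity;
}

void ByteVec_write8(ByteVec *v, unsigned char b) {
  ByteVec_ensure_capacity(v, 1);
  v->data[v->len++] = b;
}

void ByteVec_write32(ByteVec *v, uint32_t value) {
  // Least significant byte first: little-endian, as x86-64 expects.
  for (size_t i = 0; i < 4; i++) {
    ByteVec_write8(v, (unsigned char)(value >> (i * 8)));
  }
}

void ByteVec_deinit(ByteVec *v) {
  free(v->data);
  v->data = NULL;
  v->len = 0;
  v->capacity = 0;
}
```

Taking a uint32_t rather than an int32_t sidesteps right-shifting a negative value, which C leaves implementation-defined.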

Emitting instructions

There are a couple ways we could write an assembler:

  • Emit binary directly in the compiler, with comments
  • Make a table of all the possible encodings of the instructions we want (meaning mov eax, 1 and mov ecx, 1 are distinct, for example) and fetch chunks of bytes from there
  • Use some encoding logic to make re-usable building blocks

I chose to go with the last option, though I’ve seen all three while looking for a nice C assembler library. It allows us to write code like Emit_mov_reg_imm32(buf, kRcx, 123), which, if you ask me, looks fairly similar to mov rcx, 123.

If we were writing C++ we could get really clever with operator overloading… or we could not.

Note that I did not make up this encoding logic. x86-64 places register numbers in fixed bit fields of an instruction (such as the ModR/M byte), which is common across instruction sets and helps in decoding (for the hardware) and encoding (for the compilers).

// Emit

typedef enum {
  kRax = 0,
} Register;

static const byte kRexPrefix = 0x48;

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0xc7);
  Buffer_write8(buf, 0xc0 + dst);
  Buffer_write32(buf, src);
}

void Emit_ret(Buffer *buf) { Buffer_write8(buf, 0xc3); }

// End Emit

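Boom. Two instructions: one mov, one ret. The REX prefix is an x86-64 addition; 0x48 is the REX prefix with its W bit set, which selects a 64-bit operand size for the instruction that follows. (In 32-bit x86, the bytes 0x40 through 0x4f decode as inc and dec instructions instead, which is why the same bytes can mean different things in the two modes.)

In this particular mov’s case, it is the difference between mov eax, IMM and mov rax, IMM, where the 32-bit immediate is sign-extended to 64 bits.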
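The magic numbers in Emit_mov_reg_imm32 follow from the instruction’s documented form, REX.W + C7 /0 id. A REX prefix is the bits 0100 followed by the W, R, X and B bits, so 0x48 is REX with only W set. The /0 means the ModR/M byte’s reg field holds the opcode extension 0, and a mod field of 11 means the operand is a register, leaving the low three bits (rm) for the destination register number. That is why the code adds dst to 0xc0. The sketch below spells this out with hypothetical helper names, assuming the standard numbering of the first eight registers; r8 through r15 would also need the REX.B bit, which it does not handle.

```c
#include <assert.h>
#include <stdint.h>

// Standard x86-64 numbers for the first eight registers (3-bit fields).
typedef enum { kRax, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;

// ModR/M byte layout: mod (2 bits) | reg (3 bits) | rm (3 bits).
uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm) {
  return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

// REX prefix layout: fixed 0100 high nibble, then the W, R, X, B bits.
uint8_t rex(int w, int r, int x, int b) {
  return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}
```

One caveat this glosses over: when mod is not 11 (a memory operand), rm value 4 (rsp) signals that a SIB byte follows, and with mod 00, rm value 5 (rbp) means a RIP-relative address. Register numbers therefore cannot always be dropped into the rm field blindly.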

Compiling our first program

Now that we can emit instructions, it’s time to choose what instructions to emit based on the input program. We have a very restricted set of input programs (yes, several billion of them, if you’re being persnickety about the range of possible integers) so the implementation is short and sweet.

If we see a literal integer, encode it and put it in rax. Done.

// Compile

int Compile_expr(Buffer *buf, ASTNode *node) {
  if (AST_is_integer(node)) {
    word value = AST_get_integer(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_integer(value));
    return 0;
  }
  assert(0 && "unexpected node type");
}

int Compile_function(Buffer *buf, ASTNode *node) {
  int result = Compile_expr(buf, node);
  if (result != 0) {
    return result;
  }
  Emit_ret(buf);
  return 0;
}

// End Compile

I make a distinction between expr and function because we don’t always want to ret. We only want to ret the result of a function body, which might be composed of several nested expressions. This divide will become clearer as we add more expression types.

Making sure it works

Our compiler is all well and good, but it’s notably more complicated than the mini JIT demo from the last post. It’s one thing to test that by manually checking the return code of main, but I think we should have some regression tests to keep us honest as we go forth and break things.

For that, I’ve written some testing utilities to help check that we generated the right code, and also to execute the JITed code and return the result.

typedef word (*JitFunction)(void);

// Testing

#define EXPECT_EQUALS_BYTES(buf, arr)                                          \
  ASSERT_MEM_EQ(arr, (buf)->address, sizeof arr)

word Testing_execute_expr(Buffer *buf) {
  assert(buf != NULL);
  assert(buf->address != NULL);
  assert(buf->state == kExecutable);
  // The pointer-pointer cast is allowed but the underlying
  // data-to-function-pointer back-and-forth is only guaranteed to work on
  // POSIX systems (because of e.g. dlsym).
  JitFunction function = *(JitFunction *)(&buf->address);
  return function();
}
// End Testing

ASSERT_MEM_EQ will check the generated code and point out any differences if it finds them. Even though this only prints out hex representations of the generated code, it’s very helpful. I often paste unexpected output into rasm2 (part of the radare2 suite), Cutter (also part of the radare2 suite), or this online disassembler. If the instructions look super unfamiliar, it means we messed up the encoding!

Since we have our utilities, we’re going to use the greatest.h testing API to write some unit tests for our compiler and compiler utilities.

// Tests

TEST encode_positive_integer(void) {
  ASSERT_EQ(0x0, Object_encode_integer(0));
  ASSERT_EQ(0x4, Object_encode_integer(1));
  ASSERT_EQ(0x28, Object_encode_integer(10));
  PASS();
}

TEST encode_negative_integer(void) {
  ASSERT_EQ(0x0, Object_encode_integer(0));
  ASSERT_EQ((word)0xfffffffffffffffc, Object_encode_integer(-1));
  ASSERT_EQ((word)0xffffffffffffffd8, Object_encode_integer(-10));
  PASS();
}

TEST buffer_write8_increases_length(void) {
  Buffer buf;
  Buffer_init(&buf, 5);
  ASSERT_EQ(buf.len, 0);
  Buffer_write8(&buf, 0xdb);
  ASSERT_EQ(Buffer_at8(&buf, 0), 0xdb);
  ASSERT_EQ(buf.len, 1);
  PASS();
}

TEST buffer_write8_expands_buffer(void) {
  Buffer buf;
  Buffer_init(&buf, 1);
  ASSERT_EQ(buf.capacity, 1);
  ASSERT_EQ(buf.len, 0);
  Buffer_write8(&buf, 0xdb);
  Buffer_write8(&buf, 0xef);
  ASSERT(buf.capacity > 1);
  ASSERT_EQ(buf.len, 2);
  PASS();
}

TEST buffer_write32_expands_buffer(void) {
  Buffer buf;
  Buffer_init(&buf, 1);
  ASSERT_EQ(buf.capacity, 1);
  ASSERT_EQ(buf.len, 0);
  Buffer_write32(&buf, 0xdeadbeef);
  ASSERT(buf.capacity > 1);
  ASSERT_EQ(buf.len, 4);
  PASS();
}

TEST buffer_write32_writes_little_endian(void) {
  Buffer buf;
  Buffer_init(&buf, 4);
  Buffer_write32(&buf, 0xdeadbeef);
  ASSERT_EQ(Buffer_at8(&buf, 0), 0xef);
  ASSERT_EQ(Buffer_at8(&buf, 1), 0xbe);
  ASSERT_EQ(Buffer_at8(&buf, 2), 0xad);
  ASSERT_EQ(Buffer_at8(&buf, 3), 0xde);
  PASS();
}

TEST compile_positive_integer(void) {
  word value = 123;
  ASTNode *node = AST_new_integer(value);
  Buffer buf;
  Buffer_init(&buf, 10);
  int compile_result = Compile_function(&buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(&buf, expected);
  Buffer_make_executable(&buf);
  word result = Testing_execute_expr(&buf);
  ASSERT_EQ(result, Object_encode_integer(value));
  PASS();
}

TEST compile_negative_integer(void) {
  word value = -123;
  ASTNode *node = AST_new_integer(value);
  Buffer buf;
  Buffer_init(&buf, 100);
  int compile_result = Compile_function(&buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(-123); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0x14, 0xfe, 0xff, 0xff, 0xc3};
  EXPECT_EQUALS_BYTES(&buf, expected);
  Buffer_make_executable(&buf);
  word result = Testing_execute_expr(&buf);
  ASSERT_EQ(result, Object_encode_integer(value));
  PASS();
}

SUITE(object_tests) {
  RUN_TEST(encode_positive_integer);
  RUN_TEST(encode_negative_integer);
}

SUITE(buffer_tests) {
  RUN_TEST(buffer_write8_increases_length);
  RUN_TEST(buffer_write8_expands_buffer);
  RUN_TEST(buffer_write32_expands_buffer);
  RUN_TEST(buffer_write32_writes_little_endian);
}

SUITE(compiler_tests) {
  RUN_TEST(compile_positive_integer);
  RUN_TEST(compile_negative_integer);
}

// End Tests

GREATEST_MAIN_DEFS();

int main(int argc, char **argv) {
  GREATEST_MAIN_BEGIN();
  RUN_SUITE(object_tests);
  RUN_SUITE(buffer_tests);
  RUN_SUITE(compiler_tests);
  GREATEST_MAIN_END();
}

These tests pass, at least for me. And no Valgrind errors, either! The full source for this post can be assembled by placing the individual code snippets back to back, in order. I recommend following along and typing it manually, to get the full educational experience, but if you must copy and paste it should still work. :)

If you want to convince yourself the tests work, modify the values we’re checking against in some places. Then you’ll see the test fail. Never trust a test suite that you haven’t seen fail… it might not be running the tests!

I think there is also a way to use greatest.h to do setup and teardown so we don’t have to do all that buffer machinery, but I haven’t figured out an ergonomic way to do that yet.

Next time on Dragon Ball Z, we’ll compile some other immediate constants.

Mini Table of Contents

  1. Actually, you can get away with a scheme that only plays games with pointer tagging for immediate objects, and uses a header as part of the heap-allocated object to encode additional information about the type, the length, etc. This is what runtimes like the JVM do. 

Joe Nelson (begriffs)

Tips for stable and portable software August 31, 2020 12:00 AM

After several years’ involvement with quickly evolving programming languages, I’ve come to appreciate stability. I’d like to make my programs easy to build on a wide variety of systems with minimal adjustment. I’d like them to keep working long into the future as environments change.

To think about stability more clearly, let’s divide a functioning program into its layers. Then we can examine development choices one layer at a time.

concentric circles of program resources

The more features a program needs, the further out it must reach through the layers.

Layer 0: Programming language

Choose a language with multiple implementations and a standard

Every language has to start somewhere, often as an implementation by a single person or small group. At this stage the language evolves rapidly, and to be fair it’s this stage that advances the state of the art.

However, using a language in its single-implementation stage means you’re committing a percentage of your energy to the “research project” of the language itself. You’ll deal with breaking changes (including tools), and experimental dead-ends.

If you love the idea behind a new language, or believe it’s a winner and that your early familiarity will pay off, then go for it! Otherwise use a language that has advanced beyond a single implementation. That way you can focus on your domain of expertise rather than keeping up with a language research agenda.

Languages get to the next stage when groups of people fork them for new situations and architectures. Some people add features, other people discover difficulties in their environments. Stakeholders then debate and reach consensus through a standardization process. The end result is that the standard, rather than a particular software artifact, defines the language and has the final say.

Naturally the whole thing takes a while. Standardized languages are going to be fairly old. They’ll miss out on recent ideas, but will be well understood. Here are some mature languages with standards:

  • Ada
  • C
  • Common Lisp
  • ECMAScript
  • Pascal
  • SQL

I’ve been using C lately because of its portability, simple (yet expressive) abstract machine model, and deep compatibility with POSIX and foundational libraries.

Avoid – or wrap – compiler language extensions

If you’re using a language with a standard, take advantage of it. First, choose a specific version of the standard. Older versions are generally more widely supported, but have fewer features. In the C world I usually pick C99 because it has some conveniences over C89, and is still supported pretty much everywhere (although only partially on Windows).

Consult your compiler documentation to see if the compiler can catch accidental uses of non-standard behavior. In clang or gcc, add the following flags to your Makefile:

# enforce a specific version of the standard
CFLAGS += -std=c99 -pedantic

Substitute another version for “c99” as desired. The pedantic flag rejects all programs that use forbidden extensions, and some other programs that do not follow ISO C.

If you do want to use compiler extensions (such as those in gcc or clang), wrap them behind your own macros so that the code stays portable. The PostgreSQL project does this kind of thing in c.h. Here’s an example at random:

/*
 * Use "pg_attribute_always_inline" in place of "inline" for functions that
 * we wish to force inlining of, even when the compiler's heuristics would
 * choose not to.  But, if possible, don't force inlining in unoptimized
 * debug builds.
 */
#if (defined(__GNUC__) && __GNUC__ > 3 && defined(__OPTIMIZE__)) || defined(__SUNPRO_C) || defined(__IBMC__)
/* GCC > 3, Sunpro and XLC support always_inline via __attribute__ */
#define pg_attribute_always_inline __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
/* MSVC has a special keyword for this */
#define pg_attribute_always_inline __forceinline
#else
/* Otherwise, the best we can do is to say "inline" */
#define pg_attribute_always_inline inline
#endif

Notice how they adapt to various compilers and provide a final fallback. Of course, avoiding extensions in the first place is the simplest option, when possible.
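The same pattern works for any extension you want to keep optional. Here is a minimal sketch with a hypothetical project-local macro, MYLIB_LIKELY, that uses the __builtin_expect branch hint from GCC and Clang when available and degrades to a plain truth value elsewhere. The macro and clamp_to_byte are illustrative names, not from PostgreSQL.

```c
#include <assert.h>

/* Hypothetical project-local wrapper, in the style of pg_attribute_* */
#if defined(__GNUC__) || defined(__clang__)
/* GCC and Clang accept a branch-probability hint */
#define MYLIB_LIKELY(x) __builtin_expect(!!(x), 1)
#else
/* Otherwise, evaluate the condition without any hint */
#define MYLIB_LIKELY(x) (!!(x))
#endif

/* Example user of the macro: the in-range case is expected to be common. */
int clamp_to_byte(int v) {
  if (MYLIB_LIKELY(v >= 0 && v <= 255)) {
    return v;
  }
  return v < 0 ? 0 : 255;
}
```

Because the fallback keeps the same meaning (!!(x) is always 0 or 1), code using the macro behaves identically on every compiler; only the optimization hint disappears.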

Layer 1: Standard library

Learn it, and consult the standard

Take time to learn your language’s standard library. It’s a freebie, you get it wherever your program goes. Read about the library functions in the language standard, since they will be covered there.

Gaining knowledge of the standard library can help reduce reliance on unnecessary third-party libraries. The ECMAScript world, for instance, is rife with micro-libraries that attempt to supplement the ECMA standard’s real or perceived shortcomings.

The size of a single-implementation language’s library is a trade-off between ease of implementation and ease of use. A giant library like that in the Go language makes it harder for creators of would-be rival implementations, and thus slows the progress to a robust standard.

To learn more about the C standard library, see my article.

Learn the rationale and gotchas

Because standards bodies avoid breaking existing codebases, and because stable languages are slow to change, there will be weird or dangerous functions in the standard library. However, the dangers are well known and documented in supporting literature, unlike the dangers in new, relatively untested systems.

Here are some great books for C:

  • “The CERT C Coding Standard” by Robert C. Seacord (ISBN 978-0321984043). Illustrates potential insecurity with, among other things, the standard library. Lists real code that caused vulnerabilities.
  • “The Standard C Library” by P. J. Plauger (ISBN 978-0131315099). Thorough details about the C89 stdlib.
  • “C Traps and Pitfalls” by Andrew Koenig (ISBN 978-0201179286).
  • “C Programming FAQs” by Steve Summit (ISBN 978-0201845198). I can see why these were historically the most frequently asked questions. I asked many of them myself.

Also the C99 standard has an accompanying rationale document. It talks about alternate designs considered and rejected.

Layer 2: POSIX

Similarly to how competing C implementations led to the C standard, the Unix wars led to POSIX. POSIX specifies a “lowest common denominator” interface that many operating systems honor to a greater or lesser degree.

Read the spec, compare with man pages

Whenever you use system calls outside the C standard library, check whether they’re part of POSIX, and if their official description differs from your local man pages. The Open Group offers a free searchable HTML version of POSIX.1. As of this writing it’s POSIX.1-2017 (which is POSIX.1-2008 plus two technical corrigenda).

There’s one more complication: POSIX.1-2008 (aka “Issue 7”) isn’t fully supported everywhere. (For instance, I found that macOS doesn’t support pthread barriers, semaphores, or asynchronous thread cancellation.) I think the root cause is that POSIX.1-2008 made mandatory some thread and real-time functionality that was previously in optional extensions. If you stick to functionality in POSIX.1-2001 (aka Issue 6), you should be safe on all reasonably recent platforms.
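One way to deal with this is to check the option macro in <unistd.h> before relying on pthread barriers. The sketch below assumes that your target follows the POSIX rules for option macros. Under those rules a macro is either undefined or -1 (unsupported), 0 (ask sysconf() at run time), or a positive value (always supported). The function name barriers_supported is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* Issue 6, where barriers are still optional */
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if pthread barriers are usable, else 0. */
int barriers_supported(void)
{
#if !defined(_POSIX_BARRIERS) || _POSIX_BARRIERS < 0
	return 0;                          /* option absent at compile time */
#elif _POSIX_BARRIERS == 0
	return sysconf(_SC_BARRIERS) > 0;  /* must be decided at run time */
#else
	return 1;                          /* always available */
#endif
}
```

If the answer is 0, the program can fall back to a mutex and condition variable, or stop with a clear error at build time.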

Activate a version

To call POSIX functions you must define the _POSIX_C_SOURCE “feature test” macro before including header files. Select a specific POSIX version by using one of these values:

Edition   Release year   Macro value
1         1988           (N/A)
2         1990           1
3         1992           2
4         1993           199309L
5         1995           199506L
6         2001           200112L
7         2008           200809L

Header files hide or reveal functions based on the feature test macro. For example, the getline() function from Issue 7 allocates memory and reads a line.

/* line.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

int main(void)
{
	char *line = NULL;
	size_t len = 0;
	ssize_t read;
	while ((read = getline(&line, &len, stdin)) != -1)
		printf("Length %zd: %s", read, line);
	free(line); /* getline() allocated the buffer */
	return 0;
}

Trying to use getline() on Issue 6 (POSIX.1-2001) fails:

$ cc -std=c99 -pedantic -Werror -D_POSIX_C_SOURCE=200112L line.c -o line

line.c:10:17: error: implicit declaration of function 'getline' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        while ((read = getline(&line, &len, stdin)) != -1)
1 error generated.

Selecting Issue 7 with -D_POSIX_C_SOURCE=200809L fixes it.
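You can also set the macro in the source file itself, provided the #define appears before the first #include. The sketch below takes that approach and counts lines with getline(). count_lines is a hypothetical helper name. Because getline() allocates memory, the buffer must be freed once reading is done.

```c
/* Same effect as -D_POSIX_C_SOURCE=200809L, but recorded in the file. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count lines read from `in`; -1 on a read error. */
long count_lines(FILE *in)
{
	char *line = NULL;
	size_t cap = 0;
	long count = 0;

	assert(in != NULL);
	while (getline(&line, &cap, in) != -1)
		count++;
	free(line);  /* getline() (re)allocates; the caller frees */
	return ferror(in) ? -1 : count;
}
```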

Important note: setting _POSIX_C_SOURCE will hide non-POSIX operating system extras in the standard headers. The best practice is to separate your source files into those that are POSIX conformant, and those (hopefully few) that aren’t. Compile the latter without the feature macro and link them all together at the end.

Use POSIX in the build process too

POSIX defines the interface not just for the library functions discussed earlier, but also for the shell and common tools. If you use those tools for your builds, then you don’t need to install any extra software on destination machines to compile your project.

Probably the most common sources of accidental lock-in are bashisms and GNU extensions to Make. For scripts, use sh, and use (POSIX) make for Makefiles. Too many projects use GNU features needlessly. In fact, learning the portable subset of Make features leads to cleaner, more reliable builds.

This is a topic for an entire article of its own. Chris Wellons wrote a nice tutorial about it. Also “Managing Projects with make” by Andrew Oram (ISBN 0-937175-90-0) is a little book that’s packed with good advice.

Layer 3: Operating system extras

Operating systems include useful functionality beyond POSIX: for instance, extensions to pthreads (setting reader-writer preference or thread processor affinity), access to specialized hardware (like audio or graphics), alternate I/O interfaces and semantics, and safety functions like strlcpy or pledge.

Three ways to use these features portably are to:

  1. wrap them in your own interface and conditionally compile the implementation, or
  2. build a static shim library (“libcompat”) as part of your project to use when functionality is missing, or
  3. link to a third party library that abstracts the details.
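Here is a sketch of what one entry in a libcompat (option 2) might look like: a fallback strlcpy that is used only when configure did not find a native one. The macro HAVE_STRLCPY and the name compat_strlcpy are assumptions made for illustration. The semantics follow the BSD man page: copy at most size - 1 bytes, always NUL-terminate when size is nonzero, and return strlen(src) so that callers can detect truncation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical libcompat fallback with BSD strlcpy semantics. */
size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	assert(dst != NULL || size == 0);
	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;  /* len >= size means the copy was truncated */
}

/* HAVE_STRLCPY is an assumed flag that configure would set. */
#ifndef HAVE_STRLCPY
#define strlcpy compat_strlcpy
#endif
```

In a real project the function would live in its own file under the compat directory, and a small header would carry the #define.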

We’ll talk about third-party libraries later. Let’s look at option one now.

Detecting OS functions

Consider the example of generating random data. It requires help from the OS since POSIX offers only pseudo-random numbers.

We’ll split our Makefile into two parts:

  1. Makefile – specifies targets, dependencies and rules that hold on all systems
  2. a per-system settings file – sets macros and build flags specific to the local system

The Makefile will include that per-system file like this:

# inside the Makefile...

# set up common options and then...


We’ll generate the per-system file with a configure script. A developer will run the script before their first build to detect the environment options. The most primitive way for configure to work would be to try to parse uname and make decisions based on what OS or distro it sees. A more accurate way is to directly probe for the needed OS C functions.

To see if a C function exists, we can just try compiling test snippets of code and see if they succeed. You might think this is awkward or that it requires cluttering your project with test code, but it’s actually pretty elegant.

First make this shell script helper function:

compiles ()
{
	stage="$(mktemp -d)"
	echo "$2" > "$stage/test.c"
	# $1 is left unquoted so that an empty flag expands to nothing
	(cc -Werror $1 -o "$stage/test" "$stage/test.c" >/dev/null 2>&1)
	cc_success=$?
	rm -rf "$stage"
	return $cc_success
}

The compiles() function takes two arguments: an optional compiler flag, and the source code to attempt to compile.


Note that mktemp and cc are not specified by POSIX. You can write your own mktemp function using POSIX primitives, but I wanted to conserve space in this example. For cc, the spec offers c99 (or c89 in 4th edition POSIX). However, the c99 utility doesn’t allow controlling warning levels, and I wanted to specify that warnings be treated as errors. The cc command is a common de facto standard.

Let’s use the helper to check for OS random number generators. The BSD world offers arc4random_buf to get random bytes, and Linux offers getrandom. The configure script can check for each feature like this:

if compiles "" "
	#include <stdint.h>
	#include <stdlib.h>
	int main(void)
	{
		void (*p)(void *, size_t) = arc4random_buf;
		return (intptr_t)p;
	}
"
then
	echo 'CFLAGS += -DHAVE_ARC4RANDOM'
fi

if compiles "-D_POSIX_C_SOURCE=200112L" "
	#include <stdint.h>
	#include <sys/types.h>
	#include <sys/random.h>
	int main(void)
	{
		ssize_t (*p)(void *, size_t, unsigned int) = getrandom;
		return (intptr_t)p;
	}
"
then
	echo 'CFLAGS += -DHAVE_GETRANDOM'
fi

See? Not too bad. These code snippets test not only whether the functions exist, but also check their type signatures. Notice how the second example is compiled with POSIX for the ssize_t type, while the first example is intentionally not marked POSIX conformant because doing so would hide the extra function arc4random_buf that BSD puts in stdlib.h.

Wrap OS functions behind your own

It’s helpful to isolate the use of non-portable functions in a distinct translation unit, and export your own interface on top. That way it’s more straightforward to set up conditional compilation in one place, or to refactor in the future.

Let’s continue the example from the previous section of generating random bytes. With the hard work of OS feature detection behind us, we can wrap the differing OS interfaces behind our own function:

#include <stdint.h>
#include <stdlib.h>
#if defined HAVE_GETRANDOM
#include <sys/random.h> /* not present on every BSD */
#endif

void get_random_bytes(void *buf, size_t n)
{
#if defined HAVE_ARC4RANDOM  /* BSD */
	arc4random_buf(buf, n);
#elif defined HAVE_GETRANDOM /* Linux */
	getrandom(buf, n, 0);
#else
#error OS does not provide recognized function to get entropy
#endif
}

The Makefile defines HAVE_ARC4RANDOM or HAVE_GETRANDOM using CFLAGS when the corresponding functions exist. The code can just use ifdefs. Notice the #error in the #else case to fail compilation with a clear message on unsupported platforms.

The degree of portability we strive for involves trade-offs. Example: we could add a fallback to reading /dev/random. The configure script from the previous section could check whether the device exists:

if test -c /dev/random; then

Using that information, we could add another #elif in get_random_bytes() so that it can potentially work on more systems. However, in this case, the increased portability would require a change in interface. Since fopen() or fread() on /dev/random could fail, our function would need to return bool. Currently we treat the OS functions we’re calling as infallible (arc4random_buf cannot fail, and getrandom always fills requests of up to 256 bytes), so a void return is fine.
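Below is a rough sketch of that bool-returning variant. It assumes a hypothetical configuration where reading the device is the last resort. For that reason the final branch falls back to /dev/random instead of stopping with #error, and unsupported systems fail at run time by returning false. The sketch also loops around getrandom(), because that call may return fewer bytes than requested for buffers larger than 256 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#if defined HAVE_GETRANDOM
#include <sys/types.h>
#include <sys/random.h>
#endif

/* Fill buf with n random bytes; returns false if no entropy source worked. */
bool get_random_bytes(void *buf, size_t n)
{
	assert(buf != NULL || n == 0);
#if defined HAVE_ARC4RANDOM      /* BSD: arc4random_buf() cannot fail */
	arc4random_buf(buf, n);
	return true;
#elif defined HAVE_GETRANDOM     /* Linux: reads may be short, so loop */
	unsigned char *p = buf;
	while (n > 0) {
		ssize_t got = getrandom(p, n, 0);
		if (got < 0)
			return false;        /* e.g. interrupted by a signal */
		p += got;
		n -= (size_t)got;
	}
	return true;
#else                            /* assumed last resort: read the device */
	FILE *f = fopen("/dev/random", "rb");
	if (f == NULL)
		return false;
	size_t got = fread(buf, 1, n, f);
	fclose(f);
	return got == n;
#endif
}
```

Callers now have to check the result. That cost is the price of the extra portability.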

Test on multiple OSes and hardware

The true test of portability is, of course, building and running on multiple operating systems, compilers, and hardware architectures. It can be surprising to see what assumptions this can uncover. Testing portability early and often makes it easier to keep a program shipshape.

The PostgreSQL project, for instance, maintains a bunch of disparate machines known as the “buildfarm.” Buildfarm members each have their own OS, compiler, and architecture. The team compiles every new feature on these machines and runs the test suite there.

Focusing on the architectures alone, we can see an impressive variety in the buildfarm:

Even if you have no intention to run on these architectures, testing there will lead to better code. (See my article C Portability Lessons from Weird Machines.)

Begriffs Buildfarm?

I’ve been thinking of assembling a buildfarm and offering a paid continuous integration service. If this interests you, please send me an email. I think the project is a good cause, and with enough subscriptions I could cover the electricity and hardware costs.

Layer 4: third-party libraries

Many languages have their own application-level package managers, but C has no exclusive package manager. The language has too much history and spans too many environments to have locked into that. Instead people build dependencies from source, or use the OS package manager.

Build with pkg-config

Linking to libraries requires knowing their path, name, and compiler settings. Additionally, we want to know which version is installed and whether it falls within the range we support. Since there’s no application-level package manager for C, we need to use another tool to discover installed libraries.

The most cross-platform way to find – and build against – dependency libraries is pkg-config. The tool allows you to query system packages, regardless of how they were installed. To be compatible with pkg-config, each library foo provides a libfoo.pc file containing keys and values like this:


Name: libfoo
Description: The foo library
Version: 1.2.3
Cflags: -I${includedir}/foo
Libs: -L${libdir} -lfoo

The pkg-config executable can query the metadata and provide flags for your Makefile. Call it from your configure script like this:

# check that a sufficient version is installed
pkg-config --print-errors 'libfoo >= 1.0'

# save flags to
cat >> <<-EOF
	CFLAGS += $(pkg-config --cflags libfoo)
	LDFLAGS += $(pkg-config --libs-only-L libfoo)
	LDLIBS += $(pkg-config --libs-only-l libfoo)
EOF

Notice the LDLIBS vs LDFLAGS distinction. LDLIBS are options that need to go at the very end of the build line. The default POSIX make suffix rules don’t mention LDLIBS, but here’s a rule you can use instead:

.c:
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $< $(LDLIBS)

Sometimes an operating system will include extra functionality and package it up as a portable library you can use on other operating systems. In this case you can use pkg-config conditionally.

For instance, OpenBSD spun off the LibreSSL project (a more usable OpenSSL). OpenBSD includes the functionality internally. In the configure script just do an operating system check:

# LibreSSL
case "$(uname -s)" in
	OpenBSD)
		# included with OS
		echo 'LDLIBS += -ltls' >>
		;;
	*)
		# requires a package
		pkg-config --print-errors 'libtls >= 2.5.0'
		cat >> <<-EOF
			CFLAGS += $(pkg-config --cflags libtls)
			LDFLAGS += $(pkg-config --libs-only-L libtls)
			LDLIBS += $(pkg-config --libs-only-l libtls)
		EOF
		;;
esac

For more information about pkg-config, see Dan Nicholson’s guide.

Compensating for the standard library

The C standard library has no generic collections. You have to write your own linked lists, trees, and hash tables. Real Programmers™ might like this, but I don’t.

POSIX offers limited help with its interface in search.h:

  • Binary search tree. This interface has worked for me, although twalk() doesn’t take an argument to pass auxiliary data to the callback. The callback needs to consult a global or thread-local variable for that. The quality of implementation may vary as well, likely with regard to how/if the tree is balanced.
  • Queue. Very basic functions to insert or delete from a doubly linked (possibly circular) list. It takes void*, but expects a structure whose first two members are pointers to the same structure type (forward and backward pointers).
  • Hash table. Unnecessarily constrained interface. It creates a single hash table in hidden memory. You can destroy the table and later make another, but can never have more than one active at a time anywhere in the callstack. Obviously not thread safe, but that seems to be the least of its problems.
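The following is a minimal sketch of the workaround described for twalk(): a file-scope accumulator stands in for the missing context argument. The helper names are hypothetical. When the callback acts only on postorder and leaf visits, it sees each element exactly once, in sorted order. The tree stores pointers to caller-owned ints, so those ints must outlive the tree.

```c
#define _XOPEN_SOURCE 700  /* tsearch/twalk are XSI functions */
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* twalk() passes no user pointer, so the "context" lives at file scope. */
static long walk_sum;
static size_t walk_count;

static int compare_ints(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

static void sum_action(const void *nodep, VISIT which, int depth)
{
	(void)depth;
	/* Internal nodes report postorder once; leaves report leaf once. */
	if (which == postorder || which == leaf) {
		const int *key = *(const int *const *)nodep;
		walk_sum += *key;
		walk_count++;
	}
}

/* Insert n ints (duplicates ignored) and sum the distinct ones via twalk.
 * The nodes are not freed here, for brevity. */
long tree_sum(const int *values, size_t n, void **root)
{
	for (size_t i = 0; i < n; i++) {
		void *node = tsearch(&values[i], root, compare_ints);
		assert(node != NULL);
	}
	walk_sum = 0;
	walk_count = 0;
	twalk(*root, sum_action);
	return walk_sum;
}
```

Because the callback writes to shared globals, this version is not reentrant. A thread-local variable would be the next step.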

To go beyond that, you’ll have to use third-party libraries. Many well-known libraries seem pretty bloated (GLib, tbox, Apache Portable Runtime). I found a smaller, cleaner library called simply C Algorithms. Haven’t used it in a project yet, but it looks stable and well tested. I also built the library locally with added pedantic C99 flags and got no warnings.

Two other stable libraries (code snippets?) which have received a lot of use over the years are Uthash and BSD’s queue(3) (browse queue.h from OpenBSD, or the FreeBSD variant).

Uthash describes itself this way:

Any C structure can be stored in a hash table using uthash. Just add a UT_hash_handle to the structure and choose one or more fields in your structure to act as the key. Then use these macros to store, retrieve or delete items from the hash table.

The BSD queue code has been used and improved since the 1990s. It provides macros to create and manipulate singly-linked lists, simple queues, lists, and tail queues. The man page is quite good.
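To show the queue(3) style, here is a small tail queue of jobs. It assumes a <sys/queue.h> that provides the TAILQ macros, which OpenBSD, FreeBSD, and glibc all ship. The struct job type and the job_* functions are hypothetical. The link pointers are stored inside each element, so the code needs no separate node allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* A job that can sit on a tail queue; `link` holds next/prev pointers. */
struct job {
	int id;
	TAILQ_ENTRY(job) link;
};

/* Declares struct job_queue, the list head type. */
TAILQ_HEAD(job_queue, job);

/* Append a new job; returns 0 on success, -1 if allocation fails. */
int job_push(struct job_queue *q, int id)
{
	struct job *j = malloc(sizeof *j);
	if (j == NULL)
		return -1;
	j->id = id;
	TAILQ_INSERT_TAIL(q, j, link);
	return 0;
}

/* Remove the oldest job and return its id; the queue must be non-empty. */
int job_pop(struct job_queue *q)
{
	struct job *j = TAILQ_FIRST(q);
	assert(j != NULL);
	int id = j->id;
	TAILQ_REMOVE(q, j, link);
	free(j);
	return id;
}

/* Sum the ids without modifying the queue. */
int job_sum(struct job_queue *q)
{
	int sum = 0;
	struct job *j;
	TAILQ_FOREACH(j, q, link)
		sum += j->id;
	return sum;
}
```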

The functionality differs between the OpenBSD and FreeBSD codebases. I use the OpenBSD version, but it has a little less functionality. In particular, FreeBSD adds the STAILQ (singly-linked tail queue), and a list swap operation. There was once a CIRCLEQ for circular queues, but it used dodgy coding practices and was removed.

Both Uthash and Queue are header files with macros that you vendor into your project and include rather than linking against. In general I consider “header-only libraries” to be undesirable because they abuse the notion of a translation unit, bloat object files, and make debugging harder. However I’ve used these libraries and they do work well.

User interface

The fewer UI features a program requires, the more portable it will be and the fewer opportunities there will be for it to mess up. (Does your command line app really need to output an emoji rocket ship or animated-in-place text spinner?)

The lowest common denominator is the standard I/O library in C, or its equivalent in other languages. Reading and writing text, pretending to be a teletype.

The next level of sophistication is static output but an input line you can modify (like the fancier teletypes that could edit a line before sending). Different terminals support intraline editing differently, and you should use a library to handle it. The classic is GNU readline. Readline provides this functionality:

  • Moving the text cursor (vi and emacs modes)
  • Searching the command history
  • Controlling a kill ring
  • Using tab completion

Its license is straight up GPL though, not even LGPL. There are more permissive knockoffs like libedit (requires ncurses), or linenoise (which is restricted to VT100 terminals/emulators).

Going up yet another level is the text user interface (TUI), where the whole screen is your canvas, but you draw on it with text. Historically terminal control codes diverged wildly, so a standard programming interface was born, X/Open Curses. The most popular implementation is ncurses, which adds some nonstandard extensions as well.

Curses handles these tasks:

  • Terminal capability detection
  • “Raw” mode keyboard input
  • Cursor motion
  • Line drawing
  • Highlighting, underlining
  • Inserting and deleting lines and characters
  • Status line
  • Area clear
  • Windows
  • Color

To stop pretending the computer is an archaic device from the 70s, you can use the cross-platform SDL2 library. It gives low level access to audio, keyboard, mouse, joystick, and graphics hardware. The platform support really is impressive. Everything from Unix, Mac, and Windows to mobile and web rendering.

Finally, for a classic native desktop application with widgets, the most stable and portable choice is probably Motif. The interface is stark, but it runs everywhere, and won’t change or break on you.

Sample of Motif widgets


The Motif Programming Manual (free download) says this by way of introduction:

So why motif? Because it remains what it has long been: the common native windowing toolkit for all the UNIX platforms, fully supported by all the major operating system vendors. It is still the only truly industrial strength toolkit capable of supporting large scale and long term projects. Everything else is tainted: it isn’t ready or fully functionally complete, or the functional specification changes in a non-backwards-compatible manner per release, or there are performance issues. Perhaps it doesn’t truly port across UNIX systems, or it isn’t fully ICCCM compliant with software written in any other toolkit on the desktop, or there are political battles as various groups try to control the specification for their own purposes. […] With motif, you know where you are: it’s stable, it’s robust, it’s professionally supported, and it all works.

A reference manual is also available for download.

I was a little skeptical that it would be supported on macOS, but I tried the hello world example and, sure enough, it worked fine on XQuartz. I think there’s value in using Motif rather than a monster like GTK.

August 30, 2020

Derek Jones (derek-jones)

The aims of software engineering research August 30, 2020 10:19 PM

Physics researchers aim to explain the workings of the universe (technically they build models whose behavior mimics that of the universe we can measure), biologists the workings of biological systems, and psychologists the working of the human mind.

What are researchers in software engineering aiming to do?

Talking to academics, the answer is that they aim to do research that can be published in a high impact journal.

What do those involved in commercial software development think software engineering researchers should be aiming to achieve?

Most of the commercial developers I have asked have never thought about the subject; hardly surprising, they have plenty of other issues to think about.

Those who pay for software, rather than create it, want it to be cheaper and delivered faster.

Vendors are under some pressure to reduce costs and deliver sooner. But since its inception, software has been a seller's market, which means the customer pressure does not have the impact it has in other industries.

The very large organizations who pay lots of money for software for their own use (e.g., the U.S. Department of Defense) recognise that research into software production may well save them lots of money, and at one time interesting things were being discovered, but then funding got rerouted to people with an aversion to actual software engineering, i.e., academics.

Cheaper and faster will always be of interest, and will start to become a hot topic in software engineering research once software starts to become a buyer's market.

Maintaining existing systems continues to grow into the dominant activity of nearly every software developer. Dependencies on the rest of the software world (e.g., libraries and compilers) are starting to consume a large percentage of maintenance costs. Managers want to know which packages are likely to have a long and stable lifetime, and which are likely to be short-lived. An understanding of the evolution of software ecosystems is a pressing need. This is really cheaper and faster over the long term.

Cheaper and faster (short term for development, long term for maintenance) covers everything.

It’s tempting to list personnel selection, i.e., who is likely to make the best software developer. But why should the process of selecting software developers be any different from the processes used to select people to become doctors, lawyers and other professionals? I’m sure that those involved in the various professions would like a magic wand that points to the appropriate people (for some definition of appropriate), but this magic wand is no more likely to exist for software developers than for any other profession.

What do you think the aims of software engineering research should be?

Ponylang (SeanTAllen)

Last Week in Pony - August 30, 2020 August 30, 2020 09:39 PM

The Flynn project aims to bring a Pony-like actor-model implementation to Swift using a modified version of the Pony runtime. New releases of ponyc, ponyup, and some bots.

Gustaf Erikson (gerikson)

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: The smallest program August 30, 2020 03:49 AM


Welcome to the first post in the “Compiling a Lisp” series. We’re going to write a small program today. Before we actually compile anything, though, let’s build up a bit of a foundation for code execution. That way, we can see the code compile and run and be satisfied with the results of both.

Instead of compiling to disk, like most compilers you may be familiar with (GCC, Clang, DMD, Python, etc.), we’re going to compile in memory. This means that every time we run the program we have to compile it again, but it also means we don’t have to deal with whatever on-disk format executables use on your platform (ELF, Mach-O, etc.). We can just point the processor at the code and say “go”. This style of compilation is known as “Just-in-Time” compilation, because the compilation happens right when you need it, and not before (see footnote 1).

Let’s start with a small demo.

#include <assert.h>   /* for assert */
#include <stddef.h>   /* for NULL */
#include <string.h>   /* for memcpy */
#include <sys/mman.h> /* for mmap and friends */

const unsigned char program[] = {
    // mov eax, 42 (0x2a)
    0xb8, 0x2a, 0x00, 0x00, 0x00,
    // ret
    0xc3,
};

const int kProgramSize = sizeof program;

typedef int (*JitFunction)();

int main() {
  void *memory = mmap(/*addr=*/NULL, /*length=*/kProgramSize,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  memcpy(memory, program, kProgramSize);
  int result = mprotect(memory, kProgramSize, PROT_EXEC);
  assert(result == 0 && "mprotect failed");
  JitFunction function = *(JitFunction*)&memory;
  int return_code = function();
  assert(return_code == 42 && "the assembly was wrong");
  result = munmap(memory, kProgramSize);
  assert(result == 0 && "munmap failed");
  return return_code;
}
This C program:

  1. Allocates writable memory (mmap)
  2. Copies a program into it (memcpy)
  3. Makes the memory executable (mprotect)
  4. Calls the memory as a function
  5. Deallocates memory (munmap)

The order of those steps is important! This C program will fail, usually with a segmentation fault, if you mix them up or skip one of them.

If you want to understand the pointer shenanigans, see footnote 2, but if you would like to ignore it and pretend I never did that, please keep reading. The program works, though:

sequoia% gcc -Wall -Wextra -pedantic -fno-strict-aliasing mmap-demo.c
sequoia% ./a.out 
sequoia% echo $?
42

Let’s back up and go through that demo line-by-line. I’ll skip the includes since that’s just part of life in C.

The machine code

First let’s take a look at our program. Here we have some raw machine code encoded as hex bytes, with helpful commentary by yours truly explaining what the bytes mean in human-speak.

const unsigned char program[] = {
    // mov eax, 42 (0x2a)
    0xb8, 0x2a, 0x00, 0x00, 0x00,
    // ret
    0xc3,
};

I generated this code by going to the Compiler Explorer, making the compiler compile to binary, and typing in a C program that just returns 42 (see footnote 3).

This is as good a method as any for doing some initial research for what instructions you want to emit. You’ll have to look a little further afield (like in this quick reference or the official Intel x86-64 manual) if you want to figure out how to encode instructions without manually having a table for all the variations you want. We’ll touch more on that later.

In this machine code, 0xb8 is the instruction for “move the following 32-bit integer to the register eax”. It’s a special case of the mov instruction. eax is the lower 32 bits of rax, one of several general-purpose registers in x86-64. It is also the register conventionally used for return values, but that could vary between calling conventions. It’s not important to know all the details of every calling convention, but it is important to know that a calling convention is just that: a convention. It is an agreement between the people who write functions and the people who call functions about how data gets passed around. In this case, we are moving 42 into eax because eax is the return register in the System V AMD64 calling convention (used on macOS, Linux, and other Unices these days) and because we’re calling this hand-built function from C like any other function. It needs to be a well-behaved citizen and put data in places the compiler writers expected.

The next 4 bytes are the number, going from least significant byte to most significant byte.

Finally, 0xc3 is the instruction for ret. ret pops the return address (which points back into the calling function) off the stack and jumps to it. This transfers control back to the main function of the C program.

When you put all of that together, you get a very small but well-formed program that returns 42.
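The byte layout described above can be generated instead of copied by hand. In this sketch, a hypothetical helper named encode_return_imm32 writes the 0xb8 opcode, then the immediate in little-endian order, then the 0xc3 for ret. For 42 it produces exactly the six bytes in program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emit "mov eax, imm32; ret" into out.
 * Returns the number of bytes written (always 6). */
size_t encode_return_imm32(unsigned char *out, size_t cap, int32_t value)
{
	uint32_t bits = (uint32_t)value;  /* two's complement bit pattern */
	size_t n = 0;

	assert(out != NULL && cap >= 6);
	out[n++] = 0xb8;                  /* mov eax, imm32 */
	for (int i = 0; i < 4; i++)       /* least significant byte first */
		out[n++] = (unsigned char)(bits >> (8 * i));
	out[n++] = 0xc3;                  /* ret */
	return n;
}
```

The same pattern (opcode, then operands in little-endian order) carries over when more instructions get their own emit functions.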

The typedef

Next, we use C’s function pointer syntax to declare a type JitFunction that refers to a function that takes no arguments and returns an int.

typedef int (*JitFunction)();

While technically we should specify the size of the integer (after all, we know we want to return a 32-bit integer), I avoided that in this demo because it adds more headers and visual noise.

This declaration, when used with the actual call to the function, tells the C compiler how to arrange the registers and the stack for the call.

The mmap and memcpy dance

Now we allocate a new chunk of memory. We don’t use malloc to do it because mprotect needs the address to be page-aligned. Maybe it’s possible to use posix_memalign instead of malloc, but I’ve never seen anybody do that. So we mmap it.

I don’t want to explain all the possible parameter configurations for mmap, especially because they vary between systems. Our configuration requests:

  • memory without specifying a destination address (addr=NULL),
  • of a particular length (length=kProgramSize),
  • that is both readable and writable (prot=PROT_READ | PROT_WRITE),
  • is not mapped to a file, but acts like malloc (flags=MAP_ANONYMOUS, fd=-1, offset=0),
  • and is not shared between processes (flags=MAP_PRIVATE)

And, since memory is kind of useless if we don’t do anything with it, we copy the program into it.

  void *memory = mmap(/*addr=*/NULL, /*length=*/kProgramSize,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  memcpy(memory, program, kProgramSize);

You might be wondering why we need to make a whole new buffer and copy into it if we already have some memory containing the code. There are at least two reasons.

First, we need to guarantee that the memory is page-aligned for mprotect – same as above.

Second, in our actual compiler we won’t just have some static array that we copy code from. We’re going to be producing it on the fly and appending to a buffer as we go. We’ll be re-using this mmap dance, but not necessarily the memcpy.
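mprotect works on whole pages, so a compiler that appends code on the fly will probably want to size its buffer in pages. The sketch below uses the hypothetical helpers round_up_to_page and map_code_buffer. It reads the page size from sysconf(_SC_PAGESIZE). Unlike the demo, it also checks the mmap result against MAP_FAILED.

```c
#define _DEFAULT_SOURCE  /* glibc/musl: expose MAP_ANONYMOUS under strict -std */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
size_t round_up_to_page(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);
	size_t p = (size_t)page;
	return (len + p - 1) / p * p;
}

/* Map a fresh, page-aligned, read/write anonymous region of at least len
 * bytes. Returns NULL on failure; *mapped_len gets the size for munmap. */
void *map_code_buffer(size_t len, size_t *mapped_len)
{
	size_t size = round_up_to_page(len);
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
	                 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (mem == MAP_FAILED)
		return NULL;
	*mapped_len = size;
	return mem;
}
```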

The mprotect

Modern operating systems implement a security feature called “W^X”, pronounced “write xor execute”. This policy prohibits a piece of memory from being both writable and executable at the same time, which makes it harder for attackers to exploit bugs in software.

In order to both write our program into a buffer and then run it, we need to have an explicit transition point where our memory goes from being readable and writable to executable. This is mprotect.

  int result = mprotect(memory, kProgramSize, PROT_EXEC);
  assert(result == 0 && "mprotect failed");

If we didn’t do this, something bad would happen at runtime. On my machine, I get a segmentation fault.
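Under a write-xor-execute policy, a JIT that later patches its own code has to flip the permissions in both directions. This sketch wraps the two transitions in hypothetical helpers, seal_code and unseal_code. It requests PROT_READ | PROT_EXEC rather than the demo's bare PROT_EXEC, assuming you may want to read the emitted bytes back (for example, to inspect them).

```c
#define _DEFAULT_SOURCE  /* glibc/musl: expose MAP_ANONYMOUS under strict -std */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Writable -> executable: call after emitting code, before running it.
 * Returns 0 on success, -1 with errno set on failure. */
int seal_code(void *mem, size_t len)
{
	assert(mem != NULL && len > 0);
	return mprotect(mem, len, PROT_READ | PROT_EXEC);
}

/* Executable -> writable: call before patching code that was sealed. */
int unseal_code(void *mem, size_t len)
{
	assert(mem != NULL && len > 0);
	return mprotect(mem, len, PROT_READ | PROT_WRITE);
}
```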

The cast

In order to actually call the function, we need to first wrangle the void* into the right type. While we could do the cast and call in one line, I find it easier to read to cast first and call later.

  JitFunction function = *(JitFunction*)&memory;
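If the pointer pun in that line bothers you, one alternative is to copy the pointer's bytes with memcpy. That avoids the strict aliasing question altogether, though it still relies on the POSIX guarantee that data pointers and function pointers share a representation. as_function and sample_answer are hypothetical names, and sample_answer only stands in for JIT output. This sketch writes (void) in the typedef to state explicitly that the function takes no arguments.

```c
#include <assert.h>
#include <string.h>

typedef int (*JitFunction)(void);

/* Reinterpret a data pointer as a function pointer by copying its bytes.
 * Assumes, as POSIX requires, that both pointer kinds have the same
 * size and representation. */
JitFunction as_function(void *memory)
{
	JitFunction fn;
	assert(sizeof fn == sizeof memory);
	memcpy(&fn, &memory, sizeof fn);
	return fn;
}

/* Stand-in for generated code, so the conversion runs without mmap. */
int sample_answer(void)
{
	return 42;
}
```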

The call

Ahh, some action! This very innocuous-looking code is maybe the most exciting part of the whole program. We finally take our code, marked executable, treat it the same as any old C function, and call it!

  int return_code = function();
  assert(return_code == 42 && "the assembly was wrong");

The first time I got this working I was very happy with myself.

The clean up

Just as every malloc must be paired with a free, every mmap must be paired with a munmap. Unlike free, munmap returns an error code so we check it.

  result = munmap(memory, kProgramSize);
  assert(result == 0 && "munmap failed");

Some proof

Just so we can convince ourselves that our program actually worked (who knows, maybe the asserts didn’t run), propagate the result of our function call to the outside world. We can then check the return code in $?.

  return return_code;

Note that while the return type of main is int, the parent process sees only the low 8 bits of the exit status, so return codes can only be between 0 and 255.
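To see the truncation for yourself, this sketch forks a child that exits with a given code and reports what waitpid observes. child_exit_status is a hypothetical helper. The underlying fact, from POSIX, is that WEXITSTATUS yields only the low-order 8 bits of the status, so an exit code of 298 shows up as 42.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return
 * the exit status the parent observes. */
int child_exit_status(int code)
{
	pid_t pid = fork();
	assert(pid >= 0);
	if (pid == 0)
		_exit(code);  /* skip atexit handlers and stdio flushing */
	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);
	assert(waited == pid && WIFEXITED(status));
	return WEXITSTATUS(status);  /* only the low 8 bits survive */
}
```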

Wrapping up

That was a lot of words for explaining return 42. Hopefully they were helpful words. With this small demo, we’ve gotten used to some building blocks that we’ll use when compiling and executing Lisp programs.

Next up, compiling integers.

Mini Table of Contents

  1. Unlike other JITs, though, we won’t be doing any of the fancy inline caching, deoptimization, or other tricks. We’re just going to compile the code once and move on with our lives. 

  2. Hold your nose and ignore the ugly pointer casting. This avoids the compiler complaining even with -pedantic on. It’s technically not legal in ISO C to cast between data pointers and function pointers, but POSIX systems are required to support it. Also relevant are the C strict aliasing rules, so we use -fno-strict-aliasing. I’m not an expert on what that means, so see this nice StackOverflow post.

  3.   int main() {
        return 42;
      }

August 29, 2020

Simon Zelazny (pzel)

Large directory feature not enabled on this filesystem August 29, 2020 10:00 PM

TIL Firefox & Chromium will keep regenerating .cache/fontconfig until your filesystem blows up!

My wife's computer had been acting really strangely the last couple of days, and today it culminated in extremely bad performance: the computer was usable for a couple of seconds, then completely unresponsive for about 2 seconds, then usable again.

Htop and top proved useless, because during the unresponsive periods their UI would freeze, so whatever application was causing the staggering was not easily detectable.

Dmesg showed what was going on. The following log messages were appearing with a regularity that corresponded to the 'hiccups'.

[1805.005848] EXT4-fs warning (device dm-3): ext4_dx_add_entry:2357: Large directory feature is not enabled on this fileystem
[1805.005340] EXT4-fs warning (device dm-3): ext4_dx_add_entry:2352: Directory (ino: 15337216) index full, reach max htree level :2

Some internet spelunking revealed that it's possible to enable this 'Large directory' feature on a mounted device, and the invocation turned out to be:

tune2fs -O large_dir /dev/mapper/pool-abcd

(Note: this is on a LUKS-encrypted partition). Immediately after enabling this feature, the system stopped hiccupping.

Finding the culprit

Now, I still wanted to know which directory was so large that it hit the filesystem limits. The log message gives its inode number, so I searched for it:

find / -inum 15337216

The find command revealed that the directory in question was $HOME/.cache/fontconfig. I tried to ls inside it, but the ls command seemed to hang forever. Instead, I ran:

find . | wc -l

and this revealed that the target directory had almost 15 million files inside! Deleting them took ~2 hours with the following command:

cd $HOME/.cache/fontconfig
find . -delete
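One likely reason find could count the entries while ls seemed to hang is that ls sorts its output by default. That means it has to read every entry before printing anything, whereas find emits entries as it reads them. Below is a minimal C sketch of the streaming approach, where readdir hands back one entry at a time. count_entries is a hypothetical helper, and unlike find . it does not recurse into subdirectories.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX directory API */
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Counts the entries directly inside `path`, skipping "." and "..".
   Entries are handled one at a time as readdir returns them, so memory
   use stays constant no matter how large the directory is.
   Returns -1 if the directory cannot be opened. */
long count_entries(const char *path) {
  DIR *dir = opendir(path);
  if (dir == NULL) return -1;
  long count = 0;
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
      continue;
    count++;
  }
  closedir(dir);
  return count;
}
```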

The entire thing turned out to be caused by a bug in the interaction of Firefox and Chromium with fontconfig. Running Firefox once caused 480 new cache files to appear. Running Chrome subsequently added a couple hundred more files. Another run of Firefox again added a bunch of files.

To fix this temporarily I issued the following commands:

cd $HOME/.cache
rm -rf fontconfig
touch fontconfig

Now, there is no directory for the browsers to fight over, and no performance issues so far.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Overture August 29, 2020 08:16 PM

Many thanks to Kartik Agaram and Leonard Schütz for proofreading these posts.

In my last series, I wrote about building a Lisp interpreter. This time, we’re going to write a Lisp compiler.

This series is an adaptation of Abdulaziz Ghuloum’s excellent paper An Incremental Approach to Compiler Construction, with several key differences:

  • Our implementation is in C, instead of Scheme
  • Our implementation generates machine code directly, instead of generating text assembly
  • Our implementation may omit some runtime data structures

See my implementation for reference, but note that it may be incomplete and also may look a little bit different than the compiler detailed in these posts.

You probably have a couple questions, like why Lisp? and why compile directly to x86-64? and why C? and come on, another Lisp series?, and those are all very reasonable questions that will be answered in due time.

I want to implement this compiler in a language other than Scheme because it will prevent me from copying too much of the code from the paper. Even though the paper doesn’t actually contain the source for the whole compiler (most of it is, after all, left as exercises for the reader), I think I will learn a lot more when I have to write all of the code by myself. I get to make my own mistakes and you get to watch me make and fix them in “real” time.

I also don’t want to generate text assembly, but my reasons for that are a little different from my reason for choosing another implementation language:

First, I think that would be harder to test: I want to have an in-process unit test suite that compiles Lisp programs and executes them on-the-fly. Shelling out to a system assembler like as or nasm would be somewhat error prone. What if the person building this doesn’t have the assembler I need? Sure, I could also write a small assembler as part of this compiler, but that’s a lot of work. Harder than just generating x86-64 directly, perhaps.

Second, I want to learn more about machine architecture. While add a, b seems like one machine instruction, it could probably be encoded in 50 different ways depending on whether a and b are registers, stack locations, other memory addresses, immediates, which registers they are, etc. Shelling out to an assembler abstracts a lot of that detail away. I want to get my hands dirty. Hopefully you do, too.
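Here is a small sketch that makes the "many encodings" point concrete. The Code buffer and the function names are hypothetical, not this series’ API. It emits four x86-64 encodings of an add into rax. Two of them, 48 01 c8 and 48 03 c1, are different byte sequences for the very same add rax, rcx. The two immediate forms differ in length depending on the size of the constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal byte buffer (no bounds checking in this sketch). */
typedef struct {
  uint8_t bytes[16];
  size_t len;
} Code;

static void emit(Code *c, uint8_t b) { c->bytes[c->len++] = b; }

/* ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits).
   mod = 3 means both operands are registers. */
static uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm) {
  return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

enum { RAX = 0, RCX = 1 };  /* x86-64 register numbers */
enum { REX_W = 0x48 };      /* REX prefix with W=1: 64-bit operand size */

/* add rax, rcx via 01 /r ("add r/m64, r64"): rcx goes in the reg field. */
void add_rax_rcx_01(Code *c) {
  emit(c, REX_W); emit(c, 0x01); emit(c, modrm(3, RCX, RAX));
}

/* Same instruction via 03 /r ("add r64, r/m64"): rax goes in reg. */
void add_rax_rcx_03(Code *c) {
  emit(c, REX_W); emit(c, 0x03); emit(c, modrm(3, RAX, RCX));
}

/* add rax, imm8 via 83 /0 ib: the byte is sign-extended to 64 bits. */
void add_rax_imm8(Code *c, int8_t imm) {
  emit(c, REX_W); emit(c, 0x83); emit(c, modrm(3, 0, RAX));
  emit(c, (uint8_t)imm);
}

/* add rax, imm32 via 05 id, a short form that only exists for rax. */
void add_rax_imm32(Code *c, int32_t imm) {
  emit(c, REX_W); emit(c, 0x05);
  uint32_t u = (uint32_t)imm;
  for (int i = 0; i < 4; i++) emit(c, (uint8_t)(u >> (8 * i)));  /* little-endian */
}
```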

I chose Lisp because that’s what the Ghuloum paper uses, and because Lisp can be represented as a small, compact, dynamically typed language. Many interpreter implementations are under 200 lines. I don’t think this compiler will be that short, though.

For questions, comments, and suggestions please post on this elist. It’s a public inbox that I can use to discuss and receive patches. I got the idea from Chris Wellons.

Background knowledge

In order to get the most out of this series, I recommend having at least passing familiarity with the following:

  • C or a C-like language
  • some kind of assembly language
  • Abstract Syntax Trees and recursive tree traversal
  • no particular aversion to parentheses

Having the background lets you focus on the bigger picture rather than the minutiae, but it is by no means required. I expect most of this series to be fairly readable. If it’s not, that’s a bug and you should report it to me.

Structure of the series

I plan on writing this series in installments where each installment adds a feature of some kind. Maybe that feature is a new bit of Lisp functionality, or maybe it’s a refactoring of the compiler, or maybe it’s a compiler optimization.

For this reason, each post will tend to depend on the code and understanding from previous posts. As such, I recommend reading the series in order. I’ll still try to keep the big ideas understandable for those who don’t.

At each stage of the compiler, we should have a battery of tests that ensure that the features we have already built continue to work as expected.

I plan on adhering to this rough plan:

  1. Compile integers
  2. Compile other immediate constants (booleans, ASCII characters, the empty list)
  3. Compile unary primitives (add1, sub1, integer->char, char->integer, null?, zero?, etc)
  4. Compile binary primitives (+, -, *, /, =, etc)
  5. Read expressions from strings
  6. Compile local variables (let-expressions)
  7. Compile conditional expressions (if-expressions)
  8. Compile heap allocation (cons)
  9. Compile heap allocation (strings, symbols, etc)
  10. Compile label procedure calls
  11. Compile closures
  12. Add tail-call optimization
  13. Compile complex constants (quote)
  14. Compile variable assignment (set!)
  15. Add macro expander
  16. Add extended forms using macro expander (let*, letrec, etc)
  17. Add support for libraries and separate compilation
  18. Compile foreign function calls
  19. Add error checking to primitives and procedure calls
  20. Compile variable-arity procedures (aka varargs)
  21. Compile apply
  22. Add output ports (kind of like FILE*)
  23. Add write, display
  24. Add input ports
  25. Add a tokenizer in Lisp
  26. Add a reader in Lisp
  27. Add a Lisp interpreter (or compiler) in Lisp

With optional add-ons also described in the initial paper:

  • Big numbers, IEEE754 floats, complex numbers
  • User-defined macros
  • A module system
  • Heap overflow handler and garbage collector
  • Stack overflow handler
  • Improved code generation

And optional add-ons not described in the original paper:

  • An intermediate representation for optimization
  • Generate executables and write them to disk
  • An interpreter with optional just-in-time compiler

You may have noticed that this is a lot of steps, and there are some steps that I intend to take but have completely omitted because I want to roll them into other posts. Things like:

  • Code generation infrastructure (a writable buffer, mmap/mprotect, etc)
  • Compiler data structures (variable environments, label environments, etc)
  • Testing infrastructure (unit testing, integration testing)

So it’s really actually more work than I listed above. This series may take a long time. It may take some twisty turns. It may take some shortcuts. But there is good news: I’ve already written the compiler up to compiling heap allocation (still working on procedure calls), and even if I don’t finish this series there is still Ghuloum’s excellent paper to learn from.

Next up, the smallest program.


August 24, 2020

Jan van den Berg (j11g)

New WordPress theme: Neve August 24, 2020 08:18 PM

Frequent visitors might notice a change to the site: I switched WordPress themes.

I have been a happy user of the Independent Publisher theme since this site started, and I still use it on my other blog. It’s a terrific theme and I like it a lot.

But because I really like a clean and simple aesthetic, I made quite a few tweaks to it, specifically to the fonts and CSS.

My favorite themes are usually black and white themes. Two of my favorite examples of this aesthetic are and Both excellent looking sites in my opinion, and a joy to read.

So I looked closely at those sites and copied a few things from them. For example: both use the gorgeous Merriweather serif font as the main font for the body text. So did my site (this wasn’t the previous theme’s default font). I really like serif fonts; they add a sort of legibility and make big text blocks more readable.

But I always kept tweaking the theme: letter-spacing, font-size, colors and more, and I was never 100% happy with it. Especially when things looked good on the desktop, it would look a bit off on mobile. Or the other way around.

Some tweaks I made to the previous theme. Complete listing


Last week I came across this tweet for a new theme called Neve from ThemeIsle, and the example was striking enough to give it a try. And I was happily surprised by how easy, complete and fast this theme was out of the box. I have made exactly 0 CSS tweaks to it. What you’re seeing now is default Neve. I have tried *many* themes over the years, and most always lack something. Neve checks all the boxes for what I have been looking for, for quite some time.

And even though Neve uses sans-serif fonts, I found this theme to have the most overall consistent experience (desktop and mobile) and the configuration options are plentiful. And it’s really fast: which is really important. My site feels snappier because of it.

So I made the decision to switch themes. And I like it a lot. For the last couple of days I have been going to my own site and revisiting old posts, just to see how they look, and I am always pleased with the appearance. The line-spacing is just right, the header font-weight perfect, it looks good on desktop and mobile, it’s clean and it’s fast.


There are two gripes.

  1. I noticed when you don’t center an image, the image caption will sort of blend with the text. And it will not really be clear that the caption belongs to the image. The fix is easy though: center your images and the caption will be centered. Another solution could be to make the caption font smaller or use a different shade of grey to make it more distinct.
  2. The other gripe is one I have to examine a little bit closer, but I don’t think the Neve quote blocks look all that good. If anything, a good quote might be best served by a serif font to stand out a bit. This is by no means a deal breaker, but I might take a closer look at it.

But also I don’t want to tweak too much. I actually really like that I can use this theme with default settings and that it looks really good. So if you’re looking for a great, clean, fast theme: give Neve a try!

The post New WordPress theme: Neve appeared first on Jan van den Berg.

Tobias Pfeiffer (PragTob)

The great Rubykon Benchmark 2020: CRuby vs JRuby vs TruffleRuby August 24, 2020 02:30 PM

It has been far too long, more than 3.5 years since the last edition of this benchmark. Well what to say? I almost had a new edition ready a year ago and then the job hunt got too intense and now the heat wave in Berlin delayed me. You don’t want your computer running at […]

August 23, 2020

Derek Jones (derek-jones)

Time-to-fix when mistake discovered in a later project phase August 23, 2020 10:09 PM

Traditionally, the management of software development projects divides them into phases, e.g., requirements, design, coding and testing. A mistake introduced in one phase may not be detected until a later phase. The long-standing folklore that mistakes detected in later phases are much, much more costly to fix persists, despite its original source being resoundingly debunked. Fixing a mistake later is likely to be a bit more costly, but how much more costly? A lack of data prevents reliable analysis; this question also suffers from different projects having different cost-to-fix profiles.

This post addresses the time-to-fix question (cost involves all the resources needed to perform the fix). Does it take longer to correct mistakes when they are detected in phases that come after the one in which they were made?

The data comes from the paper: Composing Effective Software Security Assurance Workflows. The 35,367 (yes, thirty-five thousand) logged fixes, from 39 projects drawn from three organizations, contain information on: phases in which the mistake was made and fixed, time taken, person ID, project ID, date/time, plus other stuff :-)

Every project has its own characteristics that affect time-to-fix. Project 615, avionics software developed by organization A, has the most fixes (7,503) and is analysed here.

Avionics software is safety critical, and each major phase included its own review and inspection. The major phases include: requirements gathering, requirements analysis, high level design, design, coding, and testing. When counting the number of phases between introduction/fix, should review and inspection each count as a phase?

The primary reason for doing a review and inspection is to check the correctness (i.e., lack of mistakes) in the corresponding phase. If there is a time-to-fix penalty for mistakes found in these symbiotic-phases, I suspect it will be different from the time-to-fix penalty between major phases (which for simplicity, I’m assuming is major-phase independent).

The time-to-fix has a resolution of 1-minute, and some fix times are listed as taking a minute; 72% of fixes are recorded as taking less than 10-minutes. What kind of mistakes require less than 10-minutes to fix? Typos and other minutiae.

The plot below shows time-to-fix for mistakes having a given ‘distance’ between introduction/fix phase, for fixes taking at least 1, 5 and 10-minutes (code+data):

Time-to-fix for mistakes having a given number of phases between introduction and fix.

There is a huge variation in time-to-fix, and the regression lines (which have the form: $fixTime \approx e^{\sqrt{phaseSep}}$) explain just 6% of the variance in the data, i.e., there is a small increase with phase separation, but it is almost down in the noise.
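The phrase "explains 6% of the variance" usually refers to the coefficient of determination, R² = 1 − SS_res / SS_tot. The post does not say which goodness-of-fit measure its linked code reports, so treat that reading as an assumption. Below is a minimal sketch of the usual definition, where r_squared is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficient of determination: the fraction of the variance in the
   observed values that the model's predictions account for.
   R^2 = 1 - SS_res / SS_tot. Assumes n >= 1 and non-constant data. */
double r_squared(const double *observed, const double *predicted, size_t n) {
  double mean = 0.0;
  for (size_t i = 0; i < n; i++) mean += observed[i];
  mean /= (double)n;

  double ss_res = 0.0, ss_tot = 0.0;
  for (size_t i = 0; i < n; i++) {
    double r = observed[i] - predicted[i];  /* what the model misses */
    double d = observed[i] - mean;          /* spread around the mean */
    ss_res += r * r;
    ss_tot += d * d;
  }
  return 1.0 - ss_res / ss_tot;
}
```

An R² of 0.06 means the residuals are almost as spread out as the data were to begin with. A model that always predicts the mean scores 0.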

All but one of the 38 people who worked on the project made multiple fixes (30 made more than 20 fixes), and may have got faster with practice. Adding the number of previous fixes by people making more than 20 fixes to the model gives: $fixTime \approx e^{\sqrt{phaseSep}}/fixNum^{0.03}$, and improves the model by less than 1-percent.

Fixing mistakes is a human activity, and individual performance often has a big impact on fitted models. Adding person ID to the model as a multiplication factor, i.e., $fixTime \approx personID \times e^{\sqrt{phaseSep}}/fixNum^{0.03}$, improves the variance explained to 14% (better than a poke in the eye, just). The fitted value of personID varies between 0.66 and 1.4 (factor of two, human variation).

The answer to the time-to-fix question posed earlier (for project 615), is that it does take slightly longer to fix a mistake detected in phases occurring after the one in which the mistake was introduced. The phase difference is tiny, with differences in human performance having a bigger impact.

Patrick Louis (venam)

Computer Architecture Takeaways August 23, 2020 09:00 PM

Alchemy, ancient and modern

Computer architecture can be considered a boring topic: one that is studied during a CS education, then set aside to make room for the shiny new toys that capture our attention.
I’ve recently revisited it, and I’d like to summarize some takeaways.

What is It

Computer architecture, like everything in the architecture and design domain, is concerned with building a thing, here a computer and all its components, according to requirements often called “-ilities”, such as cost, reliability, efficiency, speed, ease of use, and more.
Thus, it is not limited to hardware constraints such as weight, power consumption, and size; it also covers the design decisions that fit a use case within those constraints.

Energy and Cost

Long gone are the days when we only cared about cramming more power in a single machine. We’ve moved to a world of battery-powered portable devices such as laptops and mobile phones. On such devices, we care about energy consumption as it directly affects battery life.

So what are some tips to avoid wasting energy?

  • Do nothing well: As simple as it sounds, it isn’t straightforward. We need to know when it is the right time to deactivate a processor completely, otherwise we’ll pay a high penalty when bringing it back online.
  • Dynamic voltage-frequency scaling (DVFS): This is about changing the frequency of the processor’s clock, letting it consume instructions slower or faster. Lowering the frequency alone reduces power but not the energy a task needs, since the task simply takes longer to execute. The energy savings come from the lower supply voltage that a lower frequency allows, because dynamic energy scales with the square of the voltage.
  • Overclocking (Turbo mode): This is about temporarily raising a processor’s clock rate so that it executes more instructions per second and finishes tasks faster. The trade-off is between finishing tasks early while consuming more power, and consuming less power but finishing them later.
  • Design for the typical case: Make the processor most efficient for the most frequent case. Quite intuitive!

Keep in mind that these techniques all target dynamic power consumption, the power consumed when transistors switch while performing work. In the background there is always static power consumption, mostly from leakage current, which keeps flowing just to keep the system alive. Reducing static power consumption therefore has an effect on the whole system.
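The DVFS point can be checked with the standard first-order CMOS relations. Dynamic energy per transition is about ½·C·V², and dynamic power is about ½·C·V²·f. The sketch below makes the ratios explicit. The function names are hypothetical, and these are textbook approximations, not a model of any particular chip.

```c
#include <assert.h>

/* Dynamic energy of a single logic transition: E = 1/2 * C * V^2
   (joules, with C in farads and V in volts). */
double dynamic_energy(double capacitance, double voltage) {
  return 0.5 * capacitance * voltage * voltage;
}

/* Dynamic power: P = 1/2 * C * V^2 * f (watts, with f in hertz). */
double dynamic_power(double capacitance, double voltage, double frequency) {
  return dynamic_energy(capacitance, voltage) * frequency;
}
```

Halving f alone halves power, but the task runs twice as long, so energy per task is unchanged. Lowering both voltage and frequency by 15% cuts energy per transition to 0.85² ≈ 0.72 and power to 0.85³ ≈ 0.61 of the original.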

wafer yield

Similar to energy, cost is also important, because computers are now an everyday item. The cost of a microprocessor depends heavily on the learning curve companies go through to build them, whereas the price of DRAM closely tracks the cost of manufacturing. A microprocessor is made of one or more dies, whose circuits are printed onto a wafer by ultraviolet lithography (photolithography). The price therefore depends on how many (square) dies fit on a (round) wafer, on the yield, that is the fraction of dies that are not defective, and on the cost of testing and packaging them.
This is one of the reasons big companies focus on commodity hardware: hardware that’s relatively cheap and replaceable. Only in specialized or scientific computing will we see costly, hard-to-replace, custom hardware.
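The dies-per-wafer reasoning is usually captured by a standard approximation: dies per wafer ≈ π·(d/2)²/A − π·d/√(2A). Here d is the wafer diameter and A the die area, and the second term estimates the partial dies lost around the edge. The sketch below assumes square dies, so √(2A) = √2 × side, which avoids needing the math library. It also takes the die yield as an input rather than modeling defects. The function names are hypothetical.

```c
#include <assert.h>

static const double kPi = 3.14159265358979323846;
static const double kSqrt2 = 1.41421356237309504880;

/* Estimated number of square dies (side length `die_side`) on a round
   wafer of diameter `wafer_diameter` (same length unit for both):
   pi*(d/2)^2 / A - pi*d / sqrt(2*A), with A = die_side^2. */
long dies_per_wafer(double wafer_diameter, double die_side) {
  double area = die_side * die_side;
  double radius = wafer_diameter / 2.0;
  double whole = kPi * radius * radius / area;
  /* sqrt(2 * A) == sqrt(2) * side for a square die */
  double edge_loss = kPi * wafer_diameter / (kSqrt2 * die_side);
  double dies = whole - edge_loss;
  return dies > 0 ? (long)dies : 0;
}

/* Cost of one good die: the wafer cost spread over the working dies.
   `die_yield` is the fraction of dies that are not defective (0, 1]. */
double die_cost(double wafer_cost, long dies, double die_yield) {
  return wafer_cost / ((double)dies * die_yield);
}
```

For a 30 cm wafer, 1 cm × 1 cm dies give dies_per_wafer(30.0, 1.0) = 640, while 2 cm × 2 cm dies give only 143. Quadrupling the die area cuts the count by more than 4x because proportionally more area is lost at the edge.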

Dependability is also a big factor: how long will a piece of hardware last? This is especially important in large warehouse-scale data centers. If we’re going to replace a cheap part far more frequently than another one that is just a little more expensive, we may be better off choosing the second one. Technically, we talk of Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR).
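The two metrics combine into availability, MTTF / (MTTF + MTTR), which is the fraction of time a module is delivering service. The sketch below assumes independent components with constant failure rates, which is the usual simplification. Under that assumption, a system that fails when any of its components fails has failure rates that simply add. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Steady-state availability: the fraction of time a module is working. */
double availability(double mttf_hours, double mttr_hours) {
  return mttf_hours / (mttf_hours + mttr_hours);
}

/* MTTF of a system that fails when any one component fails, assuming
   independent failures at constant rates: failure rates (1 / MTTF) add. */
double system_mttf(const double *component_mttf_hours, size_t n) {
  double failure_rate = 0.0;
  for (size_t i = 0; i < n; i++) failure_rate += 1.0 / component_mttf_hours[i];
  return 1.0 / failure_rate;
}
```

For example, two components that each have an MTTF of 1,000 hours give a system MTTF of 500 hours.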

Measurements and Bottlenecks

Claims mean little without numbers to back them up, which is why measurement is everything. For computer architecture, that means benchmarks, similar to Phoronix-style benchmarks.
There are many ways to measure, report, and summarize performance, and different scenarios in which they apply. Benchmarks aren’t limited to pure number crunching; they can go as far as simulating kernels, desktop behaviors, and web servers. Three of the most important are EEMBC, a set of kernels used to predict the performance of embedded applications; the TPC benchmarks, which target transaction-processing servers; and the SPEC family of benchmarks, which touches a bit of everything.
Many of the results of these benchmarks are proprietary and/or costly.

Measuring is important because we want to make sure our most common case is fast. That follows from Amdahl’s Law. Simply put, if you optimize a portion of code that accounts for only 1% of the execution time, the overall speedup can be at most about 1%, no matter how much faster that portion becomes.
We should also check whether we can parallelize computation in the common case, and whether we can apply the principle of locality to it, that is, reuse data and instructions so that they stay close temporally and physically.
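Amdahl’s Law can be written as speedup = 1 / ((1 − f) + f / s). Here f is the fraction of execution time that benefits from an improvement, and s is how much faster that fraction becomes. Below is a one-function sketch, where amdahl_speedup is a hypothetical name.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction `f` of the original
   execution time is sped up by a factor `s`; the other 1 - f is unchanged. */
double amdahl_speedup(double f, double s) {
  return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.01, even an effectively infinite s gives 1 / 0.99 ≈ 1.0101, which is where the "at most about 1%" figure comes from. With f = 0.5 and s = 2, the overall speedup is only about 1.33.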

ISA — Instruction Set Architecture

As much as people freak out about assembly being low-level, it is still a language for humans (not machines) that requires another program, called an assembler, to convert it to actual machine code instructions. However, it is closely tied to the kind of instructions a machine can process, what we call the instruction set architecture, or ISA for short. That is one reason why we have many assembly flavors: because we have many ISAs.

To understand the assembly flavor you are writing in, it’s important to know the differences and features of the ISA, and if additional proprietary options are provided by the manufacturer. These can include some of the following.

  • The classification of ISA: Today, most ISAs are general-purpose register architectures, which means operands can be either registers or memory locations. There are two sub-classes: register-memory ISAs, where memory can be accessed directly as part of many instructions, like 80x86, and load-store ISAs, where memory can only be accessed through load and store instructions, like ARMv8 and RISC-V. Virtually all ISAs designed after 1985 are load-store.

  • The way the memory is addressed: Today, all ISAs point at memory operands using byte addressing, that means we can access values in memory by byte, in contrast with some previous ISAs where we had to fetch them by word (word-addressable). Additionally, some architectures require that objects be aligned in memory, or encourage users to align them for efficiency reasons. In ARMv8 they must be aligned, and in 80x86 and RISC-V it isn’t required but encouraged.

  • The modes in which memory can be addressed: Memory is byte-addressed, but there are many ways to specify which address an operand refers to. An operand can come from a register, from a constant encoded in the instruction itself (immediate), or from memory at the address formed by the sum of a register and a constant (displacement); register-indirect addressing is simply a displacement of zero. These 3 modes are available in RISC-V. 80x86 adds other modes such as: no register at all (absolute), two registers plus a displacement (one register as the base, the other as the index), two registers where one register is multiplied by the size of the operand in bytes (scaled index), and more. ARMv8 has the 3 RISC-V modes plus PC-relative addressing, the sum of two registers, and the sum of two registers where one register is multiplied by the size of the operand in bytes. Yes, there are so many ways to give an address in memory!

  • Types and size of operands/data/registers: An ISA can support one or multiple types of operands ranging from: 4-bit (nibble), 8-bit, 16-bit (half word), 32-bit (integer or word), 64-bit (double word or long integer), and in the IEEE-754 floating-point we can have 32-bit (single precision), 64-bit (double precision), etc. 80x86 even supports 80-bit floating-point (extended double precision).

  • The type of operations available: What can we do with the data? Typical categories are data transfer, arithmetic and logical operations, control flow, floating-point operations, vector operations, etc. 80x86 has a very large set of operations. Note also that some assembly flavors encode the operand type in the operation itself, so new instructions need to be added for new types (see under CISC).

  • The way control flow instructions work: All ISAs today include at least conditional branches, unconditional jumps, and procedure calls and returns. Normally, the target addresses are PC-relative. On RISC-V, a conditional branch tests the contents of registers directly, while on 80x86, branches test condition-code bits that were set as side effects of previous arithmetic/logic operations. As for the return address, on ARMv8 and RISC-V it is placed in a register, while on 80x86 it is pushed onto the stack in memory. This is one of the reasons stack buffer overflows on 80x86 are so dangerous: an overflowing buffer can overwrite the saved return address.

  • Encoding the ISA instructions: Finally, we have to convert things to machine code. There are two choices in encoding: fixed length or variable length. ARMv8 and RISC-V instructions are fixed at 32 bits, which simplifies instruction decoding. 80x86, on the other hand, has variable-length instructions ranging from 1 to 15 bytes. The advantage is that the machine code takes less space, so programs are usually smaller. Keep in mind that all the previous choices affect how instructions are encoded into a binary representation. For example, the number of registers and the number of addressing modes need to be represented somehow.
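The difference between the two control-flow styles is easy to model in C. In this sketch, an x86-style cmp computes a − b purely to set flags, and jl later reads them: it is taken when SF != OF. A RISC-V-style blt needs no flags because the branch compares its two registers itself. The Flags struct and function names are hypothetical, and only the flags needed here are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The subset of x86 status flags that a compare (cmp a, b) sets. */
typedef struct {
  bool zf;  /* zero: a == b */
  bool sf;  /* sign bit of a - b */
  bool cf;  /* carry (borrow): a < b as unsigned values */
  bool of;  /* signed overflow in a - b */
} Flags;

/* Models x86-style cmp: compute a - b and record flags as a side effect.
   The subtraction is done on uint64_t so wraparound is well defined. */
Flags cmp64(int64_t a, int64_t b) {
  uint64_t ua = (uint64_t)a, ub = (uint64_t)b, r = ua - ub;
  Flags f;
  f.zf = (r == 0);
  f.sf = (r >> 63) != 0;
  f.cf = ua < ub;
  /* Overflow: operands differ in sign and the result's sign differs from a. */
  f.of = (((ua ^ ub) & (ua ^ r)) >> 63) != 0;
  return f;
}

/* x86 jl: taken when SF != OF, i.e. a < b as signed values. */
bool jl_taken(Flags f) { return f.sf != f.of; }

/* x86 je: taken when ZF is set. */
bool je_taken(Flags f) { return f.zf; }

/* RISC-V blt rs1, rs2: no flags; the branch compares the registers. */
bool blt_taken(int64_t rs1, int64_t rs2) { return rs1 < rs2; }
```

The overflow flag is what keeps jl correct when the subtraction wraps. For example, INT64_MIN − 1 wraps to a positive number, yet jl is still taken.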

Moreover, ISAs are broadly put into two categories, CISC and RISC: Complex Instruction Set Computer and Reduced Instruction Set Computer.
RISC machines have a small, highly optimized set of instructions, numerous registers, and a highly regular instruction pipeline. Usually, RISCs are load-store architectures with fixed-size instruction encoding, which helps keep the number of clock cycles per instruction (CPI) constant.
CISC machines have a very large set of instructions, where a single instruction can perform several lower-level operations, have side effects, access memory as part of a larger operation, etc. The term covers anything that isn’t RISC or that isn’t a load-store architecture.
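Here is a sketch of RISC-V’s 32-bit I-type format that shows what fixed-length encoding buys. The function names are hypothetical. Every field sits at a fixed bit position, so decoding is just shifting and masking. For example, addi x1, x0, 42 encodes to the single word 0x02a00093. By contrast, on x86-64, mov eax, 42 is 5 bytes (b8 2a 00 00 00) and ret is 1 byte (c3).

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V I-type layout, always 32 bits:
   imm[11:0] | rs1 (5) | funct3 (3) | rd (5) | opcode (7) */
uint32_t rv_encode_itype(int32_t imm, uint32_t rs1, uint32_t funct3,
                         uint32_t rd, uint32_t opcode) {
  return ((uint32_t)imm & 0xfffu) << 20 | (rs1 & 0x1fu) << 15 |
         (funct3 & 0x7u) << 12 | (rd & 0x1fu) << 7 | (opcode & 0x7fu);
}

/* addi rd, rs1, imm: opcode OP-IMM (0x13), funct3 = 0. */
uint32_t rv_addi(uint32_t rd, uint32_t rs1, int32_t imm) {
  return rv_encode_itype(imm, rs1, 0, rd, 0x13);
}

/* Decoding: every field is at a known position, no prefixes to parse. */
uint32_t rv_rd(uint32_t insn) { return (insn >> 7) & 0x1fu; }
uint32_t rv_rs1(uint32_t insn) { return (insn >> 15) & 0x1fu; }

/* Extracts the 12-bit immediate and sign-extends it portably. */
int32_t rv_imm_itype(uint32_t insn) {
  int32_t imm = (int32_t)((insn >> 20) & 0xfffu);
  return imm >= 0x800 ? imm - 0x1000 : imm;
}
```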

It has to be said that different manufacturers can implement the same ISA differently. Their implementations consume the same instruction encoding, but the actual hardware is different. ISAs are like interfaces in OOP.

Memory Hierarchy and Tech

memory hierarchy

Many authors have written a great deal about the memory hierarchy. Basically, it’s all about creating layers of indirection, adding caches to speed things up at each layer, and keeping what we’re going to use close temporally and physically, while taking into account how each layer is going to be used.
For example, an associativity that works well for an L2 instruction cache might not be as effective for an L2 data cache. When in doubt, refer to the benchmark measurements.

There are four big questions that we should ask at each layer:

  • Where can a block be placed
  • How is a block found if it is there
  • Which block should be replaced on a miss
  • What happens on a write

Each layer also interacts with the layers above and below it, for instance through the size and format of the cache lines that are moved up or down between caching layers.
Additionally, synchronization and coherence mechanisms between multiple caches might be important.
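The four questions can be made concrete with a tiny set-associative cache model. This sketch uses hypothetical sizes: 16-byte blocks, 2 sets, and 2 ways. It answers three of the questions. The set index taken from the address decides where a block may go, a tag comparison finds it, and least-recently-used (LRU) replacement decides what to evict. Writes are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { kBlockBits = 4, kSetBits = 1, kWays = 2 };  /* 16 B blocks, 2 sets */
enum { kSets = 1 << kSetBits };

typedef struct {
  bool valid;
  uint64_t tag;
  uint64_t last_used;  /* access timestamp for LRU */
} Line;

typedef struct {
  Line lines[kSets][kWays];
  uint64_t clock;
} Cache;

void cache_init(Cache *c) { memset(c, 0, sizeof *c); }

/* Returns true on a hit; on a miss, loads the block (evicting LRU). */
bool cache_access(Cache *c, uint64_t addr) {
  uint64_t block = addr >> kBlockBits;  /* drop the byte offset */
  uint64_t set = block & (kSets - 1);   /* placement: which set */
  uint64_t tag = block >> kSetBits;     /* identification: which block */
  Line *ways = c->lines[set];
  c->clock++;

  for (int w = 0; w < kWays; w++) {
    if (ways[w].valid && ways[w].tag == tag) {
      ways[w].last_used = c->clock;
      return true;
    }
  }
  /* Replacement: prefer an empty way, else the least recently used. */
  int victim = 0;
  for (int w = 0; w < kWays; w++) {
    if (!ways[w].valid) { victim = w; break; }
    if (ways[w].last_used < ways[victim].last_used) victim = w;
  }
  ways[victim] = (Line){ .valid = true, .tag = tag, .last_used = c->clock };
  return false;
}
```

Addresses 0x00, 0x20 and 0x40 all map to set 0. Touching all three evicts whichever of the first two was used least recently.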

NAND memory

When it comes to data storage, the hardware speed, its ability to retain information, its power consumption, its size, and other criteria, are what matters. Let’s review some common technologies in use today.

Static RAM, or SRAM, has low latency and requires little power to retain bits; however, every bit typically requires six transistors. It’s normally used in processor caches and has a small storage capacity.
Dynamic RAM, or DRAM, is slower but requires only one transistor per bit. However, it has to be refreshed periodically (every row within about 64 ms on typical DDR parts), and a row must be rewritten after being read, because reading destroys its contents. DRAM is organized in rows and columns: the upper half of the address selects the row and the lower half the column, signaled with the row access strobe (RAS) and column access strobe (CAS). It is normally used as main memory.
Because DRAM is cheaper to manufacture, it has received a lot of improvements to work around its limitations and make it more profitable, many of them related to bandwidth. Namely, double data rate (DDR) transfers data on both the rising and falling edges of the clock, and multiple banks allow accessing data in different places at the same time. Multiple banks are key to SIMD (see under Single Instruction Multiple Data Stream).

address form

Flash memory, a type of EEPROM, whether NAND- or NOR-based, is becoming popular as a replacement for hard disk drives because it is nonvolatile and uses little power. It can also act as a cache between the disk and main memory. Its main limitation is that it has to be erased and rewritten in whole blocks.
Some of the latest hype is about Phase Change Memory (PCM or PRAM), which is a type of nonvolatile memory that is meant as a replacement for flash memory but that is more energy efficient.

However, we can’t rely on the physical medium alone to bring all the improvements. Some techniques have to be put in place to make the most out of what we have.

address form


Efficiency doesn’t matter if the medium fails. All this beautiful hardware can fall victim to two types of errors: soft errors, which can be fixed by error-correcting codes (ECC), and hard errors, which make a section of the storage permanently defective, requiring either redundancy or replacement to avoid data loss. Beware of cosmic rays!
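Here is the classic Hamming(7,4) code as an illustration of how ECC corrects a soft error. It protects 4 data bits with 3 parity bits and can correct any single flipped bit. ECC memory typically uses wider single-error-correcting, double-error-detecting codes, for example 8 check bits per 64 data bits. So this is a minimal sketch of the idea, not what a memory controller actually implements. The bit layout and function names are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Codeword position p (1..7) is stored at bit p-1 of the byte. */
static unsigned bit(uint8_t word, int p) { return (word >> (p - 1)) & 1u; }

/* Encodes 4 data bits (low nibble) into a 7-bit codeword.
   Parity bits sit at positions 1, 2, 4; data bits at 3, 5, 6, 7. */
uint8_t hamming74_encode(uint8_t data) {
  unsigned d1 = data & 1u, d2 = (data >> 1) & 1u,
           d3 = (data >> 2) & 1u, d4 = (data >> 3) & 1u;
  unsigned p1 = d1 ^ d2 ^ d4;  /* covers positions 3, 5, 7 */
  unsigned p2 = d1 ^ d3 ^ d4;  /* covers positions 3, 6, 7 */
  unsigned p3 = d2 ^ d3 ^ d4;  /* covers positions 5, 6, 7 */
  return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p3 << 3 |
                   d2 << 4 | d3 << 5 | d4 << 6);
}

/* Corrects any single flipped bit and returns the 4 data bits.
   The syndrome is the position of the bad bit (0 means no error). */
uint8_t hamming74_decode(uint8_t word) {
  unsigned s1 = bit(word, 1) ^ bit(word, 3) ^ bit(word, 5) ^ bit(word, 7);
  unsigned s2 = bit(word, 2) ^ bit(word, 3) ^ bit(word, 6) ^ bit(word, 7);
  unsigned s3 = bit(word, 4) ^ bit(word, 5) ^ bit(word, 6) ^ bit(word, 7);
  unsigned syndrome = s1 | s2 << 1 | s3 << 2;
  if (syndrome != 0) word ^= (uint8_t)(1u << (syndrome - 1));  /* flip back */
  return (uint8_t)(bit(word, 3) | bit(word, 5) << 1 |
                   bit(word, 6) << 2 | bit(word, 7) << 3);
}
```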

Effectiveness can also be found in the layout and architecture used for caching. Here are some interesting questions that can be answered by benchmarking the use case. Remember that everything is about trade-offs.

  • Should we use a larger block size to reduce compulsory misses (misses caused by the block not being in the cache yet)? That would also increase conflict misses (two blocks that collide because their addresses map to the same place in the cache) and the miss penalty (the time spent fetching a block that isn’t already there).
  • Should we enlarge the whole cache to reduce the miss rate? That would increase the hit time (the time to find the cache line) and power consumption.
  • Should we add more cache levels? That reduces the overall memory access time but increases power consumption and complexity, especially between caches.
  • Should we prioritize read misses over write misses to reduce the miss penalty? Does it fit our case?
  • Should we use multiple independent cache banks to support simultaneous accesses, or does that add too much complexity?
  • Should we make the cache non-blocking, allowing hits before previous misses complete (hit under miss)?
  • Should we merge write-buffer entries that target the same block to avoid unnecessary traffic, or would that slow down writes?
  • Should we rely on compiler optimizations that take cache locality into account, such as loop interchange (changing the loop nesting so that memory is accessed in sequential order) and subdividing loops to work on small sub-matrices that fit into the cache? Or would that behavior confuse the cache and actually have the opposite effect?
  • Should we use prefetching instructions, if the ISA provides them, to fill the cache manually? Or would that let the compiler mishandle the cache and fill it with unnecessary garbage, throwing away the locality of other programs?

These are all questions that can’t be answered without actual benchmarks.

virtual memory

For processes, there’s never enough space in main memory. In multiprogramming systems, the OS virtualizes access to memory for reasons of simplicity, space constraints, and security. Each process thinks it’s running alone on the system.
The same four questions about memory placement apply. In particular, the OS decides what to do with each piece, or page, of virtual memory, a page being the unit of manipulation and often the same size as a disk sector.
This means that there’s another layer of addresses, virtual ones, that need to be translated to physical ones, and that the OS needs to know where those pages currently are: either on disk or in main memory.
For faster translation, we can use a cache sitting near the core that holds recently translated addresses, a so-called translation buffer or translation lookaside buffer (TLB).
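A minimal sketch of that translation path, assuming 4 KiB pages, a single-level page table, and a tiny fully associative TLB with round-robin replacement. All of these are hypothetical simplifications; real hardware typically uses multi-level page tables and set-associative TLBs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                               /* assumed 4 KiB pages */
#define PAGE_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define TLB_ENTRIES 4                               /* hypothetical tiny TLB */
#define NUM_PAGES 16                                /* hypothetical tiny address space */

typedef struct { bool valid; uint64_t vpn, pfn; } TlbEntry;

typedef struct {
    TlbEntry entries[TLB_ENTRIES];
    size_t next_victim;                             /* round-robin replacement pointer */
    unsigned long hits, misses;
} Tlb;

/* Hypothetical page table: vpn -> physical frame number, -1 means "not in memory". */
static int64_t page_table[NUM_PAGES];

static void tlb_init(Tlb *tlb) {
    for (size_t i = 0; i < TLB_ENTRIES; i++) tlb->entries[i].valid = false;
    tlb->next_victim = 0;
    tlb->hits = tlb->misses = 0;
}

/* Returns true and writes the physical address, or false on a page fault. */
static bool translate(Tlb *tlb, uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & PAGE_MASK;            /* the offset is never translated */
    for (size_t i = 0; i < TLB_ENTRIES; i++) {      /* fast path: TLB lookup */
        if (tlb->entries[i].valid && tlb->entries[i].vpn == vpn) {
            tlb->hits++;
            *paddr = (tlb->entries[i].pfn << PAGE_SHIFT) | offset;
            return true;
        }
    }
    tlb->misses++;                                  /* slow path: consult the page table */
    if (vpn >= NUM_PAGES || page_table[vpn] < 0)
        return false;                               /* page fault: the OS must bring the page in */
    TlbEntry *e = &tlb->entries[tlb->next_victim];
    tlb->next_victim = (tlb->next_victim + 1) % TLB_ENTRIES;
    e->valid = true;
    e->vpn = vpn;
    e->pfn = (uint64_t)page_table[vpn];
    *paddr = (e->pfn << PAGE_SHIFT) | offset;
    return true;
}
```

Only the virtual page number goes through translation; the 12 offset bits pass through unchanged, which is why a TLB hit can skip the page-table lookup entirely.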

hypothetical memory hierarchy with TLB

Additionally, if the virtual address can be composed in a way that doesn’t require going back to memory to fetch the data, but can be used to point to the data directly, in another cache for example, then the translation can be less burdensome.

Virtual memory can also be extended with protection: each translation entry can carry extra bits representing permissions or access rights. In practice, there are at least two modes, user mode and supervisor (kernel) mode, and the operating system relies on them to limit access to certain memory addresses and other sensitive features, a dance between hardware and software.

Virtual memory is the technology that makes virtual machines a reality. A program called a hypervisor, or virtual machine monitor, is responsible for managing virtual memory in such a way that different ISAs and operating systems can run simultaneously on the same machine without polluting one another. It does this by adding a layer of memory called “real memory” between physical and virtual memory. Optimizations such as not flushing the TLB entries on every switch between modes, allowing guest operating systems to handle device interrupts, and more are needed to make this tolerable.


Parallelism allows things to happen at the same time, in parallel. There are two classes of parallelism: Data-Level Parallelism (DLP) and Task-Level Parallelism (TLP). The first executes a single operation on multiple pieces of data at the same time; the second performs multiple different operations at the same time.
Concretely, according to Flynn’s taxonomy, we talk of Single-Instruction stream Single-Data stream (SISD), Single-Instruction stream Multiple-Data stream (SIMD), Multiple-Instruction stream Single-Data stream (MISD), and Multiple-Instruction stream Multiple-Data stream (MIMD).

None of these would be possible without the help of something called pipelining. Pipelining allows instructions to overlap in execution by splitting them into smaller pieces that can run independently and that together form a full instruction. It is like a car assembly line for instructions: we keep fetching instructions and pushing each one down a stage at every step.
A typical breakdown of an instruction goes as follows:

  • Instruction fetch
  • Instruction decode and register fetch
  • Execution or effective address cycle
  • Memory access
  • Write-back

Pipeline example

That means that with 5 stages, up to 5 instructions can be in flight at once, and ideally one instruction completes every clock cycle. However, the world isn’t perfect, and several major issues get in the way: structural hazards, when hardware limits how many units can perform a given step; data hazards, when the operands of instructions depend on one another; and control hazards, when branches in the code make the next instruction depend on a condition. Another thing to keep in mind is that some instructions take more than one cycle to finish executing and so may delay the pipeline.
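As a back-of-the-envelope sketch, assume an ideal k-stage pipeline where every stage takes one cycle. Then n instructions need k cycles for the first one to complete and one more cycle per remaining instruction. The helper names and the flat stall count below are illustrative assumptions, not a model of any particular processor.

```c
/* Cycles to run n instructions on an unpipelined machine where each takes k steps. */
static unsigned long unpipelined_cycles(unsigned long n, unsigned long k) {
    return n * k;
}

/* Ideal k-stage pipeline: k cycles until the first instruction completes,
 * then one completion per cycle, plus any stall cycles caused by hazards. */
static unsigned long pipelined_cycles(unsigned long n, unsigned long k, unsigned long stalls) {
    if (n == 0) return 0;
    return k + (n - 1) + stalls;
}
```

With 100 instructions and 5 stages, that is 500 cycles unpipelined versus 104 pipelined, a speedup of about 4.8 that approaches 5 as n grows. Every stall cycle from a hazard eats into it.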

Pipeline multiple FP OPs

The data hazards category can be split into 3 sub-categories:

  • Read after write (RAW): An instruction must read data only after a previous instruction has written it.
  • Write after read (WAR): An instruction must write data only after a previous instruction has finished reading it.
  • Write after write (WAW): An instruction must write data only after a previous instruction has written to it, so that the later value is the one that remains.
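The three categories can be checked mechanically by comparing the registers two instructions read and write. Here is a minimal sketch, assuming a hypothetical three-operand format `dst = src1 op src2` with registers represented as plain integers:

```c
#include <stdbool.h>

/* Hypothetical three-operand instruction: dst = src1 op src2. */
typedef struct { int dst, src1, src2; } Instr;

typedef struct { bool raw, war, waw; } Hazards;

/* Which orderings must `later` respect relative to the earlier instruction? */
static Hazards classify(Instr earlier, Instr later) {
    Hazards h;
    /* RAW: later reads a register that earlier writes (true dependence). */
    h.raw = later.src1 == earlier.dst || later.src2 == earlier.dst;
    /* WAR: later writes a register that earlier reads (anti-dependence). */
    h.war = later.dst == earlier.src1 || later.dst == earlier.src2;
    /* WAW: both write the same register (output dependence). */
    h.waw = later.dst == earlier.dst;
    return h;
}
```

Only RAW is a true dependence on a value. WAR and WAW exist because the two instructions happen to reuse the same register name, which is what renaming to virtual registers removes.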

Many ingenious techniques have been created to avoid these issues: stalling, data forwarding, checking whether a data dependence is actually needed, renaming registers to virtual registers, loop unrolling, and more.

Virtual registers are used in dynamically scheduled pipelines (dynamic scheduling). With in-order execution, an instruction has to wait for a long-running instruction ahead of it to finish, whether or not it depends on it; dynamic scheduling instead executes instructions out of order and resolves the dependencies internally. Popular approaches are scoreboarding, Tomasulo’s algorithm, and the reorder buffer (ROB).
These approaches achieve out-of-order execution by relying on additional hardware structures that hold results until the processor is certain they are needed, and only then write them back (commit them).
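Register renaming can be sketched with a table mapping each architectural register to a physical one, where every write gets a fresh physical register. This assumes a hypothetical unbounded supply of physical registers and the same illustrative three-operand `Instr` format as before. Real designs recycle physical registers from a free list once instructions commit.

```c
#define NUM_ARCH_REGS 8  /* hypothetical number of architectural registers */

/* Hypothetical three-operand instruction: dst = src1 op src2. */
typedef struct { int dst, src1, src2; } Instr;

typedef struct {
    int map[NUM_ARCH_REGS];  /* architectural register -> physical register */
    int next_free;           /* next unused physical register (never recycled here) */
} RenameTable;

static void rename_init(RenameTable *rt) {
    for (int r = 0; r < NUM_ARCH_REGS; r++) rt->map[r] = r;
    rt->next_free = NUM_ARCH_REGS;
}

static Instr rename_instr(RenameTable *rt, Instr in) {
    Instr out;
    out.src1 = rt->map[in.src1];  /* sources read the mapping before dst is updated */
    out.src2 = rt->map[in.src2];
    out.dst = rt->next_free++;    /* a fresh destination removes WAR and WAW conflicts */
    rt->map[in.dst] = out.dst;
    return out;
}
```

After renaming `r1 = r2 op r3` followed by `r2 = r1 op r4`, the second instruction still reads the first one’s result (the RAW dependence is kept) but writes a new physical register, so it no longer has to wait for anything to finish reading the old `r2`.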


The hazards that affect performance the most are control hazards. Because a branch may or may not be taken, we either have to freeze the pipeline until we know whether it is taken, or we continue loading instructions from one of the paths and flush them if that guess turns out to be wrong.
To reduce the cost of branches, we can try to predict, to speculate, through hardware or through software/compiler. The compiler can do Profile-Guided Optimization (PGO): run the software, gather information about which branches are taken most frequently, and then indicate it, one way or another, in the final binary. As far as hardware goes, past behavior is the best indicator of future behavior, so processors keep their own prediction mechanisms based on previous outcomes. Some well-known schemes are 2-bit predictors, tournament predictors, and tagged hybrid predictors.
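The classic 2-bit scheme keeps a small saturating counter per branch, indexed by the branch address. It only flips its prediction after two consecutive wrong guesses. Here is a minimal sketch, where the table size and the way the program counter is mapped to a slot are assumptions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREDICTOR_ENTRIES 1024  /* hypothetical table size, must be a power of two */

/* Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken". */
typedef struct { uint8_t counters[PREDICTOR_ENTRIES]; } TwoBitPredictor;

static size_t slot(uint64_t pc) {
    return (size_t)(pc >> 2) & (PREDICTOR_ENTRIES - 1);  /* drop byte offset, keep low bits */
}

static bool predict(const TwoBitPredictor *p, uint64_t pc) {
    return p->counters[slot(pc)] >= 2;
}

static void update(TwoBitPredictor *p, uint64_t pc, bool taken) {
    uint8_t *c = &p->counters[slot(pc)];
    if (taken && *c < 3) (*c)++;         /* saturate at "strongly taken" */
    else if (!taken && *c > 0) (*c)--;   /* saturate at "strongly not taken" */
}
```

For a loop branch that is taken 9 times and then not taken, repeated three times from cold counters, this predictor mispredicts 5 times: twice while warming up, then once per loop exit. A 1-bit predictor would also mispredict the first iteration every time the loop is re-entered.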

In all these cases, we need to pay close attention to what is executed: we shouldn’t execute instructions from a branch that wasn’t supposed to be taken, notably when that would affect how the program behaves. However, with speculation and dynamic scheduling, we allow executing future instructions from any path as long as their effects are not made permanent before the branch is resolved. We then face a problem with exceptions raised by instructions that weren’t supposed to be executed: how do we handle them? Depending on the type of exception, we either terminate or resume execution. But handling exceptions can be slow, so some architectures provide two modes: one with precise exceptions, and a faster one without.

Another way to speed up execution is to issue more instructions at the same time. Some of the techniques in this category put more emphasis on hardware, like statically and dynamically scheduled superscalar processors or increased fetch bandwidth, while others rely on the software/compiler, such as very long instruction word (VLIW) processors, which package multiple instructions into one big chunk that is fetched at the same time.


Sometimes whole chunks of code are independent, so it would be advantageous to run them in parallel. We do that with multithreading, or thread-level parallelism, a form of MIMD because different threads execute different instructions, possibly on data shared between them. In a multiprocessor environment, we can assign n threads to each processor.
There are 3 categories of multithreading scheduling:

  • Fine grained multithreading: When we switch between threads at each clock cycle.
  • Coarse grained multithreading: When we switch between threads only on costly stalls.
  • Simultaneous multithreading (SMT): Fine grained multithreading but with the help of multiple issue (issuing multiple instructions at the same time).

There are two ways to lay out the processors in an architecture, and the choice directly affects multithreading, especially when it comes to sharing data.

  • Symmetric multiprocessors (SMP): In this case we have a single shared memory, the processors are equidistant, and so we have uniform memory latency.
  • Distributed shared memory (DSM): In this case the processors are separated by other types of hardware, the memory is distributed among the processors, and the processors may not be at the same distance from one another, so we have non-uniform memory access (NUMA).

In thread-level parallelism the bottleneck lies in how the data is synchronized between the different threads that may live in different processors — it is a problem of cache coherence and consistency.
The two main protocols to solve this are directory-based protocols, which keep the sharing status of each block in one location (usually the last-level cache, L3), and snooping protocols, in which each cache tracks the status of its blocks and notifies the others when one changes.
Along with these, we need ways to handle conflicts between threads programmatically, so processors offer lock mechanisms such as atomic exchange, test-and-set, and fetch-and-increment.
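C11’s `<stdatomic.h>` exposes these primitives portably. Here is a minimal sketch of a test-and-set spinlock, plus fetch-and-increment and atomic-exchange helpers. The `Spinlock` type and the helper names are illustrative, and a production lock would add back-off or hand waiting threads to the OS instead of spinning forever.

```c
#include <stdatomic.h>

/* Hypothetical lock type; initialize with: Spinlock l = { ATOMIC_FLAG_INIT }; */
typedef struct { atomic_flag locked; } Spinlock;

static void spin_lock(Spinlock *l) {
    /* Test-and-set returns the previous value: true means another thread holds the lock. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* busy-wait until the holder clears the flag */
    }
}

static void spin_unlock(Spinlock *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Fetch-and-increment: returns the value before the increment, in one atomic step. */
static long next_ticket(atomic_long *counter) {
    return atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Atomic exchange: stores a new value and returns the old one, in one atomic step. */
static int swap_value(atomic_int *slot, int value) {
    return atomic_exchange(slot, value);
}
```

The acquire/release orderings on the lock make writes done inside the critical section visible to the next thread that acquires it; the ticket counter only needs relaxed ordering because nothing else is published through it in this sketch.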

Let’s now move our attention to another type of parallelism: SIMD, data level parallelism.

SIMD adds a boost to any program that relies heavily on doing small similar operations on big matrices of data, such as in multimedia or scientific applications.
There are 3 main implementations of SIMD:

  • Vector architectures: They have generic vector registers, like arrays that can contain arbitrary data and where we can specify the size of operands in the vector before executing an instruction.
  • SIMD extensions (Intel MMX, SSE, AVX): These are additional instructions added as an afterthought to handle SIMD data. They have a fixed size for each operand, the number of data operands is encoded in the opcode, there are no sophisticated addressing modes such as strided or scatter-gather, and mask registers are not usually present.
  • GPU: A specialized proprietary unit that can receive custom instructions from the CPU to perform the single operation on data heavy input.

All of these provide instructions to the compiler that work more or less like this: load an array into a vector register by specifying only the start address and size, operate on that vector register, and finally store the result back into memory.
Some architectures provide ways to apply instructions conditionally to the data in a vector register by specifying a bit mask. But beware of loading a whole dataset for a bit mask that only applies to a very small portion of it. Vector instructions, like all instructions, have a start-up time, a latency that depends on the length of the operands, structural hazards, and data dependencies.
Another feature that some provide is the scatter-gather, for when array indices are represented by values present in another array.
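Here is a scalar model of that load/operate/store pattern in plain C, assuming a hypothetical maximum vector length (MVL) of 8 elements. It processes the array in strips of at most MVL elements (strip mining), applies a masked multiply-add, and includes a gather for indices held in another array. It only mimics the semantics of a vector unit. Real SIMD hardware is reached through a vectorizing compiler or intrinsics, not through code like this.

```c
#include <stddef.h>
#include <stdint.h>

#define MVL 8  /* hypothetical maximum vector length, in elements */

typedef struct { double lane[MVL]; } VReg;  /* scalar stand-in for one vector register */

/* y[i] = a * x[i] + y[i] wherever mask[i] is set, processed one strip of MVL at a time. */
static void masked_daxpy(size_t n, double a, const double *x, double *y, const uint8_t *mask) {
    for (size_t base = 0; base < n; base += MVL) {
        size_t vl = (n - base < MVL) ? n - base : MVL;  /* the last strip may be shorter */
        VReg vx, vy;
        for (size_t i = 0; i < vl; i++) {               /* "vector load" of x and y */
            vx.lane[i] = x[base + i];
            vy.lane[i] = y[base + i];
        }
        for (size_t i = 0; i < vl; i++)                 /* masked multiply-add */
            if (mask[base + i]) vy.lane[i] = a * vx.lane[i] + vy.lane[i];
        for (size_t i = 0; i < vl; i++)                 /* "vector store" back to memory */
            y[base + i] = vy.lane[i];
    }
}

/* Gather: out[i] = src[idx[i]], where the indices come from another array. */
static void gather(size_t n, const double *src, const size_t *idx, double *out) {
    for (size_t i = 0; i < n; i++) out[i] = src[idx[i]];
}
```

Note that the masked-off lanes are still loaded and stored: the mask saves no memory traffic, which is the cost the warning above is about.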

Memory banks are a must for SIMD because we operate intensively on data that could be in multiple places. This is why we need support for high bandwidth for vector loads and stores, and to spread accesses across multiple banks.

We measure SIMD performance using the roofline performance model. It shows graphically where we reach the peak performance of our hardware: attainable performance is capped either by memory bandwidth or by peak computation, and the further left the ridge point where those two limits meet, the easier it is to get the most out of our hardware.
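The roofline boils down to one formula: attainable performance is the minimum of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity (FLOPs per byte moved). A minimal sketch, with the hardware numbers in the note below made up for illustration:

```c
/* Attainable GFLOP/s = min(peak compute, bandwidth * arithmetic intensity). */
static double roofline(double peak_gflops, double bandwidth_gbs, double flops_per_byte) {
    double memory_bound = bandwidth_gbs * flops_per_byte;  /* the sloped part of the roof */
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Ridge point: the arithmetic intensity at which the two roofs meet. */
static double ridge_point(double peak_gflops, double bandwidth_gbs) {
    return peak_gflops / bandwidth_gbs;
}
```

For a hypothetical machine with a 100 GFLOP/s peak and 25 GB/s of bandwidth, the ridge point sits at 4 FLOPs per byte. A kernel doing 1 FLOP per byte is capped at 25 GFLOP/s by memory, while one doing 8 FLOPs per byte can reach the full 100.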


GPUs are beasts of their own, specialized in only SIMD instructions. They’re part of a heterogeneous execution model, because we have the CPU as host and the GPU as an external device that is requested to execute instructions.
Interacting with GPUs differs between vendors; some standards exist, such as OpenCL, but they are not widely used. Normally, it’s done through a C-like programming language designed to make SIMD easy to express. GPUs are also sometimes called Single-Instruction Multiple-Thread (SIMT) machines because they are internally composed of hundreds, and sometimes thousands, of threads, often called lanes in GPU parlance.

GPUs are insanely fast because they have a simple architecture that doesn’t care about data dependences or the other hassles that normal processors have to deal with.


To make SIMD productive, we have to use it properly by finding portions of code that can be executed with these instructions. Compilers still struggle to optimize for data-level parallelism. Loop-level analysis can try to determine whether array indices can be represented as affine functions of the loop variables, and then deduce more information from this, such as whether two iterations can touch the same element. This is why multimedia libraries often rely on assembly written manually by developers.
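One classic test built on affine indices is the GCD test: if a loop writes `X[a*i + b]` and reads `X[c*i + d]`, a dependence between iterations is only possible when gcd(a, c) divides d - b. Here is a minimal sketch; the function names are illustrative, and the test is conservative: it ignores loop bounds, so “may depend” does not mean the dependence actually exists.

```c
#include <stdbool.h>
#include <stdlib.h>

static long gcd(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

/* Loop writes X[a*i + b] and reads X[c*i + d]. Returns false only when the two
 * accesses can never refer to the same element, for any integer iteration counts. */
static bool gcd_test_may_depend(long a, long b, long c, long d) {
    long g = gcd(a, c);
    if (g == 0) return b == d;  /* both indices are constants */
    return (d - b) % g == 0;
}
```

For `X[2*i + 3] = X[2*i] * 5.0`, gcd(2, 2) = 2 does not divide -3, so the writes and reads never overlap and the loop can be vectorized; for `X[i + 1] = X[i]`, gcd(1, 1) = 1 divides everything, so the compiler has to assume a dependence.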

Big Warehouses

When architecture is applied at big scales, at warehouse scale, we have to think a bit differently. Today, the world lives in the cloud: internet providers, data center operators, and governments all have big warehouses full of computers.
Let’s mention some things that could be surprising about warehouse scale computing.

  • At this scale, the cost against performance matters a lot.
  • At this scale, energy efficiency is a must, as it translates into power consumption and monthly bills.
  • At this scale dependability via redundancy is a must, we have so many machines that at least one component is bound to fail every day. The hardware should also be easily replaceable.
  • At this scale, high network I/O is a must. Gigantic amounts of money are spent on switches and load balancers.
  • At this scale we have to think about the cost of investment, the CAPEX (capital expenditures) and OPEX (operational expenditures), the loan repayment for the construction of the datacenter, and the return on investment (ROI).
  • At this scale, we have to choose the location of the data center wisely, whether because of cooling, distance to the power grid, the cost of acquiring the land, distance to internet lines, etc.

In warehouses, we apply request-level parallelism, along with batch processing in the popular map/reduce model.

And this is it for this article. One thing I haven’t mentioned, but that is getting more traction, is domain-specific architectures: custom processors made for special cases such as neural networks, encryption, cryptocurrency, and camera image processing.


This was a small recap of topics related to computer architecture. It was not meant as a deep dive but as a quick overview for those who haven’t touched the subject in a long time or who are new to it.
I hope you’ve at least learned a thing or two.
Thank you for reading!


  • Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design) 6th Edition


  • Internet Archive Book Images / No restrictions
  • Wafer_die's_yield_model_(10-20-40mm).PNG: Shigeru23, derivative work: Cepheiden / CC BY-SA
  • 2x910 / CC BY-SA

Ponylang (SeanTAllen)

Last Week in Pony - August 23, 2020 August 23, 2020 01:08 PM

Corral now has the ability to run scripts when a dependency is fetched. This has been used to install necessary libraries on Windows starting with the latest releases of the crypto, net_ssl, and regex packages. Ponyup 0.6.0 is out, with some minor improvements.

Kevin Burke (kb)

To Predict If You’ll Like a Beer, Look at the Hops August 23, 2020 03:10 AM

Generally if you name a food or drink, people know whether they like it or not. It is rare for someone to drink a merlot, or try pizza from a new restaurant — toasted bread, melted cheese, tomato sauce and toppings - and be wildly surprised at their reaction to the taste.

I can't quite figure that out for pale ales though. Some pale ales and IPAs had flavors I really liked, and some had flavors I really disliked. I had a tough time predicting which ones I would like and not like.

I had some suspicions - I didn't think I liked beers with much higher ABV than normal or beers that had citrus in them. But I also liked some beers with high ABV and one of my favorite "everyone has it" beers - Sierra Nevada - describes itself as "pine and citrus," so that wasn't quite right.

Anyway, I decided to be somewhat rigorous about this and order a few different types of beers from the bottle shop, and then figure out what I liked or didn't like about them. It turns out the key is the hops - there are some hop varieties (Cascade, Chinook, Noble) that I like a lot, and other hop varieties (Citra, Galaxy, Enigma, others) that I don't at all. If the hop description mentions passion fruit, I probably won't like it. Other than that, I can keep lists.

This is both satisfying - I can predict which beers I will like and not like, now — and frustrating. Why is this so difficult for consumers to figure out? Why does the category definition of "pale ale" include so much stuff? Like imagine if you ordered a "cheese pizza", and sometimes it would come with anchovies and sometimes with pineapple, and sometimes with nothing. People would demand better words to describe the differences between the things.

If you have ideas or answers, I would love to hear from you.

August 22, 2020

Unrelenting Technology (myfreeweb)

The touchscreen (both finger and pen support) on my Pixelbook has been broken... August 22, 2020 03:51 PM

The touchscreen (both finger and pen support) on my Pixelbook has been broken for a while (the Wacom digitizer was always present on i2c but it wasn’t sending events). There was like one time where I managed to get it to work briefly by holding the pen against it in some way, but that was it. Today I took the laptop out of the bag by the middle part, squeezing the lid a bit. Aaaand… touch works now! Something was going on with wiring somehow inside the lid (not the hinge) I guess? :/

August 18, 2020

Jeremy Morgan (JeremyMorgan)

How Do I Compare Strings in Go? August 18, 2020 01:59 AM

So you’re just learning Go and how things work. You need to compare two strings to see if they’re equal. You want to do it as simply and quickly as possible. In this tutorial we’re going to learn:

  • Different ways to compare strings
  • Comparing strings ignoring case
  • Measuring performance of different methods

So let’s get started. Note: There’s a video version of this article as well.

Basic String Comparison

So you need to compare a string.

August 16, 2020

Ponylang (SeanTAllen)

Last Week in Pony - August 16, 2020 August 16, 2020 11:34 PM

We have new releases for crypto libraries and new bots to automate changelogs and release notes. The shared Docker containers for openssl and libressl builders are being replaced.

Derek Jones (derek-jones)

Quality control in a zero cost of replication business August 16, 2020 10:29 PM

When a new manufacturing material becomes available, its use is often integrated with existing techniques, e.g., using scientific management techniques for software production.

Customers want reliable products, and companies that sell unreliable products don’t make money (and may even lose lots of money).

Quality assurance of manufactured products is a huge subject, and lots of techniques have been developed.

Needless to say, quality assurance techniques applied to the production of hardware are often touted (and sometimes applied) as the solution for improving the quality of software products (whatever quality is currently being defined as).

There is a fundamental difference between the production of hardware and software:

  • Hardware is designed, a prototype made and this prototype refined until it is ready to go into production. Hardware production involves duplicating an existing product. The purpose of quality control for hardware production is ensuring that the created copies are close enough to identical to the original that they can be profitably sold. Industrial design has to take into account the practicalities of mass production, e.g., can this device be made at a low enough cost.
  • Software involves the same design, prototype, refinement steps, in some form or another. However, the final product can be perfectly replicated at almost zero cost, e.g., downloadable file(s), burn a DVD, etc.

Software production is a once-off process, and applying techniques designed to ensure the consistency of a repetitive process does not sound like a good idea. Software production is not at all like mass production (the build process comes closest to this form of production).

Sometimes people claim that software development does involve repetition, in that a tiny percentage of the possible source code constructs are used most of the time. The same is also true of human communications, in that a few words are used most of the time. Does the frequent use of a small number of words make speaking/writing a repetitive process in the way that manufacturing identical widgets is repetitive?

The virtually zero cost of replication (and distribution, via the internet, for many companies) does more than remove a major phase of the traditional manufacturing process. Zero cost of replication has a huge impact on the economics of quality control (assuming high quality is considered to be equivalent to high reliability, as measured by number of faults experienced by customers). In many markets it is commercially viable to ship software products that are believed to contain many mistakes, because the cost of fixing them is so very low; unlike the cost of hardware, which is non-trivial and involves shipping costs (if only for a replacement).

Zero defects is not an economically viable mantra for many software companies. When companies employ people to build the same set of items, day in day out, there is economic sense in having them meet together (e.g., quality circles) to discuss saving the company money, by reducing production defects.

Many software products have a short lifespan, source code has a brief and lonely existence, and many development projects are never shipped to paying customers.

In software development companies it makes economic sense for quality circles to discuss the minimum number of known problems they need to fix, before shipping a product.

Patrick Louis (venam)

Wild Mushrooms in Lebanon August 16, 2020 09:00 PM

The project about mapping wild mushrooms in Lebanon is out!

A video speaks louder than words:

Your browser does not support the video tag.

The project consists of a map with wild mushroom specimens, their locations, along with pictures and descriptions of them. It is based on the only two research papers on the topic I’ve found, Joseph Thiébaut’s “Champignons observés dans le Liban et la Syrie de 1930 à 1933” (Mushrooms observed in Lebanon and Syria from 1930 to 1933) and Nadine Modad’s “Survey and identification of wild mushrooms in Lebanon”, along with my own research and findings over the past few years.
It took me around 2 months, or almost 15 hours, to fill the map. These research papers have been my bedtime stories for quite a while.

I’ve been interested in and researching mushrooms in the region since our scavenging excursion in 2017, where we found a Boletus luridiformis along with many other species.

This includes, apart from reading the research papers above:

  • Reading books such as:
    • The Edible Mushroom Book — a guide to foraging and cooking — Anna Del Conte, Thomas Laesee
    • The complete Mushroom Hunter — An illustrated guide to finding, harvesting, and enjoying wild mushrooms — Gary Lincoff
    • North American Species of Lactarius — Alexander H. Smith
  • Watching documentaries and following Youtube channels such as:
  • Frequently going on hikes during Autumn and Spring to find new species.
  • Seeking out exotic mushrooms, dried, fresh, or served at restaurants, such as morels, king oyster, shiitake, portobello, porcini, cordyceps, lion’s mane, and more.
  • Actively following /r/mycology subreddit.
  • Getting in the mood by playing the fungi board game.

… and much more.

Fungi are now a hobby of mine and I’ll keep doing research and adding specimens to the collection on the map as I discover them.

Again, here is the project link if you missed it.


And here are some pictures for your enjoyment:

Pictures: elfin saddle, Lactarius sp., Pithya, and several other mushrooms

Let me know what you think of this project and if you like it, and remember to be safe when harvesting for consumption.

Pages From The Fire (kghose)

Travel in “The Expanse” August 16, 2020 03:27 AM

At least up to what I’ve seen in season 2, The Expanse tries to acknowledge Newtonian physics. There are odd bits where they mix up where they should have centrifugal gravity and where not, and in which direction, but largely, they try. Thankfully there is no FTL nonsense (yet), but the civilization seems to…

Jeff Carpenter (jeffcarp)

Grace Hopper 2019 Trip Report August 16, 2020 12:00 AM

Despite this trip report being over 9 months late, I wanted to share it because I can’t stop thinking about how positive an experience this conference was. Grace Hopper is the largest women in tech conference in the world, with around 25,000 attendees flying into Orlando, FL from all parts of the world for the 2019 conference. In previous years I had been interested in attending but hadn’t gotten the chance—and I (as a man) also strongly did not want to take the spot of a potential woman visiting the conference.

August 12, 2020

Jeremy Morgan (JeremyMorgan)

7 Reasons Why Front End Developers Going Full Stack Should Choose Go August 12, 2020 05:54 PM

So you’re a front end developer, and you want to learn some backend stuff. You want to become a full stack developer someday, so where do you start? Google’s Go language is an excellent place. For instance, let’s say you want to build a RESTful API to test the calls from your React Application. You could use JSONPlaceholder, Reqres, or even SoapUI. All excellent options. Or you could spend an evening taking A Tour of Go and follow a tutorial like this one to build a local API that does exactly what you want, and mocks whatever you want.

August 11, 2020

Unrelenting Technology (myfreeweb)

Wi-Fi not connecting (well, getting instantly deauthed due to AP-STA-POSSIBLE-PSK-MISMATCH after connecti... August 11, 2020 07:14 PM

Wi-Fi not connecting (well, getting instantly deauthed due to AP-STA-POSSIBLE-PSK-MISMATCH after connecting) is apparently a relatively common problem with IoT devices. And most people seem to point to ESP8266-based ones.

Well, I’ve never had a problem with ESP, but today I’ve been setting up an RTL8711AF based device (Xiaomi qmi.powerstrip.v1) and it was failing just like that.

Turns out this device just completely fails when 802.11w Management Frame Protection is on (even optionally). Ugh. Thanks Realtek.

August 10, 2020

Geoff Wozniak (GeoffWozniak)

RSS has been moved to Atom August 10, 2020 10:52 PM

The feed is now found at

Indrek Lasn (indreklasn)


Washing your hair involves water, shampoo, conditioner, and often a hairdryer; in other words, it takes time, whether in the bathroom or at the hair salon. For more references, check out: Best Dry Shampoos for Fine Hair

Dry shampoo is the best solution for those days when you need to improve your look but don’t have time to wash your hair. Although the idea is relatively recent, many cosmetic brands are already betting on this type of hair product, which works as a spray that cleans, perfumes, and mainly removes the greasy look until your next wash with a normal shampoo.

What is meant by dry shampoo?

Dry shampoo is a product to instantly and superficially clean hair. It is a spray that must be applied to the root of dry hair, to absorb oil and give volume, leaving the hair looser.

What is dry shampoo for?

Dry shampoo serves, first of all, to clean and revitalize the hair, but its main objective is to remove oiliness and the greasy look in a few minutes.

When to use a dry shampoo?

There are many situations in which dry shampoo is useful and effective, for example when:

  • a last-minute appointment comes up and you don’t have time to wash your hair.
  • you go to the gym every day and don’t want to, or don’t have time to, wash your hair daily.
  • you have very oily hair, so much so that even with regular washing the strands look greasy by the end of the day.
  • your hair has been colored and you don’t want to wash it with water too often, so as not to lose the color.
  • your hair has been straightened and you want to space out washes with traditional shampoo as much as possible to maintain the effect.
  • your hair is clean but needs extra volume for a certain hairstyle.
  • you want to remove odors from your hair, such as the smell of tobacco or fried food.

What type of hair can dry shampoo be used on?

Dry shampoo is particularly suitable for oily hair, but not only! It can also be applied to combination hair, with oily roots and dry ends, to balance oiliness along the length of the strands. In addition, it cleans and perfumes dry hair and gives more volume to fine hair. However, on very thin hair it can leave the strands “glued” to each other instead of loosening them.

How to apply dry shampoo?

Using a dry shampoo is very simple, fast and practical!

  • Start by parting the hair into a few sections so that you can apply the product close to the root, but not directly on the scalp, because the powder can clog the pores.
  • Then spray the product parallel to the hair, about 20 to 30 centimeters away, and repeat the process throughout the hair.
  • Massage the strands with your fingertips so that the product penetrates better.
  • Wait a few minutes for the strands to absorb the product.
  • Finally, brush your hair to remove the white residue that remains on the strands.

Dry shampoo should not be used very often. Once a week is ideal, because this product can clog the pores and leave the hair drier and duller if used too often.

Is it possible to replace traditional shampoo with dry shampoo?

No. Dry shampoo is a kind of emergency solution to clean and improve the appearance of hair on days when you don’t have time to wash it. It provides a quick, superficial cleaning whose effect lasts two days at most.

What are the best dry shampoos for sale on the market?

There are several dry shampoos on the market that absorb oil from the hair and return some volume to the root. All have their quality and effectiveness, but none is intended to replace conventional shampoo, water and conditioner.

August 09, 2020

Derek Jones (derek-jones)

Extreme value theory in software engineering August 09, 2020 10:26 PM

As its name suggests, extreme value theory deals with extreme deviations from the average, e.g., how often will rainfall be heavy enough to cause a river to overflow its banks.

The initial list of statistical topics I thought ought to be covered in my evidence-based software engineering book included extreme value theory. At the time, and even today, there were/are no books covering “Statistics for software engineering”, so I had no prior work to guide my selection of topics. I was keen to cover all the important topics, had heard of extreme value theory in several (non-software) contexts, and jumped to the conclusion that it must be applicable to software engineering.

Years pass: the draft accumulates a wide variety of analysis techniques applied to software engineering data, but no use of extreme value theory.

Something else does not happen: I don’t find any ‘Using extreme value theory to analyse data’ books. Yes, there are some really heavy-duty maths books available, but nothing of a practical persuasion.

The book’s Extreme value section becomes a subsection, then a subsubsection, and ends up inside a comment (I cannot bring myself to delete it).

It appears that extreme value theory is more talked about than used. I can understand why. Extreme events are newsworthy; rivers that don’t overflow their banks are not news.

Just over a month ago a discussion cropped up on the UK’s C++ standards’ panel mailing list: was email traffic down because of COVID-19? The panel’s convenor, Roger Orr, posted some data on monthly volumes. Oh, data :-)

Monthly data is a bit too granular for detailed analysis over relatively short periods. After some poking around, Roger was able to send me the date and time of every post to WG21‘s Core and Lib reflectors since February 2016 (there have been various changes of hosts and configurations over the years, and the date of posts since 2016 was straightforward to obtain).

During our email exchanges, Roger had mentioned that every now and again a huge discussion thread comes out of nowhere. Woah, sounds like WG21 could do with some extreme value theory. How often are huge discussion threads likely to occur, and how huge is a once in 10-years thread that they might have to deal with?

There are two techniques for analysing the distribution of extreme values present in a sample (both based around the generalized extreme value distribution):

  • Generalized Extreme Value (GEV) uses block maxima, e.g., maximum number of daily emails sent in each month,
  • Generalized Pareto (GP) uses peak over threshold: pick a threshold and extract the values for the days on which more than this threshold number of emails was sent.
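To make the difference between the two techniques concrete, here is a minimal Python sketch of how each one would pull its input out of a series of daily email counts. It assumes the data is a dict from `datetime.date` to a count, and the function names are hypothetical, not taken from the post's code; fitting the GEV or GP distribution itself is left to a statistics library.

```python
from collections import defaultdict
from datetime import date


def monthly_block_maxima(daily_counts):
    """GEV input: the largest daily count within each calendar month.

    daily_counts maps datetime.date -> number of emails posted that day.
    Returns {(year, month): max_count}, ordered by month.
    """
    maxima = defaultdict(int)  # counts are non-negative, so 0 is a safe start
    for day, count in daily_counts.items():
        key = (day.year, day.month)
        maxima[key] = max(maxima[key], count)
    return dict(sorted(maxima.items()))


def peaks_over_threshold(daily_counts, threshold):
    """GP input: for each day above `threshold`, the excess (count - threshold).

    The generalized Pareto distribution is fitted to these excesses.
    """
    return {
        day: count - threshold
        for day, count in sorted(daily_counts.items())
        if count > threshold
    }


if __name__ == "__main__":
    counts = {date(2020, 1, 1): 5, date(2020, 1, 2): 12,
              date(2020, 2, 1): 7, date(2020, 2, 3): 30}
    print(monthly_block_maxima(counts))  # {(2020, 1): 12, (2020, 2): 30}
    print(peaks_over_threshold(counts, 10))
    # {datetime.date(2020, 1, 2): 2, datetime.date(2020, 2, 3): 20}
```

One caveat of this sketch: a month with no posts at all is simply absent from the block maxima rather than recorded as a maximum of 0.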

The plots below show the maximum number of monthly emails that are expected to occur (y-axis) within a given number of months (x-axis), for WG21’s Core and Lib email lists. The circles are actual occurrences, and the dashed lines are 95% confidence intervals; GEP was used for these fits (code+data):

Expected maximum for emails appearing on C++'s core and lib reflectors within a given period

The 10-year return value for Core is around a daily maximum of 70 ± 30, and closer to 200 ± 100 for Lib.
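For a GEV fit, the T-period return level (the value expected to be exceeded on average once every T blocks) has a standard closed form. A minimal sketch, assuming the fitted location `mu`, scale `sigma` and shape `xi` are already in hand; this is not the post's code, and the function name is hypothetical:

```python
import math


def gev_return_level(mu, sigma, xi, periods):
    """Level expected to be exceeded once every `periods` blocks (e.g. months).

    mu, sigma, xi are the GEV location, scale (> 0) and shape parameters.
    Uses z = mu + (sigma / xi) * (y**(-xi) - 1) with y = -log(1 - 1/periods),
    and the Gumbel limit z = mu - sigma * log(y) when xi == 0.
    """
    if sigma <= 0 or periods <= 1:
        raise ValueError("need sigma > 0 and periods > 1")
    y = -math.log1p(-1.0 / periods)
    if xi == 0:
        return mu - sigma * math.log(y)
    # expm1 computes y**(-xi) - 1 accurately, so small xi stays close to Gumbel
    return mu + (sigma / xi) * math.expm1(-xi * math.log(y))
```

With monthly block maxima, a once-in-10-years level corresponds to `periods=120`.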

The model used is very simplistic, and fails to take into account the growth in members joining these lists and traffic lost when a new mailing list is created for a new committee subgroup.

If any readers have suggestions for uses of extreme value theory in software engineering, please let me know.

Postlude. This discussion has reordered events. My original interest in the mailing list data was the desire to find some evidence for the hypothesis that the volume of email increased as the date of the next WG21 meeting approached. For both Core and Lib, the volume actually decreases slightly as the date of the next meeting approaches; see code for details. Also, the volume of email at the weekend is around 60% lower than during weekdays.

Scott Sievert (stsievert)

COVID-19, age and lockdowns August 09, 2020 12:00 AM

I wrote “Visualization of the COVID-19 infection rates” with two goals: to warn people about the upcoming pandemic and to provide insight into that pandemic.

The US took precautions within a couple of months, and the length and intensity of the precautions have surprised me. Even four months later, individuals generally believe they should take actions to limit the spread of COVID-19. This includes wearing a mask and working remotely if possible.

But are these precautions justified? There’s no harm done if everyone gets a benign virus. Do the data justify mandating wearing masks and closing schools? Let’s look.

The hospital data from New York City (NYC) indicates that they are past the most intense part of the infection:

New COVID-19 cases/hospitalizations/deaths are down by 30–50× since the peak. By this measure, NYC has “flattened the curve” and is seeing minimal new cases, hospitalizations, and deaths.

However, the lockdowns are still continuing. The subway rides have been down below normal weekend levels for nearly 5 months:

How necessary are these lockdowns? Let’s look at some data to find out.

Case study: Sweden

Sweden has a different approach; the government made strong recommendations to the elderly to “limit close contact with other people” and

… [are] encourag[ing] citizens to use common sense, work from home if possible, and not gather in crowds over 50. Primary schools are open, as are bars and restaurants, with images showing people enjoying drinks and crowding streets.

—”Sweden Sticks With Controversial COVID-19 Approach”.

CNBC reports that “[Sweden] did not go into lockdown, instead issuing recommendations about social distancing and working from home while allowing many schools and businesses to stay open.”

Obviously, not having a lockdown has significant benefits: kids can see their friends at schools, restaurants/bars are still serving food and don’t have to lay people off, etc. In fact, the Swedish economy has performed well, at least when compared to the US. Here’s a table of the annualized GDP growth rate:

Quarter     Sweden   US
2020, Q1    +0.1%    -5.0%
2020, Q2    -8.6%    -32.9%

There have even been stories written about how the Swedish economy has performed better than the economies of neighboring countries.

This must have come at a cost, right? Sure, they might have been able to keep their schools open and their economy functioning, but certainly more people contracted COVID-19? Absolutely:

But the number of infections is meaningless. No one cares if everyone contracts a harmless disease. Let’s look at how harmful COVID is, using the deaths attributed to COVID:

Clearly, far more elderly people have died of COVID than younger people when normalized by the population in that age group. The data from Sweden is high resolution: they specify the number of people aged between (say) 75 and 80 years old who have died. The data from NYC are unfortunately too coarse for any detailed comparisons; however, the general trend is clear: NYC and Sweden have approximately the same number of deaths per population.

That’s right: NYC and Sweden have (approximately) the same number of deaths per population, even after normalizing for age. Unlike the case count, there’s no obvious difference.

Maybe NYC is an outlier because of its population density.[1] Let’s make the same plot for the US instead:

About 0.5% of the US population over 85 has died of COVID. For 40-year-olds, the figure is 0.005%. For context, the US suicide rate is 150 per million, or 0.015%, for the population aged 35 to 44.

Let’s look at the various death rates for the US, and see how the number of deaths from COVID compare for each age group. Let’s plot these death counts relative to the number of COVID deaths:

A value of 20 on this chart means the death rate from (say) suicide is 20× greater than the COVID-19 death rate for that age group. I defined “death rate” for suicide/etc. as low as it can be, n_dead / n_people. For COVID-19, death rate is defined as n_dead / n_infected.
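Written as a formula using the definitions just given (the subscripted names are only labels for the counts the text describes), the quantity plotted for each age group and cause of death is:

```latex
\[
\text{ratio}_{\text{cause}}
  = \frac{n_{\text{dead, cause}} \,/\, n_{\text{people}}}
         {n_{\text{dead, COVID-19}} \,/\, n_{\text{infected}}}
\]
```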

This chart is a little misleading; it compares the deaths in 2015 to the number of COVID-19 deaths, not the deaths from suicide/drugs/etc. that occurred during the COVID-19 lockdowns. I hypothesize that the number of suicides and drug overdoses has increased during the lockdowns. The suicide rate in 2015 is 20× the death rate of COVID-19 for the population aged 15–24; I suspect the suicide rate has increased, especially because the CDC director reports that deaths from suicide/drug overdoses are “far greater” than COVID deaths for high school aged students:

But there has been another cost that we’ve seen, particularly in high schools. We’re seeing, sadly, far greater suicides now than we are deaths from COVID. We’re seeing far greater deaths from drug overdose that are above excess that we had as background than we are seeing the deaths from COVID.

Robert Redfield, July 14th, 2020

COVID-19 lockdowns come with both economic costs and mental health costs. Let’s look at some data on COVID-19 and children.

COVID-19 and children

Iceland has performed a contact tracing study that traces each infection back to its source, then recurses. Iceland tested 6% of their population in their contact tracing study before April 4th.[2] Of the people randomly sampled, none of the children under 10 tested positive for COVID-19, despite a 0.8% positive rate for people older than 10 years. They also found that the infection probability increased (gradually) with age for the population under 20 years old. Iceland’s study included genetic tracing to determine the index cases, but unfortunately did not distinguish “school” and “work.”

Preliminary evidence from the NIH suggests that children are more likely to be missing the receptor for COVID-19, specifically because children are more susceptible to allergic asthma. The NIH is funding further study to examine the correlation between the relevant gene and infection, and also COVID-19 in children:

One interesting feature of this novel coronavirus pandemic is that very few children have become sick with COVID-19 compared to adults. Is this because children are resistant to infection with SARS-CoV-2, or because they are infected but do not develop symptoms? The HEROS study will help us begin to answer these and other key questions.

Anthony S. Fauci, M.D., NIAID Director

Asymptomatic spread (spreading without ever showing any symptoms) is rare; spreading before symptoms develop “is believed to be far more common than asymptomatic spread” (source).


I presented data that provides evidence to support these hypotheses:

  • Elderly people have a significantly higher risk of contracting and dying from COVID-19.
  • The death rate for the population under 20 is minimal relative to suicide and drug overdose death rates.
  • Sweden and the US have similar death rates despite drastic differences in their public policy approach.

As an aside, here’s data from Minnesota on the age of various patient classes:

Population                                          Median age
All MN residents                                    38.2
People who tested positive for COVID (patients):
  Patients not in hospital                          34
  Patients in hospital                              59
  Patients in ICU                                   61
  Patients who die                                  83

This means that half of the people in the ICU are over the age of 61, and half of the hospitalized COVID-19 patients are older than 59.

Data sources

  1. NYC has about twice the population density of Stockholm and about 5× the population. 

  2. “Spread of SARS-CoV-2 in the Icelandic Population.” Gudbjartsson et al. New England Journal of Medicine. DOI: 10.1056/NEJMoa2006100

August 08, 2020

Pete Corey (petecorey)

Now You’re Thinking with Arrays August 08, 2020 12:00 AM

I’ve been using the J programming language on and off (mostly off), for the past couple years, and I still find myself failing to grasp the “array-oriented” approach.

Recently I wanted to find the discrete derivative, or the forward difference of a list of integers. This boils down to finding differences, or deltas, between each successive pair of list elements. So, for a list of integers, 1 2 4 7 11, the list of deltas would be 1 2 3 4.

My first stab at building a verb that does this looked like this:

   (-/"1@:}.@:(],._1&|.)) 1 2 4 7 11
1 2 3 4

The idea is that we take our list, rotate it once to the right, and stitch it onto itself. This gives us a list of tuples of each pair of subsequent numbers, except for the first tuple, which holds our first and last list values. We drop that tuple and map minus over the remaining pairs.
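For readers who don't read J, here is a rough Python analogue of that first verb, one step per primitive. It is only an illustration, and the function name is made up:

```python
def deltas_rotate_stitch(xs):
    """Rough analogue of the J verb (-/"1@:}.@:(],._1&|.))."""
    rotated = xs[-1:] + xs[:-1]           # _1 |. xs : rotate right by one
    pairs = list(zip(xs, rotated))        # ] ,. ... : stitch (x, previous x)
    return [a - b for a, b in pairs[1:]]  # }. drops (first, last); -/"1 subtracts


print(deltas_rotate_stitch([1, 2, 4, 7, 11]))  # [1, 2, 3, 4]
```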

This solution seems overly verbose and complicated for something as seemingly fundamental as calculating differences between subsequent list values.

I asked for help on #JLang Twitter, and learned about the “cut” verb, specifically the ;._3 form of cut, which executes a verb over subarrays, or “regular tilings”, of its input. Armed with this knowledge, we can map minus over all length-two tilings of our list:

   2(-~/;._3) 1 2 4 7 11
1 2 3 4
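The tiling idea translates to Python as explicit length-2 windows stepping by one element, keeping only complete windows (the negative `_3` form discards truncated tiles). Both helpers are hypothetical names for illustration:

```python
def tilings(xs, size):
    """Every complete length-`size` window of xs, stepping by one element."""
    return [xs[i:i + size] for i in range(len(xs) - size + 1)]


def deltas_tiling(xs):
    # -~/ applied to a window (a, b) computes b - a
    return [b - a for a, b in tilings(xs, 2)]


print(tilings([1, 2, 4, 7, 11], 2))     # [[1, 2], [2, 4], [4, 7], [7, 11]]
print(deltas_tiling([1, 2, 4, 7, 11]))  # [1, 2, 3, 4]
```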

Very nice!

I was happy with this solution, but #JLang Twitter pried my mind open even further and made me realize that I still haven’t fully grasped what it means to work in an “array oriented” mindset.

It was explained to me that I should work with the entire array as a unit, rather than operate on each of the elements individually. What I’m really after is the “beheaded” (}.) array minus the “curtailed” (}:) array.

   (}. - }:) 1 2 4 7 11
1 2 3 4
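The whole-array version maps almost directly onto Python slicing: behead with `xs[1:]`, curtail with `xs[:-1]`, and subtract the two lists pairwise. A minimal stdlib-only sketch (with NumPy arrays, `a[1:] - a[:-1]` does this in one expression, and `numpy.diff` wraps it):

```python
from operator import sub


def deltas(xs):
    """(}. - }:) : the beheaded list minus the curtailed list."""
    beheaded, curtailed = xs[1:], xs[:-1]
    return list(map(sub, beheaded, curtailed))


print(deltas([1, 2, 4, 7, 11]))  # [1, 2, 3, 4]
```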

This is the shortest, clearest, and, in hindsight, most obvious solution. It’s clear to me that I still need to work on getting into the “array-oriented” mindset when working with J, but hopefully with enough exposure to solutions like this, I’ll get there.

Now we’re thinking with arrays!

August 06, 2020

Frederic Cambus (fcambus)

NetBSD on the NanoPi NEO2 August 06, 2020 08:41 PM

The NanoPi NEO2 from FriendlyARM has been serving me well since 2018, being my test machine for OpenBSD/arm64 related things.

As NetBSD/evbarm finally gained support for AArch64 in NetBSD 9.0, released back in February, I decided to give it a try on this device. The board only has 512MB of RAM, and this is where NetBSD really shines. Things have become a lot easier since jmcneill@ now provides bootable ARM images for a variety of devices, including the NanoPi NEO2.

On first boot, the system automatically grows the filesystem to fill the SD card.

Growing ld0 MBR partition #1 (1052MB -> 60810MB)
Growing ld0 disklabel (1148MB -> 60906MB)
Resizing /
/dev/rld0a: grow cg |************************************                 |  69%

Once the system is up and running, we can add a regular user in the wheel group:

useradd -m -G wheel username

And add a password to the newly created user:

passwd username

From there we do not need the serial console anymore and can connect to the device using SSH.

NetBSD has binary packages available for this architecture, and installing and configuring pkgin can be done as follows:

export PKG_PATH=
pkg_add pkgin
echo $PKG_PATH > /usr/pkg/etc/pkgin/repositories.conf
pkgin update

The base system can be kept up to date using sysupgrade, which can be installed via pkgin:

pkgin in sysupgrade

The following variable needs to be set in /usr/pkg/etc/sysupgrade.conf:


Lastly, the device has two user controllable LEDs which can be toggled on and off using sysctl.

To switch both LEDs on:

sysctl -w hw.led.nanopi_green_pwr=1
sysctl -w hw.led.nanopi_blue_status=1

To switch off the power LED automatically at boot time:

echo "hw.led.nanopi_green_pwr=0" >> /etc/sysctl.conf

Here is a dmesg for reference purposes:

[     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[     1.000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[     1.000000]     2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
[     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[     1.000000]     The Regents of the University of California.  All rights reserved.

[     1.000000] NetBSD 9.0_STABLE (GENERIC64) #0: Wed Aug  5 15:20:21 UTC 2020
[     1.000000]
[     1.000000] total memory = 497 MB
[     1.000000] avail memory = 479 MB
[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] armfdt0 (root)
[     1.000000] simplebus0 at armfdt0: FriendlyARM NanoPi NEO 2
[     1.000000] simplebus1 at simplebus0
[     1.000000] simplebus2 at simplebus0
[     1.000000] cpus0 at simplebus0
[     1.000000] simplebus3 at simplebus0
[     1.000000] psci0 at simplebus0: PSCI 1.1
[     1.000000] cpu0 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu0: package 0, core 0, smt 0
[     1.000000] cpu0: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000000] cpu0: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.000000] cpu0: Dcache line 64, Icache line 64
[     1.000000] cpu0: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.000000] cpu0: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.000000] cpu0: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.000000] cpu0: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.000000] cpu0: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.000000] cpu1 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu1: package 0, core 1, smt 0
[     1.000000] cpu2 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu2: package 0, core 2, smt 0
[     1.000000] cpu3 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu3: package 0, core 3, smt 0
[     1.000000] gic0 at simplebus1: GIC
[     1.000000] armgic0 at gic0: Generic Interrupt Controller, 224 sources (215 valid)
[     1.000000] armgic0: 16 Priorities, 192 SPIs, 7 PPIs, 16 SGIs
[     1.000000] fclock0 at simplebus2: 24000000 Hz fixed clock (osc24M)
[     1.000000] sunxisramc0 at simplebus1: SRAM Controller
[     1.000000] fclock1 at simplebus2: 32768 Hz fixed clock (ext_osc32k)
[     1.000000] gtmr0 at simplebus0: Generic Timer
[     1.000000] gtmr0: interrupting on GIC irq 27
[     1.000000] armgtmr0 at gtmr0: Generic Timer (24000 kHz, virtual)
[     1.000000] timecounter: Timecounter "armgtmr0" frequency 24000000 Hz quality 500
[     1.000010] sun8ih3ccu0 at simplebus1: H3 CCU
[     1.000010] sun8ih3rccu0 at simplebus1: H3 PRCM CCU
[     1.000010] sunxide2ccu0 at simplebus1: DE2 CCU
[     1.000010] sunxigpio0 at simplebus1: PIO
[     1.000010] gpio0 at sunxigpio0: 94 pins
[     1.000010] sunxigpio0: interrupting on GIC irq 43
[     1.000010] sunxigpio1 at simplebus1: PIO
[     1.000010] gpio1 at sunxigpio1: 12 pins
[     1.000010] sunxigpio1: interrupting on GIC irq 77
[     1.000010] fregulator0 at simplebus0: vcc3v3
[     1.000010] fregulator1 at simplebus0: usb0-vbus
[     1.000010] fregulator2 at simplebus0: gmac-3v3
[     1.000010] sun6idma0 at simplebus1: DMA controller (12 channels)
[     1.000010] sun6idma0: interrupting on GIC irq 82
[     1.000010] com0 at simplebus1: ns16550a, working fifo
[     1.000010] com0: console
[     1.000010] com0: interrupting on GIC irq 32
[     1.000010] sunxiusbphy0 at simplebus1: USB PHY
[     1.000010] sunxihdmiphy0 at simplebus1: HDMI PHY
[     1.000010] sunximixer0 at simplebus1: Display Engine Mixer
[     1.000010] sunxilcdc0 at simplebus1: TCON1
[     1.000010] sunxilcdc0: interrupting on GIC irq 118
[     1.000010] sunxirtc0 at simplebus1: RTC
[     1.000010] emac0 at simplebus1: EMAC
[     1.000010] emac0: Ethernet address 02:01:f7:f9:2f:67
[     1.000010] emac0: interrupting on GIC irq 114
[     1.000010] rgephy0 at emac0 phy 7: RTL8211E 1000BASE-T media interface
[     1.000010] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[     1.000010] h3codec0 at simplebus1: H3 Audio Codec (analog part)
[     1.000010] sunximmc0 at simplebus1: SD/MMC controller
[     1.000010] sunximmc0: interrupting on GIC irq 92
[     1.000010] motg0 at simplebus1: 'otg' mode not supported
[     1.000010] ehci0 at simplebus1: EHCI
[     1.000010] ehci0: interrupting on GIC irq 104
[     1.000010] ehci0: EHCI version 1.0
[     1.000010] ehci0: 1 companion controller, 1 port
[     1.000010] usb0 at ehci0: USB revision 2.0
[     1.000010] ohci0 at simplebus1: OHCI
[     1.000010] ohci0: interrupting on GIC irq 105
[     1.000010] ohci0: OHCI version 1.0
[     1.000010] usb1 at ohci0: USB revision 1.0
[     1.000010] ehci1 at simplebus1: EHCI
[     1.000010] ehci1: interrupting on GIC irq 110
[     1.000010] ehci1: EHCI version 1.0
[     1.000010] ehci1: 1 companion controller, 1 port
[     1.000010] usb2 at ehci1: USB revision 2.0
[     1.000010] ohci1 at simplebus1: OHCI
[     1.000010] ohci1: interrupting on GIC irq 111
[     1.000010] ohci1: OHCI version 1.0
[     1.000010] usb3 at ohci1: USB revision 1.0
[     1.000010] sunxiwdt0 at simplebus1: Watchdog
[     1.000010] sunxiwdt0: default watchdog period is 16 seconds
[     1.000010] /soc/gpu@1e80000 at simplebus1 not configured
[     1.000010] gpioleds0 at simplebus0: nanopi:green:pwr nanopi:blue:status
[     1.000010] /soc/timer@1c20c00 at simplebus1 not configured
[     1.000010] /soc/video-codec@1c0e000 at simplebus1 not configured
[     1.000010] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.000010] cpu2: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000010] cpu2: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.040229] cpu2: Dcache line 64, Icache line 64
[     1.040229] cpu2: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.050220] cpu2: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.060220] cpu2: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.070220] cpu2: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.070220] cpu2: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.090221] cpu1: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.090221] cpu1: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.100222] cpu1: Dcache line 64, Icache line 64
[     1.110221] cpu1: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.110221] cpu1: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.120222] cpu1: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.130222] cpu1: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.140223] cpu1: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.150222] cpu3: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.160223] cpu3: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.160223] cpu3: Dcache line 64, Icache line 64
[     1.170223] cpu3: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.180223] cpu3: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.180223] cpu3: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.190223] cpu3: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.200224] cpu3: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.210224] sdmmc0 at sunximmc0
[     1.240225] uhub0 at usb0: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.240225] uhub0: 1 port with 1 removable, self powered
[     1.240225] uhub1 at usb2: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.250226] uhub1: 1 port with 1 removable, self powered
[     1.250226] uhub2 at usb1: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.260226] uhub2: 1 port with 1 removable, self powered
[     1.260226] uhub3 at usb3: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.275641] uhub3: 1 port with 1 removable, self powered
[     1.275641] IPsec: Initialized Security Association Processing.
[     1.350228] sdmmc0: SD card status: 4-bit, C10, U1, A1
[     1.350228] ld0 at sdmmc0: <0x03:0x5344:SC64G:0x80:0x0cd9141d:0x122>
[     1.360690] ld0: 60906 MB, 7764 cyl, 255 head, 63 sec, 512 bytes/sect x 124735488 sectors
[     1.370228] ld0: 4-bit width, High-Speed/SDR25, 50.000 MHz
[     1.990242] boot device: ld0
[     1.990242] root on ld0a dumps on ld0b
[     2.000243] root file system type: ffs
[     2.010242] kern.module.path=/stand/evbarm/9.0/modules

Marc Brooker (mjb)

Surprising Economics of Load-Balanced Systems August 06, 2020 12:00 AM

The M/M/c model may not behave like you expect.

I have a system with c servers, each of which can only handle a single concurrent request, and has no internal queuing. The servers sit behind a load balancer, which contains an infinite queue. An unlimited number of clients offer c * 0.8 requests per second to the load balancer on average. In other words, we increase the offered load linearly with c to keep the per-server load constant. Once a request arrives at a server, it takes one second to process, on average. How does the client-observed mean request time vary with c?

Option A is that the mean latency decreases quickly, asymptotically approaching one second as c increases (in other words, the time spent in queue approaches zero). Option B is constant. Option C is a linear improvement, and D is a linear degradation in latency. Which curve do you, intuitively, think that the latency will follow?

I asked my Twitter followers the same question, and got an interestingly mixed result:

Breaking down the problem a bit will help figure out which is the right answer. First, names. In the terminology of queue theory, this is an M/M/c queuing system: Poisson arrival process, exponentially distributed client service time, and c backend servers. In teletraffic engineering, it's Erlang's delay system (or, because terminology is fun, M/M/n). We can use a classic result of queuing theory to analyze this system: Erlang's C formula E_{2,n}(A), which calculates the probability that an incoming customer request is enqueued (rather than handled immediately), based on the number of servers (n aka c), and the offered traffic A. For the details, see page 194 of the Teletraffic Engineering Handbook. Here's the basic shape of the curve (using our same parameters):

Follow the blue line up to half the saturation point, at 2.5 rps offered load, and see how the probability is around 13%. Now look at the purple line at half its saturation point, at 5 rps: just 3.6%. So at half load the 5-server system handles 87% of traffic without queuing; with double the load and double the servers, we handle 96.4% without queuing, which means only 3.6% see any additional latency.

It turns out this improvement is, indeed, asymptotically approaching 1. The right answer to the Twitter poll is A.
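Both the queuing percentages and the Option A curve can be checked numerically. This is a minimal Python sketch, not code from the post: it evaluates Erlang's C formula through the standard Erlang B recursion, then uses the textbook M/M/c mean response time W = 1/μ + C/(cμ - λ), with μ = 1 and λ = 0.8c as in the setup above. The function names are my own.

```python
def erlang_b(servers, load):
    """Erlang B blocking probability, via the numerically stable recursion."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b


def erlang_c(servers, load):
    """Erlang C: probability that an arriving request has to wait in the queue.

    `load` is the offered traffic A in Erlangs (arrival rate / per-server
    service rate); the queue is only stable when A < servers.
    """
    if not 0 <= load < servers:
        raise ValueError("need 0 <= load < servers")
    b = erlang_b(servers, load)
    return b / (1 - (load / servers) * (1 - b))


def mean_latency(servers, utilization=0.8, service_rate=1.0):
    """Mean client-observed time in an M/M/c system: service plus queue wait."""
    arrival_rate = utilization * servers * service_rate
    load = arrival_rate / service_rate
    wait = erlang_c(servers, load) / (servers * service_rate - arrival_rate)
    return 1.0 / service_rate + wait


if __name__ == "__main__":
    # Half load: about 13% of requests queue with 5 servers, about 3.6% with 10.
    print(round(erlang_c(5, 2.5), 3), round(erlang_c(10, 5.0), 3))  # 0.13 0.036
    # At 80% utilization the mean latency falls toward the 1 s service time:
    # c=1 gives 5.0 s and c=5 gives about 1.554 s.
    for c in (1, 5, 10, 50, 100):
        print(c, round(mean_latency(c), 3))
```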

Using the mean to measure latency is controversial (although perhaps it shouldn't be). To avoid that controversy, we need to know whether the percentiles get better at the same rate. Doing that in closed form is somewhat complicated, but this system is super simple, so we can plot them out using a Monte-Carlo simulation. The results look like this:
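A simulation like this needs surprisingly little code. The following is a minimal Python sketch of my own, not the post's code; the seed, request count and nearest-rank percentile definition are arbitrary choices. With a single FCFS queue, each request simply starts on whichever server becomes free first, so a heap of server-free times is enough:

```python
import heapq
import math
import random


def simulate_mmc(servers, utilization=0.8, n_requests=100_000, seed=1):
    """Simulate a FCFS M/M/c queue with a 1 s mean service time.

    Returns each request's client-observed latency (queue wait + service).
    """
    rng = random.Random(seed)
    arrival_rate = utilization * servers      # keeps per-server load constant
    free_at = [0.0] * servers                 # when each server next goes idle
    now, latencies = 0.0, []
    for _ in range(n_requests):
        now += rng.expovariate(arrival_rate)  # Poisson arrival process
        start = max(now, heapq.heappop(free_at))  # FCFS: earliest-free server
        done = start + rng.expovariate(1.0)   # exponential service, mean 1 s
        heapq.heappush(free_at, done)
        latencies.append(done - now)
    return latencies


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    for c in (1, 5, 10, 50):
        lat = simulate_mmc(c)
        print(c, round(sum(lat) / len(lat), 3),
              [round(percentile(lat, p), 2) for p in (50, 99, 99.9)])
```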

That's entirely good news. The median (p50) follows the mean line nicely, and the high percentiles (99th and 99.9th) have a similar shape. No hidden problems.

It's also good news for cloud and service economics. With larger c we get better latency at the same utilization, or better utilization for the same latency, all at the same per-server throughput. That's not good news only for giant services, because most of this goodness happens at relatively modest c. There are few problems related to scale and distributed systems that get easier as c increases. This is one of them.

There are some reasonable follow-up questions. Are the results robust to our arbitrary choice of 0.8? Yes, they are.[1] Are the M/M/c assumptions of Poisson arrivals and exponential service time reasonable for typical services? I'd say they are reasonable, albeit wrong. Exponential service time is especially wrong: realistic services tend to be something more like log-normal. It may not matter. More on that another time.

Update: Dan Ports responded to my thread with a fascinating Twitter thread pointing to Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency from SoCC'14 which looks at this effect in the wild.


  1. Up to a point. As soon as the mean arrival rate exceeds the system's ability to complete requests, the queue grows without bound and latency goes to infinity. In our case, that happens when the request load exceeds c requests per second. More generally, for this system to be stable λ/(cμ) must be less than 1, where λ is the mean arrival rate, and μ is the mean rate at which a single server completes requests (the reciprocal of the mean time it takes to process one).

August 05, 2020

Andrew Owen (yumaikas)

Art Challenge: The Middle Grind August 05, 2020 10:50 PM

The story so far

Emily came across an art challenge on Pinterest, and suggested that we both do each prompt.

An art challenge that lists out 30 days of art prompts

Her medium of preference is pencil and ink, and mine is pixel art. This, unlike the previous post, covers 16 entries, because I fell behind in blog posts.

It’s also longer, and it definitely shows both Emily and me getting ready to be done with the art challenge.

Day 9: Urban Legend


An ink sketch of a wendigo


A pixel picture of a weeping Mary statue

Day 10: Insect


A drawing of an iridescent beetle with a blue shell


A pixel art drawing of a dragonfly

Day 11: Something you ate today


A nice looking ink sketch of a bagel, with a pen and an eraser on the sketch book



Day 12: Your Spirit Animal


A detailed ink drawing of a bat


A pixel-art picture of a squirrel sitting on a porch (or jumping over a log)

Day 13: Song Lyrics / Your Happy Place


A picture of Emily wrapped in a blanket on a couch, with a lamp, tissue box, phone, and Nintendo Switch


A pixel art picture of my laptop, with Aseprite open.

Day 14: Historical Figure


An ink picture of a corset, a


An attempt to make a pixel art photo of Ada Lovelace

Day 15: Guilty Pleasure


A sketch of a yellow Nintendo Switch with Stardew Valley on the screen


An abstract grid of white, blue and brown grid squares, representative of a Scrabble Board

Day 16: Zodiac Sign


A picture of a Capricorn goat with horns and a webbed mane


The Aquarius sign is superimposed over a big yellow moon above the waves, with a small lighthouse in the background

Day 17: Favorite TV Show


A picture of a naked Homer Simpson, his butt facing the viewer.


A picture of

Day 18: Something with Wings


A picture of a bat with 3 jack-o-lanterns, which is nibbling on the largest jack-o-lantern


A picture of a bat

Day 19: Famous Landmark


One of the sections of Stonehenge


A pixel-art picture of the pyramids of Giza

Day 20: Beverage


A drawing of a cup of water


A picture of a cup of water

Day 21: Teeth


A picture of a Zombie Skull with prominent teeth


A pixel-art picture of an alligator skull

Day 22: Earth Day


A picture of the earth, with clouds, being held up by a pair of hands


A pixel-art picture of the earth

Day 23: Dessert


A cupcake with sprinkles


An ice cream cone on a metal stand with little chocolate chips

Day 24: Movie Prop


A drawing of the cat from Kiki's Delivery Service.


A pixel-art drawing of Wilson the volleyball from Cast Away

August 04, 2020

Pepijn de Vos (pepijndevos)

A Rust HAL for your LiteX FPGA SoC August 04, 2020 12:00 AM

ULX3S demo

FPGAs are amazing in their versatility, but can be a real chore when you have to map out a giant state machine just to talk to some chip over SPI. For such cases, nothing beats just downloading an Arduino library and quickly hacking some example code. Or would there be a way to combine the versatility of an FPGA with the ease of Arduino libraries? That is the question I want to explore in this post.

Of course you can use an f32c softcore on your FPGA as an Arduino, but that’s a precompiled core, and basically doesn’t give you the ability to use your FPGA powers. Or you can build your own SoC with custom HDL components, but then you’re back to bare-metal programming.

Unless you can tap into an existing library ecosystem by writing a hardware abstraction layer for your SoC. And that is exactly what I’ve done by writing a Rust embedded HAL crate that works for any LiteX SoC!

LiteX allows you to assemble a SoC by connecting various components to a common Wishbone bus. It supports various RISC-V CPUs (and more), and has a library of useful components such as GPIO and SPI, but also USB and Ethernet. These all get memory-mapped and can be accessed via the Wishbone bus by the CPU and other components.

The amazing thing is that LiteX can generate an SVD file for the SoC, which contains all the registers of the components you added to the SoC. This means that you can use svd2rust to compile this SVD file into a peripheral access crate (PAC).

This PAC crate abstracts away memory addresses, and since the peripherals themselves are reusable components, it is possible to build a generic HAL crate on top of it that supports a certain LiteX peripheral in any SoC that uses it. Once the embedded HAL traits are implemented, you can use these LiteX peripherals with every existing Rust crate.
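The layering can be sketched in plain, host-runnable Rust. Here `GpioRegs` plays the role of a PAC register block (a `Cell` instead of a volatile memory-mapped register), `OutputPin` is a simplified stand-in for an embedded HAL digital-output trait rather than the real embedded-hal definition, and `Blinker` plays an existing driver crate that only knows about the trait. All of these names are hypothetical.

```rust
use std::cell::Cell;

/// Stand-in for a PAC register block: in firmware this would be a
/// volatile register at a fixed address; here it is just a Cell.
struct GpioRegs {
    out: Cell<u32>,
}

/// Simplified stand-in for a HAL trait such as a digital OutputPin.
trait OutputPin {
    fn set_high(&mut self);
    fn set_low(&mut self);
}

/// HAL layer: one pin of a GPIO peripheral, implemented on top of the "PAC".
struct Pin<'a> {
    regs: &'a GpioRegs,
    index: u32,
}

impl<'a> OutputPin for Pin<'a> {
    fn set_high(&mut self) {
        self.regs.out.set(self.regs.out.get() | (1 << self.index));
    }
    fn set_low(&mut self) {
        self.regs.out.set(self.regs.out.get() & !(1 << self.index));
    }
}

/// A "driver crate" that depends only on the trait, never on the SoC.
struct Blinker<P: OutputPin> {
    pin: P,
    on: bool,
}

impl<P: OutputPin> Blinker<P> {
    fn new(pin: P) -> Self {
        Blinker { pin, on: false }
    }
    fn toggle(&mut self) {
        self.on = !self.on;
        if self.on {
            self.pin.set_high()
        } else {
            self.pin.set_low()
        }
    }
}

fn main() {
    let regs = GpioRegs { out: Cell::new(0) };
    let mut led = Blinker::new(Pin { regs: &regs, index: 3 });
    led.toggle();
    println!("out = {:#06b}", regs.out.get()); // bit 3 set
    led.toggle();
    println!("out = {:#06b}", regs.out.get()); // bit 3 cleared
}
```

The point of this arrangement is that `Blinker` never mentions the SoC: swap in a different `OutputPin` implementation and the driver code stays the same.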

The first step is to install LiteX. Due to a linker bug in Rust 1.45, I used the 1.46 beta. I’m also installing into a virtualenv to keep my system clean. While we’re going to use Rust, gcc is still needed for compiling the LiteX BIOS and for some objcopy action.

#rustup default beta
virtualenv env
source env/bin/activate
chmod +x
./ init install
./ gcc
export PATH=$PATH:$(echo $PWD/riscv64-*/bin/)

Now we need to make some decisions about which FPGA board and CPU we’re going to use. I’m going to be using my ULX3S, but LiteX supports many FPGA boards out of the box, and others can of course be added. For the CPU, we have to pay careful attention to match it with an architecture that Rust supports. For example, VexRiscv implements the im feature set by default, which is not a supported Rust target, but it also has i and imac variants, both of which Rust supports. PicoRV32 only supports i or im, so it can only be used in combination with the Rust i target.
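The matching rule can be written as a small Rust function: pick the richest Rust target whose ISA extensions are all implemented by the core. This is a sketch under the assumption that code built for a subset of a core's extensions runs on that core; it only knows the two targets used in this post, and `pick_target` is a hypothetical helper, not part of rustup or LiteX.

```rust
/// Rust bare-metal RV32 targets used in this post, richest first,
/// with the ISA extensions each one assumes the core implements.
static TARGETS: [(&str, &str); 2] = [
    ("riscv32imac-unknown-none-elf", "imac"),
    ("riscv32i-unknown-none-elf", "i"),
];

/// Pick the first target whose extensions are all implemented by the core.
/// Assumption: code built for a subset of the core's extensions runs on it.
fn pick_target(core_isa: &str) -> Option<&'static str> {
    TARGETS
        .iter()
        .find(|(_, exts)| exts.chars().all(|c| core_isa.contains(c)))
        .map(|(triple, _)| *triple)
}

fn main() {
    // (CPU configuration, ISA extensions) for the cores discussed above.
    let cpus = [
        ("VexRiscv, default variant", "im"),
        ("VexRiscv, imac variant", "imac"),
        ("PicoRV32, minimal variant", "i"),
    ];
    for &(cpu, isa) in cpus.iter() {
        println!("{} ({}): {:?}", cpu, isa, pick_target(isa));
    }
}
```

Under this rule a default (`im`) VexRiscv falls back to the `i` target, which means multiplications are compiled to software routines and its hardware multiplier goes unused.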

So let’s go ahead and make one of those. I’m going with the VexRiscv imac variant, but on a small iCE40 you might want to try the PicoRV32 (or even SERV) to save some space. Of course, substitute the correct FPGA and SDRAM module for your board.


cd litex-boards/litex_boards/targets
python --cpu-type vexriscv --cpu-variant imac --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32imac-unknown-none-elf

# Alternatively, for a PicoRV32 SoC:
python --cpu-type picorv32 --cpu-variant minimal --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32i-unknown-none-elf

Most parameters should be obvious. The --csr-data-width 32 parameter sets the register width, which I’m told will be the default in the future, and saves a bunch of bit shifting later on. --csr-svd ulx3s.svd tells LiteX to generate an SVD file for your SoC. You can omit --build and --load and manually do these steps by going to the build/ulx3s/gateware/ folder and running I also prefer to use the awesome openFPGALoader rather than the funky ujprog with a sweet openFPGALoader --board ulx3s ulx3s.bit.
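To see what `--csr-data-width 32` saves, consider reading a 32-bit value when CSRs are only 8 bits wide: the value is spread over four consecutive CSR slots, and software has to reassemble it. The sketch below assumes the most significant chunk sits in the first slot, which is my understanding of LiteX's default CSR ordering but should be checked against the generated headers for your SoC; `read_split_csr` is a hypothetical helper, not a LiteX or PAC function.

```rust
/// Reassemble a value split across several narrow CSR slots.
/// `words` holds one bus word per slot; only the low `width` bits of each
/// are meaningful. Assumption: most-significant chunk first.
fn read_split_csr(words: &[u32], width: u32) -> u64 {
    let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
    words
        .iter()
        .fold(0u64, |acc, &w| (acc << width) | u64::from(w & mask))
}

fn main() {
    // 8-bit CSRs: a 32-bit value occupies four slots and needs shifting.
    let narrow: [u32; 4] = [0x12, 0x34, 0x56, 0x78];
    println!("{:#x}", read_split_csr(&narrow, 8)); // 0x12345678
    // 32-bit CSRs: the same value is a single read, no shifting.
    let wide: [u32; 1] = [0x1234_5678];
    println!("{:#x}", read_split_csr(&wide, 32)); // 0x12345678
}
```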

Now it is time to generate the PAC crate with svd2rust. This crate is completely unique to your SoC, so there is no point in sharing it. As long as the HAL crate can find it, you’re good. Follow these instructions to create a Cargo.toml with the right dependencies. In my experience you may want to update the version numbers a bit. I had to use the latest riscv and riscv-rt to make things work, but leave the other versions alone so you don’t break the PAC crate.

cargo new --lib litex-pac
cd litex-pac/src
svd2rust -i ulx3s.svd --target riscv
cd ..
vim Cargo.toml

Now we can use these instructions to create our first Rust app that uses the PAC crate. I pushed my finished example to this repo. First, create the app as usual and add the dependencies. You can refer to the PAC crate as follows:

litex-pac = { path = "../litex-pac", features = ["rt"]}

Then you need to create a linker script that tells the linker where to put stuff. Luckily, LiteX generated the important parts for us, and we only have to define the correct REGION_ALIAS expressions. Since we will be using the BIOS, all our code will get loaded in main_ram, so I set all my aliases to that. It is possible to load code in other regions, but my attempts to put the stack in SRAM failed horribly when the stack grew too large, so it’s better to start with something safe and experiment from there.


Next, you need to tell the compiler about your architecture and linker scripts. This is done in the .cargo/config file. The target in it should match the Rust target you installed, so be mindful if you are not using imac. Note the regions.ld file that LiteX generated; we’ll get to that in the next step.

[target.riscv32imac-unknown-none-elf]
rustflags = [
  "-C", "link-arg=-Tregions.ld",
  "-C", "link-arg=-Tmemory.x",
  "-C", "link-arg=-Tlink.x",
]

[build]
target = "riscv32imac-unknown-none-elf"

The final step before jumping into the Rust programming is writing a file that copies the linker scripts to a location where the linker can find them. I mostly used the example provided in the instructions, but added a section to copy the LiteX file. Before building, export BUILD_DIR and point it at the directory where you generated the LiteX SoC.

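    // dest_path is the output directory set up earlier in the build script.
    // env!("BUILD_DIR") is resolved when the build script is compiled, so export it before cargo build.
    let mut f = File::create(&dest_path.join("regions.ld"))
        .expect("Could not create file");
    f.write_all(include_bytes!(concat!(env!("BUILD_DIR"), "/software/include/generated/regions.ld")))
        .expect("Could not write file");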

That’s it. Now the code you compile will actually get linked correctly. I found these iCEBreaker LiteX examples very useful to get started. This code will actually run with minimal adjustment on our SoC, and is a good start to get a feel for how the PAC crate works. Another helpful trick is to run cargo doc --open in the PAC crate to browse the generated documentation.
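For a feel of the shape of that generated API: svd2rust registers are accessed through closures, with `read()` returning a reader, `write()` starting from the register's reset value, and `modify()` doing a read-modify-write. The sketch below is a simplified host-side imitation of that pattern, not svd2rust output; `Reg`, `R`, and `W` are hypothetical stand-ins, the reset value is assumed to be 0, and in real generated code the raw `bits` writer is usually `unsafe`.

```rust
use std::cell::Cell;

/// Value read from a register (imitates svd2rust's reader type).
struct R(u32);
impl R {
    fn bits(&self) -> u32 {
        self.0
    }
}

/// Value to be written (imitates svd2rust's writer type).
struct W(u32);
impl W {
    fn bits(&mut self, v: u32) -> &mut Self {
        self.0 = v;
        self
    }
}

/// A register; on hardware every access would be a volatile MMIO access.
struct Reg(Cell<u32>);

impl Reg {
    fn read(&self) -> R {
        R(self.0.get())
    }
    /// Start from the reset value (assumed 0 here) and let the closure set bits.
    fn write<F: FnOnce(&mut W) -> &mut W>(&self, f: F) {
        let mut w = W(0);
        f(&mut w);
        self.0.set(w.0);
    }
    /// Read-modify-write: the closure sees the current value.
    fn modify<F: for<'w> FnOnce(&R, &'w mut W) -> &'w mut W>(&self, f: F) {
        let r = self.read();
        let mut w = W(r.0);
        f(&r, &mut w);
        self.0.set(w.0);
    }
}

fn main() {
    let leds = Reg(Cell::new(0));
    leds.write(|w| w.bits(0b0001)); // write: replaces the whole value
    leds.modify(|r, w| w.bits(r.bits() | 0b0100)); // modify: keeps other bits
    println!("{:#06b}", leds.read().bits()); // prints 0b0101
}
```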

To actually upload the code, you have to convert the binary first.

cargo build --release
cd /target/riscv32imac-unknown-none-elf/release
riscv64-unknown-elf-objcopy litex-example -O binary litex-example.bin
litex_term --kernel litex-example.bin /dev/ttyUSB0

From here we “just” need to implement HAL traits on top of the PAC to be able to use almost any embedded library in the Rust ecosystem. However, one challenge is that the peripherals and their names are not set in stone. I solved this by having the HAL crate export only macros that generate HAL trait implementations. This way your SoC can have 10 SPI cores, and you just have to call the spi macro to generate a HAL for them. I uploaded the code in this repo.
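A self-contained sketch of that macro pattern follows. The trait is a cut-down stand-in for an embedded HAL SPI trait, the register blocks are plain structs instead of PAC types, and `Transfer`, `SpiRegs`, `impl_spi!`, `Spi0`, and `OledSpi` are all hypothetical names, so this illustrates the idea rather than the API of the actual crate.

```rust
use std::cell::Cell;

/// Cut-down stand-in for a HAL SPI trait (not the real embedded-hal trait).
trait Transfer {
    fn transfer_byte(&mut self, out: u8) -> u8;
}

/// Stand-in for a PAC register block; every SPI core has the same layout.
struct SpiRegs {
    mosi: Cell<u8>,
    miso: Cell<u8>,
}

// Pretend PAC output: two SPI instances with SoC-specific names.
struct Spi0(SpiRegs);
struct OledSpi(SpiRegs);

/// Generate a HAL impl for each named SPI peripheral the SoC has.
macro_rules! impl_spi {
    ($($name:ident),+ $(,)?) => {
        $(
            impl Transfer for $name {
                fn transfer_byte(&mut self, out: u8) -> u8 {
                    self.0.mosi.set(out);
                    // Real hardware would clock the byte out and wait here;
                    // this mock just returns whatever is in the MISO register.
                    self.0.miso.get()
                }
            }
        )+
    };
}

impl_spi!(Spi0, OledSpi);

/// Driver code only sees the trait, so it works with either instance.
fn send_all<S: Transfer>(spi: &mut S, bytes: &[u8]) -> Vec<u8> {
    bytes.iter().map(|&b| spi.transfer_byte(b)).collect()
}

fn main() {
    // Arbitrary example bytes and MISO values.
    let mut flash = Spi0(SpiRegs { mosi: Cell::new(0), miso: Cell::new(0x42) });
    let mut oled = OledSpi(SpiRegs { mosi: Cell::new(0), miso: Cell::new(0xff) });
    println!("{:?}", send_all(&mut flash, &[0x9f]));
    println!("{:?}", send_all(&mut oled, &[0x01, 0x02]));
}
```

Because the macro is expanded inside the firmware crate, it can name whatever peripheral types that particular PAC generated, which a fixed set of impls in a shared crate could not do.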

Of course, so far we’ve only used the default SoC defined for the ULX3S. The real test is whether we can add a peripheral, write a HAL layer for it, and then use an existing library with it. I decided to add an SPI peripheral for the OLED screen. First, I added the following pin definition:

    ("oled_spi", 0,
        Subsignal("clk",  Pins("P4")),
        Subsignal("mosi", Pins("P3")),
    ("oled_ctl", 0,
        Subsignal("dc",   Pins("P1")),
        Subsignal("resn", Pins