Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

September 20, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Projects chapter of my book September 20, 2020 09:24 PM

What useful, practical things might professional software developers learn from the Projects chapter in my evidence-based software engineering book?

This week I checked the projects chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

There turned out to be around three to four times more data publicly available than I had first thought. This is good, but there is a trap for the unwary. For many topics there is one data set, and that one data set may not be representative. What is needed is a selection of data from various sources, all relating to a given topic.

Some data is better than no data, provided small data sets are treated with caution.

Estimation is a popular research topic: how long will a project take, and how much will it cost?

After reading all the papers I learned that existing estimation models are even more unreliable than I had thought, and what is more, there are plenty of published benchmarks showing how unreliable the models really are (these papers never seem to get cited).

Models that include lines of code in the estimation process (i.e., the majority of models) need a good estimate of the likely number of lines in the final software system. One issue that nobody had considered was the impact of developer variability on the number of lines written to implement the same functionality, which turns out to be large. Oops.

Machine learning has infested effort estimation research. What the machine learning models actually do is estimate adjustment, i.e., they do not create their own estimate but adjust one passed in as input to the model. Most estimation data sets are tiny, and only contain a few different variables; unless the estimate is included in the training phase, the generated model produces laughable results. Oops.

The good news is that there appear to be lots of recurring patterns in the project data. This is good news because recurring patterns are something to be explained by a theory of software project development (apparent randomness is bad news, from the perspective of coming up with a model of what is going on). I think we are still a long way from having workable theories, but seeing patterns is a good sign that one or more theories will be possible.

I think that the main takeaway from this chapter is that software often has a short lifetime. People in industry probably have a vague feeling that this is true, from experience with short-lived projects. It is not cost effective to approach commercial software development from the perspective that the code will live a long time; some code does live a long time, but most dies young. I see the implications of this reality being a major source of contention with those in academia who have spent too long babbling away in front of teenagers (teaching the creation of idealized software that lives on forever), and little or no time building software systems.

A lot of software is written by teams of people; however, there is not a lot of data available on teams (software or otherwise). Given the difficulty of hiring developers, companies have to make do with what they have, so a theory of software teams might not be that useful in practice.

Readers might have a completely different learning experience from reading the projects chapter. What useful things did you learn from the projects chapter?

September 19, 2020

Gustaf Erikson (gerikson)

Re-reading Dune and Heretics of Dune September 19, 2020 07:58 PM

I’ve re-read Frank Herbert’s 1965 novel Dune, partly inspired by the upcoming movie.

Based on my memories I first read it in 1988 or so. The first novel in the series I read was actually Heretics of Dune (published in 1984) which I borrowed from the library in Halmstad. This must have been in 1986 or ‘87. I’ve long realized that it’s not a huge deal to read some novel series out of order - especially ones that are so self-contained as the Dune novels. Heretics takes place 5,000 years after Dune, after all.

Anyway, if you’re only going to read one Dune novel, the first one is the best. It has all the goodies - the worldbuilding, the Hero’s Journey, the tight plotting and good use of language. Even the 1960s elements have aged well - while standards like telepathy are there they’re only mentioned in passing, and the central idea of prescience is part of the plot and well handled there.

I wonder what the movie will do with the implicit connection of the Fremen with modern-day inhabitants of the Middle East. While using terms like jihad was merely a frisson in the original, they take on a darker tone in today’s climate - at least among the less enlightened. I suspect the projected 2-parter will not emphasize the jihad Paul foresees throughout the novel and instead focus on the thrilling twists and turns.

After Dune I decided to re-read Heretics. There are almost 20 years between the novels, and it’s clear that Herbert picked up a lot of contemporary SF tropes in the meantime. The tech in Dune is almost indistinguishable from magic - devices such as suspensors and personal shields are never explained, but are added to impart flavor and to enforce the quasi-medieval setting of the universe.

Heretics is much more explicit in its descriptions of space travel, weapons and other technology, but not in a way that feels dated. However, the novel is marred by long stretches of interior dialogue, where the protagonists muse about religion, history, and fate in excruciating detail. While I admire Herbert for bringing in female protagonists (in the form of the Bene Gesserit sisterhood), they’re really not that interesting as characters.

I consider Dune a bona-fide SF classic and anyone interested in the genre should read it. But don’t feel pressured to read more from Herbert’s universe.

September 18, 2020

Gonçalo Valério (dethos)

Django Friday Tips: Inspecting ORM queries September 18, 2020 07:01 PM

Today let’s look at the tools Django provides out of the box to debug the queries made to the database using the ORM.

This isn’t an uncommon task. Almost everyone who works on a non-trivial Django application faces situations where the ORM does not return the correct data or a particular operation is taking too long.

The best way to understand what is happening behind the scenes when you build database queries using your defined models, managers and querysets, is to look at the resulting SQL.

The standard way of doing this is to set the logging configuration to print all queries done by the ORM to the console. This way when you browse your website you can check them in real time. Here is an example config:

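    LOGGING = {
        'version': 1,
        'filters': {'require_debug_true': {'()': 'django.utils.log.RequireDebugTrue'}},
        'handlers': {
            'console': {
                'level': 'DEBUG',
                'filters': ['require_debug_true'],
                'class': 'logging.StreamHandler',
            },
        },
        'loggers': {
            'django.db.backends': {
                'level': 'DEBUG',
                'handlers': ['console', ],
            },
        },
    }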

The result will be something like this:

web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = ''::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet(''), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = ''::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet(''), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | Bad Request: /users/login/
web_1     | [18/Sep/2020 18:43:20] "POST /users/login/ HTTP/1.1" 400 2687

Note: the console output will get a bit noisy. Also, Django only logs SQL queries to django.db.backends when DEBUG is True.

Now let’s suppose this logging config is turned off by default (for example, on a staging server). You are manually debugging your app using the Django shell and running some queries to inspect the resulting data. In this case str(queryset.query) is very helpful for checking whether the query you have built is the one you intended. Here’s an example:

>>> box_qs = Box.objects.filter(
>>> str(box_qs.query)
'SELECT "boxes_box"."id", "boxes_box"."name", "boxes_box"."description", "boxes_box"."uuid", "boxes_box"."owner_id", "boxes_box"."created_at", "boxes_box"."updated_at", "boxes_box"."expires_at", "boxes_box"."status", "boxes_box"."max_messages", "boxes_box"."last_sent_at" FROM "boxes_box" WHERE ("boxes_box"."expires_at" > 2020-09-18 18:06:25.535802+00:00 AND NOT ("boxes_box"."owner_id" = 10))'

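If the problem is related to performance, you can use the .explain() method to check the query plan and see whether it hits the right indexes, just as you would with EXPLAIN in SQL. Options such as verbose=True are passed through to the database, so which ones are accepted depends on the backend (verbose is a PostgreSQL option).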

>>> print(box_qs.explain(verbose=True))
Seq Scan on public.boxes_box  (cost=0.00..13.00 rows=66 width=370)
  Output: id, name, description, uuid, owner_id, created_at, updated_at, expires_at, status, max_messages, last_sent_at
  Filter: ((boxes_box.expires_at > '2020-09-18 18:06:25.535802+00'::timestamp with time zone) AND (boxes_box.owner_id <> 10))

That’s it; I hope you find it useful.

September 17, 2020

Gustaf Erikson (gerikson)

Six months since WFH began September 17, 2020 02:57 PM

September 16, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Primitive binary functions September 16, 2020 05:00 AM


Welcome back to the “Compiling a Lisp” series. Last time, we added some primitive unary instructions like add1 and integer->char. This time, we’re going to add some primitive binary functions like + and <. After this post, we’ll be able to compile programs like:

(< (+ 1 2) (- 4 3))

Note that these expressions may look like function calls but, as in the last chapter, they do not open new stack frames (I’ll explain more about that later). Instead, the compiler will recognize that the programmer is directly applying the symbol + and generate special code. You can think of this as something like an inlined function call.

It’s important to remember that the compiler has a certain internal contract: the result of any given compiled expression is stored in rax. This isn’t some intrinsic property of all compilers, but it’s one we’ve kept so far in this series.

This is similar to but not the same as the calling convention that I mentioned earlier, where function results are stored in rax. That calling convention is for interacting with other people’s code. Within your own generated code, there are no rules. So we could pick any other register, really, for storing intermediate results.

Now that we’re building primitive functions that can take two arguments, you might notice a problem: our strategy of storing the result in rax won’t work on its own. If we were to naïvely write something like the following to implement +, then rax would get overwritten in the code generated by compiling operand1(args):

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args)));
      // The result of this is stored in rax ^
      _(Compile_expr(buf, operand1(args)));
      // Oops, we just overwrote rax ^
      Emit_add_something(buf, /*dst=*/kRax);
      return 0;
    }
    // ...
  }
  // ...
}

We could try to work around this by adding some kind of register allocation algorithm and taking advantage of rcx, rdx, etc. Or, more simply, we could decide to allocate all intermediate values on the stack and move on with our lives. I prefer the latter. It’s simpler.

Stack background info

Since we can’t yet save our compiled programs to disk, there’s some amount of setup that has to happen before they’re run. Right now, the C programs I’m providing along with this series compile to binaries that just run the test suites for the compiler. They don’t actually run full programs. For this reason, there are already some call frames on the stack by the time our generated code is run.

Let’s take a look at the stack at the moment we enter a compiled Lisp program:

|                  | High addresses
|  main            |
+------------------+ |
|  ~ some data ~   | |
|  ~ some data ~   | |
+------------------+ |
|  compile_test    | |
+------------------+ |
|  ~ some data ~   | |
|  ~ some data ~   | v
|  Testing_exe...  | rsp (stack pointer)
|                  | <-- Our frame!
|                  | Low addresses

In this diagram, we have the C program’s main function, which has its own local variables and so on. Then the main function calls the compile_test unit test suite. This in turn calls the Testing_execute_expr function (abbreviated in the diagram), which is responsible for calling into our generated code. Every call stores the return address (some place to find the next instruction to execute) on the stack and adjusts rsp down.

Refresher: the call stack grows down. Why? Check out this StackOverflow answer that quotes an architect on the Intel 4004 and 8080 architectures. It’s stayed the same ever since.

In this diagram, we have rsp pointing at a return address somewhere inside the function Testing_execute_expr, since that’s what called our Lisp entrypoint. We have some data “above” (higher addresses) rsp that we’re not allowed to poke at, and we have this empty space “below” (lower addresses) rsp that is in our current stack frame. I say “empty” because we haven’t yet stored anything there, not because it’s necessarily zeroed out. I don’t think there are any guarantees about the values in this stack frame.

We can use our stack frame to write and read values for our current Lisp program. With every recursive subexpression, we can allocate a little more stack space to keep track of the values. When I say “allocate”, I mean “subtract from the stack pointer”, because the stack is already a contiguous space in memory allocated for us. For example, here is how we can write to the stack:

mov [rsp-8], 0x4

This puts the integer 4 at displacement -8 from rsp. On the stack diagram above, it would be at the slot labeled “Our frame”. It’s also possible to read with a positive or zero displacement, but those point to previous stack frames and the return address, respectively. So let’s avoid manipulating those.

Note that I used a multiple of 8. Not every store has to be to an address that is a multiple of 8, but it is natural, and I think also faster, to store 8-byte-sized things at aligned addresses.

Let’s walk through a real example to get more hands-on experience with this stack storage idea. We’ll use the program (+ 1 2). The compiled version of that program should:

  • Move compile(2) to rax
  • Move rax into [rsp-8]
  • Move compile(1) to rax
  • Add [rsp-8] to rax

So after compiling that, the stack will look like this:

|                  | High addresses
|  Testing_exe...  | RSP
|  0x8             | RSP-8 (result of compile(2))
|                  | Low addresses

And the result will be in rax, per our internal compiler contract.
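The four steps above can be modeled in a few lines of C. This is a sketch, not the series' compiler. It simulates rax and the single stack slot with plain variables. It also assumes integers are tagged by shifting left by 2, which is consistent with the 0x8 stored for 2 in the diagram. The names encode_int, Machine and model_plus_1_2 are hypothetical.

```c
#include <stdint.h>

typedef int64_t word;

/* Assumption: fixnums are tagged by shifting left 2 bits (2 -> 0x8).
   Multiplying by 4 is the same shift but stays well-defined for negatives. */
static word encode_int(word value) { return value * 4; }

/* A toy machine: rax plus one stack slot standing in for [rsp-8]. */
typedef struct {
  word rax;
  word slot[1];
} Machine;

/* The four steps for (+ 1 2), one statement per emitted instruction. */
static void model_plus_1_2(Machine *m) {
  m->rax = encode_int(2);  /* mov rax, compile(2) */
  m->slot[0] = m->rax;     /* mov [rsp-8], rax    */
  m->rax = encode_int(1);  /* mov rax, compile(1) */
  m->rax += m->slot[0];    /* add rax, [rsp-8]    */
}
```

Under this assumed encoding the tag bits are 00. That means adding two encoded integers directly yields the encoded sum: 4 + 8 = 12, which is the encoding of 3. This is why a plain add is enough here.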

This is all well and good, but at some point we’ll need our compiled programs to emit the push instruction or make function calls of their own. Both of these modify the stack pointer. push writes to the stack and decrements rsp. call is roughly equivalent to push followed by jmp.

For that reason, x86-64 comes with another register called rbp and it’s designed to hold the Base Pointer. While the stack pointer is supposed to track the “top” (low address) of the stack, the base pointer is meant to keep a pointer around to the “bottom” (high address) of our current stack frame.

This is why in a lot of compiled code you see the following instructions repeated[1]:

push rbp
mov rbp, rsp
sub rsp, N  ; optional; allocate stack space for locals
; ... function body ...
mov rsp, rbp  ; required if you subtracted from rsp above
pop rbp
ret

The first three instructions, called the prologue, save rbp to the stack, and then set rbp to the current stack pointer. Then it’s possible to maintain steady references to variable locations on the stack even as rsp changes. Yes, the compiler could adjust its internal table of references every time the compiler emits code that modifies rsp, but that sounds much harder.

The last three instructions, called the epilogue, restore rsp, pop the saved rbp off the stack back into rbp, and then return from the call.

To confirm this for yourself, check out this sample compiled C code. Look at the disassembly following the label square. Prologue, code, epilogue.
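To see why rbp stays a stable reference while rsp moves, here is a toy C model of the prologue and epilogue. It is only a sketch under simplifying assumptions: memory is an array of 8-byte slots, addresses are byte offsets into that array, and the call/ret pair (and so the return address) is left out. Cpu, push, pop and run_frame are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: 64 eight-byte slots; rsp/rbp hold byte offsets into it. */
typedef struct {
  uint64_t mem[64];
  uint64_t rsp, rbp;
} Cpu;

static void push(Cpu *c, uint64_t v) {
  c->rsp -= 8;                      /* the stack grows down */
  c->mem[c->rsp / 8] = v;
}

static uint64_t pop(Cpu *c) {
  uint64_t v = c->mem[c->rsp / 8];
  c->rsp += 8;
  return v;
}

/* Prologue, a body that moves rsp, then the epilogue (minus ret). */
static void run_frame(Cpu *c, uint64_t locals_bytes) {
  push(c, c->rbp);                  /* push rbp     */
  c->rbp = c->rsp;                  /* mov rbp, rsp */
  c->rsp -= locals_bytes;           /* sub rsp, N   */
  c->mem[(c->rbp - 8) / 8] = 42;    /* mov [rbp-8], 42 */
  push(c, 7);                       /* rsp moves again...          */
  assert(c->mem[(c->rbp - 8) / 8] == 42);  /* ...but [rbp-8] is unchanged */
  c->rsp = c->rbp;                  /* mov rsp, rbp */
  c->rbp = pop(c);                  /* pop rbp      */
}
```

After run_frame returns, both rsp and rbp hold the values they had before the prologue. This is the property the caller relies on.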

Stack allocation infrastructure

Until now, we haven’t needed to keep track of much as we recursively traverse expression trees. Now, in order to keep track of how much space on the stack any given compiled code will need, we have to add more state to our compiler. We’ll call this state the stack_index — Ghuloum calls it si — and we’ll pass it around as a parameter. Whatever it’s called, it points to the first writable (unused) index in the stack at any given point.

In compiled functions, the first writable index is -kWordSize (-8), since the saved base pointer is already at displacement 0 from rbp.

int Compile_function(Buffer *buf, ASTNode *node) {
  Buffer_write_arr(buf, kFunctionPrologue, sizeof kFunctionPrologue);
  _(Compile_expr(buf, node, -kWordSize));
  Buffer_write_arr(buf, kFunctionEpilogue, sizeof kFunctionEpilogue);
  return 0;
}

I’ve also gone ahead and added the prologue and epilogue. They’re stored in static arrays. This makes them easier to modify, and also makes them accessible to testing helpers. The testing helpers can use these arrays to make testing easier for us — we can check if our expected code is book-ended by this code.

static const byte kFunctionPrologue[] = {
    // push rbp
    0x55,
    // mov rbp, rsp
    kRexPrefix, 0x89, 0xe5,
};

static const byte kFunctionEpilogue[] = {
    // pop rbp
    0x5d,
    // ret
    0xc3,
};

For Compile_expr, we just pass this new stack index through.

int Compile_expr(Buffer *buf, ASTNode *node, word stack_index) {
  // ...
  if (AST_is_pair(node)) {
    return Compile_call(buf, AST_pair_car(node), AST_pair_cdr(node),
                        stack_index);
  }
  // ...
}

And for Compile_call, we actually get to use it. Let’s look back at our stack storage strategy for compiling (+ 1 2) (now replacing rsp with rbp):

  • Move compile(2) to rax
  • Move rax into [rbp-8]
  • Move compile(1) to rax
  • Add [rbp-8] to rax

For binary functions, this can be generalized to:

  • Compile arg2 (stored in rax)
  • Move rax to [rbp+stack_index]
  • Compile arg1 (stored in rax)
  • Do something with the results (in [rbp+stack_index] and rax)

The key is this: for the first recursive call to Compile_expr, the compiler is allowed to emit code that can use the current stack_index and anything below it on the stack. For the second recursive call to Compile_expr, the compiler has to decrement stack_index by kWordSize (stack_index is negative, so this moves one slot further below rbp), since we’ve stored the result of the first compiled call at stack_index.
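One way to convince yourself that this stack_index discipline keeps nested subexpressions from clobbering each other is to run it as an interpreter rather than a compiler. The C sketch below mirrors the shape of Compile_call: evaluate operand2, spill it to [rbp+stack_index], then evaluate operand1 one slot lower. It uses untagged integers, and < yields 0 or 1 rather than a tagged boolean. Node, Machine, slot and eval are hypothetical names, not part of the series' code.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;
enum { kWordSize = 8, kSlots = 32 };

/* Hypothetical AST: an integer leaf (op == 0) or a binary primitive call. */
typedef struct Node {
  char op;                 /* 0, '+', '-' or '<' */
  word value;              /* used only when op == 0 */
  const struct Node *lhs;  /* operand1 */
  const struct Node *rhs;  /* operand2 */
} Node;

typedef struct {
  word rax;
  word frame[kSlots];      /* frame[i] models [rbp - 8*(i+1)] */
} Machine;

/* Map a negative stack_index to its simulated slot. */
static word *slot(Machine *m, word stack_index) {
  assert(stack_index < 0 && -stack_index / kWordSize <= kSlots);
  return &m->frame[-stack_index / kWordSize - 1];
}

/* Same order as Compile_call: operand2, spill, operand1 one slot lower. */
static void eval(Machine *m, const Node *n, word stack_index) {
  if (n->op == 0) {
    m->rax = n->value;                         /* mov rax, imm        */
    return;
  }
  eval(m, n->rhs, stack_index);                /* operand2 -> rax     */
  *slot(m, stack_index) = m->rax;              /* mov [rbp+si], rax   */
  eval(m, n->lhs, stack_index - kWordSize);    /* may use lower slots */
  word saved = *slot(m, stack_index);
  switch (n->op) {
    case '+': m->rax = m->rax + saved; break;
    case '-': m->rax = m->rax - saved; break;
    case '<': m->rax = m->rax < saved; break;
    default: assert(0 && "unknown primitive");
  }
}
```

Evaluating (< (+ 1 2) (- 4 3)) starting at -kWordSize leaves 0 in rax. The inner (+ 1 2) writes only to the slot at -16, so the spilled result of (- 4 3) at -8 survives.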

Take a look at our implementation of binary add:

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args), stack_index));
      Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                              /*src=*/kRax);
      _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
      Emit_add_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, stack_index));
      return 0;
    }
    // ...
  }
  // ...
}

In this snippet, Ind stands for “indirect” and is a constructor for a struct. This is an easy and readable way to represent (register, displacement) pairs for reading from and writing to memory. We’ll cover this in more detail when we get to the instruction encoding.

To prove to ourselves that this approach works, we’ll add some tests later.

Other binary functions

Subtraction, multiplication, and division are much the same as addition. We’re also going to completely ignore overflow, underflow, etc.

Equality is different in that it does some comparisons after the fact (see Primitive unary functions). To check if two values are equal, we compare their pointers:

    if (AST_symbol_matches(callable, "=")) {
      _(Compile_expr(buf, operand2(args), stack_index));
      Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                              /*src=*/kRax);
      _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
      Emit_cmp_reg_indirect(buf, kRax, Ind(kRbp, stack_index));
      Emit_mov_reg_imm32(buf, kRax, 0);
      Emit_setcc_imm8(buf, kEqual, kAl);
      Emit_shl_reg_imm8(buf, kRax, kBoolShift);
      Emit_or_reg_imm8(buf, kRax, kBoolTag);
      return 0;
    }

It uses a new comparison opcode that compares a register with some memory. This is why we can’t use the Compile_compare_imm32 helper function.

The less-than operator (<) is very similar to equality, except that we use setcc with the kLess flag instead of the kEqual flag.
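The tail of the equality code (mov, setcc, shl, or) turns the CPU flag into a tagged boolean. The C sketch below computes the same value step by step. Note that mov, unlike xor, leaves the flags alone, so it can sit between cmp and setcc and clear the upper bits that setcc does not write. The concrete values kBoolShift = 7 and kBoolTag = 0x1f are assumptions, since the post names these constants but does not show them. The fixnum encoding (value shifted left by 2) is also assumed. Multiplying by 4 preserves both equality and signed order, which is why = and < can compare encoded words directly. The helper names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t uword;
typedef int64_t word;

/* Assumed constants; the post uses kBoolShift and kBoolTag without values. */
enum { kBoolShift = 7, kBoolTag = 0x1f };

/* What mov rax, 0 / setcc al / shl rax, kBoolShift / or rax, kBoolTag compute. */
static uword encode_bool_from_flag(int flag) {
  uword rax = 0;              /* mov rax, 0: clears bits setcc leaves alone */
  rax |= (flag ? 1u : 0u);    /* setcc al: low byte becomes 0 or 1 */
  rax <<= kBoolShift;         /* shl rax, kBoolShift */
  rax |= kBoolTag;            /* or rax, kBoolTag */
  return rax;
}

/* Assumed fixnum encoding: value * 4 (a left shift by 2). */
static word encode_int(word value) { return value * 4; }

/* cmp on the encoded words gives the same answer as on the raw values. */
static uword prim_eq(word a, word b) {
  return encode_bool_from_flag(encode_int(a) == encode_int(b));
}
static uword prim_lt(word a, word b) {
  return encode_bool_from_flag(encode_int(a) < encode_int(b));
}
```

With these assumed constants, true encodes as 0x9f and false as 0x1f.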

New opcodes

We used some new opcodes today, so let’s take a look at the implementations. First, here is the indirection implementation I mentioned earlier:

typedef struct Indirect {
  Register reg;
  int8_t disp;
} Indirect;

Indirect Ind(Register reg, int8_t disp) {
  return (Indirect){.reg = reg, .disp = disp};
}

I would have used the same name in the struct and the constructor but unfortunately that’s not allowed.

Here’s an implementation of an opcode that uses this Indirect type. This emits code for instructions of the form mov [reg+disp], src.

uint8_t disp8(int8_t disp) { return disp >= 0 ? disp : 0x100 + disp; }

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  Buffer_write8(buf, 0x40 + src * 8 + dst.reg);
  Buffer_write8(buf, disp8(dst.disp));
}

The disp8 function is a helper that encodes negative numbers.
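The expression 0x40 + src * 8 + dst.reg is the x86 ModRM byte written out by hand. Its fields are mod = 01 (register plus 8-bit displacement), reg = the source register, and rm = the base register. The small C sketch below (modrm is a hypothetical helper) spells out the bit layout and checks it against the 0x45 byte in the test further down. Two caveats follow from the encoding rules. First, rm = 100 (rsp) means a SIB byte follows, so Ind(kRsp, ...) would not encode correctly this way. Second, mod = 00 with rm = 101 (rbp) means RIP-relative addressing in 64-bit mode, which is one reason to always use mod = 01 with an explicit displacement for rbp.

```c
#include <stdint.h>

/* ModRM layout: mod (2 bits) | reg (3 bits) | rm (3 bits). */
static uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm) {
  return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Same as the post's disp8: the two's-complement byte of a signed displacement. */
static uint8_t disp8(int8_t disp) { return disp >= 0 ? disp : 0x100 + disp; }
```

For mov [rbp-8], rax this gives modrm(1, 0, 5) == 0x45 and disp8(-8) == 0xf8, which are the last two bytes of 48 89 45 f8.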

The opcodes for add, sub, and cmp are similar enough to this one, except src and dst are swapped. mul is a little funky because it doesn’t take two parameters. It assumes that one of the operands is always in rax.
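Since add, sub and cmp differ only in the opcode byte, one parameterized emitter can cover them. The C sketch below uses the standard encodings ADD r64, r/m64 (REX.W 03 /r), SUB r64, r/m64 (REX.W 2B /r) and CMP r64, r/m64 (REX.W 3B /r). For mul it uses MUL r/m64 (REX.W F7 /4). In that case the reg field holds the opcode extension 4 rather than a register. The result lands in rdx:rax, so rdx is clobbered too. The Buffer here is a hypothetical fixed-size stand-in. Emit_sub_reg_indirect and Emit_mul_reg_indirect are illustrative names, because the post does not show those emitters.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned char byte;
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;
typedef struct { Register reg; int8_t disp; } Indirect;
typedef struct { byte data[32]; size_t len; } Buffer;  /* hypothetical, no growth */

static void put(Buffer *buf, byte b) { buf->data[buf->len++] = b; }

/* REX.W, opcode, ModRM(mod=01, reg=reg_field, rm=mem.reg), disp8.
   reg_field is a register for add/sub/cmp, or an opcode extension for mul. */
static void emit_rm64(Buffer *buf, byte opcode, int reg_field, Indirect mem) {
  put(buf, 0x48);
  put(buf, opcode);
  put(buf, (byte)(0x40 + reg_field * 8 + mem.reg));
  put(buf, (byte)mem.disp);  /* two's-complement byte, like disp8 */
}

static void Emit_add_reg_indirect(Buffer *b, Register dst, Indirect src) {
  emit_rm64(b, 0x03, dst, src);  /* add dst, [src] */
}
static void Emit_sub_reg_indirect(Buffer *b, Register dst, Indirect src) {
  emit_rm64(b, 0x2b, dst, src);  /* sub dst, [src] */
}
static void Emit_cmp_reg_indirect(Buffer *b, Register left, Indirect right) {
  emit_rm64(b, 0x3b, left, right);  /* cmp left, [right] */
}
/* mul [src]: rdx:rax = rax * [src]; there is no explicit destination. */
static void Emit_mul_reg_indirect(Buffer *b, Indirect src) {
  emit_rm64(b, 0xf7, 4, src);
}
```

For add rax, [rbp-8] this produces 48 03 45 f8, matching the bytes in the + test below.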


As usual, we’ll close with some snippets of tests.

Here’s a test for +. I’m trying to see if inlining the text assembly with the hex makes it more readable. Thanks Kartik for the suggestion.

TEST compile_binary_plus(Buffer *buf) {
  ASTNode *node = new_binary_call("+", AST_new_integer(5), AST_new_integer(8));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // 0:  48 c7 c0 20 00 00 00    mov    rax,0x20
      0x48, 0xc7, 0xc0, 0x20, 0x00, 0x00, 0x00,
      // 7:  48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
      0x48, 0x89, 0x45, 0xf8,
      // b:  48 c7 c0 14 00 00 00    mov    rax,0x14
      0x48, 0xc7, 0xc0, 0x14, 0x00, 0x00, 0x00,
      // 12: 48 03 45 f8             add    rax,QWORD PTR [rbp-0x8]
      0x48, 0x03, 0x45, 0xf8};
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(13));
}

Here’s a test for <.

TEST compile_binary_lt_with_left_greater_than_right_returns_false(Buffer *buf) {
  ASTNode *node = new_binary_call("<", AST_new_integer(6), AST_new_integer(5));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_false(), result, "0x%lx");
}

There are more tests in the implementation, as usual. Take a look if you like.

This has been a more complicated post than the previous ones, I think. The stack allocation may not make sense immediately. It might take some time to sink in. Try writing some of the code yourself and see if that helps.

Next time we’ll add the ability to bind variables using let. See you then!

  1. You may also see an enter instruction paired with a leave instruction. enter is equivalent to the prologue above, and leave is equivalent to mov rsp, rbp followed by pop rbp.

September 15, 2020

Kevin Burke (kb)

How to Get a Human Operator on the California EDD Paid Family Leave line September 15, 2020 08:23 PM

The California EDD Paid Family Leave phone tree is like a choose-your-own-adventure book, where almost every path leaves you with no way to contact a human. This can be frustrating. But you can reach a human if you know the right buttons to press!

Here is how to reach a human:

  • Call the EDD Paid Family Leave number at 877-238-4373.

  • Press '1' for "benefit information."

  • Follow the prompts to enter your SSN, zip code, date of birth, and weekly benefit amount.

  • The computer will read you an automated list of information about your claim.

The computer will then read a list of prompts. Wait!! After the computer asks if you want to go back to the main menu it will say "press 0 to speak to a human." Press 0 and then wait and you should get a human!

Unrelenting Technology (myfreeweb)

Burstable Graviton2 inst... September 15, 2020 08:23 PM

Burstable Graviton2 instances are now a thing. Cool! Changed the instance type for this website from a1.medium to t4g.micro so that Jeff Bezos gets less of my money :P (Basically no money until the end of this year, even — there’s a free trial for t4g.micro for all AWS accounts!)

September 14, 2020

Ponylang (SeanTAllen)

Last Week in Pony - September 13, 2020 September 14, 2020 12:20 AM

A Pony talk given by Sophia Drossopoulou is now available on InfoQ.

September 13, 2020

Derek Jones (derek-jones)

Learning useful stuff from the Reliability chapter of my book September 13, 2020 09:38 PM

What useful, practical things might professional software developers learn from my evidence-based software engineering book?

Once the book is officially released I need to have good answers to this question (saying: “Well, I decided to collect all the publicly available software engineering data and say something about it”, is not going to motivate people to read the book).

This week I checked the reliability chapter; what useful things did I learn (combined with everything I learned during all the other weeks spent working on this chapter)?

A casual reader skimming the chapter would conclude that little was known about software reliability, and they would be right (I already knew this, but I learned that we know even less than I thought was known), and many researchers continue to dig in unproductive holes.

A reader with some familiarity with reliability research would be surprised to see that some ‘major’ topics are not discussed.

The train wreck that is machine learning has been avoided (not forgetting that the data used is mostly worthless), mutation testing gets mentioned because of some interesting data (the underlying problem is that mutation testing assumes that coding mistakes are local to one line, but in practice coding mistakes often involve multiple lines), and the theory discussions don’t mention non-homogeneous Poisson process as the basis for software fault models (because this process is not capable of solving the questions asked).

What did I learn? My highlights include:

  • Anne Chao’s work on population estimation. The takeaway from this work is that if people want to estimate the number of remaining fault experiences, based on previously experienced faults, then every occurrence (i.e., not just the first) of a fault needs to be counted,
  • Phyllis Nagel and Janet Dunham’s top read work on software testing,
  • the variability in the numeric percentage that people assign to probability terms (e.g., almost all, likely, unlikely) is much wider than I would have thought,
  • the impact of the distribution of input values on fault experiences may be detectable,
  • really a lowlight, but there is a lot less publicly available data than I had expected (for the other chapters there was more data than I had expected).

The last decade has seen fuzzing grow to dominate the headlines around software reliability and testing, and provide data for people who write evidence-based books. I don’t have much of a feel for how widely used it is in industry, but it is a very useful tool for reliability researchers.

Readers might have a completely different learning experience from reading the reliability chapter. What useful things did you learn from the reliability chapter?

Patrick Louis (venam)

Did You Know Fonts Could Do All This? September 13, 2020 09:00 PM

Confusing Mexican Calendar, at least for those not in the know

Freetype, included in the font stack on Unix, is quite complex. There are so many layers to get it to do what it does that it’s easy to get lost. From finding the font, to actually rendering it, and everything in between.
Like most of the world, I use a rather low screen resolution (1366x768 at 96 dpi) and a rather old-ish laptop, unlike some font designers who live in a filter bubble where everyone has the latest MacBook. Thus, good and legible font rendering is important.
Let’s play with the lesser-known toggles available to us when it comes to font rendering, see what they do, and have fun exploring the possibilities.

A General Picture

Generally, to make a font look better on screens, which are arrays of pixels, we use a combination of these three:

  • Antialiasing: Applying a light shade around the glyph. It is useful at small scale, when you don’t have enough pixels, but it makes most glyphs look bolder.

Font anti-alias example

  • Subpixel rendering: A technique similar to antialiasing but using subpixels, the color components inside the pixels. By applying a small amount of color on the sides you can reach more granular precision. However, if applied clumsily, or if you simply move the window containing the text, these colored subpixels will become apparent, what we call color fringing.

Font sub-pixel rendering example

  • Hinting: Pixels are blocks but text is made of curves, that means these curves will never match exactly with screen pixels. Hinting is about repositioning or selecting the closest pixels while trying as much as possible to keep the shape of the glyph intact. There are multiple levels of hinting, hinting information provided by the font itself (bytecode interpreter hinting), and hinting provided by the rendering library (auto-hinting).

Font hinting example

NB: “It’s just text”… This article is yet another that shows how fonts aren’t as easy as they look. For more info about the font stack, please visit my previous article on the topic, and if you want an idea of what it means to draw them on the screen, take a look at this article.

What is applied, when, how to control all of this, can we see what they do, and should we even care?

Freetype and fontconfig default rendering these days is pretty good, so there shouldn’t be anything to worry about, until there’s something to worry about, like a font not looking the way you want.
Our first stop will be something that intrigued me because I haven’t heard many people talk about it: the Freetype driver’s properties.
The Freetype driver is used whenever hinting is needed, so this is the part it actually changes — how hinting is applied.

Getting The Right Tools For The Task

Let’s start with arming ourselves with ways to easily test all this.
Freetype2 demo utilities are a must; you can clone them here or fetch them from your package repository, for example on Debian and Arch Linux.
These will give you a bunch of useful tools such as ftdiff, ftview, ftstring, ftgrid, fttimer, ftbench, and others. The most important ones for us are ftdiff and ftgrid.

Example usage:

ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf
ftgrid -r 96 -f 20 10 ~/.local/share/fonts/times.ttf
ftstring -r 96 -m 'Hello World!' 10 ~/.local/share/fonts/times.ttf

Additionally, you can install pango-view from pango-tools to later test whether fontconfig applies your configurations properly. To use it, write a file in Pango markup and display it with pango-view --markup file.pangpang.
To see which font is actually loaded, raise fontconfig's debug level by setting the FC_DEBUG environment variable to something like 4096, i.e. FC_DEBUG=4096.

More values can be found here; we'll use them later to see if our fontconfig settings are applied properly:

Name         Value    Meaning
MATCH            1    Brief information about font matching
MATCHV           2    Extensive font matching information
EDIT             4    Monitor match/test/edit execution
FONTSET          8    Track loading of font information at startup
CACHE           16    Watch cache files being written
CACHEV          32    Extensive cache file writing information
PARSE           64    (no longer in use)
SCAN           128    Watch font files being scanned to build caches
SCANV          256    Verbose font file scanning information
MEMORY         512    Monitor fontconfig memory usage
CONFIG        1024    Monitor which config files are loaded
LANGSET       2048    Dump char sets used to construct lang values
MATCH2        4096    Display font-matching transformation in patterns
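The FC_DEBUG values above are bit flags, so several can be combined by adding them; FC_DEBUG=1025, for example, asks for both MATCH and CONFIG output. The Python sketch below only reuses the names and values from the table; the encode and decode helpers are hypothetical names, not part of fontconfig.

```python
# FC_DEBUG bit flags, copied from the table above.
FC_DEBUG_FLAGS = {
    "MATCH": 1,
    "MATCHV": 2,
    "EDIT": 4,
    "FONTSET": 8,
    "CACHE": 16,
    "CACHEV": 32,
    "PARSE": 64,
    "SCAN": 128,
    "SCANV": 256,
    "MEMORY": 512,
    "CONFIG": 1024,
    "LANGSET": 2048,
    "MATCH2": 4096,
}


def encode(names):
    """Combine flag names into a single FC_DEBUG value (bitwise OR)."""
    value = 0
    for name in names:
        value |= FC_DEBUG_FLAGS[name]
    return value


def decode(value):
    """List the flag names whose bits are set in an FC_DEBUG value."""
    return [name for name, bit in FC_DEBUG_FLAGS.items() if value & bit]


if __name__ == "__main__":
    print(encode(["MATCH", "CONFIG"]))  # 1025
    print(decode(4096))                 # ['MATCH2']
```

For instance, running FC_DEBUG=1025 pango-view --markup file.pangpang should print brief matching information together with the configuration files being loaded.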

Yet another way is to test directly in your browser URL bar:

data:text/html,<meta charset="utf8"><p style="font-family: Times New Roman;">Hello World</p>

The Freetype2 Drivers Properties

So let's get back to testing the Freetype2 drivers.
This documentation page lists the ft (Freetype) properties, each of which affects the behavior of a particular driver. They are set through the FREETYPE_PROPERTIES environment variable, normally loaded from /etc/profile.d/.
However, most of the ones listed target the CFF, Type 1, and CID font drivers rather than TrueType fonts, so they do nothing if you don't have fonts of these types. The only toggle available for TrueType is interpreter-version, which controls the bytecode interpreter, the rasterizer, and thus how the outline gets hinted.

The options available to us are the following:

  • 35 — For classic mode GDI (Win 98/2000)
  • 38 — GDI+ old (Vista, Win 7), Infinality, considered slow
  • 40 — For minimal mode (stripped down Infinality, this is the default) (After Win 7)

It's a bit odd that we jump from 35 to 38; where did 36 and the rest go? The answer is that the Freetype devs chose to include only these versions and not the ones in between.

The differences look as follows; notice the native hinter in the left column:

  • v35
FREETYPE_PROPERTIES="truetype:interpreter-version=35" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v35

FREETYPE_PROPERTIES="truetype:interpreter-version=35" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v35

  • v38
FREETYPE_PROPERTIES="truetype:interpreter-version=38" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v38

FREETYPE_PROPERTIES="truetype:interpreter-version=38" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v38

  • v40
FREETYPE_PROPERTIES="truetype:interpreter-version=40" ftdiff -r 96 -s 10 ~/.local/share/fonts/times.ttf

ftdiff interpreter v40

FREETYPE_PROPERTIES="truetype:interpreter-version=40" ftgrid -r 96 -f 36 10 ~/.local/share/fonts/times.ttf

ftgrid interpreter v40
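Typing the same command three times gets repetitive. Below is a minimal Python sketch that opens ftdiff once per interpreter version, assuming ftdiff is on your PATH and accepts the -r and -s options used above; ftdiff_command and compare are hypothetical helper names.

```python
import os
import subprocess
import sys

# TrueType interpreter versions discussed above.
VERSIONS = (35, 38, 40)


def ftdiff_command(font_path, version, dpi=96, size=10):
    """Build the environment and argv for one ftdiff run.

    Note: this replaces any FREETYPE_PROPERTIES value already set,
    e.g. by a script in /etc/profile.d/.
    """
    env = dict(os.environ)
    env["FREETYPE_PROPERTIES"] = f"truetype:interpreter-version={version}"
    argv = ["ftdiff", "-r", str(dpi), "-s", str(size), font_path]
    return env, argv


def compare(font_path):
    # ftdiff is interactive: each run blocks until its window is closed,
    # then the next interpreter version is opened.
    for version in VERSIONS:
        env, argv = ftdiff_command(font_path, version)
        print(f"interpreter-version={version}")
        subprocess.run(argv, env=env, check=False)


if __name__ == "__main__":
    compare(os.path.expanduser(sys.argv[1]))
```

Run it with the font path as its only argument, e.g. ~/.local/share/fonts/times.ttf; swapping "ftdiff" and "-s" for "ftgrid" and its own options would give the grid views instead.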

We can also test using pango-view (remember again that this should be a font rendered with its native hinting, not with the auto-hinter):

<span font_family="Times New Roman" font="10" foreground="black" alpha="83%">
Lorem ipsum dolor sit amet, c
onsectetur adipiscing elit, s
ed do eiusmod tempor incididu
nt ut labore et dolore magna 
aliqua. Ut enim ad minim venia
m, quis nostrud exercitation u
llamco laboris nisi ut aliquip
ex ea commodo consequat. Duis 
aute irure dolor in reprehende
rit in voluptate velit esse ci
llum dolore eu fugiat nulla pa
riatur. Excepteur sint occaeca
t cupidatat non proident, sunt
in culpa qui officia deserunt 
mollit anim id est laborum.
</span>

You can also change the font via the --font= argument of pango-view.

FREETYPE_PROPERTIES="truetype:interpreter-version=35" pango-view --markup text.bangarang
  • v35

pango interpreter v35

  • v38

pango interpreter v38

  • v40

pango interpreter v40

So older interpreter versions were clearly rougher with hinting: glyphs come out much bolder and can be deformed. The newer ones are more minimal with it. We also notice that the auto-hinter isn't that bad and that avoiding hinting can help. I picked the Windows font ‘Times New Roman’ specifically because it has a reputation for rendering badly with Freetype, mostly because of the job the interpreter does. Applying very light or no hinting at all helps tremendously, even at very small point sizes, as you can see in the next comparison. Hinting does help legibility at this scale, but the font's shape and personality are completely destroyed.

From left to right: v35, v38, v40.

pango interpreter small point comparison

How Fontconfig Works

We're not done with hinting yet, since several levels of hinting can be applied, but let's first take a detour and learn a bit about fontconfig and how to use it.

Fontconfig is the layer in the font stack responsible for loading fonts along with the configurations that tell the next layer how to find the font file and what changes to apply when rendering it. It usually consists of a library, a set of preset configuration files, and a bunch of helpful tools whose names start with fc-, such as fc-cache, fc-query, fc-match, and fc-conflist, to name a few.

The configuration files are usually found in /etc/fonts/, split into the available presets in /etc/fonts/conf.avail and the chosen presets in /etc/fonts/conf.d, which are symbolic links to the former.
The precedence of the rules is alphanumerical, on a first-come first-served principle, so 01-custom-rule.conf will be loaded before 99-not-important-rule.conf. Local user configurations, in the user's $XDG_CONFIG_HOME/fontconfig directory, are loaded by one of these configurations that contains an include statement. On my machine it is 50-user.conf, so its precedence is lower than anything loaded before it. This isn't practical when testing rules, so rename this file to something like 01-user.conf. Now anything you put in $XDG_CONFIG_HOME/fontconfig/conf.d or $XDG_CONFIG_HOME/fontconfig/fonts.conf should have priority.
You can make sure the order and configurations are loaded properly with the fc-conflist command. It lists the configurations found in order of precedence; the ones starting with + are loaded, the ones starting with - are not.
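To see why renaming 50-user.conf to 01-user.conf moves it to the front, here is a minimal Python sketch of the ordering. It assumes, as I understand fontconfig's directory loading, that only file names starting with a digit and ending in .conf are read, sorted byte-wise; load_order is a hypothetical helper, and fc-conflist remains the authoritative check.

```python
def load_order(filenames):
    """Return conf.d entries in the order fontconfig would read them.

    Assumption: only names that start with an ASCII digit and end in
    ".conf" are read, sorted by plain byte-wise comparison.
    """
    candidates = [
        name for name in filenames
        if "0" <= name[:1] <= "9" and name.endswith(".conf")
    ]
    return sorted(candidates, key=lambda name: name.encode("utf-8"))


if __name__ == "__main__":
    conf_d = ["99-not-important-rule.conf", "50-user.conf",
              "10-hinting-slight.conf", "README", "01-custom-rule.conf"]
    for name in load_order(conf_d):
        print(name)
```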

These files are mainly composed of 4 components:

  • Match rules: If something matches, then edit the properties mentioned. There are tons of matching and editing rules, including things like the name of the program currently trying to load the fonts, and even custom properties. You can also match at different times: when looking for a pattern/font, after finding the font, or when scanning the font.
  • Aliases creation: An alias is a font name shorthand, it’s useful when querying generic family names such as “monospace”.
  • Inclusion of other configurations: There can be so many configuration files that it’s good practice to split them.
  • Where to look for settings and fonts, and whether some fonts should be skipped entirely (for example non-scalable bitmap fonts): You may think that the location of fonts is a constant value, but it's not. For example, on my machine it's set in /etc/fonts/fonts.conf as:
<!-- Font directory list -->
<dir prefix="xdg">fonts</dir>
<!-- the following element will be removed in the future -->

Editing XML files is cumbersome, and unfortunately there aren't many GUIs or simpler tools to set these today. I've found only one so far, named fontweak, and it isn't complete.
It's a shame, because it's rare to find people who have a clue about how to actually set up font configuration nicely.

If you want more info, you can consult man 5 fonts-conf. The content is heavy and can be confusing, but it's still great.

NB: Fontconfig is not enough to configure every graphical program; some programs load font settings in a simpler way, through Xresources (the RESOURCE_MANAGER of X).

Testing Different Hinting

Let's close this detour and get back to hinting.
Fontconfig has 4 settings related to it: one is a matching criterion and the other three are edit rules. They are the following:

  • fonthashint: Matching test to check if the font has built-in hints, namely bytecode interpreter hinting.
  • hinting: If set to true, it tells the next phase, the rasterizer, that hinting in general will be applied.
  • autohint: Use the autohinter instead of the normal hinter. This skips the bytecode interpreter entirely.
  • hintstyle: The harshness of the hinting that will be applied. It can be hintnone, hintslight, hintmedium, or hintfull. Note that these will use a mix of the autohinter and the bytecode interpreter if the font has hints. For example, hintslight will snap to the vertical grid only, while hintmedium and hintfull will also snap harder to the horizontal grid.
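Since these settings are plain XML, it can be handy to generate a rule rather than type it. A minimal Python sketch using the standard xml.etree.ElementTree module, assuming you want the rule applied at match target="font"; hinting_match is a hypothetical helper, and its output is meant to be pasted into your user fonts.conf.

```python
import xml.etree.ElementTree as ET

HINTSTYLES = ("hintnone", "hintslight", "hintmedium", "hintfull")


def hinting_match(hinting=True, autohint=False, hintstyle="hintslight"):
    """Build a <match target="font"> element assigning the three
    hinting-related edit rules described above."""
    if hintstyle not in HINTSTYLES:
        raise ValueError(f"unknown hintstyle: {hintstyle}")
    match = ET.Element("match", target="font")
    # hinting and autohint take booleans...
    for name, value in (("hinting", hinting), ("autohint", autohint)):
        edit = ET.SubElement(match, "edit", name=name, mode="assign")
        ET.SubElement(edit, "bool").text = "true" if value else "false"
    # ...while hintstyle takes one of the named constants.
    edit = ET.SubElement(match, "edit", name="hintstyle", mode="assign")
    ET.SubElement(edit, "const").text = hintstyle
    return match


if __name__ == "__main__":
    root = ET.Element("fontconfig")
    root.append(hinting_match(autohint=True, hintstyle="hintslight"))
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    print(ET.tostring(root, encoding="unicode"))
```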

In practice, what does this mean? Let's see what a font looks like with combinations of these hinting settings.
Remember that if you're having trouble applying these configurations in your user fontconfig file, you can set the FC_DEBUG environment variable we mentioned before. Always make sure everything loads properly by checking fc-conflist and the currently applied match rules via fc-match --verbose YourFontSearchHere.

Let's test with hinting enabled, autohint enabled, and full-on grid snapping.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>true</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintfull</const></edit>

Test Hinting autohint+hintfull

What about disabling autohint while keeping full-on grid snapping?

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>false</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintfull</const></edit>

Test Hinting no-autohint+hintfull

Not so pretty. Maybe snapping only vertically is better; let's try without the autohinter and with slight hinting.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>false</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>

Test Hinting no-autohint+hintslight

Better, but it still looks too bold. Let's try the autohinter again, this time with softer hinting.

<edit mode="assign" name="hinting"><bool>true</bool></edit>
<edit name="autohint" mode="assign"><bool>true</bool></edit>
<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>

Test Hinting autohint+hintslight

It looks very similar to full hinting; let's test without hinting at all.

<edit mode="assign" name="hinting"><bool>false</bool></edit>

Test Hinting disabled

It seems like the auto-hinter is doing a good job at aligning the letters vertically in a subtle way. When zoomed in, you can clearly see how the letters seem a bit more compressed with the auto-hinter turned on.

Test Hinting vs No-Hinting

Overall, for the specific font I tested, “Times New Roman”, no hinting at all or slight auto-hinting look best on my display.

Subpixel Rendering

Let’s move to subpixel rendering.
Fontconfig offers a few presets, set through the lcdfilter property, for how harshly subpixel rendering is done. lcddefault is color-balanced and normalized; lcdlegacy is neither normalized nor color-balanced and uses any subpixels it can find; lcdlight is similar to lcddefault but applies a lighter filter to the surrounding pixels; and lcdnone disables it.
Additionally, there are ways to enable Microsoft's ClearType subpixel rendering by recompiling Freetype (disabled by default because of patents), and to tweak the subpixel rendering matrix by manually editing the Freetype code. But why go through the hassle?

Before testing these, you should find out your screen's subpixel geometry by consulting this page, and set it as the rgba property. Normally, preset files such as 10-sub-pixel-rgb.conf already come installed, so you simply have to symlink them into the /etc/fonts/conf.d directory.
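If no preset matches what you want, the rule can be written by hand. Below is a minimal Python sketch that builds one with the standard xml.etree.ElementTree module, assigning both the subpixel geometry and the LCD filter. The accepted constants are the ones fontconfig documents for rgba and lcdfilter; applying them at target="font" is my assumption, and subpixel_match is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Constants fontconfig accepts for these two properties.
RGBA = ("unknown", "rgb", "bgr", "vrgb", "vbgr", "none")
LCD_FILTERS = ("lcdnone", "lcddefault", "lcdlight", "lcdlegacy")


def subpixel_match(rgba="rgb", lcdfilter="lcddefault"):
    """Build a <match target="font"> assigning the subpixel geometry
    (rgba) and the LCD filter preset (lcdfilter)."""
    if rgba not in RGBA:
        raise ValueError(f"unknown rgba value: {rgba}")
    if lcdfilter not in LCD_FILTERS:
        raise ValueError(f"unknown lcdfilter value: {lcdfilter}")
    match = ET.Element("match", target="font")
    for name, const in (("rgba", rgba), ("lcdfilter", lcdfilter)):
        edit = ET.SubElement(match, "edit", name=name, mode="assign")
        ET.SubElement(edit, "const").text = const
    return match


if __name__ == "__main__":
    print(ET.tostring(subpixel_match("rgb", "lcddefault"), encoding="unicode"))
```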

NB: These tests don’t seem to show differences with pango-view but starting any other graphical program should be enough.
NB: Fringes are more apparent with white text on black background.

Here's the result of the comparison: you can clearly see the fringes when the wrong subpixel geometry is chosen (my screen has rgb geometry). Also, disabling subpixel rendering entirely seems like a very good choice for bitmap fonts, so keep this in mind.

Test Subpixel geometry comparison

I tried to spot the differences between lcddefault, lcdlight, and lcdlegacy, but they are so minimal that they aren't worth mentioning, so lcddefault should be fine in most cases. Someone made a comparison on this website if you want to check.

NB: It is rare, but if fonts look deformed on your screen it might be because your DPI isn't detected properly by fontconfig. Find it on X11 with xdpyinfo | grep -B 2 resolution and set it with the following match (96 is only an example value; use the one reported for your screen):

<match target="pattern">
	<edit name="dpi" mode="assign"><double>96</double></edit>
</match>
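The two steps, reading the resolution and writing the rule, can be combined. A minimal Python sketch, assuming an X11 session where xdpyinfo prints lines like "resolution:    96x96 dots per inch" and that the horizontal value of the first screen listed is the one you want; parse_dpi and dpi_match are hypothetical helpers.

```python
import re
import subprocess
import xml.etree.ElementTree as ET

# xdpyinfo prints lines such as "  resolution:    96x96 dots per inch".
RESOLUTION_RE = re.compile(r"resolution:\s+(\d+)x(\d+)\s+dots per inch")


def parse_dpi(xdpyinfo_output):
    """Return the horizontal DPI of the first screen listed."""
    found = RESOLUTION_RE.search(xdpyinfo_output)
    if found is None:
        raise ValueError("no resolution line found")
    return int(found.group(1))


def dpi_match(dpi):
    """Build the <match target="pattern"> rule that assigns the DPI."""
    match = ET.Element("match", target="pattern")
    edit = ET.SubElement(match, "edit", name="dpi", mode="assign")
    ET.SubElement(edit, "double").text = str(dpi)
    return match


if __name__ == "__main__":
    output = subprocess.run(["xdpyinfo"], capture_output=True,
                            text=True, check=True).stdout
    print(ET.tostring(dpi_match(parse_dpi(output)), encoding="unicode"))
```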


Antialiasing is the setting you should almost never turn off, unless your font is a bitmap (non-scalable) font.
This picture clearly shows the advantage of antialiasing on scalable fonts. On the right is the non-antialiased version.

Test Anti-Alias comparison

Weird things happen when the 10-scale-bitmap-fonts.conf preset is present. The following image shows a bitmap font with hinting and antialiasing disabled on the left and enabled on the right. Removing this file should fix the font and show it as crisply as possible.

Test Anti-Alias bitmap

NB: If you want to convert bitmap (pcf/bdf) fonts into a format supported by Pango, see this thread on the forums.

Applying What We’ve Learned

Some fonts, such as Windows fonts, are known to render badly with Freetype. So let's use what we've learned to make them look better.

You can get a copy of the Windows fonts from a Windows machine; they are in the C:\Windows\Fonts\* directory (PS: for legal reasons, I do not take responsibility if you do this).
Once you have the fonts, put them in either $XDG_DATA_HOME/fonts (usually $HOME/.local/share/fonts) or $XDG_DATA_DIRS/fonts (usually /usr/share/fonts).
Make sure you have followed the previous advice of renaming 50-user.conf to 01-user.conf, and confirm that your local font configuration comes first by running fc-conflist.

Now let's list the family names of all the Windows fonts we got:

fc-query --format='%{family}\n' * | sort | uniq
  • Arial
  • Arial Black
  • Calibri
  • Calibri Light
  • Cambria
  • Cambria Math
  • Comic Sans MS
  • Consolas
  • Georgia
  • Impact
  • Javanese Text
  • Segoe Print
  • Segoe Script
  • Segoe UI
  • Segoe UI Emoji
  • Segoe UI Historic
  • Segoe UI Black
  • Segoe UI Light
  • Segoe UI Semibold
  • Segoe UI Semilight
  • Segoe UI Symbol
  • Tahoma
  • Times New Roman
  • Trebuchet MS
  • Verdana
  • Webdings
  • Wingdings

And let’s add some rules to our fontconfig file as follows:

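<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
	<description>Make Windows Font Look Good</description>

	<!-- Tag every Windows family with the custom iswindowsfont property -->
	<match target="font">
		<test name="family" qual="any" compare="eq">
			<string>Arial</string>
			<string>Arial Black</string>
			<string>Calibri</string>
			<string>Calibri Light</string>
			<string>Cambria</string>
			<string>Cambria Math</string>
			<string>Comic Sans MS</string>
			<string>Consolas</string>
			<string>Georgia</string>
			<string>Impact</string>
			<string>Javanese Text</string>
			<string>Segoe Print</string>
			<string>Segoe Script</string>
			<string>Segoe UI</string>
			<string>Segoe UI Emoji</string>
			<string>Segoe UI Historic</string>
			<string>Segoe UI Black</string>
			<string>Segoe UI Light</string>
			<string>Segoe UI Semibold</string>
			<string>Segoe UI Semilight</string>
			<string>Segoe UI Symbol</string>
			<string>Tahoma</string>
			<string>Times New Roman</string>
			<string>Trebuchet MS</string>
			<string>Verdana</string>
			<string>Webdings</string>
			<string>Wingdings</string>
		</test>
		<edit name="iswindowsfont" mode="assign">
			<bool>true</bool>
		</edit>
	</match>

	<!-- Rendering settings for every tagged font. Values assumed: slight
	     auto-hinting, one of the two best options in the tests above;
	     embedded bitmaps skipped. Adjust to taste. -->
	<match target="font">
		<test name="iswindowsfont" compare="eq">
			<bool>true</bool>
		</test>
		<edit mode="assign" name="hinting"><bool>true</bool></edit>
		<edit name="autohint" mode="assign"><bool>true</bool></edit>
		<edit mode="assign" name="hintstyle"><const>hintslight</const></edit>
		<edit mode="assign" name="antialias"><bool>true</bool></edit>
		<edit name="embeddedbitmap" mode="assign"><bool>false</bool></edit>
	</match>
</fontconfig>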


File also hosted here

This may look like a big script, and it might be the first time you've seen someone write such a script for fontconfig, but don't worry. It's pretty simple overall: it checks the font's family name and, if it matches, sets a custom property iswindowsfont to true. Then, if this property is set, it applies the values we want for this group of fonts. You can play with the values if you aren't satisfied; the grouping should help.
You shouldn't even have to run fc-cache; this should take effect as soon as you restart an application that uses fontconfig.

fc-match --verbose 'Cambria' | grep iswindowsfont
# iswindowsfont: True(w)
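To check such properties from a script rather than with grep, here is a minimal Python sketch that runs fc-match --verbose and parses its output. It assumes each top-level property is printed on its own line, indented by a single tab, as name: value (like the iswindowsfont line above), and that deeper-indented lines such as the charset dump can be ignored; parse_verbose and property_of are hypothetical helpers.

```python
import subprocess


def parse_verbose(output):
    """Map each top-level property name to its raw value string."""
    props = {}
    for line in output.splitlines():
        # Assumption: top-level properties sit exactly one tab deep;
        # the header line and nested lines (e.g. charset) are skipped.
        if not line.startswith("\t") or line.startswith("\t\t"):
            continue
        name, sep, value = line[1:].partition(":")
        if sep:
            props[name] = value.strip()
    return props


def property_of(font, name):
    """Run fc-match --verbose for a font and return one raw property."""
    output = subprocess.run(["fc-match", "--verbose", font],
                            capture_output=True, text=True,
                            check=True).stdout
    return parse_verbose(output).get(name)


if __name__ == "__main__":
    print(property_of("Cambria", "iswindowsfont"))
```

Once the rules are in place, property_of("Cambria", "iswindowsfont") should return the same True(w) string that grep shows.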


This is it for this post.
I hope you’ve learned a thing or two about font configurations with Freetype and Fontconfig and were surprised by at least one of them.

If you’ve enjoyed my article, have comments, suggestions, or simply want to say thanks, please leave a comment.



  • Internet Archive Book Images / No restrictions

Gonçalo Valério (dethos)

The app I’ve used for the longest period of time September 13, 2020 03:18 PM

What is the piece of software (app) you have used continuously for the longest period of time?

This is an interesting question. More than 2 decades have passed since I got my first computer. Throughout all this time my usage of computers evolved dramatically; most of the software I installed back then no longer exists or is so outdated that there is no point in using it.

Even the “type” of software changed: back then I didn't rely on so many web apps and SaaS (Software as a Service) products, which dominate the market nowadays.

The devices we use to run the software also changed; it's now common for people to spend more time on certain mobile apps than on their desktop counterparts.

In the last 2 decades, not only have user needs changed, but so have the communication protocols on the internet, the multimedia codecs, and the main “algorithms” for certain tasks.

It is true that many things changed; however, others haven't. There are apps that were relevant at the time, are still in use, and that I expect will still be around for many years.

I spent some time thinking about my answer to the question, given I have a few strong contenders.

One of them is Firefox. However, my usage of the browser was interrupted by periods when I tried other alternatives. I installed it when it was initially launched and I still use it nowadays, but my continuous usage time doesn't put it in first place.

I used Windows for 12 or 13 straight years before switching to Linux, but that is still not enough (I also don't think operating systems should count for this question, since for most people the answer would be Windows).

VLC is another contender, but like it happened to Firefox, I started using it early and then kept switching back and forth with other media players throughout the years. The same applies to the “office” suite.

The final answer seems to be Thunderbird. I’ve been using it daily since 2004, which means 16 years and counting. At the time I was fighting the ridiculously small storage limit I had for my “webmail” inbox, so I started using it to download the messages to my computer in order to save space. I still use it today for totally different reasons.

And you, what is the piece of software or app you have continuously used for the longest period of time?

September 11, 2020

Bit Cannon (wezm)

Finding an Alternative to iOS September 11, 2020 11:20 PM

I've used iPhones since 2008, adding thousands of dollars to Apple's giant pile of cash. Much like my move from macOS to Linux more than 3 years ago, Apple's recent behaviour has prompted me to consider iPhone/iOS alternatives. Join me on this journey into the world of Android and the lack of real choice that smartphones present in 2020.


For about 12 years I've owned iPhones, most bought outright, totalling thousands of dollars. I've held on to my most recent iPhone, an iPhone X, longer than any other. Contrary to claims of planned obsolescence, it still works well. I like technology though, and was planning to replace it this year and pass it on to my father.

Apple have recently ramped up their hostility towards the developers that make iOS the desirable platform it is. App Store horror stories are nothing new, but lately Apple seems to have really stepped up their desire to extract money from every developer's business, despite being one of the richest companies in the world. They seemingly do so without regard for whether the end-user experience is actually better for it.

Recent events, perhaps starting with the Hey saga and continuing with the ongoing battle with Epic, have not reflected well on Apple. Apple appears to see developers as owing them for the privilege of being in their store and using their APIs. This is despite app development requiring a yearly membership fee of AU$149 and the purchase of Mac hardware.

We understand that Basecamp has developed a number of apps and many subsequent versions for the App Store for many years, and that the App Store has distributed millions of these apps to iOS users. These apps do not offer in-app purchase — and, consequently, have not contributed any revenue to the App Store over the last eight years.

Apple App Review Board

Epic decided that it would like to reap the benefits of the App Store without paying anything for them.

— Apple legal submission, via Marco Arment

Apple: Epic only looking for a free ride

Epic, according to Apple, has given Apple $257,000,000 in commission fees in two years over in-app purchases that Apple has no hand, act, part in, doesn't host on their servers, just for the privilege of existing on their OS. ‘Free ride’.

Steve Troughton-Smith

To take just one example, Epic has for years used Apple's groundbreaking graphics technology, Metal. [..] Apple doesn't charge anything beyond its standard commission for the use of Metal or any of the other tools that Epic has used to develop great games on iOS.

— Philip Schiller, via Steve Troughton-Smith

The only alternative to Metal is OpenGL and Apple have deprecated that!

Anyway, whether you agree with Apple or not this whole thing has me (a developer by trade, and a past contributor to the App Store) feeling offside. Additionally, since I now use Linux full-time there are other sources of friction:

  • iPhones work best when paired with a Mac (or even a PC running Windows).
  • Apple only support building apps on Macs, so if I want to cobble together an app for my phone it's no longer possible.

My only real recourse as a consumer is voting with my wallet and perhaps sharing my reasoning on this blog, so here we are. If enough people do this maybe they will take notice, maybe they won't, but I feel I at least need to try. Just like last time, when I sought a replacement for Mac OS X and switched to Linux, I have been evaluating alternatives to iOS.

It's worth noting at this point that I really dislike Google. I distanced myself from all of their services about 8 years ago. The only Google service I use regularly is YouTube. I use Fastmail for email, DuckDuckGo for search, Apple + Flickr for photos, Mattermost, iMessage, Matrix, and Telegram for chat.

Evaluating Alternatives

Initial research turned up the following candidates. Almost all were immediately written-off due to lacking apps or being too immature:

  • Android as shipped on a mainstream phone
    • Full of apps and services dependent on Google.
  • LineageOS
  • LineageOS for microG
    • LineageOS with microG compatibility library to allow running apps that rely on Google APIs, without using Google services.
  • postmarketOS
    • Good in theory: An Alpine Linux based OS for your phone. However, it notes, "Beta version. Calls don't work on most phones yet", on the home page.
  • Librem 5 + PureOS
    • By all accounts the expensive hardware is still not great quality and the software is still being built.
  • LuneOS (WebOS)
    • Very small ecosystem.
  • Sailfish OS
    • Bills itself as, "the mobile OS solution for corporations and governments", right on the front page. I am neither of these things.
  • Give up on a smartphone
    • Get a basic phone for calls and texts and do everything else on a real computer, possibly an ultra compact like the GPD Pocket 2.
    • A friend who has never owned a smartphone talked me out of this. It's possible but very inconvenient, especially since some things, like ride sharing, are only possible with a smartphone.

Turns out duopolies suck: you can choose some modicum of respect for privacy with developer hostile Apple, or get a bit more freedom with surveillance capitalist Google. The candidates that seem most viable for me are LineageOS, and LineageOS for microG. To test out this theory I purchased the cheapest phone supported by LineageOS that I could get new: a Redmi 7 by Xiaomi for AU$175.

For the price I was honestly expecting this phone to be hot garbage. It is in fact much better than I expected. However, this was just a platform for testing the software ecosystem; I won't be reviewing the hardware or letting it colour my impressions of Android. If this experiment goes well, my plan is to buy a higher quality phone to replace my iPhone.


I spent a small amount of time with the stock ROM[1] (MIUI) that the Redmi comes with to get a bit of a baseline. It worked well enough and was fairly aesthetically pleasing, but the ads and tracking were truly horrifying. Just take a look at this post describing the steps required to disable data collection and ads, and this is just what you can turn off. Who knows what else it's doing behind the scenes.

LineageOS + Open GApps

I quickly nuked MIUI and installed LineageOS + Open GApps (nano). Open GApps gives you access to some of Google's closed-source apps and libraries, crucially the Google Play Store. The "Open" part of the name refers to the open-source scripts the project publishes for the generation of up-to-date Google Apps packages.

This ROM provides a decent balance between open-source Android and access to the breadth of the Google Play Store. In hindsight, the nano version of Open GApps includes more Google than I actually want. I think the ideal for me is the pico package, which is just what's needed to run the Google Play Store.

With this install I attempted to replace the apps that I use most on iOS. For the following apps I just used the Android version:

  • Authy
  • Deliveroo
  • Discord
  • Element (Matrix)
  • Fastmail
  • Firefox
  • Firefox Focus
  • Instagram
  • Mattermost
  • Reddit
  • Slack
  • Telegram
  • Up
  • YouTube

For these apps I found a replacement that I was mostly happy with:

For these apps I wasn't able to/have not yet found a replacement that I was happy with (please don't send me recommendations):

In general I don't find Android apps to be as nice, or as polished as iOS apps. John and Ben recently discussed this on the 9 September episode of Dithering, which matched my experience. I also really dislike the visual style and slow animations of the Material design language. Especially the circular animation on tap. The apps I like the most are the ones that shun the Material style for their own.

Something I learnt from my move to Linux though, was to embrace the platform's conventions as opposed to trying to reproduce the system you're moving from as much as possible. So I will put my dislike of Material aside.

Screenshot of the emoji keyboard on LineageOS

So Ugly

One thing I'm not sure I can put aside is the use of the super ugly Noto Color Emoji font for emoji on Android. On Linux my system emoji font is JoyPixels and I go to certain lengths to avoid seeing Noto Color Emoji. Almost any other widely available emoji font would be preferable to me. I did try side-loading a JoyPixels package when flashing the ROM but couldn't get it to stick. Apparently something changed in Android 10.

I could "root" the phone and swap out the font file but in the same way I've never jail-broken an iPhone this is not a path I want to go down right now. If worst comes to worst I could actually build LineageOS from source and swap out Noto Color Emoji — what a concept!

LineageOS for microG

microG is a library that implements various APIs provided by Google's closed-source libraries in order to run more apps, such as those that depend on Google's mapping APIs. The microG versions of the APIs don't rely on Google servers. Critically, going down this path you lose access to Google's push notification servers. Some apps like Telegram work around this, but for the most part you lose notifications.

LineageOS for microG gives a familiar LineageOS experience initially. Instead of the Google Play Store though, it uses F-Droid, a repository of strictly free and open-source software. As expected there are far fewer apps available on F-Droid. Most of the big names are missing.

I think if you were especially principled, were happy to use web apps for many things (like Twitter), and didn't use a smartphone all that much, LineageOS for microG could work. After spending some time with it though, it's just too limited for me.

Picking a New Phone

The experiment so far showed that I could probably get by with LineageOS + Open GApps. I started looking into what real phone to get as opposed to the Redmi 7 test phone. I had these requirements:

  1. I want a phone around the size of my iPhone X (5.8" display).
    • I find larger phones like the iPhone 6 Plus I owned, and Redmi 7 uncomfortable in pocket, especially when sitting.
  2. If I go with Android I want it to run LineageOS or similar (as little Google as possible).
  3. Available in Australia.

That basically only leaves Google Pixel 3 and 4 phones. Pixel 4 seems to have been a bit of a dud. It was discontinued after 9 months and the unreleased successor is rumoured to revert a bunch of the changes it introduced: back to fingerprint sensor, removal of radar gesture sensor. Pixel 3 (from 2018) seemed like it could be viable… but then I looked at GeekBench benchmarks:

  • Pixel 3 — 468 single core, 1833 multi-core
  • Pixel 4 — 610 single core, 2210 multi-core
  • iPhone X — 916 single core, 2334 multi-core

At the time of writing no Android phone is faster in single core performance than my iPhone X from 2017. The OnePlus 8 is at the top with a score of 900. It seems they caught up on multi-core ~last year (by having more cores).

So if the Pixel 3 is my main option I'd be spending money to upgrade to a significantly slower phone made by Google to escape Apple's restrictive, developer hostile, albeit more privacy respecting ecosystem… this is not immediately compelling.

Closing Thoughts

I'm really torn. The upcoming Pixel 5 would likely be a good option if it were possible to strip out as many of the Google dependencies as possible. If past releases are anything to go by, it's likely to be almost another year before LineageOS is available for the Pixel 5.

I don't like the idea of buying a Pixel 3 given that it's a step backwards performance-wise. After 3 years with the iPhone X I kind of want the replacement to perform close to it. Sadly, modern web pages and fake native apps (apps built with web tech) demand fast performance. For example, the Redmi 7 has a really hard time with long Medium articles.

Another option would be to just keep using the iPhone X. It still performs well, battery capacity is still 89% of new, it's still getting major iOS updates. And I'm still voting with my wallet by not giving Apple more money. I did however tell my Dad to hold off buying a new phone earlier in the year because he could have mine when I replace it. So I kind of need a new phone one way or another.

For now I'm going to wait for the Pixel 5 and new iPhones to be released later this year and continue to follow Apple's behaviour towards developers. It's not uncommon for them to actually listen to their customers eventually — often it takes longer than it feels it should though (*cough* butterfly keyboard). As usual subscribe to the feed, or follow me on Twitter or the Fediverse for future updates.


I'm using "ROM" (Read Only Memory) here knowing that it's incorrect, since that's the typical language for alternate OSes for Android phones.

Jan van den Berg (j11g)

Moby-Dick – Herman Melville September 11, 2020 06:20 PM

I suspect Moby-Dick, the quintessential Great American Novel, has the curious accolade of being one of the most famous books ever, while also being one of the least read. Its reputation greatly exceeds its appeal. Nonetheless, I had always wanted to read this extraordinary 170-year-old book. And now that I have, I think I understand its reputation as well as its incongruent appeal.

Moby-Dick stats

Moby-Dick clocks in at 650+ pages and 212,000 words. It’s not a small book, but it’s also not the biggest book I’ve ever read. It was, however, definitely one of the hardest, and one that demanded a dedicated and focused effort to finish.

Long story short: reading Moby-Dick is hard work and it’s not exactly the most riveting thing I ever read.

It doesn’t keep you on the edge of your seat. Surprisingly very little happens for such a big book. You can summarize the entire thing in one sentence (yes, I’ll get to the allegories later).

That is not to say that this is not a smart book. Herman Melville’s IQ probably bordered on genius and he pulled out all the stops with Moby-Dick. However, those two things don’t necessarily make for a good book. Why is it, then, that Moby-Dick is so revered? I can think of a few things.

Moby-Dick – Herman Melville (1851) – 656 pages. Don’t mind the sticker.

Words, just so.many.different.words

Melville’s dictionary must be the most abused book ever. Because if there was an Olympics for using the most different words, Herman Melville would win first, second and third place. This is actually a scientific fact: “About 44% of the distinct set of words in this novel occur only once”

Read that again: 44% of the distinct words in Moby-Dick are used only once.

If you don’t believe me just open this book on any page and you can tell this right away. Moby-Dick is not like any other book.

It is divided into 135 short chapters, plus one very important epilogue, and each chapter deals with a dedicated subject. It seems Melville took it as an exercise to fill each chapter with as many different words as he could. Not only that, he likes to use long, rambling sentences that run for half a page. There is also an enormous variation in style per chapter: from dialogue to scientific descriptions to inner thoughts to poetic, philosophical, or almost theatrical treatises. And to top it all off, this is all done in English from 170 years ago. That should give you an idea of what a chore it is to read.

And all of these things are reasons Moby-Dick stands out among other books. Another is because it’s about whaling.


Whaling in the 19th century was an astoundingly difficult and fantastical venture. If I hadn’t known about it and you explained it to me, I wouldn’t believe you. People actually set out on wooden ships for three or four years and just randomly sailed around the world until they found some whales?! Whales that are actual leviathans and that can kill any man in an instant? And when they did spot these whales, they set out in even smaller wooden boats to try to harpoon these 100-foot creatures, BY HAND?! Surely this is all made up! This cannot be real! But it is.

Whaling is an absolutely insane endeavour. And this makes it a terrific backdrop for a story.

I would argue that no man before or after has known more about whaling than Melville. Not only does he write from his own experience as a whaler, he had also probably read everything ever written (up to that point) about whaling and whales. And he uses all this knowledge to bombard the reader with more facts than your brain can handle, about whaling, whales and whalers.

He also shares detailed glimpses of 19th century Nantucket life, which makes this book a time capsule of the American spirit. These are the reasons this book is so revered in the English-speaking world; so much so that it is regarded as the definitive Great American Novel.

This is despite the book suffering greatly from negative reviews and criticism about alleged blasphemy. It wasn’t until a good 70 years later that Moby-Dick started to be regarded as the classic we all know now. (But that is a story in itself.)

Without the book cover. Gorgeous.


On to the good parts. Moby-Dick is not really about the demonstration of Melville’s mastery of language or even about whaling. These two things make it unique, but what makes it good is what is under the surface (see what I did there?).

This book is absolutely brimming with allegories, allusions and metaphors. Some are small, some encapsulate the entire plot, and some are even expressed through the book’s structure.

The most clear-cut one is, of course, that the whale Moby-Dick represents fate itself. But there are many more, philosophical or contemplative in nature. You could discuss and debate them endlessly.


There is one meta-allegory I particularly like. In Moby-Dick we read about a whaler, Ahab, who sets out to kill the mythical monster Moby-Dick, the sperm whale he previously lost his leg to. We, as readers, slowly get to experience how this whaler goes maniacally insane and takes his crew with him, until they all go under.
In a sense this is about Melville himself and his experience and difficulty writing this book! And we, the readers, are the crew.

This is just one take. There are also many more direct allegories, about names, stories and references. Specifically, the boats and captains Ahab and Ishmael meet along the way are loaded with biblical references and meaning. I am sure I missed a whole bunch too. Melville uses these narrative devices to deal with many different themes, and it is exactly this that sets Moby-Dick apart from other books. There are scores of things that aren’t said, but implied.

My copy of the book ends with a couple of letters from Melville about his book and his struggles in getting it published. Right after the letters the book, oddly enough, shares a couple of very negative reviews from the time of publishing. I am not sure why they are in there. Maybe to demonstrate that people did not recognize the genius at once? Or how remarkable it is that this book still became a classic? I am not sure.


All in all, Moby-Dick is a distinctive and unique reading experience, telling a story about a very specific time and endeavour. I can now boast “I read Moby-Dick”, and I am glad I did, but I will also say I didn’t really enjoy reading it all that much.

I think I understand what Melville set out to do and I admire his genius. I also think I understand the appeal of this book 170 years later. This book makes you work and that is not a problem, but there were times that I really had to force myself, and that does not happen to books that are favorites of mine.

Melville was a genius wordsmith and put many ideas in this book for people to contemplate for generations to come. But as is the case with music: I don’t care how many different notes a guitar player can hit in one minute; that is not music, that is a demonstration of mastery. In the end, it is about what songs this mastery produces. And in this case, I think I wanted to have liked the song more.

The post Moby-Dick – Herman Melville appeared first on Jan van den Berg.

September 09, 2020

Kevin Burke (kb)

Let employees sell their equity September 09, 2020 10:27 PM

Sometimes people choose to work for one company over another for reasons related to the work environment, for example what the company does, and whether the other employees create a place that's pleasant to work at. But a major factor is compensation. If Company A and Company B are largely comparable, but Company A offers $30,000 more in base pay per year than Company B, most people will choose Company A.

At tech companies, compensation usually breaks down into four components: company stock, benefits, cash salary, and bonus. When you get an offer from a company, these are the four areas that the recruiter will walk you through. The equity component is a key part of the compensation at startups. Small startups hope that the potential for a large payoff is worth sacrificing a few years of smaller base pay.

If you join a small startup and you get stock, you generally can't sell it until an "exit event" - an IPO or acquisition - even if your entire stock grant has vested. Generally, any stock sale before an exit event will require approval of the board, and the boards generally frown on stock sales, for reasons I will get into. So while you may own something that is worth a lot of money, you can't convert it into cash you can actually spend for a half decade or more.

By contrast, if you join a public company, your compensation includes equity that you can sell basically immediately after it vests, because it trades on a public exchange. There are hundreds of people who will compete to offer the best price for your shares every trading day between 9:30am and 4pm Eastern.

As an employee, how should you think about the equity component of your offer? One reason to take a big equity stake is to bet on yourself. If you have a great idea about how you can make the company 10%, 50%, or 200% more valuable, and you think you can execute it, you should take an equity stake! After you implement the changes, your equity will be massively more valuable. Broadly speaking this is what "activist investors" try to do; they have a theory about how to improve companies, they buy a stake and hope the value changes in line with the theory.

One problem with this is that you are much more likely to be in a position to make these changes if you are someone important, like a C-level executive or a distinguished engineer. However, most tech employees are not C-level executives. If you are an engineer on the fraud team, and you try really, really hard at your job for a year, maybe you can increase the value of the company by 1% or 2%. You are just not in a position, scope-wise, to drastically alter the trajectory of the company by yourself.

Rationally speaking, it does not make much sense for you, an engineer on the fraud team, to double or triple your effort just to make your equity stake worth 1% more. There might be other reasons to do it - you could really buy into the mission, or you hate being yelled at or whatever - but just looking at the compensation, whether you, personally, work really hard or slack off, your stock is probably going to be worth about the same. Unless you are the CEO or other C-level executive, at which point you have a big enough lever that your level of effort matters.

Another way to think about it is, imagine you have invested your money in a broad range of stocks and bonds, and then someone asked you to sell 30% of it and place it all in a single tech stock. Modern portfolio theory would suggest that that is a bad thing to do. You could gain a lot if the stock does well, but on the other hand, if the company's accountant was embezzling funds, or the company lost a lawsuit, or the company lost a database or had the factory struck by lightning or something, you could lose a ton of money that you wouldn't if you were better diversified. It's not worth the risk.

All this goes to say that employees should value their equity substantially less than an equivalent amount of cash. Outside of the C-level, you can't do much to make the equity more valuable, and an extra dollar worth of equity takes your portfolio further away from an ideal portfolio that you could buy if you just had cash. (For more on this topic you should read Lisa Meulbroek (hi, Professor Meulbroek), whose CV is criminally underrated.)

(On the flip side, if your company is small and valuable, it may have its pick of investors to take money from, and be able to dictate investment terms. Holding equity in a company like this is a way to approximate the "deal flow" of a good Silicon Valley investor - as an employee you are getting the chance to buy and hold stock in a company at prices that would not be accessible to you otherwise. This may be true of small, hot startups but it gets less and less true the bigger a company gets and the more fundraising rounds it goes through.)

One implication is that you should prefer to work at public companies. At a public company, you can take your equity compensation and immediately sell it and buy VT (or even QQQ) or whatever and be much better off because you are diversified. You can't do that at a private startup.

Another problem is that public companies tend to have better equity packages. I went through a round of interviews recently and I was stunned at how paltry the equity offers were from private, Series A-C companies. For most of the offers I received, the company valuation would need to increase by 8-20x for the yearly compensation to achieve parity with the first-year offer from a public SF-based company, let alone to exceed it. Even if they did achieve 4 doublings of their valuation, you might not be able to sell the private company stock, so you're still behind the public company.

I expect larger companies to have better compensation; it's part of the deal. But a differential that large, plus the cash premium of being able to sell instantly, makes it foolish to turn down the public company offer. 1

So how can you compete if you're a smaller company? The obvious answers are what they've always been: recruit people with backgrounds that bigger companies overlook, give people wild amounts of responsibility, sell people on the vision, commit to "not being evil" and actually follow through on it.

But you can also try to eliminate an advantage that public companies have by letting your employees sell their equity. Not just, like, one time, at a huge discount before you go public, or when you get to Stripe's size and want to appease your employees. But routinely; because your employees want to boost their cash base, or buy the stock market, or buy a vacation, or whatever.

There are some objections. Having more than 500 shareholders triggers SEC disclosure requirements, which can be a pain to deal with. So require employees to sell to other employees or existing investors. Cashing out entirely might send the wrong signals, so limit sales to 10-20% of your stake per calendar year. A liquid market might require repricing stock options constantly. So implement quarterly trading windows.

Executives might not want to see what the market value of your stock is at a given time. That's tougher. But a high day-to-day price might convince people to join when they otherwise wouldn't. A low price might convince you to change direction faster than waiting for the next fundraising round.

There are also huge benefits. Employees can cash in earlier in ways that are generally only available to executives. They can take some risk off the table. People who want to double up on their equity position can do so.

Finally, you may be able to attract employees you otherwise couldn't. A lot of folks who are turned off by the illiquidity of an equity offer might turn their heads when you describe how they can sell a portion at market value every year.

Big companies have big moats. One of them - the ability to convert stock to cash instantly - doesn't need to be one.

Thanks to Dan Luu and Alan Shreve for reading drafts of this post.

You may think they were lowballing me, but this was after negotiation with each. Another possibility is that I performed differently in the interviews for each, and the smaller companies offered me lower packages because they thought I did worse. I think I did about equally well in the interviews for each.

Patrick Louis (venam)

Notes About Compilers September 09, 2020 09:00 PM

Architect style wall, nothing really related but it looks good and gives a vibe

Compilers: these wonderful and intricate pieces of software that do so much, and that so many know little about. Similar to the previous article about computer architecture, I’ll take a look at another essential, but lesser known, CS topic: Compilers.
I won’t dive into much detail; I’ll keep this to my notes, definitions, and what I found intriguing and helpful.

General schema of a compiler’s pieces

A compiler is divided into a frontend and a backend. The frontend’s role is to parse the textual program, or whatever format the programmer uses to input the code, verify it, and turn it into a representation that’s easier to work with: an IR, or Intermediate Representation.
Everything after obtaining this intermediate representation, which is usually either a tree or three-address code, belongs to the backend, whose role is to optimize the code and generate output. That output can range from code in another programming language (a compiler that does this is called a transpiler) to specific machine code instructions.
These days many programming languages rely on tools that make these steps easier. For example, some use Lex and Yacc (or descendants such as Flex and Bison) to generate parts of the frontend, and many use LLVM to get a backend. In theory, LLVM’s backend can be plugged into any compiler frontend that emits LLVM IR, so every compiler relying on it benefits from the optimizations LLVM performs on that IR.

Personally, I’ve found that the most interesting parts are in the backend. While the frontend consists of gruesome parsing, things become fascinating when you realize that everything can be turned into three-address code: instructions with at most three addresses, one on the left side of the assignment and at most one operator on the right side, as in t1 = b * c.
From this point on, you can apply all kinds of optimizations: for example, representing the addresses accessed by loops over arrays as linear functions of the loop indices, reordering code when the dependences between data allow it, or tracking the lifetimes of values. In the backend you can also manage what the process will look like in memory, and implement garbage collection.
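To make the three-address idea concrete, here is a minimal Python sketch, assuming a hypothetical tuple-based expression tree where an interior node is `(op, left, right)` and a leaf is a variable name. The temporary names `t1`, `t2`, ... are invented for illustration and do not correspond to any particular compiler's IR.

```python
from itertools import count


def to_three_address(tree):
    """Flatten an expression tree into three-address instructions.

    A leaf is a variable name (str); an interior node is a tuple
    (op, left, right). Returns (instructions, address of the result).
    """
    code = []
    temps = count(1)  # yields 1, 2, 3, ... for fresh temporary names

    def gen(node):
        if isinstance(node, str):
            return node  # a leaf is already an address
        op, left, right = node
        a = gen(left)  # operands are translated first (postorder)
        b = gen(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {a} {op} {b}")  # one operator on the right side
        return t

    result = gen(tree)
    return code, result


def assign(target, tree):
    """Translate the assignment `target = tree` into three-address code."""
    code, result = to_three_address(tree)
    code.append(f"{target} = {result}")
    return code


if __name__ == "__main__":
    # a = b * c + d * e
    expr = ("+", ("*", "b", "c"), ("*", "d", "e"))
    for line in assign("a", expr):
        print(line)
```

For a = b * c + d * e this produces t1 = b * c, t2 = d * e, t3 = t1 + t2, a = t3. A real code generator would typically assign the last result directly to a instead of emitting the final copy; removing such copies is one of the optimizations that three-address code makes easy to express.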

Overall, learning a bit about compilers doesn’t hurt. It gives insights into the workings of the languages we use every day, removing the magic around them while keeping the awe and amazement.
So here are the rough notes and definitions I took while learning about compilers. I hope they help someone going down the same path, as there’s a lot of jargon involved.


  • Terminals: Basic symbols from which strings are formed, also called token names.

  • Nonterminals: Syntactic variables that denote sets of strings. They help define the language generated by the grammar, imposing a hierarchical structure on the language that is key to syntax analysis and translation.

  • Production: What nonterminals produce: the manner in which terminals and nonterminals can be combined to form strings. A production has a head (left side) and a body (right side), separated by -> or, in BNF, ::=.

  • Grammar: The combination of terminal symbols, nonterminal symbols, productions (what nonterminals output), and a start symbol.

  • Context free grammar: It has 4 components: terminal symbols (tokens); nonterminal symbols (syntactic variables, each denoting a set of strings of terminals); productions (a nonterminal called the head or left side, an arrow, and a sequence of terminals and/or nonterminals called the body or right side); and the designation of one nonterminal as the start symbol.

  • The language: The set of strings of terminals that we can derive from the grammar’s start symbol.

  • Parse Tree: A tree that shows how the start symbol of a grammar derives a string in the language. Its interior nodes are labeled by nonterminals and its leaves by terminals (or ε); reading the leaves from left to right gives the derived string, called the yield of the tree.

  • Parsing: The process of finding a parse tree for a given string of terminals.

  • Ambiguous grammar: A grammar in which some string has more than one parse tree.

  • Associativity: Decides which operator an operand belongs to when it has operators on both sides. For example, in 9-5-2 the 5 belongs to the left minus because - is left-associative, so the expression groups as (9-5)-2; an assignment a=b=c groups as a=(b=c) because = is right-associative. Together with precedence, associativity resolves how operators group.

  • Syntax-directed translation scheme: Attaching rules (semantic rules) or program fragments to productions in a grammar. The output is the translated program.

A schema representing simple syntax-directed translation
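As a small illustration of a syntax-directed definition, the sketch below translates infix expressions of single digits with + and - into postfix, in the style of the classic textbook example. Each node carries a synthesized attribute t, its postfix string, computed by concatenating the children's t values and then the operator. The tuple-based tree format and the function name `postfix` are assumptions made for this sketch.

```python
def postfix(node):
    """Compute the synthesized attribute t (postfix string) of a node.

    A node is either a single digit string (a leaf, term -> digit) or a
    tuple (op, left, right) for the productions expr -> expr1 op term.
    """
    if isinstance(node, str):
        # term -> digit          { term.t = digit }
        return node
    op, left, right = node
    # expr -> expr1 op term   { expr.t = expr1.t || term.t || op }
    return postfix(left) + postfix(right) + op


# Tree for 9-5+2, grouped as (9-5)+2 because - and + are left-associative
tree = ("+", ("-", "9", "5"), "2")
print(postfix(tree))  # 95-2+
```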

  • Attributes: Any quantity associated with a programming construct.

  • Syntax tree: The tree generated from a syntax-directed translation.

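  • Synthesized attributes: We can associate attributes with terminals and nonterminals, then attach rules that dictate how to fill in these attributes; this is what syntax-directed translation does. An attribute is synthesized if its value at a parse-tree node is determined from attribute values at the node’s children and at the node itself.

  • Semantic rules: In a syntax-directed definition, the rules attached to each production that specify how to compute the attributes of the symbols in that production.

  • Tree traversal: The order in which we visit the nodes of a tree. A depth-first traversal starts at the root and recursively visits each child’s subtree completely before moving on to the next child; a preorder visits a node before its children, and a postorder visits it after them. A breadth-first traversal visits nodes level by level, starting from the root.

  • Translation schemes: Instead of building up strings as attributes, a translation scheme executes program fragments, called semantic actions, embedded in the production bodies.

  • Top-down parsing: Start at the root, labeled with the starting nonterminal, and repeatedly perform two steps: at the current node, labeled with nonterminal A, select one of A’s productions and construct children for the symbols in its body; then find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal. In general, selecting a production may involve trial and error.

  • Lookahead symbol: The terminal currently being scanned in the input; initially, it is the leftmost terminal of the input string.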

  • Recursive-descent parsing: a top-down method of syntax analysis in which you recursively try to process the input. There’s a set of procedures, one for each nonterminal.

void A() {
	Choose an A-production, A -> X1 X2 ... Xk;
	for (i = 1 to k) {
		if (Xi is a nonterminal) {
			call procedure Xi();
		} else if (Xi equals the current input symbol a) {
			advance the input to the next symbol;
		} else {
			/* an error has occurred */
		}
	}
}

  • Backtracking: Going back in the input to parse it again, trying another production as the new choice.

  • Predictive parsing: A form of recursive-descent parsing in which the lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal. This implicitly defines a parse tree for the input and can also be used to build an explicit one. The procedure for a nonterminal A does two things. First, it decides which A-production to use by examining the lookahead: it uses A -> α if the lookahead is in FIRST(α). Second, it mimics the body of the chosen production: each nonterminal becomes a call to that nonterminal’s procedure, and each terminal is matched against the lookahead, advancing the input.
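Here is a sketch of a predictive parser that doubles as a translation scheme: it parses single-digit expressions with + and - and emits postfix through semantic actions, assuming the grammar has already had its left recursion removed (expr -> term rest). The class and method names are hypothetical. The tail-recursive procedure for rest is written as a loop, a common transformation that does not change what is parsed.

```python
DIGITS = "0123456789"


class PredictiveTranslator:
    """Predictive parser for single-digit expressions with + and -,
    emitting postfix output through semantic actions.

    Grammar (left recursion already eliminated):
        expr -> term rest
        rest -> + term {emit '+'} rest | - term {emit '-'} rest | ε
        term -> digit {emit digit}
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.out = []

    def lookahead(self):
        # Current input symbol, or '$' as the end-of-input marker
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, expected):
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        self.term()
        self.rest()

    def rest(self):
        # The lookahead alone decides: + or - selects a production,
        # anything else selects rest -> ε
        while self.lookahead() in "+-":
            op = self.lookahead()
            self.match(op)
            self.term()
            self.out.append(op)  # semantic action after term

    def term(self):
        la = self.lookahead()
        if la not in DIGITS:
            raise SyntaxError(f"expected a digit at position {self.pos}")
        self.match(la)
        self.out.append(la)  # semantic action: emit the digit

    def translate(self):
        self.expr()
        if self.lookahead() != "$":
            raise SyntaxError(f"unexpected {self.lookahead()!r} at position {self.pos}")
        return "".join(self.out)
```

For example, PredictiveTranslator("9-5+2").translate() returns "95-2+", while an incomplete input such as "9+" raises SyntaxError because term finds no digit.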

  • FIRST(α): Function that returns the set of terminals that begin the strings derived from α, where α is any string of grammar symbols. If α can derive ε, then ε is also in FIRST(α). For a single grammar symbol X:

    1. If X is a terminal, then FIRST(X) = {X}.
    2. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 can derive ε. If ε is in FIRST(Yj) for every j = 1, ..., k, add ε to FIRST(X).
    3. If X -> ε is a production, then add ε to FIRST(X).

  • FOLLOW(A): Function that returns the set of terminals that can appear immediately to the right of the nonterminal A in some sentential form. Compute it by applying these rules until no FOLLOW set changes:
    1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
    2. If there is a production A -> αBβ, then everything in FIRST(β) except ε is in FOLLOW(B); that is, any terminal that can follow B.
    3. If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
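The FIRST and FOLLOW rules can be run as a fixed-point iteration. This is a sketch, assuming a grammar represented as a dictionary that maps each nonterminal to a list of bodies (tuples of symbols, with () for ε); any symbol that is not a key is treated as a terminal. The example is the usual expression grammar with left recursion removed.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {X}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # FOLLOW is only computed for nonterminals
                    rest_first = first_of_string(body[i + 1:], first)
                    new = rest_first - {EPS}  # rule 2
                    if EPS in rest_first:
                        new |= follow[head]  # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
first = compute_first(grammar)
follow = compute_follow(grammar, "E", first)
```

For this grammar, FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+, ε}, and FOLLOW(F) = {+, *, ), $}.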

  • Left recursion: A left-recursive grammar can make a recursive-descent parser loop forever, so left recursion must be eliminated by rewriting the offending productions. For example, A -> Aα | β, which is left recursive, can be rewritten as A -> βR, R -> αR | ε.
    Algorithm to remove left recursion (it assumes the grammar has no cycles or ε-productions):
arrange the nonterminals in some order A1, A2, ..., An
for (each i from 1 to n) {
	for (each j from 1 to i-1) {
		replace each production of the form Ai -> Ajγ by the
		productions Ai -> δ1γ | δ2γ | ... | δkγ, where
		Aj -> δ1 | δ2 | ... | δk are all current Aj-productions
	}
	eliminate the immediate left recursion among the Ai-productions
}

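Eliminating immediate left recursion, the step at the end of the loop above, is short enough to sketch directly. The grammar representation (a list of body tuples per nonterminal, with () for ε) and the primed naming scheme for the new nonterminal are assumptions of this sketch. It rewrites A -> Aα1 | ... | Aαm | β1 | ... | βn into A -> β1A' | ... | βnA' and A' -> α1A' | ... | αmA' | ε.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Remove immediate left recursion from the productions of one nonterminal.

    bodies is a list of tuples of grammar symbols; () stands for ε.
    Returns a dict mapping head (and the new nonterminal, if any) to bodies.
    """
    # Bodies of the form head α, stored as α
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # Bodies β that do not start with head
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: list(bodies)}  # nothing to do
    new = head + "'"  # hypothetical naming; assumes head' is not already used
    return {
        head: [beta + (new,) for beta in others],
        new: [alpha + (new,) for alpha in recursive] + [()],
    }
```

For example, E -> E + T | T becomes E -> T E' and E' -> + T E' | ε.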
  • Left Factoring: When it’s unclear which of two productions to select, as in A -> αβ1 | αβ2, we can defer the decision by factoring out the common prefix: A -> αA' and A' -> β1 | β2. We factor out the longest prefix common to two or more alternatives.

  • Abstract syntax tree or syntax tree: A tree in which interior nodes represent operators and their children represent the operands of those operators. It differs from a parse tree in that its interior nodes represent programming constructs, whereas a parse tree’s interior nodes represent nonterminals.

  • Token: A terminal with additional information: a token name and an optional attribute value. The name is an abstract symbol representing a kind of lexical unit, be it a keyword, an identifier, etc.

  • Lexeme: A sequence of characters in the source program that matches the pattern for a token; it’s an instance of that token.

  • Pattern: A description of the form that the lexemes of a token may take. For a keyword, the pattern is simply the sequence of characters that forms the keyword; for identifiers and some other tokens, it is a more complex structure that is matched by many strings.

  • Lexical analysis/analyzer: A lexical analyzer reads characters from the input and groups them into “token objects”; basically, it creates the tokens. It can be split into two parts: scanning, which does simple processing such as removing comments and compacting whitespace, and lexical analysis proper, the more complex portion that produces the tokens.
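A toy lexical analyzer can be sketched with Python's `re` module: each token name gets a pattern, and every match yields a (token name, attribute value) pair, with the lexeme as the attribute. The token names and patterns below are hypothetical. Whitespace and comments are matched and discarded, which plays the role of the scanning part. Note that regex alternation takes the first alternative that matches, not the longest, so the order of TOKEN_SPEC matters here; Lex resolves conflicts differently.

```python
import re

# Hypothetical token set for a tiny language: (token name, pattern)
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),  # \b keeps "iffy" an identifier
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+|#[^\n]*"),  # whitespace and comments: scanned, then dropped
    ("MISMATCH", r"."),  # any other character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (token name, attribute value) pairs for the input text."""
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at {m.start()}")
        # The attribute value is the lexeme, converted to int for numbers
        value = int(lexeme) if kind == "NUMBER" else lexeme
        yield kind, value
```

For instance, tokenize("if (x1 > 10) y = x1 + 2") starts with ("KEYWORD", "if"), ("LPAREN", "("), ("ID", "x1").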

Interaction between lexical analyzer and parser

  • Reading Ahead: It’s useful to read future characters to decide if they are part of the same lexeme. A technique is to use an input buffer or a peek variable that holds the next character.

  • Input buffering: A common technique is to use a buffer pair: two buffers, each the size of a disk block, that are reloaded alternately, which makes reading more efficient. Two pointers are maintained: lexemeBegin marks the start of the current lexeme, and forward scans ahead until a pattern match is found. To avoid checking for the end of a buffer on every character, each buffer ends with a “sentinel”, a special character that cannot be part of the source program, such as eof. Seeing the sentinel at the end of a buffer means the other buffer must be reloaded; seeing it anywhere else means the input has ended.

  • Keywords: Character strings (lexemes) that identify constructs such as if, for, do, etc.

  • Identifier: Also a character string (lexeme), one that names an entity such as a variable, array, or function.

  • Symbol Tables: Data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases and used by the synthesis phases to generate the target code. Entries contain information about an identifier such as its character string (lexeme), its type, its position in storage, and any other relevant information. Each scope usually has its own symbol table. The table is filled by semantic actions during analysis: when a declaration is processed, an entry for the identifier is added; later, in a production such as factor -> id, the id token is replaced by the symbol-table entry created for its declaration.
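One common way to get a symbol table per scope is to chain tables, so that a lookup searches the innermost scope first and then the enclosing ones. A minimal sketch follows; the class name, the `declare`/`lookup` methods, and the entry fields are all hypothetical.

```python
class SymbolTable:
    """One symbol table per scope, chained to the enclosing scope's table."""

    def __init__(self, parent=None):
        self.entries = {}  # lexeme -> entry (a dict of attributes)
        self.parent = parent  # enclosing scope, or None for the outermost

    def declare(self, name, **info):
        # Called by the semantic action for a declaration
        if name in self.entries:
            raise ValueError(f"{name!r} is already declared in this scope")
        self.entries[name] = {"lexeme": name, **info}
        return self.entries[name]

    def lookup(self, name):
        # Search the innermost scope first, then the enclosing ones
        table = self
        while table is not None:
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        return None  # undeclared identifier
```

With an outer scope declaring x as int and an inner block redeclaring x as float, lookups from the inner block see the float entry, while the outer scope still sees the int one.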

  • Intermediate Representations: The frontend generates an intermediate representation of the source program so that the backend can generate the target program. The two most important are: Trees (parse trees and abstract syntax trees), and linear representations (such as three-address code).

  • Static checking: Checking, at compile time, that the program follows the syntactic and semantic rules of the source language. It catches errors early, before the program runs. It includes syntactic checks that go beyond the grammar, such as checking that identifiers are declared, at most once per scope, and that a break statement has an enclosing loop (or switch), as well as type checking.

  • Type Checking: Ensures that an operator or function is applied to the right number and types of operands. If a conversion is necessary, for example when an integer is added to a floating-point number, the compiler can insert it implicitly; such an implicit conversion is called a “coercion”.
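As a sketch of type checking with coercion, assume a hypothetical language with only `int` and `float`, where an `int` operand can be widened to `float`. The checker computes the result type of a binary arithmetic operator and reports which operands would need an implicit conversion; the function name and return format are illustrative only.

```python
# Hypothetical numeric types, ordered from narrowest to widest
WIDENING = {"int": 0, "float": 1}
ARITHMETIC_OPS = {"+", "-", "*", "/"}


def check_binary(op, left_type, right_type):
    """Type-check `left op right` for an arithmetic operator.

    Returns (result_type, coercions), where coercions lists the operand
    sides ("left"/"right") that need an implicit int-to-float conversion.
    """
    if op not in ARITHMETIC_OPS:
        raise TypeError(f"unknown operator {op!r}")
    for t in (left_type, right_type):
        if t not in WIDENING:
            raise TypeError(f"operator {op!r} cannot be applied to {t}")
    # The result has the wider of the two operand types
    result = max(left_type, right_type, key=lambda t: WIDENING[t])
    coercions = [side for side, t in (("left", left_type), ("right", right_type))
                 if t != result]
    return result, coercions
```

So check_binary("+", "int", "float") reports a float result with the left operand coerced, while applying + to a bool is rejected.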

  • Strings and languages: A string, also called a word or sentence, is a finite sequence of symbols drawn from an alphabet. Its length is written |s|. ε, or e, is the empty string, with |ε| = 0. Strings can be concatenated; concatenating a string with ε leaves it unchanged. We can define exponentiation of strings: s**0 = ε, s**1 = s, s**2 = ss, and in general s**n = s**(n-1) s. A language is any countable set of strings over some fixed alphabet (for example, the alphabet { x, y, z }); this includes the empty set ∅ and { ε }, the set containing only the empty string.

  • Operations over language: The most important operations on languages are union, concatenation, and closure. The union of two languages is the same as in set theory. Their concatenation is the set of all strings formed by taking a string from the first language and appending a string from the second. The Kleene closure of L, written L*, is the set of strings obtained by concatenating L with itself zero or more times; the positive closure L+ is the same, but one or more times.

Closure operation
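The language operations are easy to try out on small finite languages, with the empty string "" standing for ε. Because L* is infinite whenever L contains a nonempty string, the closure functions in this sketch are truncated at a chosen power k; that cutoff is an assumption of the example, not part of the definition.

```python
from itertools import product


def union(l1, l2):
    return l1 | l2


def concat(l1, l2):
    # Every string of l1 followed by every string of l2
    return {x + y for x, y in product(l1, l2)}


def power(lang, n):
    # L^0 = {ε}, and L^i = L^(i-1) L
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def closure_upto(lang, k):
    """Kleene closure L*, truncated to L^0 ∪ L^1 ∪ ... ∪ L^k."""
    result = set()
    for i in range(k + 1):
        result |= power(lang, i)
    return result


def positive_closure_upto(lang, k):
    """Positive closure L+, truncated to L^1 ∪ ... ∪ L^k."""
    result = set()
    for i in range(1, k + 1):
        result |= power(lang, i)
    return result
```

With L = {"a", "b"} and D = {"0", "1"}, concat(L, D) is {"a0", "a1", "b0", "b1"}, and closure_upto({"a"}, 3) is {"", "a", "aa", "aaa"}.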

Figures: DFA, NFA, NDFA, NFA example, subset construction, transition table for conversion

  • Regular expressions aka regex: A notation that expresses languages built with the operations over languages (union, concatenation, and closure). Precedence rules: all three operators are left-associative; * has the highest precedence, then concatenation, then union |. A language that can be defined by a regular expression is called a regular set.

Algebraic laws for regular expressions

  • Regular definitions: Names given to regular expressions, like variables, so they can be reused in later expressions; this makes complex regular expressions more readable.
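Regular definitions map naturally onto building regex strings from named pieces. The sketch below, using Python's `re` module, defines identifiers and unsigned numbers along the lines of the usual textbook definitions; the names and exact patterns are assumptions for illustration.

```python
import re

# Each name stands for a regex built only from previously defined names
letter_ = r"[A-Za-z_]"
digit = r"[0-9]"
digits = rf"{digit}+"
identifier = rf"{letter_}(?:{letter_}|{digit})*"
optional_fraction = rf"(?:\.{digits})?"
optional_exponent = rf"(?:[Ee][+-]?{digits})?"
number = rf"{digits}{optional_fraction}{optional_exponent}"

ID_RE = re.compile(identifier)
NUMBER_RE = re.compile(number)
```

With these definitions, ID_RE.fullmatch("count_2") succeeds while "2count" does not, and NUMBER_RE accepts "42" and "6.336E4" but rejects "1.".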

Summary of lexer (figures 1 to 3)

Notational conventions (figures 1 and 2)

  • Aho-Corasick algorithm: An algorithm that finds occurrences of any keyword from a set of keywords in a single pass over the text. It builds a trie, a tree-structured transition diagram spelling out the keywords, and defines for every state s a failure function f(s): the state for the longest proper suffix of the string spelled out by s that is also a prefix of some keyword. On a mismatch, the automaton falls back to state f(s) instead of moving back in the input. For a single keyword b1b2...bn, this reduces to the KMP (Knuth-Morris-Pratt) algorithm: after a mismatch in state s, the next input character is compared with b(f(s)+1).

Pseudocode for the failure function of a single keyword b1b2...bn:

t = 0;
f(1) = 0;
for (s = 1; s < n; s++) {
	while (t > 0 && b(s+1) != b(t+1)) t = f(t);
	if (b(s+1) == b(t+1)) {
		t = t + 1;
		f(s+1) = t;
	} else {
		f(s+1) = 0;
	}
}

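Below is a 0-indexed Python translation of the failure-function pseudocode above, plus the matching loop that uses it. This is the single-keyword case (KMP), not the full multi-keyword Aho-Corasick trie; here `fail[s]` plays the role of f(s+1) in the 1-indexed pseudocode.

```python
def failure_function(keyword):
    """fail[s] = length of the longest proper prefix of keyword[:s+1]
    that is also a suffix of it."""
    n = len(keyword)
    fail = [0] * n
    t = 0
    for s in range(1, n):
        while t > 0 and keyword[s] != keyword[t]:
            t = fail[t - 1]
        if keyword[s] == keyword[t]:
            t += 1
        fail[s] = t  # t is 0 here if no prefix could be extended
    return fail


def kmp_find(keyword, text):
    """Return the start indices of all (possibly overlapping) occurrences."""
    if not keyword:
        raise ValueError("keyword must be non-empty")
    fail = failure_function(keyword)
    matches = []
    s = 0  # number of keyword characters matched so far (the state)
    for i, c in enumerate(text):
        while s > 0 and c != keyword[s]:
            s = fail[s - 1]  # fall back instead of re-reading the input
        if c == keyword[s]:
            s += 1
        if s == len(keyword):
            matches.append(i - s + 1)
            s = fail[s - 1]  # keep going to allow overlapping matches
    return matches
```

For the keyword "ababaa", failure_function returns [0, 0, 1, 2, 3, 1], and kmp_find("aba", "abababa") finds matches starting at 0, 2 and 4.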
  • Conflict resolution in Lex:
    1. Always prefer a longer prefix to a shorter prefix.
    2. If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program.
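The two Lex rules can be expressed as a small matching loop: try every pattern at the current position, keep the longest match, and on a tie keep the pattern listed first. The pattern list and token names are hypothetical, and this is a direct sketch of the rules rather than how Lex itself works (Lex compiles all patterns into a single automaton).

```python
import re

# Hypothetical patterns, in the order they would be listed in a Lex program
PATTERNS = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("RELOP", re.compile(r"<=|>=|<>|<|>|=")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def next_token(text, pos):
    """Return (token name, lexeme) for the token starting at pos.

    Rule 1: prefer the longest matching prefix.
    Rule 2: on a tie, prefer the pattern listed first.
    """
    best_name, best_lexeme = None, ""
    for name, pattern in PATTERNS:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so an earlier pattern wins ties
        if m and len(m.group()) > len(best_lexeme):
            best_name, best_lexeme = name, m.group()
    if best_name is None:
        raise SyntaxError(f"no pattern matches at position {pos}")
    return best_name, best_lexeme


def tokens(text):
    pos = 0
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        pos += len(lexeme)
        if name != "WS":  # whitespace is recognized but not returned
            yield name, lexeme
```

On "if ifx <= 42", the lexeme "if" matches both IF and ID with the same length, so rule 2 picks IF; "ifx" is longer as an ID, so rule 1 picks ID.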

Position of Parser in compiler model

  • Constructing parse tree through derivation: Begin with the start symbol and, at each step, replace a nonterminal by the body of one of its productions. This corresponds to a top-down construction of a parse tree. We use => to denote “derives in one step” and =>* for “derives in zero or more steps”. If S =>* α, where S is the start symbol, then α is a sentential form of the grammar; a sentential form with no nonterminals is a sentence. The language of a grammar is its set of sentences. The same sentence can usually be derived in different orders; two standard ones are the leftmost derivation, which always replaces the leftmost nonterminal, and the rightmost (canonical) derivation, which always replaces the rightmost one.

  • LL(1) grammars: L for left to right scanning, L for a leftmost derivation, 1 for using one input symbol of lookahead at each step to make parsing action decisions.

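  • Constructing parse trees through reduction: Reduction, or bottom-up parsing, is the inverse of derivation: starting from the input string, it repeatedly replaces a substring matching the body of a production with that production’s head, until only the start symbol is left. A “handle” is a substring that matches the body of a production and whose reduction is one step of a rightmost derivation in reverse; it is reduced to the head of that production.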

  • LR(k) parsing: L for left to right scanning of the input, R for constructing the rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions.

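  • Items in LR(0): An LR(0) item is a production of a grammar G with a dot at some position of its body, indicating how much of the production has been seen at a given point in the parse. For example: A -> X . Y Z means we have just seen a string derivable from X and hope to see one derivable from YZ next. States of the LR(0) automaton represent sets of items.

  • Augmented grammar in LR(0): The grammar G with a new start symbol S' and the production S' -> S. The parser accepts when it reduces by S' -> S, that is, once everything has been reduced to S'.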

  • CLOSURE and GOTO in LR(0): The CLOSURE of a set of items I for a grammar G is constructed by two rules. First, add every item in I to CLOSURE(I). Second, if A -> a.Bb is in CLOSURE(I) and B -> y is a production, add the item B -> .y to CLOSURE(I) if it is not already there. Apply the second rule until no more new items can be added. The items added this way (dot at the left end) are called nonkernel items; the initial item S' -> .S and all items whose dot is not at the left end are kernel items.
    GOTO(I, X), where I is a set of items and X is a grammar symbol, is the closure of the set of all items [A -> aX.b] such that [A -> a.Xb] is in I. It defines the transitions of the LR(0) automaton for the grammar on input X.
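
The two rules can be sketched in C for a tiny hypothetical grammar (S -> E, E -> E+T | T, T -> id, with S playing the role of S' and the character i standing for id). This is a minimal illustration under those assumptions, not an LR parser generator: an item is a (production, dot position) pair, and the names Item, ItemSet, closure and go_to are our own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy grammar, already augmented (S plays the role of S').
   Upper-case letters are nonterminals; 'i' stands for the terminal id. */
typedef struct { char head; const char *body; } Production;
static const Production G[] = {
    {'S', "E"},   /* 0: S -> E   */
    {'E', "E+T"}, /* 1: E -> E+T */
    {'E', "T"},   /* 2: E -> T   */
    {'T', "i"},   /* 3: T -> id  */
};
enum { NPROD = sizeof G / sizeof G[0], MAX_ITEMS = 32 };

/* An LR(0) item: production index plus dot position in its body. */
typedef struct { int prod; int dot; } Item;
typedef struct { Item items[MAX_ITEMS]; int n; } ItemSet;

static bool has_item(const ItemSet *s, Item it) {
    for (int k = 0; k < s->n; k++)
        if (s->items[k].prod == it.prod && s->items[k].dot == it.dot)
            return true;
    return false;
}

/* Adds `it` unless it is already present; returns true if the set grew. */
static bool add_item(ItemSet *s, Item it) {
    if (has_item(s, it)) return false;
    assert(s->n < MAX_ITEMS);
    s->items[s->n++] = it;
    return true;
}

/* Symbol right after the dot, or '\0' when the dot is at the end. */
static char after_dot(Item it) { return G[it.prod].body[it.dot]; }

/* CLOSURE(I): for each item A -> a.Bb and each production B -> y,
   add B -> .y; repeat until no new item appears. */
void closure(ItemSet *s) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int k = 0; k < s->n; k++) {
            char next = after_dot(s->items[k]);
            for (int p = 0; p < NPROD; p++)
                if (G[p].head == next && add_item(s, (Item){p, 0}))
                    changed = true;
        }
    }
}

/* GOTO(I, X): move the dot over X in every item where X follows the dot,
   then take the closure of the result. */
ItemSet go_to(const ItemSet *s, char X) {
    ItemSet j = {.n = 0};
    for (int k = 0; k < s->n; k++) {
        Item it = s->items[k];
        if (after_dot(it) == X)
            add_item(&j, (Item){it.prod, it.dot + 1});
    }
    closure(&j);
    return j;
}

/* Prints each item as "A -> a.b". */
void print_set(const ItemSet *s) {
    for (int k = 0; k < s->n; k++) {
        Item it = s->items[k];
        printf("%c -> %.*s.%s\n", G[it.prod].head, it.dot,
               G[it.prod].body, G[it.prod].body + it.dot);
    }
}
```

Starting from the single kernel item S -> .E, closure adds E -> .E+T, E -> .T and T -> .i; go_to on E then yields S -> E. and E -> E.+T.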

  • CLOSURE and GOTO in LR(1): LR(1) is similar to LR(0), but each item carries one lookahead symbol. An item has the form [A -> α.β, a], where a is a terminal (or $); an item [A -> α., a] calls for a reduction by A -> α only when the next input symbol is a.

SetOfItems CLOSURE(I) {
	repeat
		for (each item [A -> α.Bβ, a] in I)
			for (each production B -> γ in G')
				for (each terminal b in FIRST(βa))
					add [B -> .γ, b] to set I;
	until no more items are added to I;
	return I;
}

SetOfItems GOTO(I, X) {
	initialize J to be the empty set;
	for (each item [A -> α.Xβ, a] in I)
		add item [A -> αX.β, a] to set J;
	return CLOSURE(J);
}

void items(G') {
	initialize C to {CLOSURE({[S' -> .S, $]})};
	repeat
		for (each set of items I in C)
			for (each grammar symbol X)
				if (GOTO(I, X) is not empty and not in C)
					add GOTO(I, X) to C;
	until no new sets of items are added to C;
}

[Figures: LR(1) examples 1 to 5]

[Figures: parser summary 1 to 6]

  • L-attributed translations: Class of syntax-directed translations (L for left-to-right), which encompass virtually all translations that can be performed during parsing.

  • SDD, syntax-directed definition: A context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. If X is a symbol, X.a denotes the attribute a of X.

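  • Synthesized and inherited attributes: A synthesized attribute of a nonterminal A at parse-tree node N is defined by a semantic rule associated with the production at N, using attribute values at N's children and at N itself. An inherited attribute of a nonterminal B at node N is defined by a semantic rule associated with the production at N's parent, using attribute values at N's parent, at N itself, and at N's siblings. Terminals can have synthesized attributes (supplied by the lexical analyzer) but not inherited ones.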

  • S-attributed SDD: A syntax-directed definition in which every attribute is synthesized, so each attribute of the head at node N is computed only from attributes of the production body (N's children), never from N's parent.

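  • L-attributed SDD: An SDD in which each inherited attribute of a body symbol Xi in a production A -> X1 X2 ... Xn depends only on inherited attributes of the head A, on attributes of the symbols X1, ..., Xi-1 to its left, and on attributes of Xi itself, provided no cycle is formed. Dependencies therefore flow from left to right.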

  • Attribute grammar: An SDD without side effects.

  • Annotated parse tree: A parse tree showing the value(s) of its attribute(s).

  • Dependency graph: A graph with one node per attribute instance in the parse tree and an edge from attribute a to attribute b whenever the value of b is computed from a. It applies to both synthesized and inherited attributes.

  • Topological sort: An ordering of the nodes of the dependency graph in which every attribute comes after all the attributes it depends on; any such ordering is a valid evaluation order. If the dependency graph has a cycle, no topological sort exists.
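
A topological sort can be computed with Kahn's algorithm: repeatedly pick an attribute whose inputs are all available. The sketch below assumes the dependency graph is a small adjacency matrix over attribute instances numbered 0 to n-1; DepGraph and topo_sort are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_NODES = 16 };

/* Dependency graph over attribute instances 0..n-1:
   dep[a][b] is true when the value of b depends on a (edge a -> b). */
typedef struct {
    int n;
    bool dep[MAX_NODES][MAX_NODES];
} DepGraph;

/* Kahn's algorithm: writes an evaluation order into `order` and returns
   true, or returns false if the graph has a cycle (no topological sort). */
bool topo_sort(const DepGraph *g, int order[]) {
    int indegree[MAX_NODES] = {0};
    int queue[MAX_NODES], head = 0, tail = 0, count = 0;

    assert(g->n <= MAX_NODES);
    for (int a = 0; a < g->n; a++)
        for (int b = 0; b < g->n; b++)
            if (g->dep[a][b]) indegree[b]++;

    /* Attributes that depend on nothing can be evaluated first. */
    for (int v = 0; v < g->n; v++)
        if (indegree[v] == 0) queue[tail++] = v;

    while (head < tail) {
        int a = queue[head++];
        order[count++] = a;
        /* Evaluating a satisfies one pending dependency of each successor. */
        for (int b = 0; b < g->n; b++)
            if (g->dep[a][b] && --indegree[b] == 0) queue[tail++] = b;
    }
    return count == g->n; /* any node never emitted lies on or after a cycle */
}
```

Each node enters the queue at most once, so the sort runs in time proportional to the size of the matrix.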

  • Syntax-directed translation scheme (SDT) for an L-attributed definition: Embed the action that computes an inherited attribute of a nonterminal immediately before that nonterminal in the production body, and place the action that computes the synthesized attribute of the head at the end of the body.

[Figures: syntax-directed definition summary 1 to 3]

  • Directed acyclic graph (DAG): A compact form of the syntax tree for an expression. As in a syntax tree, leaves are operands and interior nodes are operators, but each common subexpression is represented only once, so a node can have several parents. A syntax-directed definition can build a DAG instead of a tree by reusing existing nodes. The DAG expresses the expression more succinctly and helps generate efficient code to evaluate it. Nodes can be stored in an array of records, where each row represents one node: leaves have a field holding the lexical value (for an identifier, a pointer to its symbol-table entry), and interior nodes have two fields for the left and right children. For i = i + 10:
|1| id  |  ----|-> to entry for i
|2| num | 10   |
|3| +   |1 | 2 |
|4| =   |1 | 3 |
|5|  ....      |

     .- = .
   .'      `.         
  :         +         
  :      .'  `.       
  `.   .'      `      
     i          10    
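
One way to fill such an array of records so that it forms a DAG rather than a tree is to look up an identical row before creating a new one (often called the value-number method). A minimal sketch, assuming 0-based row indices (the table above is 1-based) and a plain name string standing in for a symbol-table entry; Node, NodeArray, leaf_id, leaf_num and interior are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum { MAX_NODES = 64 };

/* One row of the node array: a leaf ("id" with a name, "num" with a
   constant) or an operator with two child row indices. */
typedef struct {
    char op[4];       /* "id", "num", "+", "=", ... */
    const char *name; /* id leaf: stands in for a symbol-table entry */
    int value;        /* num leaf: constant value */
    int left, right;  /* interior node: child row indices */
} Node;

typedef struct { Node rows[MAX_NODES]; int n; } NodeArray;

static int same(const Node *a, const Node *b) {
    if (strcmp(a->op, b->op) != 0) return 0;
    if (strcmp(a->op, "id") == 0) return strcmp(a->name, b->name) == 0;
    if (strcmp(a->op, "num") == 0) return a->value == b->value;
    return a->left == b->left && a->right == b->right;
}

/* Return the index of an existing identical row, or append a new one.
   Reusing rows is what turns the tree into a DAG. */
static int lookup_or_add(NodeArray *t, Node node) {
    for (int k = 0; k < t->n; k++)
        if (same(&t->rows[k], &node)) return k;
    assert(t->n < MAX_NODES);
    t->rows[t->n] = node;
    return t->n++;
}

int leaf_id(NodeArray *t, const char *name) {
    Node node = {.op = "id", .name = name};
    return lookup_or_add(t, node);
}

int leaf_num(NodeArray *t, int value) {
    Node node = {.op = "num", .value = value};
    return lookup_or_add(t, node);
}

int interior(NodeArray *t, const char *op, int left, int right) {
    Node node = {.left = left, .right = right};
    strncpy(node.op, op, sizeof node.op - 1); /* zero-initialized, so terminated */
    return lookup_or_add(t, node);
}
```

Building i = i + 10 with these helpers creates four rows; the second request for the leaf i returns the existing row instead of adding a new one, giving the shared child seen in the picture.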

[Figures: position of the intermediate representation in the compiler, 1 and 2]

  • Three-address code: A sequence of instructions with at most one operator on the right-hand side. It is a linearized representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph. Three-address code is built from addresses and instructions; an address is a name, a constant, or a compiler-generated temporary. Common instruction forms are: assignments x = y op z; unary assignments x = op y; copies x = y; unconditional jumps goto L; conditional jumps if x goto L and ifFalse x goto L; relational jumps if x relop y goto L, where relop is a comparison operator; procedure and function calls, written as one param x per parameter followed by call p, n or y = call p, n respectively, where n is the number of actual parameters; return y, where y is the returned value; indexed copies x = y[i] and x[i] = y; and address and pointer assignments x = &y, x = *y, and *x = y.

[Figures: translating if-else to three-address code, 1 to 4]

  • Quadruples (in the context of three-address code): A table with four columns: op, arg1, arg2, and result. Unary operators leave arg2 empty, param uses neither arg2 nor result, and conditional and unconditional jumps put the target label in result.

  • Triples (in the context of three-address code): A table with three columns: op, arg1, and arg2. There is no result field; the result of an operation is referred to by the position of its triple, so triples correspond directly to the nodes of a syntax tree or DAG. Indirect triples add a separate list of pointers to triples, in execution order, so an optimizer can move instructions by reordering that list without changing the triples themselves.
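
To make the two layouts concrete, here is a C sketch of the quadruple and triple tables for the assignment a = b * -c + b * -c, a common textbook example chosen here for illustration. The struct names, the ref = -1 convention for "not a reference", and the instruction list used for indirect triples are assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Quadruple: op, arg1, arg2, result; unused fields are NULL. */
typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

/* Triple operand: a name/constant, or a reference (k) to triple k.
   ref == -1 means "use name" (NULL name means the field is unused). */
typedef struct { const char *name; int ref; } Operand;
typedef struct { const char *op; Operand arg1, arg2; } Triple;

#define NAME(s) {(s), -1}
#define REF(k)  {NULL, (k)}
#define NONE    {NULL, -1}

/* a = b * -c + b * -c as quadruples: temporaries are named explicitly. */
const Quad quads[] = {
    {"minus", "c",  NULL, "t1"},
    {"*",     "b",  "t1", "t2"},
    {"minus", "c",  NULL, "t3"},
    {"*",     "b",  "t3", "t4"},
    {"+",     "t2", "t4", "t5"},
    {"=",     "t5", NULL, "a"},
};

/* The same code as triples: a result is named by its triple's position. */
const Triple triples[] = {
    {"minus", NAME("c"), NONE},
    {"*",     NAME("b"), REF(0)},
    {"minus", NAME("c"), NONE},
    {"*",     NAME("b"), REF(2)},
    {"+",     REF(1),    REF(3)},
    {"=",     NAME("a"), REF(4)},
};

/* Indirect triples: execution order is this list of triple indices, so
   reordering code only permutes the list. */
const int instruction[] = {0, 1, 2, 3, 4, 5};

static void print_operand(Operand o) {
    if (o.ref >= 0) printf(" (%d)", o.ref);
    else if (o.name) printf(" %s", o.name);
}

void print_indirect(void) {
    int n = (int)(sizeof instruction / sizeof instruction[0]);
    for (int k = 0; k < n; k++) {
        const Triple *t = &triples[instruction[k]];
        printf("%d: %s", instruction[k], t->op);
        print_operand(t->arg1);
        print_operand(t->arg2);
        putchar('\n');
    }
}
```

Note how triple 4 refers to triples 1 and 3 by position, where the quadruple table needs the temporaries t2 and t4.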

  • Static single-assignment form (SSA): An intermediate representation similar to three-address code in which every assignment is to a variable with a distinct name. Where control-flow paths join, a φ-function combines the definitions of the same variable: φ(x1, x2) returns x1 if control arrives along the path that assigns x1, and x2 if it arrives along the path that assigns x2.

  • Translation applications: From the type of a name, the compiler can determine the storage layout that will be needed for that name at run time. Type information can also be used to calculate addresses, for example of array elements.
    For a one-dimensional array, the address of A[i] is base + (i - low) * w, where low is the lower bound of the index and w is the element width. Multidimensional arrays are laid out either row by row (row-major) or column by column (column-major). Some widths depend on the target architecture and can be left as symbolic widths in the intermediate representation.
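
The address formulas can be checked against C's own row-major layout. A minimal sketch, assuming the element width w is given in bytes and addresses are held in uintptr_t; the two-dimensional versions generalize base + (i - low) * w, with n2 elements per row (row-major) or n1 elements per column (column-major). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of A[i] in a one-dimensional array: base + (i - low) * w. */
uintptr_t addr_1d(uintptr_t base, long i, long low, size_t w) {
    return base + (uintptr_t)((i - low) * (long)w);
}

/* Row-major A[i1][i2] with n2 elements per row:
   base + ((i1 - low1) * n2 + (i2 - low2)) * w. */
uintptr_t addr_row_major(uintptr_t base, long i1, long low1, long i2,
                         long low2, long n2, size_t w) {
    return base + (uintptr_t)(((i1 - low1) * n2 + (i2 - low2)) * (long)w);
}

/* Column-major A[i1][i2] with n1 elements per column:
   base + ((i2 - low2) * n1 + (i1 - low1)) * w. */
uintptr_t addr_col_major(uintptr_t base, long i1, long low1, long i2,
                         long low2, long n1, size_t w) {
    return base + (uintptr_t)(((i2 - low2) * n1 + (i1 - low1)) * (long)w);
}
```

C arrays are row-major with lower bound 0, so addr_row_major(base, i1, 0, i2, 0, n2, sizeof elem) should match &a[i1][i2]; languages with other lower bounds only change low1 and low2.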

  • Type checking: Using a type system, the compiler assigns a type expression to each component of the source program and checks that the components are used consistently, which guards against inadvertent errors and malicious misbehavior. A language is strongly typed if its compiler can guarantee that the programs it accepts will run without type errors.
    Type checking takes two forms: synthesis and inference. Type synthesis builds up the type of an expression from the types of its subexpressions, and requires names to be declared before they are used. For example, if f has type s -> t and x has type s, then the expression f(x) has type t. Type inference determines the type of a language construct from the way it is used. For example, if f(x) is an expression, then for some types a and b, f has type a -> b and x has type a.

  • Implicit and explicit type conversion: An implicit conversion (coercion) is performed automatically by the compiler, usually to widen a type, for example from integer to float; an explicit conversion (cast) is one the programmer must write. The semantic actions for checking E -> E1 + E2 use two functions: max(t1, t2), which returns the wider of the two types in the widening hierarchy, and widen(a, t, w), which generates code, if needed, to widen the value at address a from type t to type w.

Addr widen(Addr a, Type t, Type w) {
	if (t = w) return a;
	else if (t = integer and w = float) {
		temp = new Temp();
		gen(temp '=' '(float)' a);
		return temp;
	}
	else error;
}

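  • Polymorphic function: A function whose type expression contains bound type variables, written with ∀, where a variable α stands for “any type”; for example, a function that computes the length of a list can have type ∀α. list(α) -> integer. Each time a polymorphic function is applied, its bound type variables can denote a different type.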

  • Unification: The problem of determining whether two expressions s and t can be made identical by substituting expressions for the variables in s and t.

  • Boolean expressions: Used either to alter the flow of control or to compute logical values.

B -> B || B | B && B | !B | (B) | E rel E | true | false

We can short-circuit boolean operators, translating them into jumps:

if (x < 100 || x > 200 && x != y) x = 0;

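is equivalent to the following jumping code (&& binds more tightly than ||, so the condition reads x < 100 || (x > 200 && x != y)):

if x < 100 goto L2
ifFalse x > 200 goto L1
ifFalse x != y goto L1
L2: x = 0
L1:
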
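
Whether the jumping code matches the source statement can be checked by writing both forms in C and comparing them over a range of inputs. A small sketch, assuming the label L1 marks the statement following the if, so the assignment is skipped when the condition is false; direct and jumping are hypothetical names.

```c
#include <assert.h>

/* The source statement; && binds tighter than ||. */
int direct(int x, int y) {
    if (x < 100 || (x > 200 && x != y)) x = 0;
    return x;
}

/* The same statement as short-circuit jumping code. */
int jumping(int x, int y) {
    if (x < 100) goto L2;     /* if x < 100 goto L2 */
    if (!(x > 200)) goto L1;  /* ifFalse x > 200 goto L1 */
    if (!(x != y)) goto L1;   /* ifFalse x != y goto L1 */
L2:
    x = 0;
L1:
    return x;
}
```

Neither version evaluates x != y when x < 100 already decides the outcome, which is exactly what short-circuiting means.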
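  • Backpatching: A technique for generating jumping code for boolean expressions and flow-of-control statements (e.g., if (B) S) in a single pass. Jumps whose targets are not yet known are emitted with empty targets, and their positions are collected in lists carried as synthesized attributes (such as B.truelist and B.falselist); once the target label is known, every jump on the list is filled in.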
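
A minimal backpatching sketch in C for B -> E1 relop E2 and B -> B1 || M B2, assuming a global instruction array in which a target of -1 marks a hole. The roles of emit, makelist, merge and backpatch follow the usual textbook description, but the exact types, names and fixed list sizes are our own.

```c
#include <assert.h>

enum { MAX_CODE = 64, MAX_LIST = 16 };

/* A generated instruction; target == -1 means "not yet known" (a hole). */
typedef struct { const char *text; int target; } Instr;

static Instr code[MAX_CODE];
static int nextinstr = 0;

/* Indices of jumps whose targets are still holes. */
typedef struct { int idx[MAX_LIST]; int n; } List;
typedef struct { List truelist, falselist; } BoolAttrs;

int emit(const char *text) {
    assert(nextinstr < MAX_CODE);
    code[nextinstr] = (Instr){text, -1};
    return nextinstr++;
}

List makelist(int i) { List l = {.n = 1}; l.idx[0] = i; return l; }

List merge(List a, List b) {
    for (int k = 0; k < b.n; k++) {
        assert(a.n < MAX_LIST);
        a.idx[a.n++] = b.idx[k];
    }
    return a;
}

/* Fill in `target` for every jump on the list. */
void backpatch(List l, int target) {
    for (int k = 0; k < l.n; k++) code[l.idx[k]].target = target;
}

/* B -> E1 relop E2: a conditional jump (true exit), then a goto (false exit). */
BoolAttrs relop(const char *cond_jump) {
    BoolAttrs b;
    b.truelist = makelist(emit(cond_jump));
    b.falselist = makelist(emit("goto _"));
    return b;
}

/* B -> B1 || M B2, where m is the index of B2's first instruction. */
BoolAttrs or_expr(BoolAttrs b1, int m, BoolAttrs b2) {
    backpatch(b1.falselist, m); /* if B1 is false, evaluate B2 */
    BoolAttrs b = {merge(b1.truelist, b2.truelist), b2.falselist};
    return b;
}
```

For x < 100 || x > 200, the false exit of the first test is patched to point at the second test, while both true exits stay on B.truelist until the enclosing statement knows where "true" should go.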

[Figures: intermediate representation summary 1 and 2]

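  • Run-time environment: The environment that the compiler creates and manages, in cooperation with the operating system, so that the target program can run: the layout and allocation of storage, access to variables, linkage between procedures, and parameter passing. Typically:
[Figure: typical run-time memory layout, including free memory]

[Figure: general activation record]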

  • Stack vs heap: Stack storage holds names local to a procedure. Heap storage holds data that may outlive the call to the procedure that created it. Both are areas of the program's virtual memory.

  • Memory manager: A subsystem that allocates and deallocates space within the heap; it serves as an interface between application programs and the operating system. It performs two basic functions: allocation and deallocation.
    A memory manager should be space-efficient, minimizing the total heap space a program needs; program-efficient, letting the program run faster by making good use of the memory subsystem; and low-overhead, because allocation and deallocation are frequent in many programs.

  • Garbage collectors: Code that reclaims chunks of storage that can no longer be accessed.
    Things to consider: overall execution time, space usage, pause time, and program locality.
    Either we catch the transition when objects become unreachable (as reference counting does), or we periodically locate all the reachable objects and infer that all the other objects are unreachable (trace-based collection).
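
As an illustration of catching the transition to unreachable, here is a minimal reference-counting sketch in C. The Obj type with two reference slots and the live_objects counter are hypothetical, added only to make the behavior observable. Plain reference counting cannot reclaim cycles of objects that refer to each other, which is one reason trace-based collectors exist.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap object: a reference count and two outgoing references. */
typedef struct Obj {
    int refcount;
    struct Obj *ref[2];
} Obj;

static int live_objects = 0; /* objects allocated and not yet freed */

Obj *obj_new(void) {
    Obj *o = calloc(1, sizeof *o); /* reference slots start out NULL */
    assert(o != NULL);
    o->refcount = 1;               /* the creator holds the first reference */
    live_objects++;
    return o;
}

void obj_incref(Obj *o) { if (o) o->refcount++; }

/* Dropping the last reference makes the object unreachable: free it at
   that moment and drop the references it held, possibly cascading. */
void obj_decref(Obj *o) {
    if (o == NULL || --o->refcount > 0) return;
    for (int k = 0; k < 2; k++) obj_decref(o->ref[k]);
    free(o);
    live_objects--;
}

/* Reference assignment src->ref[slot] = dst. Incrementing before
   decrementing keeps self-assignment safe. */
void obj_set_ref(Obj *src, int slot, Obj *dst) {
    obj_incref(dst);
    obj_decref(src->ref[slot]);
    src->ref[slot] = dst;
}
```

If a points to b and the program drops its own references to both, decrementing a frees a, which in turn drops the last reference to b and frees it too.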

  • Mutator: The user program, which modifies the collection of objects in the heap. Four kinds of mutator operations change the set of reachable objects: object allocation, parameter passing and return values, reference assignments, and procedure returns.

  • Root set: All the data that can be accessed directly by a program, without having to dereference any pointer.

  • Code generation: The process of generating machine instructions, or more generally the target program (assembly or otherwise), from an intermediate representation.

  • Addresses in target code: The target program's address space is divided into a code area holding the executable target code, a static area for global constants and other data whose size is known at compile time, the heap, a dynamically managed area for data allocated and freed during program execution, and the stack, which holds activation records as they are created and destroyed during calls and returns.

[Figures: run-time environment summary 1 to 5]

[Figure: position of the code generator in the compiler]

  • Basic blocks and flow graphs: The code is divided into basic blocks, maximal sequences of consecutive instructions such that control can enter only through the first instruction (there are no jumps into the middle of the block) and leaves without halting or branching, except possibly at the last instruction. Each basic block becomes a node in a flow graph whose edges show the possible flow of control.
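
Basic blocks are usually found by first marking leaders: the first instruction, every jump target, and every instruction that immediately follows a jump; each block then runs from one leader up to the next. A minimal sketch, assuming each instruction is reduced to an is_jump flag and a target index (Instr and find_leaders are hypothetical names).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified three-address instruction: only control flow matters here. */
typedef struct {
    bool is_jump; /* conditional or unconditional jump */
    int target;   /* index of the jump target (used only if is_jump) */
} Instr;

/* Sets leader[k] for each instruction k that starts a basic block and
   returns the number of blocks. */
int find_leaders(const Instr *code, int n, bool *leader) {
    for (int k = 0; k < n; k++) leader[k] = false;
    if (n > 0) leader[0] = true;                 /* rule 1: first instruction */
    for (int k = 0; k < n; k++) {
        if (!code[k].is_jump) continue;
        if (code[k].target >= 0 && code[k].target < n)
            leader[code[k].target] = true;       /* rule 2: jump target */
        if (k + 1 < n) leader[k + 1] = true;     /* rule 3: after a jump */
    }
    int blocks = 0;
    for (int k = 0; k < n; k++) blocks += leader[k];
    return blocks;
}
```

For a loop such as "0: i = 1; 1: t = i * 4; 2: i = i + 1; 3: if i <= 10 goto 1; 4: x = 0", the leaders are 0, 1 and 4, giving three blocks.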

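  • Live variables and next-use: A variable is live at a point in the program if its value may be used later along some path, for example after the end of the current basic block; next-use information records where the variable's current value will next be used.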

  • Optimizing the code: Optimization takes several things into account, including the cost of each instruction. Typical local transformations are: eliminating local common subexpressions, eliminating dead code, reordering statements that do not depend on one another, and using algebraic laws to reorder the operands of three-address instructions and sometimes simplify the computation.

  • DAG for basic block: The basic block itself can be represented by a DAG, having as parents the operators and as leaves the operands. This is used for simplifications and to represent array references too.

  • Managing register and address descriptors: Registers are scarce, so the code generator uses a function getReg() to choose registers for each instruction. It relies on two structures: a register descriptor, which records what each register currently holds, and an address descriptor, which records, for each variable, the locations (register, memory address, or stack slot) where its current value can be found.

  • A register spill: When no register is free to hold the operand of an instruction, the value in some register is stored to its own memory location so that the register can be reused.

[Figures: code generation summary 1 to 3]

  • Peephole optimization: Improving target code by examining a short sliding window of instructions, the peephole, and replacing instruction sequences within it by a shorter or faster sequence. It is usually applied in repeated passes. Examples: redundant-instruction elimination, flow-of-control optimizations, algebraic simplifications, and use of machine idioms.

  • Data-flow analysis: Techniques that derive information about how data flows along the paths of a program's flow graph, used to optimize it. An iterative data-flow framework consists of: a direction (forward or backward); a semilattice with a domain of values and a meet operator ∧, which defines the partial order ≤ and has a top element; a family of transfer functions from the domain to itself; a boundary condition at the entry (or exit) node; the data-flow equations; and an initialization, usually to top. Instances include reaching definitions, live variables, available expressions, constant propagation, and partial redundancy elimination.
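
As one concrete instance of the framework, the sketch below computes live variables iteratively: backward direction, set union as the meet operator, transfer function IN[B] = use[B] ∪ (OUT[B] - def[B]), and every IN initialized to the empty set (the top element here). It assumes at most 32 variables, each one bit of a uint32_t, and a flow graph given as a successor matrix; FlowGraph and live_variables are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MAX_BLOCKS = 16 };

/* Variables are bit positions. use[B]: variables read in B before any
   write in B; def[B]: variables written in B. */
typedef struct {
    int n;
    uint32_t use[MAX_BLOCKS], def[MAX_BLOCKS];
    bool succ[MAX_BLOCKS][MAX_BLOCKS]; /* succ[b][s]: edge b -> s */
    uint32_t in[MAX_BLOCKS], out[MAX_BLOCKS];
} FlowGraph;

/* Iterate the backward equations until no IN set changes. Blocks with no
   successors get OUT = {}, the boundary condition at the exit. */
void live_variables(FlowGraph *g) {
    for (int b = 0; b < g->n; b++) g->in[b] = g->out[b] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = g->n - 1; b >= 0; b--) {
            uint32_t out = 0;
            for (int s = 0; s < g->n; s++)
                if (g->succ[b][s]) out |= g->in[s]; /* meet = union */
            uint32_t in = g->use[b] | (out & ~g->def[b]);
            if (in != g->in[b]) changed = true;
            g->out[b] = out;
            g->in[b] = in;
        }
    }
}
```

Visiting blocks in reverse order suits a backward problem: on straight-line code the sets usually settle after one pass, with a second pass confirming that nothing changed.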

[Figure: reaching definitions]

  • Monotonicity: A function f on a partial order is monotone if, for all x and y, x ≤ y implies f(x) ≤ f(y).

[Figures: data flow summary 1 to 7]

  • MOP (meet-over-all-paths) solution: The ideal solution to a data-flow problem at node n is obtained by computing the data-flow information along every possible path from the entry to n and combining the results with the meet operator ∧. In general there can be infinitely many paths to n, so this solution cannot be computed directly; iterative algorithms compute a safe approximation of it instead.

  • Very busy expressions: An expression e is very busy at point p if, on every path from p, e is evaluated before the value of e changes (that is, before any of its operands is redefined).

  • Natural loop: A natural loop must have a single entry node, called the header, which dominates all nodes in the loop, and there must be a back edge that enters the header; otherwise control could not flow back to the header directly from the “loop”. The natural loop of a back edge n -> d consists of d plus all nodes that can reach n without going through d.
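
Given a back edge n -> d, the natural loop can be collected by walking predecessors backwards from n and stopping at d. A minimal sketch, assuming the flow graph is a predecessor matrix (pred[x][y] is true for an edge y -> x) and that d is already known to dominate n; natural_loop is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_NODES = 16 };

/* Marks in_loop[] for the natural loop of back edge n -> d and returns its
   size: d plus every node that can reach n without passing through d. */
int natural_loop(int count, bool pred[][MAX_NODES], int n, int d,
                 bool in_loop[]) {
    int stack[MAX_NODES], top = 0, size = 1;
    for (int v = 0; v < count; v++) in_loop[v] = false;
    in_loop[d] = true;          /* the header: the backward walk stops here */
    if (!in_loop[n]) {          /* n == d for a self-loop */
        in_loop[n] = true;
        stack[top++] = n;
        size++;
    }
    while (top > 0) {
        int m = stack[--top];
        for (int p = 0; p < count; p++)
            if (pred[m][p] && !in_loop[p]) { /* unvisited predecessor of m */
                in_loop[p] = true;
                stack[top++] = p;
                size++;
            }
    }
    return size;
}
```

Each node is pushed at most once, so the stack never needs more than one slot per node.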

[Figures: ILP summary 1 to 4]

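  • Region-based analysis: Instead of iterating to a fixed point, start from small regions (such as basic blocks), compute their transfer functions, and combine them into transfer functions for successively larger enclosing regions.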

  • Hardware vs software ILP: Machines that leave the management of instruction-level parallelism to software (the compiler) are called VLIW (very long instruction word) machines, and those that manage it in hardware are called superscalar machines. See the computer architecture article.

  • Affine array optimization: When the indices of an array access can be expressed as affine functions of the enclosing loop indices, the compiler can apply transformations such as affine space partitioning and affine time partitioning.

[Figures: basic matrix multiplication 1 and 2]

[Figures: array access with matrix vector 1 and 2]

[Figures: hardware optimization summary 1 to 4]

Further Reading


  • Internet Archive Book Images / No restrictions

September 07, 2020

Frederic Cambus (fcambus)

Playing with Kore JSON API September 07, 2020 03:15 PM

Kore 4.0.0 was released a few days ago, and features a brand new JSON API that makes it easy to parse and serialize JSON objects.

During the last couple of years, I have been using Kore for various projects, including exposing hardware sensor values over the network via very simple APIs. In this article, I would like to present a generalization of this concept and show how easy it is to expose system information with Kore.

This small API example makes it possible to identify hosts over the network, and has been tested on Linux, OpenBSD, NetBSD, and macOS (thanks Joris!).

After creating a new project:

kodev create identify

Populate src/identify.c with the following code snippet:

#include <sys/utsname.h>

#include <string.h>

#include <kore/kore.h>
#include <kore/http.h>

#if defined(__linux__)
#include <kore/seccomp.h>

/* Allow uname(2) under Kore's seccomp sandbox on Linux. */
KORE_SECCOMP_FILTER("identify",
	KORE_SYSCALL_ALLOW(uname)
);
#endif

int		page(struct http_request *);

int
page(struct http_request *req)
{
	char *answer;

	struct utsname u;

	struct kore_buf buf;
	struct kore_json_item *json;

	if (uname(&u) == -1) {
		http_response(req, HTTP_STATUS_INTERNAL_ERROR, NULL, 0);
		return (KORE_RESULT_OK);
	}

	kore_buf_init(&buf, 1024);
	json = kore_json_create_object(NULL, NULL);

	kore_json_create_string(json, "system", u.sysname);
	kore_json_create_string(json, "hostname", u.nodename);
	kore_json_create_string(json, "release", u.release);
	kore_json_create_string(json, "version", u.version);
	kore_json_create_string(json, "machine", u.machine);

	kore_json_item_tobuf(json, &buf);

	answer = kore_buf_stringify(&buf, NULL);
	http_response(req, 200, answer, strlen(answer));

	/* Release the buffer and the JSON tree once the response is queued. */
	kore_buf_cleanup(&buf);
	kore_json_item_free(json);

	return (KORE_RESULT_OK);
}

And finally launch the project:

kodev run

The kodev tool will build and run the project, and we can now query the API to identify hosts:

{
  "system": "OpenBSD",
  "hostname": "",
  "release": "6.8",
  "version": "GENERIC.MP#56",
  "machine": "amd64"
}

Wesley Moore (wezm)

Slowing Down Read Rust Posting September 07, 2020 12:00 AM

After nearly 3 years and more than 3200 posts I'm going to slow down the posting frequency on Read Rust. I hope this will free up some spare time and make it easier to take breaks from social media. I aim to share all of the #rust2021 posts I can find, but after that I'll probably only share posts that seem particularly noteworthy or interesting.

I started Read Rust in January 2018 to track the posts being shared as part of the inaugural call for blog posts. When I started there were only a handful of new posts each day to triage. Now there are many more and unless I triage and publish daily they quickly pile up.

Also, I've kind of built a reflex of trying to "complete the Internet" each day by ensuring that I read my whole Twitter feed, and new posts on /r/rust. I would like to break this habit and be able to take breaks from these things, without feeling like I might miss an important post.

Whilst I think there is value in the curation and archiving of posts on Read Rust, the website doesn't see a lot of use. I think most of the value for people is following the Twitter, Mastodon, and Facebook accounts. However, there's a fair amount of overlap between posts shared on /r/rust, @rustlang, and This Week in Rust. So, I think that if folks keep an eye on one or more of those they will still see most posts of note.

If you're not into social media, the full list of more than 450 Rust RSS feeds I subscribe to is available via an OPML file on the site. So, feel free to use that to subscribe to a bunch of feeds instead. Rust blogs OPML.

It's been fun to build, and rebuild the website and surrounding tooling over the years. Read Rust was initially just an RSS feed but after requests for an actual web-page I built a small site with the Cobalt static site compiler. In late 2019 in an effort to streamline the sharing of posts I rebuilt the site as a dynamic web app. In early 2020 I added full text search.

As mentioned in the introduction, from here I plan to share #rust2021 posts and after that posting will be much less frequent. Thanks for reading, and happy coding 🦀.

Frequently Anticipated Questions

Q. What about getting others to help share posts?

I considered this, and it was actually part of the motivation for the rebuild in 2019. However, ultimately Rust is now large enough and continuing to grow such that it's become less and less feasible to curate the entire firehose of Rust content.

Q. What about making it a sort of RSS powered Rust planet?

I think there's value in curation. Rust is popular enough now that there's a lot of low effort posts, or repetitious getting started posts. Also, people rightly have diverse interests and their blog may not solely contain Rust posts. So, I'd prefer to keep the archive in the focussed state it's in now.

Q. What will happen to the site and social media accounts now?

I plan to keep the site up and running indefinitely. I am a strong believer in not breaking links on the web, and I think I have a pretty decent track record. For example, this site has been online for 13 years and I still have redirects in place from the very first version of it. I may still share the occasional post but in general I hope to free up a bit of time to work on other things.

September 06, 2020

Derek Jones (derek-jones)

Impact of function size on number of reported faults September 06, 2020 09:55 PM

Are longer functions more likely to contain more coding mistakes than shorter functions?

Well, yes. Longer functions contain more code, and the more code developers write the more mistakes they are likely to make.

But wait, the evidence shows that most reported faults occur in short functions.

This is true, at least in Java. It is also true that most of a Java program’s code appears in short methods (in C 50% of the code is contained in functions containing 114 or fewer lines, while in Java 50% of code is contained in methods containing 4 or fewer lines). It is to be expected that most reported faults appear in short functions. The plot below shows, left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines (code+data):

left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines.

Does percentage of program source really explain all those reported faults in short methods/functions? Or are shorter functions more likely to contain more coding mistakes per line of code, than longer functions?

Reported faults per line of code is often referred to as: defect density.

If defect density was independent of function length, the plot of reported faults against function length (in lines of code) would be horizontal; red line below. If every function contained the same number of reported faults, the plotted line would have the form of the blue line below.

Number of reported faults in C++ classes (not methods) containing a given number of lines.

Two things need to occur for a fault to be experienced. A mistake has to appear in the code, and the code has to be executed with the ‘right’ input values.

Code that is never executed will never result in any fault reports.

If, in a function containing 100 lines of executable source code, say, 30 lines are rarely executed, those lines will not contribute as much to the final total number of reported faults as the other 70 lines.

How does the average percentage of executed LOC, in a function, vary with its length? I have been rummaging around looking for data to help answer this question, but so far without any luck (the llvm code coverage report is over all tests, rather than per test case). Pointers to such data very welcome.

Statement execution is controlled by if-statements, and around 17% of C source statements are if-statements. For functions containing between 1 and 10 executable statements, the percentage that don’t contain an if-statement is expected to be, respectively: 83, 69, 57, 47, 39, 33, 27, 23, 19, 16. Statements contained in shorter functions are more likely to be executed, providing more opportunities for any mistakes they contain to be triggered, generating a fault experience.
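
The listed percentages are what a simple independence model gives: if each statement is an if-statement with probability 0.17, independently of the others, then a function with n statements contains no if-statement with probability (1 - 0.17)^n. A minimal C sketch of that arithmetic; the independence assumption and the name pct_without_if are ours, not the author's.

```c
#include <assert.h>

/* Expected percentage of functions with n executable statements that
   contain no if-statement, assuming each statement is independently an
   if-statement with probability p_if. */
double pct_without_if(int n, double p_if) {
    double q = 1.0;
    for (int k = 0; k < n; k++)
        q *= 1.0 - p_if; /* P(no if-statement) = (1 - p_if)^n */
    return 100.0 * q;
}
```

Rounding pct_without_if(n, 0.17) for n = 1 to 10 reproduces the sequence 83, 69, 57, 47, 39, 33, 27, 23, 19, 16.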

Longer functions contain more dependencies between the statements within the body than shorter functions (I don't have any data showing how much more). Dependencies create opportunities for making mistakes (there is data showing that dependencies between files and classes are a source of mistakes).

The previous analysis makes a large assumption, that the mistake generating a fault experience is contained in one function. This is true for 70% of reported faults (in AspectJ).

What is the distribution of reported faults against function/method size? I don’t have this data (pointers to such data very welcome).

The plot below shows number of reported faults in C++ classes (not methods) containing a given number of lines (from a paper by Koru, Eman and Mathew; code+data):

Number of reported faults in C++ classes (not methods) containing a given number of lines.

It’s tempting to think that those three curved lines are each classes containing the same number of methods.

What is the conclusion? There is one good reason why shorter functions should have more reported faults, and another good’ish reason why longer functions should have more reported faults. Perhaps length is not important. We need more data before an answer is possible.

Ponylang (SeanTAllen)

Last Week in Pony - September 6, 2020 September 06, 2020 07:19 PM

We have a new RFC for added syntax to extend automatic receiver recovery. The shared-docker shellcheck image is being deprecated.

Gonçalo Valério (dethos)

Giving a new life to old phones September 06, 2020 12:18 PM

Nowadays, in some “developed” countries, it is very common for people to have a bunch of old phones stored somewhere in a drawer. Ten years have passed since smartphones became ubiquitous and those devices tend to become unusable very quickly, at least for their primary purpose. Either a small component breaks, the vendor stops providing updates, newer apps don’t support those older versions, etc.

The thing is, these phones are still powerful computers. It would be great if we could give them another life once they are no longer fit for regular day to day use or the owner just wants to try a shiny new device.

I have never had many smartphones, since mine tend to last many years, but I still have one or two lying around. Recently I started thinking of new uses for them, to make them work instead of just gathering dust. A quick search on the internet tells me that many people already had the same idea (I'm quite late to the party) and have been working on cool things to do with these devices.

However, most of these articles just throw the idea at you, without telling you how to do it. Others assume that your device is relatively recent.

Of course the difficulty increases with the age of the phone; in my case, the software that I will be able to run on a 10 year old Samsung Galaxy S will not be as easy to find as the software that I can run on a device that is just one or two years old.

Below is a list of posts I found online with cool things you can do with your old phones. What sets this list apart from other results is that none of the items are just ideas: each one contains step-by-step instructions for achieving the end result.

You don’t have to follow the provided instructions rigorously and you should introduce some variations that are more appropriate to your use case.

Have fun and reuse your old devices.

September 05, 2020

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Primitive unary functions September 05, 2020 09:00 PM


Welcome back to the “Compiling a Lisp” series. Last time, we finished adding the rest of the constants as tagged pointer immediates. Since it’s not very useful to have only values (no way to operate on them), we’re going to add some primitive unary functions.

“Primitive” means here that they are built into the compiler, so we won’t actually compile the call to an assembly procedure call. This is also called a compiler intrinsic. “Unary” means the functions will take only one argument. “Function” is a bit of a misnomer because these functions won’t be real values that you can pass around as variables. You’ll only be able to use them as literal names in calls.

Though we’re still not adding a reader/parser, we can imagine the syntax for this looks like the following:

(integer? (integer->char (add1 96)))

Today we also tackle nested function calls and subexpressions.

Adding function calls will require adding a new compiler datastructure, an addition to the AST, but not to the compiled code. The compiled code will still only know about the immediate types.

Ghuloum proposes we add the following functions:

  • add1, which takes an integer and adds 1 to it
  • sub1, which takes an integer and subtracts 1 from it
  • integer->char, which takes an integer and converts it into a character (like chr in Python)
  • char->integer, which takes a character and converts it into an integer (like ord in Python)
  • null?, which takes an object and returns true if it is nil and false otherwise
  • zero?, which takes an object and returns true if it is 0 and false otherwise
  • not, which takes an object and returns true if it is false and false otherwise
  • integer?, which takes an object and returns true if it is an integer and false otherwise
  • bool?, which takes an object and returns true if it is a boolean and false otherwise

The functions add1, sub1, and the char/integer conversion functions will be our first real experience dealing with object encoding in the compiled code. What fun!

The implementations for null?, zero?, not, integer?, and bool? are so similar that I am only going to reproduce one or two in this post. The rest will be visible at assets/code/lisp/compiling-unary.c.

In order to implement these functions, we’ll also need some more instructions than mov and ret. Today we’ll add:

  • add
  • sub
  • shl
  • shr
  • or
  • and
  • cmp
  • setcc

Because the implementations of shl, shr, or, and and are so straightforward — just like mov, really — I’ll also omit them from the post. The implementations of add, sub, cmp, and setcc are more interesting.

The fundamental data structure of Lisp

Pairs, also called cons cells, two-tuples, and probably other things too, are the fundamental data structure of Lisp. At least the original Lisp. Nowadays we have fancy structures like vectors, too.

Pairs are a container for precisely two other objects. I'll call them car and cdr for historical[1] and consistency reasons, but you can call them whatever you like. Regardless of name, they could be represented as a C struct like this:

typedef struct Pair {
  ASTNode *car;
  ASTNode *cdr;
} Pair;

This is useful for holding pairs of objects (think coordinates, complex numbers, …) but it is also incredibly useful for making linked lists. Linked lists in Lisp are comprised of a car holding an object and the cdr holding another list. Eventually the last cdr holds nil, signifying the end of the list. Take a look at this handy diagram.

Fig. 1 - Cons cell list, courtesy of Wikipedia.

This represents the list (list 42 69 613), which can also be denoted (cons 42 (cons 69 (cons 613 nil))).

We’ll use these lists to represent the syntax trees for Lisp, so we’ll need to implement pairs to compile Lisp programs.

Implementing pairs

In previous posts we implemented the immediate types the same way in the compiler and in the compiled code. I originally wrote this post doing the same thing: manually laying out object offsets myself, reading and writing from objects manually. The motivation was to get you familiar with the memory layout in the compiled code, but ultimately it ended up being too much content too fast. We’ll get to memory layouts when we start allocating pairs in the compiled code.

In the compiler we’re going to use C structs instead of manual memory layout. This makes the code a little bit easier to read. We’ll still tag the pointers, though.

const unsigned int kPairTag = 0x1;        // 0b001
const uword kHeapTagMask = ((uword)0x7);  // 0b000...0111
const uword kHeapPtrMask = ~kHeapTagMask; // 0b1111...1000

This adds the pair tag and some masks. As we noted in the previous posts, the heap object tags are all in the lowest three bits of the pointer. We can mask those out using this handy utility function.

uword Object_address(void *obj) { return (uword)obj & kHeapPtrMask; }

We’ll need to use this whenever we want to actually access a struct member. Speaking of struct members, here’s the definition of Pair:

typedef struct Pair {
  ASTNode *car;
  ASTNode *cdr;
} Pair;

And here are some functions for allocating and manipulating the Pair struct, to keep the implementation details hidden:

ASTNode *AST_heap_alloc(unsigned char tag, uword size) {
  // Initialize to 0
  uword address = (uword)calloc(size, 1);
  return (ASTNode *)(address | tag);
}

void AST_pair_set_car(ASTNode *node, ASTNode *car);
void AST_pair_set_cdr(ASTNode *node, ASTNode *cdr);

ASTNode *AST_new_pair(ASTNode *car, ASTNode *cdr) {
  ASTNode *node = AST_heap_alloc(kPairTag, sizeof(Pair));
  AST_pair_set_car(node, car);
  AST_pair_set_cdr(node, cdr);
  return node;
}

bool AST_is_pair(ASTNode *node) {
  return ((uword)node & kHeapTagMask) == kPairTag;
}

Pair *AST_as_pair(ASTNode *node) {
  // Catch misuse: only pairs may be treated as a Pair
  assert(AST_is_pair(node));
  return (Pair *)Object_address(node);
}

ASTNode *AST_pair_car(ASTNode *node) { return AST_as_pair(node)->car; }

void AST_pair_set_car(ASTNode *node, ASTNode *car) {
  AST_as_pair(node)->car = car;
}

ASTNode *AST_pair_cdr(ASTNode *node) { return AST_as_pair(node)->cdr; }

void AST_pair_set_cdr(ASTNode *node, ASTNode *cdr) {
  AST_as_pair(node)->cdr = cdr;
}

There are a few important things to note.

First, AST_heap_alloc very intentionally zeroes out the memory it allocates. If the members were left uninitialized, it might be possible to read a struct member that held an invalid pointer in car or cdr. Because we zero-initialize, the member pointers represent the object 0 (the integer zero, since integers have a 0b00 tag) by default. Nothing will crash.

Second, we always pass our ASTNode pointers through AST_as_pair. This function has two purposes: catching invalid uses (via the assert that the object is indeed a Pair) and masking out the lower bits. Otherwise we’d have to do the masking in every operation individually.

Third, I abstracted out the AST_heap_alloc so we don’t expose the calloc function everywhere. This allows us to later swap out the allocator for something more intelligent, like a bump allocator, an arena allocator, etc.
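Here is a small standalone sketch of the tagging round trip described above: allocate zeroed memory, stash a tag in the low bits, read it back, and mask it off again. It assumes uword is uintptr_t and that the allocator returns 8-byte-aligned addresses; tagged_alloc, tag_of, and untag are hypothetical names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t uword;  // assumption: an unsigned, pointer-sized integer

static const uword kPairTag = 0x1;              // 0b001
static const uword kHeapTagMask = 0x7;          // 0b111
static const uword kHeapPtrMask = ~(uword)0x7;  // 0b1111...1000

// Zeroed allocation with a tag in the low three bits. This relies on the
// allocator returning addresses aligned to at least 8 bytes, which leaves
// those bits free; the second assert checks that assumption.
void *tagged_alloc(uword tag, size_t size) {
  uword address = (uword)calloc(size, 1);
  assert(address != 0 && "out of memory");
  assert((address & kHeapTagMask) == 0 && "allocation not 8-byte aligned");
  return (void *)(address | tag);
}

// Read the tag back out of a tagged pointer.
uword tag_of(void *obj) { return (uword)obj & kHeapTagMask; }

// Mask the tag off to recover the real address, like Object_address.
void *untag(void *obj) { return (void *)((uword)obj & kHeapPtrMask); }
```

Because calloc zeroes the fields, reading them through the untagged pointer yields 0, which is the "object 0 by default" behavior noted above.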

And since memory allocated must eventually be freed, there’s a freeing function too:

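void AST_heap_free(ASTNode *node) {
  if (!AST_is_heap_object(node)) {
    // Immediates (integers, etc.) own no memory
    return;
  }
  if (AST_is_pair(node)) {
    // A pair owns its car and cdr, so free them first
    AST_heap_free(AST_pair_car(node));
    AST_heap_free(AST_pair_cdr(node));
  }
  free((void *)Object_address(node));
}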

This assumes that each ASTNode* owns the references to all of its members. So don’t borrow references to share between objects. If you need to store a reference to an object, make sure you own it. Otherwise you’ll get a double free. In practice this shouldn’t bite us too much because each program is one big tree.

Implementing symbols

We also need symbols! I mean, we could try mapping all the functions we need to integers, but that wouldn’t be very fun. Who wants to try and debug a program crashing on function #67? Not me. So let’s add a datatype that can represent names of things.

As above, we’ll need to tag the pointers.

const unsigned int kSymbolTag = 0x5;      // 0b101

And then our struct definition.

typedef struct Symbol {
  word length;
  char cstr[];
} Symbol;

I’ve chosen this variable-length object representation because it’s similar to how we’re going to allocate symbols in assembly and the mechanism in C isn’t so gnarly. This struct indicates that the memory layout of a Symbol is a length field immediately followed by that number of bytes in memory. Note that a flexible array member like this is a C99 feature.

If you don’t have C99 or don’t like this implementation, that’s fine. Just store a char* and allocate another object for that string.

You could also opt to not store the length at all and instead NUL-terminate it. This has the advantage of not dealing with variable-length arrays (it’s just a tagged char*) but has the disadvantage of an O(n) length lookup.

Now we can add our Symbol allocator:

Symbol *AST_as_symbol(ASTNode *node);

ASTNode *AST_new_symbol(const char *str) {
  word data_length = strlen(str) + 1; // for NUL
  ASTNode *node = AST_heap_alloc(kSymbolTag, sizeof(Symbol) + data_length);
  Symbol *s = AST_as_symbol(node);
  s->length = data_length;
  memcpy(s->cstr, str, data_length);
  return node;
}

See how we have to manually specify the size we want. It’s a little fussy, but it works.

Storing the NUL byte or not is up to you. It saves one byte per string if you don’t, but it makes printing out strings in the debugger a bit of a pain since you can’t just treat them like normal C strings.
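As a standalone sketch of the flexible-array-member layout, here is a Symbol allocated in one block with its length stored up front and the NUL byte kept. It assumes word is intptr_t and uses plain malloc; symbol_new is a hypothetical helper, not the post's AST_new_symbol.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef intptr_t word;  // assumption: a signed, pointer-sized integer

typedef struct Symbol {
  word length;  // number of bytes in cstr, including the trailing NUL
  char cstr[];  // C99 flexible array member: the bytes follow the header
} Symbol;

// Allocate the header and the string bytes together in one block.
Symbol *symbol_new(const char *str) {
  word data_length = (word)strlen(str) + 1;  // +1 for the NUL byte
  Symbol *s = malloc(sizeof(Symbol) + (size_t)data_length);
  assert(s != NULL && "out of memory");
  s->length = data_length;
  memcpy(s->cstr, str, (size_t)data_length);
  return s;
}
```

Because the NUL is kept, s->cstr can be printed or passed to strcmp like any C string, which is the debugging convenience mentioned above.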

Some Lisp implementations use a symbol table to ensure that symbols allocated with equivalent C-string values return the same pointer. This allows the implementations to test for symbol equality by testing pointer equality. I think we can sacrifice a bit of memory and runtime speed for implementation simplicity, so I’m not going to do that.
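For comparison, a symbol table (interning) might look roughly like this. It is a hypothetical, fixed-capacity sketch with a linear scan; a real implementation would probably use a hash table and store tagged Symbol objects rather than raw C strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical fixed-capacity table of canonical symbol names.
enum { kMaxSymbols = 256 };
static char *symbol_table[kMaxSymbols];
static int symbol_count = 0;

// Return the one canonical copy of str, creating it on first use.
const char *intern(const char *str) {
  for (int i = 0; i < symbol_count; i++) {
    if (strcmp(symbol_table[i], str) == 0) {
      return symbol_table[i];  // already interned: reuse the same pointer
    }
  }
  assert(symbol_count < kMaxSymbols && "symbol table full");
  size_t size = strlen(str) + 1;
  char *copy = malloc(size);
  assert(copy != NULL && "out of memory");
  memcpy(copy, str, size);
  symbol_table[symbol_count++] = copy;
  return copy;
}
```

With interning, symbol equality becomes a pointer comparison, at the cost of a table lookup on every allocation. That is the trade-off this post skips in favor of calling strcmp.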

Let’s add the rest of the utility functions:

bool AST_is_symbol(ASTNode *node) {
  return ((uword)node & kHeapTagMask) == kSymbolTag;
}

Symbol *AST_as_symbol(ASTNode *node) {
  // Catch misuse, as AST_as_pair does for pairs
  assert(AST_is_symbol(node));
  return (Symbol *)Object_address(node);
}

const char *AST_symbol_cstr(ASTNode *node) {
  return (const char *)AST_as_symbol(node)->cstr;
}

bool AST_symbol_matches(ASTNode *node, const char *cstr) {
  return strcmp(AST_symbol_cstr(node), cstr) == 0;
}

Now we can represent names.

Representing function calls

We’re going to represent function calls as lists. That means that the following program:

(add1 5)

can be represented by the following C program:

ASTNode *args = AST_new_pair(AST_new_integer(5), AST_nil());
ASTNode *program = AST_new_pair(AST_new_symbol("add1"), args);

This is a little wordy. We can make some utilities to trim the length down.

ASTNode *list1(ASTNode *item0) {
  return AST_new_pair(item0, AST_nil());
}

ASTNode *list2(ASTNode *item0, ASTNode *item1) {
  return AST_new_pair(item0, list1(item1));
}

ASTNode *new_unary_call(const char *name, ASTNode *arg) {
  return list2(AST_new_symbol(name), arg);
}

And now we can represent the program as:

list2(AST_new_symbol("add1"), AST_new_integer(5));
// or, shorter,
new_unary_call("add1", AST_new_integer(5));

This is great news because we’ll be adding many tests today.

Compiling primitive unary function calls

Whew. We’ve built up all these data structures and tagged pointers and whatnot but haven’t actually done anything with them yet. Let’s get to the compilers part of the compilers series, please!

First, we have to revisit Compile_expr and add another case. If we see a pair in an expression, then that indicates a call.

int Compile_expr(Buffer *buf, ASTNode *node) {
  // Tests for the immediates ...
  if (AST_is_pair(node)) {
    return Compile_call(buf, AST_pair_car(node), AST_pair_cdr(node));
  }
  assert(0 && "unexpected node type");
}

I took the liberty of separating out the callable and the args so that the Compile_call function has less to deal with.

We’re only supporting primitive unary function calls today, which means that the compiler accepts a very limited pattern. (add1 5) is ok. (add1 (add1 5)) is ok. (blargle 5) is not, because blargle isn’t one of the primitives we support (add1, sub1, integer->char, char->integer, nil?, zero?, not, integer?, and boolean?). ((foo) 1) is not, because the thing being called is not a symbol.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args) {
  assert(AST_pair_cdr(args) == AST_nil() &&
         "only unary function calls supported");
  if (AST_is_symbol(callable)) {
    // Switch on the different primitives here...
  }
  assert(0 && "unexpected call type");
}

Compile_call should look at which symbol is being called and emit different code depending on what it is. The overall pattern will look like this, though:

  • Compile the argument — the result is stored in rax
  • Do something to rax

Let’s start with add1 since it’s the most straightforward.

    if (AST_symbol_matches(callable, "add1")) {
      // operand1(args) is the first (and only) argument in the list
      _(Compile_expr(buf, operand1(args)));
      Emit_add_reg_imm32(buf, kRax, Object_encode_integer(1));
      return 0;
    }

If we see add1, compile the argument (as above). Then, add 1 to rax. Note that we’re not just adding the literal 1, though. We’re adding the object representation of 1, i.e., 1 << 2. Think about why! When you have an idea, check footnote 2.

If you’re wondering what the underscore (_) is, it’s a macro that I made to test the return value of the compile expression and return if there was an error. We don’t have any non-aborting error cases just yet, but I got tired of writing if (result != 0) return result; over and over again.
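The _ macro itself isn’t shown in this post, so the following is only a guess at its shape: evaluate a step, and if the result is non-zero, return that result from the enclosing function. The do { ... } while (0) wrapper makes the macro behave like a single statement. compile_two_steps and the step functions are hypothetical.

```c
#include <assert.h>

// Hypothetical version of the _ macro: propagate a non-zero error code.
#define _(exp)              \
  do {                      \
    int _result = (exp);    \
    if (_result != 0) {     \
      return _result;       \
    }                       \
  } while (0)

// Hypothetical compile steps, used only to show the control flow.
static int step_ok(void) { return 0; }
static int step_fail(void) { return 7; }

int compile_two_steps(int fail_first) {
  _(fail_first ? step_fail() : step_ok());  // returns 7 early on failure
  _(step_ok());
  return 0;
}
```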

Note that there is no runtime error checking. Our compiler will allow (add1 nil) to slip through and mangle the pointer. This isn’t ideal, but we don’t have the facilities for error reporting just yet.

sub1 is similar to add1, except it uses the sub instruction. You could also just use add with the immediate representation of -1.
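A small C model of what add1 and sub1 do to the value in rax may help. It assumes a 64-bit word and the integer encoding from the previous posts (two tag bits, 0b00); the helper names are hypothetical. sub1 is written as an add of the encoded -1, the alternative mentioned above.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;  // assumption: a 64-bit signed machine word

enum { kIntegerShift = 2, kIntegerTagMask = 0x3, kIntegerTag = 0x0 };

// Encoding is value << 2; written as a multiply because left-shifting a
// negative signed value is undefined behavior in C.
word encode_integer(word value) { return value * (1 << kIntegerShift); }

// Encoded integers are multiples of 4, so this division is exact.
word decode_integer(word obj) { return obj / (1 << kIntegerShift); }

// add1: add the *encoded* 1, so the low two tag bits stay 0b00.
word add1(word obj) { return obj + encode_integer(1); }

// sub1 as an add of the encoded -1.
word sub1(word obj) { return obj + encode_integer(-1); }

// The low two bits: 0b00 for every integer.
word integer_tag(word obj) { return obj & kIntegerTagMask; }
```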

integer->char is different. We have to change the tag of the object. In order to do that, we shift the integer left and then drop the character tag onto it. This is made simple by integers having a 0b00 tag (nothing to mask out).

Here’s a small diagram showing the transitions when converting 97 to 'a':

High                                                           Low
0000000000000000000000000000000000000000000000000000000[1100001]00  Integer
0000000000000000000000000000000000000000000000000[1100001]00000000  Shifted
0000000000000000000000000000000000000000000000000[1100001]00001111  Character

where the number enclosed in [brackets] is 97. And here’s the code to emit assembly that does just that:

    if (AST_symbol_matches(callable, "integer->char")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_shl_reg_imm8(buf, kRax, kCharShift - kIntegerShift);
      Emit_or_reg_imm8(buf, kRax, kCharTag);
      return 0;
    }

Note that we’re not shifting left by the full amount. We’re only shifting by the difference, since integers are already two bits shifted.

char->integer is similar, except it’s just a shr. Once the value is shifted right, the char tag gets dropped off the end, so there’s no need to mask it out.
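The bit twiddling in the diagram can be checked in plain C. This sketch models the emitted shl/or for integer->char and the single shr for char->integer, assuming a 64-bit word, non-negative character codes, and the constants from the series (kIntegerShift = 2, kCharShift = 8, kCharTag = 0xf); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;  // assumption: a 64-bit signed machine word

enum {
  kIntegerShift = 2,
  kCharShift = 8,
  kCharTag = 0xf,  // 0b00001111
};

// integer->char: shift the already-encoded integer by the *difference*
// in shifts, then OR in the character tag.
word integer_to_char(word encoded_int) {
  return (encoded_int << (kCharShift - kIntegerShift)) | kCharTag;
}

// char->integer: one right shift. The low six bits (the 0b1111 tag plus
// the 0b00 discriminator) fall off, and the bits that land in the integer
// tag position were zero, so no masking is needed.
word char_to_integer(word encoded_char) {
  return encoded_char >> (kCharShift - kIntegerShift);
}
```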

nil? is our first primitive with ~ exciting assembly instructions ~. We get to use cmp and setcc. The basic idea is:

  • Compare (this means do a subtraction) what’s in rax and nil
  • Set rax to 0
  • If they’re equal (this means the result was 0), set al to 1
  • Shift left and tag it with the bool tag

al is the name for the lower 8 bits of rax. There’s also ah (for the next 8 bits, but not the highest bits), cl/ch, etc.

    if (AST_symbol_matches(callable, "nil?")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_cmp_reg_imm32(buf, kRax, Object_nil());
      Emit_mov_reg_imm32(buf, kRax, 0);
      Emit_setcc_imm8(buf, kEqual, kAl);
      Emit_shl_reg_imm8(buf, kRax, kBoolShift);
      Emit_or_reg_imm8(buf, kRax, kBoolTag);
      return 0;
    }

The cmp leaves a bit set (ZF) in the flags register, which setcc then checks. setcc, by the way, is the name for the group of instructions that set a register if some condition happened. It took me a long time to realize that since people normally write sete or setnz or something. And cc means “condition code”.
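To see the four steps of nil? play out, here is a C simulation of the emitted sequence, including the fact that sete writes only al. It assumes nil is 0x2f and the boolean encoding from the last post (kBoolShift = 7, kBoolTag = 0x1f); nil_p is a hypothetical name, and it models the registers rather than the compiler.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uword;

enum { kBoolShift = 7, kBoolTag = 0x1f };
static const uword kNil = 0x2f;

// Register-level model of: cmp rax, nil; mov rax, 0; sete al;
// shl rax, 7; or rax, 0x1f.
uword nil_p(uword rax) {
  int zf = (rax == kNil);                  // cmp: ZF set when rax - nil == 0
  rax = 0;                                 // mov rax, 0 (flags untouched)
  rax = (rax & ~(uword)0xff) | (uword)zf;  // sete al: writes the low byte only
  rax <<= kBoolShift;                      // shl rax, 7
  rax |= kBoolTag;                         // or rax, 0x1f
  return rax;
}
```

This also shows why the sequence clears rax with mov rather than the common xor rax, rax idiom: xor would overwrite the flags that cmp just set, while mov leaves them alone.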

If you want to simplify your life (we’re going to do a lot of comparisons today), we can extract that into a function that compares rax with some immediate value, and then refactor Compile_call to call that.

void Compile_compare_imm32(Buffer *buf, int32_t value) {
  Emit_cmp_reg_imm32(buf, kRax, value);
  Emit_mov_reg_imm32(buf, kRax, 0);
  Emit_setcc_imm8(buf, kEqual, kAl);
  Emit_shl_reg_imm8(buf, kRax, kBoolShift);
  Emit_or_reg_imm8(buf, kRax, kBoolTag);
}

Let’s also poke at the implementations of cmp and setcc, since they involve some fun instruction encoding.

cmp, as it turns out, has a short-path when the register it’s comparing against is rax. This means we get to save one (1) whole byte if we want to!

void Emit_cmp_reg_imm32(Buffer *buf, Register left, int32_t right) {
  Buffer_write8(buf, kRexPrefix);
  if (left == kRax) {
    // Optimization: cmp rax, {imm32} can either be encoded as 3d {imm32} or 81
    // f8 {imm32}.
    Buffer_write8(buf, 0x3d);
  } else {
    Buffer_write8(buf, 0x81);
    Buffer_write8(buf, 0xf8 + left);
  }
  Buffer_write32(buf, right);
}

If you don’t want to, just use the 81 f8+ pattern.

For setcc, we have to define this new notion of “partial registers” so that we can encode the instruction properly. We can’t re-use Register because there are two partial registers for rax. So we add a PartialRegister.

typedef enum {
  kAl = 0,
} PartialRegister;

And then we can use those in the setcc implementation:

void Emit_setcc_imm8(Buffer *buf, Condition cond, PartialRegister dst) {
  Buffer_write8(buf, 0x0f);
  Buffer_write8(buf, 0x90 + cond);
  Buffer_write8(buf, 0xc0 + dst);
}

Again, I didn’t come up with this encoding. This is Intel’s design.
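To check both encodings without running the whole compiler, here is a standalone sketch with minimal stand-ins for Buffer, Register, PartialRegister, and Condition. These stand-ins are assumptions (the series defines its own); kEqual = 0x4 is the x86 condition code for "equal"/"zero". The two emitters mirror the ones above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-ins, just large enough to run the two emitters.
typedef struct {
  uint8_t bytes[32];
  size_t len;
} Buffer;

typedef enum { kRax = 0, kRcx, kRdx, kRbx } Register;
typedef enum { kAl = 0, kCl, kDl, kBl } PartialRegister;
typedef enum { kEqual = 0x4 } Condition;  // the "e"/"z" condition code

static const uint8_t kRexPrefix = 0x48;  // REX.W: 64-bit operand size

void Buffer_write8(Buffer *buf, uint8_t b) {
  assert(buf->len < sizeof buf->bytes);
  buf->bytes[buf->len++] = b;
}

// Immediates are stored little-endian.
void Buffer_write32(Buffer *buf, int32_t value) {
  uint32_t v = (uint32_t)value;
  for (int i = 0; i < 4; i++) {
    Buffer_write8(buf, (uint8_t)(v >> (8 * i)));
  }
}

void Emit_cmp_reg_imm32(Buffer *buf, Register left, int32_t right) {
  Buffer_write8(buf, kRexPrefix);
  if (left == kRax) {
    Buffer_write8(buf, 0x3d);  // short form: cmp rax, imm32
  } else {
    Buffer_write8(buf, 0x81);                    // cmp r/m64, imm32 (/7)
    Buffer_write8(buf, (uint8_t)(0xf8 + left));  // ModRM: mod=11, reg=7, rm=left
  }
  Buffer_write32(buf, right);
}

void Emit_setcc_imm8(Buffer *buf, Condition cond, PartialRegister dst) {
  Buffer_write8(buf, 0x0f);
  Buffer_write8(buf, (uint8_t)(0x90 + cond));  // 0f 90+cc
  Buffer_write8(buf, (uint8_t)(0xc0 + dst));   // ModRM: mod=11, rm=dst
}
```

Emitting cmp rax, 0x1f followed by sete al produces 48 3d 1f 00 00 00 0f 94 c0, the same bytes that appear in the boolean? test’s disassembly; using rcx instead falls back to the one-byte-longer 81 f9 form.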

The zero? primitive is much the same as nil?, and we can re-use that Compile_compare_imm32 function.

    if (AST_symbol_matches(callable, "zero?")) {
      _(Compile_expr(buf, operand1(args)));
      Compile_compare_imm32(buf, Object_encode_integer(0));
      return 0;
    }

not is more of the same — compare against false.

Now we get to integer?. This is similar, but different enough that I’ll reproduce the implementation below. Instead of comparing the whole number in rax, we only want to look at the lowest 2 bits. This can be accomplished by masking out the other bits, and then doing the comparison. For that, we emit an and first and compare against the tag.

    if (AST_symbol_matches(callable, "integer?")) {
      _(Compile_expr(buf, operand1(args)));
      Emit_and_reg_imm8(buf, kRax, kIntegerTagMask);
      Compile_compare_imm32(buf, kIntegerTag);
      return 0;
    }

It’s possible to shorten the implementation a little bit because and sets the zero flag. This means we can skip the cmp. But it’s only one instruction and I’m lazy so I’m reusing the existing infrastructure.

Last, boolean? is almost the same as integer?.
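Both type predicates boil down to "mask, then compare". Here is a C sketch of that logic, assuming the tag constants from the series (kIntegerTagMask = 0x3, kIntegerTag = 0, kImmediateTagMask = 0x3f, kBoolTag = 0x1f); is_integer and is_boolean are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t uword;

enum {
  kIntegerTagMask = 0x3,     // low 2 bits
  kIntegerTag = 0x0,         // 0b00
  kImmediateTagMask = 0x3f,  // low 6 bits
  kBoolTag = 0x1f,           // 0b011111
};

// integer?: and rax, kIntegerTagMask; then compare against kIntegerTag.
bool is_integer(uword obj) { return (obj & kIntegerTagMask) == kIntegerTag; }

// boolean?: the same pattern with the wider immediate mask, matching the
// and rax,0x3f / cmp rax,0x1f pair in the boolean? test below.
bool is_boolean(uword obj) { return (obj & kImmediateTagMask) == kBoolTag; }
```

Because kIntegerTag is 0, the and alone already sets the zero flag exactly when the value is an integer, which is the shortcut mentioned above.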

Boom! Compilers! Let’s check our work.


I’ll only include a couple tests here, since the new tests are a total of 283 lines added and are a little bit repetitive.

First, the simplest test for add1.

TEST compile_unary_add1(Buffer *buf) {
  ASTNode *node = new_unary_call("add1", AST_new_integer(123));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); add rax, imm(1); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00,
                     0x48, 0x05, 0x04, 0x00, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(124));
}

Second, a test of nested expressions:

TEST compile_unary_add1_nested(Buffer *buf) {
  ASTNode *node = new_unary_call(
      "add1", new_unary_call("add1", AST_new_integer(123)));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); add rax, imm(1); add rax, imm(1); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00,
                     0x48, 0x05, 0x04, 0x00, 0x00, 0x00, 0x48,
                     0x05, 0x04, 0x00, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(125));
}

Third, the test for boolean?.

TEST compile_unary_booleanp_with_non_boolean_returns_false(Buffer *buf) {
  ASTNode *node = new_unary_call("boolean?", AST_new_integer(5));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // 0:  48 c7 c0 14 00 00 00    mov    rax,0x14
  // 7:  48 83 e0 3f             and    rax,0x3f
  // b:  48 3d 1f 00 00 00       cmp    rax,0x0000001f
  // 11: 48 c7 c0 00 00 00 00    mov    rax,0x0
  // 18: 0f 94 c0                sete   al
  // 1b: 48 c1 e0 07             shl    rax,0x7
  // 1f: 48 83 c8 1f             or     rax,0x1f
  byte expected[] = {0x48, 0xc7, 0xc0, 0x14, 0x00, 0x00, 0x00, 0x48, 0x83,
                     0xe0, 0x3f, 0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00, 0x48,
                     0xc7, 0xc0, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x94, 0xc0,
                     0x48, 0xc1, 0xe0, 0x07, 0x48, 0x83, 0xc8, 0x1f};
  EXPECT_EQUALS_BYTES(buf, expected);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_false());
}

I include the fancy disassembly (generated with a separate disassembler) because it makes the tests easier for me to read and reason about later. You just have to make sure the text and the binary representations in the test don’t go out of sync, because that can be very confusing…

Anyway, that’s a wrap for today. Send your comments on the elist! Next time, binary primitives.

  1. There’s a long-running dispute about what to call these two objects. The first Lisp implementation ran on the IBM 704, and that machine’s word layout (an address part and a decrement part) led to the names car (“contents of the address part of register”) and cdr (“contents of the decrement part of register”). Nobody uses this hardware anymore, so the names are historical. Some people call them first/fst and second/snd. Others call them head/hd and tail/tl. Some people have other ideas.

  2. If you said “to preserve the tag” or “adding 1 would make it a pair” or some variant on that, you’re correct! Otherwise, I recommend going back to the diagram from the last couple of posts and then writing down binary representations of a couple of numbers by hand on a piece of paper. 

September 04, 2020

Kevin Burke (kb)

Building a better home network September 04, 2020 09:28 PM

I finally got my home network in a place where I am happy with it. I wanted to share my setup and what I learned about it. There has never been a better time to set up a great home network; there are several new tools that have made this easier and better than in the past. Hopefully this will help you set up your home network!

My house

My house is two stories on a standard 25 x 100 foot San Francisco lot. The ground floor looks roughly like this:

|               |                      |
|               |         |   Office   |
|    Garage     | Mudroom |            |
|               |         |-------------
|                           | | | | | |

Upstairs looks like this:

|    ___________                       |
|               |        Living Room   |
|    Bedroom    | Kitchen              |
|               |         -------------
|               |           | | | | | |

We have a Roku in the living room. My goals for home internet were:

  • Good wireless connection in every room
  • Ethernet connections in the office
  • Ethernet connection to the Roku
  • Synology network attached storage (NAS) and other external hard drives reachable from anywhere in the house

We are lucky to have Sonic Fiber internet service. Sonic’s line comes into a box in the garage, and an Ethernet line runs from there to the mudroom. None of the other rooms have Ethernet connections.

Initial setup

Sonic really wants to push Eero routers to everyone (see footnote 1). Eero is fairly easy to set up, and Sonic collects a small fee from renting the router to you. You can extend your home network by adding more Eeros into a mesh network.

If you have a small apartment, an Eero is probably going to be a good fit. However, the mesh network was not great for achieving any of the goals I had in mind. The repeaters (Eero beacons) do not have any Ethernet out ports. It was difficult to extend the network from the mudroom to the bedroom without renting two extenders, which added about $100 per year, increased latency and lowered speeds. Further, clients on the network kept connecting to an Eero that was farther away, instead of the closest one.


(NB: please don't stop reading here as I don't recommend this.) My next step was to replace the Eeros with a traditional Netgear wireless router in the mudroom. This also could not reach the bedroom. So I bought a powerline adapter and plugged one end in near the router and the other end in the bedroom.

Powerline adapters send data over the electrical wiring in your house. They don't offer great speeds. Appliances that draw a lot of power, like laundry machines or the microwave, can render the powerline connection unusable.

There are probably better solutions for you than powerline adapters in 2020.

Extending Ethernet to more rooms

I called a cabling company about the possibility of running Ethernet to more rooms in the house. We decided the bedroom would be very easy since it's directly above the garage. It took a two-person team two hours to drill a hole in the garage, run a cable up the side of the house to the bedroom, and install an Ethernet port in the bedroom. This cost about $200.

We looked at running Ethernet to other rooms but the geography of the stairs made this really tricky.

Side note: future proofing cabling

Our house has coax cables - the traditional method of getting e.g. cable TV service - running from the garage to four rooms in the house, but it doesn't have Ethernet set up. This is disappointing since it was built within the last decade.

There are two things you can do to future proof cable runs in your house, and ensure that cables can be replaced/swapped out if mice eat them or whatever. I highly recommend you implement them any time you are running cable. One is to leave a pull cable in the wall next to whatever cable you are installing. If you need to run a new cable, you can attach it to the pull cable, and then pull it all the way through from one end to the other.

Normally cables will be stapled to the wall interior, which makes them impossible to pull through. The other option is to leave cables unstapled. This will let you use the coax/other cable directly as the pull cable. In general though it's better to just leave a second pull line in the wall behind the port.

Without either of these solutions in place, running new cables is going to be messy. You can either try to hide it by running it along the exterior walls or ceiling of your house, or drill holes in the wall every few feet, pass a new cable through, and then patch up the holes.

Side note: cat 5 vs. cat 6

Your internet speed will be bottlenecked by the slowest link in the network. Be careful it isn't your cables!

There are a few flavors of Ethernet cable. Older Category 5 cable is cheaper but is generally rated only for 100 Mbps. For full gigabit speeds, use Category 5e or Category 6 (Cat 6 is slightly more expensive but has more headroom).

The Ethernet cables that come with the products you buy may be Cat 5, 5e, or 6. Be careful to check which one you are using (it should be written in small print on the outside of the cable).


To load a Google page, your computer looks up the IP address for Google and sends packets to it. So far so good, but how does Google send packets back? Each client on the network needs a unique local IP address. The router translates between an open port to Google (say, port 44982) and your computer's local IP address, and sends packets it receives from the broader Internet on port 44982 to the client with that IP address.

What happens if two clients on your network try to claim the same local IP address? That would be bad. Generally you set up a DHCP server to figure this out. When your phone connects to a wifi network it sends out a packet that says basically "I need an IP address." If a DHCP server is listening anywhere on the network, it will find an empty IP address slot and send it back to the phone (see footnote 2). The phone can then use that IP address.

Generally speaking, a consumer wireless router has three components:

  • wireless radios, that broadcast a network SSID and send packets to and from wireless clients.
  • an Ethernet switch that can split an incoming Internet connection into four or more out ports. Generally this has one WAN port (that connects to your modem/ISP) and four LAN ports (that connect to local devices on your network)
  • a DHCP server.

You can buy products that offer each of these independently (a four-port switch without a radio or DHCP server will cost you about $15), but a router is a convenient bundle for home networks.

If your network contains multiple switches or multiple routers you need to think about which of these devices will be giving out DHCP.

Two Routers, Too Furious

At this point my network had one router in the bedroom and one router upstairs in the living room, via an ungainly cable up the stairs. So I had good coverage in every room, and the Roku hooked up via Ethernet to the living room router, but this setup still had a few problems. I didn't have the office wired up, and the NAS only worked when you were connected to the living room router.

Furthermore, I kept running into issues where I would walk from the living room to the bedroom or vice versa but my phone/laptop would stay connected to the router in the room I was just in. Because that router was outside its normal "range", I would get more latency and dropped packets than usual, which was frustrating.

How to diagnose and measure this problem

On a Mac, hold down Option when you click the Wi-Fi icon in the menu bar, and you'll get an extended menu with details about your current connection.

The key value there is RSSI (received signal strength indicator), which measures the strength of the signal your client receives from the router. In my case it read -46, which is quite good. Lower than -65 and your connection quality will start to get dicey: you will see lower bandwidth, higher latency, and dropped packets.

Apple devices will hang on to the router they are currently connected to until the RSSI gets to -75 or worse, which is a very low value. This is explained in gory detail on this page. Because router coverage areas are supposed to overlap a little bit, this means the connection will have to get very bad before your phone or laptop will start looking for a new radio.

Adjust the power

In general, you don't want one router's coverage area to reach the center of the other router's coverage area, if you can help it. If the coverage areas don't overlap that much, clients will roam to the closest router, which will improve the connection.

You can adjust the coverage area either by physically moving the router or by lowering the power for the radios (which you may be able to do in the admin panel for the router).

If neither of these works, as a last ditch attempt you can give your routers different network names. But this makes it more difficult to keep a connection when you roam from one router to the other.

Ethernet Over... Coax?

I had not managed to get a wired connection to the office, which would have required snaking an Ethernet cable over at least two doorways and three walls. However, I heard recently about a technology called MoCA (from the Multimedia over Coax Alliance), which makes it possible to send an Ethernet signal over the coax line from the garage to the office. I bought a MoCA adapter for each end of the connection (about $160 in total), wired it up and... it worked like a charm!


The latency is slightly higher than traditional Ethernet, but only by a few milliseconds, and the bandwidth is not as high as a normal wired connection but it's fine - I am still glad to be able to avoid a wireless connection in that room.

This change let me move my NAS into the office as well, which I'm quite happy about.

Letting Everything Talk to Each Other

At this point I had a $15 unmanaged switch in the garage that received a connection from the Sonic Fiber router, and sent it to three places - the bedroom, the living room and my office. However, the fact that it was unmanaged meant that each location requested a public IP address and DHCP from Sonic. Sonic was not happy with this arrangement - there is a limit of 8 devices per account that are stored in a table mapping a MAC address to an IP address, and after this you need to call in to have the table cleared out. This design also meant that the clients on my network couldn't talk to each other - I couldn't access the NAS unless I was connected to the living room router.

The solution was to upgrade to a "managed" switch in the garage that could give out DHCP. You can buy one that is essentially a wifi router without the radio for about $60. This has the same dashboard interface as your router does and can give out DHCP.

Once this switch was in place, I needed to update the routers to stop giving out DHCP (or put them in "pass through mode") so only a single device on the network was assigning IP addresses. I watched the routers and NAS connect, then assigned static IPs on the local network to each one. It's important to do this before you set them in pass-through mode so you can still access them and tweak their settings.

You should be able to find instructions on pass-through mode or "disable DHCP" for your router online. You may need to change the IP address for the router to match the static IP you gave out in the previous paragraph.

That's it

I finally have a network that supports everything I want to do with it! I can never move now.

Garage router setup

I hope this post was helpful. I think the most important thing to realize is that if you haven't done this in a few years, or your only experience is with consumer grade routers, there are other tools/products you can buy to make your network better.

If you are interested in this space, or interested in improving your office network along these lines, I'm working with a company that is making this drop dead easy to accomplish. Get in touch!

1. I posted on the forums to get help several times. Dane Jasper, the Sonic CEO who's active on the forums, responded to most of my questions with "you should just use Eero." I love that he is on the forums but Eero is just not great for what I'm trying to do.

2. I'm simplifying - there are two roundtrips, not one - but the details are really not that important.

Jeremy Morgan (JeremyMorgan)

Optimizing String Comparisons in Go September 04, 2020 07:07 PM

Want your Go programs to run faster? Optimizing string comparisons in Go can improve your application’s response time and help scalability. Comparing two strings to see if they’re equal takes processing power, but not all comparisons are the same. In a previous article, we looked at How to compare strings in Go and did some benchmarking. We’re going to expand on that here. It may seem like a small thing, but as all great optimizers know, it’s the little things that add up.

September 02, 2020

Eric Faehnrich (faehnrich)

Booting a 486 From Floppy with the Most Up-to-Date Stable Linux Kernel September 02, 2020 06:04 PM

A pretty cool, simple writeup of booting a modern Linux kernel from a floppy on a 486.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Booleans, characters, nil September 02, 2020 07:45 AM


Welcome back to the “Compiling a Lisp” series. Last time, we compiled integer literals. In today’s relatively short post, we’ll add the rest of the immediate types. Our programs will look like this:

  • 'a'
  • true
  • false
  • nil or ()

In addition, since we’re not adding too much exciting stuff today, I made writing tests a little bit easier by adding fixtures. Now, if we want, we can get a pre-made Buffer object passed into the test, and then have it destroyed afterward.


Since we’re coming back to the pointer tagging scheme, I’ve reproduced the “pointer templates” (I don’t think that’s a real term) diagram from the last post below.

High							     Low
0000000000000000000000000000000000000000000000000XXXXXXX00001111  Character
00000000000000000000000000000000000000000000000000000000X0011111  Boolean
0000000000000000000000000000000000000000000000000000000000101111  Nil

Notice that we have a pattern among the other immediates (character, boolean, and nil) – the lower four bits are all the same, and that sets them apart from other pointer types.

Also notice that among those immediates, they can be discriminated by the two bits just above those four:

High							     Low
0000000000000000000000000000000000000000000000000XXXXXXX00[00][1111]  Character
00000000000000000000000000000000000000000000000000000000X0[01][1111]  Boolean
0000000000000000000000000000000000000000000000000000000000[10][1111]  Nil

So lower four bits of 0b1111 mean immediate, and from there the next two bits pick the type: 0b00 means character, 0b01 means boolean, and 0b10 means nil. There’s even room to add another immediate tag pattern (0b11) if we like.
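To make the two-level check concrete, here is a minimal sketch that classifies a tagged word by first testing the lower four bits and then reading the two bits above them. The names ImmediateKind and immediate_kind are hypothetical (the post's own code uses kImmediateTagMask and friends), and word is assumed to be a 64-bit signed integer as in the previous post.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;

// Hypothetical names for this sketch only.
typedef enum {
  kNotImmediate,
  kImmediateChar,  // bits 4-5 == 0b00
  kImmediateBool,  // bits 4-5 == 0b01
  kImmediateNil,   // bits 4-5 == 0b10
  kImmediateSpare, // bits 4-5 == 0b11, unused so far
} ImmediateKind;

ImmediateKind immediate_kind(word value) {
  // Every immediate has its lower four bits set.
  if ((value & 0xf) != 0xf) {
    return kNotImmediate;
  }
  // The next two bits say which immediate it is.
  switch ((value >> 4) & 0x3) {
  case 0x0:
    return kImmediateChar;
  case 0x1:
    return kImmediateBool;
  case 0x2:
    return kImmediateNil;
  default:
    return kImmediateSpare;
  }
}
```

For example, 'a' encodes to 0x610f (character), false and true to 0x1f and 0x9f (boolean), and nil is 0x2f. An encoded integer such as 0x1ec fails the first check.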

Let’s add some of the symbolic constants for bit manipulation.

const unsigned int kImmediateTagMask = 0x3f;

const unsigned int kCharTag = 0xf;   // 0b00001111
const unsigned int kCharMask = 0xff; // 0b11111111
const unsigned int kCharShift = 8;

const unsigned int kBoolTag = 0x1f;  // 0b0011111
const unsigned int kBoolMask = 0x80; // 0b10000000
const unsigned int kBoolShift = 7;

Notice that we don’t have any for nil. That’s because nil is a singleton and has no payload at all. It’s just a solitary 0x2f.

For the others, we need to put the payload alongside the tag, and that requires a shift and a bitwise or. The first operation, the shift, moves the payload left enough that there’s space for a tag, and the or adds the tag.

word Object_encode_char(char value) {
  return ((word)value << kCharShift) | kCharTag;
}

char Object_decode_char(word value) {
  return (value >> kCharShift) & kCharMask;
}

word Object_encode_bool(bool value) {
  return ((word)value << kBoolShift) | kBoolTag;
}

bool Object_decode_bool(word value) { return value & kBoolMask; }

word Object_true() { return Object_encode_bool(true); }

word Object_false() { return Object_encode_bool(false); }

word Object_nil() { return 0x2f; }

For bool, we’ve done a little trick. Since we only care if the value is true or false, instead of doing both a shift and mask to decode, we can turn off the tag bits. The resulting value will be either 0b00000000 for false or 0b10000000 for true. Since any non-zero value is truthy in C, we can “cast” that to a C bool by just returning it.
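To see the boolean trick side by side with the general shift-and-mask approach, here is a small sketch that decodes the same encoded words both ways. The tag, mask, and shift values are the ones from the post; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t word;

// Tag, mask, and shift values from the post.
enum { kBoolTag = 0x1f, kBoolMask = 0x80, kBoolShift = 7 };

word encode_bool(bool value) { return ((word)value << kBoolShift) | kBoolTag; }

// General approach: shift the payload bit down to bit 0, then mask it.
bool decode_bool_shift(word value) { return (value >> kBoolShift) & 0x1; }

// The post's trick: keep only the payload bit where it is. Converting the
// result (0x80 or 0) to bool yields true or false.
bool decode_bool_mask(word value) { return value & kBoolMask; }
```

For true, the encoded word is 0x9f (0x80 | 0x1f); masking with 0x80 leaves 0x80, which becomes true when converted to bool. For false, the word is 0x1f and the mask leaves 0.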

Note that the cast from char and bool to word is necessary because, as I learned the hard way several months ago, shifting a value left by at least as many bits as its (promoted) type is wide is undefined behavior in C. That situation went sideways and left me scratching my head.

I added Object_true and Object_false because I thought they might come in handy at some point, but we don’t have a use for them now. If you are strongly against including dead weight code, then feel free to omit them.

Now let’s add some more AST utility functions before we move on to compiling:

bool AST_is_char(ASTNode *node) {
  return ((word)node & kImmediateTagMask) == kCharTag;
}

char AST_get_char(ASTNode *node) { return Object_decode_char((word)node); }

ASTNode *AST_new_char(char value) {
  return (ASTNode *)Object_encode_char(value);
}

bool AST_is_bool(ASTNode *node) {
  return ((word)node & kImmediateTagMask) == kBoolTag;
}

bool AST_get_bool(ASTNode *node) { return Object_decode_bool((word)node); }

ASTNode *AST_new_bool(bool value) {
  return (ASTNode *)Object_encode_bool(value);
}

bool AST_is_nil(ASTNode *node) { return (word)node == Object_nil(); }

ASTNode *AST_nil() { return (ASTNode *)Object_nil(); }

Enough talk about object encoding. Let’s compile some immediates.


The implementation is much the same as for integers: check the type, pull out the payload, re-encode it, and move it into rax.

int Compile_expr(Buffer *buf, ASTNode *node) {
  if (AST_is_integer(node)) {
    word value = AST_get_integer(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_integer(value));
    return 0;
  }
  if (AST_is_char(node)) {
    char value = AST_get_char(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_char(value));
    return 0;
  }
  if (AST_is_bool(node)) {
    bool value = AST_get_bool(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_bool(value));
    return 0;
  }
  if (AST_is_nil(node)) {
    Emit_mov_reg_imm32(buf, kRax, Object_nil());
    return 0;
  }
  assert(0 && "unexpected node type");
  return -1; // only reached if assertions are compiled out (NDEBUG)
}

I suppose we could coalesce these by checking if the node is any sort of immediate and then writing the address immediately back with Emit_mov_reg_imm32… but that would be breaking abstractions or something.
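Here is a hedged sketch of why the coalesced version would work: since AST nodes already use the runtime encoding, decoding an immediate's payload and re-encoding it just reproduces the node's own bits. The encoders repeat the post's schemes (with the integer shift done on an unsigned value to avoid undefined behavior for negatives); via_payload and via_node_bits are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t word;

// Encoders and decoders following the post's tagging scheme.
word encode_integer(word v) { return (word)((uint64_t)v << 2); }
// Assumes >> on a negative value is an arithmetic shift (true on gcc/clang).
word decode_integer(word w) { return w >> 2; }
word encode_char(char c) { return ((word)c << 8) | 0xf; }
char decode_char(word w) { return (w >> 8) & 0xff; }
word encode_bool(bool b) { return ((word)b << 7) | 0x1f; }
bool decode_bool(word w) { return w & 0x80; }
const word kNil = 0x2f;

// The per-type route, as in Compile_expr: check the tag, pull out the
// payload, and re-encode it. Returns the value that would be moved into rax.
word via_payload(word node) {
  if ((node & 0x3) == 0x0) {
    return encode_integer(decode_integer(node));
  }
  if ((node & 0x3f) == 0xf) {
    return encode_char(decode_char(node));
  }
  if ((node & 0x3f) == 0x1f) {
    return encode_bool(decode_bool(node));
  }
  assert(node == kNil && "unexpected node");
  return kNil;
}

// The coalesced route: an immediate node's bits already are its encoding.
word via_node_bits(word node) { return node; }
```

Both routes produce the same word for every immediate tested here; the trade-off is that the shortcut relies on the compiler and runtime encodings staying identical.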


The testing is also much the same, so much so that I’ll only include the test for compiling characters. The rest of the code is available from assets/code/lisp if you would like a reference.

TEST compile_char(Buffer *buf) {
  char value = 'a';
  ASTNode *node = AST_new_char(value);
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm('a'); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0x0f, 0x61, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(buf, expected);
  Buffer_make_executable(buf);
  word result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_char(value));
  PASS();
}

You’ll notice that instead of void, the function now takes a Buffer*. This is part of the new testing fixtures setup that I mentioned earlier. The implementation is a macro that uses greatest.h’s “pass a parameter to your test” feature. Running a test looks much the same as before.
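As a rough sketch of the same setup/teardown idea, independent of greatest.h, the following uses a plain function pointer instead of a macro. BufferTest, run_with_buffer, and writes_one_byte are hypothetical names, and this Buffer is a simplified malloc-backed stand-in for the post's mmap-backed one.

```c
#include <assert.h>
#include <stdlib.h>

// Simplified stand-in for the post's Buffer: malloc instead of mmap,
// and no executable state.
typedef struct {
  unsigned char *address;
  size_t len;
  size_t capacity;
} Buffer;

void Buffer_init(Buffer *buf, size_t capacity) {
  buf->address = malloc(capacity);
  assert(buf->address != NULL);
  buf->len = 0;
  buf->capacity = capacity;
}

void Buffer_deinit(Buffer *buf) {
  free(buf->address);
  buf->address = NULL;
  buf->len = 0;
  buf->capacity = 0;
}

// A fixture-style test receives a ready-made buffer; 0 means "passed".
typedef int (*BufferTest)(Buffer *buf);

// Set up a fresh buffer, run the test, then tear the buffer down
// whether or not the test passed.
int run_with_buffer(BufferTest test) {
  Buffer buf;
  Buffer_init(&buf, 16);
  int result = test(&buf);
  Buffer_deinit(&buf);
  return result;
}

// Example test body: write a single ret opcode and check the length.
int writes_one_byte(Buffer *buf) {
  buf->address[buf->len++] = 0xc3;
  return buf->len == 1 ? 0 : 1;
}
```

Usage is run_with_buffer(writes_one_byte). A macro version does the same work but can hand the buffer to greatest.h's test runner instead of calling the test directly.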


Anyway, that’s a wrap for today. Next time we’ll add some unary primitives for querying and manipulating the objects we have already.

Marc Brooker (mjb)

Focus on the Good Parts September 02, 2020 12:00 AM

Skepticism and cynicism can get in your way.

Back in May, I wrote Reading Research: A Guide for Software Engineers, answering common questions I get about why and how to read research papers. In that post, I wrote about three modes of reading: solution finding, discovery, and curiosity. In subsequent conversations, I've realized there's another common issue that gets in engineers' way when they read research, especially in the discovery and curiosity modes: too much skepticism.

The chief deficiency I see in the skeptical movement is its polarization: Us vs. Them — the sense that we have a monopoly on the truth; that those other people who believe in all these stupid doctrines are morons; that if you're sensible, you'll listen to us; and if not, to hell with you. (from Carl Sagan's The Demon Haunted World)

I could blame it on comment thread culture, racing to make that top comment pointing out errors in the paper. I could blame it on the low signal-to-noise ratio of content in general. I could blame it on poor research, poor writing, or incorrect data. But whatever is to blame, many readers approach technical content with their first goal being to find errors and mistakes, gaps in logic, or incomplete justifications of statements. When a mistake is found, the reader feels justified in throwing out the whole piece of writing (unreliable!), the authors (sloppy!), their institutions (clueless!), or even the whole field (substandard!). It's also a perfect opportunity to write that comment or tweet pointing out the problems. After all, if you found the author's mistake, doesn't that make you smarter and better than the author?

This approach gets in the way of your ability to learn from reading. I'd encourage you to take a different one: read with the goal of finding the good stuff. Dig for the ideas, the insights, the analyses and the data points that provide value. Look for what you can learn.

I'm not suggesting that you shouldn't approach what you read carefully. You absolutely should make sure what you believe is well-supported. Don't waste your life reading crap. Your time is too valuable for that.

The flip side of this is relying too much on social proof. If you open the comment thread first, you'll find that the piece you're about to read is great or it's crap or it's another piece of junk published by those people (you know, them, the incompetent ones). Then, when you finally read the paper, you'll be less smart. You'll be biased towards confirming the opinions of others, rather than reading and understanding the material. I'm not against comment threads, but I never read them first.

Again, you can go too far in this direction. A lot of academic publishing is an exercise in social proof. Almost all the filtering we use to reduce the firehose of content down to a manageable stream depends on social proof. We use these tools because they're powerful, and scalable. But remember that popularity with Hacker News commenters, and even publication in a prestigious conference or journal, is only weak evidence of quality. Unpopularity, and rejection, are weak evidence of a lack of quality.

An Example

Fox and Brewer's classic paper Harvest, Yield, and Scalable Tolerant Systems contains many great ideas. The framing of Harvest and Yield is very useful, and I've found it's had a big influence on the way that I have approached system design over the years. The first time I read it, though, I put it down. The parts describing CAP (Sections 2 and 3) are confusing at best and wrong at worst (as I've blogged about before). I couldn't get past them.

It was only after being encouraged by a colleague that I read the whole thing. Taken as a whole, it's full of great ideas. If I had kept tripping over my skepticism, and getting stuck on the bad parts, I never would have been able to learn from it.

August 31, 2020

Frederic Cambus (fcambus)

Modernizing the OpenBSD console August 31, 2020 06:30 PM

In the beginning, there were text mode consoles. Traditionally, *BSD and Linux on i386 and amd64 used text mode consoles, which by default provided 25 rows of 80 columns: the "80x25 mode". This mode uses an 8x16 font stored in the VGA BIOS (which can differ slightly across vendors).

OpenBSD uses the wscons(4) console framework, inherited from NetBSD.

CRT monitors let you set whatever resolution you wanted, so on bigger monitors an 80x25 text mode console was fairly large but not blurry.

Framebuffer consoles made it possible to take advantage of larger monitors by fitting more columns and rows. With the switch to LCD monitors, driven in part by the decreasing cost of laptops, fixed-resolution panels became a problem: the text mode resolution had to be stretched to fit the panel, leading to distortion and blurriness.

One thing some people might not realize is the huge discrepancy between text mode and framebuffer consoles in the amount of data you have to write to cover the whole screen. In text mode, we only need to write 2 bytes per character: 1 byte for the ASCII code and 1 byte for attributes. So in 80x25 text mode, we only need to write 80 * 25 * 2 bytes of data, which is 4000 bytes, and the VGA card itself takes care of plotting characters to the screen. With a framebuffer, however, filling a 4K UHD-1 (3840x2160) screen in 32bpp mode requires sending 3840 * 2160 * 4 bytes of data, which is 33177600 bytes (approximately 33 MB).
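The arithmetic above is easy to check. This sketch computes both figures under the same assumptions (2 bytes per text cell, 32 bits per pixel); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// VGA text mode: one byte for the character code and one for its attribute.
size_t text_mode_bytes(size_t cols, size_t rows) { return cols * rows * 2; }

// Framebuffer: every pixel is written, at bpp / 8 bytes per pixel.
size_t framebuffer_bytes(size_t width, size_t height, size_t bpp) {
  return width * height * (bpp / 8);
}
```

text_mode_bytes(80, 25) is 4000 and framebuffer_bytes(3840, 2160, 32) is 33177600, so filling the 4K screen means writing roughly 8,300 times as much data.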

On framebuffer consoles, OpenBSD uses the rasops(9) subsystem (raster operations), imported from NetBSD in 2001.

While they had been used for a while on platforms without VGA cards, framebuffer consoles were only enabled on i386 and amd64 in 2013 for inteldrm(4) and radeondrm(4).

In recent years, rasops(9) itself and framebuffer drivers have seen some improvements:

General improvements:

  • Add and enable efifb(4), EFI framebuffer driver (yasuoka@, August 2015)
  • Implement counter-clockwise rotation (kettenis@, August 2017)
  • Implement scrollback in rasops(9) (jcs@, April 2018)

Performance related improvements:

  • Make it possible to use RI_WRONLY during early boot (kettenis@, September 2015)
  • Introduce rasops_wronly_do_cursor() (kettenis@, August 2018)
  • Remap EFI framebuffer early to use write combining (kettenis@, September 2018)
  • Do PAT setup earlier, so mapping the framebuffer WC actually works (kettenis@, December 2018)
  • Fast conditional console scrolling (John Carmack, June 2020)
  • Optimize character rendering in 32bpp mode (John Carmack, June 2020)

Console fonts improvements:

There is an article about Spleen in the OpenBSD Journal with more information, notably on the font selection mechanism relative to screen resolution.

And work slowly continues to make framebuffer consoles more usable.

It is interesting to note that while NetBSD has been adding a lot of features to rasops(9) over the years, OpenBSD has taken a more conservative approach. There is, however, one major feature that NetBSD currently has which would be beneficial: the ability to load fonts of different metrics and subsequently resize screens.

Looking forward, performance of various operations could likely still be improved, possibly by leveraging the new OpenBSD dynamic tracing mechanism to analyze bottlenecks.

Another open question is UTF-8 support. Miod Vallat started work in this direction back in 2013, but a few things are still missing. I plan to implement support for sparse font files in the future, at least so one can take advantage of box-drawing and possibly block element characters.

Lastly, a major pain point has been the lack of larger fonts in RAMDISK kernels, which makes installations and upgrades very difficult and error-prone on high-DPI monitors, as the text is basically unreadable. There is no technical blocker to making this happen, which ironically makes it the most difficult kind of issue to tackle.

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Integers August 31, 2020 04:46 PM


Welcome back to the “Compiling a Lisp” series. Last time we made a small code execution demo. Today we’re going to add the first part of our language: integer literals. Our programs will look like this:

  • 123
  • -10
  • 0

But we’re not going to put a parser in. That can come later when it gets harder to manually construct syntax trees.

Also, since implementing full big number support is pretty tricky, we’re only going to support fixed-width numbers. It’s entirely possible to then implement big number support in Lisp after we build out some more features.

Pointer tagging scheme

Since the integers are always small (less than 64 bits), and we’re targeting x86-64, we can represent the integers as tagged pointers. To read a little more about that, check out the “Pointer tagging” section of my Programming languages resources page. Since we’ll also represent some other types of objects as tagged pointers, I’ll sketch out a tagging scheme up front. That way it’s easier to reason about than if I draw it out post-by-post.

High							     Low
0000000000000000000000000000000000000000000000000XXXXXXX00001111  Character
00000000000000000000000000000000000000000000000000000000X0011111  Boolean
0000000000000000000000000000000000000000000000000000000000101111  Nil

In this diagram, we have some pointer templates composed of 0s, 1s, and Xs. 0 refers to a 0 bit and 1 refers to a 1 bit.

X is a placeholder that refers to payload data for that value. For immediate values — values whose data are part of the pointer itself — the Xs refer to the data. For heap-allocated objects, it is the pointer address.

It’s important to note that we can only accomplish this tagging scheme because on modern computer systems the lower 3 bits of heap-allocated pointers are 0: allocations are word-aligned (8-byte aligned on a 64-bit machine), so all pointers are multiples of 8. This lets us a) differentiate real pointers from fake pointers and b) stuff some additional data into the low bits of real pointers.
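A minimal sketch of the idea, assuming 8-byte-aligned allocations: since the low 3 bits of such a pointer are always zero, a small tag can be OR-ed into them and masked back out before the pointer is used. The tag value and function names here are hypothetical and are not the post's heap type tags.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t uword;

static const uword kHeapTagMask = 0x7; // the low 3 bits

// Stuff a 3-bit tag into the (always zero) low bits of an aligned pointer.
uword tag_pointer(void *ptr, uword tag) {
  uword bits = (uword)(uintptr_t)ptr;
  assert((bits & kHeapTagMask) == 0 && "pointer is not 8-byte aligned");
  assert(tag <= kHeapTagMask && "tag does not fit in 3 bits");
  return bits | tag;
}

// Recover the tag.
uword pointer_tag(uword tagged) { return tagged & kHeapTagMask; }

// Clear the tag bits to get back a usable pointer.
void *untag_pointer(uword tagged) {
  return (void *)(uintptr_t)(tagged & ~kHeapTagMask);
}
```

On x86-64, malloc returns memory aligned for any fundamental type (16 bytes with glibc), so a pointer from malloc(64) passes the alignment check in tag_pointer.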

These tags let us quickly distinguish objects from one another. Just check the lower bits:

  • Lower 2 bits 00 means integer
  • Lower 3 bits 111 means one of the other immediate value types; check the lower 7 bits to tell them apart
  • For any of the other types, there’s a one-to-one mapping of bit pattern in the lower 3 bits to the type

This is a choice that Ghuloum made when drawing up the compiler paper. It’s entirely possible to pick your own encoding as long as your encoding also has the property that it’s possible to distinguish the type based on the pointer.[1]
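As a sketch of the tag checks listed above (the Kind names are hypothetical), classifying a word takes at most two mask-and-compare steps. The integer check has to come first: a word ending in 100 is an integer, and would otherwise be mistaken for a heap tag.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t word;

typedef enum { kKindInteger, kKindImmediate, kKindHeap } Kind;

Kind classify(word value) {
  // Lower 2 bits 00: an integer (covers both ...000 and ...100).
  if ((value & 0x3) == 0x0) {
    return kKindInteger;
  }
  // Lower 3 bits 111: character, boolean, or nil.
  if ((value & 0x7) == 0x7) {
    return kKindImmediate;
  }
  // Anything else: the lower 3 bits name a heap-allocated type.
  return kKindHeap;
}

// For heap objects, the type tag is the lower 3 bits.
unsigned heap_tag(word value) { return (unsigned)(value & 0x7); }
```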

We’re going to be a little clever and use the same encoding scheme inside the compiler to represent Abstract Syntax Tree (AST) nodes as we are going to use in the compiled code. I mean, why not? We’re going to have to build the encoding and decoding tools anyway.

Pointer tagging in practice

We’ll start off with integer encoding, since we don’t have any other types yet.

#include <assert.h>   // for assert
#include <stdbool.h>  // for bool
#include <stddef.h>   // for NULL
#include <stdint.h>   // for int32_t, etc
#include <string.h>   // for memcpy
#include <sys/mman.h> // for mmap

#include "greatest.h"

// Objects

typedef int64_t word;
typedef uint64_t uword;

const int kBitsPerByte = 8;                        // bits
const int kWordSize = sizeof(word);                // bytes
const int kBitsPerWord = kWordSize * kBitsPerByte; // bits

Ignore greatest.h — that is a header-only library I use for lightweight testing.

word and uword are type aliases that I will use throughout the codebase to refer to types of values that fit in registers. It saves us a bunch of typing and helps keep types consistent.

To avoid some mysterious magical constants, I’ve also defined helpful names for the number of bits in a byte (standard C also provides this as CHAR_BIT in <limits.h>), the number of bytes in a word, and the number of bits in a word.

const unsigned int kIntegerTag = 0x0;
const unsigned int kIntegerTagMask = 0x3;
const unsigned int kIntegerShift = 2;
const unsigned int kIntegerBits = kBitsPerWord - kIntegerShift;
const word kIntegerMax = (1LL << (kIntegerBits - 1)) - 1;
const word kIntegerMin = -(1LL << (kIntegerBits - 1));

word Object_encode_integer(word value) {
  assert(value <= kIntegerMax && "too big");
  assert(value >= kIntegerMin && "too small");
  return value << kIntegerShift;
}

// End Objects

As we saw above, integers can be fit inside pointers by shifting them two bits to the left. We have this handy-dandy function, Object_encode_integer, for that.

I’ve added some bounds checks to make sure we don’t accidentally mangle the values coming in. If the number we’re trying to encode is too big or too small, shifting it left by 2 bits will chop off the left end.

This function is pretty low-level. It doesn’t add any new type information (it returns a word, just as it takes a word). It’s meant to be a utility function inside the compiler. We’ll add another function in a moment that builds on top of this one to make ASTs.
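To see where the bounds come from: after the 2-bit shift, 62 bits remain for a two's complement payload, so the representable range is -2^61 through 2^61 - 1. This sketch (hypothetical names) checks that range with inclusive bounds and performs the shift on an unsigned value, since left-shifting a negative signed value is undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t word;
typedef uint64_t uword;

enum { kIntegerShift = 2, kIntegerBits = 64 - kIntegerShift }; // 62 payload bits

const word kIntegerMax = (INT64_C(1) << (kIntegerBits - 1)) - 1; // 2^61 - 1
const word kIntegerMin = -(INT64_C(1) << (kIntegerBits - 1));    // -2^61

bool fits_in_integer(word value) {
  return value >= kIntegerMin && value <= kIntegerMax;
}

word encode_integer(word value) {
  assert(fits_in_integer(value) && "out of range");
  // Shift as unsigned: left-shifting a negative signed value is undefined.
  return (word)((uword)value << kIntegerShift);
}

// Assumes >> on a negative value is an arithmetic shift, which is
// implementation-defined in C but holds for gcc and clang on x86-64.
word decode_integer(word encoded) { return encoded >> kIntegerShift; }
```

With these bounds, kIntegerMax is 0x1fffffffffffffff, and both extremes survive an encode/decode round trip.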

Syntax trees

While we could pass around words all day and try really hard to keep the boundary between integral values and pointer values straight, I don’t much fancy that. I like my type in formation, thank you very much. So we’re going to add a thin veneer over the object encoding that both gives us some nicer type APIs and gives the C compiler some hints about when we’ve already encoded an object.

// AST

struct ASTNode;
typedef struct ASTNode ASTNode;

ASTNode *AST_new_integer(word value) {
  return (ASTNode *)Object_encode_integer(value);
}

bool AST_is_integer(ASTNode *node) {
  return ((word)node & kIntegerTagMask) == kIntegerTag;
}

word AST_get_integer(ASTNode *node) { return (word)node >> kIntegerShift; }

// End AST

We’ll use these functions pretty heavily in the compiler, especially as we add more datatypes.

An expandable byte buffer

Now that we can manually build programs, let’s get cracking writing our buffers. We have to emit the machine code to somewhere, after all. Remember the mmap/memcpy stuff from last time? We’re going to wrap those in some easier-to-remember APIs.

// Buffer

typedef unsigned char byte;

typedef enum {
  kWritable,
  kExecutable,
} BufferState;

typedef struct {
  byte *address;
  BufferState state;
  size_t len;
  size_t capacity;
} Buffer;

byte *Buffer_alloc_writable(size_t capacity) {
  byte *result = mmap(/*addr=*/NULL, /*length=*/capacity,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  assert(result != MAP_FAILED);
  return result;
}

void Buffer_init(Buffer *result, size_t capacity) {
  result->address = Buffer_alloc_writable(capacity);
  assert(result->address != MAP_FAILED);
  result->state = kWritable;
  result->len = 0;
  result->capacity = capacity;
}

void Buffer_deinit(Buffer *buf) {
  munmap(buf->address, buf->capacity);
  buf->address = NULL;
  buf->len = 0;
  buf->capacity = 0;
}

int Buffer_make_executable(Buffer *buf) {
  int result = mprotect(buf->address, buf->len, PROT_EXEC);
  buf->state = kExecutable;
  return result;
}

These functions are good building blocks for creating and destroying buffers. They abstract away some of the fiddly parameters and add runtime checks.

We still need to write into the buffer at some point, though, and we’re not going to memcpy whole blocks in. So let’s add some APIs for incremental writing.

byte Buffer_at8(Buffer *buf, size_t pos) { return buf->address[pos]; }

void Buffer_at_put8(Buffer *buf, size_t pos, byte b) { buf->address[pos] = b; }

This Buffer_at_put8 is the building block of the rest of the compiler. Every write will go through this function. But notice that it is pretty low-level; it does not do any bounds checks and it does not advance the current position in the buffer. So let’s add some more functions to do that…

word max(word left, word right) { return left > right ? left : right; }

void Buffer_ensure_capacity(Buffer *buf, word additional_capacity) {
  if (buf->len + additional_capacity <= buf->capacity) {
    return;
  }
  word new_capacity =
      max(buf->capacity * 2, buf->capacity + additional_capacity);
  byte *address = Buffer_alloc_writable(new_capacity);
  memcpy(address, buf->address, buf->len);
  int result = munmap(buf->address, buf->capacity);
  assert(result == 0 && "munmap failed");
  buf->address = address;
  buf->capacity = new_capacity;
}

void Buffer_write8(Buffer *buf, byte b) {
  Buffer_ensure_capacity(buf, sizeof b);
  Buffer_at_put8(buf, buf->len++, b);
}

void Buffer_write32(Buffer *buf, int32_t value) {
  for (size_t i = 0; i < sizeof value; i++) {
    Buffer_write8(buf, (value >> (i * kBitsPerByte)) & 0xff);
  }
}

// End Buffer

With the addition of Buffer_ensure_capacity, Buffer_write8, and Buffer_write32, we can start putting together functions to emit x86-64 instructions. I added both write8 and write32 because we’ll need to both emit single bytes and 32-bit immediate integer values. The helper function ensures that we don’t need to think about endian-ness every single time we emit a 32-bit value.

Emitting instructions

There are a couple ways we could write an assembler:

  • Emit binary directly in the compiler, with comments
  • Make a table of all the possible encodings of the instructions we want (meaning mov eax, 1 and mov ecx, 1 are distinct, for example) and fetch chunks of bytes from there
  • Use some encoding logic to make re-usable building blocks

I chose to go with the last option, though I’ve seen all three while looking for a nice C assembler library. It allows us to write code like Emit_mov_reg_imm32(buf, kRcx, 123), which, if you ask me, looks fairly similar to mov rcx, 123.

If we were writing C++ we could get really clever with operator overloading… or we could not.

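Note that I did not make up this encoding logic. Instruction sets commonly place operands in fixed bit fields; for this mov, the destination register number goes in the low three bits of the ModRM byte, which is why Emit_mov_reg_imm32 emits 0xc0 + dst. This regularity helps in decoding (for the hardware) and encoding (for the compilers).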

// Emit

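typedef enum {
  kRax = 0,
  kRcx,
  kRdx,
  kRbx,
  kRsp,
  kRbp,
  kRsi,
  kRdi,
} Register;

static const byte kRexPrefix = 0x48;

void Emit_mov_reg_imm32(Buffer *buf, Register dst, int32_t src) {
  Buffer_write8(buf, kRexPrefix);  // REX.W: 64-bit operand size
  Buffer_write8(buf, 0xc7);        // mov r/m64, imm32
  Buffer_write8(buf, 0xc0 + dst);  // ModRM: register-direct, rm = dst
  Buffer_write32(buf, src);        // imm32, sign-extended to 64 bits
}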

void Emit_ret(Buffer *buf) { Buffer_write8(buf, 0xc3); }

// End Emit

Boom. Two instructions: one mov, one ret. The REX prefix (0x48 here, with its W bit set) tells an x86-64 processor to use a 64-bit operand size for the instruction that follows. In 32-bit x86, the same byte decodes as a different instruction entirely (dec eax).

In this particular mov’s case, it is the difference between mov eax, IMM and mov rax, IMM, where the 32-bit immediate is sign-extended to 64 bits.
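To make the encoding concrete, this sketch writes the same three fixed bytes plus a little-endian imm32 into a plain array, for any of the eight legacy registers. It assumes the standard register numbering (rax = 0 through rdi = 7); r8 to r15 would additionally need the REX.B bit, which this sketch does not handle. encode_mov_reg_imm32 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// x86-64 register numbers as they appear in the ModRM byte.
typedef enum { kRax = 0, kRcx, kRdx, kRbx, kRsp, kRbp, kRsi, kRdi } Register;

// Write "mov dst, imm32" (REX.W + C7 /0 id) into out; returns bytes written.
size_t encode_mov_reg_imm32(uint8_t out[7], Register dst, int32_t imm) {
  uint32_t bits = (uint32_t)imm;
  out[0] = 0x48;                // REX prefix with W=1: 64-bit operand size
  out[1] = 0xc7;                // opcode: mov r/m64, imm32
  out[2] = (uint8_t)(0xc0 + dst); // ModRM: mod=11, reg=000, rm=dst
  for (size_t i = 0; i < 4; i++) {
    out[3 + i] = (bits >> (8 * i)) & 0xff; // little-endian immediate
  }
  return 7;
}
```

Encoding 492 (the tagged form of 123) into rax yields 48 c7 c0 ec 01 00 00, matching the first seven bytes the integer tests expect.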

Compiling our first program

Now that we can emit instructions, it’s time to choose what instructions to emit based on the input program. We have a very restricted set of input programs (yes, several billion of them, if you’re being persnickety about the range of possible integers) so the implementation is short and sweet.

If we see a literal integer, encode it and put it in rax. Done.

// Compile

int Compile_expr(Buffer *buf, ASTNode *node) {
  if (AST_is_integer(node)) {
    word value = AST_get_integer(node);
    Emit_mov_reg_imm32(buf, kRax, Object_encode_integer(value));
    return 0;
  }
  assert(0 && "unexpected node type");
  return -1; // only reached if assertions are compiled out (NDEBUG)
}

int Compile_function(Buffer *buf, ASTNode *node) {
  int result = Compile_expr(buf, node);
  if (result != 0) {
    return result;
  }
  Emit_ret(buf);
  return 0;
}

// End Compile

I make a distinction between expr and function because we don’t always want to ret. We only want to ret the result of a function body, which might be composed of several nested expressions. This divide will become clearer as we add more expression types.

Making sure it works

Our compiler is all well and good, but it’s notably more complicated than the mini JIT demo from the last post. It’s one thing to test that by manually checking the return code of main, but I think we should have some regression tests to keep us honest as we go forth and break things.

For that, I’ve written some testing utilities to help check that we generated the right code, and also to execute the JITed code and return the result.

typedef word (*JitFunction)(void);

// Testing

#define EXPECT_EQUALS_BYTES(buf, arr)                                          \
  ASSERT_MEM_EQ(arr, (buf)->address, sizeof arr)

word Testing_execute_expr(Buffer *buf) {
  assert(buf != NULL);
  assert(buf->address != NULL);
  assert(buf->state == kExecutable);
  // The pointer-pointer cast is allowed but the underlying
  // data-to-function-pointer back-and-forth is only guaranteed to work on
  // POSIX systems (because of eg dlsym).
  JitFunction function = *(JitFunction *)(&buf->address);
  return function();
}

// End Testing

ASSERT_MEM_EQ will check the generated code and point out any differences if it finds them. Even though this only prints out hex representations of the generated code, it’s very helpful. I often paste unexpected output into rasm2 (part of the radare2 suite), Cutter (also part of the radare2 suite), or this online disassembler. If the instructions look super unfamiliar, it means we messed up the encoding!

Since we have our utilities, we’re going to use the greatest.h testing API to write some unit tests for our compiler and compiler utilities.

// Tests

TEST encode_positive_integer(void) {
  ASSERT_EQ(0x0, Object_encode_integer(0));
  ASSERT_EQ(0x4, Object_encode_integer(1));
  ASSERT_EQ(0x28, Object_encode_integer(10));
  PASS();
}

TEST encode_negative_integer(void) {
  ASSERT_EQ(0x0, Object_encode_integer(0));
  ASSERT_EQ((word)0xfffffffffffffffc, Object_encode_integer(-1));
  ASSERT_EQ((word)0xffffffffffffffd8, Object_encode_integer(-10));
  PASS();
}

TEST buffer_write8_increases_length(void) {
  Buffer buf;
  Buffer_init(&buf, 5);
  ASSERT_EQ(buf.len, 0);
  Buffer_write8(&buf, 0xdb);
  ASSERT_EQ(Buffer_at8(&buf, 0), 0xdb);
  ASSERT_EQ(buf.len, 1);
  Buffer_deinit(&buf);
  PASS();
}

TEST buffer_write8_expands_buffer(void) {
  Buffer buf;
  Buffer_init(&buf, 1);
  ASSERT_EQ(buf.capacity, 1);
  ASSERT_EQ(buf.len, 0);
  Buffer_write8(&buf, 0xdb);
  Buffer_write8(&buf, 0xef);
  ASSERT(buf.capacity > 1);
  ASSERT_EQ(buf.len, 2);
  Buffer_deinit(&buf);
  PASS();
}

TEST buffer_write32_expands_buffer(void) {
  Buffer buf;
  Buffer_init(&buf, 1);
  ASSERT_EQ(buf.capacity, 1);
  ASSERT_EQ(buf.len, 0);
  Buffer_write32(&buf, 0xdeadbeef);
  ASSERT(buf.capacity > 1);
  ASSERT_EQ(buf.len, 4);
  Buffer_deinit(&buf);
  PASS();
}

TEST buffer_write32_writes_little_endian(void) {
  Buffer buf;
  Buffer_init(&buf, 4);
  Buffer_write32(&buf, 0xdeadbeef);
  ASSERT_EQ(Buffer_at8(&buf, 0), 0xef);
  ASSERT_EQ(Buffer_at8(&buf, 1), 0xbe);
  ASSERT_EQ(Buffer_at8(&buf, 2), 0xad);
  ASSERT_EQ(Buffer_at8(&buf, 3), 0xde);
  Buffer_deinit(&buf);
  PASS();
}

TEST compile_positive_integer(void) {
  word value = 123;
  ASTNode *node = AST_new_integer(value);
  Buffer buf;
  Buffer_init(&buf, 10);
  int compile_result = Compile_function(&buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(123); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0xec, 0x01, 0x00, 0x00, 0xc3};
  EXPECT_EQUALS_BYTES(&buf, expected);
  Buffer_make_executable(&buf);
  word result = Testing_execute_expr(&buf);
  ASSERT_EQ(result, Object_encode_integer(value));
  Buffer_deinit(&buf);
  PASS();
}

TEST compile_negative_integer(void) {
  word value = -123;
  ASTNode *node = AST_new_integer(value);
  Buffer buf;
  Buffer_init(&buf, 100);
  int compile_result = Compile_function(&buf, node);
  ASSERT_EQ(compile_result, 0);
  // mov rax, imm(-123); ret
  byte expected[] = {0x48, 0xc7, 0xc0, 0x14, 0xfe, 0xff, 0xff, 0xc3};
  EXPECT_EQUALS_BYTES(&buf, expected);
  Buffer_make_executable(&buf);
  word result = Testing_execute_expr(&buf);
  ASSERT_EQ(result, Object_encode_integer(value));
  Buffer_deinit(&buf);
  PASS();
}

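SUITE(object_tests) {
  RUN_TEST(encode_positive_integer);
  RUN_TEST(encode_negative_integer);
}

SUITE(buffer_tests) {
  RUN_TEST(buffer_write8_increases_length);
  RUN_TEST(buffer_write8_expands_buffer);
  RUN_TEST(buffer_write32_expands_buffer);
  RUN_TEST(buffer_write32_writes_little_endian);
}

SUITE(compiler_tests) {
  RUN_TEST(compile_positive_integer);
  RUN_TEST(compile_negative_integer);
}

// End Tests

GREATEST_MAIN_DEFS();

int main(int argc, char **argv) {
  GREATEST_MAIN_BEGIN();
  RUN_SUITE(object_tests);
  RUN_SUITE(buffer_tests);
  RUN_SUITE(compiler_tests);
  GREATEST_MAIN_END();
}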

These tests pass, at least for me. And no Valgrind errors, either! You can assemble the full source for this post by putting the individual code snippets back to back, in order. I recommend following along and typing it manually to get the full educational experience, but if you must copy and paste, it should still work. :)

If you want to convince yourself the tests work, modify the values we’re checking against in some places. Then you’ll see the test fail. Never trust a test suite that you haven’t seen fail… it might not be running the tests!

I think there is also a way to use greatest.h to do setup and teardown so we don’t have to do all that buffer machinery, but I haven’t figured out an ergonomic way to do that yet.

Next time on Dragon Ball Z, we’ll compile some other immediate constants.

  1. Actually, you can get away with a scheme that only plays games with pointer tagging for immediate objects, and uses a header as part of the heap-allocated object to encode additional information about the type, the length, etc. This is what runtimes like the JVM do. 

Joe Nelson (begriffs)

Tips for stable and portable software August 31, 2020 12:00 AM

After several years’ involvement with quickly evolving programming languages, I’ve come to appreciate stability. I’d like to make my programs easy to build on a wide variety of systems with minimal adjustment. I’d like them to keep working long into the future as environments change.

To think about stability more clearly, let’s divide a functioning program into its layers. Then we can examine development choices one layer at a time.

concentric circles of program resources

The more features a program needs, the further out it must reach through the layers.

Layer 0: Programming language

Choose a language with multiple implementations and a standard

Every language has to start somewhere, often as an implementation by a single person or small group. At this stage the language evolves rapidly, and to be fair it’s this stage that advances the state of the art.

However, using a language in its single-implementation stage means you’re committing a percentage of your energy to the “research project” of the language itself. You’ll deal with breaking changes (including in the tooling) and experimental dead ends.

If you love the idea behind a new language, or believe it’s a winner and that your early familiarity will pay off, then go for it! Otherwise use a language that has advanced beyond a single implementation. That way you can focus on your domain of expertise rather than keeping up with a language research agenda.

Languages get to the next stage when groups of people fork them for new situations and architectures. Some people add features, other people discover difficulties in their environments. Stakeholders then debate and reach consensus through a standardization process. The end result is that the standard, rather than a particular software artifact, defines the language and has the final say.

Naturally the whole thing takes a while. Standardized languages are going to be fairly old. They’ll miss out on recent ideas, but will be well understood. Here are some mature languages with standards:

  • Ada
  • C
  • Common Lisp
  • ECMAScript
  • Pascal
  • SQL

I’ve been using C lately because of its portability, simple (yet expressive) abstract machine model, and deep compatibility with POSIX and foundational libraries.

Avoid – or wrap – compiler language extensions

If you’re using a language with a standard, take advantage of it. First, choose a specific version of the standard. Older versions are generally more widely supported, but have fewer features. In the C world I usually pick C99 because it has some conveniences over C89, and is still supported pretty much everywhere (although only partially on Windows).

Consult your compiler documentation to see if the compiler can catch accidental uses of non-standard behavior. In clang or gcc, add the following flags to your Makefile:

# enforce a specific version of the standard
CFLAGS += -std=c99 -pedantic

Substitute another version for “c99” as desired. The pedantic flag rejects all programs that use forbidden extensions, and some other programs that do not follow ISO C.
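Compiler flags live in the Makefile, so a file can still end up being built in some other mode. A cheap extra guard, sketched here as a suggestion rather than something the flags require, is a preprocessor check on __STDC_VERSION__. The function sum_to() is hypothetical and exists only to use a C99-only feature (a declaration inside the for statement).

```c
/* Stop the build early unless the compiler is in C99 mode or later.
 * __STDC_VERSION__ is 199901L for C99, larger for later standards,
 * and smaller or undefined for older ones. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "this file requires a C99 (or later) compiler mode"
#endif

/* Hypothetical example function relying on a C99 feature:
 * the loop variable is declared inside the for statement. */
int sum_to(int n)
{
	int total = 0;
	for (int i = 1; i <= n; i++)
		total += i;
	return total;
}
```

With -std=c89 the #error fires with a readable message instead of a confusing syntax error further down.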

If you do want to use compiler extensions (such as those in gcc or clang), wrap them behind your own macros so that the code stays portable. The PostgreSQL project does this kind of thing in c.h. Here’s an example at random:

/*
 * Use "pg_attribute_always_inline" in place of "inline" for functions that
 * we wish to force inlining of, even when the compiler's heuristics would
 * choose not to.  But, if possible, don't force inlining in unoptimized
 * debug builds.
 */
#if (defined(__GNUC__) && __GNUC__ > 3 && defined(__OPTIMIZE__)) || defined(__SUNPRO_C) || defined(__IBMC__)
/* GCC > 3, Sunpro and XLC support always_inline via __attribute__ */
#define pg_attribute_always_inline __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
/* MSVC has a special keyword for this */
#define pg_attribute_always_inline __forceinline
#else
/* Otherwise, the best we can do is to say "inline" */
#define pg_attribute_always_inline inline
#endif

Notice how they adapt to various compilers and provide a final fallback. Of course, avoiding extensions in the first place is the simplest option, when possible.
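The same wrap-and-fall-back pattern works for smaller extensions too. Here is a minimal sketch for branch-prediction hints, assuming GCC or Clang (both define __GNUC__) for the builtin; my_likely, my_unlikely and count_nonzero are hypothetical names, not taken from PostgreSQL.

```c
#include <stddef.h>

/* Hypothetical hint macros: use the GCC/Clang builtin where it exists,
 * and degrade to the plain expression on every other compiler. */
#if defined(__GNUC__)
#define my_likely(x)   __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define my_likely(x)   (x)
#define my_unlikely(x) (x)
#endif

/* Count the nonzero entries of an array, hinting that zeros are rare. */
size_t count_nonzero(const int *a, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++)
		if (my_likely(a[i] != 0))
			count++;
	return count;
}
```

The calling code never mentions __builtin_expect, so a compiler without it still sees ordinary C.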

Layer 1: Standard library

Learn it, and consult the standard

Take time to learn your language’s standard library. It’s a freebie, you get it wherever your program goes. Read about the library functions in the language standard, since they will be covered there.

Gaining knowledge of the standard library can help reduce reliance on unnecessary third-party libraries. The ECMAScript world, for instance, is rife with micro-libraries that attempt to supplement the ECMA standard’s real or perceived shortcomings.

The size of a single-implementation language’s library is a trade-off between ease of implementation and ease of use. A giant library like that in the Go language makes it harder for creators of would-be rival implementations, and thus slows the progress to a robust standard.

To learn more about the C standard library, see my article.

Learn the rationale and gotchas

Because standards bodies avoid breaking existing codebases, and because stable languages are slow to change, there will be weird or dangerous functions in the standard library. However, the dangers are well known and documented in supporting literature, unlike the dangers in new, relatively untested systems.

Here are some great books for C:

  • “The CERT C Coding Standard” by Robert C. Seacord (ISBN 978-0321984043). Illustrates potential insecurity with, among other things, the standard library. Lists real code that caused vulnerabilities.
  • “The Standard C Library” by P. J. Plauger (ISBN 978-0131315099). Thorough details about the C89 stdlib.
  • “C Traps and Pitfalls” by Andrew Koenig (ISBN 978-0201179286).
  • “C Programming FAQs” by Steve Summit (ISBN 978-0201845198). I can see why these were historically the most frequently asked questions. I asked many of them myself.

Also the C99 standard has an accompanying rationale document. It talks about alternate designs considered and rejected.

Layer 2: POSIX

Similarly to how competing C implementations led to the C standard, the Unix wars led to POSIX. POSIX specifies a “lowest common denominator” interface that many operating systems honor to a greater or lesser degree.

Read the spec, compare with man pages

Whenever you use system calls outside the C standard library, check whether they’re part of POSIX, and if their official description differs from your local man pages. The Open Group offers a free searchable HTML version of POSIX.1. As of this writing it’s POSIX.1-2017 (which is POSIX.1-2008 plus two technical corrigenda).

There’s one more complication: POSIX.1-2008 (aka “Issue 7”) isn’t fully supported everywhere. (For instance I found that macOS doesn’t support pthread barriers, semaphores, or asynchronous thread cancellation.) I think the root cause is that 2008 requires thread and real-time functionality that was previously in optional extensions. If you stick to functionality in POSIX.1-2001 (aka Issue 6) you should be safe on all reasonably recent platforms.

Activate a version

To call POSIX functions you must define the _POSIX_C_SOURCE “feature test” macro before including header files. Select a specific POSIX version by using one of these values:

Edition   Release year   Macro value
1         1988           (N/A)
2         1990           1
3         1992           2
4         1993           199309L
5         1995           199506L
6         2001           200112L
7         2008           200809L

Header files hide or reveal functions based on the feature test macro. For example, the getline() function from Issue 7 allocates memory and reads a line.

/* line.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

int main(void)
{
	char *line = NULL;
	size_t len = 0;
	ssize_t read;
	while ((read = getline(&line, &len, stdin)) != -1)
		printf("Length %zd: %s", read, line);
	free(line); /* getline() allocated the buffer */
	return 0;
}

Trying to use getline() on Issue 6 (POSIX.1-2001) fails:

$ cc -std=c99 -pedantic -Werror -D_POSIX_C_SOURCE=200112L line.c -o line

line.c:10:17: error: implicit declaration of function 'getline' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        while ((read = getline(&line, &len, stdin)) != -1)
1 error generated.

Selecting Issue 7 with -D_POSIX_C_SOURCE=200809L fixes it.
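The macro can also be set in the source file itself rather than on the command line, as long as it appears before the first #include. A minimal sketch, where count_lines() is a hypothetical helper built on getline():

```c
/* The feature test macro must come before the first #include. */
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines in a stream using the
 * POSIX.1-2008 (Issue 7) getline() function. */
size_t count_lines(FILE *fp)
{
	char *line = NULL;
	size_t cap = 0;
	size_t count = 0;

	while (getline(&line, &cap, fp) != -1)
		count++;
	free(line); /* getline() allocated (and possibly grew) the buffer */
	return count;
}
```

Because the file requests Issue 7 itself, it should compile with -std=c99 -pedantic and no -D flag.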

Important note: setting _POSIX_C_SOURCE will hide non-POSIX operating system extras in the standard headers. The best practice is to separate your source files into those that are POSIX conformant, and those (hopefully few) that aren’t. Compile the latter without the feature macro and link them all together at the end.

Use POSIX in the build process too

POSIX defines the interface for not just the library functions discussed earlier, but for the shell and common tools too. If you use those tools for your builds then you don’t need to install any extra software on destination machines to compile your project.

Probably the most common sources of accidental lock-in are bashisms and GNU extensions to Make. For scripts, use sh, and use (POSIX) make for Makefiles. Too many projects use GNU features needlessly. In fact, learning the portable subset of Make features leads to cleaner, more reliable builds.

This is a topic for an entire article of its own. Chris Wellons wrote a nice tutorial about it. Also “Managing Projects with make” by Andrew Oram (ISBN 0-937175-90-0) is a little book that’s packed with good advice.

Layer 3: Operating system extras

Operating systems include useful functionality beyond POSIX. For instance extensions to pthreads (setting reader-writer preference or thread processor affinity), access to specialized hardware (like audio or graphics), alternate I/O interfaces and semantics, and functions for safety like strlcpy or pledge.

Three ways to use these features portably are to:

  1. wrap them in your own interface and conditionally compile the implementation, or
  2. build a static shim library (“libcompat”) as part of your project to use when functionality is missing, or
  3. link to a third party library that abstracts the details.

We’ll talk about third-party libraries later. Let’s look at option one now.

Detecting OS functions

Consider the example of generating random data. It requires help from the OS since POSIX offers only pseudo-random numbers.

We’ll split our Makefile into two parts:

  1. Makefile – specifies targets, dependencies and rules, that hold on all systems
  2. – sets macros and build flags specific to the local system

The Makefile will include the specifics of like this:

# inside the Makefile...

# set up common options and then...


We’ll generate with a configure script. A developer will run the script before their first build to detect the environment options. The most primitive way for configure to work would be to try to parse uname and make decisions based on what OS or distro it sees. A more accurate way is to directly probe for the needed OS C functions.

To see if a C function exists, we can just try compiling test snippets of code and see if they succeed. You might think this is awkward or that it requires cluttering your project with test code, but it’s actually pretty elegant.

First make this shell script helper function:

compiles ()
{
	stage="$(mktemp -d)"
	echo "$2" > "$stage/test.c"
	# $1 stays unquoted so an empty flag adds no argument to cc
	(cc -Werror $1 -o "$stage/test" "$stage/test.c" >/dev/null 2>&1)
	cc_success=$?
	rm -rf "$stage"
	return $cc_success
}

The compiles() function takes two arguments: an optional compiler flag, and the source code to attempt to compile.


Note that mktemp and cc are not specified by POSIX. You can write your own mktemp function using POSIX primitives, but I wanted to conserve space in this example. For cc, the spec offers c99 (or c89 in 4th edition POSIX). However, the c99 utility doesn’t allow controlling warning levels, and I wanted to specify that warnings be treated as errors. The cc alias is a common de-facto standard.

Let’s use the helper to check for OS random number generators. The BSD world offers arc4random_buf to get random bytes, and Linux offers getrandom. The configure script can check for each feature like this:

if compiles "" "
	#include <stdint.h>
	#include <stdlib.h>
	int main(void)
	{
		void (*p)(void *, size_t) = arc4random_buf;
		return (intptr_t)p;
	}"
then
	# this line belongs in the generated build settings
	echo 'CFLAGS += -DHAVE_ARC4RANDOM'
fi

if compiles "-D_POSIX_C_SOURCE=200112L" "
	#include <stdint.h>
	#include <sys/types.h>
	#include <sys/random.h>
	int main(void)
	{
		ssize_t (*p)(void *, size_t, unsigned int) = getrandom;
		return (intptr_t)p;
	}"
then
	echo 'CFLAGS += -DHAVE_GETRANDOM'
fi

See? Not too bad. These code snippets test not only whether the functions exist, but also check their type signatures. Notice how the second example is compiled with POSIX for the ssize_t type, while the first example is intentionally not marked POSIX conformant because doing so would hide the extra function arc4random_buf that BSD puts in stdlib.h.

Wrap OS functions behind your own

It’s helpful to isolate the use of non-portable functions in a distinct translation unit, and export your own interface on top. That way it’s more straightforward to set up conditional compilation in one place, or to refactor in the future.

Let’s continue the example from the previous section of generating random bytes. With the hard work of OS feature detection behind us, we can wrap the differing OS interfaces behind our own function:

#include <stdint.h>
#include <stdlib.h>

#if defined HAVE_GETRANDOM
#include <sys/random.h>
#endif

void get_random_bytes(void *buf, size_t n)
{
#if defined HAVE_ARC4RANDOM  /* BSD */
	arc4random_buf(buf, n);
#elif defined HAVE_GETRANDOM /* Linux */
	getrandom(buf, n, 0);
#else
#error OS does not provide recognized function to get entropy
#endif
}

The Makefile defines HAVE_ARC4RANDOM or HAVE_GETRANDOM using CFLAGS when the corresponding functions exist. The code can just use ifdefs. Notice the #error in the #else case to fail compilation with a clear message on unsupported platforms.

The degree of portability we strive for involves trade-offs. For example, we could add a fallback to reading /dev/random. The configure script from the previous section could check whether the device exists:

if test -c /dev/random; then
	# record that the device exists (hypothetical macro name)
	echo 'CFLAGS += -DHAVE_DEV_RANDOM'
fi

Using that information, we could add another #elif in get_random_bytes() so that it can potentially work on more systems. However, in this case, the increased portability would require a change in interface. Since fopen() or fread() on /dev/random could fail, our function would need to return bool. Currently we treat the OS functions we’re calling as unable to fail, so a void return is fine. (That is strictly true of arc4random_buf(); getrandom() can fail, for example on Linux kernels older than 3.17, which the wrapper above ignores.)
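For comparison, here is a sketch of what the /dev/random fallback might look like with a bool-returning interface. The name get_random_bytes_devrandom() is hypothetical, and the code assumes /dev/random exists as a character device, which is exactly what the configure check tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical /dev/random fallback. Opening, reading and closing can
 * each fail, so success is reported to the caller as a bool. */
bool get_random_bytes_devrandom(void *buf, size_t n)
{
	FILE *fp = fopen("/dev/random", "rb");
	if (fp == NULL)
		return false;
	/* Unbuffered, so stdio doesn't read ahead a whole buffer's worth. */
	setvbuf(fp, NULL, _IONBF, 0);
	size_t got = fread(buf, 1, n, fp);
	if (fclose(fp) != 0)
		return false;
	return got == n;
}
```

Every caller now has to handle a false return, which is the interface cost described above.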

Test on multiple OSes and hardware

The true test of portability is, of course, building and running on multiple operating systems, compilers, and hardware architectures. It can be surprising to see what assumptions this can uncover. Testing portability early and often makes it easier to keep a program shipshape.

The PostgreSQL project, for instance, maintains a bunch of disparate machines known as the “buildfarm.” Buildfarm members each have their own OS, compiler, and architecture. The team compiles every new feature on these machines and runs the test suite there.

Focusing on the architectures alone, the buildfarm shows an impressive variety.

Even if you have no intention to run on these architectures, testing there will lead to better code. (See my article C Portability Lessons from Weird Machines.)

Begriffs Buildfarm?

I’ve been thinking of assembling a buildfarm and offering a paid continuous integration service. If this interests you, please send me an email. I think the project is a good cause, and with enough subscriptions I could cover the electricity and hardware costs.

Layer 4: Third-party libraries

Many languages have their own application-level package managers, but C has no exclusive package manager. The language has too much history and spans too many environments to have locked into that. Instead people build dependencies from source, or use the OS package manager.

Build with pkg-config

Linking to libraries requires knowing their path, name, and compiler settings. Additionally we want to know which version is installed and whether it’s in-bounds. Since there’s no application-level package manager for C, we need to use another tool to discover installed libraries.

The most cross-platform way to find – and build against – dependency libraries is pkg-config. The tool allows you to query system packages, regardless of how they were installed. To be compatible with pkg-config, each library foo provides a libfoo.pc file containing keys and values like this:


prefix=/usr/local
includedir=${prefix}/include
libdir=${prefix}/lib

Name: libfoo
Description: The foo library
Version: 1.2.3
Cflags: -I${includedir}/foo
Libs: -L${libdir} -lfoo

The pkg-config executable can query the metadata and provide flags for your Makefile. Call it from your configure script like this:

# check that a sufficient version is installed
pkg-config --print-errors 'libfoo >= 1.0'

# save flags to
cat >> <<-EOF
	CFLAGS += $(pkg-config --cflags libfoo)
	LDFLAGS += $(pkg-config --libs-only-L libfoo)
	LDLIBS += $(pkg-config --libs-only-l libfoo)
EOF

Notice the LDLIBS vs LDFLAGS distinction. LDLIBS are options that need to go at the very end of the build line. The default POSIX make suffix rules don’t mention LDLIBS, but here’s a rule you can use instead:

.c:
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $< $(LDLIBS)

Sometimes an operating system will include extra functionality and package it up as a portable library you can use on other operating systems. In this case you can use pkg-config conditionally.

For instance, OpenBSD spun off the LibreSSL project (a more usable OpenSSL). OpenBSD includes the functionality internally. In the configure script just do an operating system check:

# LibreSSL
case "$(uname -s)" in
	OpenBSD)
		# included with OS
		echo 'LDLIBS += -ltls' >>
		;;
	*)
		# requires a package
		pkg-config --print-errors 'libtls >= 2.5.0'
		cat >> <<-EOF
			CFLAGS += $(pkg-config --cflags libtls)
			LDFLAGS += $(pkg-config --libs-only-L libtls)
			LDLIBS += $(pkg-config --libs-only-l libtls)
		EOF
		;;
esac

For more information about pkg-config, see Dan Nicholson’s guide.

Compensating for the standard library

The C standard library has no generic collections. You have to write your own linked lists, trees, and hash tables. Real Programmers™ might like this, but I don’t.

POSIX offers limited help with its interface in search.h:

  • Binary search tree. This interface has worked for me, although twalk() doesn’t contain an argument to pass auxiliary data to the callback. The callback needs to consult a global or thread-local variable for that. The quality of implementation may vary as well, likely with regard to how/if the tree is balanced.
  • Queue. Very basic functions to insert or delete from a doubly linked (possibly circular) list. It takes void*, but expects a structure whose first two members are pointers to the same structure type (forward and backward pointers).
  • Hash table. Unnecessarily constrained interface. It creates a single hash table in hidden memory. You can destroy the table and later make another, but can never have more than one active at a time anywhere in the callstack. Obviously not thread safe, but that seems to be the least of its problems.
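As a small illustration of the tree interface, this sketch counts distinct strings with tfind() and tsearch(). It assumes an XSI-conforming search.h (hence _XOPEN_SOURCE), and count_distinct() is a hypothetical helper. Because tdestroy() is a GNU extension, it frees the nodes with tdelete() instead. Note that the tree stores the caller's pointers, not copies of the strings.

```c
#define _XOPEN_SOURCE 700 /* search.h is part of the XSI option */

#include <search.h>
#include <stddef.h>
#include <string.h>

/* tsearch() passes the stored key pointers straight to the comparator. */
static int cmp_str(const void *a, const void *b)
{
	return strcmp(a, b);
}

/* Hypothetical helper: count distinct strings using a POSIX search tree. */
size_t count_distinct(const char *const *words, size_t n)
{
	void *root = NULL;
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		if (tfind(words[i], &root, cmp_str) == NULL) {
			if (tsearch(words[i], &root, cmp_str) == NULL)
				break; /* node allocation failed */
			distinct++;
		}
	}
	/* Remove every node; deleting an absent key just returns NULL. */
	for (size_t i = 0; i < n; i++)
		tdelete(words[i], &root, cmp_str);
	return distinct;
}
```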

To go beyond that, you’ll have to use third-party libraries. Many well-known libraries seem pretty bloated (GLib, tbox, Apache Portable Runtime). I found a smaller, cleaner library called simply C Algorithms. Haven’t used it in a project yet, but it looks stable and well tested. I also built the library locally with added pedantic C99 flags and got no warnings.

Two other stable libraries (code snippets?) which have received a lot of use over the years are Uthash and BSD’s queue(3) (browse queue.h from OpenBSD, or the FreeBSD variant).

Uthash describes itself this way:

Any C structure can be stored in a hash table using uthash. Just add a UT_hash_handle to the structure and choose one or more fields in your structure to act as the key. Then use these macros to store, retrieve or delete items from the hash table.

The BSD queue code has been used and improved all the way back to the 1990s. It provides macros to create and manipulate singly-linked lists, simple queues, lists, and tail queues. The man page is quite good.

The functionality differs between the OpenBSD and FreeBSD codebases. I use the OpenBSD version, but it has a little less functionality. In particular, FreeBSD adds the STAILQ (singly-linked tail queue), and a list swap operation. There was once a CIRCLEQ for circular queues, but it used dodgy coding practices and was removed.

Both Uthash and Queue are header files with macros that you vendor into your project and include rather than linking against. In general I consider “header-only libraries” to be undesirable because they abuse the notion of a translation unit, bloat object files, and make debugging harder. However I’ve used these libraries and they do work well.
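For a taste of the BSD queue macros, this sketch builds a singly-linked list with SLIST, walks it, and frees it. It assumes the system's <sys/queue.h> (present on glibc and the BSDs, though not on every libc) rather than a vendored copy, and it sticks to macros that both OpenBSD and FreeBSD provide. The function sum_via_slist() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* A list node: the SLIST_ENTRY member holds the link to the next node. */
struct node {
	int value;
	SLIST_ENTRY(node) link;
};

/* Declares struct node_list, the list head type. */
SLIST_HEAD(node_list, node);

/* Copy values into a list, sum them by walking the list, then free it.
 * On allocation failure it sums whatever was inserted so far. */
long sum_via_slist(const int *values, size_t n)
{
	struct node_list head = SLIST_HEAD_INITIALIZER(head);
	struct node *np;
	long sum = 0;

	for (size_t i = 0; i < n; i++) {
		np = malloc(sizeof *np);
		if (np == NULL)
			break;
		np->value = values[i];
		SLIST_INSERT_HEAD(&head, np, link);
	}

	SLIST_FOREACH(np, &head, link)
		sum += np->value;

	while (!SLIST_EMPTY(&head)) {
		np = SLIST_FIRST(&head);
		SLIST_REMOVE_HEAD(&head, link);
		free(np);
	}
	return sum;
}
```

The link lives inside the user's own struct, so no separate list-node allocations are needed.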

User interface

The fewer UI features a program requires, the more portable it will be and the fewer opportunities there will be for it to mess up. (Does your command line app really need to output an emoji rocket ship or animated-in-place text spinner?)

The lowest common denominator is the standard I/O library in C, or its equivalent in other languages. Reading and writing text, pretending to be a teletype.

The next level of sophistication is static output but an input line you can modify (like the fancier teletypes that could edit a line before sending). Different terminals support intraline editing differently, and you should use a library to handle it. The classic is GNU readline. Readline provides this functionality:

  • Moving the text cursor (vi and emacs modes)
  • Searching the command history
  • Controlling a kill ring
  • Using tab completion

Its license is straight up GPL though, not even LGPL. There are more permissive knockoffs like libedit (requires ncurses), or linenoise (which is restricted to VT100 terminals/emulators).

Going up yet another level is the text user interface (TUI), where the whole screen is your canvas, but you draw on it with text. Historically terminal control codes diverged wildly, so a standard programming interface was born, X/Open Curses. The most popular implementation is ncurses, which adds some nonstandard extensions as well.

Curses handles these tasks:

  • Terminal capability detection
  • “Raw” mode keyboard input
  • Cursor motion
  • Line drawing
  • Highlighting, underlining
  • Inserting and deleting lines and characters
  • Status line
  • Area clear
  • Windows
  • Color

To stop pretending the computer is an archaic device from the 70s, you can use the cross-platform SDL2 library. It gives low level access to audio, keyboard, mouse, joystick, and graphics hardware. The platform support really is impressive. Everything from Unix, Mac, and Windows to mobile and web rendering.

Finally, for a classic native desktop application with widgets, the most stable and portable choice is probably Motif. The interface is stark, but it runs everywhere, and won’t change or break on you.

Sample of Motif widgets

The Motif Programming Manual (free download) says this by way of introduction:

So why motif? Because it remains what it has long been: the common native windowing toolkit for all the UNIX platforms, fully supported by all the major operating system vendors. It is still the only truly industrial strength toolkit capable of supporting large scale and long term projects. Everything else is tainted: it isn’t ready or fully functionally complete, or the functional specification changes in a non-backwards-compatible manner per release, or there are performance issues. Perhaps it doesn’t truly port across UNIX systems, or it isn’t fully ICCCM compliant with software written in any other toolkit on the desktop, or there are political battles as various groups try to control the specification for their own purposes. […] With motif, you know where you are: it’s stable, it’s robust, it’s professionally supported, and it all works.

A reference manual is also available for download.

I was a little skeptical that it would be supported on macOS, but I tried the hello world example and, sure enough, it worked fine on XQuartz. I think there’s value in using Motif rather than a monster like GTK.

August 30, 2020

Derek Jones (derek-jones)

The aims of software engineering research August 30, 2020 10:19 PM

Physics researchers aim to explain the workings of the universe (technically they build models whose behavior mimics that of the universe we can measure), biologists the workings of biological systems, and psychologists the working of the human mind.

What are researchers in software engineering aiming to do?

Talking to academics, the answer is that they aim to do research that can be published in a high impact journal.

What do those involved in commercial software development think software engineering researchers should be aiming to achieve?

Most of the commercial developers I have asked have never thought about the subject; hardly surprising, they have plenty of other issues to think about.

Those who pay for software, rather than create it, want it to be cheaper and delivered faster.

Vendors are under some pressure to reduce costs and deliver sooner. But since its inception, software has been a seller's market, which means the customer pressure does not have the impact it has in other industries.

The very large organizations who pay lots of money for software for their own use (e.g., the U.S. Department of Defense) recognise that research into software production may well save them lots of money, and at one time interesting things were being discovered, but then funding got rerouted to people with an aversion to actual software engineering, i.e., academics.

Cheaper and faster will always be of interest, and will start to become a hot topic in software engineering research once software starts to become a buyer's market.

Maintaining existing systems continues to grow towards dominating what nearly every software developer does. Dependencies on the rest of the software world (e.g., libraries and compilers) are starting to consume a large percentage of maintenance costs. Managers want to know which packages are likely to have a long and stable lifetime, and which are likely to be short-lived. An understanding of the evolution of software ecosystems is a pressing need. This is really cheaper and faster over the long term.

Cheaper and faster (short term for development, long term for maintenance) covers everything.

It’s tempting to list personnel selection, i.e., who is likely to make the best software developer. But why should the process of selecting software developers be any different from the processes used to select people to become doctors, lawyers and other professionals? I’m sure that those involved in the various professions would like a magic wand that points to the appropriate people (for some definition of appropriate), but this magic wand is no more likely to exist for software developers than for any other profession.

What do you think the aims of software engineering research should be?

Ponylang (SeanTAllen)

Last Week in Pony - August 30, 2020 August 30, 2020 09:39 PM

The Flynn project aims to bring a Pony-like actor-model implementation to Swift using a modified version of the Pony runtime. New releases of ponyc, ponyup, and some bots.

Gustaf Erikson (gerikson)

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: The smallest program August 30, 2020 03:49 AM


Welcome to the first post in the “Compiling a Lisp” series. We’re going to write a small program today. Before we actually compile anything, though, let’s build up a bit of a foundation for code execution. That way, we can see the code compile and run and be satisfied with the results of both.

Instead of compiling to disk, like most compilers you may be familiar with (GCC, Clang, DMD, Python, etc.), we’re going to compile in memory. This means that every time we run the program we have to compile it again, but it also means we don’t have to deal with whatever on-disk format an executable has to be in on your platform (ELF, Mach-O, etc.). We can just point the processor at the code and say “go”. This style of compilation is known as “Just-in-Time” compilation, because the compilation happens right when you need it, and not before.[1]

Let’s start with a small demo.

#include <assert.h>   /* for assert */
#include <stddef.h>   /* for NULL */
#include <string.h>   /* for memcpy */
#include <sys/mman.h> /* for mmap and friends */

const unsigned char program[] = {
    // mov eax, 42 (0x2a)
    0xb8, 0x2a, 0x00, 0x00, 0x00,
    // ret
    0xc3,
};

const int kProgramSize = sizeof program;

typedef int (*JitFunction)();

int main() {
  void *memory = mmap(/*addr=*/NULL, /*length=*/kProgramSize,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  memcpy(memory, program, kProgramSize);
  int result = mprotect(memory, kProgramSize, PROT_EXEC);
  assert(result == 0 && "mprotect failed");
  JitFunction function = *(JitFunction*)&memory;
  int return_code = function();
  assert(return_code == 42 && "the assembly was wrong");
  result = munmap(memory, kProgramSize);
  assert(result == 0 && "munmap failed");
  return return_code;
}

If you want to understand the pointer shenanigans, see footnote 2, but if you would like to ignore it and pretend I never did that, please keep reading. The program works, though:

sequoia% gcc -Wall -Wextra -pedantic -fno-strict-aliasing mmap-demo.c
sequoia% ./a.out
sequoia% echo $?
42

Let’s back up and go through that demo line-by-line. I’ll skip the includes since that’s just part of life in C.

The machine code

First let’s take a look at our program. Here we have some raw machine code encoded as hex bytes, with helpful commentary by yours truly explaining what the bytes mean in human-speak.

const unsigned char program[] = {
    // mov eax, 42 (0x2a)
    0xb8, 0x2a, 0x00, 0x00, 0x00,
    // ret
    0xc3,
};

I generated this code by going to the Compiler Explorer, making the compiler compile to binary, and typing in a C program that just returns 42.[3]

This is as good a method as any for doing some initial research for what instructions you want to emit. You’ll have to look a little further afield (like in this quick reference or the official Intel x86-64 manual) if you want to figure out how to encode instructions without manually having a table for all the variations you want. We’ll touch more on that later.

In this machine code, 0xb8 is the instruction for “move the following 32-bit integer to the register eax”. It’s a special case of the mov instruction. eax is (the lower half of) one of several general-purpose registers in x86-64. It is also the register conventionally used for return values, but that could vary between calling conventions. It’s not important to know all the details of every calling convention, but it is important to know that a calling convention is just that — a convention. It is an agreement between the people who write functions and the people who call functions about how data gets passed around. In this case, we are moving 42 into eax because eax is the return register in the System V AMD64 calling convention (used on macOS, Linux, other Unices these days) and because we’re calling this hand-built function from C like any other function. It needs to be a well-behaved citizen and put data in places the compiler writers expected.

The next 4 bytes are the number, going from least significant byte to most significant byte.
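To make the byte order concrete, here is a sketch of a hypothetical emitter (emit_mov_eax_imm32 is an illustrative name, not code from this series) that writes the 0xb8 opcode followed by a 32-bit immediate, least significant byte first. It uses shifts, so it produces the same bytes whatever the byte order of the machine running the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write "mov eax, imm32" into out, i.e. the 0xb8
 * opcode and then the immediate's four bytes from least to most
 * significant. Returns the number of bytes written (always 5). */
size_t emit_mov_eax_imm32(unsigned char *out, int32_t value)
{
	uint32_t bits = (uint32_t)value; /* shift an unsigned copy */

	out[0] = 0xb8;
	for (int i = 0; i < 4; i++)
		out[1 + i] = (unsigned char)(bits >> (8 * i));
	return 5;
}
```

For 42 this reproduces the five bytes in the demo: 0xb8, 0x2a, 0x00, 0x00, 0x00.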

Finally, 0xc3 is the instruction for ret. ret fetches the return address of the function that called our function off the stack, and jumps to it. This transfers control back to the main function of the C program.

When you put all of that together, you get a very small but well-formed program that returns 42.

The typedef

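Next, we use C’s function pointer syntax to declare a type JitFunction that refers to a function that is called with no arguments and returns an int. (Strictly, before C23, empty parentheses leave the parameters unspecified; (void) would declare that there are none.)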

typedef int (*JitFunction)();

While technically we should specify the size of the integer (after all, we know we want to return a 32-bit integer), I avoided that in this demo because it adds more headers and visual noise.

This declaration, when used with the actual call to the function, tells the C compiler how to arrange the registers and the stack for the call.

The mmap and memcpy dance

Now we allocate a new chunk of memory. We don’t use malloc to do it because mprotect needs the address to be page-aligned. Maybe it’s possible to use posix_memalign instead of malloc, but I’ve never seen anybody do that. So we mmap it.
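To make the alignment requirement concrete, here is a sketch of helpers that query the page size with sysconf(_SC_PAGESIZE), round a length up to whole pages, and test whether an address sits on a page boundary. The function names and the 4096 fallback are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page size as reported by the OS; 4096 is an assumed fallback. */
size_t page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);
	return ps > 0 ? (size_t)ps : 4096;
}

/* Memory is mapped and protected in whole pages: round n up to one. */
size_t round_up_to_page(size_t n)
{
	size_t ps = page_size();
	return (n + ps - 1) / ps * ps;
}

/* Nonzero if p lies exactly on a page boundary, as mprotect() requires. */
int is_page_aligned(const void *p)
{
	return (uintptr_t)p % page_size() == 0;
}
```

An address returned by mmap() with addr=NULL should always pass is_page_aligned(); one from malloc() usually won't.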

I don’t want to explain all the possible parameter configurations for mmap, especially because they vary between systems. Our configuration requests:

  • memory without specifying a destination address (addr=NULL),
  • of a particular length (length=kProgramSize),
  • that is both readable and writable (prot=PROT_READ | PROT_WRITE),
  • is not mapped to a file, but acts like malloc (flags=MAP_ANONYMOUS, fd=-1, offset=0),
  • and is not shared between processes (flags=MAP_PRIVATE)

And, since memory is kind of useless if we don’t do anything with it, we copy the program into it.

  void *memory = mmap(/*addr=*/NULL, /*length=*/kProgramSize,
                      /*prot=*/PROT_READ | PROT_WRITE,
                      /*flags=*/MAP_ANONYMOUS | MAP_PRIVATE,
                      /*filedes=*/-1, /*offset=*/0);
  memcpy(memory, program, kProgramSize);

You might be wondering why we need to make a whole new buffer and copy into it if we already have some memory containing the code. There are at least two reasons.

First, we need to guarantee that the memory is page-aligned for mprotect – same as above.

Second, in our actual compiler we won’t just have some static array that we copy code from. We’re going to be producing it on the fly and appending to a buffer as we go. We’ll be re-using this mmap dance, but not necessarily the memcpy.
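One possible shape for that appending step is an append-only buffer over a fixed region, such as the page returned by mmap. This is a sketch under that assumption; struct code_buf and code_buf_emit() are hypothetical names, and the series may organize this differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical append-only code buffer over a fixed writable region. */
struct code_buf {
	unsigned char *bytes; /* start of the region (e.g. from mmap) */
	size_t len;           /* bytes emitted so far, always <= cap */
	size_t cap;           /* total size of the region */
};

/* Append n bytes of machine code; returns 0 on success or -1 if the
 * region doesn't have room, leaving the buffer unchanged. */
int code_buf_emit(struct code_buf *b, const unsigned char *src, size_t n)
{
	if (n > b->cap - b->len)
		return -1;
	memcpy(b->bytes + b->len, src, n);
	b->len += n;
	return 0;
}
```

Once emission is finished, the same mprotect() call flips the region from writable to executable.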

The mprotect

Modern operating systems implement a security feature called “W^X”, pronounced “write xor execute”. This policy prohibits a piece of memory from being both writable and executable at the same time, which makes bugs in software harder to exploit.

In order to both write our program into a buffer and then execute it, we need an explicit transition point where our memory goes from being readable and writable to being executable. This is mprotect.

  int result = mprotect(memory, kProgramSize, PROT_EXEC);
  assert(result == 0 && "mprotect failed");

If we didn’t do this, the call would try to execute memory that is not marked executable. On my machine, I get a segmentation fault.

The cast

In order to actually call the function, we need to first wrangle the void* into the right type. While we could do the cast and call in one line, I find it easier to read to cast first and call later.

  JitFunction function = *(JitFunction*)&memory;

The call

Ahh, some action! This very innocuous-looking code is maybe the most exciting part of the whole program. We finally take our code, marked executable, treat it the same as any old C function, and call it!

  int return_code = function();
  assert(return_code == 42 && "the assembly was wrong");

The first time I got this working I was very happy with myself.

The clean up

Just as every malloc must be paired with a free, every mmap must be paired with a munmap. Unlike free, munmap returns an error code, so we check it.

  result = munmap(memory, kProgramSize);
  assert(result == 0 && "munmap failed");
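The pieces above can be combined into one reusable function. Below is a minimal sketch of the whole dance. The name run_jit_code and its return-−1-on-failure convention are hypothetical, not the series’ code. The sketch replaces the post’s pointer cast with a memcpy of the pointer bits. That assumes, as POSIX does, that data and function pointers have the same size and representation. The machine code only runs on x86-64, where the six bytes b8 2a 00 00 00 c3 encode mov eax, 42; ret.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Same shape as the post's JitFunction: takes nothing, returns an int. */
typedef int (*JitFunction)(void);

_Static_assert(sizeof(JitFunction) == sizeof(void *),
               "data and function pointers must be the same size");

/* Hypothetical helper: copy `size` bytes of machine code into fresh pages,
   make them executable, call them, and unmap them. Returns the code's
   result, or -1 if a system call fails (so code returning -1 is ambiguous). */
static int run_jit_code(const unsigned char *code, size_t size) {
    void *memory = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (memory == MAP_FAILED)
        return -1;
    memcpy(memory, code, size);

    /* W^X: drop write permission as the pages become executable. */
    if (mprotect(memory, size, PROT_EXEC) != 0) {
        munmap(memory, size);
        return -1;
    }

    /* Copy the pointer's bits instead of casting through JitFunction*,
       which avoids the strict-aliasing concern without -fno-strict-aliasing. */
    JitFunction function;
    memcpy(&function, &memory, sizeof function);
    int result = function();

    if (munmap(memory, size) != 0)
        return -1;
    return result;
}

#if defined(__x86_64__)
/* mov eax, 42 ; ret */
static const unsigned char kReturn42[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};
#endif
```

Other architectures need different bytes. Some, such as AArch64, also need an explicit instruction-cache flush before jumping into freshly written code, which is why the example bytes are guarded.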

Some proof

Just so we can convince ourselves that our program actually worked (who knows, maybe the asserts didn’t run), we propagate the result of our function call to the outside world. We can then check the return code in $?.

  return return_code;

Note that while the return type of main is int, exit codes can only be between 0 and 255, because only the low 8 bits of the value are passed on to the parent process.
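The truncation is easy to observe from C. This sketch forks a child that exits with a given code. It then reports what the parent sees through waitpid and WEXITSTATUS, which is the same value a shell stores in $?. The helper name observed_exit_status is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, and return the
   exit status the parent observes (what a shell would put in $?).
   Returns -1 if fork or waitpid fails or the child did not exit normally. */
static int observed_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);  /* child: exit at once, without flushing stdio buffers */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);  /* only the low 8 bits survive */
}
```

Exiting with 298 is observed as 42, because 298 mod 256 = 42. Exiting with -1 shows up as 255.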

Wrapping up

That was a lot of words for explaining return 42. Hopefully they were helpful words. With this small demo, we’ve gotten used to some building blocks that we’ll use when compiling and executing Lisp programs.

Next up, compiling integers.

  1. Unlike other JITs, though, we won’t be doing any of the fancy inline caching, deoptimization, or other tricks. We’re just going to compile the code once and move on with our lives. 

  2. Hold your nose and ignore the ugly pointer casting. This avoids the compiler complaining even with -pedantic on. It’s technically not legal in ISO C to cast between data pointers and function pointers, but POSIX systems are required to support it. Also relevant are the C strict aliasing rules, so we use -fno-strict-aliasing. I’m not an expert on what that means, so see this nice StackOverflow post.

  3.   int main() {
        return 42;
       }

August 29, 2020

Simon Zelazny (pzel)

Large directory feature not enabled on this filesystem August 29, 2020 10:00 PM

TIL Firefox & Chromium will keep regenerating .cache/fontconfig until your filesystem blows up!

My wife's computer had been acting really strangely the last couple of days, and today it culminated in extremely bad performance: the computer was usable for a couple of seconds, then completely unresponsive for about 2 seconds, then usable again.

Htop and top proved useless, because during the unresponsive periods their UI would freeze, so whatever application was causing the staggering was not easily detectable.

Dmesg showed what was going on. The following log messages were appearing with a regularity that corresponded to the 'hiccups'.

[1805.005848] EXT4-fs warning (device dm-3): ext4_dx_add_entry:2357: Large directory feature is not enabled on this filesystem
[1805.005340] EXT4-fs warning (device dm-3): ext4_dx_add_entry:2352: Directory (ino: 15337216) index full, reach max htree level :2

Some internet spelunking revealed that it's possible to enable this 'Large directory' feature on a mounted device, and the invocation turned out to be:

tune2fs -O large_dir /dev/mapper/pool-abcd

(Note: this is on a LUKS-encrypted partition.) Immediately after enabling this feature, the system stopped hiccupping.

Finding the culprit

Now, I still wanted to know which directory was so large that it hit the filesystem limits.

find / -inum 15337216

The find command revealed that the directory in question was $HOME/.cache/fontconfig. I tried to ls inside it, but the ls command seemed to hang forever. Instead, I ran:

find . | wc -l

and this revealed that the target directory had almost 15 million files inside! Deleting them took ~2 hours with the following command:

cd $HOME/.cache/fontconfig
find . -delete
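One plausible reason ls appeared to hang is its default behaviour: it reads the whole directory and sorts the names before printing anything. GNU ls -f turns sorting off. A program that streams entries with readdir(3) avoids the wait as well. The sketch below counts the direct entries of a single directory, unlike find . | wc -l, which recurses and also counts . itself. The name count_dir_entries is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: count the entries directly inside `path` (no
   recursion), skipping "." and "..". Entries are counted in whatever order
   readdir returns them, with nothing sorted or held in memory.
   Returns -1 if the directory cannot be opened. */
static long count_dir_entries(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```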

The entire thing turned out to be caused by a bug in the interaction of Firefox and Chromium with fontconfig. Running Firefox once caused 480 new cache files to appear. Running Chrome subsequently added a couple hundred more files. Another run of Firefox again added a bunch of files.

To fix this temporarily I issued the following commands:

cd $HOME/.cache
rm -rf fontconfig
touch fontconfig

Now, there is no directory for the browsers to fight over, and no performance issues so far.
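Replacing the directory with an empty file stops the growth for a simple reason. Nothing can be created beneath a path whose parent component is a regular file: open fails with ENOTDIR. The sketch below demonstrates this in a scratch directory, assuming /tmp is writable. The function and file names are hypothetical. The sketch does not show how fontconfig itself reacts to the failure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo: inside a scratch directory, make "fontconfig" a plain
   file (as `touch` does), then try to create a cache file beneath it the way
   a program expecting a directory would. Returns the errno from that attempt
   (0 if it unexpectedly succeeds, -1 if the setup fails). */
static int create_under_regular_file(void) {
    char scratch[] = "/tmp/fontconfig-demo-XXXXXX";
    if (mkdtemp(scratch) == NULL)
        return -1;

    char blocker[sizeof scratch + 16];
    char cache_file[sizeof scratch + 32];
    snprintf(blocker, sizeof blocker, "%s/fontconfig", scratch);
    snprintf(cache_file, sizeof cache_file, "%s/fontconfig/cache-1", scratch);

    int fd = open(blocker, O_CREAT | O_WRONLY, 0644);  /* like `touch` */
    if (fd < 0) {
        rmdir(scratch);
        return -1;
    }
    close(fd);

    int err = 0;
    fd = open(cache_file, O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
        err = errno;  /* expected: ENOTDIR, "fontconfig" is not a directory */
    else
        close(fd);

    unlink(blocker);
    rmdir(scratch);
    return err;
}
```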

Maxwell Bernstein (tekknolagi)

Compiling a Lisp: Overture August 29, 2020 08:16 PM

In my last series, I wrote about building a Lisp interpreter. This time, we’re going to write a Lisp compiler.

This series is an adaptation of Abdulaziz Ghuloum’s excellent paper An Incremental Approach to Compiler Construction, with several key differences:

  • Our implementation is in C, instead of Scheme
  • Our implementation generates machine code directly, instead of generating text assembly
  • Our implementation may omit some runtime data structures

See my implementation for reference, but note that it may be incomplete and also may look a little bit different than the compiler detailed in these posts.

You probably have a couple questions, like why Lisp? and why compile directly to x86-64? and why C? and come on, another Lisp series?, and those are all very reasonable questions that will be answered in due time.

I want to implement this compiler in a language other than Scheme because it will prevent me from copying too much of the code from the paper. Even though the paper doesn’t actually contain the source for the whole compiler (most of it is, after all, left as exercises for the reader), I think I will learn a lot more when I have to write all of the code by myself. I get to make my own mistakes and you get to watch me make and fix them in “real” time.

I also don’t want to generate text assembly, but those reasons are a little different than my reason for choosing another implementation language:

First, I think that would be harder to test: I want to have an in-process unit test suite that compiles Lisp programs and executes them on-the-fly. Shelling out to a system assembler like as or nasm would be somewhat error prone. What if the person building this doesn’t have the assembler I need? Sure, I could also write a small assembler as part of this compiler, but that’s a lot of work. Harder than just generating x86-64 directly, perhaps.

Second, I want to learn more about machine architecture. While add a, b seems like one machine instruction, it could probably be encoded in 50 different ways depending on whether a and b are registers, stack locations, other memory addresses, immediates, which registers they are, etc. Shelling out to an assembler abstracts a lot of that detail away. I want to get my hands dirty. Hopefully you do, too.
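Even the plain register-to-register form of add eax, ebx has two valid encodings. Opcode 01 (ADD r/m32, r32) puts the destination in the ModRM rm field, giving 01 D8. Opcode 03 (ADD r32, r/m32) puts it in the reg field, giving 03 C3. This sketch covers only the eight legacy 32-bit registers, with no REX prefix. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86-64 register numbers as they appear in a ModRM byte (32-bit names). */
enum { kEax = 0, kEcx = 1, kEdx = 2, kEbx = 3 };

/* ModRM byte with mod=0b11, i.e. both operands are registers. */
static uint8_t modrm_reg_reg(int reg, int rm) {
    return (uint8_t)(0xC0 | (reg << 3) | rm);
}

/* `add dst, src` via opcode 01 /r (ADD r/m32, r32): dst goes in ModRM.rm. */
static size_t encode_add_01(uint8_t *out, int dst, int src) {
    out[0] = 0x01;
    out[1] = modrm_reg_reg(src, dst);
    return 2;
}

/* Same instruction via opcode 03 /r (ADD r32, r/m32): dst goes in ModRM.reg. */
static size_t encode_add_03(uint8_t *out, int dst, int src) {
    out[0] = 0x03;
    out[1] = modrm_reg_reg(dst, src);
    return 2;
}
```

Memory operands, immediates, and the 64-bit registers multiply the options further with displacement bytes, SIB bytes, and REX prefixes.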

I chose Lisp because that’s what the Ghuloum paper uses, and because Lisp can be represented as a small, compact, dynamically typed language. Many interpreter implementations are under 200 lines. I don’t think this compiler will be that short, though.

For questions, comments, and suggestions please post on this elist. It’s a public inbox that I can use to discuss and receive patches. I got the idea from Chris Wellons.

Background knowledge

In order to get the most out of this series, I recommend having at least passing familiarity with the following:

  • C or a C-like language
  • some kind of assembly language
  • Abstract Syntax Trees and recursive tree traversal
  • no particular aversion to parentheses

Having the background helps you focus on the bigger picture rather than the minutiae, but it is by no means required. I expect most of this series to be fairly readable. If it’s not, that’s a bug and you should report it to me.

Structure of the series

I plan on writing this series in installments where each installment adds a feature of some kind. Maybe that feature is a new bit of Lisp functionality, or maybe it’s a refactoring of the compiler, or maybe it’s a compiler optimization.

For this reason, each post will tend to depend on the code and understanding from previous posts. As such, I recommend reading the series in order. I’ll still try to keep the big ideas understandable for those who don’t.

At each stage of the compiler, we should have a battery of tests that ensure that the features we have already built continue to work as expected.

I plan on adhering to this rough plan:

  1. Compile integers
  2. Compile other immediate constants (booleans, ASCII characters, the empty list)
  3. Compile unary primitives (add1, sub1, integer->char, char->integer, null?, zero?, etc)
  4. Compile binary primitives (+, -, *, /, =, etc)
  5. Compile local variables (let-expressions)
  6. Compile conditional expressions (if-expressions)
  7. Compile heap allocation (cons, strings, symbols, etc)
  8. Compile label procedure calls
  9. Compile closures
  10. Add tail-call optimization
  11. Compile complex constants (quote)
  12. Compile variable assignment (set!)
  13. Add macro expander
  14. Add extended forms using macro expander (let*, letrec, etc)
  15. Add support for libraries and separate compilation
  16. Compile foreign function calls
  17. Add error checking to primitives and procedure calls
  18. Compile variable-arity procedures (aka varargs)
  19. Compile apply
  20. Add output ports (kind of like FILE*)
  21. Add write, display
  22. Add input ports
  23. Add a tokenizer in Lisp
  24. Add a reader in Lisp
  25. Add a Lisp interpreter (or compiler) in Lisp

With optional add-ons also described in the initial paper:

  • Big numbers, IEEE754 floats, complex numbers
  • User-defined macros
  • A module system
  • Heap overflow handler and garbage collector
  • Stack overflow handler
  • Improved code generation

And optional add-ons not described in the original paper:

  • An intermediate representation for optimization
  • Generate executables and write them to disk
  • An interpreter with optional just-in-time compiler

You may have noticed that this is a lot of steps, and there are some steps that I intend to take but have completely omitted because I want to roll them into other posts. Things like:

  • Code generation infrastructure (a writable buffer, mmap/mprotect, etc)
  • Compiler data structures (variable environments, label environments, etc)
  • Testing infrastructure (unit testing, integration testing)

So it’s really actually more work than I listed above. This series may take a long time. It may take some twisty turns. It may take some shortcuts. But there is good news: I’ve already written the compiler up to compiling heap allocation (still working on procedure calls), and even if I don’t finish this series there is still Ghuloum’s excellent paper to learn from.

Next up, the smallest program.

August 24, 2020

Jan van den Berg (j11g)

New WordPress theme: Neve August 24, 2020 08:18 PM

Frequent visitors might notice a change to the site: I switched WordPress themes.

I have been a happy user of the Independent Publisher theme since this site started, and I still use it on my other blog. It’s a terrific theme and I like it a lot.

But because I really like a clean and simple aesthetic, I made quite a few tweaks to it, specifically to the fonts and CSS.

My favorite themes are usually black and white themes. Two of my favorite examples of this aesthetic are and Both excellent looking sites in my opinion, and a joy to read.

So I looked closely at those sites and copied a few things from them. For example: both use the gorgeous Merriweather serif font as the main font for the body text. So did my site (this wasn’t the previous theme’s default font). I really like serif fonts; they add a sort of legibility and make big text blocks more readable.

But I always kept tweaking the theme: letter-spacing, font-size, colors and more, and I was never 100% happy with it. Especially when things looked good on the desktop, it would look a bit off on mobile. Or the other way around.

Some tweaks I made to the previous theme. Complete listing


Last week I came across this tweet for a new theme called Neve from ThemeIsle, and the example was striking enough to give it a try. And I was happily surprised by how easy, complete and fast this theme was out of the box. I have made exactly 0 CSS tweaks to it. What you’re seeing now is default Neve. I have tried *many* themes over the years, and most of them always lack something. Neve checks all the boxes for what I have been looking for, for quite some time.

And even though Neve uses sans-serif fonts, I found this theme to have the most consistent overall experience (desktop and mobile), and the configuration options are plentiful. And it’s really fast, which is really important. My site feels snappier because of it.

So I made the decision to switch themes. And I like it a lot. The last couple of days I go to my own site, revisit old posts, just to see how they look, and I am always pleased with the appearance. The line-spacing is just right, the header font-weight perfect, it looks good on desktop and mobile, it’s clean and it’s fast.


There are two gripes.

  1. I noticed when you don’t center an image, the image caption will sort of blend with the text. And it will not really be clear that the caption belongs to the image. The fix is easy though: center your images and the caption will be centered. Another solution could be to make the caption font smaller or use a different shade of grey to make it more distinct.
  2. The other gripe is one I have to examine a little bit closer, but I don’t think the Neve quote blocks look all that good. If anything, a good quote might be best served by a serif font to stand out a bit. This is by no means a deal breaker, but I might take a closer look at it.

But also I don’t want to tweak too much. I actually really like that I can use this theme with default settings and that it looks really good. So if you’re looking for a great, clean, fast theme: give Neve a try!

The post New WordPress theme: Neve appeared first on Jan van den Berg.

Tobias Pfeiffer (PragTob)

The great Rubykon Benchmark 2020: CRuby vs JRuby vs TruffleRuby August 24, 2020 02:30 PM

It has been far too long, more than 3.5 years since the last edition of this benchmark. Well what to say? I almost had a new edition ready a year ago and then the job hunt got too intense and now the heat wave in Berlin delayed me. You don’t want your computer running at […]

August 23, 2020

Derek Jones (derek-jones)

Time-to-fix when mistake discovered in a later project phase August 23, 2020 10:09 PM

Traditionally the management of software development projects divides them into phases, e.g., requirements, design, coding and testing. A mistake introduced in one phase may not be detected until a later phase. The long-standing folklore that mistakes made in earlier phases and detected in later phases are much, much more costly to fix persists, despite the original source of this folklore being resoundingly debunked. Fixing a mistake later is likely to be a bit more costly, but how much more costly? A lack of data prevents reliable analysis; this question also suffers from different projects having different cost-to-fix profiles.

This post addresses the time-to-fix question (cost involves all the resources needed to perform the fix). Does it take longer to correct mistakes when they are detected in phases that come after the one in which they were made?

The data comes from the paper: Composing Effective Software Security Assurance Workflows. The 35,367 (yes, thirty-five thousand) logged fixes, from 39 projects drawn from three organizations, contain information on: phases in which the mistake was made and fixed, time taken, person ID, project ID, date/time, plus other stuff :-)

Every project has its own characteristics that affect time-to-fix. Project 615, avionics software developed by organization A, has the most fixes (7,503) and is analysed here.

Avionics software is safety critical, and each major phase included its own review and inspection. The major phases include: requirements gathering, requirements analysis, high level design, design, coding, and testing. When counting the number of phases between introduction/fix, should review and inspection each count as a phase?

The primary reason for doing a review and inspection is to check the correctness (i.e., lack of mistakes) in the corresponding phase. If there is a time-to-fix penalty for mistakes found in these symbiotic-phases, I suspect it will be different from the time-to-fix penalty between major phases (which for simplicity, I’m assuming is major-phase independent).

The time-to-fix has a resolution of 1-minute, and some fix times are listed as taking a minute; 72% of fixes are recorded as taking less than 10-minutes. What kind of mistakes require less than 10-minutes to fix? Typos and other minutiae.

The plot below shows time-to-fix for mistakes having a given ‘distance’ between introduction/fix phase, for fixes taking at least 1, 5 and 10-minutes (code+data):

Time-to-fix for mistakes having a given number of phases between introduction and fix.

There is a huge variation in time-to-fix, and the regression lines (which have the form: $fixTime \approx e^{\sqrt{phaseSep}}$) explain just 6% of the variance in the data, i.e., there is a small increase with phase separation, but it is almost down in the noise.
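The shape of these regression lines can be fitted with ordinary least squares after a change of variables. With x = √phaseSep and y = ln(fixTime), the model becomes the straight line y = a + b·x. R² then measures the fraction of variance explained, on the log scale. This rests on an assumption about the method. The post’s notation leaves the coefficients implicit, and the linked code may fit the model differently. Treat this sketch as an illustration of the model form, not a reproduction of the analysis. The names fit_line and LineFit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit of y = a + b * x, plus R^2. */
struct LineFit {
    double a, b;
    double r_squared;  /* fraction of the variance in y explained by the line */
};

/* Requires n >= 2 and at least two distinct x values. */
static struct LineFit fit_line(const double *x, const double *y, size_t n) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double dn = (double)n;
    struct LineFit fit;
    fit.b = (dn * sxy - sx * sy) / (dn * sxx - sx * sx);
    fit.a = (sy - fit.b * sx) / dn;

    double mean_y = sy / dn, ss_res = 0, ss_tot = 0;
    for (size_t i = 0; i < n; i++) {
        double resid = y[i] - (fit.a + fit.b * x[i]);
        ss_res += resid * resid;
        ss_tot += (y[i] - mean_y) * (y[i] - mean_y);
    }
    fit.r_squared = 1.0 - ss_res / ss_tot;
    return fit;
}
```

The caller does the transformation, e.g. x[i] = sqrt(phase_sep[i]) and y[i] = log(fix_time[i]) from <math.h>. The fitted model is then fixTime ≈ e^a · e^{b√phaseSep}.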

All but one of the 38 people who worked on the project made multiple fixes (30 made more than 20 fixes), and may have got faster with practice. Adding the number of previous fixes by people making more than 20 fixes to the model gives: $fixTime \approx e^{\sqrt{phaseSep}}/fixNum^{0.03}$, and improves the model by less than 1-percent.

Fixing mistakes is a human activity, and individual performance often has a big impact on fitted models. Adding person ID to the model as a multiplication factor, i.e., $fixTime \approx personID \cdot e^{\sqrt{phaseSep}}/fixNum^{0.03}$, improves the variance explained to 14% (better than a poke in the eye, just). The fitted value of personID varies between 0.66 and 1.4 (factor of two, human variation).

The answer to the time-to-fix question posed earlier (for project 615) is that it does take slightly longer to fix a mistake detected in phases occurring after the one in which the mistake was introduced. The effect of phase difference is tiny, with differences in human performance having a bigger impact.

Patrick Louis (venam)

Computer Architecture Takeaways August 23, 2020 09:00 PM

Alchemy, ancient and modern

Computer architecture can be considered a boring topic, one that is studied during CS education, then put aside to make way for the shiny new toys that capture our attention.
I’ve recently revisited it, and I’d like to summarize some takeaways.

What is It

Computer architecture, like everything in the architecture and design domain, is concerned with building a thing, which here is a computer and all its components, according to requirements often called “-ilities”, such as cost, reliability, efficiency, speed, ease of use, and more.
Thus, it is inherently not limited to hardware, weight, power consumption, and size constraints, but also includes design decisions that fit a use-case within those constraints.

Energy and Cost

Long gone are the days when we only cared about cramming more power in a single machine. We’ve moved to a world of battery-powered portable devices such as laptops and mobile phones. On such devices, we care about energy consumption as it directly affects battery life.

So what are some tips to avoid wasting energy?

  • Do nothing well: As simple as it sounds, it isn’t straightforward. We need to know the right time to deactivate a processor completely, otherwise we’ll pay a high penalty when bringing it back online.
  • Dynamic voltage-frequency scaling (DVFS): This is about changing the frequency of the processor’s clock, letting it consume instructions slower or faster, and lowering the supply voltage along with it when running slower. Lowering the frequency alone doesn’t save energy: the same tasks need the same energy, they just take longer to execute. The savings come from the lower voltage, since dynamic energy scales with the square of the voltage.
  • Overclocking (Turbo mode): This is about boosting the clock rate of a processor so that it executes more instructions per second, and so finishes tasks faster. Notice the trade-off between finishing tasks early while consuming more power, and consuming less power but finishing them later.
  • Design for the typical case: Make the processor more efficient for the case that is the most frequent. Quite intuitive!

Keep in mind that these are all about dynamic power consumption, that is power consumption used by performing some actions, while in the background, there is always a static power consumption to keep the current system alive. Reducing static power consumption has an effect on the whole system.
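The usual first-order model makes the DVFS trade-off concrete. Dynamic energy per switching event is proportional to ½·C·V². Dynamic power is proportional to ½·C·V²·f. Here C is the capacitive load, V the supply voltage, and f the clock frequency. The helpers below return new-to-old ratios, assuming C stays fixed. Their names are hypothetical.

```c
#include <assert.h>

/* Ratio of new to old dynamic energy per operation when the supply voltage
   is scaled by `voltage_scale` (energy ~ 1/2 * C * V^2, C unchanged). */
static double dynamic_energy_ratio(double voltage_scale) {
    return voltage_scale * voltage_scale;
}

/* Ratio of new to old dynamic power when voltage and clock frequency are
   scaled (power ~ 1/2 * C * V^2 * f, C unchanged). */
static double dynamic_power_ratio(double voltage_scale, double frequency_scale) {
    return voltage_scale * voltage_scale * frequency_scale;
}
```

Halving only the frequency halves power but leaves energy per task unchanged. Scaling both voltage and frequency down by 15% leaves about 72% of the energy and about 61% of the power.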

wafer yield

Similar to energy, cost is also important, because computers are now an everyday item. The cost of a microprocessor follows a learning curve: it falls as companies gain experience building it and their yields improve, while on the other hand the price of DRAM closely tracks the cost of manufacturing. When we talk of microprocessors, we are talking about a composition of one or multiple dies, which have their circuits printed by UV lithography (photolithography) on a wafer. And so, the price depends on how many dies (square-shaped) can fit on a wafer (circle-shaped), along with the yield, which is the fraction of dies that are not defective, plus the cost of testing and packaging them.
This is one of the reasons big companies focus on commodity hardware: hardware that’s relatively cheap and replaceable. Only in the case of specialized or scientific computing will we ever see costly, hard-to-replace, and custom pieces of hardware.
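A common textbook approximation turns the wafer geometry into numbers. For wafer diameter d and die area A, dies per wafer ≈ π·(d/2)²/A − π·d/√(2A). Cost per good die = wafer cost / (dies per wafer × die yield). The sketch assumes square dies of side s, so √(2A) = √2·s, which keeps it free of libm. The function names are hypothetical, and the model leaves out testing and packaging costs.

```c
#include <assert.h>

static const double kPi = 3.14159265358979323846;
static const double kSqrt2 = 1.41421356237309504880;

/* Approximate number of whole square dies on a round wafer: wafer area over
   die area, minus an estimate of the partial dies lost around the edge.
   Both lengths are in centimetres. */
static double dies_per_wafer(double wafer_diameter_cm, double die_side_cm) {
    double radius = wafer_diameter_cm / 2.0;
    double die_area = die_side_cm * die_side_cm;
    return kPi * radius * radius / die_area
         - kPi * wafer_diameter_cm / (kSqrt2 * die_side_cm);
}

/* Cost of one working die: the wafer's cost spread over the good dies.
   die_yield is the fraction (0..1] of dies that are not defective. */
static double cost_per_good_die(double wafer_cost, double dies, double die_yield) {
    return wafer_cost / (dies * die_yield);
}
```

By this estimate, a 30 cm wafer holds about 269.7 dies of 1.5 cm × 1.5 cm. Each halving of the yield doubles the cost of every good die.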

Dependability is also a big factor that comes into play: how long will the piece of hardware last? This is especially important in large warehouse-scale data centers. If we’re going to replace a cheap piece way more frequently than another one that is just a little bit more expensive, we may be better off choosing the second one. Technically, we talk of Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR).
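MTTF and MTTR combine into the usual steady-state availability, MTTF / (MTTF + MTTR). The mean time between failures is MTTF + MTTR. Below is a minimal sketch with hypothetical names.

```c
#include <assert.h>

/* Fraction of time a module is working, given its mean time to failure and
   mean time to repair (same units, e.g. hours). */
static double availability(double mttf, double mttr) {
    return mttf / (mttf + mttr);
}

/* Mean time between failures: one working period plus one repair. */
static double mtbf(double mttf, double mttr) {
    return mttf + mttr;
}
```

For example, a part that runs 999 hours on average and takes 1 hour to repair is available 99.9% of the time.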

Measurements and Bottlenecks

You can’t make claims without the numbers to prove them; that’s why measurement is everything. We’re talking about benchmarks for computer architecture, similar to Phoronix-style benchmarks.
There are many ways to measure, report, and summarize performance, and different scenarios in which they apply. These aren’t limited to pure number crunching: benchmarks can go as far as simulating kernels, desktop behaviors, and web servers. Three of the most important ones are EEMBC, a set of kernels used to predict the performance of embedded applications, the TPC benchmarks, which target transaction-processing workloads such as databases and servers, and the SPEC family of benchmarks, which touch a bit of everything.
Many of the results of these benchmarks are proprietary and/or costly.

Measuring is important because we want to make sure our most common case is fast. That’s a fact that comes out of something called Amdahl’s Law. Simply put, if you optimize a portion of code that accounts for only 1% of the execution time, the overall speedup can never exceed about 1%, no matter how fast that portion becomes.
Additionally, we should also check if we can parallelize computation in the common case, and if we can apply the principle of locality to it, that is, reuse data and instructions so that they stay close temporally and spatially.
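Amdahl’s Law can be written as a formula: overall speedup = 1 / ((1 − f) + f / s). Here f is the fraction of execution time that benefits from the improvement, and s is the speedup of that fraction. Below is a minimal sketch with a hypothetical function name.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction `enhanced_fraction` of the
   original execution time is sped up by a factor of `enhanced_speedup`. */
static double amdahl_speedup(double enhanced_fraction, double enhanced_speedup) {
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhanced_speedup);
}
```

With f = 0.01, even an essentially infinite s gives 1 / 0.99 ≈ 1.0101, the roughly 1% ceiling. Making half the program twice as fast yields only 1 / 0.75 ≈ 1.33.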

ISA — Instruction Set Architecture

As much as people freak out about assembly being low-level, it is still a language for humans (not machines) that requires another piece of software, called an assembler, to be converted to actual machine code instructions. However, it is most often tied to the type of instructions a machine can process — what we call the instruction set architecture, or ISA for short. That is one reason why we have many assembly flavors: because we have many ISAs.

To understand the assembly flavor you are writing in, it’s important to know the differences and features of the ISA, and if additional proprietary options are provided by the manufacturer. These can include some of the following.

  • The classification of ISA: Today, most ISAs are general-purpose register architectures, which means operands can be either registers or memory locations. There are two subclasses of this: register-memory ISAs, where memory can be accessed directly as part of instructions, like 80x86, and load-store ISAs, where memory can only be accessed through load and store operations, like ARMv8 and RISC-V. Note that all new ISAs announced after 1985 are load-store.

  • The way the memory is addressed: Today, all ISAs point at memory operands using byte addressing, that means we can access values in memory by byte, in contrast with some previous ISAs where we had to fetch them by word (word-addressable). Additionally, some architectures require that objects be aligned in memory, or encourage users to align them for efficiency reasons. In ARMv8 they must be aligned, and in 80x86 and RISC-V it isn’t required but encouraged.

  • The modes in which memory can be addressed: We know that memory is addressed in bytes, but there are many ways to specify which bytes to get. We could take the operand directly from a register (register), use a constant encoded in the instruction itself (immediate), or use the value stored at the address formed by the sum of a register plus a constant (displacement). These 3 modes are available in RISC-V. 80x86 adds other modes such as: no register (absolute), two registers plus a displacement (one register as the base, the other as the index), and two registers where one register is multiplied by the size of the operand in bytes (scaled index), and more. ARMv8 has the 3 RISC-V modes plus PC-relative addressing, the sum of two registers, and the sum of two registers where one register is multiplied by the size of the operand in bytes. Yes, there are so many ways to give an address in memory!

  • Types and size of operands/data/registers: An ISA can support one or multiple types of operands ranging from: 4-bit (nibble), 8-bit (byte), 16-bit (half word), 32-bit (integer or word), 64-bit (double word or long integer), and in IEEE 754 floating point we can have 32-bit (single precision), 64-bit (double precision), etc. 80x86 even supports 80-bit floating-point (extended double precision).

  • The type of operations available: What can we do with the data? Can we do data transfer, arithmetic and logical operations, control flow, floating-point operations, vector operations, etc.? 80x86 has a very large set of operations that can be done. Let’s also note that some assembly flavors include the type of operands within the operations, and thus new instructions need to be added for new types (see under CISC).

  • The way control flow instructions work: All ISAs today include at least the following: conditional branches, unconditional jumps, and procedure calls and returns. Normally, the addresses used with those are PC-relative. On RISC-V, the branch condition is checked based on the contents of registers, while on 80x86, the test condition code bits are set as side effects of previous arithmetic/logic operations. As for the return address, on ARMv8 and RISC-V it is placed in a register, while on 80x86 it is placed on the stack in memory. This is one of the reasons stack buffer overflows on 80x86 are so dangerous.

  • Encoding the ISA instructions: Finally, we have to convert things to machine code. There are two choices in encoding: fixed length or variable length. ARMv8 and RISC-V instructions are fixed at 32 bits long, which simplifies instruction decoding. 80x86, on the other hand, has variable-length instructions ranging from 1 to 15 bytes. The advantage is that the machine code takes less space, and so the program is usually smaller. Keep in mind that all the previous choices affect how the instructions are encoded into a binary representation. For example, the number of registers and the number of addressing modes need to be represented somehow.
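Fixed-length encoding makes decoding simple. The sketch below pulls the fields out of a RISC-V R-type (register-register) instruction. Every instruction of this format keeps each field at the same bit position, so decoding is a handful of shifts and masks. It assumes the base 32-bit encoding; the optional compressed extension adds 16-bit forms. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a RISC-V R-type instruction, each at a fixed bit position. */
struct RType {
    unsigned opcode, rd, funct3, rs1, rs2, funct7;
};

static struct RType decode_rtype(uint32_t insn) {
    struct RType r;
    r.opcode = insn & 0x7F;          /* bits 0-6 */
    r.rd     = (insn >> 7) & 0x1F;   /* bits 7-11 */
    r.funct3 = (insn >> 12) & 0x7;   /* bits 12-14 */
    r.rs1    = (insn >> 15) & 0x1F;  /* bits 15-19 */
    r.rs2    = (insn >> 20) & 0x1F;  /* bits 20-24 */
    r.funct7 = insn >> 25;           /* bits 25-31 */
    return r;
}
```

For example, 0x003100B3 (add x1, x2, x3) decodes to opcode 0x33, rd = 1, rs1 = 2, rs2 = 3, with funct3 and funct7 both 0. A variable-length decoder would first have to work out where each instruction ends.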

Moreover, ISAs are broadly put into two categories, CISC and RISC: the Complex Instruction Set Computer and the Reduced Instruction Set Computer.
RISC computers have a small, highly optimized set of instructions, numerous registers, and a highly regular instruction pipeline. Usually, RISCs are load-store architectures with fixed-size instruction encoding, which helps keep the clock cycles per instruction (CPI) constant.
CISC computers have a very large set of instructions, including instructions that execute several lower-level operations, have side effects, or access memory as part of a single instruction that does the work of several. The category encompasses anything that isn’t RISC or that isn’t a load-store architecture.

It has to be said that different manufacturers can implement the same ISA differently. Their implementations consume the same instruction encoding, but the actual hardware is different. ISAs are like the concept of interfaces in OOP.

Memory Hierarchy and Tech

memory hierarchy

Many authors have written a great deal about the memory hierarchy. Basically, it’s all about creating layers of indirection, adding caches to speed things up at each layer, and keeping what we’re going to use close temporally and physically, while taking into consideration how the layers are going to be used.
For example, the associativity of an L2 instruction cache might not be as effective when applied to an L2 data cache. When in doubt, refer to the benchmark measurements.

There are four big questions that we should ask at each layer:

  • Where can a block be placed
  • How is a block found if it is there
  • Which block should be replaced on a miss
  • What happens on a write

With these, there’s also an interplay with both the lower and upper layers. For instance, issues like the size and format of cached lines when they have to be moved up or down between the caching layers.
Additionally, synchronization and coherence mechanisms between multiple caches might be important.
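For a set-associative cache, the first two questions have a mechanical answer. The address splits into three parts: a block offset, a set index, and a tag. The set index says where the block may be placed. The tag is compared against every way in that set to find it. Below is a sketch with a hypothetical geometry type; all sizes must be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a hypothetical set-associative cache (powers of two only). */
struct CacheGeometry {
    uint64_t size_bytes, line_bytes, ways;
};

struct CacheAddress {
    uint64_t tag, set_index, offset;
};

static unsigned log2_u64(uint64_t x) {  /* x must be a power of two */
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* set_index answers "where can the block be placed?"; tag answers
   "how is it found?" (compared against each way's stored tag). */
static struct CacheAddress split_address(struct CacheGeometry g, uint64_t addr) {
    uint64_t sets = g.size_bytes / (g.line_bytes * g.ways);
    unsigned offset_bits = log2_u64(g.line_bytes);
    unsigned index_bits = log2_u64(sets);
    struct CacheAddress a;
    a.offset = addr & (g.line_bytes - 1);
    a.set_index = (addr >> offset_bits) & (sets - 1);
    a.tag = addr >> (offset_bits + index_bits);
    return a;
}
```

Take a 32 KiB, 8-way cache with 64-byte lines, which has 64 sets. Address 0x12345678 maps to set 25, with tag 0x12345 and offset 0x38.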

NAND memory

When it comes to data storage, the hardware speed, its ability to retain information, its power consumption, its size, and other criteria, are what matters. Let’s review some common technologies in use today.

Static RAM, or SRAM, has low latency and requires little power to retain bits; however, at least 6 transistors are required for every bit. It’s normally used in processor caches and has a small storage capacity.
Dynamic RAM, or DRAM, is slower but requires only one transistor per bit. However, it must be refreshed periodically (every ~8ms) and re-written after being read. DRAM is organized in rows and columns: the upper half of the address selects the row and the lower half the column, sent with the row access strobe (RAS) and the column access strobe (CAS) respectively. It is normally used as the main memory.
Because DRAM is cheaper to manufacture, it has received a lot of improvements to work around its limitations. For example, some of the optimizations are related to bandwidth. Namely, double data rate (DDR) transfers data on both the rising and falling edges of the clock signal, and multiple banks allow accessing data in different places at the same time. Multiple banks are key to SIMD (see under Single Instruction Multiple Data Stream).
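Double data rate translates directly into peak bandwidth: two transfers per clock, each as wide as the bus. For example, DDR4-3200 runs its I/O clock at 1600 MHz, giving 3200 million transfers per second. On a 64-bit channel that peaks at 25,600 MB/s, hence the module name PC4-25600. Below is a minimal sketch with a hypothetical function name.

```c
#include <assert.h>

/* Peak bandwidth of one DDR channel in MB/s: two transfers per clock cycle,
   each moving bus_width_bits / 8 bytes. */
static double ddr_peak_mb_per_s(double clock_mhz, unsigned bus_width_bits) {
    double mega_transfers_per_s = clock_mhz * 2.0;
    return mega_transfers_per_s * (bus_width_bits / 8.0);
}
```

This is a peak figure. Refresh, row activations, and bank conflicts all keep sustained bandwidth below it.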

address form

Flash memory, a type of EEPROM, be it NAND- or NOR-based, is becoming popular as a replacement for hard disk drives because of its non-volatility and low power. It can act as a cache in between the disk and the main memory. It has its own limitation: it must be erased in blocks before it can be rewritten.
Some of the latest hype is about Phase Change Memory (PCM or PRAM), which is a type of nonvolatile memory that is meant as a replacement for flash memory but that is more energy efficient.

However, we can’t rely on the physical medium alone to bring all the improvements. Some techniques have to be put in place to make the most of what we have.

address form


Efficiency doesn’t matter if the medium fails. All this beautiful hardware can fall victim to two types of errors: soft errors, which can be corrected by error-correcting codes (ECC), and hard errors, which leave a section of storage permanently defective and require either redundancy or replacement to avoid data loss. Beware of cosmic rays!
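To illustrate how an ECC can fix a soft error, here is a sketch of the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This is a teaching example only; memory ECC in practice typically uses wider single-error-correcting, double-error-detecting codes over whole words.

```python
def hamming74_encode(data: list) -> list:
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4] (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list) -> list:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = [0] + list(code)  # shift to 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # position of the bad bit, or 0 if none
    if syndrome:
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a soft error (e.g. a particle strike) on one bit
print(hamming74_decode(word))  # the original [1, 0, 1, 1]
```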

Effectiveness can also be found in the layout and architecture used for caching. Here are some interesting questions that can be answered by benchmarking your use case. Remember that everything is about trade-offs.

  • Should we use a larger block size to reduce compulsory misses, those caused by the block not already being in the cache? That can also increase conflict misses (two blocks that collide because their addresses map to the same place in the cache) and the miss penalty (the time spent fetching a block that isn’t already there).
  • Should we enlarge the whole cache to reduce the miss rate? That would increase the hit time (the time to find the cache line) and power consumption.
  • Should we add more cache levels? That would reduce the overall memory access time but increase power consumption and complexity, especially in coordinating between caches.
  • Should we give priority to read misses over writes to reduce the miss penalty, and does that fit our case?
  • Should we use multiple independent cache banks to support simultaneous accesses, or does that add too much complexity?
  • Should we make the cache non-blocking, allowing hits to be served before previous misses complete (hit under miss)?
  • Should we merge write-buffer entries that target the same block to avoid unnecessary traffic, or would that slow writes down?
  • Should we rely on compiler optimizations that take cache locality into account, such as loop interchange (reordering nested loops so that memory is accessed sequentially) and blocking (subdividing the loops to work on small sub-matrices that fit in the cache)? Or would that confuse the cache and actually have the opposite effect?
  • Should we use prefetching instructions, if the ISA provides them, to fill the cache manually? Or would that let the compiler mishandle the cache and fill it with unnecessary data, throwing away the locality of other programs?

These are all questions that can’t be answered without actual benchmarks.
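Python timings won't reveal cache effects, but a tiny simulator can. This sketch counts misses in a hypothetical direct-mapped cache for the two loop orders behind loop interchange; the matrix and cache sizes are arbitrary assumptions chosen to make the effect visible.

```python
def count_misses(addresses, block_size: int, num_lines: int) -> int:
    """Count misses in a direct-mapped cache holding one block per line."""
    lines = [None] * num_lines  # tag currently stored in each line
    misses = 0
    for address in addresses:
        block = address // block_size
        index = block % num_lines
        tag = block // num_lines
        if lines[index] != tag:
            lines[index] = tag  # miss: fetch the block, evicting whatever was there
            misses += 1
    return misses


def matrix_walk(rows: int, cols: int, elem_size: int, row_by_row: bool):
    """Yield byte addresses touched when walking a row-major rows x cols matrix."""
    if row_by_row:
        for i in range(rows):      # inner loop over columns: consecutive addresses
            for j in range(cols):
                yield (i * cols + j) * elem_size
    else:
        for j in range(cols):      # inner loop over rows: jumps a whole row each time
            for i in range(rows):
                yield (i * cols + j) * elem_size
```

With a 64x64 matrix of 8-byte elements and a 4 KiB cache of 64 lines of 64 bytes, the row-by-row walk misses once per block (512 misses), while the column-by-column walk misses on every one of the 4096 accesses, because consecutive accesses keep evicting each other's blocks.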

virtual memory

For processes, there’s never enough space in main memory. In multiprogramming systems, the OS virtualizes access to memory for simplicity, space, and security reasons. Each process thinks it’s running alone on the system.
The same four questions about memory placement apply. In particular, the OS is the one that decides what to do with each piece, or page, of virtual memory. A page is the unit of manipulation and is often the same size as a disk sector.
This means that there’s another layer of addresses, virtual ones, that need to be translated to physical ones, and that the OS needs to know where those pages currently are: either on disk or in main memory.
For faster translation, we can use a cache sitting near the core that holds recently translated addresses: the so-called translation buffer, or translation lookaside buffer (TLB).
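A minimal sketch of translation through a TLB, assuming 4 KiB pages, a flat dictionary as the page table, and FIFO eviction in the TLB. Real MMUs walk multi-level page tables in hardware, and the `MMU` class here is hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class MMU:
    """Toy address translation: a small TLB in front of a page table."""

    def __init__(self, page_table: dict, tlb_entries: int = 4) -> None:
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # recently used translations, in insertion order
        self.tlb_entries = tlb_entries
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, virtual_address: int) -> int:
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1  # fast path: no page-table access needed
            frame = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            if vpn not in self.page_table:
                raise LookupError(f"page fault: virtual page {vpn} is not in memory")
            frame = self.page_table[vpn]  # slow path: consult the page table
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.pop(next(iter(self.tlb)))  # evict the oldest entry (FIFO)
            self.tlb[vpn] = frame
        return frame * PAGE_SIZE + offset
```

In this model a page fault is just an exception; a real OS would load the page from disk and retry the access.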

hypothetical memory hierarchy with TLB

Additionally, if the virtual address can be composed in a way that doesn’t require going back to memory to fetch the data, but can be used to point to the data directly, in another cache for example, then the translation can be less burdensome.

Virtual memory can also be extended with protection: each translation entry can carry extra bits representing permissions or access rights. In practice, there are at least two modes, user mode and supervisor mode. The operating system relies on them to switch between kernel mode and user mode, limiting access to certain memory addresses and other sensitive features, a dance between hardware and software.
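The permission bits can be sketched as flags checked on every access. The flag names and the rule that supervisor mode may touch any page are simplifying assumptions, not a specific architecture's page-table format.

```python
from enum import IntFlag


class Perm(IntFlag):
    READ = 1
    WRITE = 2
    EXECUTE = 4
    USER = 8  # page may be accessed from user mode


def access_allowed(page_perms: Perm, requested: Perm, user_mode: bool) -> bool:
    """Decide whether an access is allowed by a page's permission bits."""
    if user_mode and not (page_perms & Perm.USER):
        return False  # supervisor-only page: user code gets a protection fault
    return (page_perms & requested) == requested


kernel_data = Perm.READ | Perm.WRITE
user_code = Perm.READ | Perm.EXECUTE | Perm.USER
```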

Virtual memory is the technology that makes virtual machines a reality. A program called a hypervisor, or virtual machine monitor, is responsible for managing virtual memory in such a way that different operating systems can run simultaneously on the same machine without polluting one another. It does so by adding a layer of memory called “real memory” between physical and virtual memory. Optimizations such as not flushing TLB entries every time we switch between modes, letting guest operating systems handle device interrupts, and more are needed to make this tolerable.


Parallelism allows things to happen at the same time. There are two classes of parallelism: Data-Level Parallelism (DLP) and Task-Level Parallelism (TLP). The first is the ability to execute a single operation on multiple pieces of data at the same time, the second the ability to perform multiple different operations at the same time.
Concretely, according to Flynn’s taxonomy, we talk of Single-Instruction stream Single-Data stream (SISD), Single-Instruction stream Multiple-Data stream (SIMD), Multiple-Instruction stream Single-Data stream (MISD), and Multiple-Instruction stream Multiple-Data stream (MIMD).

None of these would be possible without the help of something called pipelining. Pipelining allows instructions to overlap in execution by splitting each one into smaller steps that can run independently and that together form a full instruction. It works like a car assembly line for instructions: we keep fetching a new instruction and push every instruction in flight one step further down the line at each cycle.
A typical breakdown of an instruction goes as follows:

  • Instruction fetch
  • Instruction decode and register fetch
  • Execution or effective address cycle
  • Memory access
  • Write-back

Pipeline example

That means with 5 stages we could ideally have 5 instructions in flight at once and complete one instruction per clock cycle. However, the world isn’t perfect, and several major issues, called hazards, get in the way: hardware limitations, such as a limited number of units able to perform a given step (structural hazards); instructions whose data operands depend on one another (data hazards); and branches in the code, where a condition decides which instructions should execute (control hazards). Another thing to keep in mind is that some instructions take more than one cycle to finish executing, which can delay the pipeline.
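The ideal case can be checked with a little arithmetic. Assuming every stage takes one cycle, a k-stage pipeline needs k cycles to fill and then completes one instruction per cycle, so n instructions take k + (n - 1) cycles plus any stall cycles caused by hazards. The unpipelined comparison assumes each instruction takes all k cycles on its own.

```python
def pipelined_cycles(n_instructions: int, stages: int, stall_cycles: int = 0) -> int:
    """Cycles for n instructions on a k-stage pipeline, one cycle per stage."""
    # The first instruction takes `stages` cycles to fill the pipeline; after that,
    # one instruction completes per cycle unless a hazard forces a stall.
    return stages + (n_instructions - 1) + stall_cycles


def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Each instruction runs all of its steps before the next one starts."""
    return n_instructions * stages
```

For 100 instructions on a 5-stage pipeline that gives 104 cycles instead of 500, a speedup just under 5; adding 20 stall cycles brings it to 124.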

Pipeline multiple FP OPs

The data hazards category can be split into 3 sub-categories:

  • Read after write (RAW): An instruction must read data only after a previous instruction has written it.
  • Write after read (WAR): An instruction must write data only after a previous instruction has finished reading it.
  • Write after write (WAW): An instruction must write data only after a previous instruction has written it, so that the final value is the right one.
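A small sketch that classifies the three hazards between two instructions, given which register each one writes and reads. The `Instr` type is hypothetical. Note that WAR and WAW are name dependences rather than true data flow, which is why register renaming can remove them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]  # register written (None if nothing is written)
    sources: tuple       # registers read


def hazards(earlier: Instr, later: Instr) -> set:
    """Classify data hazards between two instructions in program order."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.sources:
        found.add("RAW")  # later reads what earlier writes
    if later.dest is not None and later.dest in earlier.sources:
        found.add("WAR")  # later overwrites what earlier still has to read
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # both write the same register; their order must be kept
    return found


add = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
sub = Instr("r4", ("r1", "r5"))  # r4 = r1 - r5
print(hazards(add, sub))  # RAW on r1
```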

Many ingenious techniques have been created to avoid these issues: stalling, data forwarding, checking whether a data dependence is actually real, renaming registers to virtual registers, loop unrolling, and more.

Virtual registers (register renaming) are a technique used in dynamically scheduled pipelines, aka dynamic scheduling. With in-order execution, an instruction has to wait for a long-running earlier instruction to finish, whether or not it depends on it; dynamic scheduling instead executes instructions out of order and resolves the dependencies internally. Popular algorithms are scoreboarding, Tomasulo’s algorithm, and the reorder buffer (ROB).
These algorithms achieve out-of-order execution by relying on additional hardware structures that hold intermediate values and write them back only once the result is certain to be needed.
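Register renaming can be sketched by giving every write a fresh physical register and pointing later reads at the latest one. This is a simplified model (unbounded physical registers, none ever freed), not Tomasulo's algorithm itself, and the register names are hypothetical.

```python
from itertools import count


def rename(program):
    """Give each write a fresh physical register ('p0', 'p1', ...).

    `program` is a list of (dest, sources) tuples using architectural names.
    """
    fresh = count()
    mapping = {}  # architectural register -> physical register holding its latest value
    renamed = []
    for dest, sources in program:
        new_sources = tuple(mapping.get(r, r) for r in sources)  # read the latest value
        new_dest = f"p{next(fresh)}"
        mapping[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed


program = [
    ("f0", ("f2", "f4")),    # f0 = f2 / f4
    ("f6", ("f0", "f8")),    # f6 = f0 + f8   (RAW on f0)
    ("f8", ("f10", "f14")),  # f8 = f10 - f14 (WAR on f8 with the previous line)
    ("f6", ("f10", "f8")),   # f6 = f10 * f8  (WAW on f6)
]
print(rename(program))
```

After renaming, the WAR hazard on `f8` and the WAW hazard on `f6` disappear, while the true RAW dependences are preserved.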


The hazards that affect performance the most are control hazards. Because of their conditional aspect (is the branch taken or not?), we either have to freeze the pipeline until we know the outcome, or choose to continue loading instructions from one of the paths and flush them if we guessed wrong.
What we can do to reduce the cost of branches is to predict, or speculate, through hardware or software/compiler. The compiler can do Profile-Guided Optimization (PGO): run the software, gather information about which branches are taken most frequently, and then encode that, one way or another, in the final binary. As far as hardware goes, past behavior is the best indicator of future behavior, so processors keep their own prediction state, sometimes alongside the instruction cache, based on previous outcomes. Some well-known algorithms: 2-bit prediction, tournament predictors, tagged hybrid predictors, etc.
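The 2-bit scheme can be sketched as a table of saturating counters indexed by the low bits of the branch address; a counter has to be wrong twice in a row before its prediction flips. The table size and initial state are assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the branch address."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # 0 and 1 predict not taken, 2 and 3 predict taken; start at weakly not taken.
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 2  # a loop branch: taken 9 times, then falls through
mispredicts = 0
for taken in outcomes:
    if predictor.predict(0x400) != taken:
        mispredicts += 1
    predictor.update(0x400, taken)
```

On this loop branch the predictor mispredicts 3 times: once while warming up and once at each loop exit, but not again on re-entering the loop.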

In all these cases, we need to pay close attention to what is executed: we shouldn’t execute instructions from a branch that wasn’t supposed to be taken, notably when that would affect how the program behaves. However, with speculative execution and dynamic scheduling, we allow executing future instructions from either path as long as their effects aren’t made visible before the branch is resolved. We then face a problem with exceptions raised by instructions that weren’t supposed to be executed: how do we handle them? Whether we terminate or resume execution depends on the type of exception. But handling exceptions can be slow, so some architectures provide two modes, one with precise exceptions and a faster one without.

Another way to speed up instructions is to issue more of them at the same time. Some techniques in this category put more emphasis on hardware, like statically and dynamically scheduled superscalar processors or increased fetch bandwidth, while others rely on the software/compiler, such as very long instruction word (VLIW) processors, which package multiple instructions into one big chunk that is fetched at once.


Sometimes, whole chunks of code are independent, so it is advantageous to run them in parallel. We do that with multithreading, or thread-level parallelism, a form of MIMD because threads execute different instructions, possibly on data shared between them. In a multiprocessor environment, we can assign n threads to each processor.
There are 3 categories of multithreading scheduling:

  • Fine grained multithreading: When we switch between threads at each clock cycle.
  • Coarse grained multithreading: When we switch between threads only on costly stalls.
  • Simultaneous multithreading (SMT): Fine grained multithreading on a multiple-issue processor, so that instructions from several threads can be issued in the same clock cycle.

There are two ways to lay out the processors in an architecture, and the choice directly affects multithreading, especially when it comes to sharing data.

  • Symmetric multiprocessors (SMP): In this case we have a single shared memory, the processors are equidistant, and so we have uniform memory latency.
  • Distributed shared memory (DSM): In this case the processors are separated by other types of hardware and the memory is distributed among them, so a processor is not at the same distance from every piece of memory, which gives non-uniform memory access (NUMA).

In thread-level parallelism the bottleneck lies in how the data is synchronized between the different threads that may live in different processors — it is a problem of cache coherence and consistency.
The two main families of protocols to solve this are directory-based protocols, which keep the sharing status of each block in one location (usually the lowest-level shared cache, such as L3), and snooping protocols, in which each cache tracks the status of its own blocks and notifies the others when a block changes.
Along with these, we need ways to handle conflicts between threads programmatically, so processors offer synchronization primitives such as atomic exchange, test-and-set, and fetch-and-increment, on which locks are built.
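A toy model of a snooping protocol, using the three basic MSI states (modified, shared, invalid) for a single block. It is a sketch under simplifying assumptions (one block, atomic bus transactions), not any real processor's protocol; real designs typically add states such as Exclusive (MESI) or Owned (MOESI).

```python
class SnoopingBus:
    """Toy MSI snooping protocol for one memory block shared by several caches.

    Each cache's copy is 'M' (modified: the only valid, dirty copy),
    'S' (shared: clean, read-only) or 'I' (invalid).
    """

    def __init__(self, num_caches: int) -> None:
        self.state = ["I"] * num_caches
        self.writebacks = 0  # times a dirty copy had to be written back

    def read(self, cpu: int) -> None:
        if self.state[cpu] == "I":
            # Read miss goes on the bus; a cache holding the block in M
            # writes it back and downgrades to S.
            for other, st in enumerate(self.state):
                if other != cpu and st == "M":
                    self.state[other] = "S"
                    self.writebacks += 1
            self.state[cpu] = "S"
        # In M or S, the read hits locally without bus traffic.

    def write(self, cpu: int) -> None:
        if self.state[cpu] != "M":
            # Write miss or upgrade: broadcast an invalidate to every other cache.
            for other, st in enumerate(self.state):
                if other != cpu:
                    if st == "M":
                        self.writebacks += 1
                    self.state[other] = "I"
            self.state[cpu] = "M"
```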

Let’s now move our attention to another type of parallelism: SIMD, data level parallelism.

SIMD adds a boost to any program that relies heavily on doing small similar operations on big matrices of data, such as in multimedia or scientific applications.
There are 3 main implementations of SIMD:

  • Vector architectures: They have generic vector registers, like arrays that can contain arbitrary data and where we can specify the size of operands in the vector before executing an instruction.
  • SIMD extensions (Intel MMX, SSE, AVX): Additional instructions added as an afterthought to handle SIMD data. They have a fixed size for each operand, the number of data operands is encoded in the opcode, there are no sophisticated addressing modes such as strided or scatter-gather, and mask registers are not usually present.
  • GPU: A specialized proprietary unit that receives custom instructions from the CPU to perform a single operation on data-heavy input.

All of these provide instructions to the compiler that act more or less like this: load an array into a vector register, specifying only the start address and size; operate on that vector register; and finally store the result back into memory.
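The load, operate, store pattern can be sketched in plain Python with strip-mining: the loop is processed in chunks no longer than an assumed maximum vector length (`MVL`, hypothetical here), each chunk standing in for one vector load, one vector multiply-add, and one vector store. The example computes a*x + y (DAXPY).

```python
MVL = 64  # hypothetical maximum vector length, in elements


def daxpy_strip_mined(a: float, x: list, y: list) -> list:
    """Compute a*x + y in chunks of at most MVL elements."""
    result = list(y)
    start = 0
    while start < len(x):
        vl = min(MVL, len(x) - start)  # value for the vector-length register
        vx = x[start:start + vl]       # vector load of x
        vy = result[start:start + vl]  # vector load of y
        # One vector multiply-add, then one vector store back to memory.
        result[start:start + vl] = [a * xi + yi for xi, yi in zip(vx, vy)]
        start += vl
    return result
```

For 150 elements this runs three chunks of 64, 64 and 22 elements; only the last chunk uses a shorter vector length.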
Some architectures provide ways to apply instructions conditionally to the data in a vector register by specifying a bit mask. But beware of loading a whole dataset when the mask selects only a very small portion of it. Vector instructions, like all instructions, have a start-up time, a latency that depends on the length of the operands, structural hazards, and data dependencies.
Another feature that some provide is scatter-gather, for when array indices are stored as values in another array.
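Masks and scatter-gather can be sketched the same way, with Python lists standing in for vector registers. The helper names are hypothetical; the usage computes a sparse update of the form A[K[i]] = A[K[i]] + C[M[i]].

```python
def masked_add(dst: list, src: list, mask: list) -> list:
    """dst[i] + src[i] only where mask[i] is set; other lanes keep dst[i]."""
    return [d + s if m else d for d, s, m in zip(dst, src, mask)]


def gather(base: list, indices: list) -> list:
    """Vector gather: lane i receives base[indices[i]]."""
    return [base[i] for i in indices]


def scatter(base: list, indices: list, values: list) -> None:
    """Vector scatter: store values[i] into base[indices[i]]."""
    for i, v in zip(indices, values):
        base[i] = v


A = [10.0, 20.0, 30.0, 40.0]
K = [3, 0]  # indices into A
C = [1.0, 2.0, 3.0]
M = [2, 1]  # indices into C
va = gather(A, K)  # [40.0, 10.0]
vc = gather(C, M)  # [3.0, 2.0]
scatter(A, K, [a + c for a, c in zip(va, vc)])
# A is now [12.0, 20.0, 30.0, 43.0]
```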

Memory banks are a must for SIMD because we operate intensively on data that could be in multiple places. This is why we need support for high bandwidth for vector loads and stores, and to spread accesses across multiple banks.

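We measure SIMD performance using the roofline performance model. It shows graphically when a program reaches the peak performance of the hardware: attainable performance is capped by memory bandwidth at low arithmetic intensity and by peak compute at high arithmetic intensity. The further left the ridge point where the two roofs meet, the easier it is for programs to get the most out of the hardware.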
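The roofline itself is a one-line formula: attainable performance is the minimum of peak compute and peak memory bandwidth times arithmetic intensity (FLOPs per byte). The machine numbers below are hypothetical.

```python
def attainable_gflops(arithmetic_intensity: float, peak_gflops: float,
                      peak_bandwidth_gb_s: float) -> float:
    """Roofline: performance is capped by either compute or memory bandwidth."""
    return min(peak_gflops, peak_bandwidth_gb_s * arithmetic_intensity)


def ridge_point(peak_gflops: float, peak_bandwidth_gb_s: float) -> float:
    """Arithmetic intensity (FLOPs/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gb_s
```

On a machine with 100 GFLOP/s and 25 GB/s, the ridge point sits at 4 FLOPs/byte: a kernel at 0.5 FLOPs/byte is memory-bound at 12.5 GFLOP/s, while one at 8 FLOPs/byte reaches the compute roof of 100 GFLOP/s.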


GPUs are beasts of their own, specialized in only SIMD instructions. They’re part of a heterogeneous execution model, because we have the CPU as host and the GPU as an external device that is requested to execute instructions.
Interacting with a GPU differs between vendors. Some standards exist, such as OpenCL, but they are not widely used. Normally, it’s done through a C-like programming language designed to make SIMD easy to express. We sometimes call GPUs Single-Instruction Multiple-Thread (SIMT) because they are internally composed of hundreds, and sometimes thousands, of threads, often called lanes in GPU parlance.

GPUs are insanely fast because they have a simple architecture that doesn’t bother with data dependences or the other hassles that normal processors have to deal with.


To make SIMD productive, we have to find the portions of code that can be executed with these instructions. Compilers still struggle to optimize for data-level parallelism: loop-level analysis can try to determine whether array indices can be represented as affine functions of the loop index, and then deduce more information from this, such as whether iterations depend on each other. This is why multimedia libraries often rely on assembly written manually by developers.
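One classic way to use affine indices is the GCD test: if a loop stores to X[a*i + b] and loads from X[c*i + d], a loop-carried dependence can only exist when gcd(a, c) divides d - b. A minimal sketch; the test is conservative because it ignores the loop bounds, so True only means "maybe".

```python
from math import gcd


def gcd_test_may_depend(a: int, b: int, c: int, d: int) -> bool:
    """GCD test for a store to X[a*i + b] and a load from X[c*i + d] in one loop.

    False means a dependence is impossible; True means one may exist.
    """
    g = gcd(a, c)
    if g == 0:  # both indices are constants: they collide only if equal
        return b == d
    return (d - b) % g == 0
```

For `X[2*i + 3] = X[2*i] * 5.0`, gcd(2, 2) = 2 does not divide 0 - 3, so the iterations are independent and the loop can be vectorized.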

Big Warehouses

When architecture is applied at big scales, at warehouse scale, we have to think a bit differently. Today, the world lives in the cloud, from internet providers, to data centers, and governments, all have big warehouses full of computers.
Let’s mention some things that could be surprising about warehouse scale computing.

  • At this scale, cost-performance matters a lot.
  • At this scale, energy efficiency is a must, as it translates into power consumption and monthly bills.
  • At this scale, dependability via redundancy is a must: we have so many machines that at least one component is bound to fail every day. The hardware should also be easily replaceable.
  • At this scale, high network I/O is a must. Gigantic amounts of money are spent on switches and load balancers.
  • At this scale, we have to think about the cost of investment, the CAPEX (capital expenditures) and OPEX (operational expenditures), the loan repayment for the construction of the datacenter, and the return on investment (ROI).
  • At this scale, we have to think wisely about the location we choose for the data center, be it because of cooling issues, distance to the power grid, the cost of acquiring the land, distance to internet lines, etc.
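A quick worked example of why failures become daily events: if each of N components independently fails on a given day with probability p, the chance that at least one fails is 1 - (1 - p)^N. The fleet size and failure rate below are hypothetical, and real failures are not always independent.

```python
def prob_any_failure(num_components: int, daily_failure_prob: float) -> float:
    """Probability that at least one of N independent components fails in a day."""
    return 1 - (1 - daily_failure_prob) ** num_components


fleet = prob_any_failure(50_000, 1e-4)  # 50,000 servers, 1-in-10,000 daily chance each
```

With those numbers the chance of at least one failure on any given day is above 99%.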

In warehouses, we apply request-level parallelism, along with batch-processing models such as the popular map/reduce.
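A minimal map/reduce-style sketch, assuming a word-count job: each map call processes one shard independently (in a warehouse, each would run on a different machine), and the reduce step merges the partial results. Real frameworks also shuffle intermediate data by key between the two phases, which this sketch skips.

```python
from collections import Counter
from functools import reduce


def map_phase(shard: str) -> Counter:
    """Map: count the words of one shard, independently of all other shards."""
    return Counter(shard.split())


def reduce_phase(partials: list) -> Counter:
    """Reduce: merge the per-shard counts into one result."""
    return reduce(lambda acc, part: acc + part, partials, Counter())


shards = ["the cat sat", "the dog sat", "the end"]
partials = [map_phase(s) for s in shards]  # embarrassingly parallel
totals = reduce_phase(partials)
```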

And that’s it for this article. One thing I haven’t mentioned, but that is gaining traction, is domain-specific architectures: custom processors made for special cases such as neural networks, encryption, cryptocurrency, and camera image processing.


This was a small recap of topics related to computer architecture. It was not meant as a deep dive but as a quick overview targeted at those who haven’t touched it in a long time or who are new to it.
I hope you’ve at least learned a thing or two.
Thank you for reading!


  • Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design) 6th Edition


  • Internet Archive Book Images / No restrictions
  • Wafer_die's_yield_model_(10-20-40mm).PNG: Shigeru23, derivative work: Cepheiden / CC BY-SA (
  • 2x910 / CC BY-SA (

Ponylang (SeanTAllen)

Last Week in Pony - August 23, 2020 August 23, 2020 01:08 PM

Corral now has the ability to run scripts when a dependency is fetched. This has been used to install the necessary libraries on Windows, starting with the latest releases of the crypto, net_ssl, and regex packages. Ponyup 0.6.0 is out, with some minor improvements.

Kevin Burke (kb)

To Predict If You’ll Like a Beer, Look at the Hops August 23, 2020 03:10 AM

Generally if you name a food or drink, people know whether they like it or not. It is rare for someone to drink a merlot, or try pizza from a new restaurant — toasted bread, melted cheese, tomato sauce and toppings - and be wildly surprised at their reaction to the taste.

I can't quite figure that out for pale ales though. Some pale ales and IPAs had flavors I really liked, and some had flavors I really disliked. I had a tough time predicting which ones I would like and not like.

I had some suspicions - I didn't think I liked beers with much higher ABV than normal or beers that had citrus in them. But I also liked some beers with high ABV and one of my favorite "everyone has it" beers - Sierra Nevada - describes itself as "pine and citrus," so that wasn't quite right.

Anyway, I decided to be somewhat rigorous about this and order a few different types of beers from the bottle shop, and then figure out what I liked or didn't like about them. It turns out the key is the hops - there are some hop varieties (Cascade, Chinook, Noble) that I like a lot, and other hop varieties (Citra, Galaxy, Enigma, others) that I don't at all. If the hop description mentions passion fruit, I probably won't like it. Other than that, I can keep lists.

This is both satisfying - I can predict which beers I will like and not like, now — and frustrating. Why is this so difficult for consumers to figure out? Why does the category definition of "pale ale" include so much stuff? Like imagine if you ordered a "cheese pizza", and sometimes it would come with anchovies and sometimes with pineapple, and sometimes with nothing. People would demand better words to describe the differences between the things.

If you have ideas or answers, I would love to hear from you.

August 22, 2020

Unrelenting Technology (myfreeweb)

The touchscreen (both finger and pen support) on my Pixelbook has been broken... August 22, 2020 03:51 PM

The touchscreen (both finger and pen support) on my Pixelbook has been broken for a while (the Wacom digitizer was always present on i2c but it wasn’t sending events). There was like one time where I managed to get it to work briefly by holding the pen against it in some way, but that was it. Today I took the laptop out of the bag by the middle part, squeezing the lid a bit. Aaaand… touch works now! Something was going on with wiring somehow inside the lid (not the hinge) I guess? :/

August 18, 2020

Jeremy Morgan (JeremyMorgan)

How Do I Compare Strings in Go? August 18, 2020 01:59 AM

So you’re just learning Go and how things work. You need to compare two strings to see if they’re equal. You want to do it as simply and quickly as possible. In this tutorial we’re going to learn: Different ways to compare Strings Comparing strings ignoring case Measuring performance of different methods. So let’s get started. Note: There’s a video version of this article as well. Basic String Comparison So you need to compare a string.

August 16, 2020

Ponylang (SeanTAllen)

Last Week in Pony - August 16, 2020 August 16, 2020 11:34 PM

We have new releases for crypto libraries and new bots to automate changelogs and release notes. The shared Docker containers for openssl and libressl builders are being replaced.

Derek Jones (derek-jones)

Quality control in a zero cost of replication business August 16, 2020 10:29 PM

When a new manufacturing material becomes available, its use is often integrated with existing techniques, e.g., using scientific management techniques for software production.

Customers want reliable products, and companies that sell unreliable products don’t make money (and may even lose lots of money).

Quality assurance of manufactured products is a huge subject, and lots of techniques have been developed.

Needless to say, quality assurance techniques applied to the production of hardware are often touted (and sometimes applied) as the solution for improving the quality of software products (whatever quality is currently being defined as).

There is a fundamental difference between the production of hardware and software:

  • Hardware is designed, a prototype made and this prototype refined until it is ready to go into production. Hardware production involves duplicating an existing product. The purpose of quality control for hardware production is ensuring that the created copies are close enough to identical to the original that they can be profitably sold. Industrial design has to take into account the practicalities of mass production, e.g., can this device be made at a low enough cost.
  • Software involves the same design, prototype, refinement steps, in some form or another. However, the final product can be perfectly replicated at almost zero cost, e.g., downloadable file(s), burn a DVD, etc.

Software production is a once-off process, and applying techniques designed to ensure the consistency of a repetitive process doesn’t sound like a good idea. Software production is not at all like mass production (the build process comes closest to this form of production).

Sometimes people claim that software development does involve repetition, in that a tiny percentage of the possible source code constructs are used most of the time. The same is also true of human communications, in that a few words are used most of the time. Does the frequent use of a small number of words make speaking/writing a repetitive process in the way that manufacturing identical widgets is repetitive?

The virtually zero cost of replication (and distribution, via the internet, for many companies) does more than remove a major phase of the traditional manufacturing process. Zero cost of replication has a huge impact on the economics of quality control (assuming high quality is considered to be equivalent to high reliability, as measured by number of faults experienced by customers). In many markets it is commercially viable to ship software products that are believed to contain many mistakes, because the cost of fixing them is so very low; unlike the cost of hardware, which is non-trivial and involves shipping costs (if only for a replacement).

Zero defects is not an economically viable mantra for many software companies. When companies employ people to build the same set of items, day in day out, there is economic sense in having them meet together (e.g., quality circles) to discuss saving the company money, by reducing production defects.

Many software products have a short lifespan, source code has a brief and lonely existence, and many development projects are never shipped to paying customers.

In software development companies it makes economic sense for quality circles to discuss the minimum number of known problems they need to fix, before shipping a product.

Patrick Louis (venam)

Wild Mushrooms in Lebanon August 16, 2020 09:00 PM

The project about mapping wild mushrooms in Lebanon is out!

A video speaks louder than words:

Your browser does not support the video tag.

The project consists of a map with wild mushroom specimens, their locations, along with pictures and descriptions of them. It is based on the only two research papers on the topic I’ve found, Joseph Thiébaut’s research paper “Champignons observés dans le Liban et la Syrie de 1930 à 1933” (Mushrooms observed in Lebanon and Syria from 1930 to 1933) and Nadine Modad’s research paper “Survey and identification of wild mushrooms in Lebanon”, and on my own research and findings over the past few years.
It took me around 2 months, or almost 15h to fill the map. These research papers have been my bedtime stories for quite a while.

I’ve been interested in, and researching, mushrooms in the region since our scavenging excursion in 2017, where we found a Boletus luridiformis along with many other species.

This includes, apart from reading the research papers above:

  • Reading books such as:
    • The Edible Mushroom Book — a guide to foraging and cooking — Anna Del Conte, Thomas Laesee
    • The complete Mushroom Hunter — An illustrated guide to finding, harvesting, and enjoying wild mushrooms — Gary Lincoff
    • North American Species of Lactarius — Alexander H. Smith
  • Watching documentaries and following Youtube channels such as:
  • Frequently going on hikes during Autumn and Spring to find new species.
  • Seeking out exotic mushrooms, dried, fresh, or served at restaurants, such as morels, king oyster, shiitake, portobello, porcini, cordyceps, lion’s mane, and more.
  • Actively following /r/mycology subreddit.
  • Getting in the mood by playing the fungi board game.

… and much more.

Fungi are now a hobby of mine and I’ll keep doing research and adding specimens to the collection on the map as I discover them.

Again, here is the project link if you missed it.


And here are some pictures for your enjoyment:

Pictures: elfin saddle, Lactarius sp., Pithya, and other mushrooms.

Let me know what you think of this project and if you like it, and remember to be safe when harvesting for consumption.

Pages From The Fire (kghose)

Travel in “The Expanse” August 16, 2020 03:27 AM

At least up to what I’ve seen in season 2, The Expanse at least tries to acknowledge Newtonian physics. There are odd bits where they mix up where they should have centrifugal gravity and where not, and in which direction, but largely, they try. Thankfully there is no FTL nonsense (yet), but the civilization seems to…

Jeff Carpenter (jeffcarp)

Grace Hopper 2019 Trip Report August 16, 2020 12:00 AM

Despite this trip report being over 9 months late, I wanted to share it because I can’t stop thinking about how positive an experience this conference was. Grace Hopper is the largest women in tech conference in the world, with around 25,000 attendees flying into Orlando, FL from all parts of the world for the 2019 conference. In previous years I had been interested in attending but hadn’t gotten the chance—and I (as a man) also strongly did not want to take the spot of a potential woman visiting the conference.

August 12, 2020

Jeremy Morgan (JeremyMorgan)

7 Reasons Why Front End Developers Going Full Stack Should Choose Go August 12, 2020 05:54 PM

So you’re a front end developer, and you want to learn some backend stuff. You want to become a full stack developer someday, so where do you start? Google’s Go language is an excellent place. For instance, let’s say you want to build a RESTful API to test the calls from your React Application. You could use JSONPlaceholder, Reqres, or even SoapUI. All excellent options. Or you could spend an evening taking A Tour of Go and following a tutorial like this one to build a local API that does exactly what you want, and mocks whatever you want.

August 11, 2020

Unrelenting Technology (myfreeweb)

Wi-Fi not connecting (well, getting instantly deauthed due to AP-STA-POSSIBLE-PSK-MISMATCH after connecti... August 11, 2020 07:14 PM

Wi-Fi not connecting (well, getting instantly deauthed due to AP-STA-POSSIBLE-PSK-MISMATCH after connecting) is apparently a relatively common problem with IoT devices. And most people seem to point to ESP8266-based ones.

Well, I’ve never had a problem with ESP, but today I’ve been setting up an RTL8711AF based device (Xiaomi qmi.powerstrip.v1) and it was failing just like that.

Turns out this device just completely fails when 802.11w Management Frame Protection is on (even optionally). Ugh. Thanks Realtek.

August 10, 2020

Geoff Wozniak (GeoffWozniak)

RSS has been moved to Atom August 10, 2020 10:52 PM

The feed is now found at

Indrek Lasn (indreklasn)


Washing your hair involves water, shampoo, conditioner, and often a hairdryer, so it means spending a good while in the bathroom or at the hair salon. For more references, check out: Best Dry Shampoos for Fine Hair

Dry shampoo is the best solution for those days when you need to freshen up your look but don’t have time to wash your hair. Although the idea is relatively recent, many cosmetic brands already offer this type of hair product, which works as a spray that cleans, perfumes, and above all removes the greasy look … until you next wash your hair with a normal shampoo.

What is meant by dry shampoo?

Dry shampoo is a product to instantly and superficially clean hair. It is a spray that must be applied to the root of dry hair, to absorb oil and give volume, leaving the hair looser.

What is dry shampoo for?

Dry shampoo serves, first of all, to clean and revitalize the hair, but its main purpose is to remove oiliness and the greasy look in a few minutes.

When to use a dry shampoo?

There are many situations in which the use of a dry shampoo is indicated and effective. When…

· a last-minute appointment comes up and you don’t have time to wash your hair.

· you go to the gym every day and don’t want to, or don’t have time to, wash your hair daily.

· you have very oily hair, so oily that even with regular washing the strands look greasy by the end of the day.

· your hair has been colored and you don’t want to wash it with water too often, so as not to lose the color.

· your hair has been straightened and you want to space out washes with traditional shampoo as much as possible, to maintain the effect.

· your hair is clean, but it needs extra volume for a certain hairstyle.

· you want to remove odors from your hair, such as the smell of tobacco or fried food.

What type of hair can dry shampoo be used on?

Dry shampoo is particularly suitable for oily hair, but not only! It can also be applied to combination hair, with oily roots and dry tips, to balance oiliness along the length of the strands. In addition, it cleans and perfumes dry hair and gives more volume to fine hair. However, on very thin hair it can leave the strands “glued” to each other instead of loosening them.

How to apply dry shampoo?

Using a dry shampoo is very simple, fast and practical!

· Start by dividing the hair into a few sections, so that you can apply the product close to the root, but not directly on the scalp, because the powder can clog the pores.

· Then spray the product, holding the can about 20 to 30 centimeters away from the hair, and repeat the process all over the hair.

· Massage the strands with your fingertips so that the product penetrates better.

· Wait a few minutes for the strands to absorb the product.

· Finally, brush your hair to remove the white layer that remains on the strands.

Dry shampoo should not be used too often. Ideally, use it once a week, because if used many times this product can clog the pores and leave the hair drier and duller.

Is it possible to replace traditional shampoo with dry shampoo?

No. Dry shampoo is a kind of emergency solution to clean and improve the appearance of hair on days when you don’t have time to wash it. It provides a quick, superficial clean whose effect lasts two days at most.

What are the best dry shampoos for sale on the market?

There are several dry shampoos on the market that absorb oil from the hair and restore some volume at the root. Each has its own qualities and effectiveness, but none is intended to replace conventional shampoo, water and conditioner.

August 09, 2020

Derek Jones (derek-jones)

Extreme value theory in software engineering August 09, 2020 10:26 PM

As its name suggests, extreme value theory deals with extreme deviations from the average, e.g., how often will rainfall be heavy enough to cause a river to overflow its banks.

The initial list of statistical topics I thought ought to be covered in my evidence-based software engineering book included extreme value theory. At the time, and even today, there were/are no books covering “Statistics for software engineering”, so I had no prior work to guide my selection of topics. I was keen to cover all the important topics, had heard of extreme value theory in several (non-software) contexts, and jumped to the conclusion that it must be applicable to software engineering.

Years pass: the draft accumulates a wide variety of analysis techniques applied to software engineering data, but no use of extreme value theory.

Something else does not happen: I don’t find any ‘Using extreme value theory to analyse data’ books. Yes, there are some really heavy-duty maths books available, but nothing of a practical persuasion.

The book’s Extreme value section becomes a subsection, then a subsubsection, and ends up inside a comment (I cannot bring myself to delete it).

It appears that extreme value theory is more talked about than used. I can understand why. Extreme events are newsworthy; rivers that don’t overflow their banks are not news.

Just over a month ago a discussion cropped up on the UK’s C++ standards’ panel mailing list: was email traffic down because of COVID-19? The panel’s convenor, Roger Orr, posted some data on monthly volumes. Oh, data :-)

Monthly data is a bit too coarse for detailed analysis over relatively short periods. After some poking around, Roger was able to send me the date and time of every post to WG21’s Core and Lib reflectors since February 2016 (there have been various changes of hosts and configurations over the years, and the dates of posts since 2016 were straightforward to obtain).

During our email exchanges, Roger had mentioned that every now and again a huge discussion thread comes out of nowhere. Woah, sounds like WG21 could do with some extreme value theory. How often are huge discussion threads likely to occur, and how huge is a once in 10-years thread that they might have to deal with?

There are two techniques for analysing the distribution of extreme values present in a sample (both based around the generalized extreme value distribution):

  • Generalized Extreme Value (GEV) uses block maxima, e.g., maximum number of daily emails sent in each month,
  • Generalized Pareto (GP) uses peak over threshold: pick a threshold and extract day values for when more than this threshold number of emails was sent.
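The two approaches differ mainly in which observations they keep. The following is a minimal Python sketch, not the author's code: it extracts both kinds of sample from a hypothetical `daily_counts` mapping (date to number of emails) and computes a GEV return level using the standard quantile formula. The function names are hypothetical, and the location `mu`, scale `sigma` and shape `xi` are assumed to come from a separate maximum-likelihood fit that is not shown.

```python
import math
from collections import defaultdict
from datetime import date


def block_maxima(daily_counts):
    """Block maxima (the GEV approach): the largest daily count in each calendar month.

    daily_counts maps datetime.date -> number of emails posted that day.
    Returns a dict mapping (year, month) -> maximum daily count, in date order.
    """
    maxima = defaultdict(int)
    for day, count in daily_counts.items():
        key = (day.year, day.month)
        maxima[key] = max(maxima[key], count)
    return dict(sorted(maxima.items()))


def peaks_over_threshold(daily_counts, threshold):
    """Peaks over threshold (the GP approach): keep only the days above the threshold."""
    return {day: count for day, count in sorted(daily_counts.items()) if count > threshold}


def gev_return_level(blocks, mu, sigma, xi):
    """Level expected to be exceeded about once every `blocks` blocks
    (e.g. 120 monthly blocks = 10 years), for GEV parameters from a fit not shown here."""
    y = -math.log(1.0 - 1.0 / blocks)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel (xi -> 0) limit
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


if __name__ == "__main__":
    counts = {date(2020, 1, 1): 3, date(2020, 1, 2): 10, date(2020, 2, 1): 7}
    print(block_maxima(counts))  # {(2020, 1): 10, (2020, 2): 7}
    print(peaks_over_threshold(counts, 5))
    # Hypothetical parameters, for illustration only:
    print(gev_return_level(120, mu=20.0, sigma=8.0, xi=0.1))
```

Block maxima throws away everything except one value per block, while peaks over threshold can keep several extreme days from a busy month; the choice of threshold is the main judgment call in the second approach.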

The plots below show the maximum number of monthly emails that are expected to occur (y-axis) within a given number of months (x-axis), for WG21’s Core and Lib email lists. The circles are actual occurrences, and the dashed lines are 95% confidence intervals; GEP was used for these fits (code+data):

Expected maximum for emails appearing on C++'s core and lib reflectors within a given period

The 10-year return value for Core is a daily maximum of around 70 ± 30, and closer to 200 ± 100 for Lib.

The model used is very simplistic, and fails to take into account the growth in members joining these lists and traffic lost when a new mailing list is created for a new committee subgroup.

If any readers have suggestions for uses of extreme value theory in software engineering, please let me know.

Postlude. This discussion has reordered events. My original interest in the mailing list data was the desire to find some evidence for the hypothesis that the volume of email increased as the date of the next WG21 meeting approached. For both Core and Lib, the volume actually decreases slightly as the date of the next meeting approaches; see code for details. Also, the volume of email at the weekend is around 60% lower than during weekdays.

Scott Sievert (stsievert)

COVID-19, age and lockdowns August 09, 2020 12:00 AM

I wrote “Visualization of the COVID-19 infection rates” with two goals: to warn people about the upcoming pandemic and to provide insight into that pandemic.

The US took precautions within a couple of months, and the length and intensity of the precautions have surprised me. Even four months later, individuals generally believe they should take actions to limit the spread of COVID-19. This includes wearing a mask and working remotely if possible.

But are these precautions justified? There’s no harm done if everyone gets a benign virus. Do the data justify mandating masks and closing schools? Let’s look.

The hospital data from New York City (NYC) indicates that they are past the most intense part of the infection:

New COVID-19 cases, hospitalizations and deaths are down by 30–50× since the peak. By this measure, NYC has “flattened the curve” and is seeing minimal new cases, hospitalizations and deaths.

However, the lockdowns are still continuing. Subway ridership has been below normal weekend levels for nearly 5 months:

How necessary are these lockdowns? Let’s look at some data to find out.

Case study: Sweden

Sweden has a different approach; the government made strong recommendations to the elderly to “limit close contact with other people” and

… [are] encourag[ing] citizens to use common sense, work from home if possible, and not gather in crowds over 50. Primary schools are open, as are bars and restaurants, with images showing people enjoying drinks and crowding streets.

(“Sweden Sticks With Controversial COVID-19 Approach”)

CNBC reports that “[Sweden] did not go into lockdown, instead issuing recommendations about social distancing and working from home while allowing many schools and businesses to stay open.”

Obviously, not having a lockdown has significant benefits: kids can see their friends at school, restaurants and bars are still serving food and don’t have to lay people off, etc. In fact, the Swedish economy has performed well, at least when compared to the US. Here’s a table of the annualized GDP growth rate:

Quarter     Sweden    US
2020, Q1    +0.1%     -5.0%
2020, Q2    -8.6%     -32.9%

There have even been stories written about how the Swedish economy has performed better than the economies of neighboring countries.

This must have come at a cost, right? Sure, they might have been able to keep their schools open and their economy functioning, but certainly more people contracted COVID-19? Absolutely:

But the number of infections is meaningless. No one cares if everyone contracts a harmless disease. Let’s look at how harmful COVID is, using the deaths attributed to COVID:

Clearly, far more elderly people have died from COVID than younger people when normalized by the population in that age group. The data from Sweden is high resolution – they specify the number of people aged between (say) 75 and 80 years old who have died. The data from NYC are unfortunately too coarse for any detailed comparison; however, the general trend is clear: NYC and Sweden have approximately the same number of deaths per population.

That’s right: NYC and Sweden have (approximately) the same number of deaths per population, even after normalizing for age. Unlike the case counts, there’s no obvious difference.

Maybe NYC is an outlier because of its population density.1 Let’s make the same plot for the US instead:

About 0.5% of the US population over 85 has died of COVID. For 40-year-olds, 0.005% of the US population has died of COVID. For context, the US suicide rate is 150 per million, or 0.015%, for the population aged 35 to 44.

Let’s look at the various death rates for the US, and see how the number of deaths from COVID compare for each age group. Let’s plot these death counts relative to the number of COVID deaths:

A value of 20 on this chart means the death rate from (say) suicide is 20× greater than the death rate from COVID-19 for that age group. I defined the “death rate” for suicide etc. to be as low as it can be, n_dead / n_people. For COVID-19, the death rate is defined as n_dead / n_infected.
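Because n_infected can be no larger than n_people, these definitions make the COVID-19 rate larger than a per-population rate would be, so the plotted ratio is, if anything, smaller. A minimal Python sketch of that arithmetic, with hypothetical function names and made-up counts (none of these numbers come from the data above):

```python
def population_death_rate(n_dead, n_people):
    """Deaths divided by everyone in the age group (used for suicide, overdoses, etc.)."""
    return n_dead / n_people


def infected_death_rate(n_dead, n_infected):
    """COVID-19 deaths divided by infected people only (a smaller denominator)."""
    return n_dead / n_infected


def rate_ratio(cause_dead, n_people, covid_dead, n_infected):
    """How many times larger the other cause's death rate is than the COVID-19 rate."""
    return population_death_rate(cause_dead, n_people) / infected_death_rate(covid_dead, n_infected)


if __name__ == "__main__":
    # Hypothetical age group: 1,000,000 people, 100,000 of them infected.
    print(rate_ratio(cause_dead=150, n_people=1_000_000, covid_dead=1, n_infected=100_000))
    # Same deaths, but dividing COVID-19 deaths by the whole population instead:
    print(rate_ratio(cause_dead=150, n_people=1_000_000, covid_dead=1, n_infected=1_000_000))
```

With these made-up counts the first ratio is about 15 and the second about 150: dividing by infections rather than by population shrinks the ratio by the factor n_people / n_infected.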

This chart is a little misleading; it compares the deaths in 2015 to the number of COVID-19 deaths, not the deaths from suicide, drug overdoses, etc. that occurred during the COVID-19 lockdowns. I hypothesize that the number of suicides and drug overdoses has increased during the lockdowns. The suicide rate in 2015 was 20× the death rate of COVID-19 for the population aged 15–24; I suspect the suicide rate has increased, especially because the CDC director reports that deaths from suicide and drug overdoses are “far greater” than COVID deaths for high-school-aged students:

But there has been another cost that we’ve seen, particularly in high schools. We’re seeing, sadly, far greater suicides now than we are deaths from COVID. We’re seeing far greater deaths from drug overdose that are above excess that we had as background than we are seeing the deaths from COVID.

Robert Redfield, July 14th, 2020

COVID-19 lockdowns come with both economic costs and mental health costs. Let’s look at some data on COVID-19 and children.

COVID-19 and children

Iceland has performed a contact tracing study that studies each infection, traces it back to its source, then recurses. Iceland tested 6% of its population in its contact tracing study before April 4th.2 Of the people randomly sampled, none of the children under 10 tested positive for COVID-19, despite a 0.8% positive rate for people older than 10 years. The study also found that the infection probability increased (gradually) with age for the population under 20 years old. Iceland’s study included genetic tracing to determine the index cases, but unfortunately did not distinguish “school” and “work.”

Preliminary evidence from the NIH suggests that children are more likely to be missing the receptor for COVID-19, specifically because children are more susceptible to allergic asthma. The NIH is further funding this study to examine the correlation between the relevant gene and infection, and also COVID-19 in children:

One interesting feature of this novel coronavirus pandemic is that very few children have become sick with COVID-19 compared to adults. Is this because children are resistant to infection with SARS-CoV-2, or because they are infected but do not develop symptoms? The HEROS study will help us begin to answer these and other key questions.

Anthony S. Fauci, M.D., NIAID Director

Asymptomatic spread (spreading without ever developing symptoms) is rare; spreading before symptoms develop “is believed to be far more common than asymptomatic spread” (source).


I presented data that provides evidence to support these hypotheses:

  • Elderly people have a significantly higher risk of contracting and dying from COVID-19.
  • The death rate for the population under 20 is minimal relative to suicide and drug overdose death rates.
  • Sweden and the US have similar death rates despite drastic differences in their public policy approach.

As an aside, here’s data from Minnesota on the age of various patient classes:

Population                                          Median age
All MN residents                                    38.2
People who tested positive for COVID (patients):
  Patients not in hospital                          34
  Patients in hospital                              59
  Patients in ICU                                   61
  Patients who died                                 83

This means that half of the COVID-19 patients in the ICU are over the age of 61, and half of the COVID-19 patients in hospital are older than 59.

Data sources

  1. NYC has about twice the population density of Stockholm and about 5× the population. 

  2. “Spread of SARS-CoV-2 in the Icelandic Population.” Gudbjartsson et al. New England Journal of Medicine. DOI: 10.1056/NEJMoa2006100

August 08, 2020

Pete Corey (petecorey)

Now You’re Thinking with Arrays August 08, 2020 12:00 AM

I’ve been using the J programming language on and off (mostly off), for the past couple years, and I still find myself failing to grasp the “array-oriented” approach.

Recently I wanted to find the discrete derivative, or the forward difference of a list of integers. This boils down to finding differences, or deltas, between each successive pair of list elements. So, for a list of integers, 1 2 4 7 11, the list of deltas would be 1 2 3 4.

My first stab at building a verb that does this looked like this:

   (-/"1@:}.@:(],._1&|.)) 1 2 4 7 11
1 2 3 4

The idea is that we take our list, rotate it once to the right (_1&|.), and stitch it onto itself. This gives us a list of tuples holding each pair of subsequent numbers, except for the first tuple, which holds our first and last list values. We drop that tuple and map minus over the remaining pairs.

This solution seems overly verbose and complicated for something as seemingly fundamental as calculating differences between subsequent list values.

I asked for help on #JLang Twitter, and learned about the “cut” verb, specifically the ;._3 form of cut, which executes a verb over subarrays, or “regular tilings”, of its input. Armed with this knowledge, we can map minus over all length-two tilings of our list:

   2(-~/;._3) 1 2 4 7 11
1 2 3 4

Very nice!

I was happy with this solution, but #JLang Twitter pried my mind open even further and made me realize that I still haven’t fully grasped what it means to work in an “array oriented” mindset.

It was explained to me that I should work with the entire array as a unit, rather than operate on each of the elements individually. What I’m really after is the “beheaded” (}.) array minus the “curtailed” (}:) array.

   (}. - }:) 1 2 4 7 11
1 2 3 4

This is the shortest, clearest, and, in hindsight, most obvious solution. It’s clear to me that I still need to work on getting into the “array-oriented” mindset when working with J, but hopefully with enough exposure to solutions like this, I’ll get there.
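For comparison outside J, the same beheaded-minus-curtailed idea can be sketched in plain Python (an illustration, not part of the original approach; the function name is hypothetical). Python lists have no element-wise minus, so the two shifted lists are paired with `zip`; with NumPy arrays, `xs[1:] - xs[:-1]` subtracts whole arrays directly, which is what `numpy.diff` computes.

```python
def forward_difference(xs):
    """Whole-list analogue of J's (}. - }:): the beheaded list minus the curtailed list."""
    beheaded = xs[1:]    # J: }.  (drop the first item)
    curtailed = xs[:-1]  # J: }:  (drop the last item)
    # Pair the two shifted lists and subtract element by element.
    return [b - c for b, c in zip(beheaded, curtailed)]


if __name__ == "__main__":
    print(forward_difference([1, 2, 4, 7, 11]))  # [1, 2, 3, 4]
```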

Now we’re thinking with arrays!

August 06, 2020

Frederic Cambus (fcambus)

NetBSD on the NanoPi NEO2 August 06, 2020 08:41 PM

The NanoPi NEO2 from FriendlyARM has been serving me well since 2018, being my test machine for OpenBSD/arm64 related things.

As NetBSD/evbarm finally gained support for AArch64 in NetBSD 9.0, released back in February, I decided to give it a try on this device. The board only has 512MB of RAM, and this is where NetBSD really shines. Things have become a lot easier since jmcneill@ now provides bootable ARM images for a variety of devices, including the NanoPi NEO2.

On first boot, the system automatically grows the filesystem to fill the SD card.

Growing ld0 MBR partition #1 (1052MB -> 60810MB)
Growing ld0 disklabel (1148MB -> 60906MB)
Resizing /
/dev/rld0a: grow cg |************************************                 |  69%

Once the system is up and running, we can add a regular user in the wheel group:

useradd -m -G wheel username

And add a password to the newly created user:

passwd username

From there we do not need the serial console anymore and can connect to the device using SSH.

NetBSD has binary packages available for this architecture, and installing and configuring pkgin can be done as follows:

export PKG_PATH=
pkg_add pkgin
echo $PKG_PATH > /usr/pkg/etc/pkgin/repositories.conf
pkgin update

The base system can be kept up to date using sysupgrade, which can be installed via pkgin:

pkgin in sysupgrade

The following variable needs to be set in /usr/pkg/etc/sysupgrade.conf:


Lastly, the device has two user controllable LEDs which can be toggled on and off using sysctl.

To switch both LEDs on:

sysctl -w hw.led.nanopi_green_pwr=1
sysctl -w hw.led.nanopi_blue_status=1

To switch off the power LED automatically at boot time:

echo "hw.led.nanopi_green_pwr=0" >> /etc/sysctl.conf

Here is a dmesg for reference purposes:

[     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[     1.000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[     1.000000]     2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
[     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[     1.000000]     The Regents of the University of California.  All rights reserved.

[     1.000000] NetBSD 9.0_STABLE (GENERIC64) #0: Wed Aug  5 15:20:21 UTC 2020
[     1.000000]
[     1.000000] total memory = 497 MB
[     1.000000] avail memory = 479 MB
[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] armfdt0 (root)
[     1.000000] simplebus0 at armfdt0: FriendlyARM NanoPi NEO 2
[     1.000000] simplebus1 at simplebus0
[     1.000000] simplebus2 at simplebus0
[     1.000000] cpus0 at simplebus0
[     1.000000] simplebus3 at simplebus0
[     1.000000] psci0 at simplebus0: PSCI 1.1
[     1.000000] cpu0 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu0: package 0, core 0, smt 0
[     1.000000] cpu0: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000000] cpu0: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.000000] cpu0: Dcache line 64, Icache line 64
[     1.000000] cpu0: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.000000] cpu0: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.000000] cpu0: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.000000] cpu0: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.000000] cpu0: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.000000] cpu1 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu1: package 0, core 1, smt 0
[     1.000000] cpu2 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu2: package 0, core 2, smt 0
[     1.000000] cpu3 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu3: package 0, core 3, smt 0
[     1.000000] gic0 at simplebus1: GIC
[     1.000000] armgic0 at gic0: Generic Interrupt Controller, 224 sources (215 valid)
[     1.000000] armgic0: 16 Priorities, 192 SPIs, 7 PPIs, 16 SGIs
[     1.000000] fclock0 at simplebus2: 24000000 Hz fixed clock (osc24M)
[     1.000000] sunxisramc0 at simplebus1: SRAM Controller
[     1.000000] fclock1 at simplebus2: 32768 Hz fixed clock (ext_osc32k)
[     1.000000] gtmr0 at simplebus0: Generic Timer
[     1.000000] gtmr0: interrupting on GIC irq 27
[     1.000000] armgtmr0 at gtmr0: Generic Timer (24000 kHz, virtual)
[     1.000000] timecounter: Timecounter "armgtmr0" frequency 24000000 Hz quality 500
[     1.000010] sun8ih3ccu0 at simplebus1: H3 CCU
[     1.000010] sun8ih3rccu0 at simplebus1: H3 PRCM CCU
[     1.000010] sunxide2ccu0 at simplebus1: DE2 CCU
[     1.000010] sunxigpio0 at simplebus1: PIO
[     1.000010] gpio0 at sunxigpio0: 94 pins
[     1.000010] sunxigpio0: interrupting on GIC irq 43
[     1.000010] sunxigpio1 at simplebus1: PIO
[     1.000010] gpio1 at sunxigpio1: 12 pins
[     1.000010] sunxigpio1: interrupting on GIC irq 77
[     1.000010] fregulator0 at simplebus0: vcc3v3
[     1.000010] fregulator1 at simplebus0: usb0-vbus
[     1.000010] fregulator2 at simplebus0: gmac-3v3
[     1.000010] sun6idma0 at simplebus1: DMA controller (12 channels)
[     1.000010] sun6idma0: interrupting on GIC irq 82
[     1.000010] com0 at simplebus1: ns16550a, working fifo
[     1.000010] com0: console
[     1.000010] com0: interrupting on GIC irq 32
[     1.000010] sunxiusbphy0 at simplebus1: USB PHY
[     1.000010] sunxihdmiphy0 at simplebus1: HDMI PHY
[     1.000010] sunximixer0 at simplebus1: Display Engine Mixer
[     1.000010] sunxilcdc0 at simplebus1: TCON1
[     1.000010] sunxilcdc0: interrupting on GIC irq 118
[     1.000010] sunxirtc0 at simplebus1: RTC
[     1.000010] emac0 at simplebus1: EMAC
[     1.000010] emac0: Ethernet address 02:01:f7:f9:2f:67
[     1.000010] emac0: interrupting on GIC irq 114
[     1.000010] rgephy0 at emac0 phy 7: RTL8211E 1000BASE-T media interface
[     1.000010] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[     1.000010] h3codec0 at simplebus1: H3 Audio Codec (analog part)
[     1.000010] sunximmc0 at simplebus1: SD/MMC controller
[     1.000010] sunximmc0: interrupting on GIC irq 92
[     1.000010] motg0 at simplebus1: 'otg' mode not supported
[     1.000010] ehci0 at simplebus1: EHCI
[     1.000010] ehci0: interrupting on GIC irq 104
[     1.000010] ehci0: EHCI version 1.0
[     1.000010] ehci0: 1 companion controller, 1 port
[     1.000010] usb0 at ehci0: USB revision 2.0
[     1.000010] ohci0 at simplebus1: OHCI
[     1.000010] ohci0: interrupting on GIC irq 105
[     1.000010] ohci0: OHCI version 1.0
[     1.000010] usb1 at ohci0: USB revision 1.0
[     1.000010] ehci1 at simplebus1: EHCI
[     1.000010] ehci1: interrupting on GIC irq 110
[     1.000010] ehci1: EHCI version 1.0
[     1.000010] ehci1: 1 companion controller, 1 port
[     1.000010] usb2 at ehci1: USB revision 2.0
[     1.000010] ohci1 at simplebus1: OHCI
[     1.000010] ohci1: interrupting on GIC irq 111
[     1.000010] ohci1: OHCI version 1.0
[     1.000010] usb3 at ohci1: USB revision 1.0
[     1.000010] sunxiwdt0 at simplebus1: Watchdog
[     1.000010] sunxiwdt0: default watchdog period is 16 seconds
[     1.000010] /soc/gpu@1e80000 at simplebus1 not configured
[     1.000010] gpioleds0 at simplebus0: nanopi:green:pwr nanopi:blue:status
[     1.000010] /soc/timer@1c20c00 at simplebus1 not configured
[     1.000010] /soc/video-codec@1c0e000 at simplebus1 not configured
[     1.000010] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.000010] cpu2: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000010] cpu2: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.040229] cpu2: Dcache line 64, Icache line 64
[     1.040229] cpu2: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.050220] cpu2: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.060220] cpu2: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.070220] cpu2: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.070220] cpu2: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.090221] cpu1: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.090221] cpu1: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.100222] cpu1: Dcache line 64, Icache line 64
[     1.110221] cpu1: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.110221] cpu1: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.120222] cpu1: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.130222] cpu1: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.140223] cpu1: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.150222] cpu3: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.160223] cpu3: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.160223] cpu3: Dcache line 64, Icache line 64
[     1.170223] cpu3: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.180223] cpu3: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.180223] cpu3: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.190223] cpu3: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.200224] cpu3: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.210224] sdmmc0 at sunximmc0
[     1.240225] uhub0 at usb0: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.240225] uhub0: 1 port with 1 removable, self powered
[     1.240225] uhub1 at usb2: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.250226] uhub1: 1 port with 1 removable, self powered
[     1.250226] uhub2 at usb1: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.260226] uhub2: 1 port with 1 removable, self powered
[     1.260226] uhub3 at usb3: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.275641] uhub3: 1 port with 1 removable, self powered
[     1.275641] IPsec: Initialized Security Association Processing.
[     1.350228] sdmmc0: SD card status: 4-bit, C10, U1, A1
[     1.350228] ld0 at sdmmc0: <0x03:0x5344:SC64G:0x80:0x0cd9141d:0x122>
[     1.360690] ld0: 60906 MB, 7764 cyl, 255 head, 63 sec, 512 bytes/sect x 124735488 sectors
[     1.370228] ld0: 4-bit width, High-Speed/SDR25, 50.000 MHz
[     1.990242] boot device: ld0
[     1.990242] root on ld0a dumps on ld0b
[     2.000243] root file system type: ffs
[     2.010242] kern.module.path=/stand/evbarm/9.0/modules

Marc Brooker (mjb)

Surprising Economics of Load-Balanced Systems August 06, 2020 12:00 AM

Surprising Economics of Load-Balanced Systems

The M/M/c model may not behave like you expect.

I have a system with c servers, each of which can only handle a single concurrent request, and has no internal queuing. The servers sit behind a load balancer, which contains an infinite queue. An unlimited number of clients offer c * 0.8 requests per second to the load balancer on average. In other words, we increase the offered load linearly with c to keep the per-server load constant. Once a request arrives at a server, it takes one second to process, on average. How does the client-observed mean request time vary with c?

Option A is that the mean latency decreases quickly, asymptotically approaching one second as c increases (in other words, the time spent in queue approaches zero). Option B is constant. Option C is a linear improvement, and D is a linear degradation in latency. Which curve do you, intuitively, think that the latency will follow?

I asked my Twitter followers the same question, and got an interestingly mixed result:

Breaking down the problem a bit will help figure out which is the right answer. First, names. In the terminology of queueing theory, this is an M/M/c queuing system: Poisson arrival process, exponentially distributed client service time, and c backend servers. In teletraffic engineering, it's Erlang's delay system (or, because terminology is fun, M/M/n). We can use a classic result of queueing theory to analyze this system: Erlang's C formula $E_{2,n}(A)$, which calculates the probability that an incoming customer request is enqueued (rather than handled immediately), based on the number of servers (n, aka c) and the offered traffic A. For the details, see page 194 of the Teletraffic Engineering Handbook. Here's the basic shape of the curve (using our same parameters):

Follow the blue line up to half the saturation point, at 2.5 rps offered load, and see how the probability is around 13%. Now look at the purple line at half its saturation point, at 5 rps: just 3.6%. So at half load the 5-server system handles 87% of traffic without queuing, while with double the load and double the servers we handle 96.4% without queuing. That means only 3.6% of requests see any additional latency.

It turns out this improvement is, indeed, asymptotically approaching 1. The right answer to the Twitter poll is A.
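Erlang's C formula and the resulting mean latency are easy to compute directly. The following is a minimal Python sketch with hypothetical function names; rather than the factorial form, it uses the standard Erlang B recurrence together with the identity C = cB / (c − A(1 − B)), which avoids overflow for large c. The mean M/M/c queueing delay is C / (cμ − λ), and adding the mean service time 1/μ gives the client-observed mean.

```python
def erlang_b(c, a):
    """Erlang B (blocking probability) via the stable recurrence B_k = a*B_{k-1} / (k + a*B_{k-1})."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b


def erlang_c(c, a):
    """Erlang C: probability that an arriving request must wait in the queue.

    c: number of servers; a: offered traffic in Erlangs (arrival rate * mean service time).
    """
    if a >= c:
        # Stability requires lambda / (c * mu) < 1, i.e. a < c.
        raise ValueError("offered traffic must be below the number of servers")
    b = erlang_b(c, a)
    return c * b / (c - a * (1.0 - b))


def mean_latency(c, per_server_load=0.8, service_rate=1.0):
    """Mean client-observed time (queueing + service) for M/M/c, with offered load scaled by c."""
    arrival_rate = per_server_load * c * service_rate  # 0.8 * c requests/s when mu = 1
    a = arrival_rate / service_rate
    mean_wait = erlang_c(c, a) / (c * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate


if __name__ == "__main__":
    print(round(erlang_c(5, 2.5), 3), round(erlang_c(10, 5.0), 3))
    for c in (1, 2, 5, 10, 50, 100):
        print(c, round(mean_latency(c), 3))
```

The first line should print values close to 0.13 and 0.036, in line with the 13% and 3.6% quoted above. For c = 1 the mean latency is the familiar M/M/1 value 1/(μ − λ) = 5 seconds, and it falls towards 1 second as c grows.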

Using the mean to measure latency is controversial (although perhaps it shouldn't be). To avoid that controversy, we need to know whether the percentiles get better at the same rate. Doing that in closed form is somewhat complicated, but this system is super simple, so we can plot them out using a Monte-Carlo simulation. The results look like this:

That's entirely good news. The median (p50) follows the mean line nicely, and the high percentiles (99th and 99.9th) have a similar shape. No hidden problems.
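The simulation code itself isn't shown in the post. One way to run such a Monte-Carlo experiment is sketched below in Python, under the same assumptions: a single first-come-first-served queue in front of c servers, Poisson arrivals at 0.8·c requests per second, and exponential service with a 1 s mean. The function names, seed and request count are hypothetical choices, and because the simulated system starts empty, the earliest requests see slightly lower latency than in steady state.

```python
import heapq
import math
import random


def simulate_mmc(c, per_server_load=0.8, n_requests=100_000, seed=1):
    """Monte-Carlo M/M/c: Poisson arrivals, exponential service (mean 1 s),
    c servers fed first-come-first-served from one unbounded queue.
    Returns the sorted list of client-observed latencies (queueing + service)."""
    rng = random.Random(seed)
    arrival_rate = per_server_load * c  # requests per second; service rate is 1 per second
    free_at = [0.0] * c                 # min-heap: time at which each server next becomes idle
    now = 0.0
    latencies = []
    for _ in range(n_requests):
        now += rng.expovariate(arrival_rate)   # next Poisson arrival
        earliest = heapq.heappop(free_at)      # server that frees up first
        start = max(now, earliest)             # queue if every server is busy
        finish = start + rng.expovariate(1.0)  # exponential service, mean 1 s
        heapq.heappush(free_at, finish)
        latencies.append(finish - now)
    latencies.sort()
    return latencies


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]


if __name__ == "__main__":
    for c in (1, 5, 10, 50):
        lat = simulate_mmc(c)
        print(c, round(sum(lat) / len(lat), 3),
              [round(percentile(lat, p), 2) for p in (50, 99, 99.9)])
```

Assigning each arrival, in order, to the server that frees up first is what makes this a single shared queue rather than c independent ones.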

It's also good news for cloud and service economics. With larger c we get better latency at the same utilization, or better utilization for the same latency, all at the same per-server throughput. That's not good news only for giant services, because most of this goodness happens at relatively modest c. There are few problems related to scale and distributed systems that get easier as c increases. This is one of them.

There are some reasonable follow-up questions. Are the results robust to our arbitrary choice of 0.8? Yes, they are1. Are the M/M/c assumptions of Poisson arrivals and exponential service time reasonable for typical services? I'd say they are reasonable, albeit wrong. Exponential service time is especially wrong: realistic services tend to be something more like log-normal. It may not matter. More on that another time.

Update: Dan Ports responded to my thread with a fascinating Twitter thread pointing to Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency from SoCC'14 which looks at this effect in the wild.


  1. Up to a point. As soon as the mean arrival rate exceeds the system's ability to complete requests, the queue grows without bound and latency goes to infinity. In our case, that happens when the request load exceeds c requests per second. More generally, for this system to be stable λ/(cμ) must be less than 1, where λ is the mean arrival rate and μ is the mean service rate of a single server (the reciprocal of the mean time a server takes to process a request).

August 05, 2020

Andrew Owen (yumaikas)

Art Challenge: The Middle Grind August 05, 2020 10:50 PM

The story so far

Emily came across an art challenge on Pinterest, and suggested that we both do each of its prompts.

An art challenge that lists out 30 days of art prompts

Her medium of preference is pencil and ink, and mine is pixel art. This, unlike the previous post, covers 16 entries, because I fell behind in blog posts.

It’s also longer, and it definitely shows both Emily and me getting ready to be done with the art challenge.

Day 9: Urban Legend


An ink sketch of a wendigo


A pixel picture of a weeping Mary statue

Day 10: Insect


A drawing of an iridescent beetle with a blue shell


A pixel art drawing of a dragonfly

Day 11: Something you ate today


A nice looking ink sketch of a bagel, with a pen and an eraser on the sketch book



Day 12: Your Spirit Animal


A detailed ink drawing of a bat


A pixel-art picture of a squirrel sitting on a porch (or jumping over a log)

Day 13: Song Lyrics / Your Happy Place


A picture of Emily wrapped in a blanket on a couch, with a lamp, tissue box, phone, and Nintendo Switch


A pixel art picture of my laptop, with Aseprite open.

Day 14: Historical Figure


An ink picture of a corset, a


An attempt to make a pixel art photo of Ada Lovelace

Day 15: Guilty Pleasure


A sketch of a yellow Nintendo Switch with Stardew Valley on the screen


An abstract grid of white, blue and brown grid squares, representative of a Scrabble Board

Day 16: Zodiac Sign


A picture of a Capricorn goat with horns and a webbed mane


The Aquarius sign is superimposed over a big yellow moon over waves, with a small lighthouse in the background

Day 17: Favorite TV Show


A picture of a naked Homer Simpson, his butt facing the viewer.


A picture of

Day 18: Something with Wings


A picture of a bat with 3 jack-o-lanterns, which is nibbling on the largest jack-o-lantern


A picture of a bat

Day 19: Famous Landmark


One of the sections from Stonehenge


A pixel-art picture of the pyramids of Giza

Day 20: Beverage


A drawing of a cup of water


A picture of a cup of water

Day 21: Teeth


A picture of a Zombie Skull with prominent teeth


A pixel-art picture of an alligator skull

Day 22: Earth Day


A picture of the earth, with clouds, being held up by a pair of hands


A pixel-art picture of the earth

Day 23: Dessert


A cupcake with sprinkles


An ice cream cone on a metal stand with little chocolate chips

Day 24: Movie Prop


A drawing of the cat from Kiki's Delivery Service.


A pixel drawing of Wilson the volleyball from Cast Away

August 04, 2020

Pepijn de Vos (pepijndevos)

A Rust HAL for your LiteX FPGA SoC August 04, 2020 12:00 AM

ULX3S demo

FPGAs are amazing in their versatility, but can be a real chore when you have to map out a giant state machine just to talk to some chip over SPI. For such cases, nothing beats just downloading an Arduino library and quickly hacking some example code. Or would there be a way to combine the versatility of an FPGA with the ease of Arduino libraries? That is the question I want to explore in this post.

Of course you can use an f32c softcore on your FPGA as an Arduino, but that’s a precompiled core, and basically doesn’t give you the ability to use your FPGA powers. Or you can build your own SoC with custom HDL components, but then you’re back to bare-metal programming.

Unless you can tap into an existing library ecosystem by writing a hardware abstraction layer for your SoC. And that is exactly what I’ve done by writing a Rust embedded HAL crate that works for any LiteX SoC!

LiteX allows you to assemble a SoC by connecting various components to a common Wishbone bus. It supports various RISC-V CPUs (and more), and has a library of useful components such as GPIO and SPI, but also USB and Ethernet. These all get memory-mapped and can be accessed via the Wishbone bus by the CPU and other components.

The amazing thing is that LiteX can generate an SVD file for the SoC, which contains all the registers of the components you added to the SoC. This means that you can use svd2rust to compile this SVD file into a peripheral access crate.
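To get a feel for what svd2rust consumes, here is a small Python sketch that lists every register in an SVD file together with its absolute address. The SVD fragment, its peripheral and register names, and its addresses are all made up for illustration. A real file generated by --csr-svd has many more peripherals and fields, and the sketch assumes it has no default XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVD fragment: names and addresses are illustrative only.
SAMPLE_SVD = """
<device>
  <name>SOC</name>
  <peripherals>
    <peripheral>
      <name>CTRL</name>
      <baseAddress>0xF0000000</baseAddress>
      <registers>
        <register>
          <name>RESET</name>
          <addressOffset>0x0000</addressOffset>
        </register>
        <register>
          <name>SCRATCH</name>
          <addressOffset>0x0004</addressOffset>
        </register>
      </registers>
    </peripheral>
  </peripherals>
</device>
"""


def register_map(svd_text: str) -> dict:
    """Map "PERIPHERAL.REGISTER" to the register's absolute address."""
    root = ET.fromstring(svd_text)
    addresses = {}
    for periph in root.iter("peripheral"):
        periph_name = periph.findtext("name")
        # int(..., 0) accepts hex literals such as "0xF0000000".
        base = int(periph.findtext("baseAddress"), 0)
        for reg in periph.iter("register"):
            offset = int(reg.findtext("addressOffset"), 0)
            addresses[f"{periph_name}.{reg.findtext('name')}"] = base + offset
    return addresses
```

Pointing it at the generated file, e.g. register_map(open("ulx3s.svd").read()), gives a quick inventory of the registers the PAC crate will expose.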

This PAC crate abstracts away memory addresses, and since the peripherals themselves are reusable components, it is possible to build a generic HAL crate on top of it that supports a certain LiteX peripheral in any SoC that uses it. Once the embedded HAL traits are implemented, you can use these LiteX peripherals with every existing Rust crate.

The first step is to install LiteX. Due to a linker bug in Rust 1.45, I used the 1.46 beta. I’m also installing into a virtualenv to keep my system clean. While we’re going to use Rust, gcc is still needed for compiling the LiteX BIOS and for some objcopy action.

#rustup default beta
virtualenv env
source env/bin/activate
chmod +x
./ init install
./ gcc
export PATH=$PATH:$(echo $PWD/riscv64-*/bin/)

Now we need to make some decisions about which FPGA board and CPU we’re going to use. I’m going to be using my ULX3S, but LiteX supports many FPGA boards out of the box, and others can of course be added. For the CPU, we have to take care to match it with an architecture that Rust supports. For example, VexRiscv supports the im feature set by default, which is not a supported Rust target, but it also has i and imac variants, both of which Rust supports. PicoRV32 only supports i or im, so it can only be used in combination with the Rust i target.
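The matching rule boils down to a lookup from the CPU's RV32 ISA string to a Rust target triple. The following Python sketch is hypothetical. Its table is a snapshot of the bare-metal riscv32 targets around Rust 1.45/1.46: the two used in this post plus riscv32imc. Newer Rust releases have added more riscv32 targets, so check the current target list before relying on it.

```python
# Snapshot of bare-metal riscv32 Rust targets around Rust 1.45/1.46.
RUST_RISCV32_TARGETS = {
    "i": "riscv32i-unknown-none-elf",
    "imc": "riscv32imc-unknown-none-elf",
    "imac": "riscv32imac-unknown-none-elf",
}


def rust_target_for(isa: str) -> str:
    """Return the Rust target triple for an RV32 extension string such as "imac"."""
    try:
        return RUST_RISCV32_TARGETS[isa]
    except KeyError:
        # e.g. plain "im" had no Rust target at the time
        raise ValueError(
            f"no Rust target for rv32{isa}; choose a CPU variant with a supported ISA"
        ) from None
```

For the two builds in this post, rust_target_for("imac") gives riscv32imac-unknown-none-elf for VexRiscv, and rust_target_for("i") gives riscv32i-unknown-none-elf for PicoRV32. These match the rustup target add lines used with each build.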

So let’s go ahead and make one of those. I’m going with the VexRiscv imac variant, but on a small iCE40 you might want to try PicoRV32 (or even Serv) to save some space. Of course, substitute the correct FPGA and SDRAM module for your board.


cd litex-boards/litex_boards/targets
python --cpu-type vexriscv --cpu-variant imac --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32imac-unknown-none-elf


python --cpu-type picorv32 --cpu-variant minimal --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32i-unknown-none-elf

Most parameters should be obvious. The --csr-data-width 32 parameter sets the register width, which I’m told will be the default in the future, and saves a bunch of bit shifting later on. --csr-svd ulx3s.svd tells LiteX to generate an SVD file for your SoC. You can omit --build and --load and do these steps manually by going to the build/ulx3s/gateware/ folder and running the build script there. I also prefer to use the awesome openFPGALoader rather than the funky ujprog, with a sweet openFPGALoader --board ulx3s ulx3s.bit.
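To see where the bit shifting comes from, consider a CSR data width narrower than the register (8 bits here, assumed for illustration). A 32-bit register is then spread over several bus words, so software has to split writes and stitch reads back together. This Python sketch models only that arithmetic. The most-significant-chunk-first ordering is our assumption about LiteX's default, not something stated in this post. With --csr-data-width 32, each register is a single chunk and no shifting is needed.

```python
def combine_csr(chunks, chunk_bits=8):
    """Rebuild a register value from its sub-register reads (most-significant chunk first)."""
    mask = (1 << chunk_bits) - 1
    value = 0
    for chunk in chunks:
        value = (value << chunk_bits) | (chunk & mask)
    return value


def split_csr(value, width_bits=32, chunk_bits=8):
    """Split a register value into the sub-register writes a narrow CSR bus needs."""
    count = -(-width_bits // chunk_bits)  # ceiling division
    mask = (1 << chunk_bits) - 1
    return [(value >> (chunk_bits * (count - 1 - i))) & mask for i in range(count)]
```

With 8-bit CSRs, reading one 32-bit value means four bus reads plus three shifts. With 32-bit CSRs, split_csr(value, chunk_bits=32) is just [value].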

Now it is time to generate the PAC crate with svd2rust. This crate is completely unique to your SoC, so there is no point in sharing it; as long as the HAL crate can find it, you’re good. Follow these instructions to create a Cargo.toml with the right dependencies. In my experience, you may need to bump the version numbers a bit: I had to use the latest riscv and riscv-rt to make things work, but kept the other versions as they were so as not to break the PAC crate.

cargo new --lib litex-pac
cd litex-pac/src
svd2rust -i ulx3s.svd --target riscv
cd ..
vim Cargo.toml

Now we can use these instructions to create our first Rust app that uses the PAC crate. I pushed my finished example to this repo. First create the app as usual, and add dependencies. You can refer to the PAC crate as follows.

litex-pac = { path = "../litex-pac", features = ["rt"]}

Then you need to create a linker script that tells the Rust compiler where to put stuff. Luckily, LiteX generated the important parts for us, and we only have to define the correct REGION_ALIAS expressions. Since we will be using the BIOS, all our code will get loaded into main_ram, so I set all my aliases to that. It is possible to load code into other regions, but my attempts to put the stack in SRAM failed horribly when the stack grew too large, so it’s better to start with something safe and then experiment.
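A memory.x along these lines could be produced by a small Python helper like the one below. Writing the file by hand works just as well, and the helper name is hypothetical. It points every region alias at LiteX's main_ram, as described above. The six alias names are the ones riscv-rt's link.x expects as far as we know; check the riscv-rt documentation for the version you use.

```python
# Region aliases expected by riscv-rt's link.x (verify against your riscv-rt version).
RISCV_RT_ALIASES = (
    "REGION_TEXT",
    "REGION_RODATA",
    "REGION_DATA",
    "REGION_BSS",
    "REGION_HEAP",
    "REGION_STACK",
)


def memory_x(region: str = "main_ram") -> str:
    """Return memory.x text that maps every riscv-rt region onto one LiteX memory region.

    The MEMORY block itself (origin and length of main_ram) comes from
    LiteX's regions.ld, which is passed to the linker before memory.x.
    """
    lines = [f'REGION_ALIAS("{alias}", {region});' for alias in RISCV_RT_ALIASES]
    return "\n".join(lines) + "\n"
```

Writing it out is one line, e.g. pathlib.Path("memory.x").write_text(memory_x()). Passing a different region name is the starting point for experimenting with other placements.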


Next, you need to actually tell the compiler about your architecture and linker scripts. This is done with the .cargo/config file. This should match the Rust target you installed, so be mindful if you are not using imac. Note the regions.ld file that LiteX generated; we’ll get to that in the next step.

[target.riscv32imac-unknown-none-elf]
rustflags = [
  "-C", "link-arg=-Tregions.ld",
  "-C", "link-arg=-Tmemory.x",
  "-C", "link-arg=-Tlink.x",
]

[build]
target = "riscv32imac-unknown-none-elf"

The final step before jumping in with the Rust programming is writing a file that copies the linker scripts to the correct location for the compiler to find them. I mostly used the example provided in the instructions, but added a section to copy the regions.ld file generated by LiteX. Export BUILD_DIR to the location where you generated the LiteX SoC.

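    // Copy LiteX's generated regions.ld into dest_path (the output directory set up earlier in the script).
    // env!("BUILD_DIR") is evaluated when this build script is compiled, so export BUILD_DIR before running cargo.
    let mut f = File::create(&dest_path.join("regions.ld"))
        .expect("Could not create file");
    f.write_all(include_bytes!(concat!(env!("BUILD_DIR"), "/software/include/generated/regions.ld")))
        .expect("Could not write file");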

That’s it. Now the code you compile will actually get linked correctly. I found these iCEBreaker LiteX examples very useful to get started. This code will actually run with minimal adjustment on our SoC, and is a good start to get a feel for how the PAC crate works. Another helpful command is to run cargo doc --open in the PAC crate to see the generated documentation.

To actually upload the code, you have to convert the binary first.

cargo build --release
cd target/riscv32imac-unknown-none-elf/release
riscv64-unknown-elf-objcopy litex-example -O binary litex-example.bin
litex_term --kernel litex-example.bin /dev/ttyUSB0

From here we “just” need to implement HAL traits on top of the PAC to be able to use almost any embedded library in the Rust ecosystem. One challenge, however, is that the peripherals and their names are not set in stone. I solved this by having the HAL crate export only macros that generate HAL trait implementations. This way your SoC can have 10 SPI cores, and you just call the spi macro to generate a HAL for each of them. I uploaded the code in this repo.

Of course, so far we’ve only used the default SoC defined for the ULX3S. The real proof is whether we can add a peripheral, write a HAL layer for it, and then use an existing library with it. I decided to add an SPI peripheral for the OLED screen. First, I added the following pin definitions:

    ("oled_spi", 0,
        Subsignal("clk",  Pins("P4")),
        Subsignal("mosi", Pins("P3")),
    ),
    ("oled_ctl", 0,
        Subsignal("dc",   Pins("P1")),
        Subsignal("resn", Pins("P2")),
        Subsignal("csn",  Pins("N2")),
    ),

and then the peripheral itself

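    def add_oled(self):
        # SPI pins for the OLED; the display is write-only, so MISO is a dummy signal.
        pads = self.platform.request("oled_spi")
        pads.miso = Signal()
        # 8-bit SPI master with an 8 MHz SPI clock.
        self.submodules.oled_spi = SPIMaster(pads, 8, self.sys_clk_freq, 8e6)

        # Plain GPIO outputs for the data/command, reset and chip-select lines.
        self.submodules.oled_ctl = GPIOOut(self.platform.request("oled_ctl"))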

This change has actually been accepted upstream, so now you can just add the --add-oled command line option and get a brand new SoC with an SPI controller for the OLED display. Once the PAC has been regenerated and the FullDuplex trait has been implemented for it, it is simply a matter of adding the SSD1306 or SSD1331 crate and copy-pasting some example code. Just as easy as an Arduino, but on your own custom SoC!

August 03, 2020

Jeremy Morgan (JeremyMorgan)

Better Title Case in Go August 03, 2020 05:15 PM

In this article, I’ll show you how you can create better titles in Go. We’ll be using the strings library from the Go Standard Library for this tutorial. You’ll often have a string input that you want to change the casing of, and it’s easy with Go. Lower Case If you want to change your text to lowercase, use the strings.ToLower method: package main import ( "fmt" "strings" ) func main() { fmt.

Anish Athalye (anishathalye)

Organizing Data Through the Lens of Deduplication August 03, 2020 04:00 AM

Our home file server has been running since 2008, and over the last 12 years, it has accumulated more than 4 TB of data. The storage is shared between four people, and it tends to get disorganized over time. We also had a problem with duplicated data (over 500 GB of wasted space), an issue that is intertwined with disorganization. I wanted to solve both of these problems at once, and without losing any of our data. Existing tools didn’t work the way I wanted, so I wrote Periscope to help me clean up our file server.

Periscope works differently from most other duplicate file finders. It’s designed to be used interactively to explore the filesystem, understand which files are duplicated and where duplicates live, and safely delete duplicates, all without losing any data. Periscope enables exploring the filesystem with standard tools — the shell, and commands like cd, ls, tree, and so on — while providing additional duplicate-aware commands that mirror core filesystem utilities. For example, psc ls gives a directory listing that highlights duplicates, and psc rm deletes files only if a duplicate copy exists. Here is Periscope in action on a demo dataset:

The demo uses a small synthetic dataset. For the real thing, there were a lot more duplicates; here are the stats prior to the cleanup:

$ psc summary
  tracked 669,718
   unique 175,672
duplicate 494,046
 overhead  515 GB

Early attempts

The first time I tried to clean up our file server, I used well-known duplicate file finders like fdupes and its enhanced fork jdupes. At a high level, these programs scan the filesystem and output a list of duplicates. After that, you’re on your own. When I scanned the server, the tools found 494,046 duplicate files wasting a total of 515 GB of space. Going through these manually, one at a time, would be infeasible.

Many tools have a mode where they can prompt the user and delete files; with such a large number of duplicates, this would not be useful. Some tools have features that help with space savings but not with organization: hard linking duplicates, automatically deleting duplicates chosen arbitrarily, and automatically deleting duplicates chosen based on a heuristic like path depth. These features wouldn’t work for me.

I had a hypothesis that a lot of duplicate data was the result of entire directories being copied, so if the duplicate summary could merge duplicate directories rather than listing individual files, the output might be more manageable. I tried implementing this functionality, and I soon found out that this merging strategy works well for perfect copies, but it does not work well when folders have partial overlap, and most of the duplicate data on the server was like that. I tried to work around the issue and handle partial overlap through analyzing subset relationships between directories, but I basically ended up with a gigantic Venn diagram; I couldn’t figure out a clean and useful way to visualize the information.

Patterns of disorganization

I manually inspected some of the data on our file server to understand where duplicates came from and how they should be cleaned up, and I started noticing patterns:

  • A directory of organized data alongside a “to organize” directory. For example, we had organized media in “/Photos/{year}/{month}/{event name}”, and unorganized media in “/Unorganized”, in directories like “D300S Temp Copy Feb 11 2012”. In some cases the data inside the copies was fully represented in the organized photos directory hierarchy, but in other cases there were unique files that needed to be preserved and organized.
  • A directory snapshotted at different times. In many cases, it wasn’t necessary to keep multiple backups; we just needed the full set of unique files.
  • A redundant backup of an old machine. Nowadays we use Borg for machine backups, but in the past, we had cases where entire machines were backed up temporarily, such as before migrating to a new machine. Most of this data was copied to the new machine and subsequently backed up as part of that machine’s backups, but the old copy remained. Most of this data could be deleted, but in some cases there were files that were unique and needed to be preserved.
  • Duplicates in individuals’ personal directories. We organize some shared data like photos in a shared location, and other data in personal folders. We had some data that was copied to both locations.
  • Manually versioned documents. We had documents like “Essay.doc”, “Essay v2.doc”, “Essay v3.doc”, where some of the versions were identical to each other.

Generalizing from these patterns, I felt that an interactive tool would work best for cleaning up the data. The tool should support organizing data one directory at a time, listing directories and inspecting files to understand where duplicates live. I also wanted a safe wrapper around rm that would let me delete duplicates but not accidentally lose data by deleting a unique file. Additionally, I wanted a way to delete files in one directory only if they were present in another, so I could recursively delete everything in “/Unorganized” that was already present in “/Photos”.
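The “delete here only if it also exists there” rule can be sketched in a few lines of Python, given a map from each file path to a content digest. This illustrates the idea only; it is not Periscope's implementation. The function names are hypothetical, paths are assumed to be POSIX-style, and equal digests are assumed to mean equal content.

```python
def is_under(path: str, directory: str) -> bool:
    """True if path lies inside directory (POSIX-style paths)."""
    return path.startswith(directory.rstrip("/") + "/")


def deletable(hashes: dict, target_dir: str, reference_dir: str) -> list:
    """Files under target_dir whose content also exists under reference_dir.

    hashes maps each file path to a content digest (e.g. SHA-256 hex).
    Reference copies that themselves sit inside target_dir do not count,
    so every returned file keeps a surviving copy outside target_dir.
    """
    kept = {
        digest
        for path, digest in hashes.items()
        if is_under(path, reference_dir) and not is_under(path, target_dir)
    }
    return sorted(
        path
        for path, digest in hashes.items()
        if is_under(path, target_dir) and digest in kept
    )
```

Files under target_dir with no copy under reference_dir are never returned, so they are left in place to be organized by hand.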


Periscope implements the functionality summarized above. A psc scan searches for duplicate files in the same way as other duplicate file finders but it caches the information in a database. After that, commands like psc ls can run fast by leveraging the database. Commands like psc summary and psc report show high-level information on duplicates, psc ls and psc info enable interactively exploring the filesystem, and psc rm safely deletes duplicates.
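The scan step is the part Periscope shares with other duplicate file finders. As a rough illustration (not Periscope's code, and without the database caching), the common approach groups files by size first. It then hashes only the files whose sizes collide, since files of different sizes cannot be identical. SHA-256 and the function names are our own choices.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> list:
    """Return sorted groups of paths under root that have identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size means a unique file; no need to read it
        for path in paths:
            by_content[(size, sha256_of(path))].append(path)

    return sorted(sorted(group) for group in by_content.values() if len(group) > 1)
```

Storing these groups in a database, as psc scan does, is what lets later commands like psc ls answer without rescanning.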

More information on the Periscope workflow and commands is available in the documentation.

Related work

There are tons of duplicate file finders out there — fdupes, jdupes, rmlint, ddh, rdfind, fslint, duff, fddf, and fclones — to name a few. These tools find and print out duplicates; some have additional features like prompting for deletion or automatically deleting dupes based on heuristics. They were not suitable for my use case.

dupd is a utility that scans for duplicates, saves information to a database, and then allows for exploring the filesystem while querying the duplicate database for information. It was a source of inspiration for Periscope. The tools have somewhat differing philosophies and currently have two key differences: Periscope aims to provide commands that mirror coreutils counterparts (e.g. psc ls is not recursive, unlike dupd), and Periscope provides commands to safely delete files (one of dupd’s design goals is to not delete files). These seem essential for “scaling up” and handling a large volume of duplicates.


Periscope is free and open source software. Documentation, code, and binaries are available on GitHub.

Ponylang (SeanTAllen)

Last Week in Pony - August 2, 2020 August 03, 2020 01:46 AM

Pony 0.36.0 has been released! We recommend upgrading as soon as possible.

August 02, 2020

Derek Jones (derek-jones)

Scientific management of software production August 02, 2020 10:04 PM

When Frederick Taylor investigated the performance of workers in various industries, at the start of the 1900s, he found that workers organise their work to suit themselves; workers were capable of producing significantly more than they routinely produced. This was hardly news. What made Taylor’s work different was that having discovered the huge difference between actual worker output and what he calculated could be achieved in practice, he was able to change work practices to achieve close to what he had calculated to be possible. Changing work practices took several years, and the workers did everything they could to resist it (Taylor’s The Principles of Scientific Management is an honest and revealing account of his struggles).

Significantly increasing worker output pushed company profits through the roof, and managers everywhere wanted a piece of the action; scientific management took off. Note: scientific management is not a science of work, it is a science of the management of other people’s work.

The scientific management approach has been successfully applied to production where most of the work can be reduced to purely manual activities (i.e., requiring little thinking by those who performed them). The essence of the approach is to break down tasks into the smallest number of component parts, to simplify these components so they can be performed by less skilled workers, and to rearrange tasks in a way that gives management control over the production process. Deskilling tasks increases the size of the pool of potential workers, decreasing labor costs and increasing the interchangeability of workers.

Given the almost universal use of this management technique, it is to be expected that managers will attempt to apply it to the production of software. The software factory was tried, but did not take off. The use of chief programmer teams had its origins in the scarcity of skilled staff; the idea is that somebody who knows what they are doing divides up the work into chunks that can be implemented by less skilled staff. This approach is essentially the early stages of scientific management, but it did not gain traction (see “Programmers and Managers: The Routinization of Computer Programming in the United States” by Kraft).

The production of software is different in that once the first copy has been created, the cost of reproduction is virtually zero. The human effort invested in creating software systems is primarily cognitive. The division between management and workers is along the lines of what they think about, not between thinking and physical effort.

Software systems can be broken down into simpler components (assuming all the requirements are known), but can the implementation of these components be simplified such that they can be implemented by less skilled developers? The process of simplification is practical when designing a system for repetitive reproduction (e.g., making the same widget again and again), but the first implementation of anything is unlikely to be simple (and only one implementation is needed for software).

If it is not possible to break down the implementation such that most of the work is easy to do, can we at least hire the most productive developers?

How productive are different developers? Programmer productivity has been a hot topic since people started writing software, but almost no effective research has been done.

I have no idea how to measure programmer productivity, but I do have some ideas about how to measure their performance (a high-performance programmer can have zero productivity by writing, faster than anybody else, programs that don’t do anything useful from the client’s perspective).

When the same task is repeatedly performed by different people it is possible to obtain some measure of average/minimum/maximum individual performance.

Task performance improves with practice, and an individual’s initial task performance will depend on their prior experience. Measuring performance based on a single implementation of a task provides some indication of minimum performance. To obtain information on an individual’s maximum performance they need to be measured over multiple performances of the same task (and of course working in a team affects performance).

Should high performance programmers be paid more than low performance programmers (ignoring the issue of productivity)? I am in favour of doing this.

What about productivity payments, e.g., piece work?

This question is a minefield of issues. Manual workers have been repeatedly found to set informal quotas amongst themselves, i.e., setting a maximum on the amount they will produce during a shift (see “Money and Motivation: An Analysis of Incentives in Industry” by William Whyte). Thankfully, I don’t think I will be in a position to have to address this issue anytime soon (i.e., I don’t see a reliable measure of programmer productivity being discovered in the foreseeable future).

Frederik Braun (freddyb)

Reference Sheet for Principals in Mozilla Code August 02, 2020 10:00 PM

Note: This is the reference sheet version. The details and the big picture are covered in Understanding Web Security Checks in Firefox (Part 1).

Principals as a level of privilege

A security context is always using one of these four kinds of Principals:

  • ContentPrincipal: This principal is used for typical …

Gustaf Erikson (gerikson)

July August 02, 2020 04:58 PM

Pete Corey (petecorey)

Descending Dungeon Numbers in J August 02, 2020 12:00 AM

Neil Sloane of the On-Line Encyclopedia of Integer Sequences has graced us with another appearance on Numberphile. In this video he discusses the sequence of “descending dungeon” numbers, their origins, how to construct them, and why they’re interesting.

I suggest you watch the video for a full explanation, but as a quick introduction, the sequence of descending dungeon numbers is generated by repeatedly reinterpreting a number as if it were written in a higher and higher base.

The sequence begins at 10. From there, we interpret 10 as if it were written in base 11, which gives us 11 in decimal (1 * 11 + 0). The next number in the sequence is 11 interpreted in base 12, or 13 (1 * 12 + 1). 13 is followed by 16, 16 is followed by 20, and so on.

Let’s try our hand at modeling the descending dungeon using the J programming language.

At the heart of our descending dungeon is the interpretation of a number in a given base. J’s “base” verb (#.) lets us do just this, assuming our number is split into a list of its digits:

   11 #. 1 0
11
   12 #. 1 1
13
   13 #. 1 3
16
   14 #. 1 6
20

So it looks like we’re already halfway there! Now we need a way of splitting a number into its component digits so we can feed it into #.. It turns out that we can do this using the inverse of the base verb, #.inv (or #.^:_1):

   10 #.inv 10
1 0
   10 #.inv 11
1 1
   10 #.inv 13
1 3
   10 #.inv 16
1 6

Great! We can now write a verb that accepts our current descending dungeon number and the next base to interpret it in, and spits out the next descending dungeon number in our sequence:

   11 (#.10&#.inv) 10
11
   12 (#.10&#.inv) 11
13
   13 (#.10&#.inv) 13
16
   14 (#.10&#.inv) 16
20

We’ll probably want to flip the order of our arguments to make it more ergonomic to feed in our list of bases, and reduce down to our sequence values:

   10 (#.10&#.inv)~ 11
11
   11 (#.10&#.inv)~ 12
13
   13 (#.10&#.inv)~ 13
16
   16 (#.10&#.inv)~ 14
20

With that change, we can easily “insert” (/) our verb between each element of our list of bases, and come up with the descending dungeon number at that point in the sequence:

   (#.10&#.inv)~/ 10
10
   (#.10&#.inv)~/ 10 11
11
   (#.10&#.inv)~/ 10 11 12
13
   (#.10&#.inv)~/ 10 11 12 13
16

We can use “prefix” (\) to apply this verb to every successive prefix of our list and build our full list of descending dungeon numbers:

   (#.10&#.inv)~/\ 10 11 12 13
10 11 13 16

Now all that’s left is to clean up how we build our list of bases and plot our descent into the dungeon:

   plot (#.10&#.inv)~/\ 10 + i. 20

The video asks the question, “how quickly do these numbers grow?” The answer seems to be “very slowly, and then very quickly.” But where does that shift happen? The numbers of the sequence that we’ve seen so far seem to be increasing by a linearly increasing amount.

10 + 1 is 11. 11 + 2 is 13. 13 + 3 is 16. 16 + 4 is 20, and so on…

Let’s use J to calculate the difference between successive numbers in our descending dungeons sequence:

   sequence =: (#.10&#.inv)~/\ 10 + i. 20
   2(-~/;._3) sequence
1 2 3 4 5 6 7 8 9 10 22 48 104 224 480 1024 2176 4608 9728

We can look at this as the discrete derivative of our sequence, or the difference between each successive element. We can see that for the first ten steps into our descending dungeon, this delta increases linearly. However once we get to the eleventh step, things go off the rails.
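As a cross-check against the definition at the start of this post, here is a minimal Python sketch that applies the bases in increasing order, one step at a time. It is not a translation of the J above. J's insert (/) evaluates from right to left, so `(#.10&#.inv)~/\` applies the bases in a different order. The two agree only for the first five terms (10, 11, 13, 16, 20). A left-to-right J version would reverse the list and drop the ~, something like `(#.10&#.inv)/@|.\ 10 + i. 20`; that line is a sketch we have not checked in a J session.

```python
from typing import List


def reinterpret(n: int, base: int) -> int:
    """Read the decimal digits of n as if they were written in `base`."""
    value = 0
    for digit in str(n):
        value = value * base + int(digit)
    return value


def descending_dungeon(count: int, start: int = 10) -> List[int]:
    """First `count` terms: each term is the previous one read in the next base up."""
    terms = [start]
    base = start + 1
    while len(terms) < count:
        terms.append(reinterpret(terms[-1], base))
        base += 1
    return terms


def differences(seq: List[int]) -> List[int]:
    """Difference between each pair of successive terms."""
    return [b - a for a, b in zip(seq, seq[1:])]
```

Stepping through the definition in order, the linear stretch ends sooner. 20 read in base 15 is 2 * 15 = 30, and by the tenth term the sequence has already reached 420. It still grows very slowly, and then very quickly.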

So it seems like we can stand by our answer and definitively say that the descending dungeons sequence grows “very slowly, and then very quickly.” Good work, everyone. Mystery solved.

Overall, this was an interesting exercise in exploring numerical bases, and a fun excuse to practice our J skills. Here’s hoping Neil Sloane stars in another Numberphile video very soon!

August 01, 2020

Mark J. Nelson (mjn)

US universities announcing online Fall 2020 August 01, 2020 12:00 PM

A list of four-year universities in the United States that announced before August that their fall 2020 undergraduate classes will be taught all or almost all online, sorted by date of announcement.

There's variation within the plans in the list. Some are exclusively online, while others plan to have a limited number of in-person courses, e.g. science labs. Also, some universities plan to have the dorms and on-campus services open at reduced capacity, while others plan to have the campus mostly closed.

May 5 – California State University system: "our planning approach will result in CSU courses primarily being delivered virtually for the fall 2020 term, with limited exceptions".

June 11 – University of California Irvine (Irvine, CA): "Almost all undergraduate courses will be delivered in a remote format in the fall quarter. A few exceptions are being evaluated, and consist of specialized upper-division labs, specific clinical and experiential courses, and some design courses in Engineering."

June 15 – Harvard University (Cambridge, MA): "regardless of where our students are living, whether on campus or at home, learning will continue to be remote next year, with only rare exceptions" (Faculty of Arts and Sciences, which includes the undergraduate college).

June 16 – University of California Davis (Davis, CA): "When fall quarter instruction starts Sept. 30, the campus plans to offer most courses remotely, though some courses will also be available in person, depending on health guidelines and instructor preference."

June 17 – University of California Riverside (Riverside, CA): "No instructor will be required to teach in-person and no student will be required to participate in-person until the campus returns to normal operations".

June 17 – University of California Santa Cruz (Santa Cruz, CA): "UC Santa Cruz will offer most courses remotely or online and provide in-person instruction for a small number of courses that cannot be delivered remotely, as is the case for some laboratory, studio and field study courses".

June 22 – Bowdoin College (Brunswick, ME): "In order to provide the best learning experience possible, nearly all classes, including those on campus, will be taught online."

June 22 – University of Massachusetts Boston (Boston, MA): "Certain lab courses in the sciences and nursing courses that require the use of the simulation center will remain on campus. The rest of the curriculum will be delivered to you via remote instruction."

June 25 – Haskell Indian Nations University (Lawrence, KS): "Haskell President Ronald Graham told the Journal-World Thursday that all classes would be held virtually for the fall semester".

June 26 – New School (New York, NY): "All classes will be online this fall. Given what we know today, we believe that remote learning is the best option for the health and safety of the entire New School community and for preventing the spread of the virus".

June 29 – University of Massachusetts Amherst (Amherst, MA): "Only essential face-to-face labs, studios, performance, and other courses involving hands-on work will be conducted on campus and in-person. ... All other courses will be delivered remotely."

June 30 – Wilmington University (New Castle, DE): "No one knows if COVID-19 will continue to spread at its current pace. The virus is still in its first phase, and there is growing concern that numbers will increase, or possibly result in a second wave. Due to these uncertainties and the need to keep our community safe, courses will remain online for the fall 2020 semester."

June 30 – Zaytuna College (Berkeley, CA): "Zaytuna's leadership has decided that instruction for the Fall 2020 semester will be conducted online for both the undergraduate and graduate programs".

July 1 – Hampton University (Hampton, VA): "out of an abundance of caution for the health, safety and welfare of our students as well as the faculty, administrative staff, administrators, maintenance and custodial staff, and others with whom students might interact, Hampton University will provide remote instruction only for the first semester of academic year 2020-2021".

July 1 – Texas College (Tyler, TX): "Our efforts are not to compete with other entities and how they respond, but rather give consideration to the needs of our students, faculty and staff and internally assess what is needed for a safe environment pursuant to the resources we have available to us. With this as the backdrop of our planning, a decision has been made to offer online instruction only for the fall term."

July 1 – University of Southern California (Los Angeles, CA): "our undergraduate students primarily or exclusively will be taking their courses online in the fall term".

July 6 – Princeton University (Princeton, NJ): "Based on the information now available to us, we believe ... we will need to do much of our teaching online and remotely."

July 6 – Rutgers University (New Brunswick, NJ): "I am writing today to inform you that after careful consideration of all possible models for safely and effectively delivering instruction during the ongoing coronavirus pandemic, Rutgers is planning for a Fall 2020 semester that will combine a majority of remotely delivered courses with a limited number of in-person classes".

July 7 – Marymount Manhattan College (New York, NY): "Given new developments in the COVID-19 global pandemic, we have adopted a Virtual Classes/Open Campus model in which all classes will be offered in an online format".

July 8 – Pomona College (Claremont, CA): "As the public health situation deteriorated over the last two weeks, we had to look at the facts and make a responsible decision: In this unfolding emergency, we will not be able to bring students back to campus in the fall."

July 8 – Scripps College (Claremont, CA): "The Administration and Board of Trustees of the College have determined that our community can best achieve its mission and maintain safety by offering Scripps classes online during the fall 2020 semester".

July 10 – Jarvis Christian College (Hawkins, TX): "To limit exposure to the virus, we will continue our online classes for fall 2020".

July 10 – Savannah College of Art and Design (Savannah/Atlanta, GA): "Following careful deliberation of all reasonable options, the university is announcing that, as of right now, Fall 2020 on-ground courses will be delivered primarily virtually for SCAD Atlanta and SCAD Savannah students — with some exceptions to address the needs of certain programs and students".

July 10 – West Chester University (Chester County, PA): "My leadership team and I have made the decision to continue remote learning through the fall 2020 semester, with a few courses delivered in a hybrid format, meaning both in-person and remote, in order to assist those students with clinical placements, student teaching, performance obligations, internship sites, and similar academic responsibilities."

July 13 – Loyola University Chicago (Chicago, IL): "we have decided that the best plan for the upcoming fall semester is to shift most of our class offerings online".

July 14 – Bennett College (Greensboro, NC): "After careful research, analysis and consideration, we have made a decision to operate remotely for the Fall semester."

July 14 – Gallaudet University (Washington, DC): "We are still in the first phase of a three-phase model. Phase 1 requires primarily remote working, teaching, and learning operations. Therefore, all classroom instruction will be delivered remotely this fall."

July 14 – Loyola Marymount University (Los Angeles, CA): "This fall will be unlike any other semester: Our undergraduate courses will be principally and primarily conducted remotely".

July 14 – Simmons University (Boston, MA): "Our thoughtful process has led us to decide that all of our teaching and activities will be online for the Fall 2020 semester, with very few exceptions".

July 14 – University of San Francisco (San Francisco, CA): "Based on the surge of COVID-19 cases in San Francisco and California, Gov. Newsom’s announcements yesterday about rolling back the state’s reopening plans, and specific instructions issued to higher education institutions today by the San Francisco Department of Public Health (SFDPH), it is clear that we need to pivot to USF’s operations being primarily remote for the fall 2020 semester. This means nearly all academic courses will be online — save for certain exceptions such as those in clinical nursing programs."

July 15 – Dickinson College (Carlisle, PA): "we have come to the very difficult decision that the fall 2020 semester will be remote".

July 15 – Lesley University (Cambridge, MA): "We do not want to repeat the disruptive experience of last spring where we had to shut down our campus on short notice, so most campus facilities will remain closed at least through the end of 2020".

July 15 – Occidental College (Los Angeles, CA): "Today we are announcing that for the Fall 2020 semester all instruction will be remote".

July 15 – Rhodes College (Memphis, TN): "I write with a heavy heart to let you know that despite our hopes and plans, the external health conditions in Memphis do not support an on-campus fall semester".

July 17 – University of the Pacific (Stockton, CA): "Unfortunately, our regions’ flat and comparatively low rates of COVID-19 cases experienced through the spring have rapidly accelerated over the past month. Therefore, we have determined that it would be unwise to reopen our campuses as we had hoped and planned."

July 20 – California College of the Arts (San Francisco/Oakland, CA): "We write today with the unfortunate news that our fall semester courses will be conducted entirely remotely (online)."

July 20 – Clark Atlanta University (Atlanta, GA): "Today, Clark Atlanta University (CAU) announced the move to remote learning for all students during the Fall 2020 semester, taking the necessary precautions to ensure the safety of its students and entire CAU community."

July 20 – Grinnell College (Grinnell, IA): "It is thus with a heavy heart that, after consulting with campus leaders and the Board of Trustees Executive Committee, and discussing a multitude of options, we have determined that Fall Term 1 classes will be offered remotely".

July 20 – Morehouse College (Atlanta, GA): "I am writing to inform you of the difficult decision I have made to remain virtual for the Fall 2020 semester".

July 20 – Spelman College (Atlanta, GA): "Because of the worsening health crisis, we have reluctantly come to the realization that we can no longer safely sustain a residential campus and in-person instruction. With a sense of great disappointment, I now share with you our decision that all instruction for the fall of 2020 at Spelman will be virtual."

July 21 – University of California Berkeley (Berkeley, CA): "The increase in cases in the local community is of particular concern. Given this development, as well as it being unlikely that there will be a dramatic reversal in the public health situation before the fall semester instruction begins on Aug. 26, we have made the difficult decision to begin the fall semester with fully remote instruction."

July 22 – Azusa Pacific University (Azusa, CA): "we will now pivot to remote learning in an online modality this fall".

July 22 – Clemson University (Clemson, SC): "Clemson University will begin the Fall semester online and will delay in-person instruction until Sept. 21".

July 22 – Edinboro University (Edinboro, PA): "after careful consideration, we have decided to move most of our courses online for the fall semester".

July 22 – Lafayette College (Easton, PA): "I am sorry to say that we will not be reconvening as a community on Aug. 17. Instead, all fall semester courses at Lafayette will be offered online".

July 22 – Pepperdine University (Malibu, CA): "we have decided we can best protect the health and well-being of our students, faculty, and staff by conducting our fall semester online".

July 22 – University of Delaware (Newark, DE): "we feel it is necessary to shift our plan until conditions improve. The majority of our academic courses in the fall 2020 semester will be delivered online".

July 23 – Randolph College (Lynchburg, VA): "The simple truth is that we do not see the situation in our country improving before our campus opens to our full student body in a month's time. Because of this, we are not confident the College would be able to remain in-person the entire semester without serious COVID-19-caused disruptions. ... the College has decided to move its instruction online for the fall semester."

July 23 – South Carolina State University (Orangeburg, SC): "The recent significant escalation in infections in South Carolina and the Orangeburg community has caused us to revisit all of our plans to date for this coming fall semester. As a result, we will start the Fall Semester 2020 with all classes being delivered remotely".

July 23 – Washington State University system: "For the Fall 2020 semester, all undergraduate courses at WSU, with very few exceptions, will be delivered at a distance and will be completed remotely, with extremely limited exceptions for in-person instruction."

July 24 – Claremont McKenna College (Claremont, CA): "Given the recent, substantial increases in COVID-19 infection, hospitalization, and death rates in California and Los Angeles County and, even more decisively, the absence of necessary state and county authorization for residential, in-person higher education programs to reopen, we will not be allowed to resume on-campus learning in the fall."

July 24 – Lyon College (Batesville, AR): "With a heavy heart, I am reaching out to inform you that on Thursday, the Board of Trustees determined remote instruction for the fall would be in the best interest of the College".

July 24 – Pitzer College (Claremont, CA): "Recently ... it became abundantly clear that in spite of the challenges and financial pain, the wisest and most responsible action was to shift our focus and devote all of our energy into creating the most robust and engaging on-line learning communities possible".

July 24 – Whitman College (Walla Walla, WA): "we have made the extremely difficult decision that the fall 2020 semester will primarily be via remote learning".

July 27 – Agnes Scott College (Decatur, GA): "It is with a profound sense of sadness and disappointment that I write to inform you that we have made the painful decision to move to fully online courses for the fall semester."

July 27 – George Washington University (Washington, DC): "we have made the difficult decision to hold all undergraduate courses online for the fall semester, with limited exceptions".

July 29 – Georgetown University (Washington, DC): "Courses for all undergraduate and graduate students will begin in virtual mode. Due to the acceleration of the spread of the virus and increasing restrictions on interstate travel we cannot proceed with our original plans for returning to campus this fall."

July 30 – American University (Washington, DC): "These evolving health conditions and government requirements now compel us to adjust our plan and offer fall semester undergraduate and graduate courses online with no residential experience."

July 30 – California Baptist University (Riverside, CA): "Today I am announcing that courses will be delivered primarily through live/synchronous remote instruction when CBU's fall semester begins".

July 30 – Johnson C. Smith University (Charlotte, NC): "Because the rate of transmission of the coronavirus shows no sign of slowing down and in the interest of the health and safety of everyone in the JCSU family and our community, the Board of Trustees, the Administration and I have made the difficult decision to deliver instruction solely online for the fall 2020 semester."

July 31 – Goucher College (Towson, MD): "It is with a great deal of disappointment that I write to inform you that, in consultation with the Fall Reopening Task Force and the Board of Trustees, I have made the difficult decision that our undergraduate students should not return back to campus this fall, and instead we should prepare to deliver this semester’s courses entirely online with the majority of our students studying from home."

July 31 – Queens University of Charlotte (Charlotte, NC): "It is with profound sadness and disappointment that I let you know we have made the decision to move to 100% virtual instruction for the fall semester, with no residential experience."

* * *

I collected the dates above either from dated announcements on the university's website, or in cases where announcements were undated (surprisingly common), from the date they were posted on the university's Twitter feed, mentioned in news articles, etc.

There's a bit of a judgment call in what I've counted as an announcement of being "almost" entirely online. I've included universities that say there are "limited exceptions" or similar, but not those that aim for a significant percentage of classes to be in-person. For example, I didn't include UCLA's June 15 announcement, which some news stories reported as "mostly online", because their stated 15-20% of classes in-person or hybrid seems to me to be too high to count as almost-all online.

See also: Here’s a List of Colleges’ Plans for Reopening in the Fall from the Chronicle of Higher Education.

Opening for a funded Masters student August 01, 2020 12:00 PM

I'm recruiting a funded Masters student to study how AI bots play games. The goal is to systematically understand the kinds of difficulty posed by games to AI algorithms, as well as the robustness of any conclusions. Some example experiments include: looking at how performance scales with parameters such as CPU time and problem size; how sensitive results are to rule variations, choice of algorithm parameters, etc.; and identification of games that maximally differentiate algorithm performance. Two previous papers of mine that give some flavor of this kind of research: [1], [2].

The primary desired skill is ability to run computational simulations, and to collect and analyze data from them. The available funding would pay for four semesters of full-ride Masters tuition, plus 15-20 hours/week of a work-study job during the academic year. The American University Game Lab offers three Masters-level degrees: the MS in Computer Science's Game & Computational Media track, the MA in Game Design, and the MFA in Games and Interactive Media.

The successful applicant would be funded on the National Science Foundation grant Characterizing Algorithm-Relative Difficulty of Agent Benchmarks. This does not have any citizenship/nationality requirements.

Anyone interested should both apply for the desired Masters program through the official application linked above (deadline July 1, though earlier is better), and email me to indicate that they would like to be considered for this scholarship. It's also fine to email me with inquiries before applying.

August 2020 update: This position has now been filled!

July 31, 2020

Aaron Bieber (qbit)

Unlocking SSH FIDO keys on device connect. July 31, 2020 11:43 PM

The problem

As a lazy type, I often find myself typing “ssh-add -K” over and over. I even felt depleted typing it here!

Fortunately for me, OpenBSD makes it trivial to resolve this issue. All we need is:

The adder

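This script will run our …hnnn… ssh-add -K command:

#!/bin/sh
# Run ssh-add -K each time this process receives SIGUSR1.
trap 'ssh-add -K' USR1

# Loop forever so the script (and its trap) stays alive; a pending
# trap runs as soon as the current sleep finishes.
while true; do
	sleep 1
done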

Notice the trap line there? More on that later! Save this script as /usr/local/bin/fido, make it executable, and start it in the background (/usr/local/bin/fido &) from ~/.xsession or similar. The important thing is that it runs after you log in.

The watcher

hotplugd (in OpenBSD base) does things when stuff happens. That’s just what we need!

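This script (/etc/hotplug/attach) will be called every time we attach a device:

#!/bin/sh
DEVNAME=$2	# hotplugd(8) passes the device class as $1, the device name as $2

case "$DEVNAME" in
	fido*)	# assumed: FIDO keys attach as fido(4) devices, e.g. fido0
		# wake the adder script so its USR1 trap runs ssh-add -K
		pkill -USR1 -xf "/bin/sh /usr/local/bin/fido"
		;;
esac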

Notice that pkill command with USR1? That’s the magic that hits our trap line in the adder script!

Now enable / start up hotplugd:

# rcctl enable hotplugd
# rcctl start hotplugd

That’s it!

If you have all these bits in place, you should see ssh-askpass pop up when you connect a FIDO key to your machine!

Here is a video of it in action:


Thanks to kn@ for the USR1 suggestion! It really helped me be more lazy!

July 28, 2020

Marc Brooker (mjb)

A Story About a Fish July 28, 2020 12:00 AM

A Story About a Fish

Nothing's more boring than a fishing story.

In the 1930s, Marjorie Latimer was working as a museum curator in East London. Not the eastern part of London as one may expect. This East London is a small city on South Africa's south coast, named so thanks to colonialism's great tradition of creative and culturally relevant place names. Latimer was a keen and knowledgeable naturalist, and had a deal with local fishermen that they would let her know if they found anything unusual in their nets. One morning in 1938, she got a call from a fishing boat captain named Hendrik Goosen. He'd found something very unusual indeed, and wanted Marjorie to look at it. The fish which Hendrik Goosen showed Marjorie Latimer was truly unusual. Unlike anything she had seen before.

Latimer knew just the person to identify it: professor JLB Smith at Rhodes University in nearby Grahamstown (now Makhanda). He was away, so she had the unusual fish gutted and taxidermied, and sent sketches to the professor. He replied (in all-caps, following the fashion at the time):


Smith had immediately identified the fish as something well known to science. Many like it had been seen before. This one, however, was particularly surprising. It was alive, nearly 66 million years after the last of its kin had been thought dead. Latimer had found a Coelacanth, a species of fish that had hardly evolved in the last 400 million years, and believed to exist only in the fossil record.

Marjorie Latimer and the Coelacanth

At the time, the Coelacanths were thought to be closely related to the Rhipidistia, which were thought to be an ancestor of all modern land-based vertebrates. The science on that topic has moved on, but Goosen's chance find, combined with Latimer's hard work in having it identified, created a special moment in the history of biology.

I was thinking about this story last night, because my daughter has been learning about Coelacanths at school. In the 1940s, JLB Smith and his wife Margaret wrote and illustrated a beautiful book called The Sea Fishes of Southern Africa. My grandmother studied biology at Rhodes during the time they were writing the book, and knew the Smiths and Marjorie Latimer. Margaret Smith gave her a signed copy of their book, sometime around 1950. I was fortunate to inherit the book, and share the Smiths' description and drawings of the Coelacanths with my daughter.

I hadn't opened The Sea Fishes of Southern Africa in ten years, but re-reading Smith's description of it was like a visit with my late grandmother. She never failed to share her excitement about, and appreciation for, all living things. I vividly remember her telling the Coelacanth story, and her small part in it, sharing the wonder of discovery and the importance of paying attention to the things around us. You never know when you'll learn something new. Perhaps it is unwise to be too dogmatic.

July 26, 2020

Derek Jones (derek-jones)

Surveys are fake research July 26, 2020 10:29 PM

For some time now, my default position has been that software engineering surveys, of the questionnaire kind, are fake research (surveys of a particular research field used to be worth reading, but not so often these days; that issue is for another post). Every now and again a non-fake survey paper pops up, but I don’t consider the cost of scanning all the fake stuff to be worth the benefit of finding the rare non-fake survey.

In theory, surveys could be interesting and worth reading about. Some of the things that often go wrong in practice include:

  • poorly thought-out questions. Questions need to be specific and applicable to the target audience. General questions are good for starting a conversation, but analysis of the answers is a nightmare. Perhaps the questions are non-specific because the researcher is looking for direction: well please don’t inflict your search for direction on the rest of us (a pointless plea in the fling-it-at-the-wall-to-see-if-it-sticks world of academic publishing).

    Questions that demonstrate how little the researcher knows about the topic serve no purpose. The purpose of a survey is to provide information of interest to those in the field, not as a means of educating a researcher about what they should already know,

  • little effort is invested in contacting a representative sample. Questionnaires tend to be sent to the people that the researcher has easy access to, i.e., a convenience sample. The quality of answers depends on the quality and quantity of those who replied. People who run surveys for a living put a lot of effort into targeting as many of the right people as possible,
  • sloppy and unimaginative analysis of the replies. I am so fed up with seeing an extensive analysis of the demographics of those who replied. Tables containing response break-down by age, sex, type of degree (who outside of academia cares about this) create a scientific veneer hiding the lack of any meaningful analysis of the issues that motivated the survey.

Although I have taken part in surveys in the past, these days I recommend that people ignore requests to take part in surveys. Your replies only encourage more fake research.

The aim of this post is to warn readers about the growing use of this form of fake research. I don’t expect anything I say to have any impact on the number of survey papers published.

Ponylang (SeanTAllen)

Last Week in Pony - July 26, 2020 July 26, 2020 08:00 PM

The July 21 sync includes discussions about ponyup and the String API.

Gustaf Erikson (gerikson)

Two more novels by Paul McAuley July 26, 2020 07:51 PM


  • War of the Maps
  • Austral

McAuley has a wide range. These books were read in reverse publication order.

War of the Maps is a far-future SF story. After our sun has become a white dwarf, post-modern humans construct a Dyson sphere around it and seed it with humans and Earth life. According to the internal legends, they play around a bit then buzz off, leaving the rest of the environment to bumble along as best they can.

The tech level is more or less Victorian, but people contend with unique challenges, such as a severe lack of metallic iron and malevolent AIs buried here and there.

Austral is a near-future crime story. A genetically modified young woman gets dragged into a kidnapping plot in a post-AGW Antarctica.

Both are well worth reading!

July 25, 2020

Gustaf Erikson (gerikson)

[SvSe] Söndagsvägen - berättelsen om ett mord by Peter Englund July 25, 2020 09:51 AM

Englund reflects 1960s Sweden in the mirror of a long-forgotten murder. By examining phenomena of the era, the book shows a country in transition, above all how “the modern project” begins to crack.

July 23, 2020

Andreas Zwinkau (qznc)

Peopleware July 23, 2020 12:00 AM

Leaders of software developer teams should care more about sociology.

Read full article!

July 21, 2020

Pete Corey (petecorey)

The Progression That Led Me to Build Glorious Voice Leader July 21, 2020 12:00 AM

This specific chord progression, played in this specific way, completely changed how I approach playing the guitar, opened my eyes to the beauty and elegance of voice leading seventh chords, and ultimately inspired me to build Glorious Voice Leader:

The progression is simply the C major scale, played in diatonic fourths, and harmonized as diatonic seventh chords.

To break that down further, we’re starting on C. Harmonizing C as a diatonic seventh chord in the scale of C major gives us a Cmaj7 chord. Next we’d move up a fourth to F. Harmonizing F as a seventh chord gives us Fmaj7. Next we’d move to B and a Bm7b5 chord, and continue until we arrive back at C.

For something that’s basically a glorified scale exercise, this chord progression sounds good. It’s almost… musical.

One of the most interesting aspects of this chord progression is how, when properly voice led, the voicings fall smoothly down the scale. Try it for yourself. Pick any starting voicing of Cmaj7, and this miniature version of Glorious Voice Leader will fill in the rest of the progression:

Explore this chord progression in Glorious Voice Leader

At every transition in the progression, the root and third of the current chord stay where they are, but become the fifth and seventh of the following chord. The fifth and seventh move down a scale degree and become the root and third of the next chord.

Compare that to the same chord progression, but harmonizing each scale note as a triad, rather than a four-note seventh chord:

Explore this chord progression in Glorious Voice Leader

Without the gravity of the added seventh to pull it down, the progression tends to rise upwards. The third of the current chord moves up to become the root of the next chord, and the fifth moves up to become the third of the next chord. Only the root stays stationary, becoming the fifth of the next chord.

If we look closely, there isn’t much difference in the voice movement between the seventh chord and triad versions of these progressions. In fact, of the voices that move, there may be more total movement in the seventh chords. However, the two stationary voices help make the seventh chords feel more cohesive and interlocked.
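The voice movements described above can be checked mechanically. The sketch below is a hypothetical illustration, not how Glorious Voice Leader works: it treats notes as pitch classes of the C major scale (ignoring octaves and actual guitar voicings), builds chords on roots a diatonic fourth apart, and verifies for every chord change that the seventh chords keep two voices (root and third become the new fifth and seventh) while the fifth and seventh step down, and that in the triad version only the root stays (as the new fifth) while the third and fifth step up.

```python
# Hypothetical sketch: pitch classes only, no octaves or fretboard positions.
SCALE = ["C", "D", "E", "F", "G", "A", "B"]
# Quality of the diatonic seventh chord built on each degree of C major.
SEVENTH_QUALITIES = ["maj7", "m7", "m7", "maj7", "7", "m7", "m7b5"]


def stack_thirds(degree, size):
    """Stack `size` diatonic thirds on a 0-based degree: root, third, fifth[, seventh]."""
    return [SCALE[(degree + 2 * i) % 7] for i in range(size)]


def roots_in_fourths(start=0, length=8):
    """Scale degrees a diatonic fourth apart (three scale steps up each time)."""
    return [(start + 3 * i) % 7 for i in range(length)]


def step(note, direction):
    """Move a note one diatonic step up (+1) or down (-1) within the scale."""
    return SCALE[(SCALE.index(note) + direction) % 7]


def seventh_change_ok(current, following):
    root, third, fifth, seventh = current
    n_root, n_third, n_fifth, n_seventh = following
    return (root == n_fifth and third == n_seventh    # two voices stay put
            and step(fifth, -1) == n_root             # fifth falls to the new root
            and step(seventh, -1) == n_third)         # seventh falls to the new third


def triad_change_ok(current, following):
    root, third, fifth = current
    n_root, n_third, n_fifth = following
    return (root == n_fifth                           # only the root stays put
            and step(third, +1) == n_root             # third rises to the new root
            and step(fifth, +1) == n_third)           # fifth rises to the new third


def all_changes_ok(chords, check):
    """Apply a voice-leading check to every consecutive pair of chords."""
    return all(check(a, b) for a, b in zip(chords, chords[1:]))


if __name__ == "__main__":
    degrees = roots_in_fourths()
    sevenths = [stack_thirds(d, 4) for d in degrees]
    triads = [stack_thirds(d, 3) for d in degrees]
    print(" -> ".join(SCALE[d] + SEVENTH_QUALITIES[d] for d in degrees))
    print("seventh chords follow the pattern:", all_changes_ok(sevenths, seventh_change_ok))
    print("triads follow the pattern:", all_changes_ok(triads, triad_change_ok))
```

Running it prints Cmaj7 -> Fmaj7 -> Bm7b5 -> Em7 -> Am7 -> Dm7 -> G7 -> Cmaj7, followed by True for both checks. Working with pitch classes keeps the check independent of any particular voicing, which is also why it cannot say anything about register or string choice.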

This chord progression is a world unto itself, and can act as a jumping-off point into almost every area of music theory and study. I found the voice leading in these chords so fascinating that I dedicated hundreds of hours of my life to building Glorious Voice Leader, a tool designed to help you study and explore voice leading on the guitar.

What does this progression inspire in you?


July 19, 2020

Derek Jones (derek-jones)

Effort estimation’s inaccurate past and the way forward July 19, 2020 10:07 PM

Almost since people started building software systems, effort estimation has been a hot topic for researchers.

Effort estimation models are necessarily driven by the available data (the Putnam model is one of the few whose theory is based on more than arm waving). General information about source code can often be obtained (e.g., size in lines of code), and before packaged software and open source, software with roughly the same functionality was being implemented in lots of organizations.

Estimation models based on source code characteristics proliferated, e.g., COCOMO. What these models overlooked was human variability in implementing the same functionality (a standard deviation that is 25% of the actual size is going to introduce a lot of uncertainty into any effort estimate), along with the more obvious assumption that effort was closely tied to source code characteristics.
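To make the 25% figure concrete, here is a rough sketch of how size uncertainty carries through a size-driven model. It assumes Basic COCOMO's organic-mode equation (effort = 2.4 * KLOC^1.05 person-months) and a hypothetical 50 KLOC system; neither choice comes from the post, and fuller COCOMO variants multiply in further cost drivers. Because the exponent is close to 1, a one-standard-deviation band on size (37.5 to 62.5 KLOC) turns into an effort band spanning roughly a factor of 1.7.

```python
# Assumption: Basic COCOMO, organic mode: effort (person-months) = A * KLOC ** B.
A, B = 2.4, 1.05


def cocomo_effort(kloc):
    """Effort in person-months predicted by Basic COCOMO (organic mode)."""
    return A * kloc ** B


def effort_band(expected_kloc, sd_fraction=0.25, k=1.0):
    """Effort at the expected size minus/plus k standard deviations of size."""
    sd = sd_fraction * expected_kloc
    low = max(expected_kloc - k * sd, 0.0)  # size cannot be negative
    return cocomo_effort(low), cocomo_effort(expected_kloc + k * sd)


if __name__ == "__main__":
    size = 50.0  # hypothetical expected size in KLOC
    print(f"point estimate: {cocomo_effort(size):.0f} person-months")
    for k in (1, 2):
        low, high = effort_band(size, k=k)
        print(f"+/-{k} sd of size: {low:.0f} to {high:.0f} person-months (x{high / low:.1f})")
```

The ratio of the band edges depends only on the size spread and the exponent, (1.25 / 0.75)^1.05, so the choice of 50 KLOC does not change the conclusion.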

The advent of high-tech clueless button pushing machine learning created a resurgence of new effort estimation models; actually they are estimation adjustment models, because they require an initial estimate as one of the input variables. Creating a machine learned model requires a list of estimated/actual values, along with any other available information, to build a mapping function.

The sparseness of the data to learn from (at most a few hundred observations of half-a-dozen measured variables, and usually less) has not prevented a stream of puffed-up publications making all kinds of unfounded claims.

Until a few years ago the available public estimation data did not include any information about who made the estimate. Once estimation data contained the information needed to distinguish the different people making estimates, the uncertainty introduced by human variability was revealed (some consistently underestimating, others consistently overestimating, with 25% difference between two estimators being common, and a factor of two difference between some pairs of estimators).

How much accuracy is it realistic to expect with effort estimates?

At the moment we don’t have enough information on the software development process to be able to create a realistic model; without a realistic model of the development process, it’s a waste of time complaining about the availability of information to feed into a model.

I think a project simulation model is the only technique capable of creating a good enough model for use in industry; something like Abdel-Hamid’s tour de force PhD thesis (he also ignores my emails).

We are still in the early stages of finding out the components that need to be fitted together to build a model of software development, e.g., round numbers.

Even if all attempts to build such a model fail, there will be payback from a better understanding of the development process.

Ponylang (SeanTAllen)

Last Week in Pony - July 19, 2020 July 19, 2020 09:53 PM

We have some nice improvements to the website FAQ and corral documentation. RFC 67 has been approved and implemented!

July 18, 2020

Andrew Owen (yumaikas)

Art Challenge: First 8 days July 18, 2020 10:40 PM


Emily came across an art challenge on Pinterest, and suggested that we could both do each prompt for it.

An art challenge that lists out 30 days of art prompts

Her medium of preference is pencil and ink, and mine is pixel art.

Day 1: Bones


A picture of a squirrel skull with some acorns and oak leaves behind it


A picture of a skull and crossbones

Day 2: Exotic Pet


An ink sketch of a long-necked tortoise


A pixel art image of a hedgehog, named Pokey

Day 3: Something with two heads


A picture of a two headed rat


A picture of a yellow lizard in two scenes, back to back against a window, the right scene is sunny and the lizard looks a little cheery, the left scene has rain and clouds in the background, and the lizard looks glum

Day 4: Flowers


An ink picture of a flower


A pixel art image of 3 yellow flowers in a porch planter, with some flowers on a trellis behind the planter attached to some ivy

Day 5: Childhood Toy


An ink picture of a lego minifigure with a blank face


A pixel art rendering of a blue and yellow toy plane from 3 angles, front, top and side, with the final corner having a cloud

Day 6: Eyeball


An ink picture of an eyeball that has a wick, and is melting, like a candle. It looks mad at its predicament


A pixel art picture of an eyeball with eyelids. It is a little unsettling due to being mostly disembodied

Day 7: Crystals


An ink picture of a crystal formation that is growing a mushroom, and some small plants


An animation of a lumpy green mass cracking. As the cracks progress, a crystal heart is revealed, and the text

Day 8: Something from the sea


An ink picture of a ray that has a lot of cool repeating patterns


A pixel

Postscript so far

Funnily enough, Emily’s been able to finish her pictures much more quickly than I have. I suppose I let the pixels give me an excuse to be fussy. It’s been a good way to practice working with Aseprite, and to be creative and let out some of the visuals I’ve had in my head for years, but have never expressed.

July 16, 2020

Gokberk Yaltirakli (gkbrk)

Status update, July 2020 July 16, 2020 09:00 PM

This has been a fast and chaotic month for my life and career, and a rather slow month for my blog and personal projects. I severely underestimated the effort it takes to pack up all my belongings while figuring out everything about my future employment. Because of this, the status update is both later than I intended, and has less content.

I added a search box to my website. You should be able to find it on the right hand side. I am not using a custom solution for search for now, so the search will just redirect you to DuckDuckGo.

I pushed a breaking update for JustIRC. JustIRC is one of the first Python modules I’ve written. It was created while I was just learning about network protocols and socket programming. These two facts combined resulted in an API design that was less than ideal.

A new version that fixed a lot of the problems with the API was pushed. Additionally, documentation of the module was greatly improved. Some missing functionality, such as TLS support, was added. These improvements should make it much easier to use the library in new projects.

On the kernel side, things are mostly quiet aside from some bugfixes and internal changes.

A filesystem API was introduced. This API is implemented by TarFS, and will be implemented by any future filesystems. This abstraction over file operations will mainly be used for the upcoming Virtual File System implementation. A VFS will allow the kernel to mount multiple filesystems into the same file tree, and open the path for some cool FS tricks such as exposing devices and kernel internals as files and directories.

The kernel’s dynamic memory allocator was switched from a best-fit allocator to an exact-fit allocator, which made the memory system more flexible. The previous allocator created fixed-size memory blocks of several sizes, up to a maximum size. This meant that larger allocations would fail, and any blocks that went unused were wasted memory.

The new allocator handles allocations of the same size much better, and the odd large allocation (like a framebuffer) does not waste any unnecessary memory. Along with the allocation algorithm, some bugs related to memory alignment were fixed. All the work on the memory allocator improved the system stability a lot.
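
A toy Python model may make the contrast with fixed-size blocks clearer. This is a sketch under assumptions, not the kernel's allocator: `ExactFitHeap` is a hypothetical name, the heap is one contiguous range of integer addresses whose base is already aligned, and every request is rounded up to a 16-byte alignment. It prefers a free block that fits the rounded request exactly, otherwise splits the first larger block, and merges adjacent free blocks on release, so a single large request (like a framebuffer) only consumes what it asks for.

```python
from __future__ import annotations


class ExactFitHeap:
    """Toy allocator over the address range [base, base + size)."""

    def __init__(self, base: int, size: int, alignment: int = 16):
        # Assumes `base` is already a multiple of `alignment`.
        self.alignment = alignment
        # Free list of (address, length) pairs, kept sorted by address.
        self.free: list[tuple[int, int]] = [(base, size)]
        self.used: dict[int, int] = {}  # address -> allocated length

    def _align(self, n: int) -> int:
        # Round n up to the next multiple of the alignment.
        return (n + self.alignment - 1) // self.alignment * self.alignment

    def alloc(self, n: int) -> int | None:
        need = self._align(n)
        candidates = [i for i, (_, length) in enumerate(self.free) if length >= need]
        if not candidates:
            return None  # no free block is large enough
        # Prefer a block of exactly the right size; otherwise take the first larger one.
        exact = [i for i in candidates if self.free[i][1] == need]
        i = exact[0] if exact else candidates[0]
        addr, length = self.free.pop(i)
        if length > need:
            # Split: hand out exactly `need` bytes and keep the tail free.
            self.free.insert(i, (addr + need, length - need))
        self.used[addr] = need
        return addr

    def free_block(self, addr: int) -> None:
        length = self.used.pop(addr)
        self.free.append((addr, length))
        self.free.sort()
        # Coalesce adjacent free blocks so later large requests can succeed.
        merged: list[tuple[int, int]] = []
        for a, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == a:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((a, l))
        self.free = merged
```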

My privacy and browser hardening extension, browser-harden, got some fixes that allow certain websites to work. Some JS-heavy frameworks were overriding many browser APIs and interacting badly with the extension. The issues I came across were fixed, and the fix was pushed to Firefox Add-ons.

In order to learn desktop GUI programming, and to get familiar with the GTK framework, I made a simple GTK app in Python. It is an imageboard viewer in a single file.

That’s all for this month, thanks for reading!

Caius Durling (caius)

Tailscale, RFC1918, and DNS Rebinding Protection July 16, 2020 08:00 PM

Edit: Originally this post was written to be a workaround for Tailscale routing all DNS traffic over its own link when you configured it to push out existing DNS Server IPs. This turned out to be a bad assumption on my part. Thanks to apenwarr for helping me understand that shouldn’t be the case, and encouraging me to debug it properly rather than making assumptions.

Naturally it turned out to be a PEBKAC. I’d pushed out as the DNS Server IP which is a nameserver rather than a forwarder. This in turn meant people were getting empty answers back to DNS queries, which stopped once they quit tailscale. (Go figure, Tailscale removes the resolver from the network stack when it quits.) The post has been updated to remove that invalid assumption. 🤦🏻‍♂️

Imagine we have a fleet of machines sat in a private network somewhere on a IP range, with entries pointing at them published on public DNS servers. Eg, dig +short workhorse.fake.tld returns

Initially this all works swimmingly, until someone comes along who is using a DNS forwarder with DNS rebinding protection enabled. Daniel Miessler has a wonderfully succinct explanation of DNS rebinding attacks on his blog; in short, to protect against them, your resolver refuses to return answers from public DNS servers that resolve to IP addresses within the standard internal network ranges (i.e., RFC 1918).

This means those users can successfully connect to our Tailscale network and access everything directly by IP, but can’t access any of the internal infrastructure by hostname; e.g., dig +short workhorse.fake.tld will return an empty answer for them.
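
The idea behind rebinding protection can be sketched with Python's `ipaddress` module. This is not how any particular forwarder implements it: `filter_answers` and its `allowed_domains` allow-list are hypothetical, and only the three RFC 1918 ranges are checked (real forwarders may block other ranges too).

```python
from __future__ import annotations

import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def filter_answers(name: str, answers: list[str], allowed_domains: set[str]) -> list[str]:
    """Drop RFC 1918 addresses from public answers unless `name` is allow-listed."""
    host = name.rstrip(".").lower()
    # Domains where internal answers are expected pass through untouched.
    if any(host == d or host.endswith("." + d) for d in allowed_domains):
        return answers
    return [a for a in answers
            if not any(ipaddress.ip_address(a) in net for net in RFC1918)]
```

Without an allow-list, the private answer for workhorse.fake.tld is dropped, which matches the empty answer those users saw; allowing `fake.tld` lets it through.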

Once we had figured out the root cause, we saw two workarounds: either run a DNS forwarder within our own infrastructure, or get all our staff to change their home DNS settings and hope they never end up on a locked-down network again.

We chose the former. Thankfully dnsmasq is really easy to configure this way, and we already have a node acting as the Tailscale subnet relay, so we dropped the following config into /etc/dnsmasq.conf on there:

# Only listen for requests from VPN/local for debugging
# Google DNS
# Quad9
# Cloudflare
# Race all servers to see which wins
# Try and stop DNS rebinding, except where we expect it to happen
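
The “race all servers” comment refers to querying every upstream at once and using whichever answers first. The Python sketch below shows that pattern with `asyncio`; `query_upstream` is a hypothetical stand-in that simulates latency instead of sending real DNS packets, and the upstream labels are placeholders rather than server addresses.

```python
from __future__ import annotations

import asyncio


async def query_upstream(server: str, name: str, delay: float) -> tuple[str, str]:
    """Hypothetical stand-in for sending one DNS query to one upstream."""
    await asyncio.sleep(delay)  # simulated network latency
    return server, f"answer for {name} from {server}"


async def race(name: str, upstreams: dict[str, float]) -> tuple[str, str]:
    # Send the same query to every upstream at once.
    tasks = [asyncio.create_task(query_upstream(s, name, d))
             for s, d in upstreams.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Keep the first reply and abandon the slower queries. A real forwarder
    # would also skip failed replies; this sketch does not.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()
```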

One quick Puppet run later, our Tailscale subnet relays were happily running both tailscale and dnsmasq, serving out answers as fast as they could to other Tailscale nodes. We added port 53 to the Tailscale ACL and away we went.

Unrelenting Technology (myfreeweb)

Wow. micro HDMI is the worst connector ever. (well, at least this... July 16, 2020 05:00 PM

Wow. micro HDMI is the worst connector ever.

(well, at least this particular adapter is terrible.. or the Pi 4 grabs too hard?)

Joe Nelson (begriffs)

Create impeccable MIME email from markdown July 16, 2020 12:00 AM

The goal

I want to create emails that look their best in all mail clients, whether graphical or text based. Ideally I’d write a message in a simple format like Markdown, and generate the final email from the input file. Additionally, I’d like to be able to include fenced code snippets in the message, and make them available as attachments.


I created a utility called mimedown that reads markdown through stdin and prints multipart MIME to stdout.

Let’s see it in action. Here’s an example message:

## This is a demo email with code

Hey, does this code look fishy to you?

#include <stdio.h>

int main(void)
{
	char a[] = "string literal";
	char *p  = "string literal";

	/* capitalize first letter */
	p[0] = a[0] = 'S';
	printf("a: %s\np: %s\n", a, p);
	return 0;
}

It blows up when I compile it and run it:

$ cc -std=c99 -pedantic -Wall -Wextra crash.c -o crash
$ ./crash
Bus error: 10

Turns out we're invoking undefined behavior.

* The C99 spec, appendix J.2 Undefined Behavior mentions this case:
  > The program attempts to modify a string literal (6.4.5).
* Steve Summit's C FAQ [question 1.32](
  covers the difference between an array initialized with string literal vs a
  pointer to a string literal constant.
* The SEI CERT C Coding standard
  demonstrates the problem with non-compliant code, and compares with compliant

After running it through the generator and emailing it to myself, here’s how the result looks in the Fastmail web interface:

rendered in fastmail

Notice how the code blocks are displayed inline and are available as attachments with the correct MIME type.

I intentionally haven’t configured Mutt to render HTML, so it falls back to the text alternative in the message, which also looks good. Notice how the message body is interleaved with Content-Disposition: inline attachments for each code snippet.

code and text in Mutt

The email generator also creates references for external urls. It substitutes the urls in the original body text with references, and consolidates the links into a bibliography of type text/uri-list at the end of the message. Here’s another Mutt screenshot of the end of the message, with red circles added.

links as references
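
The URL-to-reference substitution can be sketched in Python. This is an illustration, not mimedown's code (mimedown itself is written in C); the regular expression, the `[n]` marker style, the `# [n]` comment lines in the uri-list, and the name `extract_references` are all assumptions.

```python
from __future__ import annotations

import re

# Deliberately simple URL pattern; a Markdown-aware tool would take link
# targets from the parser instead of scanning raw text.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")
TRAILING_PUNCTUATION = ".,;:!?"


def extract_references(body: str) -> tuple[str, str]:
    """Replace each distinct URL with [n]; return (new_body, text/uri-list body)."""
    urls: list[str] = []

    def substitute(match: re.Match) -> str:
        url, trailing = match.group(0), ""
        # Sentence punctuation right after a URL is not part of it.
        while url[-1] in TRAILING_PUNCTUATION:
            url, trailing = url[:-1], url[-1] + trailing
        if url not in urls:
            urls.append(url)
        return f"[{urls.index(url) + 1}]{trailing}"

    new_body = URL_RE.sub(substitute, body)
    # text/uri-list (RFC 2483): one URI per line with CRLF endings;
    # lines starting with '#' are comments.
    uri_list = "".join(f"# [{i}]\r\n{url}\r\n" for i, url in enumerate(urls, 1))
    return new_body, uri_list
```

Repeated URLs share one reference number, so the bibliography lists each link only once.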

The generated MIME structure of our sample message looks like this:

  I     1 <no description>          [multipa/alternativ, 7bit, 3.1K]
  I     2 ├─><no description>            [multipa/mixed, 7bit, 1.7K]
  I     3 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.1K]
  I     4 │ ├─>crash.c                 [text/x-c, 7bit, utf-8, 0.2K]
  I     5 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.1K]
  I     6 │ ├─>compile.txt           [text/plain, 7bit, utf-8, 0.1K]
  I     7 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.5K]
  I     8 │ └─>references.uri     [text/uri-list, 7bit, utf-8, 0.2K]
  I     9 └─><no description>         [text/html, 7bit, utf-8, 1.3K]

At the outermost level, the message is split into two alternatives: HTML and multipart/mixed. Within the multipart/mixed part is a succession of message text and code snippets, all with inline disposition. The final mixed item is the list of referenced urls (if necessary).
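
Python's standard `email` package can produce the same nesting, which may help make the layout concrete. This is a sketch, not mimedown (which is written in C): `inline_part` and `build_message` are hypothetical helpers, and the filenames mirror the example message.

```python
from __future__ import annotations

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def inline_part(text: str, subtype: str, filename: str | None = None) -> MIMEText:
    # MIMEText picks us-ascii (7bit) when the text allows it, otherwise utf-8.
    part = MIMEText(text, subtype)
    params = {"filename": filename} if filename else {}
    part.add_header("Content-Disposition", "inline", **params)
    return part


def build_message(pieces: list[tuple[str, str, str | None]],
                  urls: list[str], html: str) -> MIMEMultipart:
    """`pieces` is an ordered list of (text, subtype, filename) tuples.

    Prose uses subtype "plain" and no filename; code snippets use e.g. "x-c".
    """
    mixed = MIMEMultipart("mixed")
    for text, subtype, filename in pieces:
        mixed.attach(inline_part(text, subtype, filename))
    if urls:
        # text/uri-list uses CRLF line endings (RFC 2483).
        mixed.attach(inline_part("\r\n".join(urls) + "\r\n", "uri-list", "references.uri"))

    message = MIMEMultipart("alternative")
    message.attach(mixed)                   # plain-text alternative first...
    message.attach(MIMEText(html, "html"))  # ...HTML last, as the preferred rendering
    return message
```

Ordering matters here: in multipart/alternative the last part is the richest one, so clients that can render HTML pick it, and text-only clients such as a non-HTML Mutt setup fall back to the multipart/mixed part.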

Other niceties

Lines of the message body are re-flowed to at most 72 characters, to conform to historical length constraints. Additionally, to accommodate narrow terminal windows, mimedown uses a technique called format=flowed. This is a clever standard (RFC 3676) which adds trailing spaces to any lines that we would like the client reader to re-flow, such as those in paragraphs.
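
Here is a minimal Python sketch of format=flowed generation and of the re-flowing a client performs. It assumes one paragraph at a time and `DelSp=no` (the RFC 3676 default, where the trailing space is kept when lines are joined), and it handles only the space-stuffing cases noted in the comments; `to_flowed` and `unflow` are hypothetical names.

```python
import textwrap


def to_flowed(paragraph: str, width: int = 72) -> str:
    """Wrap one paragraph as format=flowed text with CRLF line endings.

    Every line except the last ends in a space (a soft break that a
    flowed-aware reader may rejoin); the last line ends in a hard break.
    """
    # width - 1 leaves room for the trailing space; long words such as URLs
    # are never split, so those lines may exceed the limit.
    lines = textwrap.wrap(paragraph, width=width - 1,
                          break_long_words=False, break_on_hyphens=False)
    out = []
    for i, line in enumerate(lines):
        # RFC 3676 requires space-stuffing lines that begin with a space or
        # '>'; "From " lines are commonly stuffed too, to survive mbox files.
        if line.startswith((" ", ">", "From ")):
            line = " " + line
        out.append(line + " " if i < len(lines) - 1 else line)
    return "\r\n".join(out)


def unflow(text: str) -> str:
    """What a flowed-aware client does: rejoin soft-broken lines."""
    result, current = [], ""
    for line in text.split("\r\n"):
        if line.startswith(" "):
            line = line[1:]  # undo space-stuffing
        if line.endswith(" "):
            current += line  # soft break: keep the space, join with the next line
        else:
            result.append(current + line)
            current = ""
    if current:
        result.append(current)
    return "\n".join(result)
```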

Neither hard wrapping nor format=flowed is applied to code block fences in the original markdown. Code snippets are turned into verbatim attachments and won’t be mangled.

Finally, the HTML version of the message is tasteful and conservative. It should display properly on any HTML client, since it validates with ISO HTML (ISO/IEC 15445:2000, based on HTML 4.01 Strict).

Try it yourself

Clone it here: It’s written in portable C99. The only build dependency is the cmark library for parsing markdown.

July 13, 2020

Pete Corey (petecorey)

Suggesting Chord Names with Glorious Voice Leader July 13, 2020 12:00 AM

Glorious Voice Leader, my chord-obsessed side project, now has the ability to turn a collection of notes played on the guitar fretboard into a list of possible chord names. Deciding on a specific chord name is still a very human, very context dependent task, but we can let the computer do a lot of the heavy lifting for us.

I’ve included a simplified version of this chord namer to the left. Feel free to click on the frets to enter any guitar chord you’d like the name of. Glorious Voice Leader will crunch the numbers and come up with a list of possible names that exactly describe the chord you’ve entered, sorted alphabetically.

In the full-fledged Glorious Voice Leader application, this functionality is accessible by simply clicking on the fretboard without first selecting the name of the chord you want. This felt like an intuitive design decision. You might know the shape of a specific chord you want to play in a progression, but you’re not sure of its name.

Enter it into the fretboard and Glorious Voice Leader will give you a corresponding list of names. When you click on one of those names, it’ll automatically suggest alternative voicings that voice lead smoothly from the previous chord.

The actual code behind this feature is dead simple. We simply filter over our set of all possible chord roots and qualities, and compare the set of notes in each resulting chord with the set of notes entered by the user:

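// Assumed shapes: roots maps note names to pitch classes (e.g. { C: 0, ... }),
// each quality has a `name` and a `quality` array of semitone offsets,
// chord.notes holds [string, fret] pairs, and tuning[string] is a note number.
let possibleNames = _.chain(qualities)
  .flatMap(quality => _.map(_.keys(roots), root => ({ root, quality })))
  .filter(({ root, quality }) => {
    if (_.isEmpty(chord.notes)) {
      return false;
    }
    // Pitch classes (0-11) played on the fretboard.
    let chordNotes = _.chain(chord.notes)
      .map(([string, fret]) => (tuning[string] + fret) % 12)
      .uniq()
      .sortBy()
      .value();
    // Pitch classes implied by this root and quality.
    let qualityNotes = _.chain(quality.quality)
      .map(note => (roots[root] + note) % 12)
      .uniq()
      .sortBy()
      .value();
    return _.isEqual(chordNotes, qualityNotes);
  })
  .map(({ root, quality }) => `${root}${quality.name}`)
  .value();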

From there we simply present the list of possible chord names to the user in some meaningful or actionable way.
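
The same set comparison can be sketched in Python for readers who don't use lodash. The root spellings, the small quality table, and the standard-tuning note numbers are illustrative assumptions, not Glorious Voice Leader's actual data, and `possible_names` is a hypothetical name.

```python
# Pitch class of each root name, with C = 0 (sharps only, for brevity).
ROOTS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
         "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

# Illustrative quality table: name suffix -> semitone offsets from the root.
QUALITIES = {
    "": [0, 4, 7],              # major triad
    "m": [0, 3, 7],
    "7": [0, 4, 7, 10],
    "maj7": [0, 4, 7, 11],
    "m7": [0, 3, 7, 10],
    "m9 no 1": [3, 7, 10, 14],  # minor ninth with the root omitted
}

# Standard tuning as MIDI note numbers, low E string first.
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]


def played_pitch_classes(frets, tuning=STANDARD_TUNING):
    """Set of pitch classes (0-11) for a list of (string, fret) pairs."""
    return {(tuning[string] + fret) % 12 for string, fret in frets}


def possible_names(frets, tuning=STANDARD_TUNING):
    """Every root + quality whose pitch-class set equals the notes played."""
    played = played_pitch_classes(frets, tuning)
    if not played:
        return []
    names = [
        f"{root}{suffix}"
        for suffix, offsets in QUALITIES.items()
        for root, root_pc in ROOTS.items()
        if {(root_pc + offset) % 12 for offset in offsets} == played
    ]
    return sorted(names)
```

For the shape x32000, both Cmaj7 and Am9 no 1 come back, sorted alphabetically.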

For future work, it would be nice to sort the list of name suggestions by the lowest note entered on the fretboard. For example, if the user entered the notes C, E, G, and B in ascending order, we should sort the Cmaj7 suggestion before the Am9 no 1 suggestion. As with all of the items on my future work list, there are many subtleties and nuances here that would have to be addressed before this becomes a reality.
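
One simple heuristic for that ordering, sketched in Python under assumptions: candidates arrive as (name, root pitch class) pairs, standard tuning is given as MIDI note numbers, and roots are ranked by how far above the bass note they sit. It ignores the nuances mentioned above; `bass_pitch_class` and `order_by_bass` are hypothetical names.

```python
from __future__ import annotations

# Standard tuning as MIDI note numbers, low E string first (an assumption).
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]


def bass_pitch_class(frets: list[tuple[int, int]],
                     tuning: list[int] = STANDARD_TUNING) -> int:
    """Pitch class (0 = C) of the lowest-sounding note among (string, fret) pairs."""
    return min(tuning[string] + fret for string, fret in frets) % 12


def order_by_bass(candidates: list[tuple[str, int]], bass: int) -> list[str]:
    """Rank roots by semitones (mod 12) above the bass; ties sort alphabetically."""
    ranked = sorted(candidates, key=lambda c: ((c[1] - bass) % 12, c[0]))
    return [name for name, _ in ranked]
```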

I hope you find this helpful. If you find Glorious Voice Leader interesting or useful in any way, please let me know!

July 12, 2020

Derek Jones (derek-jones)

No replies to 135 research data requests: paper titles+author emails July 12, 2020 09:05 PM

I regularly email researchers referring to a paper of theirs I have read, and asking for a copy of the data to use as an example in my evidence-based software engineering book; of course their work is cited as the source.

Around a third of emails don’t receive any reply (a small number of replies ask why they should spend time sorting out the data for me, and I wrote a post to answer this question). If there is no reply after roughly 6 months, I follow up with a reminder saying that I am still interested in their data (maybe 15% respond). If the data looks really interesting, I might email again after 6-12 months (I have outstanding requests going back to 2013).

I put some effort into checking that a current email address is being used. Sometimes the work was done by somebody who has moved into industry, and if I cannot find what looks like a current address I might email their supervisor.

I have had replies to later emails, apologizing and saying that the first email was caught by their spam filter (I reduced the number of links in the email template to make it look less like spam). Sometimes the original email never percolated to the top of their to-do list.

There are around 135 unreplied email requests (the data was automatically extracted from my email archive and is not perfect); the list of papers is below (the title is sometimes truncated because of the extraction process).

Given that I have collected around 620 software engineering datasets (there are several ways of counting a dataset), another 135 would make a noticeable difference. I suspect that much of the data is now lost, but even 10 more datasets would be nice to have.

After the following list of titles is a list of the 254 author last known email addresses. If you know any of these people, please ask them to get in touch.

If you are an author of one of these papers: ideally send me the data, otherwise email to tell me the status of the data (I’m summarising responses, so others can get some idea of what to expect).

50 CVEs in 50 Days: Fuzzing Adobe Reader
A Change-Aware Per-File Analysis to Compile Configurable Systems
A Design Structure Matrix Approach for Measuring Co-Change-Modularity
A Foundation for the Accurate Prediction of the Soft Error
A Large Scale Evaluation of Automated Unit Test Generation Using
A large-scale study of the time required to compromise
A Large-Scale Study On Repetitiveness, Containment, and
Analysing Humanly Generated Random Number Sequences: A Pattern-Based
Analysis of Software Aging in a Web Server
Analyzing and predicting effort associated with finding & fixing
Analyzing CAD competence with univariate and multivariate
Analyzing Differences in Risk Perceptions between Developers
Analyzing the Decision Criteria of Software Developers Based on
An analysis of the effect of environmental and systems complexity on
An Empirical Analysis of Software-as-a-Service Development
An Empirical Comparison of Forgetting Models
An empirical study of the textual similarity between
An error model for pointing based on Fitts' law
An Evolutionary Study of Linux Memory Management for Fun and Profit
An examination of some software development effort and
An Experimental Survey of Energy Management Across the Stack
Anomaly Trends for Missions to Mars: Mars Global Surveyor
A Quantitative Evaluation of the RAPL Power Control System
Are Information Security Professionals Expected Value Maximisers?:
A replicated and refined empirical study of the use of friends in
A Study of Repetitiveness of Code Changes in Software Evolution
A Study on the Interactive Effects among Software Project Duration, Risk
Bias in Proportion Judgments: The Cyclical Power Model
Capitalization of software development costs
Configuration-aware regression testing: an empirical study of sampling
Cost-Benefit Analysis of Technical Software Documentation
Decomposing the problem-size effect: A comparison of response
Determinants of vendor profitability in two contractual regimes:
Diagnosing organizational risks in software projects:
Early estimation of users’ perception of Software Quality
Empirical Analysis of Factors Affecting Confirmation
Estimating Agile Software Project Effort: An Empirical Study
Estimating computer depreciation using online auction data
Estimation fulfillment in software development projects
Ethical considerations in internet code reuse: A
Evaluating Heuristics for Planning Effective and
Evaluating Pair Programming with Respect to System Complexity and
Evidence-Based Decision Making in Lean Software Project Management
Explaining Multisourcing Decisions in Application Outsourcing
Exploring defect correlations in a major Fortran numerical library
Extended Comprehensive Study of Association Measures for
Eye gaze reveals a fast, parallel extraction of the syntax of
Factorial design analysis applied to the performance of
Frequent Value Locality and Its Applications
Historical and Impact Analysis of API Breaking Changes:
How do i know whether to trust a research result?
How do OSS projects change in number and size?
How much is “about” ? Fuzzy interpretation of approximate
Humans have evolved specialized skills of
Identifying and Classifying Ambiguity for Regulatory Requirements
Identifying Technical Competences of IT Professionals. The Case of
Impact of Programming and Application-Specific Knowledge
Individual-Level Loss Aversion in Riskless and Risky Choices
Industry Shakeouts and Technological Change
Inherent Diversity in Replicated Architectures
Initial Coin Offerings and Agile Practices
Interpreting Gradable Adjectives in Context: Domain
Is Branch Coverage a Good Measure of Testing Effectiveness?
JavaScript Developer Survey Results
Knowledge Acquisition Activity in Software Development
Language matters
Learning from Evolution History to Predict Future Requirement Changes
Learning from Experience in Software Development:
Learning from Prior Experience: An Empirical Study of
Links Between the Personalities, Views and Attitudes of Software Engineers
Making root cause analysis feasible for large code bases:
Making-Sense of the Impact and Importance of Outliers in Project
Management Aspects of Software Clone Detection and Analysis
Managing knowledge sharing in distributed innovation from the
Many-Core Compiler Fuzzing
Measuring Agility
Mining for Computing Jobs
Mining the Archive of Formal Proofs.
Modeling Readability to Improve Unit Tests
Modeling the Occurrence of Defects and Change
Modelling and Evaluating Software Project Risks with Quantitative
Moore’s Law and the Semiconductor Industry: A Vintage Model
More Testers – The Effect of Crowd Size and Time Restriction in
Motivations for self-assembling into project teams
Networks, social influence and the choice among competing innovations:
Nonliteral understanding of number words
Nonstationarity and the measurement of psychophysical response in
Occupations in Information Technology
On information systems project abandonment
On the Positive Effect of Reactive Programming on Software
On Vendor Preferences for Contract Types in Offshore Software Projects:
Peer Review on Open Source Software Projects:
Parameter-based refactoring and the relationship with fan-in/fan-out
Participation in Open Knowledge Communities and Job-Hopping:
Pipeline management for the acquisition of industrial projects
Predicting the Reliability of Mass-Market Software in the Marketplace
Prototyping A Process Monitoring Experiment
Quality vs risk: An investigation of their relationship in
Quantitative empirical trends in technical performance
Reported project management effort, project size, and contract type.
Reproducible Research in the Mathematical Sciences
Semantic Versioning versus Breaking Changes
Software Aging Analysis of the Linux Operating System
Software reliability as a function of user execution patterns
Software Start-up failure An exploratory study on the
Spatial estimation: a non-Bayesian alternative
System Life Expectancy and the Maintenance Effort: Exploring
Testing as an Investment
The enigma of evaluation: benefits, costs and risks of IT in
The impact of size and volatility on IT project performance
The Influence of Size and Coverage on Test Suite
The Marginal Value of Increased Testing: An Empirical Analysis
The nature of the times to flight software failure during space missions
Theoretical and Practical Aspects of Programming Contest Ratings
The Performance of the N-Fold Requirement Inspection Method
The Reaction of Open-Source Projects to New Language Features:
The Role of Contracts on Quality and Returns to Quality in Offshore
The Stagnating Job Market for Young Scientists
Time Pressure — A Controlled Experiment of Test-case Development and
Turnover of Information Technology Professionals:
Unconventional applications of compiler analysis
Unifying DVFS and offlining in mobile multicores
Use of Structural Equation Modeling to Empirically Study the Turnover
Use Two-Level Rejuvenation to Combat Software Aging and
Using Function Points in Agile Projects
Using Learning Curves to Mine Student Models
Virtual Integration for Improved System Design
Which reduces IT turnover intention the most: Workplace characteristics
Why Did Your Project Fail?
Within-Die Variation-Aware Dynamic-Voltage-Frequency

Author emails (automatically extracted and manually checked to remove people who have replied on other issues; I hope I have caught them all).

Ponylang (SeanTAllen)

Last Week in Pony - July 12, 2020 July 12, 2020 03:05 PM

Sync audio for July 7 is available. RFC PR #175 is ready for a vote at the next sync meeting.

Andreas Zwinkau (qznc)

Crossing the Chasm July 12, 2020 12:00 AM

The book describes the dangerous transition from early adopters to an early majority market

July 11, 2020

Andrew Montalenti (amontalenti)

Learning about babashka (bb), a minimalist Clojure for building CLI tools July 11, 2020 06:25 PM

A few years back, I wrote Clojonic: Pythonic Clojure, which compares Clojure to Python, and concluded:

My exploration of Clojure so far has made me realize that the languages share surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course).

That said, as that article discussed, Clojure is a very different language than Python. As Rich Hickey, the creator of Clojure, put it in his “A History of Clojure”:

Most developers come to Clojure from Java, JavaScript, Python, Ruby and other OO languages. [… T]he most significant […] problem  [in adopting Clojure] is learning functional programming. Clojure is not multiparadigm, it is FP or nothing. None of the imperative techniques they are used to are available. That said, the language is small and the data structure set evident. Clojure has a reputation for being opinionated, opinionated languages being those that somewhat force a particular development style or strategy, which I will graciously accept as meaning the idioms are clear, and somewhat inescapable.

There is one area in which Clojure and Python seem to have a gulf between them, for a seemingly minor (but, in practice, major) technical reason. Clojure, being a JVM language, inherits the JVM’s slow start-up time, which hurts most for short-lived programs, as is common for UNIX CLI tools and scripts.

As a result, though Clojure is a relatively popular general purpose programming language — and, indeed, one of the most popular dynamic functional programming languages in existence — it is still notably unpopular for writing quick scripts and commonly-used CLI tools. But, in theory, this needn’t be the case!

If you’re a regular UNIX user, you probably have come across hundreds of scripts with a “shebang”, e.g. something like #!/usr/bin/env python3 at the top of Python 3 scripts or #!/bin/bash for bash scripts. But I bet you have rarely, perhaps never, come across something like #!/usr/bin/env java or #!/usr/bin/env clojure. It’s not that either of these is impossible or unworkable. No, they are simply unergonomic. Thus, they aren’t preferred.

The lack of ergonomics stems from a number of reasons inherent to the JVM, notably slow start-up time and complex system-level classpath/dependency management.

Given Clojure’s concision, readability, and dynamism, it might be a nice language for scripting and CLI tools, if we could only get around that slow start-up time problem. Could we somehow leverage the Clojure standard library and a subset of the Java standard library as a “batteries included” default environment, and have it all compiled into a fast-launching native binary?

Well, it turns out, someone else had this idea, and went ahead and implemented it. Enter babashka.


To quote the README:

Babashka is implemented using the Small Clojure Interpreter. This means that a snippet or script is not compiled to JVM bytecode, but executed form by form by a runtime which implements a sufficiently large subset of Clojure. Babashka is compiled to a native binary using GraalVM. It comes with a selection of built-in namespaces and functions from Clojure and other useful libraries. The data types (numbers, strings, persistent collections) are the same. Multi-threading is supported (pmap, future). Babashka includes a pre-selected set of Java classes; you cannot add Java classes at runtime.

Wow! That’s a pretty neat trick. If you install babashka — which is available as a native binary for Windows, macOS, and Linux — you’ll be able to run bb to try it out. For example:

$ bb
Babashka v0.1.3 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (+ 2 2)
4
user=> (println (range 5))
(0 1 2 3 4)
nil
user=> :repl/quit

And, the fast start-up time is legit. For example, here’s a simple “Hello, world!” in Clojure stored in hello.clj:

(println "Hello, world!")

Now compare:

$ multitime -n 10 -s 1 clojure hello.clj
        Mean        Std.Dev.    Min         Median      Max
user    1.753       0.090       1.613       1.740       1.954       
$ multitime -n 10 -s 1 bb hello.clj
        Mean        Std.Dev.    Min         Median      Max
user    0.004       0.005       0.000       0.004       0.012       

That’s a pretty big difference on my modern machine! That’s a median start-up time of 1.74 seconds using the JVM version, and a median start-up time of 0.004 seconds (four thousandths of a second, or 4 milliseconds) using bb, the Babashka version! The JVM version is over 400x slower (1.740 / 0.004 ≈ 435)!

How does this compare to Python?

$ multitime -n 10 -s 1 python3
        Mean        Std.Dev.    Min         Median      Max
user    0.012       0.004       0.006       0.011       0.018       

So, bb‘s start-up is as fast as, perhaps even a little faster than, Python 3. Pretty cool!
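
To repeat a comparison like this without multitime, a small Python harness could look like the sketch below. Note an assumption baked into it: it measures elapsed wall-clock time with `time.perf_counter`, whereas the `user` rows quoted above are user-mode CPU time, so the numbers are not directly comparable. `time_startup` is a hypothetical name.

```python
from __future__ import annotations

import statistics
import subprocess
import time


def time_startup(cmd: list[str], runs: int = 10) -> dict[str, float]:
    """Run `cmd` repeatedly and summarize elapsed wall-clock times in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output; fail loudly if it exits non-zero.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, `time_startup(["bb", "hello.clj"])` and `time_startup(["clojure", "hello.clj"])`, assuming both tools are installed and on the `PATH`.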

All that said, the creator of Babashka has said, publicly:

It’s not targeted at Python programmers or Go programmers. I just want to use Clojure. The target audience for Babashka is people who want to use Clojure to build scripts and CLI tools.

Fair enough. But, as Rich Hickey said, there can be really good reasons for Python, Ruby, and Go programmers to take a peek at Clojure. There are some situations in which it could really simplify your code or approach. Not always, but there are certainly some strengths. Here’s what Hickey had to say about it:

[New Clojure users often] find the amount of code they have to write is significantly reduced, 2–5x or more. A much higher percentage of the code they are writing is related to their problem domain.

Aside from being a useful tool for this niche, bb is also just a fascinating F/OSS research project. For example, the way it manages to pull off native binaries across platforms is via the GraalVM native-image facility. Studying GraalVM native-image is interesting in itself, but bb makes use of this facility and makes its benefit accessible to Clojure programmers without resorting to complex build toolchains.

With bb now stable, its creator took a stab at rewriting the clojure wrapper script itself in Babashka. That is, Clojure programmers may not have realized that when they invoke clojure on Linux, what’s really happening is that they are calling out to a bash script that then detects the local JVM and classpath, and then execs out to the java CLI for the JVM itself. On Windows, that same clojure wrapper script is implemented in PowerShell, pretty much by necessity, and serves the same purpose as the Linux bash script, but is totally different code. Well, now there’s something called deps.clj, which eliminates the need to use bash and PowerShell here, and uses Babashka-flavored Clojure code instead. See the deps.clj rationale in the README for more on that.

If you want a simple real-world example of a full-fledged Babashka-flavored Clojure program that does something useful at the command-line, you can take a look at clj-kondo, a simple command-line Clojure linter (akin to pyflakes or flake8 in the Python community), which is also by the same author.

Overall, Babashka is not just a really cool hack, but also a very useful tool in the Clojurist’s toolbelt. I’ve become a convert and evangelist, as well as a happy user. Congrats to Michiel Borkent on a very interesting and powerful piece of open source software!

Note: Some of my understanding of Babashka solidified when hearing Michiel describe his project at the Clojure NYC virtual meetup. The meeting was recorded, so I’ll update this blog post when the talk is available.

Gustaf Erikson (gerikson)

June July 11, 2020 04:45 PM

Telemedicine is the only light in the darkness of COVID

This pic was supposed to be part of a pictorial depicting one day in my life during Corona, but I got bored of the concept. I just added it here so I don’t have an embarrassing gap for June 2020.

Jun 2019 | Jun 2018 | Jun 2017 | Jun 2016 | Jun 2015 | Jun 2014 | Jun 2013 | Jun 2012 | Jun 2011 | Jun 2010 | Jun 2009

Gonçalo Valério (dethos)

Why you shouldn’t remove your package from PyPI July 11, 2020 11:26 AM

Nowadays most software developed in Python relies on external packages (dependencies) to get the job done. Managing this “supply chain” correctly is very important and has a big impact on the end product.

As a developer you should be cautious about the dependencies you include in your project, as I explained in a previous post, but you always depend on the work done by the maintainers of those packages.

As a public package owner/maintainer, you also have to be aware that the code you write, your decisions and your actions will have an impact on the projects that depend directly or indirectly on your package.

With this short introduction we arrive at the topic of this post: “What should a maintainer do when they no longer want to support a given package?” or “How do I properly rename my package?”

In both of these situations you might think, “I will start by removing the package from PyPI.” I hope the following lines convince you that this is the worst thing you can do, for two reasons:

  • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
  • You will free the namespace for others to use and if your package is popular enough this might become a juicy target for any malicious actor.

TL;DR: you will screw your “users”.

The left-pad incident, although it didn’t happen in the Python ecosystem, is a well-known example of the first point and shows what happens when a popular package is removed from the public index.

Malicious actors usually register packages with names similar to those of popular packages, hoping that users will install them by mistake; this has already been found on PyPI multiple times. Now imagine a package name that other projects already trust suddenly becoming available.

What should you do, then?

Just don’t delete the package.

I admit that on some rare occasions it might be required, but most of the time the best thing to do is to leave it there (especially for open-source packages).

Adding a warning to the code and informing the users in the README file that the package is no longer maintained or safe to use is also a nice thing to do.
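
The deprecation warning (step 3 in the list below) can be as small as a single `warnings.warn` call. The sketch below uses hypothetical wording and wraps the call in a hypothetical `warn_deprecated` function so it can be exercised directly; in a real package the call would usually sit at module level in the old package's `__init__.py`, so it fires on import.

```python
import warnings


def warn_deprecated() -> None:
    """Emit the kind of notice an abandoned package can show its users."""
    warnings.warn(
        # Hypothetical wording, modelled on the rename discussed in this post.
        "model_mommy is no longer maintained. Please use model_bakery instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this line
    )
```

Since Python 3.7 (PEP 565), a `DeprecationWarning` is shown by default only when triggered directly by code in `__main__`; test runners such as pytest display them, which is why a warning like this tends to surface when a project's test suite runs.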

A good example of this process done properly was the renaming of model-mommy to model-bakery; as a user, I found it painless. Here’s an overview of the steps they took:

  1. A new source code repository was created with the same contents. (This step is optional)
  2. After doing the required changes a new package was uploaded to PyPI.
  3. Deprecation warnings were added to the old code, mentioning the new package.
  4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
  5. A new release of the old package was created, so that users would see the deprecation warnings.
  6. All further development was done on the new package.
  7. The old code repository was archived.
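
Step 3 can be as small as a module-level warning in the old package’s __init__.py. The sketch below assumes hypothetical package names (old_pkg renamed to new_pkg); it illustrates the idea and is not the code model-mommy actually shipped.

```python
# old_pkg/__init__.py (hypothetical package; old_pkg/new_pkg are placeholder names)
import warnings

DEPRECATION_MESSAGE = (
    "Important: old_pkg is no longer maintained. Please use new_pkg instead."
)

# Module code runs once per process, on the first import of old_pkg.
# stacklevel=2 attributes the warning to the module doing the import
# rather than to this file.
warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
```

Python hides DeprecationWarning by default unless it is attributed to __main__, but test runners such as pytest display it, which is why a warning like this tends to surface in a project’s test output.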

So here is what is shown every time the test suite of an affected project is executed:

/lib/python3.7/site-packages/model_mommy/ DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead:

In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

July 10, 2020

Robin Schroer (sulami)

Keyboardio Atreus Review July 10, 2020 12:00 AM

I recently received my early bird Keyboardio Atreus from the Kickstarter and have now been using it for about three weeks, so I am writing a review for folks considering buying one after release.

(Promotional photo courtesy of Keyboardio)

A Bit of History

Most of this is also outlined on the Atreus website, but here is the short version: my colleague Phil Hagelberg designed the original Atreus keyboard in 2014, and has been selling kits for self-assembly ever since.

In 2019 Keyboardio, the company which created the Model 01, got together with Phil to build a pre-assembled commercial version of the Atreus. Their Kickstarter ran earlier in 2020 and collected almost $400k.

Phil’s original 42-key version can be built with either a PCB or completely hand-wired, and uses a wooden, acrylic, or completely custom (e.g. 3D-printed) case.

Keyboardio split each of the two larger thumb keys into two regular-size keys, bringing the total up to 44, and uses a PCB and Cherry MX-style switches mounted on an aluminium plate inside a black ABS case.


At first impression, it is incredibly small (dimensions taken from the product page: 24.3 × 10 × 2.8 cm, 310 g), noticeably smaller even than the small Apple Magic Keyboard. At the same time, it uses regular key spacing, so once your hands are in place it does not feel cramped at all. On the contrary, every time I use a different keyboard now, I feel that half the keys are too far away to reach comfortably. It is also flat enough that I can use it without a wrist rest.

Mine has Kailh Speed Copper switches, which require 40g of force to actuate, with very early actuation points. They are somewhat comparable to Cherry MX Browns without the dead travel before the tactile bump. As mentioned above, the switches are mounted on an aluminium plate, and can be swapped without disassembly.

The early actuation point of the switches does require some getting used to. I keep experiencing some key chatter, especially on my weaker fingers, though Jesse from Keyboardio is working hard on alleviating that.

When it comes to noise, you can hear that it is a mechanical keyboard. Even with relatively quiet switches, the open construction means that the sound of the keys being released is audible in most environments. I would hesitate to bring it to a public space, like a café or a co-working space. In an open office, it depends on the general noise level and how tolerant your coworkers are. I have not had anyone complain about the sound level in video conferences.

The keycaps used are XDA-profile (symmetrical and the same height across the keyboard, like a lower-profile SDA, which means you can rearrange them between rows), laser-engraved PBT of medium thickness. Apparently there have been a lot of issues with the durability of the labels, so the specifics of that might change. I personally have had a single key start to fade a bit over 3 weeks of use, but I do not actually care.

The keyboard is powered by the ATmega32U4, a pretty standard controller for keyboards; it is also used in the Teensy 2.0, for example.

I would judge the overall build quality as good. While it does not feel like an ultra-premium product, there is nothing specific I can actually complain about, no rough edges or manufacturing artefacts.


Out of the box, the keyboard uses the open-source Kaleidoscope firmware, which can be configured with the (also open-source) Chrysalis graphical configurator.

(Figure: Chrysalis with my Layer 0)

Supposedly it is also possible to use QMK, and Phil has recently written Menelaus, a firmware in Microscheme.

I have stuck with (pre-release versions of) Kaleidoscope so far, which has worked out fairly well. Chrysalis is an Electron app, and making sweeping changes in it can be a bit cumbersome compared to using text-based, declarative configuration, but it does the job. Flashing a new version onto the keyboard only takes a few seconds. I also have to mention the extensive documentation available. Kaleidoscope has a rich plugin infrastructure, very little of which I actually use, but it does seem to rival QMK in flexibility.

I am using the Atreus with Colemak, the same layout I have been using for almost a decade now, and compared to trying the Ergodox (when I tried using an Ergodox for the first time, the ortholinear layout really threw me off, and I kept hitting right in between keys), the switching was much smoother. I am mostly back to my regular typing speed of 80-90 WPM after three weeks, and I can still use a regular staggered-layout keyboard without trouble.

The modifier keys at the bottom are unusual, but work for me. I use the three innermost keys with my thumbs, and the bottom edges by just pushing down with my palm. It does require some careful arrangement to avoid often having to press two modifiers at the same time.

With only 44 physical keys, the keyboard makes heavy use of layers, which can be temporarily shifted to when holding a key, or switched to permanently. By default the first extra layer has common special characters on the left half, and a numpad on the right, which works better than a regular keyboard for me.

The only problem I sometimes have is the lack of a status indicator. This means I have to keep track of the keyboard state in my head when switching layers. Not a big problem though.


My conclusion is quite simple: if you are in the market for a keyboard like this, this might be the keyboard for you. It does what it does well, and is much cheaper than anything comparable that does not require manual assembly. I personally enjoy the small form factor, the flexible (set of) firmware, and the RSI-friendly layout.

I also want to highlight the truly amazing effort Keyboardio puts into supporting their customers. You can browse the Kickstarter or their GitHub projects to see how much effort they put into this, and I have been in contact with Jesse myself while trying to debug a debouncing issue in the firmware. I am very happy to support them with my wallet.

July 06, 2020

Frederik Braun (freddyb)

Hardening Firefox against Injection Attacks – The Technical Details July 06, 2020 10:00 PM

This blog post first appeared on the Mozilla Attack & Defense blog and was co-authored with Christoph Kerschbaumer and Tom Ritter.

In a recent academic publication titled Hardening Firefox against Injection Attacks (to appear at SecWeb – Designing Security for the Web) we describe techniques which we have incorporated into Firefox …

Andreas Zwinkau (qznc)

Wardley Maps July 06, 2020 12:00 AM

A book which presents a map visualization for business strategy


July 05, 2020

Ponylang (SeanTAllen)

Last Week in Pony - July 5, 2020 July 05, 2020 10:40 PM

There is a new set of public Docker images for Pony with SSL system libraries installed. These will be replacing the previous “x86-64-unknown-linux-builder-with-ssl” image.

Derek Jones (derek-jones)

Algorithms are now commodities July 05, 2020 10:14 PM

When I first started writing software, developers had to implement most of the algorithms they used; yes, hardware vendors provided libraries, but the culture was one of self-reliance (except for maths functions, which were technical and complicated).

Developers read Donald Knuth’s The Art of Computer Programming; it was the reliable source for step-by-step algorithms. I vividly remember seeing a library copy of one volume, where somebody had carefully hand-written, in very tiny letters, an update to one algorithm, and glued it to the page over the previous text.

Algorithms were important because computers were not yet fast enough to solve common problems at an acceptable rate; developers knew the time taken to execute common instructions and instruction timings were a topic of social chit-chat amongst developers (along with the number of registers available on a given cpu). Memory capacity was often measured in kilobytes, every byte counted.

This was the age of the algorithm.

Open source commoditized algorithms, and computers got a lot faster with memory measured in megabytes and then gigabytes.

When it comes to algorithm implementations, developers are now spoilt for choice; why waste time implementing the ‘low’ level stuff when there are plenty of other problems waiting to be solved?

Algorithms are now like the bolts in a bridge: very important, but nobody talks about them. Today developers talk about story points, features, business logic, etc. Given a well-defined problem, many are now likely to search for an existing package, rather than write code from scratch (I certainly work this way).

New algorithms are still being invented, and researchers continue to look for improvements to existing algorithms. This is a niche activity.

There are companies where algorithms are not commodities. Google operates on a scale where what appear to others as small improvements can save the company millions (purely because a small percentage of a huge amount can be a lot). Some companies’ core competencies may include an algorithmic component (whose non-commodity nature gives the company its edge over the competition), with the non-core competencies treating algorithms as a commodity.

Knuth’s The Art of Computer Programming played an important role in making viable algorithms generally available; while the volumes are frequently cited, I suspect they are rarely read (I have not taken any of my three volumes off the shelf, to read, for years).

A few years ago, I suddenly realised that I was working on a book about software engineering that not only did not contain an algorithms chapter, but whose 103 uses of the word algorithm all refer to it as a concept.

Today, we are in the age of the ecosystem.

Algorithms have not yet completed their journey to obscurity, which has to wait until people can tell computers what they want and not be concerned about the implementation details (or genetic algorithm programming gets a lot better).

Patrick Louis (venam)

D-Bus and Polkit, No More Mysticism and Confusion July 05, 2020 09:00 PM


D-Bus and Polkit are two technologies that emanate an aura of confusion. While their names are omnipresent in discussions, and the internet has its share of criticism and rants about them, not many people have a grasp of what they actually do. In this article I’ll give an overview of these technologies.

D-Bus, or Desktop Bus, is often described as software that allows processes to communicate with one another, that is, to perform inter-process communication (IPC). However, this description is generic and doesn’t convey what it is used for. Many technologies can perform IPC, from plain sockets to message queues, so what differentiates D-Bus from them?

D-Bus can be considered a middleware, a software glue that sits in the middle to provide services to software through a sort of plugin/microkernel architecture. That’s what the bus metaphor represents: it replicates the functionality of hardware buses, with components attaching themselves to known interfaces that they implement, and it provides a means of communication between them. With D-Bus, these can be either procedure calls, aka methods, or signals, aka notifications.

While D-Bus does offer 1-to-1 and 1-to-many IPC, it’s more of a byproduct of its original purpose than a means of efficient process-to-process data transfer; it isn’t meant to be fast. D-Bus emerges from the world of desktop environments, where the building blocks are well known and each implements a functionality that should be accessible from other processes if needed, without having to reinvent the transfer mechanism for each and every piece of software.
This is the problem it tackles: having components in a desktop environment that are distributed across many processes, each fulfilling a specific job. In such a case, if a process already implements the behavior needed, other processes can harness the feature it provides instead of reimplementing it.

Its design is heavily influenced by Service Oriented Architectures (SOA), Enterprise Service Buses (ESB), and microkernel architectures.
A bus abstracts communication between pieces of software, replacing all direct contact and only allowing communication to happen over the bus.
Additionally, the SOA allows software to expose objects that have methods that can be called remotely, and also allows other software to subscribe/publish events happening in remote objects residing in other software.
Moreover, D-Bus provides an easy plug-and-play, a loose coupling, where any software could detach itself from the bus and allow another process to be plugged, containing objects that implement the same features the previous process implemented.
In sum, it’s an abstraction layer for functionalities that could be implemented by any software, a standardized way to create pluggable desktop components. This is what D-Bus is about, this is the role it plays, and it explains the difficulty in grasping the concepts that gave rise to it.

The big conceptual picture goes as follows.
We have a D-Bus daemon running at an address and services that implement well known behaviors. These services attach to the D-Bus daemon and the attachment edge has a name, a bus name.
Inside these services, there are objects that implement the well known behavior. These objects also have a path leading to them so that you can target which object within that service implements the specific interface needed.
Then, the interface methods and events can be called or registered on this object inside this service, connected to this bus name, from another service that requires the behavior implemented by that interface to be executed.

This is how these particular nested components interact with one another, and it gives rise to the following:

Address of D-Bus daemon ->
Bus Name that the service attached to ->
Path of the object within this service ->
Interface that this object implements ->
Method or Signal concrete implementation
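
The chain above maps directly onto the arguments of a command-line call. As a minimal sketch (the class and method names are hypothetical, and the daemon address is left to gdbus itself through --session or --system), the pieces can be kept together and turned into a gdbus call argument list:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MethodTarget:
    """Hypothetical holder for the pieces needed to reach one D-Bus method."""

    bus: str          # "session" or "system": which daemon (address) to use
    bus_name: str     # name the service attached with, e.g. org.gnome.SessionManager
    object_path: str  # object inside the service, e.g. /org/gnome/SessionManager
    interface: str    # interface the object implements
    member: str       # method name within that interface

    def gdbus_argv(self, *args: str) -> List[str]:
        """Build the argument list for `gdbus call` (args are GVariant text)."""
        if self.bus not in ("session", "system"):
            raise ValueError(f"unknown bus: {self.bus!r}")
        return [
            "gdbus", "call", f"--{self.bus}",
            "--dest", self.bus_name,
            "--object-path", self.object_path,
            "--method", f"{self.interface}.{self.member}",
            *args,
        ]


can_shutdown = MethodTarget(
    "session",
    "org.gnome.SessionManager",
    "/org/gnome/SessionManager",
    "org.gnome.SessionManager",
    "CanShutdown",
)
print(" ".join(can_shutdown.gdbus_argv()))
```

Passing the list to subprocess.run would perform the call on a machine with a running session bus; the command shape matches the gdbus examples used later in this article.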

Or in graphical form:

D-Bus ecosystem

Instead of having everyone talk to one another:

p2p interaction

Let’s take a method call example that shows these 3 required pieces of information.

org.gnome.SessionManager \
/org/gnome/SessionManager \

   boolean true

Here, we have the service bus name org.gnome.SessionManager, the object path /org/gnome/SessionManager, and the interface/method name org.gnome.SessionManager.CanShutdown, all separated by spaces. If /org/gnome/SessionManager implemented only a single interface, we could call the method simply as CanShutdown, but here it doesn’t.

Let’s dive deeper into the pieces we’ve mentioned. They are akin to the ones in an SOA ecosystem, but with the addition of the bus name, bus daemon, and the abstraction for the plug-and-play.

  • Objects

An object is an entity that resides in a process/service and performs some work. It is identified by a path name. The path name is divided by slashes (/), just like a Unix file system path, and is usually, though not mandatorily, namespaced.

For example: /org/gnome/Nautilus/window/1.

Objects have methods and signals: methods take input and return output, while signals are events that processes can subscribe to.
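
The path syntax is strict enough to check mechanically. A minimal sketch, assuming the object path rules from the D-Bus specification (a leading slash, non-empty elements made only of ASCII letters, digits, and underscores, and no trailing slash except for the root path /); the function name is hypothetical:

```python
import re

# "/" on its own, or one or more "/element" groups where each element is a
# non-empty run of ASCII letters, digits and underscores.
_OBJECT_PATH = re.compile(r"/|(?:/[A-Za-z0-9_]+)+")


def is_valid_object_path(path: str) -> bool:
    """Return True if `path` is a syntactically valid D-Bus object path."""
    return _OBJECT_PATH.fullmatch(path) is not None


for candidate in ["/org/gnome/Nautilus/window/1", "/", "/org/", "org/gnome", "/org//gnome"]:
    print(candidate, is_valid_object_path(candidate))
```

Hyphens, empty elements (//), and trailing slashes are all rejected, which is why object paths tend to use underscores where bus names might use dots or dashes.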

  • Interfaces

These methods and signals are concrete implementations of interfaces, in the same sense as in OOP.
As with OOP, interfaces are a group of abstractions that have to be defined in the object that implements them. The members, methods and signals, are also namespaced under this interface name.

For example:

interface=org.gnome.Shell.Introspect
member method=GetRunningApplications
absolute name of method=org.gnome.Shell.Introspect.GetRunningApplications
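
Splitting an absolute name back into its interface and member is a matter of cutting at the last dot. The sketch below does that and checks both halves, assuming the naming rules from the D-Bus specification (interface names need at least two dot-separated elements, no element may start with a digit, and names are limited to 255 characters); split_member is a hypothetical helper name:

```python
import re
from typing import Tuple

# One name element: ASCII letters, digits, underscores; must not start with a digit.
_ELEMENT = r"[A-Za-z_][A-Za-z0-9_]*"
_INTERFACE_NAME = re.compile(rf"{_ELEMENT}(?:\.{_ELEMENT})+")  # two or more elements
_MEMBER_NAME = re.compile(_ELEMENT)                             # one element, no dots
MAX_NAME_LENGTH = 255


def split_member(absolute_name: str) -> Tuple[str, str]:
    """Split 'interface.Member' at the last dot and validate both halves."""
    interface, dot, member = absolute_name.rpartition(".")
    if not dot:
        raise ValueError(f"no interface in {absolute_name!r}")
    if len(interface) > MAX_NAME_LENGTH or not _INTERFACE_NAME.fullmatch(interface):
        raise ValueError(f"invalid interface name {interface!r}")
    if len(member) > MAX_NAME_LENGTH or not _MEMBER_NAME.fullmatch(member):
        raise ValueError(f"invalid member name {member!r}")
    return interface, member


print(split_member("org.gnome.Shell.Introspect.GetRunningApplications"))
```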

Some interfaces are commonly implemented by objects, such as the org.freedesktop.DBus.Introspectable interface, which, as the name implies, makes the object introspectable. It allows querying the object about its capabilities, features, and the other interfaces it implements. This is a very useful feature because it allows discovery.
It’s also worth mentioning that D-Bus can be used in a generic way to set and get properties of services’ objects through the org.freedesktop.DBus.Properties interface.

Interfaces can be described in D-Bus XML files, both as a standard and as documentation, so that other programmers can use the reference to implement them properly. These files can also be used to auto-generate classes from the XML, making implementation quicker and less error-prone.
These files can usually be found under /usr/share/dbus-1/interfaces/. Our org.gnome.Shell.Introspect of earlier is there in the file org.gnome.Shell.Introspect.xml along with our method GetRunningApplications. Here’s an excerpt of the relevant section.

<!--
	@short_description: Retrieves the description of all running applications

	Each application is associated by an application ID. The details of
	each application consists of a varlist of keys and values. Available
	keys are listed below.

	'active-on-seats' - (as)   list of seats the application is active on
								(a seat only has at most one active
-->
<method name="GetRunningApplications">
	<arg name="apps" direction="out" type="a{sa{sv}}" />
</method>

Notice the type= part, which describes the format of the output. We’ll come back to what it means in the message format section, but in short, each letter represents a type. The out direction means that it’s the type of an output value of the method; similarly, in is used for method parameters. See the following example taken from org.gnome.Shell.Screenshot.xml.

<!--
	@x: the X coordinate of the area to capture
	@y: the Y coordinate of the area to capture
	@width: the width of the area to capture
	@height: the height of the area to capture
	@flash: whether to flash the area or not
	@filename: the filename for the screenshot
	@success: whether the screenshot was captured
	@filename_used: the file where the screenshot was saved

	Takes a screenshot of the passed in area and saves it
	in @filename as png image, it returns a boolean
	indicating whether the operation was successful or not.
	@filename can either be an absolute path or a basename, in
	which case the screenshot will be saved in the $XDG_PICTURES_DIR
	or the home directory if it doesn't exist. The filename used
	to save the screenshot will be returned in @filename_used.
-->
<method name="ScreenshotArea">
	<arg type="i" direction="in" name="x"/>
	<arg type="i" direction="in" name="y"/>
	<arg type="i" direction="in" name="width"/>
	<arg type="i" direction="in" name="height"/>
	<arg type="b" direction="in" name="flash"/>
	<arg type="s" direction="in" name="filename"/>
	<arg type="b" direction="out" name="success"/>
	<arg type="s" direction="out" name="filename_used"/>
</method>

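
Because these definitions are plain XML, tooling can read them with any XML parser. A minimal sketch using Python’s standard xml.etree.ElementTree; the surrounding node and interface elements are assumed here to mimic a complete introspection file, and method_signatures is a hypothetical helper that collects each method’s in and out signatures:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# The ScreenshotArea definition, wrapped in the <node>/<interface> elements
# a complete introspection file would have (wrapper assumed for illustration).
SCREENSHOT_XML = """
<node>
  <interface name="org.gnome.Shell.Screenshot">
    <method name="ScreenshotArea">
      <arg type="i" direction="in" name="x"/>
      <arg type="i" direction="in" name="y"/>
      <arg type="i" direction="in" name="width"/>
      <arg type="i" direction="in" name="height"/>
      <arg type="b" direction="in" name="flash"/>
      <arg type="s" direction="in" name="filename"/>
      <arg type="b" direction="out" name="success"/>
      <arg type="s" direction="out" name="filename_used"/>
    </method>
  </interface>
</node>
"""


def method_signatures(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map 'interface.Method' to its (input signature, output signature)."""
    root = ET.fromstring(xml_text.strip())
    signatures: Dict[str, Tuple[str, str]] = {}
    for interface in root.iter("interface"):
        for method in interface.iter("method"):
            ins: List[str] = []
            outs: List[str] = []
            for arg in method.iter("arg"):
                # For method arguments, direction defaults to "in" when omitted.
                if arg.get("direction", "in") == "out":
                    outs.append(arg.get("type", ""))
                else:
                    ins.append(arg.get("type", ""))
            name = f"{interface.get('name')}.{method.get('name')}"
            signatures[name] = ("".join(ins), "".join(outs))
    return signatures


print(method_signatures(SCREENSHOT_XML))
```

Concatenating the type= attributes in order gives the method’s input and output signatures, which is the form the message format section below works with.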
  • Proxies

Proxies are the nuts and bolts of an RPC ecosystem: they represent remote objects, along with their methods, in your native code as if they were local. Basically, these are wrappers that make it simpler to manipulate things on D-Bus programmatically, without worrying about all the components we’ve mentioned above. Programming with proxies might look like this.

Proxy proxy = new Proxy(getBusConnection(), "/remote/object/path");
Object returnValue = proxy.MethodName(arg1, arg2);
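
The pseudocode above hides how the method call reaches the bus. A minimal sketch of the same idea in Python, assuming a hypothetical transport function that stands in for a real library such as GDBus or sd-bus: attribute lookups on the proxy become remote method names, and every call is forwarded with the bus name, object path, and interface filled in.

```python
from typing import Any, Callable, Tuple

# Hypothetical transport signature: a real binding (GDBus, QtDBus, sd-bus, ...)
# would marshal a method call message here and wait for the reply.
Transport = Callable[[str, str, str, str, Tuple[Any, ...]], Any]


class Proxy:
    """Makes a remote object's methods look like local Python methods."""

    def __init__(self, transport: Transport, bus_name: str,
                 object_path: str, interface: str) -> None:
        self._transport = transport
        self._target = (bus_name, object_path, interface)

    def __getattr__(self, member: str) -> Callable[..., Any]:
        # Only reached for names not found on the instance, i.e. remote methods.
        if member.startswith("_"):
            raise AttributeError(member)
        bus_name, object_path, interface = self._target

        def call(*args: Any) -> Any:
            return self._transport(bus_name, object_path, interface, member, args)

        return call


def fake_transport(bus_name: str, object_path: str, interface: str,
                   member: str, args: Tuple[Any, ...]) -> Any:
    """Stand-in for a bus connection that knows a single method."""
    if (interface, member) == ("org.gnome.SessionManager", "CanShutdown"):
        return True
    raise RuntimeError(f"org.freedesktop.DBus.Error.UnknownMethod: {interface}.{member}")


session_manager = Proxy(fake_transport, "org.gnome.SessionManager",
                        "/org/gnome/SessionManager", "org.gnome.SessionManager")
print(session_manager.CanShutdown())
```

Rejecting names that start with an underscore keeps Python internals (and attribute lookups during object setup) from being mistaken for remote methods.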
  • Bus names

The bus name, also sometimes called the connection name, is the name of the connection that an application gets assigned when it connects to D-Bus. Because D-Bus is a bus architecture, each assigned name must be unique: two applications can’t use the same bus name. Usually, it is the D-Bus daemon that generates this unique name, which always begins with a colon; however, applications may also ask to own well-known names. These well-known names, written as reverse domain names, are for cases where people want to agree on a standard application that should implement a certain behavior. Say, for instance, a specification for a com.mycompany.TextEditor bus name, where the mandatory object path should be /com/mycompany/TextFileManager, with support for the interface org.freedesktop.FileHandler. This would make the desktop environment more predictable and stable. However, today this is still only a dream and has nothing to do with current desktop environment implementations.
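
The two kinds of names can be told apart from their syntax alone. A minimal sketch, assuming the bus name rules from the D-Bus specification (at least two dot-separated elements of ASCII letters, digits, underscores, or hyphens; a leading colon for unique names; no element starting with a digit except in unique names; at most 255 characters); classify_bus_name is a hypothetical helper:

```python
import re

# Unique names: ":" then two or more dot-separated elements; digits may lead.
_UNIQUE_NAME = re.compile(r":[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+")
# Well-known names: two or more elements, none starting with a digit.
_WELL_KNOWN_NAME = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*(?:\.[A-Za-z_-][A-Za-z0-9_-]*)+")
MAX_NAME_LENGTH = 255


def classify_bus_name(name: str) -> str:
    """Return 'unique', 'well-known' or 'invalid' for a D-Bus bus name."""
    if len(name) > MAX_NAME_LENGTH:
        return "invalid"
    if _UNIQUE_NAME.fullmatch(name):
        return "unique"
    if _WELL_KNOWN_NAME.fullmatch(name):
        return "well-known"
    return "invalid"


for name in [":1.42", "com.mycompany.TextEditor", "TextEditor", "org.2gnome"]:
    print(name, classify_bus_name(name))
```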

  • Connection and address of D-Bus daemon

The D-Bus daemon is the core of D-Bus; it is what everything else attaches itself to. Thus, the address that the daemon uses and listens on should be well known to clients. The means of communication can vary, from UNIX domain sockets to TCP/IP sockets if used remotely.
In normal scenarios, there are two daemons running, a system-wide daemon and a per-session daemon: one for system-level applications and one for session-related applications such as desktop environments. The address of the session bus can be discovered by reading the environment variable $DBUS_SESSION_BUS_ADDRESS, while the address of the system D-Bus daemon is discovered by checking a predefined UNIX domain socket path, though it can be overridden by another environment variable, namely $DBUS_SYSTEM_BUS_ADDRESS.
Keep in mind that it’s always possible to start private buses, private daemons for non-standard use.
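
The value of $DBUS_SESSION_BUS_ADDRESS is more than a path: per the D-Bus specification it is a list of semicolon-separated entries, each a transport name followed by a colon and comma-separated key=value pairs, with values allowed to contain %XX escapes. A minimal parser sketch (parse_dbus_address is a hypothetical name, and the addresses below are example values):

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote


def parse_dbus_address(address: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse 'transport:key=value,...;transport:...' into (transport, options)."""
    entries: List[Tuple[str, Dict[str, str]]] = []
    for entry in address.split(";"):
        if not entry:
            continue  # tolerate empty entries, e.g. a trailing ";"
        transport, colon, params = entry.partition(":")
        if not colon or not transport:
            raise ValueError(f"expected 'transport:...' in {entry!r}")
        options: Dict[str, str] = {}
        for pair in params.split(","):
            if not pair:
                continue  # e.g. "systemd:" carries no options at all
            key, equals, value = pair.partition("=")
            if not equals or not key:
                raise ValueError(f"expected key=value in {pair!r}")
            options[key] = unquote(value)  # values may contain %XX escapes
        entries.append((transport, options))
    return entries


print(parse_dbus_address("unix:path=/run/user/1000/bus"))
print(parse_dbus_address("systemd:"))
```

On a desktop session, calling it on os.environ["DBUS_SESSION_BUS_ADDRESS"] typically yields a single unix entry; when the system bus variable is unset, the specification’s default is unix:path=/var/run/dbus/system_bus_socket.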

  • Service

A service is the application daemon connected to a bus that provides some utility to clients via the objects it contains, which implement some interfaces. Normally we talk of services when the bus name is well-known, that is, not auto-generated but a reverse domain name. Due to the nature of D-Bus, services are singletons and owners of their bus name, and thus are the only applications that can fulfill specific requests. If any other application wants to use that particular bus name, it has to wait in a queue of aspiring owners until the first one relinquishes it.

Within the D-Bus ecosystem, you can request that the D-Bus daemon automatically start a program, if not already started, that provides a given service (well-known name) whenever it’s needed. We call this service activation. It’s quite convenient as you don’t have to remember what application does what, nor care if it’s already running, but instead send a generic request to D-Bus and rely on it to launch it.

To do this we have to define a service file in the /usr/share/dbus-1/services/ directory that describes what and how the service will run.
A simple example goes as follows.

[D-BUS Service]

You can also specify the user with which the command will be executed using a User= line, and even associate it with a systemd service using SystemdService=.
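
Service files use a simple INI-like key=value syntax, so they can be inspected with Python’s standard configparser. A minimal sketch; the file contents below are hypothetical (the Exec path and User value are made up), and only the Name key is checked because it is the well-known bus name the file provides:

```python
import configparser
from typing import Dict

# Hypothetical service file: the Exec path and User value are made up.
EXAMPLE_SERVICE = """
[D-BUS Service]
Name=org.gnome.ServiceName
Exec=/usr/bin/hypothetical-service
User=nobody
"""


def read_service_file(text: str) -> Dict[str, str]:
    """Return the keys of the [D-BUS Service] section, with their case preserved."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # type: ignore[assignment]  # keep "Name", "Exec", ...
    parser.read_string(text)
    section = parser["D-BUS Service"]
    if "Name" not in section:
        raise ValueError("service file has no Name= key")
    return dict(section)


print(read_service_file(EXAMPLE_SERVICE))
```

Overriding optionxform matters because configparser lowercases keys by default, and interpolation is disabled so that a % in an Exec line is not misread as a substitution.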

Additionally, if you are creating a full service, it’s a good practice to define its interfaces explicitly in the /usr/share/dbus-1/interfaces as we previously mentioned.

Now, when org.gnome.ServiceName is called, D-Bus will check whether the service already exists on the bus. If not, it will block the method call, search the directory for a matching service file, start the service as specified so that it takes ownership of the bus name, and finally continue with the method call. If there’s no service file, an error is returned. It’s possible to make such calls asynchronous programmatically to avoid blocking.
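
The activation sequence above can be modelled in a few lines. The sketch below is a hypothetical in-memory toy (ToyBus is not a real API and no IPC happens); it only mirrors the decision the daemon makes: deliver directly if the name is owned, otherwise start the program from the service file and wait for it to claim the name, or fail with the ServiceUnknown error when no service file matches.

```python
from typing import Callable, Dict, List, Set


class ToyBus:
    """Hypothetical in-memory model of bus activation; there is no real IPC here."""

    def __init__(self) -> None:
        self.owners: Set[str] = set()  # bus names currently owned by a running program
        self.service_files: Dict[str, Callable[["ToyBus"], None]] = {}  # Name= -> "run Exec="
        self.log: List[str] = []

    def request_name(self, name: str) -> None:
        self.owners.add(name)

    def call(self, name: str, member: str) -> None:
        if name not in self.owners:
            launcher = self.service_files.get(name)
            if launcher is None:
                # dbus-daemon answers with this error name when no service file matches.
                raise LookupError(f"org.freedesktop.DBus.Error.ServiceUnknown: {name}")
            self.log.append(f"activating {name}")
            launcher(self)  # the call waits until the started program owns the name
            if name not in self.owners:
                raise RuntimeError(f"{name} was started but never took its name")
        self.log.append(f"delivered {member} to {name}")


bus = ToyBus()
bus.service_files["org.gnome.ServiceName"] = lambda b: b.request_name("org.gnome.ServiceName")
bus.call("org.gnome.ServiceName", "Ping")  # first call triggers activation
bus.call("org.gnome.ServiceName", "Ping")  # already running: delivered directly
print(bus.log)
```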

This is actually a mechanism that systemd can use for service activation when the application acquires a name on D-Bus (Type=dbus in the unit’s [Service] section); polkit and wpa_supplicant are examples. When the D-Bus daemon is started with --systemd-activation, as shown below, systemd services can be started on the fly whenever they are needed. That’s also related to the SystemdService= line we previously mentioned, as both a systemd unit file and a D-Bus service file are required in tandem.

dbus         498       1  0 Jun05 ?        00:01:41 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
vnm          810     795  0 Jun05 ?        00:00:19 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

And the systemd unit file for polkit.

Description=Authorization Manager

ExecStart=/usr/lib/polkit-1/polkitd --no-debug

Here’s an exploratory example of service activation.
Let’s say we found a service file for Cheese (a webcam app) in the services directory, called org.gnome.Cheese.service.

We have no clue what interfaces and methods it implements because its interfaces aren’t described in the interfaces directory, so we send it an arbitrary message.

$ dbus-send --session \
--dest=org.gnome.Cheese \
/ org.gnome.Cheese.nonexistent

If we now take a look at the processes, we can clearly see it has been started by the dbus daemon.

$ ps -ef | grep -i cheese
vnm        56841     716 11 20:53 ?        00:00:00 /usr/bin/cheese --gapplication-service
vnm        56852   56783  0 20:53 pts/4    00:00:00 grep -i cheese

Cheese probably implements introspection, so let’s try to see which methods it has.

$ gdbus introspect --session \
--dest org.gnome.Cheese \
--object-path /org/gnome/Cheese | less

We can see that it implements the org.freedesktop.Application interface, which is described in the freedesktop.org Desktop Entry Specification, but whose interface description I couldn’t find in /usr/share/dbus-1/interfaces/. So let’s try to call one of its methods: org.freedesktop.Application.Activate seems interesting, as it should start the application for us.

$ gdbus call --session --dest org.gnome.Cheese \
--object-path /org/gnome/Cheese \
--method org.freedesktop.Application.Activate  '{}'

NB: I’m using gdbus instead of dbus-send because dbus-send has limitations with complex types such as a{sv}, a dictionary with keys of type “string” and values of type “variant”. We’ll explain the types in the next section.

And Cheese will open.
So this call relies purely on service activation.

What kind of messages are sent, and what’s up with the types we mentioned?

Messages, the unit of data transfer in D-Bus, are composed of header and data. The header contains information regarding the sender, receiver, and the message type, while the data is the payload of the message.

The D-Bus message type, not to be confused with the type format of the data payload, can be a method call (DBUS_MESSAGE_TYPE_METHOD_CALL), a method return (DBUS_MESSAGE_TYPE_METHOD_RETURN), an error (DBUS_MESSAGE_TYPE_ERROR), or a signal (DBUS_MESSAGE_TYPE_SIGNAL).

D-Bus is fully typed and type-safe as far as the payload is concerned; that means the types are predefined and are checked against the signatures.

The following types are available, written in the argument syntax used by dbus-send(1):

<contents>   ::= <item> | <container> [ <item> | <container>...]
<item>       ::= <type>:<value>
<container>  ::= <array> | <dict> | <variant>
<array>      ::= array:<type>:<value>[,<value>...]
<dict>       ::= dict:<type>:<type>:<key>,<value>[,<key>,<value>...]
<variant>    ::= variant:<type>:<value>
<type>       ::= string | int16 | uint16 | int32 | uint32 | int64 | uint64 | double | byte | boolean | objpath

These types correspond to the single-letter codes used in the type= attributes of the interface definitions shown earlier. Here are some descriptions.

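b               ::= boolean
s               ::= string
i               ::= int32 (signed 32-bit integer)
u               ::= uint32 (unsigned 32-bit integer)
d               ::= double
o               ::= object path
v               ::= variant (could be different types)
a{<key><value>} ::= dictionary mapping <key> (a basic type) to <value>
a<type>         ::= array of values of <type> (e.g. as: array of strings)
(<types>)       ::= struct of the listed <types> (e.g. a(sv): array of structs)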
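
Signatures nest, so a short recursive parser is enough to expand them. The sketch below is an illustration rather than a full validator (it does not enforce the specification’s length or nesting limits, and describe_signature is a hypothetical name); the basic type codes are taken from the D-Bus specification, and it expands, for instance, the a{sa{sv}} type of GetRunningApplications seen earlier:

```python
from typing import List, Tuple

# Single-letter basic type codes from the D-Bus specification.
BASIC_TYPES = {
    "y": "byte", "b": "boolean", "n": "int16", "q": "uint16",
    "i": "int32", "u": "uint32", "x": "int64", "t": "uint64",
    "d": "double", "s": "string", "o": "object path", "g": "signature",
    "h": "unix fd",
}


def _parse_one(sig: str, pos: int) -> Tuple[str, int]:
    """Parse one complete type starting at sig[pos]; return (description, next index)."""
    if pos >= len(sig):
        raise ValueError("signature ended in the middle of a type")
    code = sig[pos]
    if code in BASIC_TYPES:
        return BASIC_TYPES[code], pos + 1
    if code == "v":
        return "variant", pos + 1
    if code == "a":
        if sig.startswith("{", pos + 1):  # a{KV}: dictionary, key must be a basic type
            key_code = sig[pos + 2:pos + 3]
            if key_code not in BASIC_TYPES:
                raise ValueError(f"dictionary key at {pos + 2} must be a basic type")
            value, nxt = _parse_one(sig, pos + 3)
            if not sig.startswith("}", nxt):
                raise ValueError(f"expected '}}' at {nxt}")
            return f"dict<{BASIC_TYPES[key_code]}, {value}>", nxt + 1
        element, nxt = _parse_one(sig, pos + 1)  # aT: array of one complete type
        return f"array<{element}>", nxt
    if code == "(":  # (T...): struct with one or more fields
        fields: List[str] = []
        nxt = pos + 1
        while not sig.startswith(")", nxt):
            field, nxt = _parse_one(sig, nxt)
            fields.append(field)
        if not fields:
            raise ValueError(f"empty struct at {pos}")
        return f"struct<{', '.join(fields)}>", nxt + 1
    raise ValueError(f"unexpected {code!r} at {pos}")


def describe_signature(sig: str) -> List[str]:
    """Split a signature into its complete types, described in words."""
    types: List[str] = []
    pos = 0
    while pos < len(sig):
        description, pos = _parse_one(sig, pos)
        types.append(description)
    return types


print(describe_signature("a{sa{sv}}"))
print(describe_signature("iiiibs"))
```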

As mentioned earlier, the actual transport isn’t mandated by the protocol, but it is usually done locally via UNIX sockets or pipes, or via TCP/IP.

It wouldn’t be very secure to let anyone on the machine send messages to the D-Bus daemon, trigger service activation, or call any and every method; some of them could be dealing with sensitive data and activities. Nor would it be very secure to have this data sent in plain text.
On the transfer side, that is why D-Bus implements a simple protocol based on SASL profiles for authenticating one-to-one connections. For the authorization, the dbus daemon controls access to interfaces by a security system of policies.

The policies are defined in XML files that can be found in multiple places, including /usr/share/dbus-1/session.conf, /usr/share/dbus-1/system.conf, /usr/share/dbus-1/session.d/*, and /usr/share/dbus-1/system.d/*.
These files mainly control which user can talk to which interface. If you are not able to talk to a D-Bus service or get an org.freedesktop.DBus.Error.AccessDenied error, it’s probably due to one of these files.

For example:

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
	<policy user="vnm">
		<allow own="net.nixers"/>
		<allow send_destination="net.nixers"/>
		<allow send_interface="net.nixers.Blog" send_member="GetPosts"/>
	</policy>
</busconfig>

In this example, the user “vnm” can:

  • Own the bus name net.nixers
  • Send messages to the owner of the given service
  • Call GetPosts from interface net.nixers.Blog
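
As a rough mental model of the rules above, each allow element can be read as a pattern whose attributes must all match the request. The sketch below is a deliberate simplification (the real dbus-daemon also handles deny rules, default and mandatory contexts, and rule ordering), and allow_rules and is_allowed are hypothetical helpers; DeletePost is a made-up method used only to show a rejected call.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The policy from above, without the DOCTYPE line.
POLICY_XML = """
<busconfig>
  <policy user="vnm">
    <allow own="net.nixers"/>
    <allow send_destination="net.nixers"/>
    <allow send_interface="net.nixers.Blog" send_member="GetPosts"/>
  </policy>
</busconfig>
"""


def allow_rules(xml_text: str, user: str) -> List[Dict[str, str]]:
    """Collect the attributes of every <allow> rule in the user's <policy>."""
    root = ET.fromstring(xml_text.strip())
    rules: List[Dict[str, str]] = []
    for policy in root.iter("policy"):
        if policy.get("user") == user:
            rules.extend(dict(rule.attrib) for rule in policy.findall("allow"))
    return rules


def is_allowed(rules: List[Dict[str, str]], **request: str) -> bool:
    """Simplified check: some allow rule has all its attributes equal to the request's."""
    return any(all(request.get(key) == value for key, value in rule.items()) for rule in rules)


vnm_rules = allow_rules(POLICY_XML, "vnm")
print(is_allowed(vnm_rules, send_interface="net.nixers.Blog", send_member="GetPosts"))
print(is_allowed(vnm_rules, send_interface="net.nixers.Blog", send_member="DeletePost"))
```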

If services need more granularity when it comes to permission, then polkit can be used instead.

There’s a lot more that can be configured in the D-Bus daemon, namely in the configuration files for the session-wide daemon in /usr/share/dbus-1/session.conf and for the system-wide daemon in /usr/share/dbus-1/system.conf, such as the way it listens for connections, the limits on messages, and where it reads other configuration files from.

So how do we integrate and harness D-Bus in our client or service programs?

libdbus schema

We do this using libraries, of course, of which there are many. The lowest-level one is libdbus, the reference implementation of the specification. However, it’s quite hard to use, so people rely on other libraries such as GDBus (part of GLib in GNOME), QtDBus (part of Qt, so KDE too), dbus-java, and sd-bus (part of systemd).
Some of these libraries offer the proxy capability we’ve talked about, namely manipulating D-Bus objects as if they were local. They can also offer ways to generate classes in the programming language of choice from an interface definition file (see gdbus-codegen and qdbusxml2cpp for an idea).

Let’s name a few projects that rely on D-Bus.

  • KDE: A desktop environment based on Qt
  • GNOME: A desktop environment based on gtk
  • Systemd: An init system
  • Bluez: A project adding Bluetooth support under Linux
  • Pidgin: An instant messaging client
  • Network-manager: A daemon to manage network interfaces
  • Modem-manager: A daemon to provide an API to deal with modems - works with Network-Manager
  • Connman: Same as Network-Manager but works with Ofono for modems
  • Ofono: A daemon exposing features provided by telephony devices such as modems

One nice thing about D-Bus is that there is a lot of tooling to interact with it; it lends itself to exploration.

Here’s a bunch of useful ones:

  • dbus-send: send messages to dbus
  • dbus-monitor: monitor all messages
  • gdbus: manipulate dbus with GLib (GDBus)
  • qdbus: manipulate dbus with qt
  • QDBusViewer: exploratory gui
  • D-Feet: exploratory gui

I’ll list some examples.

Monitor all the method calls in the org.freedesktop namespace.

$ dbus-monitor --session type=method_call \

For instance, we can debug what happens when we use the command line tool notify-send(1).

This is equivalent to this line of gdbus(1).

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.Notifications.Notify \
my_app_name 42 \
gtk-dialog-info "The Summary" \
"Here's the body of the notification" '[]' '{}' 5000

Alternatively, as we’ve seen, we can use dbus-send(1), although it has some limitations with dictionaries and variant types. Here are some more examples of it.

$ dbus-send --system --print-reply \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1/unit/apache2_2eservice \
org.freedesktop.DBus.Properties.Get \
string:'org.freedesktop.systemd1.Unit' \

$ dbus-send --system --print-reply --type=method_call \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1 \
org.freedesktop.systemd1.Manager.GetUnit \


D-Feet and QDBusViewer are GUIs driven by the introspectability of objects. You can also introspect using gdbus and qdbus.

Either by calling the org.freedesktop.DBus.Introspectable.Introspect method.

With gdbus:

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.DBus.Introspectable.Introspect

With dbus-send:

$ dbus-send --session --print-reply \
--dest=org.freedesktop.Notifications \
/org/freedesktop/Notifications \

Or by using the tool’s own introspect feature, here with gdbus, which prints the result in a nicely colored format:

$ gdbus introspect --session \
--dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications
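The Introspect call returns an XML document describing an object's interfaces and their methods, signals, and properties. As a rough sketch, here is how one could list the method names per interface from that output. It is regex-based, so it is only good for a quick look; a real tool should use an XML parser. The `listMethods` name is hypothetical and the sample XML is abbreviated for illustration.

```javascript
// Rough sketch: map each <interface name="..."> to the <method name="...">
// entries that follow it in Introspect output. Regex-based; not a full parser.
function listMethods(xml) {
  const result = {};
  let current = null;
  for (const m of xml.matchAll(/<(interface|method)\s+name="([^"]+)"/g)) {
    if (m[1] === "interface") {
      current = m[2];
      result[current] = [];
    } else if (current !== null) {
      result[current].push(m[2]);
    }
  }
  return result;
}

// Abbreviated, illustrative Introspect output.
const sample = `<node>
  <interface name="org.freedesktop.Notifications">
    <method name="GetCapabilities"><arg type="as" direction="out"/></method>
    <method name="Notify"><arg name="app_name" type="s" direction="in"/></method>
  </interface>
</node>`;

console.log(listMethods(sample));
```

Because the tag name is captured, `<arg name="...">` entries are skipped and only interfaces and methods are collected.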

D-Bus is not without limitations and criticism. As we said in the introduction, it isn’t meant for high-performance IPC; it’s meant for control, not data transfer. So it’s fine to use it to activate a chat application, for instance, but not to pass a whole media stream through it.
D-Bus has also been criticized as bloated and over-engineered, though those claims are often unsubstantiated and come mostly from online rants. The fact remains that D-Bus is still very popular and that no replacement is a real contender.

Now, let’s turn our attention to Polkit.

Polkit, formerly PolicyKit, is a service running on dbus that offers clients a way to perform granular, system-wide privilege authentication, something neither the default dbus policies nor sudo can do.
Unlike sudo, which switches the user and grants permission to the whole process, polkit delimits distinct actions, categorizes users by group or name, and decides whether each action is allowed. This is all offered system-wide, so that dbus services can query polkit to find out whether clients have privileges.
In polkit parlance, we talk of MECHANISMS, privileged services that offer actions to SUBJECTS, which are unprivileged programs.

The polkit authority is a system daemon named “polkitd”, usually started through dbus service activation and running as the polkitd user.

$ ps -ef | grep polkitd
polkitd   904  1  0 Jun05 ?  00:00:34 /usr/lib/polkit-1/polkitd --no-debug

The privileged services (MECHANISMS) can define a set of actions for which authentication is required. If another process wants to call a method of such a privileged service, for example through a dbus method call, the privileged service will query polkit. Polkit then consults two things: the action policy defined by that service, and a set of programmatic rules that apply more generally. If needed, polkit will initiate an authentication agent to verify that the user is who they say they are. Finally, polkit sends its result back to the privileged service, letting it know whether the user is allowed to perform the action.
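The check described above can be modeled in a few lines. This is a conceptual sketch in plain JavaScript, not polkitd's actual code: `runRules` stands in for the JavaScript rules in rules.d, `policyDefault` for the value taken from the action's .policy file, and `askAgent` for the session's authentication agent. All three names, and the choice to deny when no default exists, are assumptions made for illustration.

```javascript
// Conceptual model of one polkit authorization check; not polkitd's code.
// Hypothetical inputs:
//   runRules(actionId, subject) -> a result string, or null/undefined if no rule handled it
//   policyDefault               -> the value from the action's .policy <defaults>
//   askAgent(who)               -> true if the authentication agent's prompt succeeds
function isAuthorized(actionId, subject, policyDefault, runRules, askAgent) {
  // Rules get the first say; otherwise fall back to the policy default.
  // Treating a missing default as "no" is an assumption of this sketch.
  const result = runRules(actionId, subject) || policyDefault || "no";

  switch (result) {
    case "yes":
      return true;
    case "auth_self":
    case "auth_self_keep":
      // The session owner must prove their identity.
      return askAgent(subject.user);
    case "auth_admin":
    case "auth_admin_keep":
      // An administrator (as defined by addAdminRule) must authenticate.
      return askAgent("admin");
    default:
      // "no", or anything unrecognized, is treated as a denial.
      return false;
  }
}

// Usage: no rule handles the request, so the policy default decides.
const noRules = () => null;
const agent = (who) => who === "vnm"; // pretend only vnm's password is typed correctly
console.log(isAuthorized("org.xfce.thunar", { user: "vnm" }, "auth_self_keep", noRules, agent)); // true
```

The mechanism only sees the final boolean; prompting the user is delegated to the agent.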

In summary, the following definitions apply:

  • Subject - a user
  • Action - a privileged duty that (generally) requires some authentication.
  • Result - the action to take given a subject/action pair and a set of rules. This may be to continue, to deny, or to prompt for a password.
  • Rule - a piece of logic that maps a subject/action pair to a result.

And they materialize in these files:

  • /usr/share/polkit-1/actions - Default policies for each action. These tell polkit whether to allow, deny, or prompt for a password.
  • /etc/polkit-1/rules.d - User-supplied rules. These are JavaScript scripts.
  • /usr/share/polkit-1/rules.d - Distro-supplied rules. Do not change these because they will be overwritten by the next upgrade.

Which can be summarized in this picture:

polkit architecture

Polkit works alongside a per-session authentication agent, usually started by the desktop environment. This is another service, used whenever a user needs to be prompted for a password to prove their identity.
The polkit package contains a textual authentication agent called pkttyagent, which is used as a general fallback but lacks features. I advise anyone trying the examples in this post to install a proper authentication agent instead.

Here’s a list of popular ones:

  • lxqt-policykit - which provides /usr/bin/lxqt-policykit-agent
  • lxsession - which provides /usr/bin/lxpolkit
  • mate-polkit - which provides /usr/lib/mate-polkit/polkit-mate-authentication-agent-1
  • polkit-efl - which provides /usr/bin/polkit-efl-authentication-agent-1
  • polkit-gnome - which provides /usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1
  • polkit-kde-agent - which provides /usr/lib/polkit-kde-authentication-agent-1
  • ts-polkitagent - which provides /usr/lib/ts-polkitagent
  • xfce-polkit - which provides /usr/lib/xfce-polkit/xfce-polkit

Authentication agent

Services/mechanisms have to define the set of actions for which clients require authentication. This is done by defining a policy XML file in the /usr/share/polkit-1/actions/ directory. The actions are defined in a namespaced format, and there can be multiple ones per policy file.
A simple grep '<action id' * | less in this directory should give an idea of the actions available. You can also list all installed polkit actions using the pkaction(1) command.

For example:

org.xfce.thunar.policy: <action id="org.xfce.thunar">
org.freedesktop.policykit.policy:  <action id="org.freedesktop.policykit.exec">

NB: File names aren’t required to be the same as the action id namespace.

This file defines metadata for each action, such as the vendor, the vendor URL, the icon name, the message displayed when authentication is required (in multiple languages), and the description. The important sections in the action element are the defaults and annotate elements.

The defaults element is the one polkit inspects to know whether a client is authorized. It is composed of 3 mandatory sub-elements: allow_any, the policy that applies to any client; allow_inactive, the policy for clients in an inactive session on a local console; and allow_active, the policy for clients in the currently active session on a local console.
Each of these elements takes one of the following values:

  • no - Not authorized
  • yes - Authorized.
  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • auth_admin - Authentication by the admin is required (root)
  • auth_self_keep - Same as auth_self but the authentication is kept for some time that is defined in polkit configurations.
  • auth_admin_keep - Same as auth_admin but also keeps it for some time

The annotate element is used to pass extra key-value pairs to the action, and there can be several of them. Some annotations are well known, such as org.freedesktop.policykit.exec.path, which tells pkexec (the program shipped by default with polkit) which program the action applies to.
Another defined annotation is org.freedesktop.policykit.imply, which tells polkit that if a client is authorized for this action, it should also be authorized for the action named in the imply annotation.
One last interesting annotation is org.freedesktop.policykit.owner, which lets polkitd know who has the right to ask it whether other users are currently authorized to perform certain actions.

Besides policy actions, polkit also offers a rule system that is applied every time it needs to resolve authentication. The rules are defined in two directories, /etc/polkit-1/rules.d/ and /usr/share/polkit-1/rules.d/. As users, we normally add custom rules to the /etc/ directory and leave /usr/share/ for rules shipped by distro packages.
Rules within these files are written in JavaScript and come with a set of helper methods that live under the polkit object.

The polkit JavaScript object comes with the following methods, which are self-explanatory.

  • void addRule( polkit.Result function(action, subject) {...});
  • void addAdminRule( string[] function(action, subject) {...}); called when administrator authentication is required
  • void log( string message);
  • string spawn( string[] argv);

The polkit.Result object is defined as follows:

polkit.Result = {
    NO              : "no",
    YES             : "yes",
    AUTH_SELF       : "auth_self",
    AUTH_SELF_KEEP  : "auth_self_keep",
    AUTH_ADMIN      : "auth_admin",
    AUTH_ADMIN_KEEP : "auth_admin_keep",
    NOT_HANDLED     : null
};

Note that the rule files are processed in alphabetical order of their names. If a rule returns any value other than polkit.Result.NOT_HANDLED, for example polkit.Result.YES, polkit stops there and does not bother processing the remaining rules and files. File naming conventions therefore matter.
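To see why the file names matter, here is a small mock of the rule registry in plain JavaScript. It is not polkit's implementation. The file names, the org.example.demo action id, and the `loadRules`/`evaluate` helpers are all hypothetical. Each "file" is a function that registers rules through a stub `polkit.addRule`, files are loaded in sorted name order, and the first rule returning a non-null result wins.

```javascript
// Mock of the rules.d behavior described above (not polkit's code).
// `files` maps a file name to a function that registers that file's rules.
function loadRules(files) {
  const rules = [];
  const polkit = { addRule: (fn) => rules.push(fn) };
  for (const name of Object.keys(files).sort()) {
    files[name](polkit); // rules are registered in file-name order
  }
  return rules;
}

function evaluate(rules, action, subject) {
  for (const rule of rules) {
    const result = rule(action, subject);
    if (result) return result; // first handled result wins; later rules never run
  }
  return null; // NOT_HANDLED: polkit would fall back to the .policy defaults
}

// Hypothetical rule files that disagree about the same action.
const files = {
  "50-deny.rules": (polkit) =>
    polkit.addRule((action) => (action.id === "org.example.demo" ? "no" : null)),
  "10-allow.rules": (polkit) =>
    polkit.addRule((action) => (action.id === "org.example.demo" ? "yes" : null)),
};

const rules = loadRules(files);
console.log(evaluate(rules, { id: "org.example.demo" }, {})); // prints: yes
```

Renaming 10-allow.rules to, say, 60-allow.rules would flip the outcome to "no", which is why numeric prefixes are a common convention.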

The functions polkit.addRule and polkit.addAdminRule take the same arguments: an action and a subject. The action is the one being requested; it has an id attribute and a lookup() method to fetch variables passed by the mechanism. The subject has attributes such as pid, user, groups, seat, and session, and methods such as isInGroup() and isInNetGroup().

Here are some examples taken from the official documentation:

Log the action and subject whenever the action org.freedesktop.policykit.exec is requested.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec") {
        polkit.log("action=" + action);
        polkit.log("subject=" + subject);
    }
});

Allow all users in the admin group to perform user administration without changing policy for other users.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.accounts.user-administration" &&
        subject.isInGroup("admin")) {
        return polkit.Result.YES;
    }
});

Define administrative users to be the users in the wheel group:

polkit.addAdminRule(function(action, subject) {
    return ["unix-group:wheel"];
});

Run an external helper to determine if the current user may reboot the system:

polkit.addRule(function(action, subject) {
    if (action.id.indexOf("org.freedesktop.login1.reboot") == 0) {
        try {
            // user-may-reboot exits with success (exit code 0)
            // only if the passed username is authorized
            // (a site-specific helper, not something polkit ships)
            polkit.spawn(["/opt/company/bin/user-may-reboot",
                          subject.user]);
            return polkit.Result.YES;
        } catch (error) {
            // Nope, but do allow admin authentication
            return polkit.Result.AUTH_ADMIN;
        }
    }
});

The following example shows how the authorization decision can depend on variables passed by the pkexec(1) mechanism:

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec" &&
        action.lookup("program") == "/usr/bin/cat") {
        return polkit.Result.AUTH_ADMIN;
    }
});
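Rules like these can be exercised outside polkitd before dropping them into /etc/polkit-1/rules.d/. The following is a hypothetical harness, not a polkit tool: it stubs only what the rules above touch (polkit.addRule, polkit.Result, action.id, action.lookup() and subject.isInGroup()), then calls the registered rules directly with Node. The `fakeAction` and `fakeSubject` helpers are made up for this sketch.

```javascript
// Hypothetical harness: stub just enough of polkit to run rules with Node.
const registered = [];
const polkit = {
  Result: { NO: "no", YES: "yes", AUTH_ADMIN: "auth_admin", NOT_HANDLED: null },
  addRule: function (fn) { registered.push(fn); },
};

// Two of the rules above, copied unchanged.
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.accounts.user-administration" &&
        subject.isInGroup("admin")) {
        return polkit.Result.YES;
    }
});
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec" &&
        action.lookup("program") == "/usr/bin/cat") {
        return polkit.Result.AUTH_ADMIN;
    }
});

// Stand-ins for the objects polkitd passes to each rule.
function fakeAction(id, vars) {
  return { id: id, lookup: (key) => vars[key] };
}
function fakeSubject(user, groups) {
  return { user: user, groups: groups, isInGroup: (g) => groups.includes(g) };
}

console.log(registered[1](fakeAction("org.freedesktop.policykit.exec", { program: "/usr/bin/cat" }),
                          fakeSubject("vnm", ["users"]))); // prints: auth_admin
console.log(registered[0](fakeAction("org.freedesktop.accounts.user-administration", {}),
                          fakeSubject("vnm", ["admin"]))); // prints: yes
```

A rule that falls through without returning yields undefined, which polkit treats like NOT_HANDLED.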

Keep in mind that polkit watches both the policy and rules directories, so there’s no need to restart polkit; changes take effect immediately.

We’ve mentioned a tool called pkexec(1) that comes pre-installed with polkit. This program lets you execute a command as another user, by default as root. It is a sort of sudo replacement, though it may confuse users who have no idea about polkit. However, its integration with the authentication agent is quite nice.

So how do we integrate and harness polkit in our subject and mechanism software? Using libraries, of course, of which there are many, to integrate with different desktop environments.
Mechanisms mainly use libpolkit-gobject-1, the GObject-based client library, and this is most of what is needed: the portion of code that requires authorization can be wrapped in a check with polkit. libpolkit-agent-1, on the other hand, is meant for authentication agents.
For instance, polkit_authority_check_authorization() is used to check whether a subject is authorized.

As for writing an authentication agent, it will have to implement the registration methods to be able to receive requests from polkit.

Remember, polkit is a dbus service, so all its interfaces are well known and can be introspected. That means you can interact with it directly through dbus instead of using a helper library.

Polkit also ships excellent man pages; be sure to check polkit(8), polkitd(8), pkcheck(1), pkaction(1), and pkexec(1).

The following tools are of help:

  • polkit-explorer or polkitex - a GUI to inspect policy files
  • pkcreate - a work-in-progress tool to easily create policy files, though it seems to be lacking
  • pkcheck - Check whether a subject has privileges or not
  • pkexec - Execute a command as another user

Let’s go through some examples.

First pkaction(1), to query the policy file.

$ pkaction -a org.xfce.thunar -v

  description:       Run Thunar as root
  message:           Authentication is required to run Thunar as root.
  vendor:            Thunar
  icon:              system-file-manager
  implicit any:      auth_self_keep
  implicit inactive: auth_self_keep
  implicit active:   auth_self_keep
  annotation:        org.freedesktop.policykit.exec.path -> /usr/bin/thunar
  annotation:        org.freedesktop.policykit.exec.allow_gui -> true

Compared to polkitex:

freedesktop logo

We can get the current shell PID.

$ ps
    PID TTY          TIME CMD
 421622 pts/21   00:00:00 zsh
 421624 pts/21   00:00:00 ps

And then give ourselves a temporary authorization for the org.freedesktop.systemd1.manage-units action.

$ pkcheck --action-id 'org.freedesktop.systemd1.manage-units' --process 421622 -u
$ pkcheck --list-temp
authorization id: tmpauthz10
action:           org.freedesktop.systemd1.manage-units
subject:          unix-process:421622:195039910 (zsh)
obtained:         26 sec ago (Sun Jun 28 10:53:39 2020)
expires:          4 min 33 sec from now (Sun Jun 28 10:58:38 2020)

As you can see, if auth_admin_keep or auth_self_keep is set, the authorization is kept for a while and can be listed using pkcheck.
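The *_keep results boil down to remembering a successful authentication per (action, subject) pair until it expires. Here is a minimal sketch of that idea as a time-limited cache. The class name and key format are hypothetical, and the five-minute lifetime is only inferred from the pkcheck output above (obtained at 10:53:39, expiring at 10:58:38); it is not a documented constant.

```javascript
// Minimal sketch of temporary authorizations: a per (action, subject)
// cache with an expiry time. Lifetime inferred from the pkcheck output above.
class TemporaryAuthorizations {
  constructor(lifetimeMs = 5 * 60 * 1000) {
    this.lifetimeMs = lifetimeMs;
    this.entries = new Map(); // "actionId|subject" -> expiry timestamp in ms
  }

  key(actionId, subject) {
    return actionId + "|" + subject;
  }

  // Record a successful authentication at time `now` (ms).
  grant(actionId, subject, now) {
    this.entries.set(this.key(actionId, subject), now + this.lifetimeMs);
  }

  // True while the stored authorization has not expired.
  isValid(actionId, subject, now) {
    const expires = this.entries.get(this.key(actionId, subject));
    return expires !== undefined && now < expires;
  }
}

const temp = new TemporaryAuthorizations();
const id = "org.freedesktop.systemd1.manage-units";
const subj = "unix-process:421622:195039910";
const t0 = 0; // arbitrary reference time in ms

temp.grant(id, subj, t0);
console.log(temp.isValid(id, subj, t0 + 26000));     // true: 26 seconds later
console.log(temp.isValid(id, subj, t0 + 6 * 60000)); // false: 6 minutes later
```

Keying on the subject string ties the authorization to that specific process, just as the unix-process subject in the pkcheck listing does.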

You can try to exec a process as another user, just like sudo:

$ pkexec /usr/bin/thunar

If you want to override the currently running authentication agent, you can try running pkttyagent in another terminal, passing it the -p argument with the PID of the process it should handle.

# terminal 1
$ pkttyagent -p 423619
# terminal 2
$ pkcheck --action-id 'org.xfce.thunar' --process 423619 -u
# will display in terminal 1
==== AUTHENTICATING FOR org.xfce.thunar ====
Authentication is required to run Thunar as root.
Authenticating as: vnm

That’s it for polkit. But what’s the deal with consolekit and systemd logind, and how do they relate to polkit?

Remember that we talked about sessions when discussing the <defaults> element of polkit policy files? This is where these two come in. Let’s quote again:

  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • allow_active - for client in the currently active session on local consoles

Consolekit and systemd logind are both dbus services that can be queried about the status of the current session: its users, its seats, its logins. They can also be used to manage the session, with methods for shutting down, suspending, restarting, and hibernating the machine.

$ loginctl show-session $XDG_SESSION_ID
Timestamp=Fri 2020-06-05 21:06:43 EEST

# in another terminal we monitor using
$ dbus-monitor --system
# and the output
method call time=1593360621.762509 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=2 \
path=/org/freedesktop/login1; \
interface=org.freedesktop.login1.Manager; \

method call time=1593360621.763069 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=3 \
path=/org/freedesktop/login1/session/_32; \
interface=org.freedesktop.DBus.Properties; \

As can be seen, this goes through the org.freedesktop.login1 bus name, using its org.freedesktop.login1.Manager interface.

And so, polkit uses data gathered from systemd logind or consolekit to evaluate the 3 domain rules we’ve seen: allow_any, allow_inactive, and allow_active. This is where these two interact with one another.
The following conditions apply to the values returned by systemd logind:

  • allow_any means any session (even remote sessions)
  • allow_inactive means Remote == false and Active == false
  • allow_active means Remote == false and Active == true
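The conditions above amount to a small selection function. Here is a sketch in plain JavaScript: given a policy's three defaults and a session's Remote and Active properties (as reported by logind), it returns the entry that applies. The function name and the sample defaults are hypothetical, and treating a subject with no session like a remote one is an assumption of this sketch.

```javascript
// Sketch: choose which <defaults> entry applies to a subject's session.
// `session` mirrors logind's Remote and Active properties; a missing
// session is treated like a remote one (an assumption of this sketch).
function implicitAuthorization(defaults, session) {
  if (!session || session.Remote) return defaults.allow_any;
  return session.Active ? defaults.allow_active : defaults.allow_inactive;
}

// Hypothetical policy defaults, for illustration only.
const defaults = { allow_any: "no", allow_inactive: "auth_admin_keep", allow_active: "yes" };

console.log(implicitAuthorization(defaults, { Remote: false, Active: true }));  // yes
console.log(implicitAuthorization(defaults, { Remote: false, Active: false })); // auth_admin_keep
console.log(implicitAuthorization(defaults, { Remote: true, Active: true }));   // no
```

This is why the same action can be allowed outright at the local console yet denied over SSH.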

In conclusion, all these technologies (D-Bus, polkit, and systemd logind) are inherently intertwined, and this is as much a strength as it is a fragile point of failure. They complement one another, but if one goes down, the issues can echo all across the system.
I hope this post has demystified them and helped you understand what they stand for: yet another piece of glue in desktop environments, similar to this post but solving another problem.


July 04, 2020

Jeff Carpenter (jeffcarp)

Building a Markov Chain Sentence Generator in 20 Lines of Python July 04, 2020 06:44 PM

A bot who can write a long letter with ease, cannot write ill. —Jane Austen, Pride and Prejudice

This post walks you step by step through writing a Markov chain from scratch in Python to generate brand-new English sentences that read as if a real person wrote them. Pride and Prejudice by Jane Austen is the text we will use to build the Markov chain. A runnable notebook version is available on Colab. Read the English version of this post here.

Setup

First, download the full text of Pride and Prejudice.

# Download Pride and Prejudice and cut off the header.
!curl | tail -n+32 > /content/pride-and-prejudice.txt
# Preview the file.
!head -n 10 /content/pride-and-prejudice.txt

PRIDE AND PREJUDICE By Jane Austen Chapter 1 It is a truth universally acknowledged, that a single man in possession

Add some necessary imports.

Luke Picciau (user545)

Buydisplay Epaper Screens July 04, 2020 08:46 AM

Epaper displays are one of the coolest accessories for electronics projects, thanks to their ability to persist an image while consuming no power, and their beautiful looks. Over the last few years they have become increasingly available to hobbyists, and now it is possible to find all kinds of epaper displays at low prices, with adapter boards that make it trivial to integrate them into Arduino and Raspberry Pi based projects. I scanned eBay and found these small 2.

July 03, 2020

Unrelenting Technology (myfreeweb)

Wow, about a month ago Spot (ex-Spotinst), the service that can... July 03, 2020 12:36 AM

Wow, about a month ago Spot (ex-Spotinst), the service that can auto-restore an EC2 spot instance after it gets killed, fixed their arm64 support! (Used to be that it would always set the AMI’s “architecture” metadata to amd64, haha.)

And of course their support didn’t notify me that it was fixed, the service didn’t auto-notify me that an instance was finally restored successfully after months of trying and failing, and AWS didn’t notify me either (it probably can, but I haven’t set anything up?), so I wasted a few bucks running a spare inaccessible clone server of my website. Oh well, at least now I can use a spot instance again without worrying about manual restore.

UPD: hmm, it still tried i386 on another restore! dang it.