Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

March 05, 2021

Sevan Janiyan (sevan)

LFS, round #5 March 05, 2021 05:35 PM

Up to this point I’ve been working with a chroot to build OS images from a loopback-mounted flat file, which is then converted to the VMDK format for testing with VirtualBox. I created packages for bpftrace and BCC; BCC was fairly trivial thanks to the availability of a single archive which includes submodules as bcc-src-with-submodule.tar.gz …

March 04, 2021

Frederic Cambus (fcambus)

OpenBSD/loongson on the Lemote Fuloong March 04, 2021 11:10 PM

In my article about running OpenBSD/loongson on the Lemote Yeeloong back in 2016, I mentioned looking for a Fuloong. All hope seemed lost until the Summer of 2017, when a fellow OpenBSD developer was contacted by a generous user (Thanks again, Lars!) offering to donate two Lemote Fuloong machines, and I was lucky enough to get one of those units.

This machine uses the same CPU as the Yeeloong, a Loongson 2F, which is a single-core MIPS-III 64-bit processor running at 800/900 MHz.

As hinted in my previous article, unlike the Yeeloong, the Fuloong is less strict about the type of RAM it accepts, and my device is happily running with a Kingston 2GB DDR2 SO-DIMM module (ASU256X64D2S800C6), replacing the original 512MB module.

Here is the result of a quick md5 -t benchmark:

MD5 time trial.  Processing 10000 10000-byte blocks...
Digest = 52e5f9c9e6f656f3e1800dfa5579d089
Time   = 1.726563 seconds
Speed  = 57918535.263411 bytes/second

For the record, LibreSSL speed benchmark results are available here.

System message buffer (dmesg output):

Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2021 OpenBSD. All rights reserved.

OpenBSD 6.9-beta (GENERIC) #74: Fri Feb 26 08:02:25 MST 2021
real mem = 2147483648 (2048MB)
avail mem = 2116452352 (2018MB)
random: boothowto does not indicate good seed
mainbus0 at root: Lemote Fuloong
cpu0 at mainbus0: STC Loongson2F CPU 797 MHz, STC Loongson2F FPU
cpu0: cache L1-I 64KB D 64KB 4 way, L2 512KB 4 way
bonito0 at mainbus0: memory and PCI-X controller, rev 1
pci0 at bonito0 bus 0
re0 at pci0 dev 6 function 0 "Realtek 8169" rev 0x10: RTL8169/8110SCd (0x1800), irq 4, address 00:23:9e:00:0f:71
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 2
sisfb0 at pci0 dev 8 function 0 "SiS 315 Pro VGA" rev 0x00: 640x400, 8bpp
wsdisplay0 at sisfb0 mux 1: console (std, vt100 emulation)
glxpcib0 at pci0 dev 14 function 0 "AMD CS5536 ISA" rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio, i2c
isa0 at glxpcib0
com0 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
com1 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
mcclock0 at isa0 port 0x70/2: mc146818 or compatible
gpio1 at glxpcib0: 32 pins
iic at glxpcib0 not configured
glxclk0 at glxpcib0: clock, prof
pciide0 at pci0 dev 14 function 2 "AMD CS5536 IDE" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <WDC WD1600BEVS-00VAT0>
wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
auglx0 at pci0 dev 14 function 3 "AMD CS5536 Audio" rev 0x01: isa irq 9, CS5536 AC97
ac97: codec id 0x414c4760 (Avance Logic ALC655 rev 0)
audio0 at auglx0
ohci0 at pci0 dev 14 function 4 "AMD CS5536 USB" rev 0x02: isa irq 11, version 1.0, legacy support
ehci0 at pci0 dev 14 function 5 "AMD CS5536 USB" rev 0x02: isa irq 11
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "AMD OHCI root hub" rev 1.00/1.00 addr 1
apm0 at mainbus0
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
scsibus1 at softraid0: 256 targets
pmon bootpath: bootduid=53153d3cd8ddc482
root on wd0a (53153d3cd8ddc482.a) swap on wd0b dump on wd0b

PCI device data:

# pcidump
Domain /dev/pci0:
 0:6:0: Realtek 8169
 0:8:0: SiS 315 Pro VGA
 0:14:0: AMD CS5536 ISA
 0:14:2: AMD CS5536 IDE
 0:14:3: AMD CS5536 Audio
 0:14:4: AMD CS5536 USB
 0:14:5: AMD CS5536 USB

Patrick Louis (venam)

Internet: Medium For Communication, Medium For Narrative Control — The Artifacts And Spaces: Defining and Understanding Propaganda, Influence, And Persuasion March 04, 2021 10:00 PM

Descartes compared the creation of pictures of memory in the brain with the traces left by needles in fabric

  • Internet: Medium For Communication, Medium For Narrative Control
  • Part 1 — The Artifacts and Spaces
  • Section 1 — Defining and Understanding Propaganda, Influence, And Persuasion
Table Of Content
  • Introduction
  • Part 1: What
    In this part we'll describe the important artifacts and places. Going over these essential, but basic, pieces is mandatory to understand how they come into play as tools.
  • Part 2: How
    In this part we'll go over how the previous elements are put into work by the different actors, who these actors are, what are their incentives, and the new dynamics.
    • New Economies: Information Economy, Attention Economy, Data Economy, Surveillance Economy
    • Internet, In Between The Corporate, Private, And Public Spheres
    • PSYOP, Narrative Warfare, And State Actors
  • Part 3: Why
    In this part we'll try to understand why we are prone to manipulation, why they work so effectively or not on us, and who is subject to them.
    • Cognitive biases
    • Criticism Bypass
    • Uncertainties and doubt
    • Anonymity
    • Information overload
    • Fake news
    • Personalization Of Information Targeting (Microtargeting)
  • Part 4: So What
    In this part we'll put forward the reasons why we should care about what is happening in the online sphere. Why it's important to pay attention to it and the effects it could have at the scale of societies, and individuals. This part will attempt to give the bigger picture of the situation.
    • Mass Control And Brainwashing?
    • Is Mass Hypnosis Possible
    • Information Paralysis And social Cooling (Nothing disappears)
    • Comparison with Dystopia
    • Real World Social Unrest
    • Uncertainty About Truth
    • Lack Of Trust (In Institutions And Others)
    • Detachment From Reality
    • Polarization
  • Part 5: Now What
    In this concluding part we'll go over the multiple solutions that have been proposed or tried to counter the negative aspects of the internet.
    • Free Market solutions
    • Education: Digital & Web Literacy
    • Legal Solutions
    • Digital Identity As Accountability
    • Technical Solutions
    • The "Back To Reality" Solutions
  • Communication
  • Influence
  • Persuasion
  • Propaganda
  • Types Of Propaganda
  • Administering
  • Reinforcement, Anchor, Hook, Imagery
  • Ethics And Issues Of Identification

Our journey begins with three terms: propaganda, influence, and persuasion. To most, they sound like synonyms but each convey a different concept. Like any form of communication the internet can be used as a channel for them. What does communication consist of?

The very broad domain of communication studies — with multiple sub-disciplines such as journalism, film criticism, public relations, and political science — often uses a simple reductionist model called the Shannon–Weaver model of communication.

Shannon-Weaver's Model Of Communication Diagram

The model has six concepts: information source, encoder, channel, decoder, destination, and noise. In our case, the internet is used as the pipe that conducts the signal, the transmitter and receiver are the chosen form of media (video, text, audio), and the information source and destination are the people and algorithms at both ends.
This model is not complete, as it does not take into consideration the personalities of the people involved, nor context, history, time, or any of the other clues that make communication complex. Moreover, it is limited to one-to-one scenarios and doesn’t cover one-to-many.
Regardless of these flaws, the model gives a generic picture of what communication is: it is about sharing meaning — a message — through the exchange of information via symbols that have perceived meanings on both ends.
The Shannon–Weaver model is a great starting point to see what needs to be done so that the conduit becomes more reliable, reducing uncertainty. That is, we need to keep in mind the qualities of context, sender, intent, message, channel, audience, and response. Concretely, the more the exchange, the less the ambiguity, with both parties converging towards a common point, common interest, or focus.

All this to say that the internet is a medium for transporting information. Just like the printing press led to novelties, changes in thought, and brought doctrines to the public like never before, the internet has its own characteristics but can only be defined by what we use it for.
For now, let’s pause and go back to what propaganda, influence, and persuasion are.

Influence is a generic word to convey that someone’s decisions have been affected by something. Influence is passive: the person makes the decision themselves, unconsciously. It can happen with or without any direct communication (verbal or nonverbal). It is a spontaneous mix of inner motivations and environment. Consequently, it is rarely, if ever, controlled by another entity’s will unless this entity has full control over the environment.
The nudge theory argues that an entity can influence people through positive or negative reinforcement and indirect suggestions. However, the long-term effects of these suggestions are highly contested, as they get mixed with other environmental factors like peer pressure and social norms.
An example of influence is product placement, where a product is desired based on its indirect association with a portrayed lifestyle that the viewer would, in theory, like to obtain.

Persuasion, on the other hand, is based on a two-way interaction. It is often used in negotiation when one or many parties want to win the others over, imposing their point. The difference with negotiation is that the core of the message never changes: it is not about resolving a common conflict but about gradually convincing someone.
Persuasion is thus transactional and voluntary. The party that wants their message to come across has to bridge through the individual psychological process, to exchange ideas freely and openly so that the other will be convinced. In persuasion there is a back and forth between persuader and persuadee, the persuader adapting, shaping, reinforcing, changing the message at each iteration to make it more appealing depending on the persuadee’s responses.
If the persuasion succeeds, it should result in a reaction such as “I never saw it that way before”.

Persuasion can be verbal or nonverbal. It can either look like a consensual dialogue to reach an understanding, or look like strategic information or awareness campaigns that appear one-way but still use a feedback mechanism to hone the message. The latter is called Organized Persuasive Communication (OPC), a term that covers all organized persuasion activities (advertising, marketing, public relations, organizational communication) but can also refer to more manipulative means.
In both these cases there is a free flow of information and the persuadee is fully aware of what is happening.

People are reluctant to change and so to convince them the persuader has to shape the message in a relatable way to something the persuadee already believes, or use lies and tactics morphing the facts.
This means persuasion can take a manipulative form, which may or may not be consensual. For example when someone is persuaded under false pretences, incentivized via promises, provided extra benefits, or coerced through threats or actual infliction of costs (including withdrawal of benefits).

Propaganda goes in that manipulative direction. It is exclusively a one-to-many, massive persuasive strategy that deliberately tries to shape perceptions, manipulate cognitions, and direct behavior to achieve the desired intent of the propagandist.

Historically, propaganda didn’t have the same meaning it has today. It meant “to propagate” or “to disseminate”, but the term has changed over time as it got attached with the dissemination of information by governments for particular purposes.
Purpose is what differentiates propaganda. It is always targeted at a specific group, category of people, or society, with the goal of pushing them in the desired direction. This objective is often linked to institutional ideologies, which is why we call propaganda activated ideology.
This means it is inherently advantageous to the propagandists, and their ideological fellows, but not necessarily to the recipients. In some cases it is arguable, and depends on the viewer, whether the message is propaganda or education.

Propaganda doesn’t shy away from using whatever means available to inculcate the message. It isn’t based on mutual understanding, omits or hides information, uses deception, uses manipulation, frames issues, coerces or deceptively coerces, distorts, lies, overemphasizes, and misdirects. Anything goes, as long as the goal is reached.
The message is crafted in a deliberate way to support a viewpoint, a narrative, a goal of an ideology.
This is why propagandist consultants are often called “spin doctors” or “spinmeisters”, reinterpreting the events under a particular perspective.

The corollary of propaganda is censorship. The entity that wants to control the perception of its ideology silences any message that doesn’t fit. It achieves the same purpose not by filling people’s minds with approved information, but by preventing them from being confronted with opposing points of view.

That is why propagandists require control of the information flow, so that they can manufacture it. This allows them to set the current agenda — telling the public what is important and what isn’t — or to frame problems, influencing how the public pictures a situation, creating the images and opinions that go along with the ideological goal. The propagandists try to control the media as a source of information distribution and to present distorted information from what appear to be credible sources.
Indeed, it is also imperative that the message should be packaged in ways that conceal its manipulative persuasive purpose, as we’ll see later.

Another typical aspect of propaganda as communication is its heavy use of symbols and association, in direct relation to its institutional and ideological roots. For instance: justice, liberty, devotion to a country, etc.
Symbolism is also heavily employed in advertising propaganda to sell products as ideologies and ways of living.

Wherever you look and ask, all domains of research and work that touch propaganda dismiss their association with it. Public relations dismisses it as consensual and not manipulative, and the same goes for the advertising industry. The study of propaganda is filled with euphemisms, diverting attention, using the very techniques it describes.

The historian Zbyněk Zeman categorizes propaganda as either white, grey or black, but there are other categorizations too.

White propaganda is when the message is sent from an official source and the intent is more or less clear. For example, a message on TV by the president of a country.

Black propaganda is when the message conceals the source and intent by making it seem as if it emerged from the same people, group, or organization it is targeting. Obscuring the identity of the originator is done to make the message more credible, giving the impression it comes from the people it’s trying to discredit or manipulate. It is also used for diplomatic reasons to hide government involvement.
For the propagandist, this type is safer and has greater persuasiveness.

Grey propaganda lives in between: it is a message coming from an ambiguous or non-disclosed source, whose intent isn’t known. This could be a rumor or a message based on an unnamed insider’s info.

Counter-propaganda is the reaction to propaganda. It differs from it in that it is defensive, based on facts and truth, clear (understood by everyone), and calls out logical fallacies. It is most often administered as white propaganda.
Propaganda can only be countered effectively if the counter-propaganda acts quickly to overcome its damage. Propaganda uses an anchor, as we’ll see, and it needs to be dismantled as soon as possible: the older the propaganda, the more deeply rooted it will be in people’s minds.

Self-propaganda is a type of propaganda where someone deliberately tries to convince themselves of something, even if irrational. In some cases this is akin to Stockholm syndrome.

Sub-propaganda refers to anything that isn’t direct propaganda but keeps the door open for future opportunities of propaganda.
Within this category we find the promotional culture used in marketing and other domains. It consists of promoting commodities, celebrities, politics, civil society, socio-cultural traditions, economic ways, etc. It is the salesmanship of a culture and is thus coupled with direct and indirect marketing means.

Computational propaganda is, according to Phil Howard, “the assemblage of social media platforms, autonomous agents, and big data tasked with the manipulation of public opinion.”
Computational propaganda may or may not be directed by an agenda. It could happen passively because of the incentives of the algorithms on the platforms — usually social media, and thus views and clicks, as we’ll see in other sections.
Propaganda, like any other type of communication, depends heavily on the environment and context in which it is moving.

Another related concept is persuasive technology: whenever a digital platform has full control of the environment of a user and attempts to change their attitudes through social influence and persuasion techniques. This consists of creating an environment that guides the person to perform certain actions and not others, inciting them through different means like social feedback, extrinsic or intrinsic motivations, and positive and negative reinforcement.

When it comes to administering persuasion and propaganda, the process is intuitive: choose the target group and the message to transmit. Choose the media form — the media shell — depending on how easily it can spread. Pack the message material in the shell, and send it towards either the mass/group or towards specific individuals who could spread it further. If an opinion leader spreads it, then their credibility, expertise, trustworthiness, and attractiveness with respect to the target group should be analyzed to match the message.
Then ensues a series of iterations to polish the message through reinforcement, repetition, targeting opinion leaders through indoctrination, and the addition of new psychological techniques, manipulation of the message, emotions, and logical fallacies appropriate to the target audience.

If the propagandists’ or persuaders’ goal runs contrary to the habits of the audience, it will be difficult to achieve and will require awareness of, and information related to, behavioral change and the predictors of behavior of the target audience.
That is, the message, to have its effect, will need to be adapted and employ tactics to resonate with the audience.

The message shouldn’t be imposed; instead, the recipient should feel it flow effortlessly within themselves. Propaganda shouldn’t look like propaganda to the target audience. The change will need to be self-imposed. This is similar to the nudge theory of reinforcement: it should appear as influence, not propaganda.
For that, the message should seem to give expression to the recipient’s own concerns, tensions, aspirations, and hopes. It must identify with them, this is what we call the anchor.

This anchor activates the ideology of the propagandist by tying down new attitudes or behaviors along with existing ones. It is the starting point for change because it represents something already widely accepted by potential persuadees — it is the path of least resistance.

These anchors can be categorized as beliefs, values, attitudes, behaviors, or group norms.
An anchor of belief is one of the best anchors, and the stronger the belief of a receiver, the more likely they will attach to the propaganda goal.
A value is a kind of long-term belief that is not likely to change. It is hard to use as an anchor because values are unwavering and so might conflict with the propagandist’s goal.
An attitude is the readiness to respond to an idea, object, or course of action. It is a predisposition that already resides within the receiver. The propagandist can activate the predisposition to further achieve the goal.
A behavior or behavioral pattern is a clear indication of how receivers are currently acting and often predicts future behaviors.
Group norms are beliefs, values, attitudes, and behaviors derived from the membership of the receiver in a group. The propagandist can use this tendency to conformity to make the message organically move laterally within the group.

A concept similar to the anchor is called the hook. It differs from the anchor in that it is based on a unique event or idea that is constantly repeated to amplify its effect.
The hook is generally a story whose central element makes it newsworthy and evokes strong emotional responses, making it stick in memory. It could be fabricated, misleading and creating doubts, or real, but most importantly it should spread quickly and inundate all conversations on a topic. The hook is then attached to whatever piece of information it was packed with.

Hooks and anchors regularly use typical imagery to reach the ideological goal. For example, portraying the enemy as a baby killer.
The compaction of big ideas into small packages that can be distributed effortlessly is a must. This can be done through emotional wording, hook-like stories, symbolic images, and impactful videos. This is a form of portable philosophy created through repetition, emotions, and stories that lead to new meanings.
Notably, symbols, words, and images that already live in the collective imagination bring with them a baggage of feelings, concepts, and ideas. Think of all the recently rehashed tropes in media like movies, songs, images, and news. The use of strong wording will color the message, creating an association in the collective memory: genocide, patriotism, invader, victim, democracy, anarchy, enemy, confrontation, massacre, coup, radicalization, etc.

To spread further and faster, the anchors, hooks, and imagery must all interpolate and abuse our own confused cultural code. They work by exploiting issues we haven’t adequately addressed as a society. That is why people show strong responses to messages linked to injustice, outrage, partisanship à la us-vs-them, and controversies. We are more likely to react to things we don’t agree with.
From economic inequalities to race, nationalism, gender roles, sexual norms, etc., it doesn’t matter which side of the issue people are on: the message will be spread and will provoke a reaction — if that is the goal of the propagandist. It could also be fabricated controversies, seeding doubt on purpose, making it seem like we’re still on the fence, ignorant (the domain of agnotology). The propagandist expertly triggers hope, satisfaction, pride, and enthusiasm, but also fear, anxiety, anger, outrage, and disgust.

Let’s take a look at how Robert Cialdini details 6 principles of persuasion:

  • Reciprocity: People feel the need to give back to someone who provided a product, service, or information.
  • Scarcity: People want items that they believe are in short supply.
  • Authority: People are swayed by a credible expert on a particular topic.
  • Consistency: People strive to be consistent in their beliefs and behaviors.
  • Likability: People are influenced by those who are similar, complimentary, and cooperative.
  • Consensus: People tend to make choices that seem popular among others.

These go hand in hand with what we described so far and also remind us of the conveyor of the packed message. We should pay attention to the credibility of the source, if the source is revealed.
Yet, a trick in propaganda is to keep confirming what you said, to never argue and assume the truth. Our mind is wired to accept and believe what others say as true, especially if not contradicting with any prior belief or value.

The rule of thumb is that the message shouldn’t create any cognitive dissonance but instead build a world that feels real to the recipient. To create a cohesive cosmology of meanings, to sell a coherent memorable story.
However, propaganda still denies the distance between the source and audience because it puts the ideology first.

We’ll dive into the psychological aspects in another part. For now, this should have laid a strong foundation for understanding influence, persuasion, and propaganda.

The exact identification of what can and should be considered propaganda is debatable and extremely hard to do. Some researchers have attempted to create a framework to discern manipulative persuasion from consensual informational and dialogical ones, but this is hardly applicable to the real world. And, as we said, propaganda could be considered education in the eye of some people.

When it comes to ethics, the subject of child indoctrination comes to mind. Children are the most vulnerable targets of propaganda, as they haven’t built strong defenses against it. They are the least prepared for critical reasoning and contextual comprehension and are thus impressionable. Their mental world is in the process of being built and is easier to mold. There are no strong forces to oppose the propagandist’s goal.

On that note, it is subjective whether certain types of persuasion and propaganda are ethical or not. Propagandists are driven by ideologies, and consequently strongly believe that all means are necessary. As with anything related to ethics and morals, it differs based on personal judgement of the actions. For some people, it is ethically wrong to employ any subterfuge or tactics to reach a goal. For others, it is not about the tactics but whether the goal itself is right or wrong.
There’s a lot to be said on the black and white division of ethics, but we’ll leave it at that because the world is inherently nebulous.

This concludes our review of propaganda, influence, and persuasion. We’ve taken a look at the basic idea of communication, the packing of a message. We’ve then defined influence, persuasion, and propaganda, with propaganda being a type of persuasion that is one-to-many, manipulative, and driven by ideology. We’ve seen different types of propaganda and their definitions such as black, white, grey, self-, and others. Then we’ve seen how this package can be administered effectively through the person or channel carrying it and the actual shell of the message. This message will be successful if it doesn’t conflict with the receiver, if it is reinforced, uses an anchor, a hook, or strong imagery. The message will spread further and faster if it triggers strong emotional responses, especially something people feel they have to stand against. Finally, we’ve glanced at the issue of identification and the ethics concerning propaganda and the most vulnerable.


Attributions: René Descartes, Traité de l’homme

March 03, 2021

Robin Schroer (sulami)

The Shape of Tests March 03, 2021 12:00 AM

Many tests for an operation iterate over a mapping of different inputs to expected outcomes. By looking at the tests for a single operation as the same test with different inputs and output expectations, we can start to question how we should model those tests.

Tests as Matrices

By simply enumerating every possible combination of input values, we can construct a matrix with as many dimensions as inputs. We can then define the expected result for each set of inputs, and write a generalised test function:

∀ input ∈ {(a, b, …, n) | a ∈ A, b ∈ B, …, n ∈ N}: f(input)

The number of possible test cases is thus:

|inputs| = |A × B × ⋯ × N|
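As a quick sanity check of that count, here is a sketch in Clojure (the input sets are invented placeholders, not real test inputs):

```clojure
;; A toy illustration of |inputs| = |A × B|: two input sets with
;; three possible value kinds each yield 3 * 3 = 9 test cases.
(def A [:pos :zero :neg])
(def B [:pos :zero :neg])

(def inputs
  (for [a A, b B] [a b]))

(count inputs) ; => 9
```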

As soon as our operation accepts an input that has more than a few possible values, that is any kind of number, string, or complex data structure, enumerating every possible input combination becomes impractical. Instead we can resort to groupings of values via properties.

This is a test matrix for division which uses properties instead of values, with the rows being dividends, and the columns divisors:

÷          Positive   Zero       Negative
Positive   Positive   undefined  Negative
Zero       Zero       undefined  Zero
Negative   Negative   undefined  Positive

Matrices like this are necessarily exhaustive, and force us to think about the result for every possible combination of the input values we have included.

This is an implementation of the same property matrix in Clojure:

(ns division-matrix-test
  (:require [clojure.test :refer [deftest is testing]]
            [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]))

(defn safe-divide
  "2-arity `/`, but returns `nil` on division by zero."
  [dividend divisor]
  (try
    (/ dividend divisor)
    (catch ArithmeticException _
      nil)))

(defmacro test-matrix
  "Generates tests for a two-dimensional test matrix."
  [test-fn matrix]
  (let [columns (rest (first matrix))
        rows (map first (rest matrix))
        combinations (for [[row idy] (map #(vector %1 %2) rows (range))
                           [col idx] (map #(vector %1 %2) columns (range))]
                       [row col (-> matrix
                                    (nth (inc idy))
                                    (nth (inc idx)))])]
    `(doseq [combination# [~@combinations]]
       (apply ~test-fn combination#))))

(deftest safe-division-test
  (let [gen-input
        (fn [kind]
          (case kind
            :pos (gen/generate (s/gen pos-int?))
            :neg (gen/generate (s/gen neg-int?))
            :zero 0))]

    (test-matrix
     (fn [x y result-pred]
       (let [dividend (gen-input x)
             divisor (gen-input y)]
         (is (result-pred (safe-divide dividend divisor))
             (format "Failed with: %s / %s" dividend divisor))))

     [[nil    :pos   :zero  :neg ]
      [:pos   pos?   nil?   neg? ]
      [:zero  zero?  nil?   zero?]
      [:neg   neg?   nil?   pos? ]])))

In this case we are testing a safe variant of the division function /, which returns nil if the divisor is zero. This simplifies the testing process, because we do not have to include any exception catching logic in our test function, or invent a notation to mean “this set of inputs should result in a thrown exception”.

It is worth noting that such a direct interpretation of a matrix is only possible in a language as malleable as Clojure. In other languages, we might have to resort to enumerating a set of (dividend divisor result) tuples, losing the guarantee of covering all possible combinations.

But even in Clojure, more than two dimensions in this matrix will quickly become unwieldy and hard to follow, and a tuple-based approach would scale better to larger numbers of input parameters.
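As a sketch of what such a tuple-based enumeration might look like in a less malleable language (Python here, with illustrative names; note that nothing forces the list to cover all nine combinations):

```python
import random

def safe_divide(dividend, divisor):
    """Division that returns None instead of raising on a zero divisor."""
    return None if divisor == 0 else dividend / divisor

def gen_input(kind):
    """Generate a concrete value belonging to a property class."""
    return {"pos": random.randint(1, 100),
            "neg": -random.randint(1, 100),
            "zero": 0}[kind]

# (dividend-kind, divisor-kind, result-predicate) tuples.
CASES = [
    ("pos",  "pos",  lambda r: r > 0),
    ("pos",  "zero", lambda r: r is None),
    ("pos",  "neg",  lambda r: r < 0),
    ("zero", "pos",  lambda r: r == 0),
    ("zero", "zero", lambda r: r is None),
    ("zero", "neg",  lambda r: r == 0),
    ("neg",  "pos",  lambda r: r < 0),
    ("neg",  "zero", lambda r: r is None),
    ("neg",  "neg",  lambda r: r > 0),
]

for x, y, pred in CASES:
    dividend, divisor = gen_input(x), gen_input(y)
    assert pred(safe_divide(dividend, divisor)), (dividend, divisor)
```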

Tests as Trees

Another way we could structure our tests is as a tree. A tree does not have to be exhaustive the same way a matrix has to be. We can omit certain combinations of inputs by pruning their branches. In this way we are implying that if a single input has a given value, it defines the result regardless of the other inputs’ values.

In the division example all branches with a divisor of zero could be collapsed into a single case, as the dividend does not matter in this case. This only works if the first level of branching describes the divisor, and the dividends are on the second level.

(ns division-tree-test
  (:require [clojure.test :refer [deftest is testing]]
            [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]))

(deftest safe-division-test

  (testing "with a positive divisor"
    (let [divisor (gen/generate (s/gen pos-int?))]

      (testing "and a positive dividend"
        (let [dividend (gen/generate (s/gen pos-int?))]
          (is (pos? (safe-divide dividend divisor)))))

      (testing "and a zero dividend"
        (let [dividend 0]
          (is (zero? (safe-divide dividend divisor)))))

      (testing "and a negative dividend"
        (let [dividend (gen/generate (s/gen neg-int?))]
          (is (neg? (safe-divide dividend divisor)))))))

  (testing "with a divisor of zero"
    (let [dividend (gen/generate (s/gen int?))]
      (is (nil? (safe-divide dividend 0)))))

  (testing "with a negative divisor"
    (let [divisor (gen/generate (s/gen neg-int?))]

      (testing "and a positive dividend"
        (let [dividend (gen/generate (s/gen pos-int?))]
          (is (neg? (safe-divide dividend divisor)))))

      (testing "and a zero dividend"
        (let [dividend 0]
          (is (zero? (safe-divide dividend divisor)))))

      (testing "and a negative dividend"
        (let [dividend (gen/generate (s/gen neg-int?))]
          (is (pos? (safe-divide dividend divisor))))))))

This might look more verbose, but in exchange we get a unique label for every tree branch, which can improve readability. The nesting also naturally lends itself to lexical scoping, so we only have the values in scope which apply on a given branch.

A key advantage of the tree structure is flexibility. If one of the branches requires special code, we can confine it to that branch, avoiding complicating the remaining branches more than necessary.

Trees also scale better with larger numbers of inputs or options for inputs. A tree might grow overly wide or deep, but we can split it if that becomes a problem.

There is a downside to omitting branches though. If we change our safe-divide function to return different results depending on the dividend when the divisor is zero, our tests might still pass, depending on the specific inputs used, but we will lose test coverage for certain code paths. We have chosen to not test certain input combinations, and we need to be aware of this omission when we are changing the code under test.

Tests as Definitions

Considering the formula describing the generalised test function above, we could also consider translating this directly into code. This can work, but only if we can test results without re-implementing large parts of the code under test, otherwise we are overly coupling the tests to the code. In the division case, we can decide the sign of the result based on the signs of the inputs.

(ns division-spec-test
  (:require [clojure.test :refer [deftest is]]
            [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

(defn- check-safe-divide-result [{{:keys [dividend divisor]} :args
                                  ret :ret}]
  (cond
    (zero? divisor) (nil? ret)

    (zero? dividend) (zero? ret)

    (or (and (pos? dividend) (pos? divisor))
        (and (neg? dividend) (neg? divisor)))
    (pos? ret)

    :else (neg? ret)))

(s/fdef safe-divide
  :args (s/cat :dividend number?
               :divisor number?)
  :ret (s/nilable number?)
  :fn check-safe-divide-result)

(deftest safe-divide-spec-test
  (let [check-result (stest/check `safe-divide)]
    (is (not (some :failure check-result))
        (format "Failed with: %s"
                (-> check-result first :failure)))))
This solution is specific to Clojure, though many other languages have property based testing tools that work similarly.
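For illustration, the same sign property can be checked by plain random sampling in other languages; here is a rough stdlib-only Python sketch (not from the article; real property-based tools also shrink failing cases toward minimal inputs):

```python
import random

def safe_divide(dividend, divisor):
    """Division that returns None instead of raising on a zero divisor."""
    return None if divisor == 0 else dividend / divisor

def check_sign_property(dividend, divisor):
    """Does the sign of the result follow from the signs of the inputs?"""
    ret = safe_divide(dividend, divisor)
    if divisor == 0:
        return ret is None
    if dividend == 0:
        return ret == 0
    if (dividend > 0) == (divisor > 0):
        return ret > 0
    return ret < 0

random.seed(0)
for _ in range(10_000):
    a, b = random.randint(-100, 100), random.randint(-100, 100)
    assert check_sign_property(a, b), (a, b)
```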

By adding a spec to our function, we can run a large number of different inputs against our function, and assert a property about the result based on the inputs. It will even shrink the inputs to find the simplest set of inputs to trigger a spec failure.

This means we do not have a programmer writing a matrix or a tree by hand anymore, which has some advantages. The main one being that a programmer might not consider all possible inputs.

Fail in safe-divide-spec-test
Failed with: {:args {:dividend ##NaN, :divisor 0}, :ret ##NaN}

Fail in safe-divide-spec-test
Failed with: {:args {:dividend 1, :divisor ##Inf}, :ret 0.0}

Fail in safe-divide-spec-test
Failed with: {:args {:dividend 6.812735744013041E-108, :divisor 2.7578261315509936E216}, :ret 0.0}


The optimal shape of a test depends mainly on the structure of the inputs to the operation we are testing, as well as its nature.

For pure functions which we expect to use widely and change rarely, property-based testing can be desirable to avoid unintended consequences. The function under test also needs to be reasonably fast for input shrinking to work effectively.

Operations with a small number of possible inputs can also be tested via test matrices, which have fewer limitations, but do not guarantee correctness, as only the programmer can assert the completeness of the matrix. They are easy to extend with additional values for parameters, but harder to extend with additional parameters. Their declarative nature can be useful for documentation purposes.

At the other end of the spectrum, tree-shaped tests are the most flexible, and scale best for larger operations with many inputs. If different branches require fundamentally different setup, test trees can isolate that complexity to where it is required. They also require the most care to keep tidy, and have a tendency to sprawl if unsupervised.

March 02, 2021

Gokberk Yaltirakli (gkbrk)

Emacs idle config March 02, 2021 09:00 PM

Due to some bad computer habits, I end up opening and closing editor instances a lot instead of starting my editor and using it for the duration of the project. In particularly bad cases, I end up closing my editor for every code change to run a compiler or check some files in the same terminal.

The proper solution to this problem is to get rid of bad habits and start using multiple windows like a modern human in 2021. But the easier solution is to make my editor startup time faster in order to mitigate some of the pain in using it.

This is not the fault of Emacs of course, as Emacs is expected to be running for at least the whole programming session. This doesn’t change the fact that I need to speed it up to make my life better though.


The first step to making a program faster is to run a profiler to see which parts are slowing it down. Since running emacs -Q (which makes Emacs start without loading the user config) is instant, this means my customization is causing Emacs to start up slower.

I used two methods for profiling. The first method is to run ProfileDotEmacs, which is a small Emacs Lisp script that prepares a report on how long each section of your .emacs file takes to execute. If all your configurations are done in your .emacs file, in a mostly top-level fashion, ProfileDotEmacs will probably be sufficient to know what is causing the slowdown.

The second method, which actually provided more useful information for me, is to wrap your config in profiler statements. This can be done like this.

(profiler-start 'cpu)

;; Your existing config


Now you will be able to run M-x profiler-report whenever you want in order to get a nested profiler report that you can interactively drill down into.

Speeding everything up

The title is a little dishonest, as we won’t actually be improving the run-time of any code. What we will be doing is to take the blocking code that runs on startup, and make it run once the editor is idle for a few seconds. The most important thing is that the editor starts up quickly, displays the contents of the file instantly and allows me to move around. Less crucial settings and comfort plugins can be loaded once I stop pressing the keys to think for a few seconds.

The way to schedule a command to run when the editor is idle is to call run-with-idle-timer. You can choose how many idle seconds to wait before executing your callback, and which arguments to pass to it. I ended up writing these two helper functions for loading a file after the editor becomes idle.

(defun leo/load-file (file)
  "Load an elisp file from .emacs.d"
  (load (concat "~/.emacs.d/" file)))

(defun leo/load-idle (sec file)
  "Load FILE after SEC idle seconds"
  (run-with-idle-timer sec nil 'leo/load-file file))

;; e.g. load comfort plugins one second after Emacs goes idle
;; (the file name here is illustrative):
;; (leo/load-idle 1 "extras.el")

Delaying everything by one second was enough for my needs, but you can choose a tiered approach where the essentials are loaded immediately and heavier, less-important plugins get loaded over time.

March 01, 2021

Gustaf Erikson (gerikson)

On the dates March 01, 2021 06:13 PM

When I started recording the dates when Sweden’s death toll from COVID-19 exceeded round thousands, I did not foresee the project continuing into the next year. But here we are.

I used to set the dates when I noticed Swedish media report them, but I’ve now gone to FHM’s stats page and got them from there.

This has led to some reshuffling - especially on Jan 6 2021 which now has its own tally.

This table has also been updated.

9,000 dead in Sweden March 01, 2021 03:15 PM

10,347 dead in Sweden March 01, 2021 03:14 PM

Also a coup or something in the US…

8,000 dead in Sweden March 01, 2021 03:13 PM

11,000 dead in Sweden March 01, 2021 03:12 PM

Simple chart showing the number of days between every 1 000 deaths, and the average deaths per day.

Update Monday, 2021-03-01: revised dates based on official stats from FHM.

Date        Deaths   Days   Deaths/day
2020-03-11       1      0          0.0
2020-04-09   1 000     29         34.4
2020-04-19   2 000     10        100.0
2020-05-02   3 000     13         76.9
2020-05-18   4 000     16         62.5
2020-06-11   5 000     24         41.7
2020-11-01   6 000    143          7.0
2020-11-27   7 000     26         38.5
2020-12-12   8 000     15         66.7
2020-12-23   9 000     11         90.9
2021-01-02  10 000     10        100.0
2021-01-14  11 000     12         83.3
2021-01-27  12 000     13         76.9

10,000 dead in Sweden March 01, 2021 03:10 PM

12,000 dead in Sweden March 01, 2021 12:28 PM

Mark J. Nelson (mjn)

Converting arbitrary base^exponent to scientific notation March 01, 2021 12:00 PM

I had the following question: Given a number in $base^{exponent}$ format, like $3^{500}$ or $7^{300}$, is there a way to convert it to scientific notation without explicitly constructing the (possibly very large) number as an intermediate step? We will assume here that both $base$ and $exponent$ are positive. For example, $7^{300}$ should be converted to $3.384\ldots \cdot 10^{253}$, also written 3.384...e+253.

This can be useful when looking at exponential growth with different bases. I happened to come across this particular question, right now, with game tree branching factors. If one game has an average branching factor of 3 and lasts 500 turns on average, is that more or fewer total states than one with an average branching factor of 7, which lasts 300 turns on average? We can eyeball the answer easily by converting to scientific notation, with its more familiar base 10: $3^{500} \approx 3.636 \cdot 10^{238}$, while $7^{300} \approx 3.384 \cdot 10^{253}$, so the latter is larger by 15 orders of magnitude.

Beyond games, exponential growth is recently in the news in epidemiology of course. Given an effective reproduction number $R$ and current cases $i$, there will be an expected $i \cdot R^t$ cases after $t$ additional time units pass. But how to easily compare different $R$ and $t$ combinations? Converting to scientific notation is one way.

I spent more time googling unsuccessfully for this answer than it ultimately took to just derive it from equations solvable using high school mathematics. Maybe I am bad at Google. But nonetheless, I thought I'd summarize the result.

There are two steps. First, change bases from $base$ to $10$. The resulting exponent is likely not to be an integer. Secondly, take this $10^x$ and convert it to scientific notation, meaning an expression of the form $m \cdot 10^e$, for some mantissa $m$ with $0 < m < 10$ and integral exponent $e$.

Step 1. Changing bases

We want to turn $base^{exponent}$ into $10^x$. That means solving for $x$ in the equation $10^x = base^{exponent}$. For example, $10^x = 7^{300}$.

First, take the base-10 logarithm of both sides: $$\log_{10} 10^x = \log_{10} base^{exponent}$$ Applying the definition of a logarithm as the inverse of exponentiation, this is equivalent to: $$x = \log_{10} base^{exponent}$$ This gives us a closed form for $x$, but still requires evaluating $base^{exponent}$, which is what we were hoping to avoid. But we can apply the logarithmic identity that $\log a^b = b \cdot \log a$, resulting in: $$x = exponent \cdot \log_{10} base$$

Now we have something that can be computed efficiently, at least in approximate terms. For example, we can convert $7^{300}$ to $10^x$ by computing $x = 300 \cdot \log_{10} 7$, yielding $10^{253.529\ldots}$.

This is almost scientific notation, except that scientific notation has integral exponents, not exponents like $253.529\ldots$.

Step 2. Round down the exponent and attach a mantissa

We now have a $10^x$ with probably non-integral $x$, such as the running example of $7^{300} \approx 10^{253.529}$.

The exponent for the purposes of scientific notation will simply be this $x$ rounded down to the nearest integer, denoted $\lfloor x \rfloor$. That will naturally result in a smaller number. So to approximate the original number, we need to multiply it by a mantissa $m$. This is the basic idea of scientific notation: a number between 0 and 10 times an integral power of 10. Put in the form of an equation, we want to solve for $m$ in: $$m \cdot 10^{\lfloor x \rfloor} = 10^x$$ This can be solved with high school mathematics actually a little more elementary than Step 1, though for some reason it took me longer to recall that fact: $$m = \frac{10^x}{10^{\lfloor x \rfloor}}$$ $$m = 10^{x - \lfloor x \rfloor}$$ Now we again have a closed-form equation that can be computed reasonably efficiently with floating-point arithmetic (not requiring a gigantic intermediate result).

To return again to our running example of $7^{300}$. In step 1 we determined this was equal to $10^{253.529\ldots}$. Now we turn it into scientific notation of the form $m \cdot 10^e$, where:

  • $e = \lfloor 253.529\ldots \rfloor = 253$
  • $m = 10^{253.529\ldots - e} = 10^{0.529\ldots} \approx 3.384$
Yielding $7^{300} \approx 3.384 \cdot 10^{253}$ or 3.384e253, without explicitly constructing the large intermediate number.

* * *

Translating the above to Python:
  import math

  base = 7
  exponent = 300

  raw_exponent = exponent * math.log10(base)

  sci_notation_exponent = math.floor(raw_exponent)
  sci_notation_mantissa = 10 ** (raw_exponent - sci_notation_exponent)

  print(f"{sci_notation_mantissa:.3f}e+{sci_notation_exponent}")  # 3.384e+253

Gustaf Erikson (gerikson)

The Anarchy: The East India Company, Corporate Violence, and the Pillage of an Empire by William Dalrymple March 01, 2021 09:36 AM

A good history of the EIC. Dalrymple gives equal space to the “opposing” viewpoints, sidestepping the historiographical triumphalism of most English-language histories.

February March 01, 2021 09:01 AM


Last image(?) from this area of Stockholm from me, as we’ve moved our office.

Feb 2020 | Feb 2019 | Feb 2018 | Feb 2017 | Feb 2016 | Feb 2015 | Feb 2014 | Feb 2013 | Feb 2012 | Feb 2011

February 28, 2021

Derek Jones (derek-jones)

Fitting discontinuous data from disparate sources February 28, 2021 11:16 PM

Sorting and searching are probably the most widely performed operations in computing; they are extensively covered in volume 3 of The Art of Computer Programming. Algorithm performance is influenced by the characteristics of the processor on which it runs, and the size of the processor cache(s) has a significant impact on performance.

A study by Khuong and Morin investigated the performance of various search algorithms on 46 different processors. The two authors kindly sent me a copy of the raw data; the study webpage includes lots of plots.

The performance comparison involved 46 processors (mostly Intel x86 compatible cpus, plus a few ARM cpus) times 3 array datatypes times 81 array sizes times 28 search algorithms. First a 32/64/128-bit array of unsigned integers containing N elements was initialized with known values. The benchmark iterated 2-million times around randomly selecting one of the known values, and then searching for it using the algorithm under test. The time taken to iterate 2-million times was recorded. This was repeated for the 81 values of N, up to 63,095,734, on each of the 46 processors.

The plot below shows the results of running each algorithm benchmarked (colored lines) on an Intel Atom D2700 @ 2.13GHz, for 32-bit array elements; the kink in the lines occur roughly at the point where the size of the array exceeds the cache size (all code+data):

Benchmark runtime at various array sizes, for each algorithm using a 32-bit datatype.

What is the most effective way of analyzing the measurements to produce consistent results?

One approach is to build two regression models, one for the measurements before the cache ‘kink’ and one for the measurements after this kink. By adding in a dummy variable at the kink-point, it is possible to merge these two models into one model. The problem with this approach is that the kink-point has to be chosen in advance. The plot shows that the performance kink occurs before the array size exceeds the cache size; other variables are using up some of the cache storage.

This approach requires fitting 46*3=138 models (I think the algorithm used can be integrated into the model).

If data from lots of processors is to be fitted, or the three datatypes handled, an automatic way of picking where the first regression model should end, and where the second regression model should start is needed.

Regression discontinuity design looks like it might be applicable, treating the point where the array size exceeds the cache size as the discontinuity. Traditionally, discontinuity designs assume a sharp discontinuity, which is not the case for these benchmarks (R’s rdd package worked for one algorithm, one datatype running on one processor); the more recent continuity-based approach supports a transition interval before/after the discontinuity. The R package rdrobust supports a continuity-based approach, but seems to expect the discontinuity to be a change of intercept, rather than a change of slope (or rather, I could not figure out how to get it to model just a change of slope; suggestions welcome).

Another approach is to use segmented regression, i.e., one or more distinct lines. The package segmented supports fitting this kind of model, and does estimate what they call the breakpoint (the user has to provide a first estimate).
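The idea behind such packages can be sketched in a few lines. Here is an illustrative stdlib-only Python sketch (not the R code used for the analysis) that grid-searches a single breakpoint minimising the combined residual sum of squares of two ordinary least-squares fits:

```python
def linfit(xs, ys):
    """Ordinary least-squares line fit; returns (intercept, slope, sse)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse

def best_breakpoint(xs, ys):
    """Try every split point; keep the one minimising the combined SSE."""
    best_x, best_sse = None, None
    for i in range(2, len(xs) - 2):
        sse = linfit(xs[:i], ys[:i])[2] + linfit(xs[i:], ys[i:])[2]
        if best_sse is None or sse < best_sse:
            best_x, best_sse = xs[i], sse
    return best_x

# Synthetic data with a slope change at x = 10:
xs = list(range(30))
ys = [float(x) if x < 10 else 10.0 + 3 * (x - 10) for x in xs]
print(best_breakpoint(xs, ys))
```

Real packages refine the breakpoint iteratively rather than by brute force, and handle noise and multiple breakpoints; this sketch only shows the shape of the problem.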

I managed to fit a segmented model that included all the algorithms for 32-bit data, running on one processor (code+data). Looking at the fitted model I am not hopeful that adding data from more than one processor would produce something that contained useful information. I suspect that there are enough irregular behaviors in the benchmark runs to throw off fitting quality.

I’m always asking for more data, and now I have more data than I know how to analyze in a way that does not require me to build 100+ models :-(

Suggestions welcome.

February 27, 2021

Patrick Louis (venam)

Internet: Medium For Communication, Medium For Narrative Control — Introduction February 27, 2021 10:00 PM

Astronomicum Cæsareum,

  • Internet: Medium For Communication, Medium For Narrative Control
  • Introduction

To no one’s surprise, the internet has permeated all aspects of our lives. All other means of communication have dwindled in comparison, even though the technological behemoth is relatively young (around 50 years old as of 2021).
Worldwide, according to statistics from 2019, people spent an average of 2 and a half hours a day on social media. The top place goes to The Philippines with 3h53min per day.

This raises an iceberg of loaded questions.
At the top: How controlling is the internet today?
Or, asking in reverse: How does the internet, as a new vector of communication, let different actors control us? How is the internet changing the way communication works and, indirectly, how we think?

These are broad questions and there are plenty of sub-questions underneath. Indeed, we keep hearing in the news about social media, extremism on the rise, and a salad of convoluted keywords thrown in articles trying to rationalize all this.
Is there really an information war?
What are the elements of it?
Who are the actors?
What’s the relation with social media?
Is it related to all the conspiracy theories we keep hearing about?
Is journalism dying?
What is the attention and data economy?
Are we all part of a giant hypnosis experiment?
More importantly, can we, and should we, do something about this?

Like many of you, I’ve asked myself these questions countless times, got buried in a mountain of headlines and news articles selling shock-value pseudo-deciphering. I temporarily felt clear-headed but quickly got back to a muddy state of comprehension.
Thus, I’ve set myself to consume all the literature I could find; to peel it, parse it, organize it, categorize it, hone it, and filter it, so that I could, at least partly, remove the haze surrounding the topic.

This series of articles is going to be my attempt at answering how the internet changes communication regarding narrative control. My own mini internet studies.

Here is the rough plan of our exploration. (might be subject to change as the series progresses)

Table Of Content
  • Introduction
  • Part 1: What
    In this part we'll describe the important artifacts and places. Going over these essential, but basic, pieces is mandatory to understand how they come into play as tools.
  • Part 2: How
    In this part we'll go over how the previous elements are put into work by the different actors, who these actors are, what are their incentives, and the new dynamics.
    • New Economies: Information Economy, Attention Economy, Data Economy, Surveillance Economy
    • Internet, In Between The Corporate, Private, And Public Spheres
    • PSYOP, Narrative Warfare, And State Actors
  • Part 3: Why
    In this part we'll try to understand why we are prone to manipulation, why they work so effectively or not on us, and who is subject to them.
    • Cognitive biases
    • Criticism Bypass
    • Uncertainties and doubt
    • Anonymity
    • Information overload
    • Fake news
    • Personalization Of Information Targeting (Microtargeting)
  • Part 4: So What
    In this part we'll put forward the reasons why we should care about what is happening in the online sphere. Why it's important to pay attention to it and the effects it could have at the scale of societies, and individuals. This part will attempt to give the bigger picture of the situation.
    • Mass Control And Brainwashing?
    • Is Mass Hypnosis Possible
    • Information Paralysis And social Cooling (Nothing disappears)
    • Comparison with Dystopia
    • Real World Social Unrest
    • Uncertainty About Truth
    • Lack Of Trust (In Institutions And Others)
    • Detachment From Reality
    • Polarization
  • Part 5: Now What
    In this concluding part we'll go over the multiple solutions that have been proposed or tried to counter the negative aspects of the internet.
    • Free Market solutions
    • Education: Digital & Web Literacy
    • Legal Solutions
    • Digital Identity As Accountability
    • Technical Solutions
    • The "Back To Reality" Solutions

Get ready because this is going to be a long ride!


  • Peter Apian, Astronomicum Caesareum (Ingoldstadt, 1540)

Gustaf Erikson (gerikson)

Advent of Code 2020 wrapup February 27, 2021 02:36 PM

This was a good year, in my opinion. It was a return to the 2015-2017 era, before the unfortunate experiment in collective puzzle-making in 2018, and the “armature” of the intcode puzzles in 2019. These were fun, and impressive, but if you missed out on doing them there was ~45% of the puzzles gone.

I managed to solve all puzzles before the end of the year, with a personal score of 48 out of 50, which is the best I’ve done since 2016.

AoC has grown hugely, even compared to last year. Here’s how many solved both parts of day 1 and day 25 (essentially solving every puzzle):

Year   2 stars day 1   2 stars day 25   Personal best ranking (2 stars)
2015   50 358          3 521            422 (day 16)
2019   101 125         3 137            2 111 (day 23)
2020   148 468         11 285           5 040 (day 18)

Favorite puzzle was probably day 20, where we had to do some image recognition. I didn’t finish this on the day it was released but had some fun polishing it later.

February 26, 2021

Gokberk Yaltirakli (gkbrk)

Giving search engines a fair access to data February 26, 2021 09:00 PM

Search engines are difficult to create. They are even harder to improve to a point where you get good-enough results to keep regular users. This is why it’s so rare to see decent search engines that aren’t front-ends to the Bing or Google APIs.

This doesn’t mean there are none though. There are a small number of search engines with their own crawlers and search logic. More and more of them appear over time, but most of them cannot improve to the point of catching on. This is because of a common resource they lack: Data.

I am not talking about the slimy, personal kind of data that Google and friends like so much. What are people searching for right now? How many of those do I have good results for? How do people form their queries? Those are all difficult to answer and improve if people aren’t using your search engine. But no one will use your search engine unless you improve those. Great then, we are in a chicken-and-egg situation with no escape in sight.

The data problem

Before tackling the problem, let’s explore what the problem is in the first place. The first problem is the number of humans testing the result quality. In almost all cases, the creator(s) will be testing the results. Friends and family will try it a few times before going back to their default search engine. Social media and Hacker News will provide a swarm of clicks that only last for a few hours. This is not data, at least not enough data.

The second problem is a little trickier. Most people from our already small set of users will not provide data that is too valuable. Let’s break down our users into two segments, the creators and the people testing it out.

The creators are programmers who research very specific pieces of information all day. While this makes them very good at using search engines, it makes them very bad at testing the results. A programmer knows the exact query that will bring them results before typing it. This query is usually so good that even a bad algorithm will find the results they are looking for.

The people testing it out have a different problem. When put on the spot for testing a search engine, it is not easy to come up with queries for difficult questions. But those are the exact situations that need the help of a good search engine. You will only see these queries once people see you as reliable and pick you as their default engine.

The current situation

We can separate the current search ecosystem into three distinct groups.

Google gets almost all the search traffic. They have more than enough data, both personal and aggregated, to serve all their needs. Their monopoly on search and their hostility to the open web make this undesirable. A good solution will decrease the amount of data they get, or give more data to their competitors.

DuckDuckGo and other API front-ends get a small chunk of search traffic. They are a good compromise between keeping the web open, and having a good search experience as a user. Most of these engines stay as API wrappers forever, so the data they get doesn’t improve them much.

Independent search engines have to make do with scraps. This makes it hard for them to become popular or earn money to support themselves.

How to improve the situation

In this post, I will propose different ways to improve this situation. Each has different trade-offs between user convenience and usefulness to search engines. The best option would be to use and promote independent search engines. But for a lot of people, it is hard to commit to a sub-par experience even if it is the better long-term option. One can look at how people handle environmental issues to see a prime example of this effect.

Feeding data to search engines

With the first option, you keep using your favourite search engine.

An automated process will send a sample of the queries to different search engines. This way, the engines can get their hands on organic usage data before finding full-time users.

This approach makes automated requests without user interaction. Depending on their systems, this might mess up their data collection or make it difficult to serve real users. To be considerate to the service operators, we should make our requests with a User-Agent header that explains what is happening. This header will allow them to log our requests, handle them in a cheaper way, and to filter them out of the data for their real users.
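A minimal sketch of such a considerate feeder request (in Python; the engine URL and agent string are illustrative assumptions, not real services):

```python
import urllib.parse
import urllib.request

# Illustrative agent string; a real deployment should link to a page
# explaining the automated sampling so operators can filter it out.
FEEDER_AGENT = "QueryFeeder/0.1 (+https://example.com/query-feeder; automated sample)"

def build_feed_request(engine_base, query):
    """Build a search request that smaller engines can identify and filter."""
    url = engine_base + "?q=" + urllib.parse.quote(query)
    return urllib.request.Request(url, headers={"User-Agent": FEEDER_AGENT})

req = build_feed_request("https://searx.example/search", "chicken and egg")
print(req.full_url)  # https://searx.example/search?q=chicken%20and%20egg
```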

Redirecting to different search engines

Another approach is to have each search go to a random search engine. Compared to the previous approach, this one is more beneficial to search engines and more inconvenient for the user. The user won’t be able to reproduce searches, as the same query will end up going to different search providers. Similarly, a smaller search engine might give unsatisfactory results, forcing the user to perform the same query multiple times.

This approach can be combined with the previous one as well. By putting a few “good” engines on the random redirect list and feeding data automatically to the rest, the downsides can be mitigated.
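That combination might look roughly like this (all engine URLs are hypothetical placeholders):

```python
import random
import urllib.parse

# Redirect the user to one of the "good" engines, and queue the query
# for background feeding to the smaller ones.
REDIRECT_ENGINES = ["https://good-a.example/search?q=",
                    "https://good-b.example/search?q="]
FEED_ENGINES = ["https://small-c.example/search?q=",
                "https://small-d.example/search?q="]

def handle_query(query):
    """Return the URL to redirect to, plus URLs to feed in the background."""
    encoded = urllib.parse.quote(query)
    redirect_to = random.choice(REDIRECT_ENGINES) + encoded
    feed_in_background = [base + encoded for base in FEED_ENGINES]
    return redirect_to, feed_in_background
```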

Embedding the results of multiple engines

There are already meta-search engines, like Searx, that satisfy some of these requirements. The problem with them, though, is that each data source they add clutters the main results and slows down the search. If Searx added the option of sending queries to small search engines in the background, without slowing down the main UI, it would be a really good solution.

One could use iframes for this as well, but since browsers are no longer really “User Agents”, they allow websites to control whether they can be embedded.

Centralized vs. shared

Another trade-off to consider is where the automated query submission should happen. If you choose a centralized approach, you end up trusting a third-party with your search queries. If you instead choose to handle this yourself without a centralized third-party, you are now sending all your queries to all the other engines in an identifiable way.

There are a few ways to work around this. One of them is to have small public instances, as in the Fediverse. Everyone would pick whom to trust with their queries, and even on small instances the queries would be mixed enough to protect identities. Another approach would be to keep the queries saved locally, and submit them through random proxies.


If solutions satisfying this need appear in the future, I am planning to implement this. I just wanted to write this up and put it on the internet in case other people are planning similar things. I already have the random search engine redirect working, but in my opinion the most important piece is the automatic data feeding.

The way I will most likely implement this is either as a web endpoint that can be added to browsers as a search engine (hosted locally or on a server), or as a browser extension.

Gonçalo Valério (dethos)

Django Friday Tips: Subresource Integrity February 26, 2021 06:26 PM

As you might have guessed from the title, today’s tip is about how to add “Subresource integrity” (SRI) checks to your website’s static assets.

First, let’s see what SRI is. According to the Mozilla Developer Network:

Subresource Integrity (SRI) is a security feature that enables browsers to verify that resources they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched resource must match.

Source: MDN

So basically, if you don’t serve all your static assets yourself and rely on some sort of external provider, you can force the browser to check that the delivered contents are exactly the ones you expect.

To trigger that behavior you just need to add the hash of the content to the integrity attribute of the <script> and/or <link> elements in question.

Something like this:

<script src="" integrity="sha256-KSlsysqp7TXtFo/FHjb1T9b425x3hrvzjMWaJyKbpcI=" crossorigin="anonymous"></script>
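The integrity value is just the base64-encoded digest of the file’s contents, prefixed with the hash algorithm. A minimal way to compute it yourself (assuming SHA-256, the most common choice; the file name in the example is hypothetical):

```python
import base64
import hashlib

def sri_hash(content, algorithm="sha256"):
    """Return a Subresource Integrity value like 'sha256-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")

# Example: compute the value for a local static file before embedding it.
# with open("static/app.js", "rb") as f:  # hypothetical file name
#     print(sri_hash(f.read()))
```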

Using SRI in a Django project

This is all very nice, but adding this info manually isn’t much fun or even practical when your resources change frequently or are built dynamically on each deployment.

To help with this task I recently found a little tool called django-sri that automates these steps for you (and is compatible with whitenoise if you happen to use it).

After the install, you just need to replace the {% static ... %} tags in your templates with the new one provided by this package ({% sri_static .. %}) and the integrity attribute will be automatically added.

February 22, 2021

Ponylang (SeanTAllen)

Last Week in Pony - February 22, 2021 February 22, 2021 03:11 PM

Ponycheck has become an official Ponylang project. @ergl has opened a new RFC related to FFI declarations. We also have notes from Sean T. Allen and Theo Butler on how to start contributing to Pony.

Marc Brooker (mjb)

Incident Response Isn't Enough February 22, 2021 12:00 AM

Incident Response Isn't Enough

Single points of failure become invisible.

Postmortems, COEs, incident reports. Whatever your organization calls them, when done right they are a popular and effective way of formalizing the process of digging into system failures, and driving change. The success of this approach has led some to believe that postmortems are the best, or even the only, way to improve the long-term availability of systems. Unfortunately, that isn't true. A good availability program requires deep insight into the design of the system.

To understand why, let's build a house, then a small community.

A house, with four things it needs to be a working home

Our house has four walls, a roof, and a few things it needs to be a habitable home. We've got a well for water, a field of corn for food, a wood pile for heat, and a septic tank. If any one of these things is not working, let's say that the house is unavailable. Our goal is to build many houses, and make sure they are unavailable for as little of the time as possible.

When we want to build a second house, we're faced with a choice. The simple approach is just to stamp out a second copy of the entire house, with its own field, wood, well, and tank. That approach is great: the failures of the two houses are completely independent, and availability is very easy to reason about.

Two houses, with full redundancy

As we scale this approach up, however, we're met with the economic pressure to share components. This makes a lot of sense: wells are expensive to drill, and don't break down often, so sharing one between many houses could save the home owners a lot of money. Not only does sharing a well reduce construction costs but, thanks to the averaging effect of adding the demand of multiple houses together, it also reduces the peak-to-average ratio of water demand. That improves the ongoing economics, too.

Five houses, sharing a well

In exchange for the improved economics, we've bought ourselves a potential problem. The failure of the well will cause all the houses in our community to become unavailable. The well has high blast radius. Mitigating that is well-trodden technical ground, but there's a second-order organizational and cultural effect worth paying attention to.

Every week, our community's maintenance folks get together and talk about problems that occurred during the week. Dead corn, full tanks, empty woodpiles, etc. They're great people with good intentions, so for each of these issues they carefully draw up plans to prevent recurrence of the issue, and invest the right amount in following up on those issues. They invest in the most urgent issues, and talk a lot about the most common issues. The community grows, and the number of issues grows. The system of reacting to them scales nicely.

Everything is great until the well breaks. The community is without water, and everybody is mad at the maintenance staff. They'd hardly done any maintenance on the well all year! It wasn't being improved! They spent all their attention elsewhere! Why?

The problem here is simple. With 100 houses in the community, there were 100 fields, 100 tanks, 100 piles, and one well. The well was only responsible for 1 in every 301 issues, just 0.33%. So, naturally, the frequency-based maintenance plan spent just 0.33% of the maintenance effort on it. Over time, with so little maintenance, it got a little creaky, but was still only a tiny part of the overall set of problems.

Plot showing how the percentage of action items related to the well drops with scale
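The arithmetic of the parable can be made concrete. In this toy model (the failure counts and blast radii are illustrative, not real data), the well accounts for a tiny share of issues by frequency, but a large share once each issue is weighted by how many houses it takes down:

```python
# One failure per component per year, say: 100 of each per-house component,
# and a single well failure. Counts are illustrative, not real data.
failures = {"corn field": 100, "septic tank": 100, "wood pile": 100, "well": 1}
blast_radius = {"corn field": 1, "septic tank": 1, "wood pile": 1, "well": 100}

total = sum(failures.values())                     # 301 issues in all
freq_share = {k: n / total for k, n in failures.items()}

# Weight each issue by the number of houses it makes unavailable.
impact = {k: failures[k] * blast_radius[k] for k in failures}
impact_total = sum(impact.values())                # 400 house-outages
impact_share = {k: v / impact_total for k, v in impact.items()}

print(f"well by frequency: {freq_share['well']:.2%}")   # ~0.33%
print(f"well by impact:    {impact_share['well']:.2%}")
```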

This is one major problem with driving any availability program only from postmortems. It feels like a data-driven approach, but tends to be biased in exactly the ways we don't want a data-driven approach to be biased. As a start, the frequency measurement needs to be weighted based on impact. That doesn't solve the problem. The people making decisions are human, and humans are bad at making decisions. One way we're bad at decisions is called the Availability Heuristic: We tend to place more importance on things we can remember easily. Like those empty wood piles we talk about every week, and not the well issue from two years ago. Fixing this requires that an availability program takes risk into account, not only in how we measure, but also in how often we talk about issues.

It's very easy to forget about your single point of failure. After all, there's just one.

February 21, 2021

Derek Jones (derek-jones)

Research software code is likely to remain a tangled mess February 21, 2021 11:32 PM

Research software (i.e., software written to support research in engineering or the sciences) is usually a tangled mess of spaghetti code that only the author knows how to use. Very occasionally I encounter well organized research software that can be used without having an email conversation with the author (who has invariably spent years iterating through many versions).

Spaghetti code is not unique to academia; there is plenty to be found in industry.

Structural differences between academia and industry make it likely that research software will always be a tangled mess, only usable by the person who wrote it. These structural differences include:

  • writing software is a low status academic activity; it is a low status activity in some companies, but those involved don’t commonly have other higher status tasks available to work on. Why would a researcher want to invest in becoming proficient in a low status activity? Why would the principal investigator spend lots of their grant money hiring a proficient developer to work on a low status activity?

    I think the lack of status is rooted in researchers’ lack of appreciation of the effort and skill needed to become a proficient developer of software. Software differs from that other essential tool, mathematics, in that most researchers have spent many years studying mathematics and understand that effort/skill is needed to be able to use it.

    Academic performance is often measured using citations, and there is a growing move towards citing software,

  • many of those writing software know very little about how to do it, and don’t have daily contact with people who do. Recent graduates are the pool from which many new researchers are drawn. People in industry are intimately familiar with the software development skills of recent graduates, i.e., the majority are essentially beginners; most developers in industry were once recent graduates, and the stream of new employees reminds them of the skill level of such people. Academics see a constant stream of people new to software development, this group forms the norm they have to work within, and many don’t appreciate the skill gulf that exists between a recent graduate and an experienced software developer,
  • paid a lot less. The handful of very competent software developers I know working in engineering/scientific research are doing it for their love of the engineering/scientific field in which they are active. Take this love away, and they will find that not only does industry pay better, but it also provides lots of interesting projects for them to work on (academics often have the idea that all work in industry is dull).

    I have met people who have taken jobs writing research software to learn about software development, to make themselves more employable outside academia.

Does it matter that the source code of research software is a tangled mess?

The author of a published paper is supposed to provide enough information to enable their work to be reproduced. It is very unlikely that I would be able to reproduce the results in a chemistry or genetics paper, because I don’t know enough about the subject, i.e., I am not skilled in the art. Given a tangled mess of source code, I think I could reproduce the results in the associated paper (assuming the author was shipping the code associated with the paper; I have encountered cases where this was not true). If the code failed to build correctly, I could figure out (eventually) what needed to be fixed. I think people have an unrealistic expectation that research code should just build out of the box. It takes a lot of work by a skilled person to create portable software that just builds.

Is it really cost-effective to insist on even a medium-degree of buildability for research software?

I suspect that the lifetime of source code used in research is just as short and lonely as it is in other domains. One study of 214 packages associated with papers published between 2001-2015 found that 73% had not been updated since publication.

I would argue that a more useful investment would be in testing that the software behaves as expected. Many researchers I have spoken to have not appreciated the importance of testing. A common misconception is that because the mathematics is correct, the software must be correct (completely ignoring the possibility of silly coding mistakes, which everybody makes). Commercial software has the benefit of user feedback for detecting some incorrect behavior. Research software may only ever have one user.

Research software engineer is the fancy title now being applied to people who write the software used in research. Originally this struck me as an example of what companies do when they cannot pay people more: they give them a fancy title. Recently the Society of Research Software Engineering was set up. This society could certainly help with training, but I don’t see it making much difference with regard to status and salary.

Carlos Fenollosa (carlesfe)

Whatever Clubhouse is, they are onto something February 21, 2021 11:35 AM

I've been following Clubhouse for a few weeks. As a podcaster, it piqued my interest. So it's like podcasts, but live?

The official slogan is drop-in audio chat. But that's not good. It only makes sense once you've used the app, and it doesn't describe the whole thing.

For me, the perfect definition is: it's Twitch for audio. But then, you need to know what Twitch is.

Yesterday I received an invitation and finally got to try it first hand. And I think that Clubhouse is onto something.

Radio vs Podcasts

Everybody knows radio. Even during this Internet revolution, it still has survived. Why? Because it's convenient. You tune in to some station and listen to music or people talking. It requires zero effort.

Radio has two problems: the fact that it's live, and the selection of topics.

Nowadays it's easy to download aired shows, so if you really like some program but you missed it when it was live, just go to their website and download the mp3 file.

However, the selection of topics is still an issue. Because a station is a business, and its model is airing ads, it requires volume. Therefore most radio stations produce mainstream content.

With the coming of the internet, a few nerds started using a new technology called Podcasts. You could record any audio content with a 1€ microphone and publish it on the internet.

Even though podcasts are naturally asynchronous, many shows air live too. Some listeners can listen to the stream, but most of them just download the audio file later.

Publicly searchable podcast directories aggregate both amateur and professional audios. Thanks to that, we have reached this point where anybody in the world has access to an ocean of audio content about any topic, either mainstream or niche.

Enter Clubhouse

What Twitch did to Youtube, Clubhouse has done to podcasts. For the sake of this explanation, let's ignore that podcasts are an open ecosystem and Youtube is proprietary.

Youtube is a video discovery platform. It has some tools to livestream, but it's not their main focus. Twitch has a much better product (and ToS) for livestreamers and their audience.

Want to watch somebody playing Minecraft? Open Twitch, search for Minecraft, and boom! hundreds of streams right there. Join one, chat with the community, and if you're lucky the streamer may shout out to you.

You can't do that with podcasts.

First of all, there can be some interactivity by combining an Icecast stream with an IRC channel, but it is not a good system.

Second, live podcasts are not aggregated anywhere. It is just impossible to search for "strategies to control your stress during covid-19" and find live shows.

So, if only as a directory of live audio content, Clubhouse has future.

But it is not only that. The product is very well thought out and lets the audience participate, with audio.

A naive approach would have been to include a text chat on top of the audio stream. That would replicate the current solution on an integrated app. Okay, not bad.

However, the Clubhouse team spent some time thinking about the use case for audio streaming, which is not the same as for video streaming, nor public chat rooms.

Most of us listen to audio while we are doing other tasks, and most of the time our hands are busy. This is why people jokingly call it the AirPods social network. You can participate while being away from a phone or computer.

In Clubhouse, you can tap a button to "raise your hand", and the moderators may "unmute" you. Then you can talk to the rest of the audience. Of course, not all show formats allow for that, but the option is there.

Being able to talk to your idols or even talk to the community of fans is very powerful. My first experience with Clubhouse was moving. I was listening to a concert, and after the show all the listeners gathered up to talk about their experience and to have a chat with the band. Everybody agreed that with Clubhouse you can feel that there are people at the other end. Not only the speakers, but also the audience.

You don't get that with podcasts, even with live ones with a chat room.

A new category

Clubhouse has definitely invented a new category which combines the best of radio and the best of podcasts.

The product implements a selection of novel features which, when brought together, create an exciting and very addictive experience:

  • Directory of live audio streams ("rooms") about any imaginable topic
  • You can quickly drop in any room, listen for a few minutes, and jump to another one
  • The audience can participate via audio, which creates a great sense of community
  • Basic tools to follow people and interests, and get notified when they live stream
  • Of course, streamers may record the audio and publish it afterwards, so it's trivial to use Clubhouse in combination with the current podcasting ecosystem.

If you're in the podcasting community you should try to find an invitation. It is the real deal.

Tags: internet, podcasting


Pepijn de Vos (pepijndevos)

Switching Continuously Variable Transmission February 21, 2021 12:00 AM

What if you took a boost converter and converted it to the rotational mechanical domain? Switching CVT!

At the University of Twente, they teach Bond Graphs, a modelling system for multi-domain systems that is just perfect for this job. Unlike domain-specific systems or block diagrams, Bond Graphs model a system as abstract connections of power. Power is what you get when you multiply an effort with a flow. The two examples we’re interested in are voltage × current and force × velocity, or for rotation, torque × angular velocity.

Here is a schematic of a boost converter (source). It converts a low voltage (effort, force) into a high voltage, and correspondingly a high current (flow, velocity) into a low current. It works by charging the inductor by shorting it to ground, and then discharging it via the diode into the capacitor.

boost converter
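For reference, the ideal (lossless, continuous-conduction) boost converter steps the voltage up by a factor set by the switch's duty cycle: V_out = V_in / (1 − D). A quick sketch of that standard relation:

```python
def boost_output_voltage(v_in, duty_cycle):
    """Ideal boost converter in continuous conduction: V_out = V_in / (1 - D).

    duty_cycle D is the fraction of each period the switch is closed
    (inductor charging); losses and ripple are ignored.
    """
    if not 0.0 <= duty_cycle < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    return v_in / (1.0 - duty_cycle)

# D = 0.5 doubles the voltage (and, ideally, halves the average input current).
```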

The classic example of model equivalence is that an electrical inductor-capacitor-resistor system behaves equivalent to a mechanical mass-spring-damper system. In the rotational domain, the equivalent of a switch is a clutch, and the equivalent of a diode is a ratchet. So we have all we need to convert the system! Step one is making the bond graph from the electrical system.

boost converter bond graph

Quick Bond Graph primer if you’re too lazy to read the Wikipedia page. Se is a source of effort. R, I, and C are generalized resistance, inertance, and compliance. mR is a modulated resistance I used for the switch/clutch. D is a diode/ratchet that I just made up. 0 junctions are sum of flows, equal effort. 1 junctions are sum of effort, equal flow. An ideal electrical net has equal voltage (effort), and a sum of currents, but a mechanical joint has an equal velocity (flow), but a sum of forces. With that in mind, we can convert the bond graph to the mechanical system.

mechanical boost converter

I’m not sure if those are even remotely sane mechanical symbols, so I added labels just in case. The motor spins up a flywheel, and then when the clutch engages it winds up the spring. Then when the clutch is disengaged, the ratchet keeps the spring wound up, driving the output while the motor can once more spin up the flywheel.

It works exactly analogously to the boost converter, and also suffers from the same problems, most critically switching/clutching losses. I imagine applying PWM to your clutch will at best wear it down quickly, and maybe just make it go up in smoke. As with a MOSFET, during the transition period where there is nonzero effort and flow on the clutch, there is power loss and heat.

Anyway, I decided to build it in LEGO to see if it would work. I used a high-speed ungeared motor that can drive absolutely no load at all, and connected it with a 1:1 gear ratio to the wheels, with only a flywheel, clutch, ratchet, and spring in between. This proves that there is actually power conversion going on!

lego mechanical boost converter

If you get rich making cars with this CVT system, please spare me a few coins. If you burn out your clutch… I told you so ;)

February 18, 2021

Pete Corey (petecorey)

Genuary 2021 February 18, 2021 12:00 AM

I didn’t participate in Genuary this year, but I was inspired by a few of the submissions I saw to hack together my own sketches. Here’s what I came up with.

I was originally inspired by this Reddit post on expanding circles, so I recreated it and added some extra layers of depth. My kingdom for a plotter and a mechanical pencil:

From there, I thickened the stroke width of each circle, and colored each pixel based on the number of circle intersections there (using 2D SDF to check for intersections, and the cubehelix algorithm for coloring). There’s some really cool kaleidoscope vibes in some of these variations:
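The SDF trick can be sketched as follows (my reconstruction of the idea described above, not Pete Corey's actual code): the signed distance from a point to a circle of radius r centered at c is |p − c| − r, and a pixel lies on a stroked circle when that distance is within half the stroke width.

```python
import math

def circle_sdf(px, py, cx, cy, r):
    """Signed distance from point (px, py) to the circle centered at (cx, cy)."""
    return math.hypot(px - cx, py - cy) - r

def intersection_count(px, py, circles, stroke_width):
    """Count how many stroked circles cover the pixel (px, py).

    Each circle is a (cx, cy, r) tuple; this count could then be mapped
    through a palette such as cubehelix for coloring.
    """
    return sum(
        1
        for (cx, cy, r) in circles
        if abs(circle_sdf(px, py, cx, cy, r)) <= stroke_width / 2
    )
```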

The SDF technique caught my imagination, so I spent some more time playing with using SDF and cubehelix to render simple circles:

This post inspired me to play with turmites a bit. The {{{1, 2, 1}, {1, 8, 1}}, {{1, 2, 1}, {0, 2, 0}}} turmite is especially cool. Coloring it based on the number of visits to each cell, and removing the “state lines”, shows some interesting structures:
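A minimal simulator for rules written in that {new colour, turn, new state} form. This is my reconstruction of the notation, assuming the common power-of-two turn encoding (1 = straight, 2 = right, 4 = U-turn, 8 = left), not the original post's code:

```python
from collections import defaultdict

# The turmite quoted above, indexed as RULE[state][cell colour].
RULE = (((1, 2, 1), (1, 8, 1)),
        ((1, 2, 1), (0, 2, 0)))

TURN = {1: 0, 2: 1, 4: 2, 8: 3}            # quarter-turns clockwise (assumed encoding)
DIRS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left

def run_turmite(rule, steps):
    """Run the turmite; return the grid colours and per-cell visit counts."""
    grid = defaultdict(int)    # cell -> colour, default 0
    visits = defaultdict(int)  # cell -> visit count, for the colouring idea above
    x = y = state = heading = 0
    for _ in range(steps):
        new_colour, turn, new_state = rule[state][grid[(x, y)]]
        grid[(x, y)] = new_colour
        heading = (heading + TURN[turn]) % 4
        state = new_state
        dx, dy = DIRS[heading]
        x, y = x + dx, y + dy
        visits[(x, y)] += 1
    return grid, visits
```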

While I didn’t officially participate, I had a lot of fun with Genuary this year.

February 17, 2021

Pages From The Fire (kghose)

Lenovo Flex 5 + Win 10: A lament February 17, 2021 10:48 AM

I normally work on a (16GB + SSD) MacBook Pro. This machine costs around $1700 in the configuration I have. I do some writing on the side (as you may have guessed) and, for a variety of reasons, I decided to do my writing on a separate laptop. I use…

February 15, 2021

Ponylang (SeanTAllen)

Last Week in Pony - February 14, 2021 February 15, 2021 02:47 AM

The supported version of FreeBSD is moving from 12.1 to 12.2. The Apple M1 support team has an initial report. The documentation website is being shut down. The default branch renaming is underway. Interested in contributing to Corral or other Pony tools?

Andreas Zwinkau (qznc)

Software Architecture: How You Make Them Care February 15, 2021 12:00 AM

Software architects must code, talk business, and tell stories.

Read full article!

February 14, 2021

Derek Jones (derek-jones)

Performance impact of comments on tasks taking a few minutes February 14, 2021 11:23 PM

How cost-effective is an investment in commenting code?

Answering this question requires knowing the time needed to write the comment and the time they save for later readers of the code.

A recent study investigated the impact of comments in small programming tasks on developer performance, and Sebastian Nielebock, the first author, kindly sent me a copy of the data.

How might the performance impact of comments be measured?

The obvious answer is to ask subjects to solve a coding problem, with half the subjects working with code containing comments and the other half the same code without the comments. This study used three kinds of commenting: No comments, Implementation comments and Documentation comments; the source was in Java.

Were the comments in the experiment useful, in the sense of providing information that was likely to save readers some time? A preliminary run was used to check that the comments provided some benefit.

The experiment was designed to be short enough that most subjects could complete it in less than an hour (average time to complete all tasks was 31 minutes). My own experience with running experiments is that it is possible to get professional developers to donate an hour of their time.

What is a realistic and experimentally useful amount of work to ask developers to do in an hour?

The authors asked subjects to complete nine tasks: three each of applying the code (i.e., using the code’s API), fixing a bug in the code, and extending the code. Would a longer version of one of each, rather than three shorter ones of each, have been better? I think the only way to find out is to try it both ways (I hope the authors plan to do another version).

What were the results (code+data)?

Regular readers will know, from other posts discussing experiments, that the biggest factor is likely to be subject (professional developers+students) differences, and this is true here.

Based on a fitted regression model, Documentation comments slowed performance on a task by 30 seconds, compared to No comments and Implementation comments (which both had the same performance impact). Given that average task completion time was 205 seconds, this is a 15% slowdown for Documentation comments.

This study set out to measure the performance impact of comments on small programming tasks. The answer, at least for tasks designed to take a few minutes, is that No comments, or if comments are required, then write Implementation comments.

This experiment measured the performance impact of comments on developers who did not write the code containing them. These developers have to first read and understand the comments (which takes time). However, the evidence suggests that code is mostly modified by the developer who wrote it (just reading the code does not leave a record that can be analysed). In that case, reading a comment (that the developer previously wrote) can trigger existing memories, i.e., it has a greater information content for the original author.

Will comments have a bigger impact when read by the person who wrote them (and the code), or on tasks taking more than a few minutes? I await the results of more experiments…

Update: I have updated the script based on feedback about the data from Sebastian Nielebock.

Noon van der Silk (silky)

2020 Books February 14, 2021 12:00 AM

Posted on February 14, 2021 by Noon van der Silk

Continuing the tradition started in 2018 and continued in 2019 over on the Braneshop blog, I was reminded that I haven’t yet posted the books I read in 2020.

So, here we are:

How Long ’til Black Future Month? by Jemisin, N.K.

As a big fan of N.K. Jemisin I just wanted to read more of what she’d written. This is good for that; it’s a really nice dip-in to the worlds she has created and leaves you wanting more!

Gravity’s Century: From Einstein’s Eclipse to Images of Black Holes by Cowen, Ron

Pretty good. This was a nice history of black holes and an introduction to some of the pressing issues presently.

The Cloudspotter’s Guide by Pretor-Pinney, Gavin

I picked this up after starting (but not finishing) The Wavewatcher’s Companion, which I found to be hilarious. This book was not as funny, but was still pretty good. Gavin has a very nice way of appreciating the world.

Altruism: The Power of Compassion to Change Yourself and the World by Ricard, Matthieu

I really quite enjoyed this book, and in fact wrote a longer review over on the Between Books website.

Girl, Woman, Other by Evaristo, Bernardine

A nice collection of stories featuring different women and how their lives intersect. I think it’s quite nice to get so many stories from the everyday lives of women; especially for me, for whom it’s a bit unknown territory.

The White Album by Didion, Joan

The first Joan Didion book I’ve read. She’s very famous, of course; and I don’t know. It was good to read; but was it objectively good? Maybe. It’s at least nice to know a bit about her life and how she writes. I’ll probably read more.

Mullumbimby by Lucashenko, Melissa

I quite enjoyed this story. It’s also the first time I’ve read a modern fictional Australian indigenous-centered story. Will be on the look out for more books of this kind! Came up as part of the OC House bookclub, I think.

Down Girl: The Logic of Misogyny by Manne, Kate

This book is amazing. I found it to be written in a bit of an academic-y style; so it can be a bit dense, but it does an amazing job of framing what misogyny is and how it is present in all aspects of society.

Unfree Speech: The Threat to Global Democracy and Why We Must Act, Now by Wong, Joshua

This is a nice short history of free speech and how China is taking over Hong Kong. It was interesting to read about how they attempted to make some large changes to their democracy through grass-roots organisation. In some ways quite inspiring, but also a reality check that it isn’t easy to defeat power. Worth a read.

A Month in Siena by Matar, Hisham

This was just a nice simple book about someone getting amongst life in a new town. In many ways I think this kind of experience is how many people wish their holidays to go: meeting some nice stranger, integrating into the community, and just appreciating the joy of life.

The Weil Conjectures: On Math and the Pursuit of the Unknown by Olsson, Karen

I can’t remember much about this book at this point; I think I enjoyed it.

The Worst Journey in the World by Cherry-Garrard, Apsley

I read the next two books back-to-back. This one was strange. It’s written from the English perspective; it’s quite racist at times. I wouldn’t recommend reading it, unless you really want to compare perspectives on the journeys of Scott and Amundsen.

The Last Place on Earth: Scott and Amundsen’s Race to the South Pole by Huntford, Roland

This was a really interesting read. It was so fascinating to compare how Amundsen approached the journey compared to Scott. According to this book, Scott was woefully underprepared and arrogant, and Amundsen spent many many years training and learning the right skills from different indigenous groups in order to survive.

If you want to learn about the race to the south pole; this book is definitely better than the above.

Food Rules: An Eater’s Manual by Pollan, Michael

More of a blog post than a book; but still, nice to keep next to you somewhere when you want to remember if it’s okay to eat chocolate everyday (probably not) or have a wine with dinner (probably fine).

Men Explain Things to Me by Solnit, Rebecca

Hilarious. This is a collection of stories; some funnier than others, but overall great.

The Shadow of the Sun by Kapuściński, Ryszard

I enjoyed this as a nice way to get a bit of, albeit an outsiders, insight into how different people live in some of the poorer parts of Africa.

Capital and Ideology by Piketty, Thomas

I really enjoyed this book, and wrote about it more over on Between Books - Capital and Ideology

Capital in the Twenty-First Century by Piketty, Thomas

This one I also enjoyed; I don’t think you really need to read it first before Ideology, but I opted to do so. Reading this really inspired a love and interest of economics, and Piketty seems to do an amazing job of exploring these topics in an approachable way.

Affluence Without Abundance: The Disappearing World of the Bushmen by Suzman, James

This book wasn’t bad; it’s an exploration of life with a marginalised tribe referred to as the Bushmen. It doesn’t totally take the expected view that the simple life is better; but it does discuss how politics and the community encroaches on this tribe, and how they live life.

A Woman in the Polar Night by Ritter, Christiane

I think this is a classic book; it’s a little strange, but it does show one person’s journey into the Arctic and how she learned to love it. It didn’t exactly inspire me to go and live there, but I do admire her approach!

Touching the Void: The True Story of One Man’s Miraculous Survival by Simpson, Joe

This is amazing. I read this assuming a particular fact about the story (I won’t spoil it by telling you), but it turned out that my mind was blown by what actually happened. I’ll probably read it again!

Into Thin Air: A Personal Account of the Mount Everest Disaster by Krakauer, Jon

I bought this I think after the one above; desperate for more books about climbing. This is also an amazing story; I think if anything it made me feel certain that I’ll never attempt to climb Everest.

The Case of the Honjin Murder by Yokomizo, Seishi

I just happened across this one in the bookshop and thought I’d give it a go. I can’t say it was the best; it was quite sexist. The story was probably pretty good, if you could ignore that; but I couldn’t.

Tall Man: The Death of Doomadgee by Hooper, Chloe

This was a good read but super aggravating. It’s really unsettling to learn how terrible some of the policing is; and the subsequent investigations that yielded no useful outcome. Eye-opening for me in terms of how bad racism is in Australia.

Mountains of the Mind: A History of a Fascination by Macfarlane, Robert

Pretty good. I went through a phase of reading about mountain climbing. This was a nice little overview of how people justify climbing. Good for an introduction to other things to read.

The City We Became (Great Cities, #1) by Jemisin, N.K.

For how much I loved her other books; I have to say I found this one a bit disappointing. I think by normal sci-fi standards, it’s certainly excellent, and I’ll definitely read the rest of the series; but if you’re looking for something as amazing as her other books, you might not find it here.

Annapurna: A Woman’s Place by Blum, Arlene

I really enjoyed this one. It was interesting to compare this to other climbing books written by men, which almost never feature much uncertainty or collaborative leadership.

Underland: A Deep Time Journey by Macfarlane, Robert

This is just super cool. We are taken on a journey through lots of different underground worlds. Certainly makes you want to do this kind of exploring.

The Eastern Curlew by Saddler, Harry

I read this after meeting the author at a party! I felt so cool; I’d never met an author in real life before. Inspired by that meeting, I picked it up. It’s really a nice story about following the migratory path of a bird, and thinking about how their ecosystem is being impacted. Would definitely recommend!

Her Body and Other Parties by Machado, Carmen Maria

Not bad. Probably not my favourite style of book, but if you like weird kind of magical fiction with a message, probably quite good.

My Sister, the Serial Killer by Braithwaite, Oyinkan

I quite enjoyed this one; it was a very quick read; so I’m looking forward to more by this author!

In Praise of Shadows by Tanizaki, Jun’ichirō

Pretty good. Very very short; but a thoughtful analysis of how light impacts space. I think it’s a classic of the architecture world.

Barbarian Days: A Surfing Life by Finnegan, William

I really really enjoyed this one. Very nicely written, you feel like you’re living the life alongside the author. It’s nice to read about someone who follows their passion so directly.

Why I’m No Longer Talking to White People About Race by Eddo-Lodge, Reni

I picked this up as soon as I arrived in the UK, to gain an understanding for how people here think about race issues. Pretty good reading.

Kon-Tiki by Heyerdahl, Thor

This was a funny one. A classic kind of adventure story from real life; I have to say I enjoyed it.

Mad, Bad & Dangerous to Know: The Autobiography by Fiennes, Ranulph

I found this one also kind of funny. I suppose this guy is very famous in the UK, but I’d not heard of him really. It’s funny to read about how he thinks of endurance, and his claim that “anyone” can be like him; in terms of running 7 marathons at the age of 70 across 7 countries; or something along those lines. Didn’t exactly encourage me to do the same, but did give me some food for thought about willpower and energy.

See What You Made Me Do: Power, Control and Domestic Violence by Hill, Jess

This is a very hard book to read, emotionally. It contains some exceptionally difficult stories. Definitely recommended reading, but if you want a more academic treatment see the earlier book by Kate Manne.

Becoming Bodhisattvas by Chödrön, Pema

Easily the best book I read last year. Of course it’s a buddhist view on how to live life; but I found it very practical and thoughtful. I’ve read it again during some difficult times, and found it very uplifting.

February 12, 2021

Jan van den Berg (j11g)

Thoughts on Clubhouse February 12, 2021 09:44 AM

You can’t

  • Listen on demand
  • Restart/replay a conversation
  • Record a conversation or audio snippets
  • Trace back when a conversation started
  • See who is talking, you can only hear them
  • Send (text) messages to other Clubhouse users
  • Share pictures, videos, gifs or audio files
  • Use it on Android

You can

  • Listen to audio conversations as they happen
  • See avatars of who is in the conversation
  • Start a group audio conversation with people you know
  • See Clubhouse member profiles with who they follow and who follow them
  • Minimize the app and do other things while listening
  • Receive conversation notices, as they happen, based on people you follow*

What you get

  • Raw, unscripted watercooler conversations

So there are a lot of things Clubhouse doesn’t do and only a few things it does do. But the almost archaic possibilities of the Clubhouse app are a large part of the appeal. The constraints punctuate the experience.

Developers, developers, developers

Ephemeral group conversations are of course as old as humans. And we didn’t even have to wait for the internet and smartphones for this human need to be implemented in technology. Theoretically the Clubhouse experience was already possible — pre-internet — with the plain old telephony system and it is also basically what happens on the amateur radio network frequencies (this is still a thing).

Remember this bad boy?

Which is why it is remarkable that it took until 2020 for such an app to exist on a smartphone. Was the idea maybe too simple? No. Clubhouse may be a primal experience but it is also a logical iteration from text messages, Instagram Stories, Snapchat and TikTok. Clubhouse adds something new to this line of — increasingly real-time — social interactions, by taking away a lot of barriers. And by being the only one that is actually real-time.

The Clubhouse experience is the lowest bar to participation of any social app out there. You don’t have to leave the house, sit at a desk, straighten your hair, you don’t even have to be able to type. It is just you talking or listening to people.

And Clubhouse strips down the human need for sharing without showing your face (Zoom), or having to be overly creative (TikTok). Remember that Instagram and Snapchat filters are not only successful because they are fun, they also obfuscate what you don’t want to be seen. Clubhouse doesn’t have this problem.


This all boils down to the lowest denominator of participation of any social app out there, and the result is a very real experience. So real that it hardly makes sense for people to get ‘verified’ (blue checkmarks): you know right away if the person talking is who they say they are. I was listening to a room with Steve Ballmer, and trust me, that was Steve Ballmer.

It’s the algorithm

So Clubhouse offers one of the oldest human experiences of just people talking. But here is the really clever part and why we did need the internet and smartphones.
*The Clubhouse algorithm sends a push notification when conversations you might be interested in are happening. This is probably also the only reason Clubhouse uses profiles and following/followers lists. Because your interests are, of course, based on people you follow. And this — social graph — is exactly what the internet and smartphones bring to the table that the telephone system and amateur radio can’t.

So now what?

A lot of Clubhouse conversations are about Clubhouse. Also a lot of people on Clubhouse are talking about social media and social influencing in general. It feels very meta. But I guess that is what typically happens when these things start.

Clubhouse is the hottest new app at the moment, either because of its clever iOS-only, FOMO-inducing, invite-only approach, or because of the tech VC entourage that pushes interest in the app, or maybe because the pandemic has emphasized the need for humans to connect. It’s probably a little bit of all of the above. But you also know that because of the app’s success one of these two things will happen: 1. Facebook will buy Clubhouse or 2. Facebook will clone Clubhouse. We’ll see.

I see different paths forward for Clubhouse and I am curious to see how it will pan out. And the app right now is very bare, which is also the appeal. So it’ll be interesting to see how and whether they will pivot, maybe they will start adding features, maybe they will introduce recording conversations (podcasts)? And they of course will have to find ways to monetize. And they will have to do so all while the question looms: will it stay fun or is it just the newness that is appealing?

The post Thoughts on Clubhouse appeared first on Jan van den Berg.

Pete Corey (petecorey)

The Long and Short Case for Elixir February 12, 2021 12:00 AM

I recently landed a new contract with a client gracious enough to hear my thoughts about why they should make radical changes to their stack and choose Elixir to build their greenfield data pipeline.

As part of my pitch, I put together two documents:

The “long case” is a detailed, low level document that outlines why Elixir might be a better choice than Node.js for building the server in question. The “short case” is a distilled and higher level version of the longer document, intended for the project owner.

May these documents help you as much as they’ve helped me. If you have any suggestions for improving them, please let me know!

February 11, 2021

Carlos Fenollosa (carlesfe)

How I moved my setup from a Mac to a Linux laptop February 11, 2021 08:38 AM

This article is part of a series:

  1. Seven years later, I bought a new Macbook. For the first time, I don't love it
  2. This article
  3. (TBD) The experience of using a Linux desktop for six months
  4. (TBD) My review of the M1 Macbook Air

Returning the Macbook Pro

In the previous installment of this series I explained how I had been disenchanted with recent Macs.

The fact is that shortly after writing that review I returned the 2020 Macbook Pro.

I tried to love it, honest to God, but it was mediocre.

Let me share with you the list of issues I found while using it for two weeks:

  • USB2 keyboard and mouse randomly disconnecting. I tried 3 different USB/Thunderbolt adapters, including Apple's, to no avail.
  • Abnormal rattling sound when fans start spinning, as if they were not properly seated.
  • Not detecting the internal keyboard sometimes after resuming sleep.
  • Only 5 hours of battery life when streaming video with the screen off, connected to a TV.
  • Battery drained from 70% to 15% after sleeping overnight.
  • The touchbar. I spent time trying to make it useful with BetterTouchTool. Then I discovered that the keypresses don't register correctly most of the time, and I had to tap twice or three times, which defeats the purpose of having a dedicated shortcut.
  • Trackpad registers too many spurious taps and the cursor jumps around the screen while typing.
  • Airpods Pro have ~400ms of audio lag when watching videos. Solved with a reboot but it appears again after a few minutes.

The fact that it had issues with the internal keyboard and the USB subsystem made me think that the unit may have a faulty logic board. I discovered later, through a Reddit thread, that the USB2 issue was not specific to my unit, but it didn't matter much.

I was feeling really ripped off with a machine I had spent 2.000€ on and which, speed aside, was worse than my 2013 Air in every way I cared.

In the end, my gut told my brain, "Stop rationalizing it, this laptop makes you unhappy", and I came to terms with it.

Now what?

I had been dreading this moment since 2016, when I realized that Apple didn't care about my demographic anymore.

Migrating platforms is a big hassle, so after I made the decision to return the Macbook Pro, I thought carefully what the next steps would be.

In order to transition from the Mac to Linux I had to prepare a plan for new hardware, new software, and a new cloud ecosystem.

At that point there were strong rumors about ARM Macs. I thought I'd use Linux for an indeterminate amount of time, until Apple hopefully released a good Mac again. Which may have been "never", though I was optimistic.

I have used Linux extensively in my life, since 1999, but mostly as a developer. Nowadays my requirements are more "mainstream" and I need apps like Microsoft Office and Adobe Reader to do my job. This is an important point to make. This computer is my main work tool, and it needs to accommodate my own needs as well as my job's. I spent some time making sure that I could run all the software I needed. As a last resort, there is always the option of using a VM.

As a final step, I had to move all the iCloud stuff out of there, because it is not interoperable with standard clients. I decided I would self-host it and see how difficult it is to leave the walled garden.

Therefore, I needed to fulfil the following requirements:

  1. Good laptop hardware with Linux support
  2. Ability to run work-related software
  3. Self-hosted (or almost) cloud services

1. Choosing a new laptop: The 2018 Dell XPS 13"

Before buying—and potentially returning—a new machine I drove to the office and grabbed two old laptops to see if they would fit: a Thinkpad 420 and a 2018 Dell XPS 13".

I decided to test drive the five of them: the 2020 MBP, my 2013 MBA, the Thinkpad, the Dell, and my tinkering laptop, a Thinkpad x230 with OpenBSD.

I then spent a couple days trying to make some sense of the situation. You can see them running a group video chat and some benchmarks.

Fortunately, a clear winner emerged: the 2018 Dell XPS with Ubuntu-Dell installed.

How good is the 2018 XPS? Excellent. 9.5/10, would recommend. I fell in love with that machine: very good hardware, with just a few minor issues.


The good:

  • Good screen quality
  • Small bezels. It makes a difference and I still miss them today.
  • Light, nice in my lap. The Macbook Pros have air vents that "cut" into your legs when you're wearing shorts.
  • All I/O worked fine. I used the official Dell Thunderbolt Dock.
  • Excellent keyboard. I liked the pgup/pgdn keys on the arrow cluster and welcomed back the function keys.
  • Good battery life (6h of streaming video) even though the laptop had been used daily for almost 3 years.


The bad:

  • The speakers are of laughable quality. Not just bad, but why-would-Dell-do-that bad. Extremely quiet and terrible quality.
  • The webcam is on a bad location. Not really a big deal, but I'm glad they fixed it in recent revisions.
  • The trackpad is kinda like the 2013 Air's, but a bit worse.
  • Coil whine. I tried to be positive and used it as an "activity indicator" like in the old days of spinning hard drives, but I'd rather not have it.

That really is it. The Dell XPS is probably the best go-to PC laptop. Excellent set of compromises, good price, good support. If you want to use Linux on a laptop, you can't go wrong with the XPS.

2. Doing my job with Linux

I knew beforehand that hardware support was not going to be an issue. Linux drivers in general are pretty good nowadays, and that XPS specifically was designed to work well with Linux, that is why we bought it for the office.

On first boot everything works. Ubuntu is pretty good. Gnome 3 tries to be like a Mac, which I liked, and the basic software is fine for most of the people.

I then installed my work software. Most of it is either standard (email, calendar...) or multi-platform via Electron or webapps. For Windows-specific software I purchased a license of Crossover and also installed a Windows 10 VM on VirtualBox. It was not super convenient and sometimes things crashed with Crossover, but I could manage.

Overall, the desktop environment and base apps are not as polished as macOS, which I will discuss later, but it worked.

I am happy to realize that I can continue recommending Linux to "regular people" who want a computer that just works, doesn't get viruses, and is very low maintenance.

3. My self-hosted cloud setup

This is a topic that is on everybody's mind right now. We went from the original, decentralized internet, to a network centralized in a few vendors like Facebook, Google, Cloudflare and Amazon, and I think that is a bad idea.

The walled garden is comfortable, but what happens when you want to make the switch? How easy is it really to migrate your online infrastructure to another vendor?

Well, I was going to discover that soon. I like the iCloud ecosystem, and in general am fine with Apple's privacy policies, but I just couldn't continue using it. Apart from email, all other services (pictures, calendars, files, notes, etc.) cannot be used in Linux, and the browser client is extremely bad.

I am a geek, and have been a sysadmin since college, so I took it as a personal challenge to create my own personal cloud infrastructure.

First I tried Nextcloud. It mostly works and I recommend it in general, but the server components are too heavy and the file syncing is slow and unreliable.

I decided to self-host every individual piece of the puzzle:

  • My mail has been managed by postfix/dovecot for a few years now. I don't recommend anybody self-hosting email due to deliverability issues, but I'm that stubborn.
  • I set up radicale for contacts, calendars and tasks. It had issues connecting to some clients due to, I believe, SSL negotiation. If I had to set it up again I'd try another alternative.
  • All my files got synced across my laptops and the server thanks to Syncthing. I can't stress enough how great a piece of software Syncthing is. Really, if you're looking for an alternative to Dropbox, try it out. It will amaze you.
  • Syncthing does not expose files publicly, so I spun up the Apache WebDAV server to share files.
  • Joplin is a good alternative to take rich text notes and sync them over the internet. The clients are not very polished, but it works.
  • For passwords I've been using Lastpass for some time.
  • I kept using iCloud for pictures, because it's the best solution if you have an iPhone. It is fine because I don't need to work with pictures on my daily workflow.

It took some time of researching and deploying all the pieces, and I'm quite happy with the result. It feels really great to manage your online infrastructure, even though it requires technical knowledge and regular maintenance to keep everything up to date.

So how did it all work out?

Well, I have been repeating this term all over the article: it works. I could do my job, and it was a very gratifying learning experience. Overall, I do encourage geeks to spin up their own cloud infra and work with Linux or BSD boxes. I do have some self-hosted cloud services and I also keep a laptop with OpenBSD which I use regularly.

It is possible to get out of the walled garden. Of course, it's not within reach of the general public yet, even though Nextcloud is very close, and some third party vendors are starting to offer an integrated cloud experience outside the world of the Big Cloud.

But I'm writing this in the past tense because I went back to the Mac. Unfortunately, after six months of using this setup full-time I started noticing very rough edges, which I will explain on the next article.

Stay tuned!

Tags: apple, linux, hardware


Maxwell Bernstein (tekknolagi)

Small objects and pointer tagging February 11, 2021 12:00 AM

Welcome back to the third post in what is quickly turning into a series on runtime optimization. The last two posts were about inline caching and quickening, two techniques for speeding up the interpreter loop.

In this post, we will instead look at a different part of the runtime: the object model.

The problem

Right now, we represent objects as tagged unions.

typedef enum {
  kInt,
  kStr,
} ObjectType;

typedef struct {
  ObjectType type;
  union {
    const char *str_value;
    word int_value;
  };
} Object;

This C struct contains two components: a tag for the type, and space for either an integer or a C string pointer. In order to create a new Object, we allocate space for it on the C heap and pass a pointer around. For example, here is the current constructor for integers:

Object* new_int(word value) {
  Object* result = malloc(sizeof *result);
  CHECK(result != NULL && "could not allocate object");
  *result = (Object){.type = kInt, .int_value = value};
  return result;
}

I don’t know about you, but to me this seems a little wasteful. Allocating every single new integer object on the heap leaves a lot to be desired. malloc is slow and while memory is cheaper these days than it used to be, it still is not free. We’ll need to do something about this.

In addition, even if we somehow reduced allocation to the bare minimum, reading from and writing to memory is still slow. If we can avoid that, we could greatly speed up our runtime.

This post will focus on a particular strategy for optimizing operations on integer objects, but the ideas also apply broadly to other small objects. See Exploring further for more food for thought.

The solution space

There are a number of strategies to mitigate this gratuitous allocation, most commonly:

  1. Interning a small set of canonical integer objects. CPython does this for the integers between -5 and 256.
  2. Interning all integers by looking them up in a specialized hash table before allocating. This is also called hash consing.
  3. Have a statically typed low-level language where the compiler can know ahead of time what type every variable is and how much space it requires. C and Rust compilers, for example, can do this.

The first two approaches don’t reduce the memory traffic, but the third approach does. Our runtime has no such type guarantees and no compiler to speak of, so that’s a no-go, and I think we can do better than the first two strategies. We’ll just need to get a little clever and re-use some old knowledge from the 80s.
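For comparison, the first strategy (interning a small canonical range) can be sketched in a few lines of C. This is a toy with hypothetical names (get_int, IntObject) and bounds borrowed from CPython's small-int cache; it is not CPython's actual implementation:

```c
#include <stddef.h>

// Canonical objects for the integers in [-5, 256], in the spirit of
// CPython's small-int cache. All names here are made up for illustration.
enum { kSmallIntMin = -5, kSmallIntMax = 256 };

typedef struct { long value; } IntObject;

static IntObject small_ints[kSmallIntMax - kSmallIntMin + 1];

IntObject* get_int(long value) {
  if (value >= kSmallIntMin && value <= kSmallIntMax) {
    IntObject* cached = &small_ints[value - kSmallIntMin];
    cached->value = value;  // idempotent; a real runtime prefills the table
    return cached;          // same object every time: no allocation
  }
  return NULL;  // a real runtime would fall back to heap allocation here
}
```

Note that every call with the same small value returns the same pointer, so identity comparison works, but each lookup still touches memory.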

What’s in a pointer?

Before we get clever, we should take a step back and think about the Object pointers we pass around. The C standard guarantees that malloc will return an aligned pointer. On 32-bit systems, this means that the result will be 4-byte aligned, and on 64-bit systems, it will be 8-byte aligned. This post will only focus on 64-bit systems, so for our purposes all malloced objects will be 8-byte aligned.

Being 8-byte aligned means that all pointers are multiples of 8. This means that if you look at the pointer representation in binary, they look like:

High                                                           Low
0bxxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000

See that? The three lowest bits are zero. Since we’re guaranteed the pointers will always be given to us with the three zero bits, we can use those bits to store some extra information. Lisp and Smalltalk runtimes have been doing this for at least 30 years.

On some hardware, there are also bits unused in the high part of the address. We will only use the lower part of the address, though, because the high bits are reserved for future use.
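You can convince yourself of the alignment guarantee with a tiny helper. low_bits_free is a name I made up; it just tests the three bits the tagging scheme will borrow:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// True if a pointer's three lowest bits are zero -- i.e. the bits a
// tagging scheme is free to borrow on a 64-bit system. malloc guarantees
// alignment suitable for any fundamental type, so its results pass.
bool low_bits_free(const void* p) {
  return ((uintptr_t)p & 0x7) == 0;
}
```

Every pointer malloc hands back should satisfy this predicate, while the same pointer bumped by one byte does not.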

The scheme

To start, we will tag all pointers to heap-allocated objects with a lower bit of 1 [1]. This means that now all real heap pointers will end in 001 instead of 000. We will then assume that any pointer with a lowest bit of 0 is actually an integer. This leaves us 63 bits of integer space. This is one less bit than we had before, which we will talk about more in Exploring further.

We are doing this because the assumption behind this pointer tagging is that integer objects are both small and common. Adding and subtracting them should be very cheap. And it’s not so bad if all operations on pointers have to remove the low 1 bit, either. x86-64 addressing modes make it easy to fold that into normal struct member reads and writes [2].

And guess what? The best part is, since we were smart and used helper functions to allocate, type check, read from, and write to the objects, we don’t even need to touch the interpreter core or library functions. We only need to touch the functions that work directly on objects. Let’s take a look.

New object representation

Okay, we should also look at the struct definition. I think we should first make Object opaque. We don’t want anyone trying to dereference a tagged pointer!

struct Object;
typedef struct Object Object;

Now we’ll need to represent the rest of the heap-allocated objects in some other type. I think HeapObject is a reasonable and descriptive name. We can keep using the tagged union approach from earlier.

typedef struct {
  ObjectType type;
  union {
    const char* str_value;
  };
} HeapObject;

Right now we only have strings but I imagine it would be useful to add some more types later on.

Now, it’s important to keep the invariant that whenever we have a HeapObject*, it is a valid pointer. This means that we should always untag before casting from Object* to HeapObject*. This will help both keep our interfaces clean and avoid bugs. You’ll see what I mean in a little bit.

Helper functions

Now that we have our object representation down, we can take a look at the helper functions. Let’s start with the easiest three, object_is_int, object_as_int, and new_int:

enum {
  kIntegerTag = 0x0,      // 0b0
  kIntegerTagMask = 0x1,  // 0b1
  kIntegerShift = 1,
};

bool object_is_int(Object* obj) {
  return ((uword)obj & kIntegerTagMask) == kIntegerTag;
}

word object_as_int(Object* obj) {
  return (word)obj >> kIntegerShift;
}

Object* new_int(word value) {
  CHECK(value < INTEGER_MAX && "too big");
  CHECK(value > INTEGER_MIN && "too small");
  return (Object*)((uword)value << kIntegerShift);
}

We decided that integer tags are one bit wide, zero, and the lowest bit. This function puts that in code. If you are unfamiliar with bit manipulation, check out the Wikipedia article on bit masks. The constants INTEGER_MAX and INTEGER_MIN refer to the maximum and minimum values that will fit in the 63 bits of space we have. Right now there are some CHECKs that will abort the program if the integer does not fit in 63 bits. Implementing a fallback to heap-allocated 64-bit integers or even big integers is a potential extension (see Exploring further).

The test for heap objects is similar to the test for ints. We use the same tag width (one bit) but we expect the bit to be 1, not 0.

enum {
  // ...
  kHeapObjectTag = 0x1,      // 0b01
  kHeapObjectTagMask = 0x1,  // 0b01
};

bool object_is_heap_object(Object* obj) {
  return ((uword)obj & kHeapObjectTagMask) == kHeapObjectTag;
}

Any pointer that passes object_is_heap_object should be dereferenceable once unmasked.

Speaking of unmasking, we also have a function to do that. And we also have a function that can cast the other way, too.

HeapObject* object_address(Object* obj) {
  CHECK(object_is_heap_object(obj) && "not a heap object");
  return (HeapObject*)((uword)obj & ~kHeapObjectTagMask);
}

Object* object_from_address(HeapObject* obj) {
  return (Object*)((uword)obj | kHeapObjectTag);
}

The function object_address will be the only function that returns a HeapObject*. It checks that the object passed in is actually a heap object before casting and untagging. This should be safe enough.

Alright, so we can make integers and cast between Object* and HeapObject*. We still need to think about object_type and the string functions. Fortunately, they can mostly be implemented in terms of the functions we implemented above!

Let’s take a look at object_type. For non-heap objects, it has to do some special casing. Otherwise we can just pull out the type field from the HeapObject*.

ObjectType object_type(Object* obj) {
  if (object_is_int(obj)) {
    return kInt;
  }
  return object_address(obj)->type;
}

And now for the string functions. These are similar enough to their previous implementations, with some small adaptations for the new object model.

bool object_is_str(Object* obj) { return object_type(obj) == kStr; }

const char* object_as_str(Object* obj) {
  return object_address(obj)->str_value;
}

Object* new_str(const char* value) {
  HeapObject* result = malloc(sizeof *result);
  CHECK(result != NULL && "could not allocate object");
  *result = (HeapObject){.type = kStr, .str_value = value};
  return object_from_address(result);
}

That’s that for the helper functions. It’s a good thing we implemented int_add, str_print, etc in terms of our helpers. None of those have to change a bit.
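To see the whole scheme working end to end, here is a condensed, compilable sketch of the integer helpers above. The names (tag_int, untag_int, tagged_int_add) and the intptr_t/uintptr_t typedefs are my substitutions, since the post's word/uword definitions aren't shown here:

```c
#include <stdint.h>

// Assumed stand-ins for the post's word/uword typedefs.
typedef intptr_t word;
typedef uintptr_t uword;

enum {
  kIntegerTag = 0x0,      // 0b0: low bit clear means "immediate integer"
  kIntegerTagMask = 0x1,  // 0b1
  kIntegerShift = 1,      // one tag bit leaves 63 bits of payload
};

typedef struct Object Object;  // opaque: an Object* is only ever a tagged value

Object* tag_int(word value) {
  return (Object*)((uword)value << kIntegerShift);
}

word untag_int(Object* obj) {
  return (word)obj >> kIntegerShift;  // arithmetic shift restores the sign
}

int is_tagged_int(Object* obj) {
  return ((uword)obj & kIntegerTagMask) == kIntegerTag;
}

// Same shape as the post's int_add: untag, add, retag. No heap traffic.
Object* tagged_int_add(Object* left, Object* right) {
  return tag_int(untag_int(left) + untag_int(right));
}
```

Tagging, untagging, and adding never touch memory; everything happens in registers on the pointer-sized value itself.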

Performance analysis

I am not going to run a tiny snippet of bytecode in a loop and call it faster than the previous interpreter. See Performance analysis from the first post for an explanation.

I am, however, going to walk through some of the generated code for functions we care about.

We said that integer operations were important, so let’s take a look at what kind of code the compiler generates for int_add. For a refresher, let’s look at the definition for int_add (which, mind you, has not changed since the last post):

Object* int_add(Object* left, Object* right) {
  return new_int(object_as_int(left) + object_as_int(right));
}

Previously this would read from memory in object_as_int, call malloc in new_int, and then write to memory. That’s a whole lot of overhead and function calls. Even if malloc were free, memory traffic would still take quite a bit of time.

Now let’s take a look at the code generated by a C compiler. To get this code, I pasted interpreter.c into The Compiler Explorer. You could also run objdump -S ./interpreter or gdb -batch -ex "disassemble/rs int_add" ./interpreter. Or even run GDB and poke at the code manually. Anyway, here’s the generated code:

int_add:                                # @int_add
        and     rdi, -2
        lea     rax, [rdi + rsi]
        and     rax, -2
        ret

How about that, eh? What was previously a monster of a function is now four whole instructions [3] and no memory operations. Put that in your pipeline and smoke it.

This is the kind of benefit we can reap from having small objects inside pointers.

Thanks for reading! Make sure to check out the repo and poke at the code.

Exploring further

In this post, we made an executive decision to shrink the available integer range by one bit. We didn’t add a fallback to heap-allocated 64-bit numbers. This is an interesting extension to consider if you occasionally need some big numbers. Or maybe, if you need really big numbers, you could also add a fallback to heap allocated bignums! If you don’t care at all, it’s not unreasonable to decide to make your integer operations cut off at 63 bits.

This post spent a decent chunk of time fitting integers into pointers. I chose to write about integers because it was probably the simplest way to demonstrate pointer tagging and immediate objects. However, your application may very well not be integer heavy. It’s entirely possible that in a typical workload, the majority of objects floating around your runtime are small strings! Or maybe floats, or maybe something else entirely. The point is, you need to measure and figure out what you need before implementing something. Consider implementing small strings or immediate booleans as a fun exercise. You will have to think some more about your object tagging system!

Pointer tagging is not the only way to compress values into pointer-sized objects. For runtimes whose primary numeric type is a double, it may make sense to implement NaN boxing. This is what VMs like SpiderMonkey[4] and LuaJIT do.
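
As a rough illustration of the idea, here is the common quiet-NaN trick in generic form (the actual layouts in SpiderMonkey and LuaJIT differ in their details, and the helper names are mine): every bit pattern with the quiet-NaN bits set plus extra payload bits is not a double the hardware will ever produce, so non-double values can hide in that space.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

#define QNAN     ((uint64_t)0x7ffc000000000000)  // quiet NaN bits + one spare bit
#define SIGN_BIT ((uint64_t)0x8000000000000000)

// Doubles are stored as-is; memcpy is the portable way to type-pun the bits.
static Value box_double(double d) { Value v; memcpy(&v, &d, sizeof v); return v; }
static double unbox_double(Value v) { double d; memcpy(&d, &v, sizeof d); return d; }
static int is_double(Value v) { return (v & QNAN) != QNAN; }

// Small integers ride in the low 32 bits of a quiet NaN.
static Value box_int(int32_t i) { return QNAN | (uint32_t)i; }
static int32_t unbox_int(Value v) { return (int32_t)(v & 0xffffffff); }

// Heap pointers (48 bits on common platforms) get the sign bit on top.
static Value box_ptr(void* p) { return SIGN_BIT | QNAN | (uint64_t)(uintptr_t)p; }
static void* unbox_ptr(Value v) { return (void*)(uintptr_t)(v & ~(SIGN_BIT | QNAN)); }
```

The payoff is that doubles need no unboxing at all; only the non-double cases pay for tag checks.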

Remember my suggestion about the template interpreter from the quickening post? Well, that idea is even more attractive now. You, the runtime writer, get to write a lot less assembly. And your computer gets to run a lot less code.

This post puts two distinct concepts together: small objects and pointer tagging. Maybe you don’t really need small objects, though, and want to store other information in the tag bits of your pointer. What other kinds of information could you store in there that is relevant to your workload? Perhaps you can tag pointers to all prime integers. Or maybe you want to tag different heap-allocated objects with their type tags. Either way, the two techniques are independent and can be used separately.
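
As one concrete possibility, here is a hypothetical sketch of stashing a small type tag in a pointer's low bits (the tag values are invented, and the three free bits assume at least 8-byte malloc alignment):

```c
#include <stdint.h>
#include <stdlib.h>

// malloc on 64-bit platforms returns pointers aligned to at least 8 bytes,
// so the low three bits of a heap pointer are always zero and free to reuse.
// The tag values below are made up for illustration.
enum { TAG_STRING = 1, TAG_VECTOR = 2, TAG_MASK = 0x7 };

void* tag_ptr(void* p, uintptr_t tag) {
  return (void*)((uintptr_t)p | tag);
}

uintptr_t ptr_tag(void* p) {
  return (uintptr_t)p & TAG_MASK;
}

void* untag_ptr(void* p) {
  return (void*)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
}
```

Checking an object's type then becomes a mask on the pointer itself, with no memory load.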

  1. In my blog post about the Ghuloum compiler, I used the bit patterns from the original Ghuloum paper to tag integers, characters, and different types of heap-allocated objects. Feel free to skim that if you want a taste for different pointer tagging schemes. 

  2. (Adapted from my Reddit comment)

    Say you have a C struct:

    struct foo {
      int bar;
    };

    and a heap-allocated instance of that struct:

    struct foo *instance = malloc(...);

    Reading an attribute of that struct in C looks like:

    instance->bar;

    and gets compiled down to something like the following pseudo-assembly (which assumes the instance pointer is stored in a register):

    mov rax, [instance+offsetof(foo, bar)]

    This is read as:

    1. take pointer from whatever register instance is in
    2. add the offset for the bar field to the pointer
    3. dereference that resulting pointer
    4. store that in rax

    And if you tag your pointers with, say, 0x1, you have to remove the 0x1 from the pointer before dereferencing. Your C code will look like:

    ((struct foo *)((uintptr_t)instance & ~0x1))->bar;

    or maybe:

    ((struct foo *)((uintptr_t)instance - 1))->bar;

    Which compiles down to:

    mov rax, [instance+offsetof(foo, bar)-1]

    and since both offsetof(foo, bar) and 1 are compile-time constants, that can get folded into the same mov instruction. 

  3. And guess what? This is just what the C compiler can generate from a C description of our object model. I have not figured out how to add the right compiler hints, but another correct implementation of int_add is just two instructions:

    int_add:                                # @int_add
        lea     rax, [rdi + rsi]
        ret

    I’m not sure what’s going on with the code generation that prevents this optimization, but we really should be able to add two integers without doing any fiddling with tags. In an assembly/template interpreter world, this kind of fine-grained control becomes much easier. 

  4. This is interesting because V8 and Dart, other JS(-like) VMs, use pointer tagging. Search “Smi” (or perhaps “Smi integer”) if you want to learn more.

February 09, 2021

asrpo (asrp)

Adding a minimal foreign function interface to your language February 09, 2021 08:21 PM

A foreign function interface (FFI) is a way for one language to call functions from another language.

The most common FFIs are C FFIs where the target language is C.

I just pushed a new version of Flpc where I added a very limited FFI for calling Python. Highlights of some other changes are:

February 08, 2021

Gonçalo Valério (dethos)

Documentation done right February 08, 2021 11:03 PM

One critical piece of the software development process that often gets neglected by companies, and also by many open-source projects, is explaining how the software works and how it can be used to solve the problem in question.

Documentation is often lacking and people have a hard time figuring out how they can use or contribute to a particular piece of software. I think most developers and users have faced this situation at least once.

Looking at it from the other side, it isn’t always easy to pick and share the right information so others can hit the ground running. The fact that not everybody starts from the same point or has the same goal makes the job a bit harder.

One approach to solve this problem that I like is Divio’s documentation system.

Divio’s documentation system, showing the 4 quadrants and their relations.

It splits the problem into 4 areas targeting different stages and needs of the person reading the documentation. Django uses this system and is frequently praised for having great documentation.

From a user point of view it looks solid. You should take a look and apply it on your packages/projects, I surely will.

Ponylang (SeanTAllen)

Last Week in Pony - February 7, 2021 February 08, 2021 12:36 AM

@ergl, also known as Borja o’Cook on the Ponylang Zulip, has become a Pony committer. Also, we have started the process of renaming the default branches on our repos from master to main.

February 07, 2021

Carlos Fenollosa (carlesfe)

Linux vs GNU/Linux February 07, 2021 09:41 PM

Why I do call the system Linux

I personally use the term Linux for simplicity. I know that it's the kernel name, but that's how naming works; sometimes it's convenient to use a synecdoche.

Linux is useful to identify the platform as a whole and it is recognizable to the general public. No-one uses it as a demeaning term, and when technical people need to be more specific they can add whatever qualifiers they need to it.

Why I don't call it GNU/Linux anymore

Some time ago I was an advocate for the term GNU/Linux, following the rationale of the FSF.

I still recognize the importance of the FSF and the GNU project in making Linux what it is today. However, I think that nowadays this controversy does more harm than good to the FSF, and I encourage people to be careful when discussing it.

The FSF arguments boil down to:

  1. Many years ago, a big percentage of the codebase of GNU/Linux systems was sourced from the GNU project.
  2. The GNU project is not just a series of tools, but rather an integrated system, which was missing only the kernel. It existed before Linux.

The thing is, with every passing year Linux systems have fewer and fewer GNU components. Some vendors even prefer software with alternative licenses such as BSD or Apache.

Should we then, using the same arguments, advocate for a name which reflects the more recent critical components, as GNU/FreeDesktop/Apache/OSI/BSD/.../Linux?

I am not trying to ridicule the FSF argument. On the contrary, my point is that, while they have been a very important contributor to the project, those specific arguments carry less weight as the project progresses. Therefore, even if this discussion could have been productive in 2001, nowadays it is either moot, or worse, plays against the GNU project's interests.

I am sure that the FSF will continue calling the system GNU/Linux and I believe they are entitled to it. But I don't think anybody should continue to proselytize about this anymore. And I also don't think that calling the system Linux in 2021 is either morally or technically wrong.

Tags: linux


Derek Jones (derek-jones)

Widely used programming languages: past, present, and future February 07, 2021 09:09 PM

Programming languages are like pop groups in that they have followers, fans and supporters; new ones are constantly being created and some eventually become widely popular, while those that were once popular slowly fade away or mutate into something else.

Creating a language is a relatively popular activity. Science fiction and fantasy authors have been doing it since before computers existed, e.g., the Elf language Quenya devised by Tolkien, and in the computer age Star Trek’s Klingon. Some very good how-to books have been written on the subject.

As soon as computers became available, people started inventing programming languages.

What have been the major factors influencing the growth to widespread use of a new programming language? (I’m ignoring languages that become widespread within application niches.)

Cobol and Fortran became widely used because there was widespread implementation support for them across computer manufacturers, and they did not have to compete with any existing widely used languages. Various niches had one or more languages that were widely used in that niche, e.g., Algol 60 in academia.

To become widely used during the mainframe/minicomputer age, a new language first had to be ported to the major computers of the day, whose products sometimes supported multiple, incompatible operating systems. No new languages became widely used, in the sense of across computer vendors. Some new languages were widely used by developers, because they were available on IBM computers; for several decades a large percentage of developers used IBM computers. Based on job adverts, RPG was widely used, but PL/1 not so. The use of RPG declined with the decline of IBM.

The introduction of microcomputers (originally 8-bit, then 16, then 32, and finally 64-bit) opened up an opportunity for new languages to become widely used in that niche (which would eventually grow to be the primary computing platform of its day). This opportunity occurred because compiler vendors for the major languages of the day did not want to cannibalize their existing market (i.e., selling compilers for a lot more than the price of a microcomputer) by selling a much lower priced product on microcomputers.

BASIC became available on practically all microcomputers, or rather some dialect of BASIC that was incompatible with all the other dialects. The availability of BASIC on a vendor’s computer promoted sales of the hardware, and it was not worthwhile for the major vendors to create a version of BASIC that reduced portability costs; the profit was in games.

The dominance of the Microsoft/Intel partnership removed the high cost of porting to lots of platforms (by driving them out of business), but created a major new obstacle to the wide adoption of new languages: Developer choice. There had always been lots of new languages floating around, but people only got to see the subset that were available on the particular hardware they targeted. Once the cpu/OS (essentially) became a monoculture most new languages had to compete for developer attention in one ecosystem.

Pascal was in widespread use for a few years on micros (in the form of Turbo Pascal) and university computers (the source of Wirth’s ETH compiler was freely available for porting), but eventually C won developer mindshare and became the most widely used language. In the early 1990s C++ compiler sales took off, but many developers were writing C with a few C++ constructs scattered about the code (e.g., use of new, rather than malloc/free).

Next, the Internet took off, and opened up an opportunity for new languages to become dominant. This opportunity occurred because Internet related software was being made freely available, and established compiler vendors were not interested in making their products freely available.

There were people willing to invest in creating a good-enough implementation of the language they had invented, and giving it away for free. Luck, plus being in the right place at the right time resulted in PHP and Javascript becoming widely used. Network effects prevent any other language becoming widely used. Compatible dialects of PHP and Javascript may migrate widespread usage to quite different languages over time, e.g., Facebook’s Hack.

Java rode to popularity on the coat-tails of the Internet, and when it looked like security issues would reduce it to niche status, it became the vendor supported language for one of the major smart-phone OSs.

Next, smart-phones took off, but the availability of Open Source compilers closed the opportunity window for new languages to become dominant through lack of interest from existing compiler vendors. Smart-phone vendors wanted to quickly attract developers, which meant throwing their weight behind a language that many developers were already familiar with; Apple went with Objective-C (which evolved to Swift), Google with Java (which evolved to Kotlin, because of the Oracle lawsuit).

Where does Python fit in this grand scheme? I don’t yet have an answer, or is my world-view wrong to treat Python usage as being as widespread as C/C++/Java?

New programming languages continue to be implemented; I don’t see this ever stopping. Most don’t attract more users than their implementer, but a few become fashionable amongst the young, who are always looking to attach themselves to something new and shiny.

Will a new programming language ever again become widely used?

Like human languages, programming languages experience strong network effects. Widely used languages continue to be widely used because many companies depend on code written in them, and many developers who know them can obtain jobs; what company wants to risk using a new language only to find they cannot hire staff who know it? And not many people are willing to invest in becoming fluent in a language with no immediate job prospects.

Today’s widely used programmings languages succeeded in a niche that eventually grew larger than all the other computing ecosystems. The Internet and smart-phones are used by everybody on the planet, there are no bigger ecosystems to provide new languages with a possible route to widespread use. To be widely used a language first has to become fashionable, but from now on, new programming languages that don’t evolve from (i.e., be compatible with) current widely used languages are very unlikely to migrate from fashionable to widely used.

It has always been possible for a proficient developer to dedicate a year+ of effort to create a new language implementation. Adding the polish needed to make it production ready used to take much longer, but these days tool chains such as LLVM supply a lot of the heavy lifting. The problem for almost all language creators/implementers is community building; they are terrible at dealing with other developers.

It’s no surprise that nearly all the new languages that become fashionable originate with language creators who work for a company that happens to feel a need for a new language. Examples include:

  • Go created by Google for internal use, and attracted an outside fan base. Company languages are not new, with IBM’s PL/1 being the poster child (or is there a more modern poster child). At the moment Go is a trendy language, and this feeds a supply of young developers willing to invest in learning it. Once the trendiness wears off, Google will start to have problems recruiting developers, the reason: Being labelled as a Go developer limits job prospects when few other companies use the language. Talk to a manager who has tried to recruit developers to work on applications written in Fortran, Pascal and other once-widely used languages (and even wannabe widely used languages, such as Ada),
  • Rust, a vanity project from Mozilla, which they have now cast adrift. Did Rust become fashionable because it arrived at the right time to become the not-Google language? I await a PhD thesis on the topic of the rise and fall of Rust,
  • Microsoft’s C# ceased being trendy some years ago. These days I don’t have much contact with developers working in the Microsoft ecosystem, so I don’t know anything about the state of the C# job market.

Every now and again a language creator has the social skills needed to start an active community. Zig caught my attention when I read that its creator, Andrew Kelley, had quit his job to work full-time on Zig. Two and a half years later Zig has its own track at FOSDEM’21.

Will Zig become the next fashionable language, as Rust/Go popularity fades? I’m rooting for Zig because of its name: there are relatively few languages whose names start with Z, while the start of the alphabet is over-represented in language names. It would be foolish to root for a language because of a belief that it has magical properties (e.g., powerful, readable, maintainable), but the young are foolish.

Andreas Zwinkau (qznc)

3 Ideas How to Communicate Your Architecture February 07, 2021 12:00 AM

Write decision records, write a newsletter, and explain at concrete examples.

Read full article!

February 06, 2021

Patrick Louis (venam)

Making Sense of The Audio Stack On Unix February 06, 2021 10:00 PM

Come see my magical gramophone

Audio on Unix is a little zoo, there are so many acronyms for projects and APIs that it’s easy to get lost. Let’s tackle that issue! Most articles are confusing because they either use audio technical jargon, or because they barely scratch the surface and leave people clueless. A little knowledge can be dangerous.
In this article I’ll try to bridge the gap by not requiring any prerequisite knowledge while also giving a good overview of the whole Unix audio landscape. There’s going to be enough details to remove mysticism (Oh so pernicious in web bubbles) and see how the pieces fit.

By the end of this article you should understand the following terms:

  • ALSA
  • OSS
  • ESD
  • aRts
  • sndio
  • PulseAudio
  • PipeWire
  • GStreamer

We’ll try to make sense of their utility and their links. The article will focus a bit more on the Linux stack, as it has more components than the others and is more advanced in that respect. We’ll also skip non-open source Unix-like systems.
As usual, if you want to go in depth there’s a list of references at the bottom.

Overall, we got:

  • Hardware layer: the physical devices, input and output
  • Kernel layer: interfacing with the different hardware and managing their specificities (ALSA, OSS)
  • Libraries: used by software to interface with the hardware directly, to manipulate audio/video, to interface with an intermediate layer for creating streams (GStreamer, Libcanberra, libpulse, libalsa, etc..), and to have a standard format (LADSPA).
  • Sound servers: used to make the user facing (user-level) interaction easier, more abstract, and high level. This often acts as glue, resolving the issue that different software speak different protocols. (PulseAudio, ESD, aRts, PipeWire, sndio)

Let me preface this by saying that I am not a developer of any of these technologies, nor am I a sound engineer. I am simply gathering my general understanding of the tech so that anyone can get an overview of what the pieces involved are, and maybe a bit more.

Hardware layer

It’s essential to have a look at the hardware at our disposal to understand the audio stack because anything above it will be its direct representation.
There are many types of audio interfaces, be it input or output, with different varieties of sound cards, internal organizations, and capabilities. Because of this diversity of chipsets, it’s simpler to group them into families when interacting with them.

Let’s list the most common logical components that these cards can have.

  • An interface to communicate with the card connected to the bus, be it interrupts, IO ports, DMA (direct memory access), etc..
  • Output devices (DAC: Digital to analog converter)
  • Input devices (ADC: Analog to digital converter)
  • An output amplifier, to raise the power of output devices
  • An input amplifier, same as above but for input devices (ex: microphones).
  • Controls mechanism to allow different settings
  • Hardware mixer, which controls each device’s volume and routing; usually volume is measured in decibels.
  • A MIDI (Musical Instrument Digital Interface) device/controller, a standard unified protocol to control output devices (called synthesizers) — think of them like keyboards for sounds.
  • A sequencer, a builtin MIDI synthesizer (output of the above)
  • A timer used to clock audio
  • Any other special features such as a 3D spatializer

It is important to have a glance at these components because everything in the software layers attempts to make them easier to approach.

Analog to Digital & Digital to Analog (ADC & DAC)

A couple of concepts related to the interaction between the real and digital world are also needed to kick-start our journey.

In the real world, the analog world, sound is made up of waves, which are air pressures that can be arbitrarily large.

Speakers generating sound have a maximum volume/amplitude, usually represented by 0dB (decibels). Volume lower than the maximum is represented by negative decibels: -10dB, -20dB, etc.. And no sound is thus -∞ dB.
This might be surprising, and actually it is not really true either. Decibels don’t mean much until they are tied to a specific absolute reference point; it’s a relative scale. You pick a value for 0dB that makes sense for what you are trying to measure.

vu meter

The measurement above is the dBFS, the dB relative to digital full-scale, aka digital 0. There are other measurements such as dB SPL and dBV.
One thing to note about decibels is that they follow a strictly exponential law, which matches it to human perception. What sounds like a constantly increasing volume is indicated by a constantly rising dB meter, corresponding to an exponentially rising output power. This is why you can hear both vanishingly soft sounds and punishingly loud sounds. The step from the loudest you can hear up to destroying your ears or killing you is only a few more dB.

While decibels are about loudness, the tone is represented as sine waves of certain frequency, the speed. For example, the note A is a 440Hz sine wave.

Alright, we got the idea of decibel and tone but how do we get from waves to our computer or in reverse? This is what we call going from analog to digital or digital to analog.
To do this we have to convert waves into discrete points in time, taking a number of samples per second — what we call the sample rate. The higher the sample rate, the more accurate the representation of the analog sound (a lollipop graph). Each sample has a certain accuracy, how much information we store in it, the number of bits for each sample — what we call the bit depth (the higher, the less noise). For example, CDs use 16 bits.
Which values you choose for your sample rate and bit depth will depend on a trade-off between quality and memory use.

NB: That’s why it makes no sense to convert from a digital low sample rate to a digital high sample rate: you’ll just be filling the void in the middle of the discrete points with the same data.

Additionally, you may need to represent how multiple channels play sound — multichannel. For example, mono, stereo, 3d, surround, etc..

It’s important to note that if we want to play sounds from multiple sources at the same time, they will need to agree on the sample rate, bit rate, and format representation, otherwise it’ll be impossible to mix them. That’s something essential on a desktop.
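
Once two streams do agree on those properties, mixing reduces to summing their samples. A naive sketch for signed 16-bit samples (no resampling or dithering, names are mine):

```c
#include <stdint.h>

// Mix two 16-bit streams of the same sample rate and length into `out`,
// widening to 32 bits and clamping so the sum cannot wrap around.
void mix_s16(const int16_t* a, const int16_t* b, int16_t* out, int n) {
  for (int i = 0; i < n; i++) {
    int32_t sum = (int32_t)a[i] + (int32_t)b[i];
    if (sum > 32767)  sum = 32767;
    if (sum < -32768) sum = -32768;
    out[i] = (int16_t)sum;
  }
}
```

If the streams disagreed on sample rate or format, this addition would be meaningless, which is exactly why a desktop needs everyone to agree first.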

The last part in this equation is how to implement the mechanism to send audio to the sound card. That highly depends on what the card itself supports, but the usual simple mechanism is to fill buffers with streams of sound, then let the hardware read the samples, passing them to the DAC (digital to analog converter), to then reach the speaker, and vice versa. Once the hardware has read enough samples it’ll do an interrupt to notify the software side that it needs more samples. This cyclic mechanism goes on and on in a ring fashion.
If the buffer samples aren’t filled fast enough we call this an underrun or drop-out (aka xruns), which results in a glitch, basically audio stopping for a short period before the buffer is filled again.

The audio plays at a certain sample rate, with a certain granularity, as we’ve said. Call buffer-size the number of samples that fit in the cyclic buffer read by the hardware, fragment-size or period-size the number of samples after which an interrupt is generated, number-fragments the number of fragments that fit in a hardware buffer (buffer-size/fragment-size), and sample-rate the number of samples per second.
The latency is then buffer-size/sample-rate: for example, if we can fit 100 samples in a buffer and the samples are played once every 1ms, that’s a 100ms latency; we’ll have to wait 100ms before the hardware finishes processing the buffer.
From the software side, we’ll be getting an interrupt every period-size/sample-rate. Thus, if our buffer-size is 1024 samples, and fragment-size is 512, and our sample-rate is 44100 samples/second, then we get an interrupt and need to refill the buffer every 512/44100 = 11.6 ms, and our latency for this stage is up to 1024/44100 = 23 ms. Audio processing pipelines consist of a series of buffers like this, with samples read from one buffer, processed as needed, and written to the next.
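
The arithmetic above can be checked with a trivial calculation (same numbers as in the text; the helper names are mine):

```c
// Latency and interrupt period for a cyclic audio buffer.
// buffer_size and fragment_size are in samples; sample_rate in samples/second.
double latency_ms(int buffer_size, int sample_rate) {
  return 1000.0 * buffer_size / sample_rate;
}

double interrupt_period_ms(int fragment_size, int sample_rate) {
  return 1000.0 * fragment_size / sample_rate;
}
```

Plugging in the text's numbers: a 1024-sample buffer at 44100 Hz gives about 23 ms of latency, and a 512-sample fragment gives an interrupt roughly every 11.6 ms.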

Choosing the values of the buffer-size and period-size is a hard problem. We need a buffer big enough to minimize underruns, but we also need a buffer small enough to have low latency. The fragments should be big enough to avoid frequent interrupts, but we also need them small enough so that we’re able to fill the buffer and avoid underruns.
What some software chooses to do is to not follow the sound card interrupts but to rely on the operating system scheduler instead, rewriting the buffer at any time so that it stays responsive to user input (aka buffer rewinding). This in turn allows making the buffer as big as possible. Timers often deviate, though, but that can be fixed with good real-time scheduling.
There are no optimal solutions to this problem, it will depend on requirements, and these values can often be configured.

So we’ve seen how sound is represented in the real world with strength and tone, to then be converted to the digital world via digital-to-analog (DAC) or analog-to-digital (ADC) converters. This is done by taking samples at a given rate and with a certain accuracy called the bit depth. There can also be other information needed such as the channels, byte ordering, etc.. Sound that needs to be mixed has to agree on these properties. Lastly, we’ve seen how software has to manage a buffer of samples so that sound plays continuously on the device, while also being responsive to users.


Libraries

Audio-related libraries seem to be an alphabet soup of keywords. Here are some examples: alsaplayer-esd, libesd-alsa, alsa-oss, alsaplayer-jack, gstreamer-alsa, gstreamer-esd, lib-alsa-oss, libpulse, libpulse-simple, libao, and so on.

For programs to be able to use audio hardware and the related functionalities, they rely on libraries offering specific APIs. Over time, some APIs get deprecated and new ones appear, which creates a situation where different pieces of software speak different protocols.
To solve this issue, many glue libraries have appeared to interoperate between them. This is especially true when it comes to sound servers such as aRts, eSD, PulseAudio, and backends. For example ALSA supports an OSS layer, that is the role of lib-alsa-oss.

Apart from libraries used to play or record sound and music, there are libraries that have specific usages.

GStreamer is a popular library for constructing chains of media-handling components. It is the equivalent of a shell pipeline for media. This library is used by multiple programs in the GNOME desktop environment. For example, cheese (webcam) uses it to add video effects on the fly. Keep this in mind, as the creator of GStreamer is now working on PipeWire and applying some of the same mindset and functionalities there.

libcanberra is a library that implements a spec for playing event sounds. Instead of having to play event sounds by loading and playing a sound file from disk every time, desktop components should use this library, which abstracts the lower-level layer that will handle playing them on the appropriate backend. That’s important considering what we said about backends changing over time.
The event sound files can usually be found in: /usr/share/sounds/freedesktop/stereo/, and you can test by calling on the command line:

canberra-gtk-play -i bell
canberra-gtk-play -i phone-incoming-call

There are also multiple libraries used to abstract the audio backend of any OS, so-called cross-platform audio libraries. This includes libraries such as PortAudio (software using it), OpenAL which focuses on 3D audio, libSDL, and libao.

Lastly, there is LADSPA, the Linux Audio Developer’s Simple Plugin API, which is a library offering a standard way to write audio filter plugins to do signal processing effects. Many programs and libraries support the format including ardour, audacity, GStreamer, Snd, ALSA (with plugin), and PulseAudio (with plugin).

We’ve seen multiple usages for libraries, from their use as glue, to them helping in chaining audio, to desktop integration, to cross-platform interaction, and to allow a common format for audio filters.

Audio Driver

For us to be able to use the audio hardware components we mentioned, we need a way to communicate with them, what we call a driver. That job is done dynamically by the kernel which loads a module when it encounters a new device.

Every platform has its own device management mechanism, be it devd on FreeBSD, systemd-udev, Gentoo’s eudev, Devuan’s vdev, mdev from BusyBox or Suckless, etc..
For example, on Linux you can take a look at the currently connected components and the driver handling them by executing lspci. Similarly, on FreeBSD this is done with the pciconf -l command.

To be handled properly, the hardware needs an appropriate driver associated with it.

On Linux, the ALSA kernel layer handles this automatically. The driver names start with the prefix snd_. Issuing lsmod should show a list of them. (supported cards)
In case the device doesn’t get associated with the right driver, you can always create specific rules in the device management (udev).

On FreeBSD, the process takes place within the kernel sound infrastructure and is controlled dynamically at runtime using sysctl kernel tunables. We’ll see how to tune drivers settings in another section, as this is how you interact with them on BSD. The process is similar on most BSDs.
If the driver doesn’t load automatically you can always manually activate the kernel module. For example, to load the Intel High Definition Audio bridge device driver on the fly:

$ kldload snd_hda

Or to keep them always loaded you can set it at boot time in /boot/loader.conf:


On BSDs and Linux the drivers (OSS-derived and ALSA, respectively, in the kernel) then map the components within the file system; they are the reflection of the hardware we’ve seen before: mostly input, output, controllers, mixers, clocks, MIDI, and more.

On FreeBSD the sound drivers may create the following device nodes:

  • /dev/dsp%d.p%d Playback channel.
  • /dev/dsp%d.r%d Record channel.
  • /dev/dsp%d.%d Digitized voice device.
  • /dev/dspW%d.%d Like /dev/dsp, but 16 bits per sample.
  • /dev/dsp%d.vp%d Virtual playback channel.
  • /dev/dsp%d.vr%d Virtual recording channel.
  • /dev/audio%d.%d Sparc-compatible audio device.
  • /dev/sndstat Current sound status, including all channels and drivers.

Example of status:

$ cat /dev/sndstat
FreeBSD Audio Driver (newpcm: 64bit 2009061500/amd64)
Installed devices:
pcm0: <NVIDIA (0x001c) (HDMI/DP 8ch)> (play)
pcm1: <NVIDIA (0x001c) (HDMI/DP 8ch)> (play)
pcm2: <Conexant CX20590 (Analog 2.0+HP/2.0)> (play/rec) default

On OpenBSD it’s similar, with a SADA-like driver (Solaris Audio API) that has a different and much simpler mapping:

  • /dev/audioN Audio device tied to the underlying device driver (for both playback and recording)
  • /dev/sound Same as /dev/audioN, for recording and playback of sound samples (with a cache for replaying samples)
  • /dev/mixer To manipulate volume, recording source, or other mixer functions
  • /dev/audioctlN Control device, accepts the same ioctls as /dev/sound
  • /dev/midiN MIDI device

On Linux, the ALSA kernel module also maps the components to operational interfaces under /dev/snd/. The files in the latter will generally be named aaaCxDy where aaa is the service name, x the card number, and y the device number. For example:

  • pcmC?D?p pcm playback devices
  • pcmC?D?c pcm capture devices
  • controlC? control devices (i.e. mixer, etc.) for manipulating the internal mixer and routing of the card
  • hwC?D? hwdep devices
  • midiC?D? rawmidi devices - for controlling the MIDI port of the card, if any
  • seq sequencer device - for controlling the built-in sound synthesizer of the card, if any
  • timer timer device - to be used in pair with the sequencer
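To make the aaaCxDy naming scheme concrete, here is a small illustrative Python sketch (not part of ALSA; the helper name is our own) that picks apart such entry names:

```python
import re

# Matches ALSA /dev/snd names of the form aaaCxDy(+suffix),
# e.g. "pcmC0D3p": service pcm, card 0, device 3, 'p' = playback.
# Entries without a D part (e.g. "controlC0") are not handled here.
PATTERN = re.compile(
    r"^(?P<service>[a-z]+)C(?P<card>\d+)D(?P<device>\d+)(?P<suffix>[pc]?)$"
)

def parse(name):
    m = PATTERN.match(name)
    if not m:
        return None
    kind = {"p": "playback", "c": "capture", "": ""}[m["suffix"]]
    return m["service"], int(m["card"]), int(m["device"]), kind

print(parse("pcmC0D3p"))  # ('pcm', 0, 3, 'playback')
print(parse("midiC1D0"))  # ('midi', 1, 0, '')
```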

The devices will mostly be mapped as either PCM devices, pulse-code modulation — the digital side of the equation, or as CTL devices, the controller and mixer, or as MIDI interface, etc..

The driver status and configuration interface is in the process information pseudo-filesystem under /proc/asound (instead of kernel tunable like on most BSDs).
The following long list should give you an idea of what’s available:

  • /proc/asound/
  • /proc/asound/cards (RO) the list of registered cards
  • /proc/asound/version (RO) the version and date the driver was built
  • /proc/asound/devices (RO) the list of registered ALSA devices (major=116)
  • /proc/asound/hwdep (RO) the list of hwdep (hardware dependent) controls
  • /proc/asound/meminfo (RO) memory usage information; this proc file appears only when you build the ALSA drivers with the memory debug (or full) option, and shows the memory currently allocated in kernel space
  • /proc/asound/pcm (RO) the list of allocated pcm streams
  • /proc/asound/seq/ the directory containing info about sequencer
  • /proc/asound/dev/ the directory containing device files. device files are created dynamically; in the case without devfs, this directory is usually linked to /dev/snd/
  • /proc/asound/oss/ the directory containing info about oss emulation
  • /proc/asound/cardX/ (X = 0-7) the card-specific directory with information specific to the driver used
    • id (RO) the id string of the card
    • pcm?p the directory of the given pcm playback stream
    • pcm?c the directory of the given pcm capture stream
    • pcm??/info (RO) the pcm stream general info (card, device, name, etc.)
    • pcm??/sub?/info (RO) the pcm substream general info (card, device, name, etc.)
    • pcm??/sub?/status (RO) the current state of the given pcm substream (status, position, delay, tick time, etc.)
    • pcm??/sub?/prealloc (RW) the number of pre-allocated buffer size in kb. you can specify the buffer size by writing to this proc file

For instance we can issue:

$ cat /proc/asound/cards
 0 [HDMI           ]: HDA-Intel - HDA ATI HDMI
                      HDA ATI HDMI at 0xf0244000 irq 32
 1 [Generic        ]: HDA-Intel - HD-Audio Generic
                      HD-Audio Generic at 0xf0240000 irq 16
 2 [LX3000         ]: USB-Audio - Microsoft LifeChat LX-3000
                      C-Media Electronics Inc. Microsoft LifeChat LX-3000 at usb-0000:00:12.0-4, full

That gives us an idea of how the different Unix-like OSes dynamically load the driver for a device and then map it to the filesystem, often also giving an interface to get its status and configure it. Now let’s dive into other aspects of OSS and ALSA, more on the user side of the equation.

We now have an overview of:

  • The basic logical components a card can have (input/output devices, mixers, control mechanisms, etc..)
  • How to go from analog to digital and vice-versa
  • Some libraries and why software use different ones
  • The mapping of hardware devices to the filesystem when discovered

Advanced Linux Sound Architecture (ALSA)

Now let’s dive into ALSA in particular and see what’s the deal with it.

If you want to get dizzy you can look at this spaghetti diagram. It does more to confuse you than to clarify anything, so it fails as far as meaning is concerned. Linux Audio Dizzy Diagram

ALSA, the Advanced Linux Sound Architecture is an interface provided by the Linux kernel to interact with sound devices.
We’ve seen so far that ALSA is a kernel module and is responsible for loading drivers for the appropriate hardware, and also maps things in the filesystem on /proc/asound and in /dev/snd. ALSA also has a library, a user-facing API for real and virtual devices, and configuration mechanisms that let you interact with the internal audio concepts. Historically, it was designed to replace OSS (Open Sound System) on Linux, which we’ll see in the next section. ALSA provides an OSS emulation if needed.

Some features that are often cited:

  • Up to 8 audio devices at the same time, modularized
  • MIDI functionality, like hardware-based MIDI synthesis
  • Hardware mixing of multiple channels
  • Full-duplex operation
  • Multiprocessor-friendly
  • Thread-safe device drivers

Let’s see the following: how ALSA represents devices, what PCM and CTL are, plugins, configurations, and tools.

ALSA is good at automatic configuration of sound-card hardware. It does that by grouping different cards based on “chipset” and family; similar cards will have similar interfaces. It also fills the gaps with plugins when it comes to the names of controls, by deliberately keeping them similar. For example, the master volume is always called “Master Volume”; even when it isn’t physically there, the abstraction will exist as a software control plugin.
This grouping allows developers to interact with sound devices in a unified way which makes it much simpler to write applications.

We’ve previously seen how ALSA maps devices to entries in /dev/snd (pcm, control, midi, sequencer, timer) with their meta-information in /proc/asound. Moreover, ALSA splits devices into a hierarchy.
ALSA has the concept of cards, devices, and subdevices.

A card is any audio hardware: a USB audio headset, an audio chip, a virtual sound card, etc. Real hardware is backed by kernel drivers while virtual cards live in user-space. A card has 3 identifiers:

  • A number, incremented at each new insertion (so it could change after reboot)
  • An ID, a text identifier for the card. This is more stable and unique
  • A name, another text identifier, but not a useful one

Devices are subdivisions of a card, for playback or capture. For example, it could be “analog input + output”, “digital output”, etc. It dictates the type of device that the card is, what it can do and is capable of processing: a sort of “profile” for the card. As with cards, devices have three identifiers: number, ID, and name.
Only one device is active at a time, because the device is the current “function” that the card takes.

Devices themselves have at least one subdevice. All subdevices share the same playback (output) or recording (input) stream. They are used to represent available slots for hardware mixing, joining audio in hardware. However, hardware mixing is rarely used so there is usually a single subdevice unless it is a surround sound system.

Overall, that gives us this type of notation.

card 2: LX3000 [Microsoft LifeChat LX-3000], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Alternatively, you can go directly to the /proc tree, which we’ve seen previously, and list cards with cat /proc/asound/cards.

 2 [LX3000         ]: USB-Audio - Microsoft LifeChat LX-3000
                      C-Media Electronics Inc. Microsoft
                      LifeChat LX-3000 at usb-0000:00:12.0-4,

A common notation for ALSA devices looks like hw:X,Y where X is the card number and Y the device number (a third component can select the subdevice, as in hw:X,Y,Z, but it is usually chosen automatically).

You can list playback devices by issuing aplay -l and recording devices with arecord -l (and MIDI devices with amidi -l).
Another useful script to dump info about the system is alsa-info.sh.

All ALSA clients have to interface with objects in the ALSA world, the most important two are the PCM (pulse code modulation) and the CTL (control) which we’ve briefly mentioned before.
PCM objects are the representation of a sound stream format used for data flow. These PCMs can be chained and are typically attached to hardware at one end, but can also be attached to other things such as the filesystem, a server, or even drop audio completely. When PCMs are chained, the underlying one is called the slave PCM; the wrapper is a virtual PCM, an extra step of indirection. The PCM streams connected to the hardware need to follow its characteristics: they need to have the right sample rate, bit rate (sample width), sample encoding (endianness), and number of channels (mono, stereo, etc.). List them using aplay -L.
CTL objects are control objects telling ALSA how to process non-audio data. That includes things such as volume controls, toggle controls, multiple-choice selections, etc.. These controls can be put in one of 3 categories: playback control (for output device), capture control (for input device), and feature control (for special features).
There are other ALSA objects such as MIXER and UCM (Use Case Manager - for presets separation, like notifications, voice, media, etc..) but they are not important to get the concept across so we’ll skip them.
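To make the stream-characteristics requirement concrete, the raw data rate of a PCM stream follows directly from those parameters; a quick sketch:

```python
def byte_rate(sample_rate, channels, sample_width_bytes):
    """Bytes per second of raw PCM with the given stream parameters."""
    return sample_rate * channels * sample_width_bytes

# CD-quality stereo: 44.1 kHz, 2 channels, 16-bit (2-byte) samples
print(byte_rate(44100, 2, 2))  # 176400 bytes/second
```

A stream with different parameters than the hardware expects must be converted (resampled, remixed, re-encoded) somewhere along the chain.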

These objects can be defined, modified, and created in ALSA configurations and are often templatized and manipulated through plugins. A lot of them are automatically created by ALSA itself to create “profiles” for cards. Most of ALSA processing is delegated to plugins.
Clients will then read the configuration and most often use the default PCMs and CTL, if not selected explicitly by the user. Practically, that means the software will write or read audio from a stream (PCM) and control it (usually volume) through the CTL or MIXER interfaces.

For example:

  • aplay and other players use the PCM interface
  • alsactl uses the ctl (control) interface
  • amixer uses the mixer interface
  • amidi the rawmidi interface and so on.
  • alsaucm the ucm interface

As far as configuration location is concerned, clients will load alsa.conf from the ALSA’s data directory, so /usr/share/alsa/alsa.conf, and in turn this configuration will load system- and user-wide configurations in /etc/asound.conf and ~/.asoundrc or ~/.config/alsa/asoundrc respectively.
Changes take effect as soon as clients re-read the configuration, normally when they are restarted.

ALSA’s configuration format is notoriously complex, almost Turing-complete. It consists of a dictionary containing key/value pairs of names and objects of a given type.
For example, the pcm key will contain the list of all PCM definitions, and the ctl key will contain the list of all CTL definitions.

The statements are of the form:

KEY1.KEY2.KEY3... VALUE

KEY1 being one of the objects mentioned (pcm, ctl, and others).

The configuration format supports different value types: a value can be a string, a number, a compound (using braces {}), or a reference to another value.

Additionally, the configuration is hyper-flexible, allowing different ways to define the dictionary, which ALSA will later resolve internally by merging everything into its global dictionary of all the keys accumulated while reading the configs.
For instance, these are equivalent notations: a flat dotted definition on multiple lines, a compound block with or without the = sign between params and values, and commas or semicolons between consecutive param assignments. Like so:

pcm.a.b 4
pcm.a.c "hi"

is equivalent to

pcm.a {
    b 4
    c "hi"
}

is equivalent to

pcm.a = {
    b = 4;
    c = "hi";
}

The configuration format has special statements that begin with @ such as @func, @hooks, and @args which have different behavior. @func is used to call functions, @hooks to load files, and @args to define internal variables that can be used within a compound variable type.


{ @func getenv vars [ ENVVAR1 ENVVAR2 ... ] default VALUE }

This will turn into a string taken from the specified environment variables: each one is queried in order and the first one set is used, otherwise VALUE.
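For illustration, a common use of @func getenv (the ALSA_CARD variable name here is just a conventional example) is picking the default card from the environment:

```
pcm.!default {
    type hw
    card { @func getenv vars [ ALSA_CARD ] default "0" }
}
```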

Additionally, you can control how ALSA will act when it finds conflicting entries in the dictionary, how it will merge them. This is done by prefixing the key with one of the following:

  • ! the exclamation mark will cause the previous definition to be overridden instead of adding new values, removing all of the param and sub-param. (!pcm would delete all that is under pcm)
  • ? the question mark will ignore the assignment if the param exists
  • + and - respect the type of any earlier assignment; + creates a new param when necessary, while - causes an error if the param didn’t previously exist.

Alright, so now we know how to create a gigantic dictionary of ALSA sound-related objects, how do we actually make use of them?
What we do is create a name under one of these objects and give it a type. This type is a plugin that dictates what to do with the object. The plugins take different configurations depending on what they do, so you’ll have to consult the docs. That gives rise to something like this:

pcm.NAME {
    type TYPE
}

ctl.NAME {
    type TYPE
}

pcm_slave.NAME {
    pcm PCM
}
So ALSA consists mostly of plugins. You can find the external ones installed in /usr/lib/alsa-lib, while the others are built in. For example, check the documentation for the internal pcm plugins.

For obvious reasons, the most important pcm plugin is the hw hardware one, which is used to access the hardware driver. This plugin takes as parameters things that we mentioned such as the card number, the device, subdevice, the format, rate, channels, etc..

Now things should start to make sense: We have clients reading ALSA configurations and manipulating objects having a common interface, which are handled in the backend by plugins, which often end up on the hardware.

Another important plugin is plug, which performs channel duplication, sample value conversion, and resampling when necessary. That is needed if a stream has the wrong sample rate for the hardware. You can use aplay -v to check whether the resampling actually happens.

Yet another one is the dmix pcm plugin, the direct mix plugin, which will merge multiple pcm streams into an output pcm. This is software mixing, which is much more prominent these days than hardware mixing.
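As a sketch of what such a definition can look like (the name “mixed” and the ipc_key value here are arbitrary choices, not defaults), a minimal dmix pcm could be declared as:

```
pcm.mixed {
    type dmix
    ipc_key 1024           # arbitrary key identifying the shared mixer
    slave {
        pcm "hw:0,0"       # hardware device receiving the mixed stream
        rate 48000
    }
}
```

Pointing several applications at this pcm (e.g. aplay -D mixed) would then mix their streams in software before the hardware.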

There really is a long list of fun plugins that can be used in different scenarios, so take a look.


pcm.plugger {
	type plug
	slave {
		pcm "hw:0,0"
	}
}

This creates a device called plugger that respects the pcm object interface. Whatever is written to or read from plugger will be handled by the plug plugin, which in turn will use the slave PCM device hw:0,0.
Notice how we used the word “device”: that is because any pcm connected to hardware corresponds to an ALSA device. It should start making sense now. These pcms, for instance, are the ones that get listed when you issue aplay -L or arecord -L, and these are the objects that clients interact with; they don’t know whether these are even connected to a card or not.

The special name default is used to specify the default object interface. So to set, and override, the default playback device you can do:

pcm.!default "hw:CARD"

ALSA provides a lot of these objects preconfigured, as generic device templates. However, they sometimes require a bit of fiddling to get right, and this isn’t obvious to everyone considering the complexity of ALSA configs.
Sometimes it’s also hard to find a plugin for your use case. For example, the alsaequal project creates an audio equalizer, and alsa_rnnoise creates a pcm device that removes noise.
These difficulties are part of the reasons why we use sound servers, which we’ll see in their own sections.

So, we’ve seen ALSA’s representation of components, the common objects such as PCM and CTL, how to create them in the flexible configuration format, how the configuration is mostly plugins, how these plugins will use the sound components, and how clients use the PCM without caring what’s happening under the hood.

Open Sound System (OSS) and SADA

OSS, the Open Sound System, is the default Unix interface for audio on POSIX-compatible systems. Unfortunately, like many such standards, it isn’t compatible everywhere. It can be perplexing to understand because different systems have branched out of it.

Until OSS version 3, Linux used OSS. The company developing it, 4Front Technologies, chose in 2002 to make OSSv4 proprietary software, then in 2007 re-released it under the GPL.
For that reason OSSv4 isn’t used as the default driver of any major OS these days, but it can still be installed manually. However, not many applications support it and it might require a translation layer.

In the meantime, Linux had switched to ALSA because of the license issues and the shortcomings of OSS: namely, it couldn’t play multiple sounds simultaneously, allocated the sound device to one application at a time, and wasn’t very flexible.
Similarly, in the BSD world some chose to continue extending OSS and some were only inspired by it to do something else. FreeBSD continued with its fork of OSS, reimplementing the API and drivers and improving them along the way. They added in-kernel resampling, mixing, an equalizer, surround sound, bit-perfect mode (no resampling or mixing), and independent volume control per application.
On the other side, NetBSD and OpenBSD chose to go with their own Sun-like audio API (SADA, the Solaris Audio API, aka devaudio), with an OSS compatibility mode to keep backward compatibility.
Solaris and OpenSolaris use a fork of OSSv4 called Boomer that combines OSS with Sun’s earlier SADA API, similar to what OpenBSD does.

Due to this arduous history, there is no guarantee that any two of these OSes will use a compatible OSS version or OSS layer.

Like ALSA, the OSS audio subsystem provides playback, recording, controller, MIDI, and other devices. We’ve seen that these are mapped to special files by the driver, mostly starting with /dev/dsp*, and /dev/sndstat can be used to list which driver controls which device.
Unlike ALSA, where clients interact with PCM and CTL objects, in OSS there is a common API used to interact with the special files mapped on the filesystem. That means developers rely on a more Unix/POSIX-like model using the common system calls (open, close, read, write, ioctl, select, poll, mmap) instead of custom library functions.

What these functions do depends on the OS and OSS version. For example, ioctl lets you interact with the generic device type features, as can be seen here. That gives rise to much simpler programs; check this example.

One thing the FreeBSD audio framework supports is the use of mmap, allowing applications to map the audio buffer directly and use ioctl to deal with head/tail synchronization. ALSA on Linux does the same.

Similarly to OSS, on OpenBSD and NetBSD the 3 exposed devices /dev/audioN, /dev/audioctlN, and /dev/mixerN can be manipulated with read, write, and mostly ioctl. You can take a look at man 4 audio to get an idea.
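To illustrate this read/write model, here is a rough Python sketch (using a regular file as a stand-in for /dev/dsp; on a real OSS system you would open the device node the same way) that generates a tenth of a second of raw 16-bit PCM and writes it out:

```python
import math
import os
import struct
import tempfile

# Illustrative sketch: generate raw signed 16-bit little-endian PCM,
# the kind of byte stream a program would write() straight to /dev/dsp.
rate, width = 8000, 2        # 8 kHz sample rate, 2 bytes per sample, mono
duration, freq = 0.1, 440    # 100 ms of a 440 Hz sine tone

frames = int(rate * duration)
pcm = b"".join(
    struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq * n / rate)))
    for n in range(frames)
)

# Stand-in for the device node: a plain file instead of /dev/dsp.
path = os.path.join(tempfile.mkdtemp(), "dsp")
with open(path, "wb") as dsp:
    dsp.write(pcm)

print(len(pcm))  # frames * width = 800 * 2 = 1600 bytes
```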

Configuring OS-specific functionality, such as how sound is mixed, which sample rates and bit rates are selected, and which output and input hardware is the default, depends entirely on the implementation of the particular operating system.

On FreeBSD, audio configuration is done through kernel tunables, set via sysctl or statically at boot. For example, dev.pcm.0 is the first instance of the pcm driver and hw.usb.uaudio holds the USB audio hardware settings. You can tell which is which by consulting /dev/sndstat.
Setting the default sound device on FreeBSD:

sysctl hw.snd.default_unit=n

Where n is the device number.

You can also set the default value for the mixer (a loader hint, typically placed in /boot/device.hints):

hint.pcm.0.vol="50"

As you can notice, the mixer is part of the pcm driver. This driver supports Volume Per Channel (VPC), which means you can control the volume of each application independently.

As for OSSv4, it offers configuration files for each driver, along with its own set of tools like ossinfo, ossmix, vmixctl, etc.
The configurations can be found under /usr/lib/oss/conf/, the $OSSLIBDIR. It contains audio config files for the different drivers with their tunables, which help set basic settings such as virtual mixers, sound quality, sample rate, etc. You can consult the man page of each driver to check its settings, e.g. man 7 oss_usb.
Note that OSSv4 isn’t as flexible and requires turning the sound off and on for settings to take effect.

On OpenBSD and NetBSD, similarly to FreeBSD, settings are done through kernel tunables. The SADA-like system also has multiple tools, such as mixerctl and audioctl, to make it easier to interact with the audio driver variables.
For example, you can change the sample rate on the fly with:

audioctl -w play.sample_rate=11025

OpenBSD stays true to its security and privacy stance by disabling audio recording by default, which can be re-enabled with:

$ sysctl kern.audio.record=1
$ echo kern.audio.record=1 >> /etc/sysctl.conf

As you can see, having multiple systems with different APIs and configurations isn’t practical and is very limiting. That is why OpenBSD created a layer on top called sndio, a sound server that we’ll discover in a bit.

Overall, we’ve seen the idea behind the OSS-like systems. They expose devices in the file system and let you interact with them through the usual Unix system calls. To configure these devices you use whatever mechanism the OS gives you, mostly kernel tunables.
While kernel tunables don’t offer the extensive configurability that ALSA does, these audio stacks can still provide more or less the same basic features by having them in the kernel, as on FreeBSD where the mixing of streams and per-application volume control happen out of sight.
As with anything, the lower something sits in the stack, the less flexible but the more stable, efficient, and abstract it is.

complexity of system

Sound Servers

Sound servers are neither kernel modules nor drivers, but daemons that run in user-space to provide additional functionality. They exist both to raise flexibility and to provide a more abstract and common layer.

The operations a sound server allows range from transferring audio between machines to resampling, changing the channel count, mixing sounds, caching, adding audio effects, etc. Having these operations done in a modular way in a sound server is more advantageous than having them in the kernel. Moreover, having a sound server can mean having the same API for all audio regardless of the underlying kernel module or API, so you won’t have to worry about whether you’re running on OSS, ALSA, or anything else.

There are many sound servers available. Some are deprecated, like aRts and ESD; others are in use, such as sndio, JACK, and PulseAudio; and new ones like PipeWire are coming out and being pushed into distributions. Each of these has different features, supports various protocols, and runs on multiple operating system flavors.


sndio is the default sound server on the BSDs today. It is a small, compact audio and MIDI framework and user-space server developed by the OpenBSD project. The server is so simple it doesn’t even need a configuration file.

sndio’s main roles are to abstract the underlying audio driver and to be a single point of access instead of requiring each application to get raw access to the hardware.
Having a sound server solves the issue of the fracture between all OSS implementations. It creates a new standardized layer.

Let’s mention some of sndio features:

  • Change the sound encoding to overcome incompatibilities between software and hardware. (can change sample rate and bit rate)
  • Conversions, resampling, mixing, channel mapping.
  • Route the sound from one channel to another, join stereo or split mono.
  • Control the per-application playback volume as well as the master volume.
  • Monitor the sound being played, allowing one program to record what other programs play.
  • Use of ticking mechanism for synchronization (maintained after underruns, when buffer isn’t filled fast enough) and latency control
  • Support network connections

sndiod, the server, operates as follows: it creates sub-devices that audio programs connect to as if they were devices created by the driver (real hardware). Thus, during playback or recording, sndiod handles the audio streams and commands for all programs and takes care of the result on the fly.
In sum, sndiod acts as a proxy while presenting an interface similar to the kernel API on BSD (read, write, ioctl).

All programs connected to the same sub-device will be part of the same group, which gives a way to process and configure sound according to which sub-device is used.
These sub-devices are named using a string of the form:

type[@hostname][,servnum]/devnum[.option]

Which, as you can see, allows connecting to a remote host.

Here are some examples:

snd/0
    Audio device of type snd referred to by the first -f option passed
    to the sndiod(8) server
snd/0.rear
    Sub-device of type snd registered with the "-s rear" option
default
    Default audio or MIDI device.

The server sndiod doesn’t have any configuration file, so everything is passed on the command line as arguments. Here are a couple of examples of how to do that.

Start the server with a 48kHz sample rate, a 240-frame block size (-z, the fragment size), and a two-block buffer of 480 frames (-b, the buffer size) (see the previous Analog to Digital & Digital to Analog (ADC & DAC) section for more info on what these mean); this creates a 10ms latency.

$ sndiod -r 48000 -b 480 -z 240
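The latency figure can be checked with a little arithmetic (frames divided by frames-per-second gives seconds):

```python
rate = 48000          # sample rate in Hz (-r)
block_frames = 240    # block/fragment size in frames (-z)
buffer_frames = 480   # total buffer size in frames (-b), two blocks here

block_ms = 1000 * block_frames / rate     # time to play one block: 5.0 ms
latency_ms = 1000 * buffer_frames / rate  # worst-case latency: 10.0 ms

print(block_ms, latency_ms)
```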

Start sndiod by creating the default sub-device with low volume (65) and an additional sub-device called max with high volume (127). These will map to snd/0 and snd/0.max respectively.

$ sndiod -v 65 -s default -v 127 -s max

This example creates the default sub-device plus another sub-device that outputs to channels 2:3 only (which speakers these are depends on the card). These will map to snd/0 and snd/0.rear respectively.

$ sndiod -s default -c 2:3 -s rear

The sndioctl utility is a helper tool for audio device status and control through sndio. For example, you can change the output level and mute certain sub-devices.

$ sndioctl output.level=+0.1
$ sndioctl output.mute=1
$ sndioctl -f snd/0 output[0].level+=0.1

Which controls are available depends on the actual audio device.

sndio is not only available on the BSDs; it also has an ALSA backend, so it can run on top of it.
It is generally well supported by major software like media players and web browsers. However, if a program cannot interface with sndio, there are ALSA plugins that provide a PCM that can connect to a sndiod server.

In this section we’ve seen sndio, a very simple sound server that creates sub-devices for any type of audio device: output, input, MIDI, control, etc. By arbitrating resources, it is able to calibrate sound streams to fit the sampling and bit rates the hardware supports. We’ve also seen how to start the server, which has no config, and how to use sndioctl to interact with it.

aRts (analog Real time synthesizer) and ESD or ESounD (Enlightened Sound Daemon)

aRts and ESounD (ESD) are two deprecated sound servers (audio daemons). Like all sound servers, they accepted audio from applications and fed it to the hardware, while manipulating the stream format so that it fit (resampling and others, you know the drill).

aRts was part of the KDE project, and its big selling point was its simulation of an analog synthesizer.

EsounD was the sound server for the Enlightenment desktop and GNOME. It had functionality similar to any sound server, but additionally two special features: desktop event sound handling, and a mechanism to pipeline audio and video.

The different teams partnered to synchronize their projects and converge on a single sound server. This split EsounD and aRts into pieces: the desktop event sound handling is now libcanberra (see the Libraries section), the audio and video pipeline is now GStreamer (see the Libraries section), and the sound server part was extracted into PulseAudio (see the next section on PulseAudio).



PulseAudio tends to trigger online flame wars, which are non-constructive.
Let’s see what PulseAudio is, what features it has, some examples of it, its design, the definition of its internal objects, the sinks and sources, how it represents things we’ve seen such as cards/devices/profiles, how it is configured, how the server and clients start, the modular mechanism, what some modules/plugins do, the compatibility with different protocols, the desktop integration, some of the common frontends, how supported it is, and how to temporarily suspend it.
There’s a long road ahead!

PulseAudio — What Is It?

PulseAudio is a sound server for POSIX OSes, like sndio and others, so its job is similar: abstracting the interaction with the lower layers, whichever backend that is, and offering flexibility in audio manipulation — a proxy for sound applications.
Additionally, PulseAudio’s main focus is desktop users, as it was primarily created to overcome the limitations of EsounD and aRts. It is heavily influenced by Apple’s CoreAudio design.

So far, PulseAudio is mainly targeted at the Linux desktop, but there exist ports to other operating systems such as FreeBSD, Solaris, Android, NetBSD, Mac OS X, Windows 2000, and Windows XP. However, some features require integration with the system and so are platform-specific, particularly the timer-based scheduler, hot-plug features, and Bluetooth interaction. Features in PulseAudio come as “external components”, also called “plugins”, so these functionalities aren’t inherent to the core server.

Let’s have a look at a list of features that PulseAudio provides.

  • Extensible plugin architecture (a micro-kernel arch with dynamically loadable modules via dlopen)
  • A toolset to be able to manipulate the sound server on the fly
  • Support for multiple input and output streams
  • Flexible, implicit sample type conversion and resampling
  • Ability to fully synchronize multiple playback streams
  • Support interacting with audio streams of various protocols and backends, be them local or on a network
  • Per application independent volume control
  • Automatic management and setup of audio device, hotplug, via policies and restoration mechanism (mostly ALSA backend only)
  • Sound processing ability and creation of audio pipeline chains: custom modules, mixing, sample rate conversion, echo cancellation, etc..
  • Sample cache: in-memory storage for short sounds, useful for desktop events
  • Low and accurate latency behaviour: uses features such as clocking and rewinding to keep the buffer responsive and avoid glitches. This is done via a timer-based scheduler per-device. (See previous Analog to Digital & Digital to Analog (ADC & DAC) for more info on what these mean) (ALSA backend only)
  • Power saving: due to the use of default latency and timer-based scheduler per-device, there is no need to have a high number of interrupts. (ALSA backend only)
  • Other desktop integration: X11 bells, D-Bus integration, Media role, hardware control, GConf, etc..

In practice that allows things like:

  • Automatically setting up a USB headset when it’s connected, remembering the configuration it was in the last time it was used.
  • A GUI for controlling the sound of specific applications and deciding whether to move the audio stream from one device to another on the fly.
  • Dynamically adding sound processing and filters to a currently running application, such as noise cancellation.

Pulseaudio — Overall Design

PulseAudio Engine Layer

The PulseAudio server consists of 3 logical components:

  • A daemon: the piece that configures the core, loads the modules, and starts the main loop
  • A core: based on libpulsecore, this is the building block and shared environment for modules.
  • Modules: dynamically loaded libraries to extend the functionality of the server, relying on the libpulsecore library.

Inside the PulseAudio server live different types of objects that we can manipulate; understanding these objects means understanding how PulseAudio works:

  • format info
  • source
  • source output
  • sink
  • sink input
  • card
  • device port
  • module
  • client
  • sample cache entry

Pulseaudio — Sink, Sink Input, Source, and Source Output

Source outputs and sink inputs are the most important concepts in PulseAudio: they are the representation of audio streams. A source device, such as a capture device or a process generating sound, produces a stream that is read/received through a source output, while a sink device, such as a sound card, a server, or a process, is written/sent to via a sink input.
In sum, sinks are output devices and sources are input devices: a source is read into a “source output” stream, and a “sink input” stream writes to a sink device.
There can be virtual devices and virtual streams, and sink inputs and source outputs can be moved from one device to another on the fly. This is possible because of the rewinding feature, each stream having its own timer-scheduler.

Additionally, sink monitors are always associated with a sink and get written to when the sink device reads from its sink inputs.

(Diagram: sink and source streams)

In the ALSA backend, PulseAudio manages these streams using a timer-based scheduler to make them efficient for desktop use. (See previous Analog to Digital & Digital to Analog (ADC & DAC) for more info on what these mean)
Every source, sink, source output, and sink input can have its own audio parameters, be it sample format, sample rate, channel map, etc.. Resampling is done on the fly, and PulseAudio lets you select between different resamplers in its configuration and modules (speex, ffmpeg, src, sox, trivial, copy, peaks, etc..).
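To make sample-rate conversion concrete, here is a naive linear-interpolation resampler in Python. It is only an illustration of the idea; it is not the algorithm any of the PulseAudio resamplers (speex, sox, etc.) actually use:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustration only)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Blend the two nearest source samples
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Upsample a 4-sample ramp from 8000 Hz to 16000 Hz
print(resample_linear([0.0, 1.0, 2.0, 3.0], 8000, 16000))
```

Real resamplers use far better interpolation (windowed sinc, polyphase filters) to avoid the aliasing this naive approach introduces, which is why PulseAudio offers a choice between quality/CPU trade-offs.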
It’s good to note that when multiple sink inputs are connected to the same sink, they automatically get mixed.
Each of these components can have its own volume. A configuration called “flat volume” can be set so that the same volume is used for all sink inputs connected to the same sink.

This flexible concept of streams and devices can be used effectively with a couple of modules that allow juggling with them. For example, module-loopback forwards audio from a source to a sink; it’s a pair of a source output and a sink input with a queue in between. If you load it while the source is your microphone, you’ll hear your voice as an echo.

pactl load-module module-loopback
pactl unload-module module-loopback # to unload it

Another example is module-null-sink (and its counterpart module-null-source), which simply drops data. As with other sinks, it has a monitor source associated with it, so you can convert a sink input into a source output, basically turning audio that was supposed to be written to a device back into a readable stream.
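For instance, a hypothetical default.pa line (the sink name is made up) that creates such a null sink; its monitor source, nullsink.monitor, then exposes whatever is written to the sink as a readable stream:

```
load-module module-null-sink sink_name=nullsink
```

Recording applications can then capture from nullsink.monitor as if it were a regular input device.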
We’ll see more examples in the module section, but this is enough to whet your appetite.

You can use the tool pacmd to check each of these and the properties associated with the object:

pacmd list-sources
pacmd list-source-outputs
pacmd list-sinks
pacmd list-sink-inputs

For now, in the info shown by the above commands, you should at least recognize some of the information displayed, such as the latency, the volume, the sample format, the number of channels, and the resampling method.

NB: the default source and sink are often abbreviated as @DEFAULT_SOURCE@ and @DEFAULT_SINK@ in PulseAudio configs and commands.

Another example: changing the volume of a sink by its index via pactl:

pactl set-sink-volume 0 +5%
pactl set-sink-mute 0 toggle
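Those percentages are not linear in loudness: PulseAudio software volumes use a cubic mapping between the volume scale and amplitude (the pa_sw_volume_to_dB relation). A small Python sketch of that conversion, assuming the cubic mapping:

```python
import math

PA_VOLUME_NORM = 0x10000  # 100% on PulseAudio's integer volume scale

def percent_to_db(percent):
    """Cubic software-volume mapping: dB = 60 * log10(fraction)."""
    fraction = percent / 100.0
    if fraction <= 0:
        return float("-inf")  # muted
    return 60.0 * math.log10(fraction)

print(round(percent_to_db(100), 2))  # 0.0 dB (no attenuation)
print(round(percent_to_db(50), 2))   # about -18.06 dB
```

This is why 50% volume in pavucontrol sounds much quieter than “half”: it is roughly an 18 dB attenuation, not a halving of perceived loudness.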

Pulseaudio — Internal Concepts: Cards, Card Profile, Device Port, Device

We got the idea of streams, but we still need to understand how actual devices are mapped into PulseAudio. This is represented by the concept of cards, which can be any sound card or Bluetooth device. A card has card profiles, device ports, and devices.

A card corresponds to a mapping related to the driver in use. When using ALSA, this is the same card as an ALSA card.
For example:

aplay -l
card 2: LX3000 [Microsoft LifeChat LX-3000], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

pacmd list-sinks | grep card
  driver: <module-alsa-card.c>
  card: 5 <alsa_card.usb-C-Media_Electronics_Inc._Microsoft_LifeChat_LX-3000-00>
    alsa.card = "2"
    alsa.card_name = "Microsoft LifeChat LX-3000"
    alsa.long_card_name = "C-Media Electronics Inc. Microsoft LifeChat LX-3000 at usb-0000:00:12.0-4, full"

The card has profiles which are, when using ALSA, equivalent to the list of pcm objects of ALSA that are attached to the hardware, which you can list using aplay -L.
You can see the list of profiles discovered by PulseAudio by doing pacmd list-cards and checking the profiles and active profile sections. Note that only one profile can be active for a card at a time.

In practice, a profile is an opaque configuration set of a card, defining the backend-specific configuration of the card, the list of currently available device ports, devices, configurations, and audio parameters. These are all templatized by ALSA as we’ve seen before.
ALSA manages this at the driver level; however, PulseAudio doesn’t take the profiles as they are, it sometimes uses a mapping to bind them into a new format that will result in a sink or source being created for the card. This is done via configuration found in /usr/share/alsa-card-profile and /usr/share/pulseaudio/alsa-mixer/. The mappings of different ALSA objects to PulseAudio ones can be found there.

For example, I have the following for the MIXER interface, which tells PulseAudio I can use mute and capture:
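The original snippet isn’t reproduced here, but the path files under /usr/share/pulseaudio/alsa-mixer/paths/ look roughly like the following sketch; the element names and values are illustrative, not copied from an actual system:

```
[Element Master]
switch = mute
volume = merge

[Element Capture]
switch = mute
volume = merge
```

Each [Element] section tells PulseAudio how to use one ALSA mixer control, e.g. whether its switch acts as mute and whether its volume is merged into the overall path volume.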


After finding the right profile, we can now know what the device ports are. They correspond to a single input or output associated with the card, like a microphone or a speaker. Multiple device ports may belong to one card.
A device, be it a source or a sink, is then the representation of the currently active producer or consumer, that is, a card plus a device port. For example, playing audio as digital stereo output on a USB headset.

Another mapping is possible using ALSA UCM (Use Case Manager) to group cards, but we’ll skip over it. Use case management is used to abstract some of the object configuration like the MIXER (higher level management of CTL) so that you can play the same type of sounds together: notifications, media, video, VOIP, etc..

In summary, that gives the following relation between ALSA backend and PulseAudio objects.

  • The PulseAudio ALSA backend will automatically create the cards, card profiles, device ports, sources, and sinks.
  • A PulseAudio card is an ALSA card
  • A PulseAudio card profile is an ALSA configuration for a certain card, this will dictate the list of available device ports, sources, and sinks for PulseAudio. These can be mapped using configs in a dir.
  • A PulseAudio device port defines the active inputs and outputs for a card and other options, it’s the selection of one profile function.
  • Finally, sources and sinks are associated with an ALSA device, a single pcm attached to the hardware. A source or sink gets connected to a device port, and that defines its parameters (sample rate, channels, etc..)

In general, whatever the backend, be it ALSA, OSS, or Bluetooth, PulseAudio’s goal is to find out what inputs and outputs are available and map them to device ports.

Pulseaudio — Everything Is A Module Thinking

As far as the PulseAudio server is concerned, it only manipulates its internal objects, provides an API, and does little else than host modules. Even backends like ALSA are implemented as modules.
That gives rise to a sort of micro-kernel architecture where most of the functionality of the server is implemented in modules, and there’s a module for everything. Most of what we’ve mentioned already is done via a module. Here is a list of some of the categories:

  • Device drivers
  • Protocols
  • Audio routing
  • Saving information
  • Trivia like x11 bell
  • Volume control
  • Bluetooth
  • Filters and Processing

Some modules are autoloaded in the server like the module-native-protocol-unix, which is PulseAudio’s native protocol, and others are loaded on the fly.
Even the protocol to load modules and interact with the server via the command-line interface is a module in itself: module-cli-protocol-unix/tcp.
If you are interested in knowing about the ALSA backend integration, it is done by the module-alsa-card.

There’s really a lot of modules, the list of the ones that come with the PulseAudio server default installation can be found here and here. There are also many user contributed modules, which you can place on disk in your library path /usr/lib/pulse-<version>/modules/.

To list the currently loaded modules use the following:

pacmd list-modules

NB: It can seem quite odd that PulseAudio has its own module mechanism while, as we’ve seen earlier, ALSA does a similar thing through its configuration. However, keep in mind that PulseAudio is comparatively easier to use, can work on top of different backends, not only ALSA, and has a different concept when it comes to audio streams (source outputs and sink inputs).

Now let’s see how to load new modules and configure them.

Pulseaudio — Startup Process And Configuration

Before we see how to load modules into the server, we first need to check how to run the server.
The PulseAudio server can either run in system-wide mode or per-user basis. The latter is preferred as it is better for desktop integration because some modules use the graphical desktop. It is usually started during the setup of the user session, which is taken care of by the desktop environment autostart mechanism. These days, with the advent of the systemd framework project, PulseAudio is often launched as a user service.

$ systemctl --user status pulseaudio
● pulseaudio.service - Sound Service
     Loaded: loaded (/usr/lib/systemd/user/pulseaudio.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-02-06 14:36:22 EET; 19h ago
TriggeredBy: ● pulseaudio.socket
   Main PID: 2159374 (pulseaudio)
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/pulseaudio.service
             ├─2159374 /usr/bin/pulseaudio --daemonize=no --log-target=journal
             └─2159380 /usr/lib/pulse/gsettings-helper

Another way to start the PulseAudio server is to not start it at all. That’s surprising, but with the default configuration clients will autospawn the server if they see that it isn’t running.
The reverse is also true: there is a configuration option to auto-exit when no clients have used the server for a certain period.
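These two behaviours map to a pair of options, one on each side; a sketch (the values are illustrative, the option names come from pulse-client.conf(5) and pulse-daemon.conf(5)):

```
# client.conf — clients start a server when none is running
autospawn = yes

# daemon.conf — the server exits after being idle for 20 seconds
exit-idle-time = 20
```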

This is how clients start:

  • Initialization: finding the server address from somewhere (environment variable, X11 root window, per-user and system-wide client conf files)
  • Connect: depending on the protocol used (native, tcp localhost, or remote)
  • Autospawn: if enabled, spawn a server automatically
  • Authenticate: using cookies found somewhere (environment variable, X11 root window, explicit, per-user or system-wide conf, per-user home dir)

The server starts by reading the server configuration, and then loads the modules listed in the configuration associated with its running mode (system-wide or per-user).
The server configurations are found by first looking in the home directory ~/.config/pulse, and if not found then by looking in the system-wide config in /etc/pulse. The directory will contain the following configuration files: daemon.conf and client.conf.

daemon.conf: contains the settings related to the server itself, things like the base sample rate to be used by modules that will automatically do resampling, the realtime scheduling options, the cpu limitation, if flat-volume will be used or not, the fragment size, latency, etc.. These cannot be changed at runtime.
You can consult pulse-daemon.conf(5) manpage for more info.
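As a sketch, here are a few daemon.conf settings one might tweak; the option names are from pulse-daemon.conf(5), the values are illustrative:

```
default-sample-rate = 48000
resample-method = speex-float-5
flat-volumes = no
default-fragments = 4
default-fragment-size-msec = 25
```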

client.conf is the file that will be read by clients, which we mentioned above. It contains runtime options for individual clients; see the pulse-client.conf(5) manpage for more info.
default.pa and system.pa are the per-user and system-wide startup scripts used to load and configure modules. Once the server has finished initializing, it reads and loads the modules from the appropriate one.
You can also load and manipulate the modules using tools such as pactl and pacmd, see the pulse-cli-syntax(5) manpage for more info.
The .conf files are simple key-value formatted files while the .pa files are real command scripts following the CLI protocol format of PulseAudio.


load-sample-lazy x11-bell /usr/share/sounds/freedesktop/stereo/bell.oga
load-module module-x11-bell sample=x11-bell

When it comes to realtime scheduling, you can either integrate PulseAudio by giving it priority at the OS level, or rely on its integration with RealtimeKit (rtkit), a D-Bus service that changes the scheduling policy on the fly.
The realtime policy is applied to all sink and source threads so that timer-based scheduling has lower latency. This is important if you want to play audio in bit-perfect mode, that is, without applying any resampling or mixing to the audio, playing it directly as is.

Pulseaudio — Interesting Modules And Features

Let’s now have a look at a couple features and modules. We can’t list them all as there are so many but let’s try to do a roundup of the most interesting ones.

PulseAudio can talk over many protocols by using different plugins, including:

  • Native protocols over different transport (fd, unix, tcp)
  • mDNS (Zeroconf)
  • RAOP
  • HTTP
  • DLNA and Chromecast (Digital Living Network Alliance)
  • ESound

It also offers control protocols to manage the server itself and audio streams:

  • D-Bus API
  • CLI protocol

There are many modules for post-processing and effects because it’s easy to create a chain of sound. Though only two types of connections are allowed: source outputs connect to sources, and sink inputs to sinks. That means you’ll sometimes need to create indirect adapters to build the scenario you want.
If you need more advanced chains, you are probably better off with another sound server that specializes in these, like JACK.

PulseEffects is a popular piece of software to add audio effects to streams, but note that it is being deprecated in favor of PipeWire.
The LADSPA plugin, module-ladspa-sink, allows loading audio processing effects in the common format we’ve seen and applying them to a sink.
There are a couple of different equalizers, such as the integrated one and others like prettyeq. An equalizer works by becoming the default output/sink.
There are also noise cancellation filters, such as the builtin module-echo-cancel and NoiseTorch.

Some cool desktop integration features:

  • sample cache, basically loading a small sound sample in the server and cache it (pacmd list-samples)
  • multimedia buttons
  • publish address on X11 root window properties
  • x11 bell integration, using XKB bell events and played from sample cache
  • Use of GNOME registry to load modules instead of .pa configuration files.
  • Hotplug based on udev/jackbus/coreaudio/bluetooth/bluez5/bluez4 so that cards are automatically detected.

Let’s go over two less known but cool features of PulseAudio: The restoration DB and the routing process.

PulseAudio keeps track of and restores the parameters used for cards, devices, and streams. When a new object appears, the server tries to restore its previous configuration and might move streams to another device based on what it has seen before.
This neat automatic setup is done via an embedded database, whose format the user can choose. It is handled by the following plugins: module-device-restore, module-stream-restore, and module-card-restore. You’ll find files ending in *.tdb in your ~/.config/pulse/ if you are using the default configuration. (You can use tdbtool to inspect them if you’re interested)

The decisions regarding this automatic setup are influenced by the media role and other properties associated with the stream and device, such as the application id and name.
This information is set programmatically on the stream when communicating with the server, for example a stream can be set as video, music, game, event, phone, animation, etc.. (Sort of like the use case scenario)
So based on this information in the restore db, the routing will select which source or sink is best for a new stream.

The actual algorithm that takes care of this isn’t obvious; I advise looking at these two flow charts for more details.

(Flow charts: initial route, new device route)

This can be confusing if you are trying to set a default device, because the default device is only used as a fallback when the restore db is in place.

Pulseaudio — Tools

There are many tools that can be used to interface with PulseAudio, some are full front-ends and some are more specific.

We’ve seen pacmd and pactl, which are both used to reconfigure the server at runtime.
paplay, parec, pacat, pamon, and others are mini-tools used to test features of PulseAudio.
There are GUIs like pamixer, paprefs (useful to set up simultaneous output), and pavucontrol.
There are TUIs like pulsemixer.

Longer lists can be found here for GUI and here for CLI.

Pulseaudio — Suspending

Sometimes it is useful to temporarily suspend PulseAudio. The utility pasuspender has this purpose. It is especially useful when running JACK in parallel with PulseAudio.

Another way is to use the D-Bus reservation API to allocate a card to a certain application. This can be done more easily when you include the module for JACK within PulseAudio.



JACK

JACK is a sound server, just like sndio, ESD, PulseAudio, and others, but it is designed for professional audio hardware and software. JACK can run on top of many drivers, including on top of and alongside PulseAudio. While PulseAudio is designed for consumer audio on desktop and mobile, JACK is made for music production.

JACK is a recursive acronym standing for JACK Audio Connection Kit, and as the name implies it specializes in connecting multiple hardware and virtual streams together. As you remember, this was not so simple in PulseAudio. It allows setting up real-time, low-latency connections between streams, and eases the configuration of parameters like buffer size, sample rate, fragment size, and others.

The biggest strength of JACK is its integration with professional tooling and its emphasis on MIDI and professional hardware. In a professional environment you often have to deal with multiple devices such as mixers, turntables, microphones, speakers, synthesizers, etc.. The graphical interfaces built around the JACK server make this easy to handle, but you have to be knowledgeable in the audio world to understand the complex configuration.

JACK also has support for specific professional hardware drivers, like a FireWire driver (IEEE 1394) that PulseAudio doesn’t have.


JACK frontends and software built on it are really where it shines; there are many interfaces and GUIs for professionals. The most widely used software to configure it is qjackctl, a connection manager making links between streams and devices. That’s because JACK separates the concerns: one part is about managing the connections, in a graph-like fashion, and the other part is only concerned with passing the audio stream around. This is especially important when there’s a lot of equipment, and it should be easily doable via a GUI.
Let’s mention some professional audio engineer software:

  • mpk - Virtual MIDI Piano Keyboard
  • Cadence - A studio controller with multiple sub-tools like Catia and Claudia
  • Patchage - visual connection manager
  • Catia - another visual connection manager
  • ardour
  • Qtractor
  • Carla
  • QASMixer
  • bitwig studio
  • drumstick
  • QSynth
  • Helm
  • Calf Studio Gear
  • LMMS

Here is a longer list.

There are also a bunch of Linux distributions specialized for audio work.

The audio domain, for sound engineers and musicians, is gigantic and it’s not really my place to talk about it, so I’ll keep it at that.


PipeWire

PipeWire is a relatively new sound server, but it is also much more: it handles not only audio streams but video streams too, and is meant as a generic multimedia processing graph server.
On the audio side, it tries to fill the needs of both desktop users, like PulseAudio, and professional audio engineers, like JACK.

Initially the project was supposed to be named PulseVideo and do a similar job to PulseAudio but for video. The rationale for handling all multimedia streams together is that it makes no sense to handle video streams without their audio counterpart, as they need to be synced together.

The project was started by GStreamer’s creator. That library, as you may remember, already handles audio and video streams.
So in sum, that creates an equation that includes GStreamer + PulseAudio + PulseVideo + a JACK-like graph. The integration with GStreamer means that applications that already use it will automatically be able to interoperate with PipeWire.

The project is still in early development and not so stable; the server side currently only supports video natively and has integration layers for PulseAudio and JACK. Otherwise, it can interface with ALSA clients directly, through a new ALSA pcm device that redirects to PipeWire (like PulseAudio does).

Some of the new goals of PipeWire are to give sandboxed Flatpak applications access to media, and to allow Wayland compositors to access media streams securely via a mechanism like PolKit for granular access control. This is handled by what is called a policy/session manager.

PipeWire takes ideas from multiple places. In itself it only cares about creating a graph of nodes that will process and move audio streams around via IPC. Meanwhile, another process, like with JACK, will take care of managing the connections, device discovery, and policies.
So you get a division of roles, a processing media graph and a policy/session manager.

The innovation lies in this graph model taken from JACK, combined with integration of policy management and desktop usage. Each node in the graphs has its own buffer mechanism and dynamic latency handling. That leads to a lower CPU usage because of the timer-based scheduling model that wakes up nodes only when they need to and can dynamically adapt buffers depending on the latency needed.

What’s all this talk about nodes and graphs about, what does this actually mean?

(Diagrams: PipeWire and WirePlumber)

To understand this we have to grasp 2 concepts: the PipeWire media graph and the session management graph.

PipeWire is a media stream exchange framework; it embodies this concept through the media graph, which is composed of nodes that have ports connected through directed links. Media streams flow from node to node, leaving through the ports of one node and entering, via the links, the ports of another.
A node is anything that can process media, either consuming or producing it. Each node has its own buffer of data and its own preferences and properties, such as media class, sample rate, bit rate, format, and latency. These nodes can be anywhere, not only inside the PipeWire daemon but also inside clients. That means the processing of streams can be delegated to other software and passed around from node to node without PipeWire touching the stream. Nodes can be applications, real devices, virtual ones, filters, recording applications, echo cancellation, anything that dabbles with media streams.
To interact with the rest of the system, a node can have ports, which are interfaces for input (sink) or output (source).
A link is a connection between 2 ports, one as the source and the other as the sink.

So far that’s a very generic concept, nodes that handle media and have ports that can generate media stream or consume it.
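To make the node/port/link vocabulary concrete, here is a toy model in Python. It has nothing to do with PipeWire’s actual API; it only mirrors the structure described above, with made-up node names:

```python
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    direction: str  # "out" produces media, "in" consumes it

@dataclass
class Node:
    name: str
    ports: list = field(default_factory=list)

@dataclass
class Link:
    src: Port  # must be an "out" port
    dst: Port  # must be an "in" port

# A capture device node feeding a recording application node
mic = Node("microphone", [Port("capture", "out")])
app = Node("recorder", [Port("input", "in")])
links = [Link(mic.ports[0], app.ports[0])]

for link in links:
    print(f"{link.src.name} -> {link.dst.name}")
```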

Whenever some media needs to be handled by a node, it is woken up by a timer-scheduler mechanism. The connected nodes form a graph, and one of these nodes usually “drives” the graph by starting the processing for all other nodes joined to it. This timer dynamically manages the desired latency for each node, negotiating the most appropriate one within a range.
When two nodes in a graph need to communicate with one another, they have to negotiate a common preferred format and a minimum latency based on their buffer sizes. That’s why nodes are normally wrapped in an adapter that automatically does the conversion (sample rate, sample format, channel conversion, mixing, volume control). This is all done dynamically, which is good for desktop usage, but not so much for pro audio.
More info about the buffer model here

In practice, the resulting graph can be inspected by dumping it with the pw-dot(1) utility.
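The original dump isn’t reproduced here, but pw-dot writes a Graphviz file that, schematically, looks something like this (all node and port names are made up for illustration):

```dot
digraph pipewire {
  rankdir = "LR";
  "paplay (output_FL)" -> "alsa-sink (playback_FL)";
  "paplay (output_FR)" -> "alsa-sink (playback_FR)";
}
```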


So far so good: we have nodes that exchange streams and a graph processing media, but how do these nodes get there in the first place, and how do we choose which one links to which? We need a way to attach these nodes and decide what is connected to what. For instance, when a client attaches and asks to play an audio stream, how is that handled?
That’s where the second piece of the PipeWire equation comes in: the session and policy manager, along with its session management graph.

This piece of software is external to PipeWire and can have multiple implementations. The default one that comes with the installation is called pipewire-media-session; another one that is still a work-in-progress is called WirePlumber. There are talks about including it in desktop environment session managers such as gnome-session-daemon.

The role of this software is to keep track of the available devices and their priorities, keep track of which application uses which device, ensure policy control and security, keep a device and stream restoration database (not implemented yet), share global properties of the system, find the default endpoints used by clients when asked to play sound, etc..

When clients connect to PipeWire they announce their session information, what they’d like to do (playback or capture), the type of media they want to handle (video, playback, VOIP, etc..), their preferred latency and sample rate, etc..
Based on this information, the session manager can shortlist which device the client needs.

As soon as a client connects, if its session information is new (PID, GID, UID), it is first frozen until its permissions are acknowledged. PipeWire’s default session manager, pipewire-media-session, comes with a series of modules that take care of this, called module-portal and module-access. The portal module will, through desktop integration (via D-Bus, like PolKit), open a pop-up to ask for user confirmation (read, write, execute). After that, the session manager configures the client’s permissions in its client object.
So clients are not able to list other nodes or connect to them until the session manager approves.

The session manager can then choose to restore connections based on previous usage of this stream: decide how to connect it, make sure it’s linked to the appropriate device, and follow the peering rules for its use case (depending on media type). Then this new node can get connected and configured in the media graph.

That is as far as clients are concerned. However, PipeWire doesn’t open any device by default either; it is also the role of the session manager to load devices, configure them, and map them onto the media graph.

To achieve this flexibly, some session managers use what is called a session management graph. In practice, this is equivalent to how PulseAudio manages devices through the concept of cards and profiles that can create sink and source nodes, with the extra addition of routing based on use case. Internally, the session manager actually reuses the related PulseAudio code and configuration for device management.

The session management graph is a representation of this: the high-level media flow from the point of view of the session manager. As far as I can see, these graphs are hosted within PipeWire alongside other objects, but they have different types.

+---------------------+                                +----------------------+
|                     |                                |                      |
|            +----------------+  Endpoint Link  +----------------+            |
|  Endpoint  |Endpoint Stream |-----------------|Endpoint Stream |  Endpoint  |
|            +----------------+                 +----------------+            |
|                     |                                |                      |
+---------------------+                                +----------------------+

Endpoints are where media can be routed to or from (laptop speaker, USB webcam, Bluetooth headset mic, amplifier, radio, etc..). They then get mapped to nodes in the media graph, though not always.
Endpoints can be mutually exclusive; they are the equivalent of device ports, which, as you remember, correspond to a single input or output associated with a card. So an Endpoint is, in theory, a card plus a device port.

Endpoint Streams are the logical paths, the routing associated with a use case (Music, Voice, Emergency, Media, etc..). They are equivalent to PulseAudio sinks/sources on the device side, and sink inputs/source outputs on the client side.
These can be used to change the routing in the media graph.

The Endpoint Link is what connects endpoints and creates the media flow; it can only exist if there are actual links in the media graph or if the link exists physically (a real hardware connection).

The session manager is then responsible for knowing which devices are present, what they support, what linking information exists, and whether streams need compatibility shims between them, sharing that information when needed.

Additionally, the session manager can internally put objects of type Device on the graph, which map to ALSA cards, JACK clients, or others, like cards in PulseAudio.

Now let’s see how to configure PipeWire and its session/policy manager.

When the PipeWire daemon starts it reads the config file located at $PIPEWIRE_CONFIG_FILE, normally /etc/pipewire/pipewire.conf. It contains sections, some to set server values, some to load plugins and modules, some to create objects, and some to automatically launch programs.
The goal of this configuration file is to make it easy to configure how the processing happens in the media graph.

The execution section is normally used to automatically launch the session manager.

There are configuration options related to how the graph is scheduled, such as the global sample rate used by the processing pipeline, to which all signals will be converted: default.clock.rate. The resampling quality can be configured server-side too (even though nodes are wrapped in an adapter that does that) in case it needs to be done; this resampler is a custom, highly optimized one. Moreover, you can control the minimum and maximum buffer size through min-quantum, max-quantum, and a default quantum, which are used to dynamically change the latency.

default.clock.quantum =		1024
default.clock.min-quantum =	32
default.clock.max-quantum =	8192
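The quantum is a buffer size in frames, so dividing it by the graph sample rate gives the latency of one processing cycle. A small Python sketch of that arithmetic, assuming a 48000 Hz graph clock:

```python
def quantum_to_latency_ms(quantum, rate):
    """Latency of one processing cycle: frames / (frames per second), in ms."""
    return quantum / rate * 1000

# With the default quantum and a 48000 Hz graph clock:
print(quantum_to_latency_ms(1024, 48000))  # ~21.3 ms per cycle
print(quantum_to_latency_ms(32, 48000))    # ~0.67 ms, the low-latency floor
```

This is what the dynamic latency negotiation trades off: a larger quantum means fewer wakeups and lower CPU usage, a smaller quantum means lower latency.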

NB: PipeWire relies on plugins that follow SPA, the Simple Plugin API, inspired by GStreamer’s plugin mechanism but lighter.
Most of them can be found in /usr/lib/spa-<version>.

Now as far as the session manager is concerned, behavior highly depends on the implementation. It revolves around policy modules and matching rules that associate objects with specific actions in the media graph or the session management graph. Monitor subsystems watch for a new device or stream appearing (a new node), or for the system creating a new object, and decide what to do with it based on the endpoint configuration.

For example, I have the following rule for WirePlumber in 00-default-output-audio.endpoint-link:

media_class = "Stream/Output/Audio"

media_class = "Audio/Sink"

Which will attach a new endpoint of class “Stream/Output/Audio” to the default endpoint with class “Audio/Sink”.

However, this all depends on the session manager implementation.

At this point it's easy to picture that this system would be fantastic for creating filter and effect streams; however, this is currently still very hard to do. So far, the only way to achieve it is with the help of PulseAudio tools such as pactl.

You can create sinks and sources with specific media classes so that they map within PipeWire.
For example:

pactl load-module module-null-sink object.linger=1 media.class=Audio/Sink sink_name=my-sink channel_map=surround-51

pw-cli can also be used instead:

pw-cli create-node adapter { media.class=Audio/Duplex object.linger=1 audio.position=FL,FR }

It remains that PipeWire is missing the interface toolset to easily interact with it. There aren't any good sound configuration tools that permit inspecting and manipulating it so far. Moreover, the tools that eventually do this will need to be able to portray the internal connection mechanism, similar to JACK's many connection managers.

I quote:

There is currently no native graphical tool to inspect the PipeWire graph but we recommend to use one of the excellent JACK tools, such as Carla, catia, qjackctl, … You will not be able to see all features like the video ports but it is a good start.

PipeWire comes with a set of mini debug tools similar to what PulseAudio provides; they start with the pw-* prefix:

  • pw-cli - The PipeWire Command Line Interface
  • pw-dump - Dump objects inside PipeWire
  • pw-dot - The PipeWire dot graph dump in graphviz format
  • pw-mon - The PipeWire monitor
  • pw-cat - Play and record media with PipeWire
  • pw-play - Like pw-cat but for play only
  • pw-metadata - The PipeWire metadata
  • pw-profiler - The PipeWire profiler
  • pw-top - Acts like top but for device nodes inside PipeWire

The most useful tools are pw-cli, pw-dump and pw-dot.


pw-cli info 0

Here’s an extract from pw-dump showing an Endpoint of class “Audio/Source”, a microphone on the boring headset you’ve encountered in this article.

    "id": 53,
    "type": "PipeWire:Interface:Endpoint",
    "version": 0,
    "permissions": [ "r", "w", "x", "m" ],
    "props": {
      "": "alsa_card.usb-C-Media_Electronics_Inc._Microsoft_LifeChat_LX-3000-00.capture.0.0",
      "media.class": "Audio/Source",
      "": 75,
      "": 39,
      "": 25

Overall, PipeWire is an interesting sound server, combining a media processing graph framework with an external policy/session/connection manager that controls it. The timer and dynamic latency mechanism should have a significant effect on CPU usage.

Unfortunately, after testing it, you can clearly see that it is still at an early stage, even though it already integrates well on the audio side through its backward compatibility with PulseAudio.
Additionally, it remains to be seen whether the tooling around it will adapt to the graph way of thinking. Will tools build around the concept or dismiss it entirely, considering most desktop tools today aren't used to long sequences of media processing, and neither are users?
Finally, on the session/connection manager side we need more innovation. What is currently available seems to be lacking: I couldn't find much documentation about the restoration DB mechanism, hot-plug, desktop integration, caching of sample sounds for events, and other features.

It’s The Same On All OS - Conclusion

Anybody who claims one system offers better audio “quality” is plain wrong and bases that claim on something that hasn't been scientifically proven.

All the low-level stacks are relatively the same speed when running in bit-perfect mode. The big differences between all that we’ve seen relates to the driver support, the ease of use, the desktop integration, and the buffer/latency management.
Some systems are targeted at end users and others at audio engineers.

According to measurements from A Look at Linux Audio (ALSA, PulseAudio), for instance, ALSA performs very well on Linux and keeps up with a much more powerful Windows machine. Tests with PulseAudio give similar results but use 6% more CPU.

Whether in the past with Mac OS X, or Windows, and now Linux, there is no evidence that operating systems make any difference to sound quality if you’re playing “bit perfect” to the hardware directly (ie. ALSA to DAC with no software conversion similar to Windows ASIO, Kernel Streaming or WASAPI)

The discussion then revolves around low latency using real-time scheduling, better I/O, better sampling sizes, etc. (See Analog to Digital & Digital to Analog (ADC & DAC) for more info on what these mean.)

The audio stack is fragmented on all operating systems because the problem is a large one. On Windows, for example, the audio APIs include ASIO, DirectSound, and WASAPI.
Perhaps macOS has the cleanest audio stack, CoreAudio, but nobody can say for sure without being able to look at the code. PulseAudio was inspired by it.
The stacks of commercial operating systems are not actually better or simpler.

Meanwhile, the BSD stack is definitely the simplest; even though there are discrepancies between the lowest layers and a lack of driver support, sndio makes it a breeze.

Linux is the platform of choice for audio and acoustic research and was chosen by the CCRMA (Center for Computer Research in Music and Acoustics).

Let's conclude. We've seen basic concepts about audio, such as the typical hardware components, and how audio is transferred and converted between the real world and the digital world through digital-to-analog converters. We've seen the issues of buffering and fragment size. Then we took a look at different libraries that can act as translation layers, as processing helpers, or as standard formats for writing filters. After that we went through the drivers, ALSA and OSS: the crazy configuration format and plugins of ALSA and the internal concepts it uses to map devices, and on the OSS and SADA side, the historical fracture and how things could be done mainly hidden away inside the kernel via tunables, so as not to freak out the users. Finally, we attacked sound servers, from sndio, to the deprecated aRts and ESD, to PulseAudio, to JACK, and lastly PipeWire. Each of them has its specialty: sndio is extremely simple, PulseAudio is great for the desktop integration use case, JACK caters to audio engineers who have too much equipment to connect together and offers superb integration with professional tools, and PipeWire takes inspiration from JACK's graphs but wants to go a step further by including video streams, integrating with the desktop, and making things snappier with a wake-up node driving the processing graph.


That’s it, I hope you learned a thing or two in this post. Let me know what you think!











Gokberk Yaltirakli (gkbrk)

Drawing on the spectrum February 06, 2021 09:00 PM

Software Defined Radio software usually comes with a waterfall view (spectrogram) that lets the user quickly inspect the spectrum. The spectrogram plots the amplitude of frequencies over time. This means, by carefully outputting a signal consisting of multiple frequencies, we can draw shapes and pictures on the spectrogram.

A common NFM walkie-talkie is too limited to do this, but a Software Defined Radio that can transmit arbitrary I/Q samples will do the job perfectly. Fortunately I have a HackRF One at hand, so I gave this a try.

In order to transmit from the HackRF, I will be using the hackrf_transfer command. This means all I’ll need to do in my modulator is to output I/Q samples to stdout. Let’s make a quick helper method to do this.

Writing samples

Traditionally, DSP samples are kept between -1 and 1, so we will be using this format internally. In order to give them to hackrf_transfer, we need to encode them as a signed 8-bit integer. The format accepted by the program is alternating 8-bit signed I and Q samples.
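The encoding can be sanity-checked on its own: full-scale ±1.0 maps to ±127, and struct's 'b' format packs each value as one signed two's-complement byte. A quick check of what the helper below will produce:

```python
import struct

# +1.0 and -1.0 scaled to signed 8-bit, as the write() helper does.
assert int(1.0 * 127) == 127
assert int(-1.0 * 127) == -127
# Two's-complement bytes: 127 -> 0x7f, -127 -> 0x81.
assert struct.pack('bb', 127, -127) == b'\x7f\x81'
```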

import struct, os

dsp = os.fdopen(1, 'wb')

def write(i, q):
    i = int(i * 127)
    q = int(q * 127)
    data = struct.pack('bb', i, q)
    dsp.write(data)


Let’s also define some constants; such as the output sample rate, the maximum frequency deviation, and how long it should take to transmit the image. The frequency deviation determines how wide our signal will be on the spectrum and the transmission time will determine the height. You should play around with these values until you can get a clear image.

RATE = 4_000_000 # 4M sample rate
TRANSMIT_TIME = 2 # 2 Seconds
FREQ_DEV = 15_000 # 15 KHz

Loading the image

With the configuration out of the way, we are now ready to produce the samples. The first thing we need to do is to read an image file. To do this, I will be using the Pillow library. Let’s get the image file path from the command line arguments, load the image and convert it to a black and white bitmap.

from PIL import Image
import sys

im =[1])
im = im.convert('1') # 1 means a 1-bit image

Outputting the image

We need to output the image bottom-to-top because the spectrogram will put the signals received earlier at the bottom, as it scrolls like a waterfall.

t = 0

for y in range(im.height)[::-1]:
    target = t + TRANSMIT_TIME / im.height
    while t < target:
        # Output line...

Every line, we pick a target time. We will be outputting samples for the current line until we reach target. Each line gets TRANSMIT_TIME / im.height seconds.

First of all, let’s cache the pixels of the current line since Python is not very fast.

line = [im.getpixel((x, y)) for x in range(im.width)]

When we are outputting the line, we'll pretend that each pixel of the image is a frequency in our output. So for an image with a width of 300 and a frequency deviation of 5000 Hz: x = 0 is offset by 0 Hz, x = 150 is offset by 2500 Hz, and x = 299 is offset by (almost) 5000 Hz.

Using the mapping we described above, let’s accumulate I and Q values for all the pixels.

i = 0
q = 0

for x, pix in enumerate(line):
    if not pix:
        continue
    offs = x / im.width
    offs *= FREQ_DEV
    i += math.cos(2 * math.pi * offs * t) * 0.01
    q += math.sin(2 * math.pi * offs * t) * 0.01

write(i, q)
t += 1.0 / RATE

We can represent a wave of a particular frequency in time using the well-known formula 2π · freq · time. Since I is the cosine of the value and Q is the sine, our final values become cos(2π · f · t) and sin(2π · f · t).
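As a sanity check, the I/Q pair for a single frequency traces out the unit circle: cos² + sin² = 1 at every instant, so a lone tone has a constant envelope. A small plain-Python check (the frequency and rate here are arbitrary illustrative values within the ranges used above):

```python
import math

FREQ = 2500.0     # Hz, an arbitrary offset within FREQ_DEV
RATE = 4_000_000  # samples per second, as configured above

for n in range(1000):
    t = n / RATE
    i = math.cos(2 * math.pi * FREQ * t)
    q = math.sin(2 * math.pi * FREQ * t)
    # A single tone has a constant envelope of 1.
    assert abs(i * i + q * q - 1.0) < 1e-9
```

It is only when we sum many such tones (one per lit pixel) that the envelope grows and clipping becomes a concern.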

We don't output anything for pixels whose value is 0. We multiply the signals we add to I and Q (i.e. dampen them) by 0.01 in order to prevent the signal from clipping excessively. This approach has some downsides, as the signal might still clip for certain images, but for a short demo where we can pick the images and change the dampening factor it won't be a problem.
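To see why a fixed dampening factor can still clip: at t = 0 (or any instant where the per-pixel cosines align) every lit pixel contributes a full-scale cosine of 1, so the I sum is simply the number of lit pixels times the factor. A quick check:

```python
# At t = 0, cos(...) == 1 for every lit pixel, so the worst-case I sum
# is (number of lit pixels) * dampening_factor.
def worst_case_i(lit_pixels, dampening=0.01):
    return lit_pixels * dampening

assert worst_case_i(50) == 0.5    # stays within [-1, 1]
assert worst_case_i(200) == 2.0   # clips: exceeds full scale
```

So a line with more than 100 lit pixels can momentarily exceed full scale at 0.01, which is why the factor may need tuning per image.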

Now let’s combine the code snippets so far and try to render a signal. I recommend not transmitting this in real-time as Python is slow, and using PyPy as Python is slow.

$ pypy3 ./ btc.png > btc.raw
... Wait a lot
$ hackrf_transfer -f 433000000 -t btc.raw -s 4000000 -a 1


Here’s a video of what our signal looks like on gqrx.



Here’s the full code, if you want to try this on your own.

#!/usr/bin/env python3
import struct
import os
from PIL import Image
import sys
import math

dsp = os.fdopen(1, "wb")

def write(i, q):
    i = int(i * 127)
    q = int(q * 127)
    data = struct.pack("bb", i, q)
    dsp.write(data)

RATE = 4_000_000  # 4M sample rate
TRANSMIT_TIME = 2  # 2 Seconds
FREQ_DEV = 15_000  # 15 KHz

im =[1])
im = im.convert("1")  # 1 means 1-bit image

t = 0

for y in range(im.height)[::-1]:
    target = t + TRANSMIT_TIME / im.height

    line = [im.getpixel((x, y)) for x in range(im.width)]
    while t < target:
        i = 0
        q = 0

        for x, pix in enumerate(line):
            if not pix:
                continue
            offs = x / im.width
            offs *= FREQ_DEV
            i += math.cos(2 * math.pi * offs * t) * 0.01
            q += math.sin(2 * math.pi * offs * t) * 0.01
        write(i, q)
        t += 1.0 / RATE

February 03, 2021

Maxwell Bernstein (tekknolagi)

Inline caching: quickening February 03, 2021 12:00 AM

In my last post I discussed inline caching as a technique for runtime optimization. I ended the post with some extensions to the basic technique, like quickening. If you have not read the previous post, I recommend it. This post will make many references to it.

Quickening involves bytecode rewriting — self modifying code — to remove some branches and indirection in the common path. Stefan Brunthaler writes about it in his papers Efficient Interpretation using Quickening and Inline Caching Meets Quickening.

The problem

Let’s take a look at a fragment of the caching interpreter from the last post so we can talk about the problem more concretely. You can also get the sources from the repo and open interpreter.c in your preferred editor.

void add_update_cache(Frame* frame, Object* left, Object* right) {
  Method method = lookup_method(object_type(left), kAdd);
  cache_at_put(frame, object_type(left), method);
  Object* result = (*method)(left, right);
  push(frame, result);
}

void eval_code_cached(Frame* frame) {
  // ...
  while (true) {
    // ...
    switch (op) {
      // ...
      case ADD: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        CachedValue cached = cache_at(frame);
        Method method = cached.value;
        if (method == NULL || cached.key != object_type(left)) {
          add_update_cache(frame, left, right);
          break;
        }
        Object* result = (*method)(left, right);
        push(frame, result);
        break;
      }
      // ...
    }
    frame->pc += kBytecodeSize;
  }
}

As I also mentioned last post, the ADD opcode handler has three cases to handle:

  1. Cache is empty
  2. Cache has the wrong key
  3. Cache has the right key

Since Deutsch & Schiffman found that types don’t vary that much, the third case is the fast path case. This means that we should do as little as possible in that case. And right now, we’re doing too much work.

Why should we have to check if the cache slot is empty if in the fast path it shouldn’t be? And why should we then have to make an indirect call? On some CPUs, indirect calls are much slower than direct calls. And this assumes the compiler generates a call instruction — it’s very possible that a compiler would decide to inline the direct call.

Quickening is a technique that reduces the number of checks by explicitly marking state transitions in the bytecode.

Removing the empty check

In order to remove one of the checks — the method == NULL check — we can add a new opcode, ADD_CACHED. The ADD_CACHED opcode can skip the check because our interpreter will maintain the following invariant:

Invariant: The opcode ADD_CACHED will appear in the bytecode stream if and only if there is an entry in the cache at that opcode.

After ADD adds something to the cache, it can rewrite itself to ADD_CACHED. This way, the next time around, we have satisfied the invariant.


Let’s see how that looks:

void eval_code_quickening(Frame* frame) {
  // ...
  while (true) {
    // ...
    switch (op) {
      // ...
      case ADD: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        add_update_cache(frame, left, right);
        code->bytecode[frame->pc] = ADD_CACHED;
        break;
      }
      case ADD_CACHED: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        CachedValue cached = cache_at(frame);
        if (cached.key != object_type(left)) {
          add_update_cache(frame, left, right);
          break;
        }
        Method method = cached.value;
        Object* result = (*method)(left, right);
        push(frame, result);
        break;
      }
      // ...
    }
    frame->pc += kBytecodeSize;
  }
}

Not too different. We’ve shuffled the code around a little bit but overall it looks fairly similar. We still get to share some code in add_update_cache, so there isn’t too much duplication, either.
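The same state transition can be modeled in a few lines of Python (a toy sketch, not the C interpreter above; all names here are illustrative): the ADD handler does the slow cache fill once, then rewrites its own slot in the bytecode so subsequent executions take the cached path and the invariant holds.

```python
# Toy model of quickening: ADD rewrites itself to ADD_CACHED after
# populating a per-instruction cache keyed by the left operand's type.
ADD, ADD_CACHED = "ADD", "ADD_CACHED"

def run(bytecode, stack, cache):
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        right, left = stack.pop(), stack.pop()
        if op == ADD:
            cache[pc] = type(left)          # slow path: fill the cache
            bytecode[pc] = ADD_CACHED       # rewrite: next time, fast path
            stack.append(left + right)
        elif op == ADD_CACHED:
            assert cache[pc] is type(left)  # invariant: cache is populated
            stack.append(left + right)
        pc += 1

bytecode, cache = [ADD], {}
stack = [1, 2]
run(bytecode, stack, cache)
assert stack == [3] and bytecode == [ADD_CACHED]

# The second run takes the cached path without re-filling the cache.
stack = [4, 5]
run(bytecode, stack, cache)
assert stack == [9]
```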

Now that we’ve moved the empty check, it’s time to remove the indirect call.

Removing the indirect call

Let’s assume for a minute that you, the writer of a language runtime, know that most of the time, when people write a + b, the operation refers to integer addition.

Not many other primitive types implement addition. Frequently floating point numbers use the same operator (though languages like OCaml do not). Maybe strings. And maybe your language allows for overloading the plus operator. But most people don’t do that. They add numbers.

In that case, you want to remove as much of the overhead as possible for adding two numbers. So let’s introduce a new opcode, ADD_INT that is specialized for integer addition.

In an ideal world, we would just be able to pop two objects, add them, and move on. But in our current reality, we still have to deal with the possibility of programmers passing in a non-integer every once in a while.

So first, we check if the types match. If they don’t, we populate the cache and transition to ADD_CACHED. I’ll get to why we do that in a moment.

And if we did actually get an int, great, we call this new function do_add_int.

void do_add_int(Frame* frame, Object* left, Object* right) {
  Object* result = int_add(left, right);
  push(frame, result);
}

void eval_code_quickening(Frame* frame) {
  // ...
  while (true) {
    // ...
    switch (op) {
      // ...
      case ADD_INT: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        if (object_type(left) != kInt) {
          add_update_cache(frame, left, right);
          code->bytecode[frame->pc] = ADD_CACHED;
          break;
        }
        do_add_int(frame, left, right);
        break;
      }
      // ...
    }
    frame->pc += kBytecodeSize;
  }
}

This is a nice opcode handler for ADD_INT, but right now it’s orphaned. Some opcode has to take the leap and rewrite itself to ADD_INT, otherwise it’ll never get run.

I suggest we make ADD do the transition. This keeps ADD_CACHED fast for other types. If ADD observes that the left hand side of the operation is an integer, it’ll call do_add_int and rewrite itself.


Let’s see how that looks in code.

void eval_code_quickening(Frame* frame) {
  // ...
  while (true) {
    // ...
    switch (op) {
      // ...
      case ADD: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        if (object_type(left) == kInt) {
          do_add_int(frame, left, right);
          code->bytecode[frame->pc] = ADD_INT;
          break;
        }
        add_update_cache(frame, left, right);
        code->bytecode[frame->pc] = ADD_CACHED;
        break;
      }
      // ...
    }
    frame->pc += kBytecodeSize;
  }
}

Back to “why transition from ADD_INT to ADD_CACHED”. Two thoughts:

  1. We could transition back to ADD. In that case, this code would perform poorly in an environment where the programmer passes multiple different types at this opcode. There would be a lot of bytecode rewriting overhead going on as it goes back and forth between ADD and ADD_INT.

  2. We could also assume it’s a hiccup and not rewrite. This would perform poorly if the first time the argument is an integer, but something else every subsequent operation. There would be a lot of lookup_method calls.

A great extension here would be to add a polymorphic cache. Those are designed to efficiently handle a small (less than five, normally) amount of repeated types at a given point.

Why is this faster?

Even if we leave the interpreter in this state, a small C bytecode interpreter, we save a couple of instructions and some call overhead in the fast path of integer addition. This is a decent win for math-heavy applications.

In the best case, though, we save a great deal of instructions. It’s entirely possible that the compiler will optimize the entire body of ADD_INT to something like:

pop rax
pop rcx
cmp rax, $IntTag
jne slow_path
add rcx, rax
push rcx
jmp next_opcode
; ...

It won’t look exactly like that, due to our object representation and because our push/pop functions do not operate on the C call stack, but it will be a little closer than before. But what if we could fix these issues and trim down the code even further?

Then we might have something like the Dart intermediate implementation of addition for small integers on x86-64. The following C++ code emits assembly for a specialized small integer handler:

void CheckedSmiOpInstr::EmitNativeCode(FlowGraphCompiler* compiler) {
  // ...
  Register left = locs()->in(0).reg();
  Register right = locs()->in(1).reg();
  // Check both left and right are small integers
  __ movq(TMP, left);
  __ orq(TMP, right);
  __ testq(TMP, compiler::Immediate(kSmiTagMask));
  __ j(NOT_ZERO, slow_path->entry_label());
  Register result = locs()->out(0).reg();
  __ movq(result, left);
  __ addq(result, right);
  // ...
}

This example is a little bit different since it is using an optimizing compiler and assumes the input and output are both in registers, but still expresses the main ideas.

See also the JVM template interpreter implementation for binary operations on small integers:

void TemplateTable::iop2(Operation op) {
  // ...
  __ pop_i(rdx);
  __ addl (rax, rdx);
  // ...
}

which pops the top of the stack and adds rax to it. I think this is because the JVM caches the top of the stack in the register rax at all times, but I have not been able to confirm this. It would explain why it adds rax and why there is no push, though.

Exploring further

There are a number of improvements that could be made to this very simple demo. Bytecode rewriting can unlock a lot of performance gains with additional work. I will list some of them below:

  • Make a template interpreter like in the JVM. This will allow your specialized opcodes (like ADD_INT) to directly make use of the call stack.
  • Make a template JIT. This is the “next level up” from a template interpreter. Instead of jumping between opcode handlers in assembly, paste the assembly implementations of the opcodes one after another in memory. This will remove a lot of the interpretive dispatch overhead in the bytecode loop.
  • Special case small integers in your object representation. Why allocate a whole object if you can fit a great deal of integers in a tagged pointer? This will simplify some of your math and type checking. I wrote a follow-up post about this!
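The small-integer tagging idea can be sketched in a few lines (a model of the general scheme, not any particular runtime): reserve the low bit of a machine word as a tag, so a "pointer" with low bit 1 is an immediate integer stored in the remaining bits, and no heap object needs to be allocated for it.

```python
# Tagged small integers: low bit 1 marks an immediate int; the value
# lives in the remaining bits. A real heap pointer would keep low bit 0
# (guaranteed by allocation alignment).
def tag_int(n):
    return (n << 1) | 1

def is_int(word):
    return word & 1 == 1

def untag_int(word):
    return word >> 1

for n in (0, 1, 42, 123456):
    w = tag_int(n)
    assert is_int(w) and untag_int(w) == n
```

This is why the Dart snippet above can check both operands at once by OR-ing them and testing the tag mask.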

Maybe I will even write about them in the future.

February 01, 2021

Gonçalo Valério (dethos)

10 years February 01, 2021 07:18 PM

The first post I published on this blog is now 10 years old. This wasn’t my first website or even the first blog, but it’s the one that stuck for the longest time.

The initial goal was to have a place to share anything I might find interesting on the Web, a place that would allow me to publish my opinions on all kinds of issues (if I felt like it) and to be able to publish information about my projects. I think you still can deduce that from the tag line, that remained unchanged ever since.

From the start, being able to host my own content was one of the priorities, in order to be able to control its distribution and ensuring that it is universally accessible to anyone without any locks on how and by whom it should be consumed.

The reasoning behind this decision was related to a trend that started a couple of years earlier, the departure from the open web and the big migration to the walled gardens.

Many people thought it was an inoffensive move, something that would improve the user experience and make the life easier for everyone. But as anything in life, with time we started to see the costs.

Today the world is different, using closed platforms that barely interact with each other is the rule and the downsides became evident: Users started to be spied for profit, platforms decide what speech is acceptable, manipulation is more present than ever, big monopolies are now gate keepers to many markets, etc. Summing up, the information and power is concentrated in fewer hands.

Last week this event set the topic for the post. A “simple chat app”, that uses an open protocol to interact with different servers, was excluded/blocked from the market unilaterally without any chance to defend itself. A more extensive discussion can be found here.

The message I wanted to leave in this commemorative post, is that we need to give another shot to decentralized and interoperable software, use open protocols and technologies to put creators and users back in control.

If there is anything that I would like to keep for the next 10 years, is the capability to reach, interact and collaborate with the world without having a huge corporation acting as middleman dictating its rules.

I will continue to put an effort into making sure open standards are used on this website (such as RSS, Webmention, etc.) and that I'm reachable using decentralized protocols and tools (such as email, Matrix or the “Fediverse“). I think this is the minimum a person could ask for the next decade.

Gustaf Erikson (gerikson)

January February 01, 2021 05:18 PM

Ponylang (SeanTAllen)

Last Week in Pony - January 31, 2021 February 01, 2021 12:03 AM

Version 0.38.3 of ponyc and 0.4.1 of corral have been released!

January 31, 2021

Derek Jones (derek-jones)

Growth in number of packages for widely used languages January 31, 2021 10:55 PM

These days a language’s ecosystem of add-ons, such as packages, is often more important than the features provided by the language (which usually only vary in their syntactic sugar, and built-in support for some subset of commonly occurring features).

Use of a particular language grows and shrinks, sometimes over very many decades. Estimating the number of users of a language is difficult, but a possible proxy is ecosystem activity in the form of package growth/decline. However, it will take several decades for the data needed to test how effective this proxy might be.

Where are we today?

The Module Counts website is the home for a project that counts the number of libraries/packages/modules contained in 26 language specific repositories. Daily data, in some cases going back to 2010, is available as a csv :-) The following are the most interesting items I discovered during a fishing expedition.

The csv file contains totals, and some values are missing (which means specifying an ‘ignore missing values’ argument to some functions). Some repos have been experiencing large average daily growth (e.g., 65 for PyPI, and 112 for Maven Central-Java), while others are more subdued (e.g., 0.7 for PERL and 3.9 for R’s CRAN). Apart from a few days, the daily change is positive.

Is the difference in the order of magnitude growth due to number of active users, number of packages that currently exist, a wide/narrow application domain (Python is wide, while R’s is narrow), the ease of getting a package accepted, or something else?

The plots below show how PyPI has been experiencing exponential growth of a kind (the regression model fitted to the daily total has the form e^{10^{-3}days-6.5*10^{-8}days^2}, where days is the number of days since 2010-01-01; the red line is the daily diff of this equation), while Ruby has been experiencing a linear decline since late 2014 (all code+data):

Daily change in the number of packages in PyPI and Rubygems.

Will the five-year decline in new submissions to Rubygems continue, and does this point to an eventual demise of Ruby (a few decades from now)? Rubygems has years to go before it reaches PERL’s low growth rate (I think PERL is in terminal decline).

Are there any short term patterns, say at the weekly level? Autocorrelation is a technique for estimating the extent to which today’s value is affected by values from the immediate past (usually one or two measurement periods back, i.e., yesterday or the day before that). The two plots below show the autocorrelation for daily changes, with lag in days:

Autocorrelation of daily changes in PyPI and Maven-Java package counts.

The recurring 7-day ‘peaks’ show the impact of weekends (I assume). Is the larger “weekend effect” for Java, compared to PyPI, due to Java usage including a greater percentage of commercial developers (who tend not to work at the weekend)?
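The lag-k autocorrelation behind these plots is easy to compute directly. A minimal stdlib-only version, applied to a synthetic daily series with a weekly dip, recovers the 7-day peak:

```python
from statistics import mean

def autocorr(xs, lag):
    # Normalized autocorrelation of xs at the given lag.
    m = mean(xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# Synthetic daily counts with a weekend dip, repeated for 20 weeks.
series = [10, 10, 10, 10, 10, 2, 2] * 20
best = max(range(1, 10), key=lambda k: autocorr(series, k))
assert best == 7   # strongest correlation at a 7-day lag
```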

I did not manage to find any seasonal effect, e.g., more submissions during the winter than the summer. But I only checked a few of the languages, and only for a single peak (see code for details).

Another way of tracking package evolution is version numbering. For instance, how often do version numbers change, and which component, e.g., major/minor. There have been a couple of studies looking at particular repos over a few years, but nobody is yet recording broad coverage daily, over the long term 😉

January 29, 2021

Frederic Cambus (fcambus)

NetBSD on the EdgeRouter Lite January 29, 2021 07:20 PM

NetBSD-current now has pre-built octeon bootable images (which will appear in NetBSD 10.0) for the evbmips port, so I decided to finally give it a try. I've been happily running OpenBSD/octeon on my EdgeRouter Lite for a few years now, and have previously published some notes including more detail about the CPU.

Contrary to the OpenBSD/octeon port, which is very stable and runs SMP kernels, things are a little less polished on the NetBSD side for this platform. The system runs a uniprocessor kernel and there are still some stability issues.

EdgeRouter Lite

Here is the U-Boot configuration to boot the image:

Octeon ubnt_e100# set bootcmd 'fatload usb 0 $loadaddr netbsd;bootoctlinux $loadaddr coremask=0x3 root=wedge:octeon-root'
Octeon ubnt_e100# saveenv
Saving Environment to Flash...
Un-Protected 1 sectors
Erasing Flash...
. done
Erased 1 sectors
Writing to Flash... 4....3....2....1....done
Protected 1 sectors
Octeon ubnt_e100#

On first boot, the system automatically expands the filesystem:

Resizing / (NAME=octeon-root)
/dev/rdk1: grow cg |*************************************                 |  69%

Here is the login session, for posterity:

Thu Jan 28 23:40:37 UTC 2021

NetBSD/evbmips (octeon) (constty)


Here is the output of running file on executables:

ELF 32-bit MSB pie executable, MIPS, N32 MIPS-III version 1 (SYSV), dynamically
linked, interpreter /libexec/ld.elf_so, for NetBSD 9.99.79, not stripped

For the record, OpenSSL speed benchmark results are available here.

System message buffer (dmesg output):

[     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[     1.000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[     1.000000]     2018, 2019, 2020, 2021 The NetBSD Foundation, Inc.  All rights reserved.
[     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[     1.000000]     The Regents of the University of California.  All rights reserved.

[     1.000000] NetBSD 9.99.79 (OCTEON) #0: Thu Jan 28 18:52:43 UTC 2021
[     1.000000]
[     1.000000] Cavium Octeon CN5020-500
[     1.000000] total memory = 512 MB
[     1.000000] avail memory = 496 MB
[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] mainbus0 (root)
[     1.000000] cpunode0 at mainbus0: 2 cores, crypto+kasumi, 64bit-mul, unaligned-access ok
[     1.000000] cpu0 at cpunode0 core 0: 500.00MHz
[     1.000000] cpu0: Cavium CN5020-500 (0xd0601) Rev. 1 with software emulated floating point
[     1.000000] cpu0: 64 TLB entries, 512TB (49-bit) VAs, 512TB (49-bit) PAs, 256MB max page size
[     1.000000] cpu0: 32KB/128B 4-way set-associative L1 instruction cache
[     1.000000] cpu0: 16KB/128B 64-way set-associative write-through coherent L1 data cache
[     1.000000] cpu0: 128KB/128B 8-way set-associative write-back L2 unified cache
[     1.000000] cpu1 at cpunode0 core 1: disabled (uniprocessor kernel)
[     1.000000] wdog0 at cpunode0: default period is 4 seconds
[     1.000000] iobus0 at mainbus0
[     1.000000] iobus0: initializing POW
[     1.000000] iobus0: initializing FPA
[     1.000000] com0 at iobus0 address 0x0001180000000800: ns16650, no ERS, 16-byte FIFO
[     1.000000] com0: console
[     1.000000] com at iobus0 address 0x0001180000000c00 not configured
[     1.000000] octrnm0 at iobus0 address 0x0001180040000000
[     1.000000] entropy: ready
[     1.000000] octtwsi at iobus0 address 0x0001180000001000 not configured
[     1.000000] octmpi at iobus0 address 0x0001070000001000 not configured
[     1.000000] octsmi0 at iobus0 address 0x0001180000001800
[     1.000000] octpip0 at iobus0 address 0x00011800a0000000
[     1.000000] octgmx0 at octpip0
[     1.000000] cnmac0 at octgmx0: address=0x1180008000000: RGMII
[     1.000000] cnmac0: Ethernet address 44:d9:e7:9e:f5:9e
[     1.000000] atphy0 at cnmac0 phy 7: Atheros AR8035 10/100/1000 PHY, rev. 2
[     1.000000] atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseSX-FDX, 1000baseT-FDX, auto
[     1.000000] cnmac1 at octgmx0: address=0x1180008000000: RGMII
[     1.000000] cnmac1: Ethernet address 44:d9:e7:9e:f5:9f
[     1.000000] atphy1 at cnmac1 phy 6: Atheros AR8035 10/100/1000 PHY, rev. 2
[     1.000000] atphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseSX-FDX, 1000baseT-FDX, auto
[     1.000000] cnmac2 at octgmx0: address=0x1180008000000: RGMII
[     1.000000] cnmac2: Ethernet address 44:d9:e7:9e:f5:a0
[     1.000000] atphy2 at cnmac2 phy 5: Atheros AR8035 10/100/1000 PHY, rev. 2
[     1.000000] atphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseSX-FDX, 1000baseT-FDX, auto
[     1.000000] dwctwo0 at iobus0 address 0x0001180068000000
[     1.000000] dwctwo0: Core Release: 2.65a (snpsid=4f54265a)
[     1.000000] usb0 at dwctwo0: USB revision 2.0
[     1.000000] bootbus0 at mainbus0
[     1.000000] timecounter: Timecounter "mips3_cp0_counter" frequency 500000000 Hz quality 100
[     1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.059978] uhub0 at usb0: NetBSD (0x0000) DWC2 root hub (0x0000), class 9/0, rev 2.00/1.00, addr 1
[     1.059978] uhub0: 1 port with 1 removable, self powered
[     1.069975] aes: BearSSL aes_ct
[     1.069975] aes_ccm: self-test passed
[     1.069975] chacha: Portable C ChaCha
[     1.079979] blake2s: self-test passed
[     3.609971] umass0 at uhub0 port 1 configuration 1 interface 0
[     3.620226] umass0: vendor 13fe (0x13fe) USB DISK 2.0 (0x4200), rev 2.00/1.00, addr 2
[     3.620226] umass0: using SCSI over Bulk-Only
[     3.620226] scsibus0 at umass0: 2 targets, 1 lun per target
[     3.632383] uhub0: autoconfiguration error: illegal enable change, port 1
[     3.639974] sd0 at scsibus0 target 0 lun 0: <, USB DISK 2.0, PMAP> disk removable
[     3.639974] sd0: 3824 MB, 959 cyl, 255 head, 32 sec, 512 bytes/sect x 7831552 sectors
[     3.659974] sd0: GPT GUID: 6e7b1b6a-2e9f-4915-946a-567dad0caaa4
[     3.669969] dk0 at sd0: "octeon-boot", 163840 blocks at 32768, type: ntfs
[     3.669969] dk1 at sd0: "octeon-root", 7626752 blocks at 196608, type: ffs
[     3.683879] WARNING: 1 error while detecting hardware; check system log.
[     3.691430] boot device: sd0
[     3.691430] root on dk1
[     3.709975] root file system type: ffs
[     3.719976] kern.module.path=/stand/evbmips/9.99.79/modules
[     3.719976] WARNING: no TOD clock present
[     3.729990] WARNING: using filesystem time

Kevin Burke (kb)

How to Force a Recall Election for the SF School Board January 29, 2021 06:46 PM

It is difficult for children, especially young children, to learn over Zoom. It is more difficult to teach critical skills like learning to read and write over Zoom. As Heather Knight and others have noted, keeping children isolated has severe impacts on their mental health.

It is the Mayor's top priority to get children back in schools. It is inexcusable that the San Francisco School District Board has not gotten children back in schools. The children who are most harmed by this failure are low income children in the Southeastern neighborhoods. Every single candidate for School Board last fall highlighted the importance of equity but has been unwilling to do what it takes to reduce inequality on this issue, the biggest one they will face while elected - get children back in schools so they can learn to read.

Even if you are not a parent, you should be interested in this issue. Who is going to want to move to San Francisco if the school system is not interested in educating students? If I had school age kids I would be looking for anything to get my kids back in school and doing things that are normal for kids to do. Keeping kids at home is not good for the community.

There are seven members on the SFUSD School Board. Four Board members were just elected in November and cannot be recalled until May 3, 2021. The other three can be recalled immediately. To force a recall election, you need to collect the signatures of 10% of the city's voters - around 51,000 signatures.

Every recall election has signatures thrown out because they are signed by people who aren't residents, or because signatures or addresses don't match. So you probably need to collect almost twice as many signatures - around 95,000. There are about 515,000 voters in San Francisco, and around 100,000 parents of students in the SFUSD school system.

You can either get volunteers to collect signatures door to door, pay signature collectors about $8 to $15 per signature, or use a mix of both. So you are looking at around $760,000 to $1,425,000 for a fully professional recall campaign. I can get you in touch with people who can collect signatures.
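The signature arithmetic above can be sketched as follows (using the post's rough figures; all counts are estimates):

```python
# Rough cost model for a paid signature drive, using the figures above.
valid_needed = 51_000   # ~10% of SF's ~515,000 voters must sign
to_collect = 95_000     # collect roughly double to absorb invalid signatures

cost_low = to_collect * 8    # $8 per paid signature
cost_high = to_collect * 15  # $15 per paid signature

print("collect ~%d signatures: $%d to $%d" % (to_collect, cost_low, cost_high))
```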

That said, you may not need to collect all the signatures. The goal is to put political pressure on the Board. The threat of a recall election was enough to get Gavin Newsom to change his behavior, and it may be enough here as well.

Tim Kellogg (kellogh)

Cold Paths January 29, 2021 12:00 AM

Faced with yet another crisis caused by a bug hidden in a cold path, I found myself Googling for a quick link to Slack out to the engineering team about cold paths. Unfortunately, I can’t find a focused write-up; and so here I am writing this.

A cold path is a path through the code or situation that rarely happens. By contrast, hot paths happen frequently. You don’t find bugs in hot paths. By nature, bugs are found in places that you didn’t think to look. Bugs are always in cold paths — every bug is found in a path colder than all the paths you tested.
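As a toy illustration (entirely hypothetical code), consider how a bug can sit unnoticed in an error-handling branch that no test ever reaches:

```python
DEFAULT_CONFIG = "timeout = 30\n"

def default_config(path):
    # Fallback configuration used when no file exists at `path`.
    return DEFAULT_CONFIG

def read_config(path):
    """Hot path: read the config file. Cold path: fall back to defaults."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # Cold path: any test suite that always ships a config file
        # never executes this branch, so the typo below survives review
        # and CI, only to blow up in production.
        return default_config(pth)  # NameError: should be `path`
```

Every run with a config file present passes; the first run on a machine without one crashes with a NameError instead of falling back.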

Here are some real world “cold paths” with big consequences:

Rare events are hard to predict; that’s just their nature. As engineers, I believe it’s our responsibility to do our best to get better at planning for these rare bugs. Is that it? Try harder?

Better: Don’t have cold paths

Smaller programs

I watched one of Gil Tene’s many amazing talks on Azul’s C4 garbage collector (not this talk, but similar) where he claimed that normally it takes 10 years to harden a garbage collector. Azul didn’t have 10 years to produce a viable business, so they avoided almost all cold paths in the collector and they were able to harden it in 4 years (I never tried verifying this claim).

For a garbage collector, this means things like offering fewer options, or having a simpler model to avoid cold paths around promoting objects between generations. For your app it will mean something different.

You can test less and still achieve high quality by reducing the size of your application. Fewer edge cases mean less testing surface area, which implies less testing work and fewer missed test cases. There’s something to be said for avoiding config options and making solutions less generic.

Avoid fallbacks

While I worked at AWS I had this beaten into my skull, and thankfully they’ve published the guidance in an excellent piece titled “avoiding fallback in distributed systems”. The idea is that when system 1 fails, you would like to automatically fall back to system 2.

For example, let’s say we have a process that sends logs to another service. For the hot path, we send logs directly via an HTTP request. But if the log service fails (e.g. overloaded, maintenance, etc.) we fall back to writing to a file and have a secondary process send those logs to the service when it comes back.

  • System 1: directly send logs to server
  • System 2: send asynchronously via file append

If system 2 is more reliable than system 1, then why don’t we always choose system 2? Always write to the file and ship logs asynchronously rather than send directly to the server. This is surprisingly strong logic that isn’t considered often enough. More often, by asking the question you end up finding a way to make system 1 more robust.

In cases where fallback can’t be avoided they suggest always exercising the fallback. For example, on every request, randomly decide to use either system 1 or system 2, thereby ensuring that the cold path isn’t cold because both are exercised on the hot path, at least sometimes.
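A minimal sketch of that suggestion, with hypothetical in-memory stubs standing in for the two systems:

```python
import random

# Hypothetical in-memory stand-ins for the two systems described above.
sent_directly = []  # system 1: direct HTTP send to the log service
spooled = []        # system 2: asynchronous send via file append

def send_direct(entry):
    sent_directly.append(entry)  # stub; a real version would make an HTTP request

def append_to_spool(entry):
    spooled.append(entry)  # stub; a real version would append to a local file

def log(entry, fallback_rate=0.01):
    """Ship a log entry, deliberately exercising the fallback path
    on a small fraction of normal traffic so it never goes cold."""
    if random.random() < fallback_rate:
        append_to_spool(entry)  # randomly chosen: take system 2 on the hot path
        return
    try:
        send_direct(entry)      # system 1, the usual hot path
    except ConnectionError:
        append_to_spool(entry)  # real fallback when the service is down
```

With this in place, the spool-and-ship path runs continuously in production rather than being tested for the first time during an outage.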

Know your capacity for testing

In “files are fraught with problems”, Dan Luu demonstrates that it’s unexpectedly difficult to write a file to disk correctly. Juggling issues like handling random power loss or strange ext4 behavior becomes a full-time job. It’s a lot to keep in your head, just to write a file.

Is it better to:

  1. Ignore the cold paths and hope for the best
  2. Correctly implement & test each file write event and ship late
  3. Use a system that does it correctly for you, like MySQL or SQLite

Choice #3 delegates the testing of all those pesky cold paths to a 3rd party. Therefore, #3 is always the best choice, unless your company is in the file writing business (e.g. you’re AWS and working on DynamoDB or S3).

Alternate take on the same idea: Choose boring technology


The practice of avoiding cold paths is often presented as “simple code”. Unfortunately, “simple” has such wildly varying meanings that it’s often antagonistic to use it outside a mathematical setting. I’ve found that centering conversations around “avoiding cold paths” gives more clarity on how to proceed.

In system design, the conversation about what is “simple” is even tougher due to the amorphous nature of it. The principle of “avoiding cold paths” can be extended to mean, “delegating cold paths” to a trusted third party, like an open source project or a cloud provider. An earnest discussion about your capacity for testing might be appropriate. It lets you disengage from “building cool stuff” and instead view it as “testing burden I’d rather not have”.

January 25, 2021

Patrick Louis (venam)

A Peek Into The Future Of Distros January 25, 2021 10:00 PM

The world description from an hermetic point of view was always convoluted and full of retrospective to enforce meaning

The year of the Linux desktop is coming, and it looks like a piñata of micro-services. Let’s break it and see the candies inside.

  • systemd core — the building block framework abstracting basic functionalities
  • A universal package manager with a single repository where anyone can push
  • flatpak/appimage/snap — containerized applications
  • polkit, desktop portal, and apparmor — granular security
  • systemd-homed — movable home with an immutable base system
  • wayland compositors — the graphical environment
  • pipewire — media pipeline

If you’re bored you can leave now; that’s it. Otherwise, let me get more ideas out, though nothing surprising to anyone who has been following RedHat and systemd.

The key innovation in the picture I lay out above is that every layer of the system is now isolated, and the layers communicate with one another through D-Bus services. D-Bus is fantastic in its own right and good at abstracting functionalities.

All of these pieces need one another. Wayland compositors need pipewire to access media hardware, and pipewire needs a polkit-like mechanism (D-Bus services such as policy managers and desktop portals) to decide who can do that. Similarly, containerized applications access the rest of the system through systemd services, polkit, portals, and pipewire.

What this leads to is a stable base system, a solid framework to build on. The home is transportable via systemd-homed, along with the containerized apps, proprietary or not.

The deep secret insider info I got tells me that this is a giant scheme to get us all into a vendor lock-in. I could be lying; this is the internet.
Or better (or worse?), this is a new Android ecosystem.

Who’s going to complain when this is all done in the open; when the open source movers are the authors of the de facto implementations; when they keep piling up standards until nobody can follow them?

I believe the direction the Linux desktop is going in will bring more incentives for investment. Distributions’ uniqueness won’t matter anymore, as they’ll be centralized around a “store” where devs will push directly. (see my previous post about distros’ roles)

Leave a comment if you weren’t aware of this vision of how distros will look in the future.

January 24, 2021

Derek Jones (derek-jones)

Payback time-frame for research in software engineering January 24, 2021 09:24 PM

What are the major questions in software engineering that researchers should be trying to answer?

A high level question whose answer is likely to involve life, the universe, and everything is: What is the most cost-effective way to build software systems?

Viewing software engineering research as an attempt to find the answer to a big question mirrors physicists’ quest for a Grand Unified Theory of how the Universe works.

Physicists have the luxury of studying the Universe at their own convenience, the Universe does not need their input to do a better job.

Software engineering is not like physics. Once a software system has been built, the resources have been invested, and there is no reason to recreate it using a more cost-effective approach (the zero cost of software duplication means that manufacturing cost is the cost of the first version).

Designing and researching new ways of building software systems may be great fun, but the time and money needed to run the realistic experiments needed to evaluate their effectiveness is such that they are unlikely to be run. Searching for more cost-effective software development techniques by paying to run the realistic experiments needed to evaluate them, and waiting for the results to become available, is going to be expensive and time-consuming. A theory is proposed, experiments are run, results are analysed; rinse and repeat until a good-enough cost-effective technique is found. One iteration will take many years, and this iterative process is likely to take many decades.

Very many software systems are being built and maintained, and each of these is an experiment. Data from these ‘experiments’ provides a cost-effective approach to improving existing software engineering practices by studying the existing practices to figure out how they work (or don’t work).

Given the volume of ongoing software development, most of the payback from any research investment is likely to occur in the near future, not decades from now; the evidence shows that source code has a short and lonely existence. Investing for a payback that might occur 30-years from now makes no sense; researchers I talk to often use this time-frame when I ask them about the benefits of their research, i.e., just before they are about to retire. Investing in software engineering research only makes economic sense when it is focused on questions that are expected to start providing payback in, say, 3-5 years.

Who is going to base their research on existing industry practices?

Researching existing practices often involves dealing with people issues, and many researchers in computing departments are not that interested in the people side of software engineering, or rather they are more interested in the computer side.

Algorithm-oriented is how I would describe researchers who claim to be studying software engineering. I am frequently told about the potential for huge benefits from the discovery of more efficient algorithms. For many applications, algorithms are now commodities, i.e., they are good enough. Those with a career commitment to studying algorithms have a blinkered view of the likely benefits of their work (most of those I have seen are studying incremental improvements, and are very unlikely to make a major breakthrough).

The number of researchers studying what professional developers do, with an aim to improving it, is very small (I am excluding the growing number of fake researchers doing surveys). While I hope there will be a significant growth in numbers, I’m not holding my breath (at least in the short term; as for the long term, Planck’s experience with quantum mechanics was: “Science advances one funeral at a time”).

Ponylang (SeanTAllen)

Last Week in Pony - January 24, 2021 January 24, 2021 04:18 PM

The Pony community is preparing to support Apple Silicon. Version 0.3.0 of ponylang/flycheck-pony has been released.

January 22, 2021

Jan van den Berg (j11g)

Merge two images in Windows from right-click context menu January 22, 2021 11:16 AM

  1. Download and install ImageMagick.
  2. Go to Windows Explorer and type sendto in the address bar. This will open the following path:


    The files here will be available as actions from the Windows “Send to” right-click context menu.
  3. Create a new (text) file in this directory and add the following line:

    magick.exe %1 %2 -resize "x%%[fx:max(u.h,v.h)]" +append -set filename: "COMBINED-%%k" "%%[filename:].jpg"

    This is essentially what will be executed when you right click two images anywhere and select the action.
    %1 and %2 are the two files
    – the resize parameters makes sure the two images line up correctly, from here
    +append means the images will be merged side by side — horizontally — as opposed to vertically
    filename uses %k to pass some random value to the generated new filename. Otherwise it would overwrite already existing files with the same name. By generating something unique this doesn’t happen. The word COMBINED is chosen by me, you can change this to whatever you like.

    This line contains doubled % signs; these are necessary when the command runs from a batch script. If you want to run it from the command line by hand, replace each double %% with a single %.
  4. Name this file Merge two images side by side.bat or any name you like, as long as it ends with .bat, so Windows knows to execute it.
  5. Done!
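For reference - assuming ImageMagick is on your PATH, and with placeholder file names - the same command run by hand from a terminal, with each doubled %% reduced to a single %, looks like:

```shell
magick.exe first.jpg second.jpg -resize "x%[fx:max(u.h,v.h)]" +append -set filename: "COMBINED-%k" "%[filename:].jpg"
```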

And the result looks like this. Click image or here.

The post Merge two images in Windows from right-click context menu appeared first on Jan van den Berg.

Marc Brooker (mjb)

The Fundamental Mechanism of Scaling January 22, 2021 12:00 AM

The Fundamental Mechanism of Scaling

It's not Paxos, unfortunately.

A common misconception among people picking up distributed systems is that replication and consensus protocols—Paxos, Raft, and friends—are the tools used to build the largest and most scalable systems. It's obviously true that these protocols are important building blocks. They're used to build systems that offer more availability, better durability, and stronger integrity than a single machine. At the most basic level, though, they don't make systems scale.

Instead, the fundamental approach used to scale distributed systems is avoiding coordination: finding ways to make progress on work that doesn't require messages to pass between machines, between clusters of machines, between datacenters, and so on. The fundamental tool of cloud scaling is coordination avoidance.

A Spectrum of Systems

With this in mind, we can build a kind of spectrum of the amount of coordination required in different system designs:

Coordinated These are the kinds of systems that use Paxos, Raft, chain replication or some other protocol to make a group of nodes work closely together. The amount of work done by the system generally scales with the offered work (W) and the number of nodes (N), something like O(N * W) (or, potentially, worse under some kinds of failures).

Data-dependent Coordination These systems break their workload up into uncoordinated pieces (like shards), but offer ways to coordinate across shards where needed. Probably the most common type of system in this category is sharded databases, which break data up into independent pieces, but then use some kind of coordination protocol (such as two-phase commit) to offer cross-shard transactions or queries. Work done can vary between O(W) and O(N * W) depending on access patterns, customer behavior and so on.

Leveraged Coordination These systems take a coordinated system and build a layer on top of it that can do many requests per unit of coordination. Generally, coordination is only needed to handle failures, scale up, redistribute data, or perform other similar management tasks. In the happy case, work done in these kinds of systems is O(W). In the bad case, where something about the work or environment forces coordination, they can change to O(N * W) (see Some risks of coordinating only sometimes for more). Despite this risk, this is a rightfully popular pattern for building scalable systems.

Uncoordinated These are the kinds of systems where work items can be handled independently, without any need for coordination. You might think of them as embarrassingly parallel, sharded, partitioned, geo-partitioned, or one of many other ways of breaking up work. Uncoordinated systems scale the best. Work is always O(W).
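As a minimal sketch of the uncoordinated end of the spectrum (the shard_for helper below is hypothetical, not from any particular system), hash-based routing lets every node compute the owner of a work item locally, so no messages need to pass between machines on the hot path:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Deterministically route a work item to one of n_shards shards.

    Every node computes the same answer from the key alone,
    so the hot path needs no coordination messages at all.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# With N shards each handling only its own keys, per-node work is O(W/N).
```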

This is only one cut through a complex space, and some systems don't quite fit¹. I think it's still useful, though, because by building a hierarchy of coordination we can think clearly about the places in our systems that scale the best and worst. The closer a system is to the uncoordinated end the better it will scale, in general.

Other useful tools

There are many other ways to approach this question of when coordination is necessary, and how that influences scale.

The CAP theorem², along with a rich tradition of other impossibility results³, places limits on the kinds of things systems can do (and, most importantly, the kinds of things they can offer to their clients) without needing coordination. If you want to get into the details there, the breakdown in Figure 2 of Highly Available Transactions: Virtues and Limitations is pretty clear. I like it because it shows us both what is possible, and what isn't.

The CALM theorem⁴ is very useful, because it provides a clear logical framework for whether particular programs can be run without coordination, and something of a path for constructing programs that are coordination free. If you're going to read just one distributed systems paper this year, you could do a lot worse than Keeping CALM.

Harvest and Yield is another way to approach the problem, by thinking about when systems can return partial results⁴. This is obviously a subtle topic, because the real question is when your clients and customers can accept partial results, and how confused they will be when they get them. At the extreme end, you start expecting clients to write code that can handle any subset of the full result set. Sometimes that's OK, sometimes it sends them down the same rabbit hole that CALM takes you down. Probably the hardest part for me is that partial-result systems are hard to test and operate, because there's a kind of mode switch between partial and complete results, and modes make life difficult. There's also the minor issue that there are 2^N subsets of results, and testing them all is often infeasible. In other words, this is a useful tool, but it's probably best not to expose your clients to the full madness it leads to.

Finally, we can think about the work that each node needs to do. In a coordinated system, there is generally one or more nodes that do O(W) work. In an uncoordinated system, the ideal node does O(W/N) work, which turns into O(1) work because N is proportional to W.


  1. Like systems that coordinate heavily on writes but mostly avoid coordination on reads. CRAQ is one such system, and a paper that helped me fall in love with distributed systems. So clever, and so simple once you understand it.
  2. Best described by Brewer and Lynch.
  3. See, for example, Nancy Lynch's 1989 paper A Hundred Impossibility Proofs for Distributed Computing. If there were a hundred of these in 1989, you can imagine how many there are now, 32 years later. Wow, 1989 was 32 years ago. Huh.
  4. I wrote a post about it back in 2014.

January 21, 2021

Sevan Janiyan (sevan)

Forcing an Xcode command line tools reinstall in order to update January 21, 2021 04:38 PM

Lost some time debugging a build issue which I was unable to reproduce. Turns out I was on an older version of clang despite both of us running the same version of macOS Catalina. Though you install the command line tools using xcode-select --install, there’s no way to force a reinstall with the tool as …

January 20, 2021

Gustaf Erikson (gerikson)

Our long international nightmare is over (for now) January 20, 2021 07:34 PM

For four years, I have managed to avoid hearing all but a few sentences spoken by the late unlamented 45th president of the United States. I don’t know if I’d have managed four more years.

Trump is of course just the symptom, not the disease, and he’s just the tip of an iceberg of stewing resentment and irrationality. I’m not convinced today’s and tomorrow’s leaders have what it takes to melt it. Perhaps it will all be boiled away like life itself when the planet heats up.

January 19, 2021

Kevin Burke (kb)

Decline the 15 Minute Post-Vaccination Waiting Period January 19, 2021 05:36 PM

In very rare cases, the Pfizer and Moderna vaccines will cause the person being vaccinated to have an allergic reaction. When I say very rare, I mean it; the chances are a few in a million, or about the same as picking a specific resident of Newark, New Jersey at random out of the phone book.

Because of this chance, the pharmacy/hospital/vaccination site will ask you to wait around for 15 minutes after getting the shot so they can check whether you have an allergic reaction. Most places (scroll through the list on Vaccinate CA) administering the shot are doing so indoors in windowless rooms. People - right now, seniors and others with high exposure to COVID - are being asked to wait for 15 minutes in crowded waiting rooms.

However - waiting in a cramped room indoors is exactly how COVID spreads! Sure, most people are probably wearing masks. But the new B.1.1.7 variant is more dangerous and, anecdotally, can get past a cloth mask much more easily. Right after getting the vaccine, but before it has kicked in, we are asking people to huddle in basically the ideal location for transmitting COVID, all to avoid a minuscule risk of an allergic reaction. Not only is this extremely dangerous but it's a huge waste of vaccine - if we know you are going to get COVID today, we shouldn't give you protection against COVID starting a week from now.

The risk of not spotting someone who has an allergic reaction must be weighed against the risk of transmitting COVID. Right now about 3% of the US population is infected with COVID. So about 1 in every 30 people in a vaccination waiting room (likely more, due to selection effects) will be infected with COVID and may transmit it to others. About 1-2% of people who get COVID will die from it, and more will be hospitalized. Contrast this with about a 1 in ~180,000 risk of an allergic reaction. It's just not comparable.

If you are offered the vaccine, and the waiting area is indoors, I would urge you to decline to wait. Explain that waiting indoors with other unvaccinated¹ people is not safe, and then wait outside. You can call someone on the phone for 15 minutes who can monitor you for side effects. Or, walk back in after 15 minutes, tell the pharmacist you are OK, and then leave.

You are not breaking the law by doing this and you are aware of the risks. The more people that choose to do this, the more pressure we can put on vaccinators and public health agencies to end this dangerous practice, and offer waiting areas that are well spaced outdoors.

So many people are dying every day and a vaccine is so close now that small changes will have a huge impact on the total number of people hospitalized and dead. Please share this post with someone you know who is getting vaccinated soon.

Thanks to Michael Story for bringing this issue up.

1. The vaccine takes a week to kick in so for all intents and purposes you are still unvaccinated minutes after you have received the shot.

Bogdan Popa (bogdan)

Running Racket CS on iOS January 19, 2021 08:25 AM

A couple of weeks ago, I started working on getting Racket CS to compile and run on iOS and, with a lot of guidance from Matthew Flatt, I managed to get it working (with some caveats). Those changes have now been merged, so I figured I’d write another one of these guides while the information is still fresh in my head.

January 18, 2021

Robin Schroer (sulami)

Traps to Avoid When Reviewing Code Changes January 18, 2021 12:00 AM

Reviewing code changes is an underappreciated art. It is part of most software engineers’ daily routine, but as an industry we do little towards developing it as a skill, even though it contributes directly to the quality of the software we produce.

The LGTM Trap

Characterised by the eponymous review comment, this trap can have different root causes, all of them resulting in the rubber-stamping of a bad pull request.

First of all, conducting proper code reviews is difficult and mentally exhausting. It is important to take breaks when conducting long or consecutive reviews. Resist the temptation to just “get something in” because it has been open for a while, or because someone else is blocked. Avoid including anything that you already know will need fixing up later; this road leads to broken windows. This also means your comment-change-test cycle should be as fast as possible to encourage fixing even the smallest issues before merging.

If a critical issue makes it past your review, you should investigate how it got missed, and what you can do to catch similar issues in the future. This way you can build your own checklist to use during reviews. You can also ask someone you trust to conduct a supplementary review, or even try pairing with them on some reviews.

The Human Linter Trap

Engineering time is expensive, and the focus required for good reviews is hard to maintain, so minimising the time required for a review is key. This is why we should avoid automatable tasks in code reviews. Prime examples include linting and enforcing a style guide. A pre-commit hook or a CI job can do either of these much more efficiently than a human reviewer ever could.

Beyond the efficiency gains, this also avoids the clutter resulting from many small comments pointing out typos, bad indentation, or non-idiomatic code, and lets both the reviewer and the author focus on more important issues.

The Implementation Trap

Tests are not only useful for ensuring the correctness of our code; they can also help us during review. Tests exercise the interface of our code, ideally without giving us too much of an idea of the implementation. As a general rule, you want to be reviewing changes from the outside in.

This forces you to actually understand the test code, assert that the checks performed actually match the contract you expect the code to abide by, and catch potential holes in test coverage. It also allows you to judge the interface provided, as awkward tests often hint at sub-optimally factored code.

The Iceberg Trap

Seven-eighths of an iceberg is famously below the water line and functionally invisible. Similarly, some of the most important parts to pay attention to during reviews are not visible in the diff. This can range from introducing some avoidable duplication, because the author was not aware of existing code with the same functionality, all the way to production outages, because a remote piece of code made an assumption that no longer holds.

It can be helpful to check out the change locally and look at it in the context of the entire code base instead of in isolation. Asking others familiar with the code base, or related ones, to have a cursory look can also uncover a wide range of problems quickly.

The Rube Goldberg Trap

Just because you can does not mean you should. And sometimes there is a better solution than the one proposed.

To review a change, it is important to agree on the problem to solve. Ask the author to supply a problem statement if one is not presented in the change. Only once you understand the problem statement can you evaluate the quality of a solution. The solution could be over- or under-engineered, implemented in the wrong place, or you could even disagree with the problem statement altogether.

January 17, 2021

Derek Jones (derek-jones)

Software effort estimation is mostly fake research January 17, 2021 08:53 PM

Effort estimation is an important component of any project, software or otherwise. While effort estimation is something that everybody in industry is involved with on a regular basis, it is a niche topic in software engineering research. The problem is researcher attitude (e.g., they are unwilling to venture into the wilds of industry), which has stopped them from acquiring the estimation data needed to build realistic models. A few intrepid people have risked an assault on their ego and talked to people in industry; the outcome has been, until very recently, a small collection of tiny estimation datasets.

In a research context the term effort estimation is actually a hangover from the 1970s; effort correction more accurately describes the behavior of most models since the 1990s. In the 1970s models took various quantities (e.g., estimated lines of code) and calculated an effort estimate. Later models have included an estimate as input to the model, producing a corrected estimate as output. For the sake of appearances I will use the existing terminology.

Which effort estimation datasets do researchers tend to use?

A 2012 review of datasets used for effort estimation using machine learning between 1991-2010 found that the top three were: Desharnais with 24 papers (29%), COCOMO with 19 papers (23%) and ISBSG with 17. A 2019 review of datasets used for effort estimation using machine learning between 1991 and 2017 found the top three to be NASA with 17 papers (23%), the COCOMO data and ISBSG joint second with 16 papers (21%), and Desharnais third with 14. The 2012 review included more sources in its search than the 2019 review, and subjectively your author has noticed a greater use of the NASA dataset over the last five years or so.

How large are these datasets that have attracted so many research papers?

The NASA dataset contains 93 rows (that is not a typo, there is no power-of-ten missing), COCOMO 63 rows, Desharnais 81 rows, and ISBSG is licensed by the International Software Benchmarking Standards Group (academics can apply for limited-time use for research purposes, i.e., without paying the $3,000 annual subscription). The China dataset contains 499 rows, and is sometimes used (there is no mention of a supercomputer being required for this amount of data ;-).

Why are researchers involved in software effort estimation feeding tiny datasets from the 1980s-1990s into machine learning algorithms?

Grant money. Research projects are more likely to be funded if they use a trendy technique, and for the last decade machine learning has been the trendiest technique in software engineering research. What data is available to learn from? Those estimation datasets that were flogged to death in the 1990s using non-machine learning techniques, e.g., regression.

Use of machine learning also has the advantage of not needing to know anything about the details of estimating software effort. Everything can be reduced to a discussion of the machine learning algorithms, with performance judged by a chosen error metric. Nobody actually looks at the predicted estimates to discover that the models are essentially producing the same answer, e.g., one learner predicts 43 months, 2 weeks, 4 days, 6 hours, 47 minutes and 11 seconds, while a ‘better’ fitting one predicts 43 months, 2 weeks, 2 days, 6 hours, 27 minutes and 51 seconds.
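To make the point about near-identical predictions concrete, here is a quick sketch (mine, not the post's) of MMRE, the mean magnitude of relative error, one of the commonly chosen error metrics in this literature. Two learners whose predictions differ by fractions of a month score essentially identically against actuals measured in months:

```python
def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


# Illustrative numbers: actual efforts of 40 and 50 months, and two models
# whose predictions differ only by days.
actuals = [40.0, 50.0]
print(round(mmre(actuals, [43.5, 48.0]), 5))
print(round(mmre(actuals, [43.4, 48.1]), 5))
```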

How many ways are there to do machine learning on datasets containing less than 100 rows?

A paper from 2012 evaluated the possibilities using 9 learners times 10 data-preprocessing options (e.g., log transform or discretization) times 7 error-estimation metrics, giving 630 possible final models; they picked the top 10 performers.

This 2012 study has not stopped researchers continuing to twiddle away on the knobs available to them; anything to keep the paper mills running.

To quote the authors of one review paper: “Unfortunately, we found that very few papers (including most of our own) paid any attention at all to properties of the data set.”

Agile techniques are widely used these days, and datasets from the 1990s are not applicable. What datasets do researchers use to build Agile effort estimation models?

A 2020 review of Agile development effort estimation found 73 papers. The most popular data set, containing 21 rows, was used by nine papers. Three papers used simulated data! At least some authors were going out and finding data, even if it contains fewer rows than the NASA dataset.

As researchers in business schools have shown, large datasets can be obtained from industry; ISBSG actively solicits data from industry and now has data on 9,500+ projects (as far as I can tell a small amount for each project, but that is still a lot of projects).

Are there any estimates on Github? Some Open source projects use JIRA, which includes support for making estimates. Some story point estimates can be found on Github, but the actuals are missing.

A handful of researchers have obtained and released estimation datasets containing thousands of rows, e.g., the SiP dataset contains 10,100 rows and the CESAW dataset contains over 40,000 rows. These datasets are generally ignored, perhaps because when presented with lots of real data researchers have no idea what to do with it.

Caius Durling (caius)

Cleaning BODUM Bistro Coffee Grinder January 17, 2021 11:41 AM

I’ve owned a BODUM Bistro Coffee Grinder for a number of years, and aside from occasionally running rice through to clean the grinding surfaces haven’t had any issues with it. Recently bought some new beans which are much oilier than ones I usually get, and after running most of them through ended up with the grinder failing to work.

The failure was the mechanism sounding like it was okay for roughly a second, then the motor straining under load before what sounded like plastic gears jumping teeth. At this point I turned it off. Running it with an empty hopper worked fine, adding anything (either beans or ground coffee) to the hopper caused the load issue. On the third attempt it also had stopped self-feeding from the hopper, and trying to gently push beans/grounds through caused the stoppage above.

David Hagman has previously torn down his grinder and posted a video on YouTube showing the internals. I couldn’t see any obvious part of the internals that would be related to the failure I was experiencing, so I decided to start with cleaning it out and then continue with a strip down if that didn’t reveal anything.

To start with, the hopper came off, then the top half of the burr grinder lifts out vertically, leaving a worm screw standing proud. I first started gently tapping the grinder unit upside down to free any stuck coffee, then escalated to a small bottle brush. There were still enough grounds stuck around the base of the screw mechanism that I couldn’t reach with a narrow brush, so I switched to a bamboo barbecue skewer to loosen grounds and then tip them out.

After clearing out most of the base of the worm gear, whilst I had the unit upside down tapping out the loosened grounds, I looked up the chute that the grounds normally fall down into the jar, and found it blocked solid with ground coffee. Some gentle rodding with the skewer to break it up and eventually I could see from the grinder mechanism through to the end of the chute.

Once that was clear and everything removable had been thoroughly cleaned and dried, I reassembled and ran rice through it a few times starting with a really coarse grind and fed the result back through the hopper, getting finer each grind. Grinder now works flawlessly, and I guess lesson learned about checking the chute to make sure it’s clear more frequently.

January 15, 2021

Gonçalo Valério (dethos)

Django Friday Tips: Permissions in the Admin January 15, 2021 05:55 PM

In this year’s first issue of my irregular Django quick tips series, let’s look at the built-in tools available for managing access control.

The framework offers a comprehensive authentication and authorization system that is able to handle the common requirements of most websites without even needing any external library.

Most of the time, simple websites only make use of the “authentication” features, such as registration, login and logout. On more complex systems, merely authenticating the users is not enough, since different users or even groups of users will have access to distinct sets of features and data records.

This is when the “authorization” / access control features come in handy. As you will see, they are very simple to use once you understand the implementation and concepts behind them. Today I’m gonna focus on how to use these permissions in the Admin; perhaps in a future post I can address the usage of permissions in other situations. In any case Django has excellent documentation, so a quick visit to this page will tell you what you need to know.

Under the hood

Simplified Entity-Relationship diagram of Django’s authentication and authorization features (ER diagram of Django’s “auth” package)

The above picture is a quick illustration of how this feature is laid out in the database. A User can belong to multiple groups and have multiple permissions, and each Group can also have multiple permissions. So a user has a given permission if it is directly associated with them, or if it is associated with a group the user belongs to.
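That resolution rule can be sketched in a few lines of plain Python (illustrative stand-in classes, not Django’s actual models; Django’s ModelBackend implements the real thing):

```python
# Sketch of Django's permission resolution: a user's effective permissions
# are their own plus those of every group they belong to.
# These classes are illustrative stand-ins, not Django's actual models.
class Group:
    def __init__(self, permissions):
        self.permissions = set(permissions)


class User:
    def __init__(self, permissions=(), groups=()):
        self.permissions = set(permissions)
        self.groups = list(groups)

    def has_perm(self, perm):
        effective = set(self.permissions)
        for group in self.groups:
            effective |= group.permissions
        return perm in effective


editors = Group({"blog.view_article", "blog.change_article"})
user = User(groups=[editors])
print(user.has_perm("blog.change_article"))  # True
print(user.has_perm("blog.delete_article"))  # False
```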

When a new model is added, 4 permissions are created for that particular model; later, if we need more, we can manually add them. Those permissions are <app>.add_<model>, <app>.view_<model>, <app>.change_<model> and <app>.delete_<model>.

For demonstration purposes I will start with these to show how the admin behaves and then show how to implement an action that’s only executed if the user has the right permission.

The scenario

Let’s imagine we have a “store” with some items being sold, and it also has a blog to announce new products and promotions. Here’s what the admin looks like for the “superuser”:

Admin view, with all models being displayed: the admin showing all the available models

We have several models for the described functionality and on the right you can see that I added a test user. At the start, this test user is just marked as regular “staff” (is_staff=True), without any permissions. For him the admin looks like this:

A view of the Django admin without any model listed: no permissions

After logging in, he can’t do anything. The store manager needs the test user to be able to view and edit articles on the blog. Since we expect that in the future multiple users will be able to do this, instead of assigning these permissions directly, let’s create a group called “editors” and assign those permissions to that group.

Only two permissions for this group of users

Afterwards we also add the test user to that group (in the user details page). Then when he checks the admin he can see and edit the articles as desired, but not add or delete them.

Screenshot of the Django admin, from the perspective of a user with only view and change permissions. No “Add” button there

The actions

Down the line, the test user starts doing other kinds of tasks, one of them being “reviewing the orders and then, if everything is correct, marking them as ready for shipment”. In this case, we don’t want him to be able to edit the order details or change anything else, so the existing “change” permission cannot be used.

What we need now is to create a custom admin action and a new permission that will let specific users (or groups) execute that action. Let’s start with the latter:

class Order(models.Model):
    class Meta:
        permissions = [("set_order_ready", "Can mark the order as ready for shipment")]

What we are doing above is telling Django that there is one more permission that should be created for this model, a permission that we will use ourselves.

Once this is done (you need to generate and run a migration), we can create the action and ensure that the user executing it has the newly created permission:

class OrderAdmin(admin.ModelAdmin):
    actions = ["mark_as_ready"]

    def mark_as_ready(self, request, queryset):
        if request.user.has_perm("shop.set_order_ready"):
            queryset.update(ready=True)  # illustrative field; use your own status field
            self.message_user(
                request, "Selected orders marked as ready", messages.SUCCESS
            )
        else:
            self.message_user(
                request, "You are not allowed to execute this action", messages.ERROR
            )

    mark_as_ready.short_description = "Mark selected orders as ready"

As you can see, we first check that the user has the right permission, using has_perm and the newly defined permission name, before proceeding with the changes.

And boom… now we have this new feature that only lets certain users mark the orders as ready for shipment. If we try to execute this action with the test user (who does not yet have the required permission):

No permission assigned, no action for you sir

Finally we just add the permission to the user and it’s done. For today this is it, I hope you find it useful.

eta (eta)

Setting the tone in a group is very important January 15, 2021 12:00 AM

Various people have various different opinions on how they should present themselves. Personally, I’d consider myself quite a sensitive person; I’d like to think I try quite hard to take the feelings and circumstances of other people into account when talking to them, although doing so is by no means easy or automatic (and I often fail at this, sometimes quite badly). Partially, this is because I often have a lot of feelings and circumstances going on in my own life which I’d like other people to attempt to take into account – in fact, especially nowadays, I’d argue that the overwhelming majority of people have some topics or areas that make them uncomfortable, or that would make it possible to upset them with a correctly targeted remark.

It’s very hard to judge what might upset or offend someone, simply because there can be a lot of stuff going on behind the scenes that people just don’t tell others about (it’s almost as if having an entire country where not talking about stuff is the norm could lead to significant problems!). That said, you can at least attempt to be reactive – it’s sometimes possible to detect that something you’ve said or done wasn’t received well, and try to do less of that thing in future.

There are, of course, cultures and groups where this sort of thing is very much not the norm. Some people – and groups of people – attempt to act in a sort of “macho” / “tough guy” sort of way, where one is supposed to pretend that one doesn’t really have feelings, or that one is immune to whatever life might throw in one’s way. This, of course, is obviously false – everyone has feelings! – but it suits them to conduct themselves in this manner, because, at the end of the day, talking about one’s feelings can be very hard.

Maybe you weren’t brought up in an environment where people do that; maybe you never saw your parents, or your school friends, cry, or be angry, or show feelings toward things, because they thought they had to be ‘strong’. Maybe you had people express too much of their feelings in your past and really got put off by it (as in, maybe you were distressed by other people having issues and have now decided that ‘burdening’ others with your feelings isn’t a good idea). Maybe you don’t have any friends you really trust enough to be able to confide in, perhaps because they’re all tough guys, or perhaps because you have issues trusting people for some other reason – especially in our modern society, that reason can often be loneliness, an epidemic whose severity people don’t actually realise.

But anyway, you don’t want to talk about your feelings. I get that, because I don’t really want to either1. Being in that position, while it’s not really a good thing for you long-term, isn’t really wrong; you aren’t hurting anyone except yourself there (although if you read the previous paragraph and found it hit a bit too close to home, you should probably watch this video).

It’s not uncommon for large-ish (i.e. more than 3 or 4) groups of people to have leaders, whether explicitly or implicitly allocated. Usually there are one or two people who do a lot of the talking – who appear to set the tone and the rules of engagement for the rest of the group. This doesn’t always have to happen, of course, but what I’m saying is it probably happens more than you realise. (It can also become painfully obvious when the leaders step out to go and do something else for a bit and you end up with a bunch of people who don’t really know what they should be doing with themselves, which is always a fun scenario!)

As you probably gathered from the title, what I really want to emphasise is the whole “setting the tone” aspect of being a leader. This turns out to be important in a bunch of ways that aren’t immediately obvious. I’d hypothesize that a large part of the issues people who aren’t cishet white males face in STEM fields, and especially programming / IT, are down to this factor; the groups people tend to hang out in are implicitly using a bunch of norms that probably aren’t very inclusive (think people making slightly inappropriate cheeky comments about women they fancy, but also more subtle mannerisms and ways of communicating that tend to only be shared by people from a certain background that make it harder for people not from that background to communicate). A lot of ink has been spilled about this2 in terms of how companies should try to avoid it when creating teams at work, because it’s a real problem.

The macho people from earlier can face real issues with this sort of thing. Often, their tough-guy behaviour implicitly sets the tone for the groups of people they find themselves in (which usually end up being filled with people who don’t really care, or are also trying to be macho). To avoid opening themselves up to the potential of insecurity, they often tend to do this more forcefully – terms like “pussy”, “wimp”, et al. are often employed for this purpose, wherein such people attempt to claim that people who do have feelings, are afraid of things, etc. are somehow ‘weaker’ than them.

The astute reader will think that I’m writing this because someone did that to me and I’m angry about it, and this blog post is my way of rationalizing their behaviour and asserting that I’m actually a better person than them. And they’d be partially right!3

It’s more nuanced than that, however. The thing that’s actually really sad about these sorts of situations is that the people responsible for creating the harmful no-feelings-allowed environment are often the people most in need of a way to express their feelings (as implied earlier). And what they’ve managed to do by creating such an environment is ensure they most likely won’t be able to do that thing with those people – if they try, it could get awkward (since the others aren’t really happy having a more ‘deep’ conversation, and that’s why they’re in the group), or they might find themselves met with a surprising lack of sympathy (because others actually did have problems and got humiliated for them).

I don’t even think you can blame these people, either. They’ve just found themselves in a situation that most likely isn’t even their fault, and they don’t really know what they should do to cope with it. If anything, it’s probably society that teaches them to behave in this way – and that’s just a sad, sad situation that’s not exactly easy to fix4.

  1. No, it suits me to write pseudo-intellectual blog posts that nobody reads that vaguely hint at a whole bunch of screwed up stuff going on. 

  2. Like a lot of the claims on this blog, this one is unsubstantiated. I think it’s true though… 

  3. I mean I really don’t post often, so someone has to have annoyed me for things to get this bad. 

  4. That said, I’ve seen advertising campaigns that try! I think some biscuit company teamed up with some mental health charity to promote the idea of having a cuppa and a chat about your problems with your friends, which is absolutely a good thing (even if Big Biscuit ends up profiting from it) 

January 14, 2021

Gustaf Erikson (gerikson)

719: Number splitting January 14, 2021 06:13 PM

I’m a bit fired up after Advent of Code, so thought I’d start by picking off some “easy” recent PE problems.

My first brute-force attempt found a solution after 4.5 hours, but access to the forum showed much smarter solutions so that’s what I’ve added to the repo.

Maxwell Bernstein (tekknolagi)

Inline caching January 14, 2021 12:00 AM

Inline caching is a popular technique for runtime optimization. It was first introduced in 1984 in Deutsch & Schiffman’s paper Efficient Implementation of the Smalltalk-80 System [PDF], but has had a long-lasting legacy in today’s dynamic language implementations. Runtimes like the HotSpot JVM, V8, and SpiderMonkey use it to improve the performance of code written for those virtual machines.

In this blog post, I will attempt to distill the essence of inline caching using a small and relatively useless bytecode interpreter built solely for this blog post. The caching strategy in this demo is a technique similar to the ideas from Inline Caching meets Quickening [PDF] in that it caches function pointers instead of making use of a JIT compiler.

In order to make the most of this post, I recommend having some background on building bytecode virtual machines. It is by no means necessary, but will make some of the new stuff easier to absorb.


In many compiled programming languages like C and C++, types and attribute locations are known at compile time. This makes code like the following fast:

#include "foo.h"

Foo do_add(Foo left, Foo right) {
  return left.add(right);
}

The compiler knows precisely what type left and right are (it’s Foo) and also where the method add is in the executable. If the implementation is in the header file, it may even be inlined and do_add may be optimized to a single instruction. Check out the assembly from objdump:

0000000000401160 <_Z6do_add3FooS_>:
  401160:	48 83 ec 18          	sub    $0x18,%rsp
  401164:	89 7c 24 0c          	mov    %edi,0xc(%rsp)
  401168:	48 8d 7c 24 0c       	lea    0xc(%rsp),%rdi
  40116d:	e8 0e 00 00 00       	callq  401180 <_ZN3Foo3addES_>
  401172:	48 83 c4 18          	add    $0x18,%rsp
  401176:	c3                   	retq

All it does is save the parameters to the stack, call Foo::add, and then restore the stack.

In more dynamic programming languages, it is often impossible to determine before runtime what type any given variable binding has. We’ll use Python as an example to illustrate how dynamism makes this tricky, but this constraint is broadly applicable to Ruby, JavaScript, etc.

Consider the following Python snippet:

def do_add(left, right):
    return left.add(right)

Due to Python’s various dynamic features, the compiler cannot in general know what type left is, and therefore what code to run when reading left.add. This program will be compiled down to a couple of Python bytecode instructions that do a very generic LOAD_METHOD/CALL_METHOD operation:

>>> import dis
>>> dis.dis("""
... def do_add(left, right):
...     return left.add(right)
... """)
Disassembly of <code object do_add at 0x7f0b40cf49d0, file "<dis>", line 2>:
  3           0 LOAD_FAST                0 (left)
              2 LOAD_METHOD              0 (add)
              4 LOAD_FAST                1 (right)
              6 CALL_METHOD              1
              8 RETURN_VALUE


This LOAD_METHOD Python bytecode instruction is unlike the x86 mov instruction in that LOAD_METHOD is not given an offset into left, but instead is given the name "add". It has to go and figure out how to read add from left’s type — which could change from call to call.
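Conceptually, every execution of LOAD_METHOD has to do something like the following sketch (a simplification of mine; CPython’s real lookup also walks the MRO and handles descriptors and instance attributes):

```python
# A sketch of what LOAD_METHOD must do on every execution: resolve the
# attribute *name* on the receiver's type. `Foo` here is a stand-in class.
class Foo:
    def add(self, other):
        return "Foo.add called"


def load_method(receiver, name):
    # The name is a string, and the receiver's type can differ on every
    # call, so the lookup has to happen again each time.
    return getattr(type(receiver), name)


left = Foo()
method = load_method(left, "add")
print(method(left, object()))  # Foo.add called
```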

In fact, even if the parameters were typed (which is a new feature in Python 3), the same code would be generated. Writing left: Foo means that left is a Foo or a subclass.

This is not a simple process like “call the function at the given address specified by the type”. The runtime has to find out what kind of object add is. Maybe it’s just a function, or maybe it’s a property, or maybe it’s some custom descriptor protocol thing. There’s no way to just turn this into a call instruction!

… or is there?

Runtime type information

Though dynamic runtimes do not know ahead of time what types variables have at any given opcode, they do eventually find out when the code is run. The first time someone calls do_add, LOAD_METHOD will go and look up the type of left. It will use it to look up the attribute add and then throw the type information away. But the second time someone calls do_add, the same thing will happen. Why don’t runtimes store this information about the type and the method and save the lookup work?

The thinking is “well, left could be any type of object — best not make any assumptions about it.” While this is technically true, Deutsch & Schiffman find that “at a given point in code, the receiver is often the same class as the receiver at the same point when the code was last executed”.

Note: By receiver, they mean the thing from which the attribute is being loaded. This is some Object-Oriented Programming terminology.

This is huge. This means that, even in this sea of dynamic behavior, humans actually are not all that creative and tend to write functions that see only a handful of types at a given location.

The Smalltalk-80 paper describes a runtime that takes advantage of this by adding “inline caches” to functions. These inline caches keep track of variable types seen at each point in the code, so that the runtime can make optimization decisions with that information.

Let’s take a look at how this could work in practice.

A small example

I put together a small stack machine with only a few operations. There are very minimal features to avoid distracting from the main focus: inline caching. Extending this example would be an excellent exercise.

Objects and types

The design of this runtime involves two types of objects (ints and strs). Objects are implemented as a tagged union, but for the purposes of this blog post the representation does not matter very much.

typedef enum {
  kInt,
  kStr,
} ObjectType;

typedef struct {
  ObjectType type;
  union {
    const char* str_value;
    int int_value;
  };
} Object;

These types have methods on them, such as add and print. Method names are represented with an enum (Symbol) though strings would work just as well.

typedef enum {
  kAdd,
  kPrint,
} Symbol;

The representation of type information isn’t super important. Just know that there is a function called lookup_method and that it is very slow. Eventually we’ll want to cache its result.

Method lookup_method(ObjectType type, Symbol name);

Let’s see how we use lookup_method in the interpreter.


This interpreter provides no way to look up (LOAD_METHOD) or call (CALL_METHOD) the methods directly. For the purposes of this demo, the only way to call these methods is through purpose-built opcodes. For example, the opcode ADD takes two arguments. It looks up kAdd on the left hand side and calls it. PRINT is similar.

There are only two other opcodes, ARG and HALT.

typedef enum {
  // Load a value from the arguments array at index `arg'.
  ARG,
  // Add stack[-2] + stack[-1].
  ADD,
  // Pop the top of the stack and print it.
  PRINT,
  // Halt the machine.
  HALT,
} Opcode;

Bytecode is represented by a series of opcode/argument pairs, each taking up one byte. Only ARG needs an argument; the other instructions ignore theirs.

Let’s look at a sample program.

byte bytecode[] = {/*0:*/ ARG,   0,
                   /*2:*/ ARG,   1,
                   /*4:*/ ADD,   0,
                   /*6:*/ PRINT, 0,
                   /*8:*/ HALT,  0};

This program takes its two arguments, adds them together, prints the result, and then halts the interpreter.
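Before looking at the C interpreter, here is a rough Python sketch (my illustration, not part of the original runtime) of what executing this program involves; the dict stands in for the slow lookup_method:

```python
# A Python sketch of the sample program's execution. It mirrors the C
# machine's structure; the METHODS dict stands in for lookup_method.
ARG, ADD, PRINT, HALT = range(4)

METHODS = {
    (int, "add"): lambda a, b: a + b,
    (str, "add"): lambda a, b: a + b,
}


def eval_code_uncached(bytecode, args):
    stack, pc, output = [], 0, []
    while True:
        op, arg = bytecode[pc], bytecode[pc + 1]
        if op == ARG:
            stack.append(args[arg])
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            method = METHODS[(type(left), "add")]  # looked up on every ADD
            stack.append(method(left, right))
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        pc += 2


program = [ARG, 0, ARG, 1, ADD, 0, PRINT, 0, HALT, 0]
print(eval_code_uncached(program, [40, 2]))  # [42]
```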

You may wonder, “how is it that there is an instruction for loading arguments but no call instruction?” Well, the interpreter does not support calls. There is only a top-level function, eval_code_uncached. It takes an object, evaluates its bytecode with the given arguments, and returns. Extending the interpreter to support function calls would be another good exercise.

The interpreter implementation is a fairly straightforward switch statement. Notice that it takes a representation of Frame, which holds all the state, like the pc and the stack. It contains a function-like thing (Code) and an array of arguments. nargs is only used for bounds checking.

I am omitting some of its helper functions (init_frame, push, pop, peek) for brevity’s sake, but they do nothing tricky. Feel free to look in the repo for their definitions.

typedef unsigned char byte;

#define STACK_SIZE 100

typedef struct {
  Object* stack_array[STACK_SIZE];
  Object** stack;
  Code* code;
  word pc;
  Object** args;
  word nargs;
} Frame;

void eval_code_uncached(Frame* frame) {
  Code* code = frame->code;
  while (true) {
    Opcode op = code->bytecode[frame->pc];
    byte arg = code->bytecode[frame->pc + 1];
    switch (op) {
      case ARG:
        CHECK(arg < frame->nargs && "out of bounds arg");
        push(frame, frame->args[arg]);
        break;
      case ADD: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        Method method = lookup_method(object_type(left), kAdd);
        Object* result = (*method)(left, right);
        push(frame, result);
        break;
      }
      case PRINT: {
        Object* obj = pop(frame);
        Method method = lookup_method(object_type(obj), kPrint);
        (*method)(obj, NULL);
        break;
      }
      case HALT:
        return;
      default:
        fprintf(stderr, "unknown opcode %d\n", op);
        abort();
    }
    frame->pc += kBytecodeSize;
  }
}

Both ADD and PRINT make use of lookup_method to find out what function pointer corresponds to the given (type, symbol) pair. Both opcodes throw away the result. How sad. Let’s figure out how to save some of that data.

Inline caching strategy

Since the Smalltalk-80 paper tells us that the receiver type is unlikely to change from call to call at a given point in the bytecode, let’s cache one method address per opcode. As with any cache, we’ll have to store both a key (the object type) and a value (the method address).

There are several states that the cache could be in when entering an opcode:

  1. If it is empty, look up the method and store it in the cache using the current type as a cache key. Use the cached value.
  2. If it has an entry and the entry is for the current type, use the cached value.
  3. Last, if it has an entry and the entry is for a different type, flush the cache. Repeat the same steps as in the empty case.

This is a simple monomorphic (one element) implementation that should give us most of the performance. A good exercise would be to extend this cache system to be polymorphic (multiple elements) if the interpreter sees many types. For that you will want to check out Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches by Hölzle, Chambers, and Ungar.
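As a hedged sketch of what the polymorphic extension might look like (the names PolyCache, kCacheEntries, cache_insert, and the stand-in Method type are mine, not from this runtime): widen each cache from one (key, value) pair to a small fixed array of entries, filled in first-come order.

```c
#include <stddef.h>

// Illustrative polymorphic inline cache: a small fixed array of
// (type, method) entries per call site instead of a single pair.
typedef enum { kIntType, kStrType } ObjectType;
typedef int (*Method)(int, int);  // stand-in for the post's Method type

enum { kCacheEntries = 4 };  // real PICs keep this small, e.g. 4-8 entries

typedef struct {
  ObjectType key;
  Method value;  // NULL marks an empty slot
} CacheEntry;

typedef struct {
  CacheEntry entries[kCacheEntries];
} PolyCache;

Method cache_lookup(PolyCache* cache, ObjectType type) {
  for (int i = 0; i < kCacheEntries; i++) {
    if (cache->entries[i].value == NULL) return NULL;  // not cached yet
    if (cache->entries[i].key == type) return cache->entries[i].value;
  }
  return NULL;  // every slot holds another type: the site is megamorphic
}

void cache_insert(PolyCache* cache, ObjectType key, Method value) {
  for (int i = 0; i < kCacheEntries; i++) {
    if (cache->entries[i].value == NULL) {
      cache->entries[i].key = key;
      cache->entries[i].value = value;
      return;
    }
  }
  // All slots full: a real runtime might evict an entry, or mark the
  // call site megamorphic and fall back to lookup_method every time.
}
```

The monomorphic cache in the rest of this post is just the kCacheEntries == 1 case of this idea.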

For the purposes of this inline caching demo, we will focus on caching lookups in ADD. This is a fairly arbitrary choice in our simple runtime, since the caching implementation will not differ between opcodes.

Note: The types in this demo code are immutable. Some programming languages (Python, Ruby, etc) allow for types to be changed after they are created. This requires careful cache invalidation. We will not address cache invalidation in this post.

Inline caching implementation

Let’s store the caches on the Code struct. If we have one element per opcode, that looks like:

typedef struct {
  // Array of `num_opcodes' (op, arg) pairs (total size `num_opcodes' * 2).
  byte* bytecode;
  word num_opcodes;
  // Array of `num_opcodes' elements.
  CachedValue* caches;
} Code;

where CachedValue is a key/value pair of object type and method address:

typedef struct {
  ObjectType key;
  Method value;
} CachedValue;

We have some helpers, cache_at and cache_at_put, to manipulate the caches.

CachedValue cache_at(Frame* frame) {
  return frame->code->caches[frame->pc / kBytecodeSize];
}

void cache_at_put(Frame* frame, ObjectType key, Method value) {
  frame->code->caches[frame->pc / kBytecodeSize] =
      (CachedValue){.key = key, .value = value};
}
These functions are fairly straightforward given our assumption of a cache present for every opcode.

Let’s see what changed in the ADD opcode.

void add_update_cache(Frame* frame, Object* left, Object* right) {
  Method method = lookup_method(object_type(left), kAdd);
  cache_at_put(frame, object_type(left), method);
  Object* result = (*method)(left, right);
  push(frame, result);
}

void eval_code_cached(Frame* frame) {
  // ...
  while (true) {
    // ...
    switch (op) {
      // ...
      case ADD: {
        Object* right = pop(frame);
        Object* left = pop(frame);
        CachedValue cached = cache_at(frame);
        Method method = cached.value;
        if (method == NULL || cached.key != object_type(left)) {
          add_update_cache(frame, left, right);
          break;
        }
        Object* result = (*method)(left, right);
        push(frame, result);
        break;
      }
      // ...
    }
    frame->pc += kBytecodeSize;
  }
}

Now instead of always calling lookup_method, we do two quick checks first. If we have a cached value and it matches, we use that instead. So not much changed, really, except for the reading and writing to code->caches.

If we don’t have a cached value, we call lookup_method and store the result in the cache. Then we call it. I pulled this slow-path code into the function add_update_cache.

Run a test program and see results

Let’s put it all together for some satisfying results. We can use the sample program from earlier that adds its two arguments.

We’ll call it four times. The first time we will call it with integer arguments, and it will cache the integer method. The second time, it will use the cached integer method. The third time, we will call it with string arguments and it will cache the string method. The fourth time, it will use the cached string method.

int main() {
  byte bytecode[] = {/*0:*/ ARG,   0,
                     /*2:*/ ARG,   1,
                     /*4:*/ ADD,   0,
                     /*6:*/ PRINT, 0,
                     /*8:*/ HALT,  0};
  Object* int_args[] = {
      new_int(5),
      new_int(10),
  };
  void (*eval)(Frame*) = eval_code_cached;
  Frame frame;
  Code code = new_code(bytecode, sizeof bytecode / kBytecodeSize);
  init_frame(&frame, &code, int_args, ARRAYSIZE(int_args));
  eval(&frame);
  init_frame(&frame, &code, int_args, ARRAYSIZE(int_args));
  eval(&frame);
  Object* str_args[] = {
      new_str("hello "),
      new_str("world"),
  };
  init_frame(&frame, &code, str_args, ARRAYSIZE(str_args));
  eval(&frame);
  init_frame(&frame, &code, str_args, ARRAYSIZE(str_args));
  eval(&frame);
  return 0;
}

And if we run that, we see:

laurel% ./a.out
int: 15
int: 15
str: "hello world"
str: "hello world"

Which superficially seems like it’s working, at least. 5 + 10 == 15 and "hello " + "world" == "hello world" after all.

To get an insight into the behavior of the caching system, I added some logging statements. This will help convince us that the cache code does the right thing.

laurel% ./a.out
updating cache at 4
int: 15
using cached value at 4
int: 15
updating cache at 4
str: "hello world"
using cached value at 4
str: "hello world"

Hey-ho, looks like it worked.

Performance analysis

Most posts like this end with some kind of performance analysis of the two strategies proposed. Perhaps the author will do some kind of rigorous statistical analysis of the performance of the code before & after. Perhaps the author will run the sample code in a loop 1,000 times and use time to measure. Most authors do something.

Reader, I will not be doing a performance analysis in this post. This tiny interpreter has little to no resemblance to real-world runtimes and this tiny program has little to no resemblance to real-world workloads. I will, however, give you some food for thought:

If you were building a runtime, how would you know inline caching would help you? Profile your code. Use the tools available to you, like Callgrind and Perf. If you see your runtime’s equivalent of lookup_method show up in the profiles, consider that you may want a caching strategy. lookup_method may not show up in all of your profiles. Some benchmarks will have very different workloads than other benchmarks.

How would you measure the impact of inline caching, once added? Profile your code. Did the percent CPU time of lookup_method decrease? What about overall runtime? It’s entirely possible that your benchmark slowed down. This could be an indicator of polymorphic call sites — which would lead to a lot of overhead from cache eviction. In that case, you may want to add a polymorphic inline cache. printf-style logging can help a lot in understanding the characteristics of your benchmarks.

No matter how you measure impact, it is not enough to run the ADD handler in a tight loop and call it a benchmark. The real-life workload for your runtime will undoubtedly look very different. What percent of opcodes executed is ADD? Ten percent? Five percent? You will be limited by Amdahl’s Law. That will be your upper bound on performance improvements.
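The Amdahl's Law bound is easy to make concrete. The numbers below are made up purely for illustration: if ADD accounts for some fraction of total runtime and caching makes ADD some factor faster, the whole-program speedup is capped.

```c
// Back-of-the-envelope Amdahl's Law with made-up numbers: if ADD accounts
// for `fraction` of total runtime and caching makes ADD `speedup` times
// faster, the overall speedup is 1 / ((1 - fraction) + fraction / speedup).
double amdahl_speedup(double fraction, double speedup) {
  return 1.0 / ((1.0 - fraction) + fraction / speedup);
}
```

If ADD is 10% of execution time and caching halves its cost, amdahl_speedup(0.10, 2.0) is about 1.053: the program as a whole gets barely 5% faster. Even an infinitely fast ADD caps out around 1.11x.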

What other performance considerations might you have? Consider your memory constraints. Perhaps you are on a system where memory is a concern. Adding inline caches will require additional memory. This might cause swapping, if it's enabled.

So without benchmarks, how do you know this is even faster? Don’t take my word for it. Benchmark your runtime. Take a look at the Smalltalk-80 and Brunthaler papers linked. Take a look at the JVM, V8, MRI, Dart, and other runtimes. They all seem to have found inline caching helpful.


Inline caches can be a good way to speed up your bytecode interpreter. I hope this post helped you understand inline caches. Please write me if it did not.

Exploring further

There are a number of improvements that could be made to this very simple demo. I will list some of them below:

  • Rewrite generic opcodes like ADD to type-specialized opcodes like ADD_INT. These opcodes will still have to check the types passed in, but can make use of a direct call instruction or even inline the specialized implementation. I wrote a follow-up post about it!
  • Not all opcodes require caches, yet we allocate for them anyway. How can we eliminate these wasted cache slots?
  • Actually store caches inline in the bytecode, instead of off in a side table.
  • Instead of storing cached function pointers, build up a sort of “linked list” of assembly stubs that handle different type cases. This is also mentioned in the Deutsch & Schiffman paper. See for example this excellent post by Matthew Gaudet that explains it with pseudocode.
  • Figure out what happens if the runtime allows for mutable types. How will you invalidate all the relevant caches?

Maybe I will even write about them in the future.

January 11, 2021

Jan van den Berg (j11g)

Freedom of speech is not freedom of reach January 11, 2021 05:37 PM

The man with access to the nuclear launch codes has been deemed unfit for Twitter. And the country that doesn’t believe universal healthcare is a human right, all of a sudden believes access to Twitter should be an inalienable right. Interesting times!

This week more Americans died from Covid than on 9/11, the Iraq war and the Afghanistan war combined, and that fact isn’t even in the top 10 news stories right now.

The news is dominated by the insurrection and the president’s direct incitement of it. And the subsequent (social) media bans that followed. And most notably his account suspension on Twitter.

Most news seems to be focused on the Twitter ban — his preferred outlet — and this has made a lot of people angry, specifically the ones supposedly being silenced. Which is strange: because I don’t know why GOP politicians are upset about the president losing his Twitter account. They’ve never seen any of his tweets anyway–at least, that’s what they told reporters every time they were asked, right before they ran away.

But it’s not just Twitter. Also Facebook, Google, Apple and Amazon are banning the president, and other apps are pulled and online stores are closed etc. This has people arguing that their first amendment right is violated. In doing so they fail to understand that the first amendment is not there to protect the president from his people, but to protect the people from the president.

The relation the USA has to freedom of speech is uniquely American and the complicated nuances and general differences between tolerance and freedom of speech, are paradoxical. But nonetheless: freedom of speech does not entail freedom of reach.

Just like the second amendment was written with slow loading muskets in mind — one round every few minutes — it is now abused to argue the right to automatic weapons that fire hundreds of rounds per minute. And the same is true for the first amendment, which was developed to — maybe — reach 40-50 people by standing on a scaffold in a park, and is now being abused to argue the right to broadcast opinions to 80 million people. These are clearly different things.


Over the last few days I have heard several arguments against or for banning. Let’s look at some of them.

  • They waited until the democrats had the majority, weak!

Well, it seems the platforms waited until the election was officially called and the electoral college had spoken. Imagine what would have happened if they blocked the president before it? That would have been a much more impactful decision. And then you could have really argued they influenced elections. Now the platforms at least think this argument can never be thrown at them. I take it a lot of lawyers have looked at the timing of this decision.

There have been plenty of reasons already to block the president, citing Terms of Service violations. But if you don’t do it out of the gate (i.e. four years ago) it becomes increasingly more difficult to pick a good time. So we had to watch this whole thing escalate steadily for four years.

  • Twitter silenced the president!

Well, it is the president himself who chose to make Twitter his default media outlet. The person with access to every news channel and newspaper in the world chose Twitter, the outrage amplifier, as his biggest news outlet and contact with the people.

Sure Twitter has silenced him, but he still has plenty of other ways to reach people. This proves however that Twitter is not a right, it’s a privilege and it has rules.

This being said, other well known dictators still do have a Twitter account. The difference might be direct incitement?

Still, you can say plenty about the power Twitter wields and the inherent risks involved. Same goes for Facebook et al. of course. They do have great power (too much), and therefore great responsibility. And I do believe regulation should be in place, but that’s another topic.

  • Twitter is a private company, they can do whatever they want!

Well, this is true (the section 230 discussion aside). And this is also how free and open markets should work. He is still entitled to his opinion and to spreading it wherever he wants (see above). So we’re not watching “censorship”; we’re watching an open source, free market approach to safety measures on the web.

I’ll say this though, Twitter is the de facto pulse of society, whereas Facebook is the personal newspaper, and I am willing to argue that anything de facto has inherent responsibilities following from that. But clearly there are lines and they have chosen to draw the line. As is their right.

That doesn’t mean you can’t feel conflicted about the whole situation. Which I do.

Who dis?

This all being said there is just an incredible amount of complaining about cancel culture, from people that actually tried to cancel the election and the democracy.

The good news is that a test of a secure democracy isn’t whether mobs storm the seat of government. The test of a secure democracy is whether democratic processes survive and continue *in spite of* mobs storming the seat of government. And democracy is proving itself secure in the USA.

In the end what happened was no surprise, at least if you had eyes and ears. And this is not about who the president is, we know who he is, this is about who America is. If you want to know who he is, there is an hour long tape of someone who is out of options and plain and simple wants to cheat.

And we can all see and hear with our own eyes what happened. And no, it is not a media narrative.

Now what?

Part of the damage is done. These companies missed the chance to change course years ago. There is no separating Facebook, Twitter, and YouTube from what happened on Wednesday.

The insurrection took longer than necessary, and it sure took time before law enforcement showed up. But the president waited so long, because he was waiting for it to work. And he deliberately took his time. This was the Trump coup. And he got exposed.

The insurrection was a blatant grab to seize power, but it was also to bully and frighten people and to literally terrorize people.

Accountability here is important. Because every time the president hasn’t been held accountable, he’s gotten worse. Every time.

Some people ask: Why would you impeach and convict a president who has only a few days left in office? The answer: Precedent. It must be made clear that no president, now or in the future, can lead an insurrection against the U.S. government.

And there are other reasons of course, the president would lose a lot of benefits.

But even with the president out of the equation, 147 Republicans voted to overturn election results. The USA is in deep trouble. And most Capitol stormers themselves seem deeply troubled. And white. It’s frightening.

One of the indigestible facts of the USA is that most of its terrorism and nearly all its mass shootings are committed by mostly conservative-leaning white men…earnestly committed to their white supremacist-misogynist identity politics.

So there is a lot of work to be done, before we can discuss healing.

To end on a positive note, fortunately stories like these also have heroes.

(This post is constructed by assembling tweets from my timeline into a more or less coherent story. I’ve hardly typed any words. For readability all blue links are referenced tweets. Twitter is great.)

The post Freedom of speech is not freedom of reach appeared first on Jan van den Berg.

Nikola Plejić (nikola)

2020: A Year In Review January 11, 2021 03:30 PM

I liked going back to my previous "year in review" post, so let's keep it up. I particularly liked how funny it looks in retrospect. Having a good sense of humor is important.

So, 2020... Whew. Oh boy.

The Ugly

COVID-19 happened very early on. Luckily, it did not seriously affect anyone close to me, but it did change everyone's approach to living.

In March, Zagreb got hit by a fairly strong M=5.3 earthquake. The city and its surroundings are still recovering: the renovation project is painfully slow, and the leadership was dreadful. My partner and I were displaced for a few months, but have since returned to our rental.

In December, Petrinja got hit by a devastating M=6.4 earthquake which basically destroyed several smaller cities, a lot of villages, and caused widespread damage in central Croatia. The aftershocks are still strong, and if Zagreb is any indicator, the recovery will be ongoing for years to come.

It's been a traumatic year all-around, and I'm sure we'll all be dealing with it for quite a while.


In 2018, I started pursuing a BSc in Physics at The Open University. This year, I finally graduated — with first-class honors. It was probably the pinnacle of the year, and one of my prouder achievements.

I was very happy with the distance learning approach. While I don't see face-to-face teaching disappearing from the face of the Earth any time soon, I believe there's much to explore and experiment within that approach to teaching.

I will be applying to Georgia Tech's Online MS in Computer Science in the Fall 2021 semester. I seem to be on a roll, and it'd be a shame to stop now.


COVID-19 meant I was mostly roaming around Croatia: Vis, Prvić, Fužine, Ozalj, Plitvice. I very much did not mind that.


This year, I've read a disproportionate amount of fiction written in my mother tongue. I very much did not mind that, either. A couple, regardless of genre, that I found particularly impactful:

  • The Judgment of Richard Richter and W by Igor Štiks
  • Selected Poems by Constantinos P. Cavafy
  • Huzur by Martina Mlinarević Sopta
  • Meho by Amin Kaplan
  • Permanent Record by Edward Snowden
  • Patterns of Software by Richard P. Gabriel
  • Homo Faber by Max Frisch
  • Blue Horses by Mary Oliver


I was slightly less mindful of music this year than usual. However, here's a few albums I particularly enjoyed, in alphabetical order with Bandcamp or other indie-ish purchase links where available:



I bought a new camera (a Canon EOS M6 Mark II) and took a fair amount of photos. I don't do much conventionally creative work so I'm enjoying playing around with the camera and post-processing — with various degrees of success. It's also a great motivator to go outside and move.

I really liked this one:


For the first time ever, I made a photobook of photos taken during a trip I went to with my partner in 2018! It turned out really well, and I think I'll be doing more of these in the future.

A picture of a photobook.


Last year, one of my plans was to learn some electronics and embedded programming, and I've managed to at least get that going by buying a Lily58L keyboard kit and assembling the thing! I love it, and it's the keyboard I now use daily. Here's me trying really hard not to burn myself:

A picture of me soldering.

Thanks to Lovro for both teaching me how to solder & find my way around the project, as well as for taking this picture! It was an incredibly fun feat.


In the recent election, Možemo!, the green/left political party featuring yours truly as a member, ended up with four seats in the Croatian Parliament. As limited as it is, this was a remarkable result, and a refreshing change to the parliamentary structure.

Plans for 2021

This time I'll try to keep it simple. Throughout 2021, I'd like to:

  • get vaccinated against COVID-19;
  • get into Georgia Tech's MS in CS program;
  • read, listen to music;
  • take some pictures, preferably while travelling;
  • survive, preferably without any serious long-term consequences.

January 10, 2021

Derek Jones (derek-jones)

My new kitchen clock January 10, 2021 08:22 PM

After several decades of keeping up with the time, since November my kitchen clock has only been showing the correct time every 12-hours. Before I got to buy a new one, I was asked what I wanted for Christmas, and there was money to spend :-)

Guess what Santa left for me:

Hermle Ravensburg clock.

The Hermle Ravensburg is a mechanical clock, driven by the pull of gravity on a cylindrical 1 kg weight of iron (I assume).

Setup requires installing the energy source (i.e., hang the cylinder on one end of a chain), attach clock to a wall where there is enough distance for the cylinder to slowly ‘fall’, set the time, add energy (i.e., pull the chain so the cylinder is at maximum height), and set the pendulum swinging.

The chain is long enough for eight days of running. However, for the clock to be visible from outside my kitchen I had to place it over a shelf, and running time is limited to 2.5 days before energy has to be added.

The swinging pendulum provides the reference beat for the running of the clock. The cycle time of a pendulum swing is proportional to the square root of the distance of the center of mass from the pivot point. There is an adjustment ring for fine-tuning the swing time (just visible below the circular gold disc of the pendulum).

I used my knowledge of physics to wind the center of mass closer to the pivot to reduce the swing time slightly, overlooking the fact that the thread on the adjustment ring moved a smaller bar running through its center (which moved in the opposite direction when I screwed the ring towards the pivot). Physics+mechanical knowledge got it right on the next iteration.

I have had the clock running 1-second per hour too slow, and 1-second per hour too fast. Current thinking is that the pendulum is being slowed slightly when the cylinder passes on its slow fall (by increased air resistance). Yes dear reader, I have not been resetting the initial conditions before making a calibration run 😐

What else remains to learn, before summer heat has to be adjusted for?

While the clock face and hands may be great for attracting buyers, it has serious usability issues when it comes to telling the time. It is difficult to tell the time without paying more attention than normal; without being within a few feet of the clock it is not possible to tell the time by just glancing at it. The see-through nature of the face, the black-on-black of the end of the hour/minute hands, and the extension of the minute hand in the opposite direction all combine to really confuse the viewer.

A wire cutter solved the minute hand extension issue, and yellow fluorescent paint solved the black-on-black issue. Ravensburg clock with improved user interface, framed by faded paint of its predecessor below:

Ravensburg clock with improved user interface.

There is a discreet ting at the end of every hour. This could be slightly louder, and I plan to add some weight to the bell hammer. Had the bell been attached slightly off center, fine volume adjustment would have been possible.

January 07, 2021

Derek Jones (derek-jones)

Likelihood of a fault experience when using the Horizon IT system January 07, 2021 03:34 PM

It looks like the UK Post Office’s Horizon IT system is going to have a significant impact on the prosecution of cases that revolve around the reliability of software systems, at least in the UK. I have discussed the evidence illustrating the fallacy of the belief that “most computer error is either immediately detectable or results from error in the data entered into the machine.” This post discusses what can be learned about the reliability of a program after a fault experience has occurred, or alleged to have occurred in the Horizon legal proceedings.

Sub-postmasters used the Horizon IT system to handle their accounts with the Post Office. In some cases money that sub-postmasters claimed to have transferred did not appear in the Post Office account. The sub-postmasters claimed this was caused by incorrect behavior of the Horizon system, the Post Office claimed it was due to false accounting and prosecuted or fired people and sometimes sued for the ‘missing’ money (which could be in the tens of thousands of pounds); some sub-postmasters received jail time. In 2019 a class action brought by 550 sub-postmasters was settled by the Post Office, and the presiding judge has passed a file to the Director of Public Prosecutions; the Post Office may be charged with instituting and pursuing malicious prosecutions. The courts are working their way through reviewing the cases of the sub-postmasters charged.

How did the Post Office lawyers calculate the likelihood that the missing money was the result of a ‘software bug’?

Horizon trial transcript, day 1, Mr De Garr Robinson acting for the Post Office: “Over the period 2000 to 2018 the Post Office has had on average 13,650 branches. That means that over that period it has had more than 3 million sets of monthly branch accounts. It is nearly 3.1 million but let’s call it 3 million and let’s ignore the fact for the first few years branch accounts were weekly. That doesn’t matter for the purposes of this analysis. Against that background let’s take a substantial bug like the Suspense Account bug which affected 16 branches and had a mean financial impact per branch of £1,000. The chances of that bug affecting any branch is tiny. It is 16 in 3 million, or 1 in 190,000-odd.”

That 3.1 million comes from the calculation: 19-year period times 12 months per year times 13,650 branches.

If we are told that 16 events occurred, and that there are 13,650 branches and 3.1 million transactions, then the likelihood of a particular transaction being involved in one of these events is 1 in 194,512.5. If all branches have the same number of transactions, the likelihood of a particular branch being involved in one of these 16 events is 1 in 853 (13650/16 -> 853); the branch likelihood will be proportional to the number of transactions it performs (ignoring correlation between transactions).
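Spelled out as code, the arithmetic from the transcript and the paragraph above is:

```c
// The numbers from the trial transcript: 19 years of monthly accounts
// across 13,650 branches, and 16 known Suspense Account bug events.
double total_monthly_accounts(void) { return 19.0 * 12.0 * 13650.0; }  // ~3.1 million

double one_in_n_per_account(void) { return total_monthly_accounts() / 16.0; }  // ~194,512.5

double one_in_n_per_branch(void) { return 13650.0 / 16.0; }  // ~853.1
```

That is, 1 in roughly 194,500 per set of monthly accounts, but 1 in roughly 853 per branch over the whole period.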

This analysis does not tell us anything about the likelihood that 16 events will occur, and it does not tell us anything about whether these events are the result of a coding mistake or fraud.

We don’t know how many of the known 16 events are due to mistakes in the code and how many are due to fraud. Let’s ask the question: What is the likelihood of one fault experience occurring in a software system that processes a total of 3.1 million transactions (the number of branches is not really relevant)?

The reply to this question is that it is not possible to calculate an answer, because all the required information is not specified.

A software system is likely to contain some number of coding mistakes, and given the appropriate input any of these mistakes may produce a fault experience. The information needed to calculate the likelihood of one fault experience occurring is:

  • the number of coding mistakes present in the software system,
  • for each coding mistake, the probability that an input drawn from the distribution of input values produced by users of the software will produce a fault experience.

Outside of research projects, I don’t know of anyone who has obtained the information needed to perform this calculation.

The Technical Appendix to Judgment (No.6) “Horizon Issues” states that there were 112 potential occurrences of the Dalmellington issue (paragraph 169), but does not list the number of transactions processed between these ‘issues’ (which would enable a likelihood to be estimated for that one coding mistake).

The analysis of the Post Office expert, Dr Worden, is incorrect in a complicated way (paragraphs 631 through 635). To ‘prove’ that the missing money was very unlikely to be the result of a ‘software bug’, Dr Worden makes a calculation that he claims is the likelihood of a particular branch experiencing a ‘bug’ (he makes the mistake of using the number of known events, not the number of unknown possible events). He overlooks the fact that while the likelihood of a particular branch experiencing an event may be small, the likelihood of any one of the branches experiencing an event is 13,650 times higher. Dr Worden creates complication by calculating the number of ‘bugs’ that would have to exist for there to be a 1 in 10 chance of a particular branch experiencing an event (his answer is 50,000), and then points out that 50,000 is such a large number it could not be true.

As an analogy, let’s consider the UK National Lottery, where the chance of winning the Thunderball jackpot is roughly 1 in 8-million per ticket purchased. Let’s say that I bought a ticket and won this week’s jackpot. Using Dr Worden’s argument, the lottery could claim that my chance of winning was so low (1 in 8-million) that I must have created a counterfeit ticket; they could even say that because I did not buy 0.8 million tickets, I did not have a reasonable chance of winning, i.e., a 1 in 10 chance. My chance of winning from one ticket is the same as everybody else who buys one ticket, i.e., 1 in 8-million. If millions of tickets are bought, it is very likely that one of them will win each week. If only, say, 13,650 tickets are bought each week, the likelihood of anybody winning in any week is very low, but eventually somebody will win (perhaps after many years).

The difference between the likelihood of winning the Thunderball jackpot and the likelihood of a Horizon fault experience is that we have enough information to calculate one, but not the other.

The analysis by the defence team produced different numbers, i.e., did not conclude that there was not enough information to perform the calculation.

Is there any way to obtain the information needed to calculate the likelihood of a fault experience occurring?

In theory fuzz testing could be used. In practice this is probably completely impractical. Horizon is a data driven system, and so a copy of the database would need to be used, along with a copy of all the Horizon software. Where is the computer needed to run this software+database? Yes, use of the Post Office computer system would be needed, along with all the necessary passwords.

Perhaps if we wait long enough, a judge will require that one party make all the software+database+computer+passwords available to the other party.

José Padilla (jpadilla)

Figuring out the home network January 07, 2021 05:26 AM

After moving in to our new home 🏠, one of the first things I did was sign up for gigabit internet from @GoNetspeed.

The ONT is in a corner down in our basement and my office is in the second floor at the opposite side.

Follow me as I figure shit out…

Obviously I’m not going to work right next to the ONT all the time, so I at least need a wireless router…

@gcollazo sold me on @Ubiquiti UniFi _things_. We all need these sorta-over-the-top systems to manage our home network after all.

After watching a few videos and reading a few blogs, I bought a Dream Machine.

Basically a cute all-in-one access point, 4-port switch, and Wi-Fi router, with a shit ton of features and settings I know little about with a nice UI/UX.

Surprisingly, the first floor is kinda set. Signal on the second floor sucked. I bought an AP BeaconHD Wi-Fi MeshPoint.

Basically a cute night light with a seamless integration with the Dream Machine that extends Wi-Fi coverage.

Speed tests in my office gave me a nice 300Mbps. Not enough! Where’s the other 700Mbps!?

I knew I needed to get some wiring done.

This house is already wired for phone ☎ and cable 📺 in every room. All of that ends up back on the basement, right next to the ONT.

I tried following a few of those cables, yanked on some, and noticed they were stapled in a couple of places.

I called it quits at that point and thought of hiring an expert to do what I wanted.

Fast forward a month and decided to revisit this during the holiday break 🎄.

I had gone up to the attic once or twice before. It's nice and “finished” only on top of my office. I thought about venturing to the other 3/4 of the space, but was kinda scared I’d just fall through the ceiling and make a horrible mistake 😂

I read up and watched some videos that gave me an idea of how to actually walk up there, so I did! I walked the whole thing a couple of times and got comfortable up there, you know, not falling through and all that

I started tracing some of the cables across and took a look at the corner where I imagined some of the cables from the basement came up. Jackpot!

Found some conduits that I hoped went into the basement (they did). One had two coaxial cables and one phone cable.

I found a roll of cable up there that was disconnected, so I tied a metal door stopper I had lying around, put a crap ton of tape and threw it down the empty conduit. Down all the stairs for the nth time.

There it was! It reached the basement!

This gave me the extra confidence that I could maybe do this with the minimum number of tools, holes, and mistakes. Given that I don’t even own a drill or anything like that. I have my trusty Ikea toolset, one other screwdriver, and a utility knife.

Went to a home improvement store, got 1000ft of CAT6 cable, because I had no real idea how much I’d need.

Also got a few other things I had seen I would need to actually terminate and test the cables.

I thought I’d try creating a 3ft cable first. After like an hour of trying to align cables and getting them on the connector, I noticed I had pass-thru connectors but the wrong crimping tool. Apparently you can still just use a utility knife (I have that!) to cut em.

First-timer luck, tested the cable and it passed!

Ok so managed to get one pull line through from the attic to the basement, why stop at one I thought. Let’s do more, future me will thank me…

Mistake número 1: this just ended up being a tangled pain in the ass when pulling the actual CAT6 cable up.

Mistake número 2: got multiple pull lines, but no way to identify them. How the hell am I supposed to know which I need to pull up!?

Had to get @anapadilla_ to help me with this mess.

After quite a struggle I pulled the damn thing up to the attic! 🎉

Got it across the attic and up to the conduit thingy that went down into my office.

Mistake número 3: attached the pull string to the cable with some crappy knot and a bunch of tape. Together with all the strings, and too much pulling, they came off.

Watched some more videos about useful knots for pulling cable, reattached to another pull string and bam💥

After another hour I got the cable terminated into a keystone jack and replaced the wall plate.

Another hour to terminate the other end in the basement. Moment of truth, got the tester hooked up….

Passed! ✅

I bought two UniFi 6 Lite APs, a UniFi Switch PoE 16, a panel, and rack.

Which I’ll install, eventually, right after I buy a drill I guess.

At some point, I’ll also try running an ethernet cable down to a few other key places where there’s a coaxial cable already.

Oh and maybe get into the UniFi Protect stuff for some cameras and doorbell.

Originally tweeted by José Padilla (@jpadilla_) on January 6, 2021.

Andreas Zwinkau (qznc)

Influence (review) January 07, 2021 12:00 AM

Humans rely on mental shortcuts and that is exploited.

Read full article!

January 06, 2021

Marc Brooker (mjb)

Quorum Availability January 06, 2021 12:00 AM

Quorum Availability

It's counterintuitive, but is it right?

In our paper Millions of Tiny Databases, we say this about the availability of quorum systems of various sizes:

As illustrated in Figure 4, smaller cells offer lower availability in the face of small numbers of uncorrelated node failures, but better availability when the proportion of node failure exceeds 50%. While such high failure rates are rare, they do happen in practice, and a key design concern for Physalia.

And this is what Figure 4 looks like:

The context here is that a cell is a Paxos cluster, and the system needs a majority quorum for the cluster to be able to process requests1. A cluster of one box needs one box available, five boxes need three available and so on. The surprising thing here is the claim that having smaller clusters is actually better if the probability of any given machine failing is very high. The paper doesn't explain it well, and I've gotten a few questions about it. This post attempts to do better.

Let's start by thinking about what happens for a cluster of one machine (n=1), in a datacenter of N machines (for very large N). We then fail each machine independently with probability p. What is the probability that our one machine failed? That's trivial: it's p. Now, let's take all N machines and put them into a cluster of n=N. What's the probability that a majority of the cluster is available? For large N, it's 1 for p < 0.5, and 0 for p > 0.5. If less than half the machines fail, less than half have failed. If more than half the machines fail, more than half have failed. Ok?

Notice how a cluster size of 1 is worse than N up until p = 0.5, then better after. Peleg and Wool say:

... for 0 < p < ½ the most available NDC2 is shown to be the "democracy" (namely, the minimal majority system), while the "monarchy" (singleton system) is least available. Due to symmetry, the picture reverses for ½ < p < 1.

Here, the minimal majority system is the one I'd call a majority quorum, and is used by Physalia (and, indeed, most Paxos implementations). The monarchy is where you have one leader node.

What about real practical cluster sizes like n=3, 5, and 7? There are three ways we can do this math. In The Availability of Quorum Systems, Peleg and Wool derive closed-form solutions to this problem3. Our second approach is to observe that the failures of the nodes are Bernoulli trials with probability p, and therefore we can read the answer to "what is the probability that 0 or 1 of 3 fail for probability p" from the distribution function of the binomial distribution. Finally, we can be lazy and do it with Monte Carlo. That's normally my favorite method, because it's easier to include correlation and various "what if?" questions as we go.
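Both the binomial and Monte Carlo approaches fit in a few lines. Here's a Python sketch (function names are mine, not from the paper):

```python
import random
from math import comb

def majority_availability(n, p):
    """Closed form via the binomial distribution: probability that a
    majority of an n-node cluster is up when each node fails
    independently with probability p."""
    need = n // 2 + 1  # smallest majority
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k)
               for k in range(need, n + 1))

def majority_availability_mc(n, p, trials=100_000, seed=0):
    """The lazy Monte Carlo version of the same quantity."""
    rng = random.Random(seed)
    up = lambda: sum(rng.random() > p for _ in range(n))
    return sum(up() > n // 2 for _ in range(trials)) / trials
```

For p = 0.9 this shows the counterintuitive claim directly: a singleton (n=1) is available with probability 0.1, while n=3 manages only 0.028.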

Whichever way you calculate it, what do you expect it to look like? For small n you may expect it to be closer in shape to n=1, and for large n you may expect it to approach the shape of n=N. If that's what you expect, you'd be right.

I'll admit that I find this result deeply deeply counter-intuitive. I think it's right, because I've approached it multiple ways, but it still kind of bends my mind a little. That may just be me. I've discussed it with friends and colleagues over the years, and they seem to think it matches their intuition. It's counter-intuitive to me because it suggests that smaller n (smaller clusters, or smaller cells in Physalia's parlance) is better for high p! If you think a lot of your boxes are going to fail, you may get better availability (not durability, though) from smaller clusters.


Correlation to the rescue!

It's not often that my statistical intuition is saved by introducing correlation, but in this case it helps. I'd argue that, in practice, you only lose machines in an uncorrelated Bernoulli trial way for small p. Above a certain p, it's likely that the failures have some shared cause (power, network, clumsy people, etc) and so the failures are likely to be correlated in some way. In which case, we're back into the game we're playing with Physalia of avoiding those correlated failures by optimizing placement.

In many other kinds of systems, like ones you deploy across multiple datacenters (we'd call that regional in AWS, deployed across multiple availability zones), you end up treating the datacenters as units of failure. In that case, for 3 datacenters you'd pick something like n=9 because you can keep quorum after the failure of an entire datacenter (3 machines) and any one other machine. As soon as there's correlation, the math above is mostly useless and the correlation's cause is all that really matters.

Availability also isn't the only thing to think about with cluster size for quorum systems. Durability, latency, cost, operations, and contention on leader election also come into play. Those are topics for another post (or section 2.2 of Millions of Tiny Databases).


JP Longmore sent me this intuitive explanation, which makes a lot of sense:

Probability of achieving a quorum will increase when removing 2 nodes from a cluster, each with failure rate p>.5, since on average you're removing 2 bad nodes instead of 2 good nodes. Other cases with 1 good node & 1 bad node don't change the outcome (quorum/not). Repeat reasoning till N=1 or all remaining nodes have p<=.5 (if failure rate isn’t uniform).


  1. Physalia uses a very naive Paxos implementation, intentionally optimized for testability and simplicity. The quorum intersection requirements of Paxos (or Paxos-like protocols) are more subtle than this, and work like Heidi Howard et al's Flexible Paxos has been pushing the envelope here recently. Flexible Paxos: Quorum intersection revisited is a good place to start.
  2. Here, an NDC is a non-dominated coterie, and a coterie is a set of groups of nodes (like {{a, b}, {b, c}, {a, c}}). See Definition 2.2 in How to Assign Votes in a Distributed System for the technical definition of domination. What's important, though, is that for each dominated coterie there's a non-dominated coterie that provides the same mutual exclusion properties, but superior availability under partitions. The details are not particularly important here, but are very interesting if you want to do tricky things with quorum intersection.
  3. Along with a whole lot of other interesting facts about quorums, majority quorums and other things. It's a very interesting paper. Another good read in this space is Garcia-Molina and Barbara's How to Assign Votes in a Distributed System, which both does a better job than Peleg and Wool of defining the terms it uses, but also explores the general idea of assigning votes to machines, rather than simply forming quorums of machines. As you read it, it's worth remembering that it predates Paxos, and many of the terms might not mean what you expect.

Andreas Zwinkau (qznc)

Made to Stick (review) January 06, 2021 12:00 AM

Sticky ideas are simple, unexpected, concrete, credentialed, emotional, and a story.

Read full article!

January 05, 2021

Andreas Zwinkau (qznc)

The Three Owners of an Interface January 05, 2021 12:00 AM

An interface is owned above, below, or at the same layer.

Read full article!

January 04, 2021

Andreas Zwinkau (qznc)

The Culture Code (review) January 04, 2021 12:00 AM

Grow an effective team through building safety, sharing vulnerability, and establishing purpose.

Read full article!

January 03, 2021

Derek Jones (derek-jones)

What impact might my evidence-based book have in 2021? January 03, 2021 10:36 PM

What impact might the release of my evidence-based software engineering book have on software engineering in 2021?

Lots of people have seen the book. The release triggered a quarter of a million downloads, or rather it getting linked to on Twitter and Hacker News resulted in this quantity of downloads. Looking at some of the comments on Hacker News, I suspect that many ‘readers’ did not progress much further than looking at the cover. Some have scanned through it expecting to find answers to a question that interests them, but all they found was disconnected results from a scattering of studies, i.e., the current state of the field.

The evidence that source code has a short and lonely existence is a gift to those seeking to save time/money by employing a quick and dirty approach to software development. Yes, there are some applications where a quick and dirty iterative approach is not a good idea (iterative as in, if we make enough money there will be a version 2), the software controlling aircraft landing wheels being an obvious example (if the wheels don’t deploy, telling the pilot to fly to another airport to see if they work there is not really an option).

There will be a few researchers who pick up an idea from something in the book, and run with it; I have had a couple of emails along this line, mostly from just starting out PhD students. It would be naive to think that lots of researchers will make any significant changes to their existing views on software engineering. Planck was correct to say that science advances one funeral at a time.

I’m hoping that the book will produce a significant improvement in the primitive statistical techniques currently used by many software researchers. At the moment some form of Wilcoxon test, invented in 1945, is the level of statistical sophistication wielded in most software engineering papers (that do any data analysis).

Software engineering research has the feeling of being a disjoint collection of results, and I’m hoping that a few people will be interested in starting to join the dots, i.e., making connections between findings from different studies. There are likely to be a limited number of major dot joinings, and so only a few dedicated people are needed to make it happen. Why hasn’t this happened yet? I think that many academics in computing departments are lifestyle researchers, moving from one project to the next, enjoying the lifestyle, with little interest in any research results once the grant money runs out (apart from trying to get others to cite it). Why do I think this? I have emailed many researchers information about the patterns I have found in the data they sent me, and a common response is almost complete disinterest (some were interested) in any connections to other work.

What impact do you think ‘all’ the evidence presented will have?

Andreas Zwinkau (qznc)

The Heart of Change (review) January 03, 2021 12:00 AM

Change initiatives succeed in stages building on each other.

Read full article!

January 02, 2021

Bogdan Popa (bogdan) January 02, 2021 12:00 AM

I was watching Systems with JT the other day and he demoed a hobby operating system called skiftOS. During the demo he ran one of the built-in apps called “neko” which looks like a clone of an old Windows “pet” program I remember from my childhood, also called “neko” (or “neko32”). It’s a really simple program: when you start it up, a cute little kitten shows up on your screen and starts running around, trying to catch your mouse cursor.

Andreas Zwinkau (qznc)

Turn the Ship Around! (review) January 02, 2021 12:00 AM

The best book about empowerment. Turns a submarine from worst to best.

Read full article!

December 31, 2020

Gustaf Erikson (gerikson)

Advent of Code 2020 December 31, 2020 12:56 PM

Completed all puzzles on 2020-12-30

Project website: Advent of Code 2020.

Previous years: 2015, 2016, 2017, 2018, 2019.

I use Perl for all the solutions.

Most assume the input data is in a file called input.txt in the same directory as the file.

Ratings (new for 2020)

I’m giving each puzzle a subjective rating between 1 and 5. This is based on difficulty, “fiddliness” and how happy I am with my own solution.

A note on scoring

I score my problems to mark where I’ve finished a solution myself or given up and looked for hints. A score of 2 means I solved both the daily problems myself, a score of 1 means I looked up a hint for one of the problems, and a zero score means I didn’t solve any of the problems myself.

My goals for this year (in descending order of priority):

  • get 40 stars or more (75%)
  • solve all problems up until day 15 without any external input
  • solve all problems within 24 hours of release

Final score: 48/50

Link to Github repo.

Day 1 - Day 2 - Day 3 - Day 4 - Day 5 - Day 6 - Day 7 - Day 8 - Day 9 - Day 10 - Day 11 - Day 12 - Day 13 - Day 14 - Day 15 - Day 16 - Day 17 - Day 18 - Day 19 - Day 20 - Day 21 - Day 22 - Day 23 - Day 24 - Day 25

Day 1 - Report Repair

Day 1 - complete solution

Not much to say about this. I used a hash to keep track of the “rest” of the values when comparing.

Apparently this (or at least part 2) is the 3SUM problem which is considered “hard”. I accidentally implemented the standard solution in part 1 so props for that I guess.

I still believe firing up the Perl interpreter and loading the actual file takes longer than just solving part 2 with two nested loops.

Beginning programmer example: loops and hashes/dicts/maps.
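The “hash of the rest” approach, sketched here in Python rather than the author's Perl:

```python
def pair_sum_product(nums, target=2020):
    """One pass: remember each value seen so far in a set, and stop
    when the current number's needed partner ('the rest') is there."""
    seen = set()
    for n in nums:
        if target - n in seen:
            return n * (target - n)
        seen.add(n)
    return None  # no pair sums to target
```

Run on the puzzle's example list, `pair_sum_product([1721, 979, 366, 299, 675, 1456])` returns 514579 (1721 × 299).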

Puzzle rating: 3/5

Score: 2

Day 2 - Password Philosophy

Day 2 - complete solution

Despite actually being awake and caffeinated when the puzzle dropped, I still managed to mess this up. Extra annoying when it’s basically tailor-made for Perl.

Here’s a partial list

  • messed up the initial regex
  • assigned $min and $max to the same value
  • messed up the comparison range in part 1
  • off-by-one error in the indexing in part 2
  • in part 2, tried to use the sum() function from List::Util but forgot to use parentheses around the array

Beginning programmer example: parsing input, exclusive OR.

Puzzle rating: 3/5

Score: 2

Day 3 - Toboggan Trajectory

Day 3 - complete solution Day 3 - alternative solution

Veterans of previous AoC’s will get pathfinding flashbacks from this problem’s description, but it turns out it’s a bit simpler - as can be expected for day 3.

I decided before coding to not store the map explicitly as individual coordinates, instead just storing the rows as a text string and unpacking via split when needed.

Another decision was to work with the test input first to confirm my algorithm. That way it would be easier to, for example, print out the rows in case I needed to visually debug.

Beginning programmer example: dealing with infinite repeats using mod.
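The mod trick, in Python rather than the author's Perl (a sketch with a made-up toy grid, not the puzzle input):

```python
def count_trees(rows, right=3, down=1):
    """The map repeats infinitely to the right, so wrap the column
    index with mod instead of materialising repeated copies."""
    return sum(row[(i * right) % len(row)] == "#"
               for i, row in enumerate(rows[::down]))

# Tiny hand-checkable grid, 4 rows of width 3:
grid = ["..#",
        "#..",
        ".#.",
        "..#"]
```

With `right=3, down=1` the column is always `3*i % 3 == 0`, so only row 1 hits a tree; with `right=1, down=2` the slope visits rows 0 and 2 at columns 0 and 1, hitting one tree.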

Puzzle rating: 4/5

Score: 2

Day 4 - Passport Processing

Day 4 - complete solution

This is a “M4P” problem - Made for Perl.

Nothing really to say about this. Set the $/ variable to an empty string to import the records as paragraphs.

I used a dispatch table to avoid a giant if/then/else statement.

Beginning programmer example: regexps! (and handling multi-line input).
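A dispatch table in Python form: the field rules below are the day's validity rules, and the table structure is what replaces the giant if/then/else (error handling for malformed numbers is omitted for brevity):

```python
import re

# One validator per field instead of a long if/elif chain.
VALIDATORS = {
    "byr": lambda v: 1920 <= int(v) <= 2002,
    "iyr": lambda v: 2010 <= int(v) <= 2020,
    "eyr": lambda v: 2020 <= int(v) <= 2030,
    "hgt": lambda v: (v.endswith("cm") and 150 <= int(v[:-2]) <= 193)
                  or (v.endswith("in") and 59 <= int(v[:-2]) <= 76),
    "hcl": lambda v: re.fullmatch(r"#[0-9a-f]{6}", v) is not None,
    "ecl": lambda v: v in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"},
    "pid": lambda v: re.fullmatch(r"[0-9]{9}", v) is not None,
}

def valid_passport(fields):
    # 'cid' is optional; every key in the table must be present and pass.
    return all(k in fields and VALIDATORS[k](fields[k]) for k in VALIDATORS)
```

Adding a rule means adding one table entry, not another branch.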

Puzzle rating: 3/5

Score: 2

Day 5 - Binary Boarding

Day 5 - complete solution

I have a massive binary blind spot, so I just knew there was going to be a CS-appropriate simple solution to this. But I just followed the instructions and got the right answer in the end anyway.

Beginning programmer example: binary.
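The CS-appropriate simple solution is a one-liner, since FBLR is binary in disguise (Python sketch):

```python
def seat_id(boarding_pass):
    """F/L are 0 bits and B/R are 1 bits, so the whole pass is one
    10-bit number; row * 8 + column falls out for free."""
    return int(boarding_pass.translate(str.maketrans("FBLR", "0101")), 2)
```

On the puzzle's examples, `seat_id("FBFBBFFRLR")` is 357 and `seat_id("BFFFBBFRRR")` is 567.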

Puzzle rating: 3/5

Score: 2

Day 6 - Custom Customs

Day 6 - complete solution

Another easy problem, and during the weekend too! <anxiety intensifies>

My first stab at part 1 contained one specific data structure, that I had to tear out for part 2. After submitting the solution I realized the first solution could work for both.

Puzzle rating: 4/5

Score: 2

Day 7 - Handy Haversacks

Day 7 - complete solution

As was expected, difficulty has ramped up a bit.

Can’t really explain what I did here … the main idea was using BFS to scan the “table”, but part 2 was basically me fiddling around with terms until the test cases passed.

Puzzle rating: 4/5

Score: 2

Day 8 - Handheld Halting

Day 8 - complete solution

This year’s register rodeo, but with a fun twist.

Part 2 was solved by brute forcing every solution; it still took only ~0.3s to find the answer.

Puzzle rating: 4/5

Score: 2

Day 9 - Encoding Error

Day 9 - complete solution

An ok puzzle. It pays to read what’s sought carefully…

Puzzle rating: 4/5

Score: 2

Day 10 - Adapter Array

Day 10 - complete solution

This was a tough one! I was too impatient to really start to optimize after making a solution that solved the two example files, so I “cheated” and looked for inspiration. Full credit in source.

As a bonus I learned about postfix dereferencing in Perl.

Puzzle rating: 4/5

Score: 1.

Day 11 - Seating System

Day 11 - complete solution Day 11 - part 1 Day 11 - part 2

Ugh, got burned by Perl’s negative indices on arrays which messed up part 2. I rewrote it using hashrefs instead.

Puzzle rating: 2/5, mostly because we’ve seen this sort of puzzle before and I don’t enjoy them that much.

Score: 2

Day 12 - Rain Risk

Day 12 - complete solution Day 12 - part 1 Day 12 - part 2

A not too taxing problem.

I don’t like having to re-write the guts of part 1 to solve part 2, however.

Update: after some perusal of the subreddit I realized it was easy enough to run both solutions in one pass, so I rewrote part 2 to handle that.

Puzzle rating: 3/5.

Score: 2

Day 13 - Shuttle Search

Day 13 - complete solution

A tough Sunday problem.

Everyone on the internet figured out that this was an implementation of the Chinese Remainder Theorem, but I followed the logic of a commenter on Reddit (credit in source) and I’m quite proud of the solution.
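The stepping approach (whether or not you call it the Chinese Remainder Theorem) fits in a few lines. A Python sketch, not the credited solution:

```python
def earliest_timestamp(buses):
    """buses: (offset, id) pairs. Walk forward in steps of the product
    of the ids matched so far; because the ids are pairwise coprime,
    each constraint stays satisfied forever once met (sieve-style CRT)."""
    t, step = 0, 1
    for offset, bus in buses:
        while (t + offset) % bus:
            t += step
        step *= bus
    return t
```

On the puzzle's example schedule `7,13,x,x,59,x,31,19` this finds the documented answer 1068781.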

Puzzle rating: 4/5

Score: 2

Day 14 - Docking Data

Day 14 - complete solution Day 14 - part 1 Day 14 - part 2

A bit of a fiddly problem, which I solved by only dealing with strings and arrays. Bit manipulation is for CS weenies.

I got good help from today’s solutions megathread in the AoC subreddit.

Puzzle rating: 3/5

Score: 2

Day 15 - Rambunctious Recitation

Day 15 - part 1 Day 15 - part 2

This looked complicated at first glance but wasn’t hard to implement.

My part 1 code solves part 2, given a powerful enough computer (I had to use my Raspberry Pi 4). However it takes very long on my standard VPS, so I re-implemented a solution from /u/musifter on Reddit. Credit in source.
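The core of the game is just a "last seen at turn" hash; a Python sketch (not the /u/musifter version credited in the source):

```python
def memory_game(start, n):
    """Track only the last turn each number was spoken; each new
    number is the gap since its previous utterance, or 0 if new."""
    last = {v: i + 1 for i, v in enumerate(start[:-1])}
    cur = start[-1]
    for turn in range(len(start), n):
        nxt = turn - last[cur] if cur in last else 0
        last[cur] = turn
        cur = nxt
    return cur
```

For the starting numbers 0,3,6 this gives 0 as the 10th number and 436 as the 2020th, matching the puzzle's examples.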

Puzzle rating: 3/5

Score: 2

Day 16 - Ticket Translation

Day 16 - complete solution

I realized this was basically the same as 2018D16, but I had a hard time wrapping my head around how to lay out the gathering of required info. A bit of a slog.

Puzzle rating: 3/5

Score: 2

Day 17 - Conway Cubes

Day 17 - part 1 Day 17 - part 2

What if Conway’s game of life - but in more dimensions?!

Not too hard, but not too entertaining either.

Puzzle rating: 3/5

Score: 2

Day 18 - Operator Order

Day 18 - complete solution

A CS staple. So I didn’t feel bad for googling “shunting-yard algorithm” and cribbing a solution. Same for the RPN evaluation algorithm, but there I found a much more straightforward implementation on Perlmonks. Credits in source.

I wonder how many CS grads nowadays have even seen a shunting-yard. The nerds in the MIT model railroad society had, of course, and Dijkstra too.
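A minimal shunting-yard plus RPN evaluator, in Python rather than the author's Perl. The precedence map is the only thing that changes between the two parts: part 1 gives + and * equal precedence (left to right), part 2 makes + bind tighter. Tokens are assumed pre-split on whitespace:

```python
def to_rpn(tokens, prec):
    """Shunting-yard: infix tokens -> RPN, for left-associative
    operators with precedence given by `prec`."""
    out, ops = [], []
    for t in tokens:
        if t.isdigit():
            out.append(t)
        elif t == "(":
            ops.append(t)
        elif t == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()  # drop the "("
        else:  # operator: pop anything of equal or higher precedence
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[t]:
                out.append(ops.pop())
            ops.append(t)
    return out + ops[::-1]  # drain remaining operators

def eval_rpn(rpn):
    """Standard stack-based RPN evaluation."""
    stack = []
    for t in rpn:
        if t.isdigit():
            stack.append(int(t))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if t == "+" else a * b)
    return stack[0]
```

For `2 * 3 + 4`: equal precedence gives (2*3)+4 = 10, while making + tighter gives 2*(3+4) = 14.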

Puzzle rating: 3/5

Score: 2

Day 19 - Monster Messages

Day 19 - part 1

For part 1, I tried using Parse::RecDescent and managed to get a kludgy solution, but without really knowing what I was doing.

Skipped part 2 for this one.

Puzzle rating: 3/5

Score: 1

Day 20 - Jurassic Jigsaw

Day 20 - complete solution

This year’s most involved problem. In the end it’s not that difficult, but there are a lot of moving parts.

I’m happy that the code I wrote first (to manipulate grids for part 1) was useful for part 2 too.

Puzzle rating: 4/5

Score: 2

Day 21 - Allergen Assessment

Day 21 - complete solution

Not my favorite puzzle this year. I had the right idea on how to go about it but had to look around for practical solutions.

Puzzle rating: 2/5, mostly because I’m cranky today.

Score: 2

Day 22 - Crab Combat

Day 22 - complete solution

I took a “free card” for part 2 today. Credit in source.

Puzzle rating: 3/5

Score: 1

Day 23 - Crab Cups

Day 23 - part 1 Day 23 - part 2

Part one saw me getting very fiddly with splice, an effort that was not appropriate for part 2…

Score: 2

Day 24 - Lobby Layout

Day 24 - complete solution

A fun little problem. I reused my hex grid logic from 2017D11 and the technique from this year’s day 17 to find the solution.

Puzzle rating: 4/5

Score: 2

Day 25 - Combo Breaker

Day 25 - complete solution

Nice and simple Christmas day puzzle.

I love the denouement of our hero’s journey. Luckily for him, I don’t have all the stars, so I guess I’ll have to stay in the resort until I can finish all the puzzles!

Puzzle rating: 3/5

Score: 2

December 30, 2020

Jan van den Berg (j11g)

Podcast: Donald Knuth Lectures on Things a Computer Scientist Rarely Talks About December 30, 2020 09:45 AM

I recently read ‘Things a Computer Scientist Rarely Talks About’ by Donald Knuth from 2001. Recommended reading if you like reading about how a world-renowned computer scientist wrote a book about how he wrote a book that deals with another book! Sounds recursive 😏

That last book is of course the bible and the book Knuth wrote about it is ‘3:16 Bible Texts Illuminated’ — published in 1991. And in 1999 Knuth gave a series of lectures at MIT ‘on the general subject of relations between faith and science’. In these lectures he explains how he went about writing this book and the thought process involved. So the lectures make for an enjoyable deep dive on creating such a book and how Knuth’s brain works, paired with discussions and insights on religion and things like eternity and finiteness.

And it is this series of lectures that is bundled together in ‘Things a Computer Scientist Rarely Talks About’ — almost verbatim. But the lectures have also always been available as audio files (sadly no visuals) on Knuth’s homepage. And I listened to those a few years back, and as I read this book I was reminded that I had created an RSS feed for these files, effectively creating a Knuth podcast!

This is a later picture of Donald Knuth and not from the 1999 lectures. I added the text, of course in the only possible font.
(I have no copyright on this picture and couldn’t find out who did actually. Feel free to drop me a line if I can accredit you, or if you want it changed.)

I mostly created the file for myself to have the convenience of listening to the lectures in a podcast player. But I have also dropped the link to the XML file here and there over the years, and I noticed 607 unique IP addresses hit this link this month alone! There are only six lectures and one panel discussion and never any new content, so I am not sure what these numbers mean, if they mean anything at all.

But I also remembered I had never blogged about this, until now. So without further ado here is the link:

You can add this to your favorite podcast player. I have added the feed to Overcast myself, so it looks like this, which is nice.

Having the audio files available in a podcast player enables you to track progress, speed parts up or down, and have an enhanced audio experience.

I do remember writing an email (no hyphen) to Knuth’s office and I received a nice reply that they thought it was ‘a great idea’, and they were actually also thinking of starting their own podcast ‘based on these materials’. However I haven’t found any link to this yet, so for now it is just this.

If you are more into video, here is a great conversation Donald Knuth had with Lex Fridman last year. Published exactly a year ago to this day. The video is not embeddable but you can click the image to go there. Recommended.


The post Podcast: Donald Knuth Lectures on Things a Computer Scientist Rarely Talks About appeared first on Jan van den Berg.

December 29, 2020

Dan Luu (dl)

Against essential and accidental complexity December 29, 2020 12:00 AM

In the classic 1986 essay, No Silver Bullet, Fred Brooks argued that there is, in some sense, not that much that can be done to improve programmer productivity. His line of reasoning is that programming tasks contain a core of essential/conceptual1 complexity that's fundamentally not amenable to attack by any potential advances in technology (such as languages or tooling). He then uses an Amdahl's law argument, saying that because 1/X of complexity is essential, it's impossible to ever get more than a factor of X improvement via technological improvements.

Towards the end of the essay, Brooks claims that at least 1/2 (most) of complexity in programming is essential, bounding the potential improvement remaining for all technological programming innovations combined to, at most, a factor of 22:

All of the technological attacks on the accidents of the software process are fundamentally limited by the productivity equation:

Time of task = Sum over i { Frequency_i × Time_i }

If, as I believe, the conceptual components of the task are now taking most of the time, then no amount of activity on the task components that are merely the expression of the concepts can give large productivity gains.

Let's see how this essential complexity claim holds for a couple of things I did recently at work:

  • scp from a bunch of hosts to read and download logs, and then parse the logs to understand the scope of a problem
  • Query two years of metrics data from every instance of every piece of software my employer has, for some classes of software and then generate a variety of plots that let me understand some questions I have about what our software is doing and how it's using computer resources


If we break this task down, we have

  • scp logs from a few hundred thousand machines to a local box
    • used a Python script for this to get parallelism with more robust error handling than you'd get out of pssh/parallel-scp
    • ~1 minute to write the script
  • do other work while logs download
  • parse downloaded logs (a few TB)
    • used a Rust script for this, a few minutes to write (used Rust instead of Python for performance reasons here — just opening the logs and scanning each line with idiomatic Python was already slower than I'd want if I didn't want to farm the task out to multiple machines)

In 1986, perhaps I would have used telnet or ftp instead of scp. Modern scripting languages didn't exist yet (perl was created in 1987 and perl5, the first version that some argue is modern, was released in 1994), so writing code that would do this with parallelism and "good enough" error handling would have taken more than an order of magnitude more time than it takes today. In fact, I think just getting semi-decent error handling while managing a connection pool could have easily taken an order of magnitude longer than this entire task took me (not including time spent downloading logs in the background).
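For flavor, the "parallelism with more robust error handling" part is only a screenful of modern Python. This is a sketch of the general shape, not the author's actual script; the hostnames, paths, and file-naming scheme are invented:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def scp_cmd(host, remote_path, local_dir):
    # One scp invocation per host; BatchMode fails fast instead of
    # prompting for a password on a broken host.
    return ["scp", "-o", "BatchMode=yes",
            f"{host}:{remote_path}", f"{local_dir}/{host}.log"]

def fetch_all(hosts, remote_path, local_dir, workers=64, cmd=scp_cmd):
    """Fan the copies out over a thread pool and collect the hosts
    that failed, rather than dying on the first unreachable machine."""
    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(subprocess.run,
                               cmd(h, remote_path, local_dir),
                               capture_output=True): h
                   for h in hosts}
        for fut in as_completed(futures):
            if fut.result().returncode != 0:
                failures.append(futures[fut])
    return failures
```

The returned list of failed hosts is what you'd retry or report; in 1986, the connection pooling and error collection alone would have been a project.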

Next up would be parsing the logs. It's not fair to compare an absolute number like "1 TB", so let's just call this "enough that we care about performance" (we'll talk about scale in more detail in the metrics example). Today, we have our choice of high-performance languages where it's easy to write, fast, safe code and harness the power of libraries (e.g., a regexp library3) that make it easy to write a quick and dirty script to parse and classify logs, farming out the work to all of the cores on my computer (I think Zig would've also made this easy, but I used Rust because my team has a critical mass of Rust programmers).

In 1986, there would have been no comparable language, but more importantly, I wouldn't have been able to trivially find, download, and compile the appropriate libraries and would've had to write all of the parsing code by hand, turning a task that took a few minutes into a task that I'd be lucky to get done in an hour. Also, if I didn't know how to use the library or that I could use a library, I could easily find out how I should solve the problem on StackOverflow, which would massively reduce accidental complexity. Needless to say, there was no real equivalent to Googling for StackOverflow solutions in 1986.

Moreover, even today, this task, a pretty standard programmer devops/SRE task, after at least an order of magnitude speedup over the analogous task in 1986, is still nearly entirely accidental complexity.

If the data were exported into our metrics stack or if our centralized logging worked a bit differently, the entire task would be trivial. And if neither of those were true, but the log format were more uniform, I wouldn't have had to write any code after getting the logs; rg or ag would have been sufficient. If I look for how much time I spent on the essential conceptual core of the task, it's so small that it's hard to estimate.

Query metrics

We really only need one counter-example, but I think it's illustrative to look at a more complex task to see how Brooks' argument scales. If you'd like, you can skip this lengthy example and jump ahead to the next section.

We can view my metrics querying task as being made up of the following sub-tasks:

  • Write a set of Presto SQL queries that effectively scan on the order of 100 TB of data each, from a data set that would be on the order of 100 PB of data if I didn't maintain tables that only contain a subset of data that's relevant
    • Maybe 30 seconds to write the first query and a few minutes for queries to finish, using on the order of 1 CPU-year of CPU time
  • Write some ggplot code to plot the various properties that I'm curious about
    • Not sure how long this took; less time than the queries took to complete, so this didn't add to the total time of this task

The first of these tasks is so many orders of magnitude quicker to accomplish today that I'm not even able to hazard a guess as to how much quicker it is today within one or two orders of magnitude, but let's break down the first task into its component parts to get some idea of the ways in which the task has gotten easier.

It's not fair to port absolute numbers like 100 PB into 1986, but just the idea of having a pipeline that collects and persists comprehensive data analogous to the data I was looking at for a consumer software company (various data on the resource usage and efficiency of our software) would have been considered absurd in 1986. Here we see one fatal flaw in the idea of essential complexity providing an upper bound on productivity improvements: tasks with too much accidental complexity wouldn't have even been considered possible. The limit on how much accidental complexity Brooks sees is really a limit of his imagination, not something fundamental.

Brooks explicitly dismisses increased computational power as something that will not improve productivity ("Well, how many MIPS can one use fruitfully?", more on this later), but both storage and CPU power (not to mention network speed and RAM) were sources of accidental complexity so large that they bounded the space of problems Brooks was able to conceive of.

In this example, let's say that we somehow had enough storage to keep the data we want to query in 1986. The next part would be to marshal on the order of 1 CPU-year worth of resources and have the query complete in minutes. As with the storage problem, this would have also been absurd in 1986[4], so we've run into a second piece of non-essential complexity so large that it would stop a person from 1986 from thinking of this problem at all.

Next up would be writing the query. If I were writing for the Cray-2 and wanted to be productive, I probably would have written the queries in Cray's dialect of Fortran 77. Could I do that in less than 300 seconds per query? Not a chance; I couldn't even come close with Scala/Scalding and I think it would be a near thing even with Python/PySpark. This is the aspect where I think we see the smallest gain and we're still well above one order of magnitude here.

After we have the data processed, we have to generate the plots. Even with today's technology, I think not using ggplot would cost me at least 2x in terms of productivity. I've tried every major plotting library that's supposedly equivalent (in any language) and every library I've tried either has multiple show-stopping bugs rendering plots that I consider to be basic in ggplot or is so low-level that I lose more than 2x productivity by being forced to do stuff manually that would be trivial in ggplot. In 2020, the existence of a single library already saves me 2x on this one step. If we go back to 1986, before the concept of the grammar of graphics and any reasonable implementation, there's no way that I wouldn't lose at least two orders of magnitude of time on plotting even assuming some magical workstation hardware that was capable of doing the plotting operations I do in a reasonable amount of time (my machine is painfully slow at rendering the plots; a Cray-2 would not be able to do the rendering in anything resembling a reasonable timeframe).

The number of orders of magnitude of accidental complexity reduction for this problem from 1986 to today is so large I can't even estimate it, and yet this problem still contains such a large fraction of accidental complexity that it's once again difficult to even guess at what fraction of complexity is essential. To write down all of the accidental complexity I can think of would require at least 20k words, but just to provide a bit of the flavor of the complexity, let me write down a few things.

  • SQL; this is one of those things that's superficially simple but actually extremely complex
    • Also, Presto SQL
  • Arbitrary Presto limits, some of which are from Presto and some of which are from the specific ways we operate Presto and the version we're using
    • There's an internal Presto data structure assert fail that gets triggered when I use both numeric_histogram and cross join unnest in a particular way. Because it's a waste of time to write the bug-exposing query, wait for it to fail, and then re-write it, I have a mental heuristic I use to guess, for any query that uses both constructs, whether or not I'll hit the bug and I apply it to avoid having to write two queries. If the heuristic applies, I'll instead write a more verbose query that's slower to execute instead of the more straightforward query
    • We partition data by date, but Presto throws this away when I join tables, resulting in very large and therefore expensive joins when I join data across a long period of time even though, in principle, this could be a series of cheap joins; if the join is large enough to cause my query to blow up, I'll write what's essentially a little query compiler to execute day-by-day queries and then post-process the data as necessary instead of writing the naive query
      • There are a bunch of cases where some kind of optimization in the query will make the query feasible without having to break the query across days (e.g., if I want to join host-level metrics data with the table that contains what cluster a host is in, that's a very slow join across years of data, but I also know what kinds of hosts are in which clusters, which, in some cases, lets me filter hosts out of the host-level metrics data that's in there, like core count and total memory, which can make the larger input to this join small enough that the query can succeed without manually partitioning the query)
    • We have a Presto cluster that's "fast" but has "low" memory limits and a cluster that's "slow" but has "high" memory limits, so I mentally estimate how much per-node memory a query will need so that I can schedule it to the right cluster
    • etc.
  • When, for performance reasons, I should compute the CDF or histogram in Presto vs. leaving it to the end for ggplot to compute
  • How much I need to downsample the data, if at all, for ggplot to be able to handle it, and how that may impact analyses
  • Arbitrary ggplot stuff
    • roughly how many points I need to put in a scatterplot before I should stop using size = [number] and should switch to single-pixel plotting because plotting points as circles is too slow
    • what the minimum allowable opacity for points is
    • If I exceed the maximum density where you can see a gradient in a scatterplot due to this limit, how large I need to make the image to reduce the density appropriately (when I would do this instead of using a heatmap deserves its own post)
    • etc.
  • All of the above is about tools that I use to write and examine queries, but there's also the mental model of all of the data issues that must be taken into account when writing the query in order to generate a valid result, which includes things like clock skew, Linux accounting bugs, issues with our metrics pipeline, issues with data due to problems in the underlying data sources, etc.
  • etc.
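For instance, the day-by-day workaround mentioned above amounts to writing a tiny "query compiler": split the date range, run one cheap per-partition query, and post-process the results. A minimal sketch, where run_query stands in for a real Presto client and the SQL template is hypothetical:

```python
from datetime import date, timedelta

def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def run_partitioned(sql_template, start, end, run_query, merge):
    """Run one small per-day query instead of a single huge join,
    then merge the per-day results."""
    results = [run_query(sql_template.format(ds=day.isoformat()))
               for day in daterange(start, end)]
    return merge(results)

# rows = run_partitioned(
#     "SELECT host, metric FROM t JOIN clusters USING (host) WHERE ds = '{ds}'",
#     date(2020, 1, 1), date(2020, 12, 31),
#     run_query=presto_client.execute,   # hypothetical client
#     merge=lambda rs: [row for r in rs for row in r])
```

None of this logic has anything to do with the question being asked of the data; it exists only because the query engine throws away the partitioning on joins.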

For each of Presto and ggplot I implicitly hold over a hundred things in my head to be able to get my queries and plots to work and I choose to use these because these are the lowest overhead tools that I know of that are available to me. If someone asked me to name the percentage of complexity I had to deal with that was essential, I'd say that it was so low that there's no way to even estimate it. For some queries, it's arguably zero — my work was necessary only because of some arbitrary quirk and there would be no work to do without the quirk. But even in cases where some kind of query seems necessary, I think it's unbelievable that essential complexity could have been more than 1% of the complexity I had to deal with.

Revisiting Brooks on computer performance, even though I deal with complexity due to the limitations of hardware performance in 2020 and would love to have faster computers today, Brooks wrote off faster hardware as pretty much not improving developer productivity in 1986:

What gains are to be expected for the software art from the certain and rapid increase in the power and memory capacity of the individual workstation? Well, how many MIPS can one use fruitfully? The composition and editing of programs and documents is fully supported by today’s speeds. Compiling could stand a boost, but a factor of 10 in machine speed would surely . . .

But this is wrong on at least two levels. First, if I had access to faster computers, a huge amount of my accidental complexity would go away (if computers were powerful enough, I wouldn't need complex tools like Presto; I could just run a query on my local computer). We have much faster computers now, but it's still true that having faster computers would make many involved engineering tasks trivial. As James Hague notes, in the mid-80s, writing a spellchecker was a serious engineering problem due to performance constraints.

Second, (just for example) ggplot only exists because computers are so fast. A common complaint from people who work on performance is that tool X has somewhere between two and ten orders of magnitude of inefficiency when you look at the fundamental operations it does vs. the speed of hardware today[5]. But what fraction of programmers can realize even one half of the potential performance of a modern multi-socket machine? I would guess fewer than one in a thousand and I would say certainly fewer than one in a hundred. And performance knowledge isn't independent of other knowledge — controlling for age and experience, it's negatively correlated with knowledge of non-"systems" domains since time spent learning about the esoteric accidental complexity necessary to realize half of the potential of a computer is time spent not learning about "directly" applicable domain knowledge. When we look at software that requires a significant amount of domain knowledge (e.g., ggplot) or that's large enough that it requires a large team to implement (e.g., IntelliJ[6]), the vast majority of it wouldn't exist if machines were orders of magnitude slower and writing usable software required wringing most of the performance out of the machine. Luckily for us, hardware has gotten much faster, allowing the vast majority of developers to ignore performance-related accidental complexity and instead focus on all of the other accidental complexity necessary to be productive today.

Faster computers both reduce the amount of accidental complexity tool users run into as well as the amount of accidental complexity that tool creators need to deal with, allowing more productive tools to come into existence.


To summarize, Brooks states a bound on how much programmer productivity can improve. But, in practice, to state this bound correctly, one would have to be able to conceive of problems that no one would reasonably attempt to solve due to the amount of friction involved in solving the problem with current technologies.

Without being able to predict the future, this is impossible to estimate. If we knew the future, it might turn out that there's some practical limit on how much computational power or storage programmers can productively use, bounding the resources available to a programmer, but getting a bound on the amount of accidental complexity would still require one to correctly reason about how programmers are going to be able to use zillions times more resources than are available today, which is so difficult we might as well call it impossible.

Moreover, for each class of tool that could exist, one would have to effectively anticipate all possible innovations. Brooks' strategy for this was to look at existing categories of tools and state, for each, that they would be ineffective or that they were effective but played out. This was wrong not only because it underestimated gains from classes of tools that didn't exist yet, weren't yet effective, or that he wasn't familiar with (e.g., he writes off formal methods, but it doesn't even occur to him to mention fuzzers, static analysis tools that don't fully formally verify code, tools like valgrind, etc.) but also because Brooks thought that every class of tool where there was major improvement was played out and it turns out that none of them were (e.g., programming languages, which Brooks wrote just before the rise of "scripting languages" as well as just before GC languages took over the vast majority of programming).

In some sense, this isn't too different from when we looked at Unix and found the Unix mavens saying that we should write software like they did in the 70s and that the languages they invented are as safe as any language can be. Long before computers were invented, elders have been telling the next generation that they've done everything that there is to be done and that the next generation won't be able to achieve more. Even without knowing any specifics about programming, we can look at how well these kinds of arguments have held up historically and have decent confidence that the elders are not, in fact, correct this time.

Looking at the specifics with the benefit of hindsight, we can see that Brooks' 1986 claim that we've basically captured all the productivity gains high-level languages can provide isn't too different from an assembly language programmer saying the same thing in 1955, thinking that assembly is as good as any language can be[7] and that his claims about other categories are similar. The main thing these claims demonstrate are a lack of imagination. When Brooks referred to conceptual complexity, he was referring to complexity of using the conceptual building blocks that Brooks was familiar with in 1986 (on problems that Brooks would've thought of as programming problems). There's no reason anyone should think that Brooks' 1986 conception of programming is fundamental any more than they should think that how an assembly programmer from 1955 thought was fundamental. People often make fun of the apocryphal "640k should be enough for anybody" quote, but Brooks saying that, across all categories of potential productivity improvement, we've done most of what's possible to do, is analogous and not apocryphal!

We've seen that, if we look at the future, the fraction of complexity that might be accidental is effectively unbounded. One might argue that, if we look at the present, these terms wouldn't be meaningless. But, while this will vary by domain, I've personally never worked on a non-trivial problem that isn't completely dominated by accidental complexity, making the concept of essential complexity meaningless on any problem I've worked on that's worth discussing.

Thanks to Peter Bhat Harkins, Ben Kuhn, Yuri Vishnevsky, Chris Granger, Wesley Aptekar-Cassels, Lifan Zeng, Scott Wolchok, Martin Horenovsky, @realcmb, Kevin Burke, Aaron Brown, and Saul Pwanson for comments/corrections/discussion.

  1. The accidents I discuss in the next section. First let us consider the essence

    The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions. This essence is abstract, in that the conceptual construct is the same under many different representations. It is nonetheless highly precise and richly detailed.

    I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared to the conceptual errors in most systems.

  2. Curiously, he also claims, in the same essay, that no individual improvement can yield a 10x improvement within one decade. While this technically doesn't contradict his Amdahl's law argument plus the claim that "most" (i.e., at least half) of complexity is essential/conceptual, it's unclear why he would include this claim as well.

    When Brooks revisited his essay in 1995 in No Silver Bullet Refired, he claimed that he was correct by using the weakest form of the three claims he made in 1986, that within one decade, no single improvement would result in an order of magnitude improvement. However, he did then re-state the strongest form of the claim he made in 1986 and made it again in 1995, saying that this time, no set of technological improvements could improve productivity more than 2x, for real:

    It is my opinion, and that is all, that the accidental or representational part of the work is now down to about half or less of the total. Since this fraction is a question of fact, its value could in principle be settled by measurement. Failing that, my estimate of it can be corrected by better informed and more current estimates. Significantly, no one who has written publicly or privately has asserted that the accidental part is as large as 9/10.

    By the way, I find it interesting that he says that no one disputed this 9/10ths figure. Per the body of this post, I would put it at far above 9/10th for my day-to-day work and, if I were to try to solve the same problems in 1986, the fraction would have been so high that people wouldn't have even conceived of the problem. As a side effect of having worked in hardware for a decade, I've also done work that's not too different from what some people faced in 1986 (microcode, assembly & C written for DOS) and I would put that work as easily above 9/10th as well.

    Another part of his follow-up that I find interesting is that he quotes Harel's "Biting the Silver Bullet" from 1992, which, among other things, argues that the decade deadline for an order of magnitude improvement is arbitrary. Brooks' response to this is

    There are other reasons for the decade limit: the claims made for candidate bullets all have had a certain immediacy about them . . . We will surely make substantial progress over the next 40 years; an order of magnitude over 40 years is hardly magical.

    But by Brooks' own words when he revisits the argument in 1995, if 9/10th of complexity is essential, it would be impossible to get more than an order of magnitude improvement from reducing it, with no caveat on the timespan:

    "NSB" argues, indisputably, that if the accidental part of the work is less than 9/10 of the total, shrinking it to zero (which would take magic) will not give an order of magnitude productivity improvement.

    Both his original essay and the 1995 follow-up are charismatically written and contain a sort of local logic, where each piece of the essay sounds somewhat reasonable if you don't think about it too hard and you forget everything else the essay says. As with the original, a pedant could argue that this is technically not incoherent — after all, Brooks could be saying:

    • at most 9/10th of complexity is accidental (if we ignore the later 1/2 claim, which is the kind of suspension of memory/disbelief one must do to read the essay)
    • it would not be surprising for us to eliminate 100% of accidental complexity after 40 years

    While this is technically consistent (again, if we ignore the part that's inconsistent) and is a set of claims one could make, this would imply that 40 years from 1986, i.e., in 2026, it wouldn't be implausible for there to be literally zero room for any sort of productivity improvement from tooling, languages, or any other potential source of improvement. But this is absurd. If we look at other sections of Brooks' essay and combine their reasoning, we see other inconsistencies and absurdities.

  3. Another issue that we see here is Brooks' insistence on bright-line distinctions between categories. Essential vs. accidental complexity. "Types" of solutions, such as languages vs. "build vs. buy", etc.

    Brooks admits that "build vs. buy" is one avenue of attack on essential complexity. Perhaps he would agree that buying a regexp package would reduce the essential complexity since that would allow me to avoid keeping all of the concepts associated with writing a parser in my head for simple tasks. But what if, instead of buying regexes, I used a language where they're bundled into the standard library or otherwise distributed with the language? Or what if, instead of having to write my own concurrency primitives, those were bundled into the language? Or, for that matter, what about an entire HTTP server? There is no bright-line distinction between what's in a library one can "buy" (for free in many cases nowadays) and what's bundled into the language, so there cannot be a bright-line distinction between what gains a language provides and what gains can be "bought". But if there's no bright-line distinction here, then it's not possible to say that one of these can reduce essential complexity and the other can't and maintain a bright-line distinction between essential and accidental complexity (in a response to Brooks, Harel argued against there being a clear distinction, and Brooks' reply was to say that there is, in fact, a bright-line distinction, although he provided no new argument).

    Brooks' repeated insistence on these false distinctions means that the reasoning in the essay isn't composable. As we've already seen in another footnote, if you take reasoning from one part of the essay and apply it alongside reasoning from another part of the essay, it's easy to create absurd outcomes and sometimes outright contradictions.

    I suspect this is one reason discussions about essential vs. accidental complexity are so muddled. It's not just that Brooks is being vague and handwave-y; he's actually not self-consistent, so there isn't and cannot be a coherent takeaway. Michael Feathers has noted that people are generally not able to correctly identify essential complexity; as he says, "One person's essential complexity is another person's accidental complexity." This is exactly what we should expect from the essay, since people who have different parts of it in mind will end up with incompatible views.

    This is also a problem when criticizing Brooks. Inevitably, someone will say that what Brooks really meant was something completely different. And that will be true. But Brooks will have meant something completely different while also having meant the things he said that I mention. In defense of the view I'm presenting in the body of the text here, it's a coherent view that one could have had in 1986. Many of Brooks' statements don't make sense even when considered as standalone statements, let alone when cross-referenced with the rest of his essay. For example, take the statement that no single development will result in an order of magnitude improvement in the next decade. This statement is meaningless because Brooks does not define, and no one can definitively say, what a "single improvement" is. And, as mentioned above, Brooks' essay reads quite oddly and basically does not make sense if that's what he's trying to claim. Another issue with most other readings of Brooks is that those are positions that would be meaningless even if Brooks had done the work to make them well defined. Why does it matter whether one improvement or two results in an order of magnitude improvement? If it's two improvements, we'll use them both.

  4. Let's arbitrarily use a Motorola 68k processor with an FP co-processor that could do 200 kFLOPS as a reference for how much power we might have in a consumer CPU (FLOPS is a bad metric for multiple reasons, but this is just to get an idea of what it would take to get 1 CPU-year of computational resources, and Brooks himself uses MIPS as a term as if it's meaningful). By comparison, the Cray-2 could achieve 1.9 GFLOPS, or roughly 10000x the performance (I think actually less if we were to do a comparable comparison instead of using non-comparable GFLOPS numbers, but let's be generous here). There are 525600 / 5 = 105120 five minute periods in a year, so to get 1 CPU year's worth of computation in five minutes we'd need 105120 / 10000 = 10 Cray-2s per query, not including the overhead of aggregating results across Cray-2s.
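    The arithmetic above is easy to check (using the footnote's own generous rounding of the speed ratio to 10,000x for the final step):

```python
cray2_flops = 1.9e9    # Cray-2 peak
m68k_flops = 200e3     # 68k + FP co-processor reference
speed_ratio = cray2_flops / m68k_flops   # 9,500, "roughly 10000x"

five_min_periods = 525_600 / 5           # minutes per year / 5 = 105,120
crays_per_query = five_min_periods / 10_000   # ~10 Cray-2s per query
```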

    It's unreasonable to think that a consumer software company in 1986 would have enough Cray-2s lying around to allow for any random programmer to quickly run CPU-years worth of queries whenever they wanted to do some data analysis. One source claims that 27 Cray-2s were ever made over the production lifetime of the machine (1985 to 1990). Even if my employer owned all of them and they were all created by 1986, that still wouldn't be sufficient to allow the kind of ad hoc querying capacity that I have access to in 2020.

    Today, someone at a startup can even make an analogous argument when comparing to a decade ago. You used to have to operate a cluster that would be prohibitively annoying for a startup to operate unless the startup is very specialized, but you can now just use Snowflake and basically get Presto but only pay for the computational power you use (plus a healthy markup) instead of paying to own a cluster and for all of the employees necessary to make sure the cluster is operable.

  5. I actually run into one of these every time I publish a new post. I write my posts in Google docs and then copy them into emacs running inside tmux running inside Alacritty. My posts are small enough to fit inside L2 cache, so I could have 64B/3.5 cycle write bandwidth. And yet, the copy+paste operation can take ~1 minute and is so slow I can watch the text get pasted in. Since my chip is working super hard to make sure the copy+paste happens, it's running at its full non-turbo frequency of 4.2GHz, giving it 76.8GB/s of write bandwidth. For a 40kB post, 1 minute = 666B/s. 76.8G/666 =~ 8 orders of magnitude left on the table.
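    The bandwidth numbers in this footnote work out as follows:

```python
import math

peak_bw = 4.2e9 * (64 / 3.5)   # cycles/sec * bytes/cycle = 76.8 GB/s
actual_bw = 40e3 / 60          # 40kB pasted over ~1 minute = ~666 B/s
orders_lost = math.log10(peak_bw / actual_bw)   # ~8 orders of magnitude
```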
  6. In this specific case, I'm sure somebody will argue that Visual Studio was quite nice in 2000 and ran on much slower computers (and the debugger was arguably better than it is in the current version). But there was no comparable tool on Linux, nor was there anything comparable to today's options in the VSCode-like space of easy-to-learn programming editor that provides programming-specific facilities (as opposed to being a souped up version of notepad) without being a full-fledged IDE.
  7. And by the way, this didn't only happen in 1955. I've worked with people who, this century, told me that assembly is basically as productive as any high level language. This probably sounds ridiculous to almost every reader of this blog, but if you talk to people who spend all day writing microcode or assembly, you'll occasionally meet somebody who believes this.

    Thinking that the tools you personally use are as good as it gets is an easy trap to fall into.


December 27, 2020

Derek Jones (derek-jones)

Source code discovery, skipping over the legal complications December 27, 2020 10:27 PM

The 2020 US elections introduced the issue of source code discovery, in legal cases, to a wider audience. People wanted to (and still do) check that the software used to register and count votes works as intended, but the companies who wrote the software wouldn’t make it available and the courts did not compel them to do so.

I was surprised to see that there is even a section on “Transfer of or access to source code” in the EU-UK trade and cooperation agreement, agreed on Christmas Eve.

I have many years of experience in discovering problems in the source code of programs I did not write. This experience derives from my time as a compiler implementer (e.g., a big customer is being held up by a serious issue in their application, and the compiler is being blamed), and as a static analysis tool vendor (e.g., managers want to know about what serious mistakes may exist in the code of their products). In all cases those involved wanted me there, I could talk to some of those involved in developing the code, and there were known problems with the code. In court cases, the defence does not want the prosecution looking at the code, and I assume that all conversations with the people who wrote the code go via the lawyers. I have intentionally stayed away from this kind of work, so my practical experience of working on legal discovery is zero.

The most common reason companies give for not wanting to make their source code available is that it contains trade-secrets (they can hardly say that it’s because they don’t want any mistakes in the code to be discovered).

What kind of trade-secrets might source code contain? Most code is very dull, and for some programs the only trade-secret is that if you put in the implementation effort, the obvious way of doing things works, i.e., the secret sauce promoted by the marketing department is all smoke and mirrors (I have had senior management, who have probably never seen the code, tell me about the wondrous properties of their code, which I had seen and knew contained nothing special).

Comments may detail embarrassing facts, aka trade-secrets. Sometimes the code interfaces to a proprietary interface format that the company wants to keep secret, or uses some formula that required a lot of R&D (management gets very upset when told that ‘secret’ formula can be reverse engineered from the executable code).

Why does a legal team want access to source code?

If the purpose is to check specific functionality, then reading the source code is probably the fastest technique. For instance, checking whether a particular set of input values can cause a specific behavior to occur, or tracing through the logic to understand the circumstances under which a particular behavior occurs, or in software patent litigation checking what algorithms or formula are being used (this is where trade-secret claims appear to be valid).

If the purpose is a fishing expedition looking for possible incorrect behaviors, having the source code is probably not that useful. The quantity of source contained in modern applications can be huge, e.g., tens to hundreds of thousands of lines.

In ancient times (i.e., the 1970s and 1980s) programs were short (because most computers had tiny amounts of memory, compared to post-2000), and it was practical to read the source to understand a program. Customer demand for more features, and the fact that greater storage capacity removed the need to spend time reducing code size, means that source code ballooned. The following plot shows the lines of code contained in the collected algorithms of the Transactions on Mathematical Software; the red line is a fitted regression model of the form: LOC ≈ e^{0.0003·Day} (code+data):

Lines of code contained in the collected algorithms of the Transactions on Mathematical Software, over time.
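Taken at face value (and assuming Day is measured in days), the fitted exponent implies a doubling time of ln(2)/0.0003 ≈ 2,300 days, i.e., the line counts in these algorithms doubled roughly every six years:

```python
import math

growth_rate = 0.0003                       # per-day rate from the fitted model
doubling_days = math.log(2) / growth_rate  # ~2,310 days
doubling_years = doubling_days / 365.25    # ~6.3 years
```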

How, by reading the source code, does anybody find mistakes in a 10+ thousand line program? If the program only occasionally misbehaves, finding a coding mistake by reading the source is likely to be very, very time-consuming, i.e., months. Work it out yourself: 10K lines of code is around 200 pages. How long would it take you to remember all the details, and their interdependencies, of a detailed 200-page technical discussion well enough to spot an inconsistency likely to cause a fault experience? And, yes, the source may very well be provided as a printout, or as a pdf on a protected memory stick.

From my limited reading of accounts of software discovery, the time available to study the code may be just days or maybe a week or two.

Reading large quantities of code to discover possible coding mistakes is an inefficient use of human time. Some form of analysis tool might help. Static analysis tools are one option; these cost money and might not be available for the language or dialect in which the source is written (there are some good tools for C because it has been around so long and is widely used).

Character assassination, or guilt by innuendo, is another approach; the code just cannot be trusted to behave in a reasonable manner (this approach is regularly used in the software business). Software metrics are deployed to give the impression that mistakes are likely to exist, without identifying any specific mistakes in the code, e.g., this metric is much higher than is considered reasonable. Where did these reasonable values come from? Someone, somewhere said something, the Moon aligned with Mars, and these values became accepted ‘wisdom’ (no, reality is not allowed to intrude; the case is made by arguing from authority). McCabe’s complexity metric is a favorite, and I have written how use of this metric is essentially accounting fraud (I have had emails from several people who are very unhappy about me saying this). Halstead’s metrics are another favorite, and at least Halstead and others at the time did some empirical analysis (the results showed how ineffective the metrics were; the metrics don’t calculate the quantities claimed).

The software development process used to create software is another popular means of character assassination. People seem to take comfort in the idea that software was created using a defined process, and use of ad-hoc methods provides an easy target for ridicule. Some processes work because they include lots of testing, and doing lots of testing will of course improve reliability. I have seen development groups use a process and fail to produce reliable software, and I have seen ad-hoc methods produce reliable software.

From what I can tell, some expert witnesses are chosen for their ability to project an air of authority and having impressive sounding credentials, not for their hands-on ability to dissect code. In other words, just the kind of person needed for a legal strategy based on character assassination, or guilt by innuendo.

What is the most cost-effective way of finding reliability problems in software built from 10k+ lines of code? My money is on fuzz testing, a term that should send shivers down the spine of a defense team. Source code is not required, and the output is a list of real fault experiences. There are a few catches: 1) the software probably needs to be run in the cloud (perhaps the only cost/time effective way of running the many thousands of tests), and the defense is going to object over licensing issues (they don’t want the code fuzzed), 2) having lots of test harnesses interacting with a central database is likely to be problematic, 3) support for emulating embedded cpus, even commonly used ones like the Z80, is currently poor (this is a rapidly evolving area, so check current status).

Fuzzing can also be used to estimate the numbers of so-far undetected coding mistakes.
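To make the idea concrete, here is a minimal hand-rolled fuzzing harness. Everything in it is a hypothetical sketch: the function under test (midpoint, with a classic latent overflow bug), the oracle, and the LCG input generator are all invented for illustration — no source reading of the target is needed, only an executable and an oracle.

```haskell
-- Sketch of fuzz testing: throw pseudo-random inputs at a function and
-- count the fault experiences an oracle detects. (Hypothetical example.)

-- The function under test claims to compute the midpoint of two Ints.
midpoint :: Int -> Int -> Int
midpoint a b = (a + b) `div` 2          -- latent bug: a + b can overflow

-- Oracle: a correct midpoint must lie between its two arguments.
faulty :: Int -> Int -> Bool
faulty a b = let m = midpoint a b in m < min a b || m > max a b

-- Deterministic pseudo-random inputs from a linear congruential generator.
stream :: [Int]
stream = iterate (\s -> 6364136223846793005 * s + 1442695040888963407) 2021

main :: IO ()
main = do
  let pairs  = take 10000 (zip stream (drop 1 stream))
      faults = filter (uncurry faulty) pairs
  putStrLn (show (length faults) ++ " fault experiences in 10000 random tests")
```

With uniformly random 64-bit pairs the overflow triggers on a sizable fraction of inputs, so even this crude harness reports faults within seconds; real fuzzers add coverage feedback and input mutation on top of the same loop.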

Ponylang (SeanTAllen)

Last Week in Pony - December 27, 2020 December 27, 2020 09:54 PM

Pony 0.38.2 has been released! Audio from the December 15 sync call is available.

Jeff Carpenter (jeffcarp)

2020 Year in Review December 27, 2020 12:00 AM

2020. What can I say? It was a year I don’t think any of us need to live again. Here’s a brief overview of my year. Life In February on Valentine’s day, Elva and I adopted Noona(!), an 8-year-old Siberian Husky. She’s the chillest, sweetest dog. Due to lockdown starting 2 weeks later, I’m sure she thinks that spending all day at the house giving her attention is normal.

December 26, 2020

Pete Corey (petecorey)

MongoDB Lookup Aggregation December 26, 2020 12:00 AM

Recently, I received an email from a reader asking for tips on writing a MongoDB aggregation that combines the layers of a tree, stored in separate collections, into a single document:

Hi Pete,

I had a question related to your article on MongoDB object array lookup aggregations.

I’m working on something similar, but with a small difference. Imagine I have three collections that represent the different layers of a tree. A is the root. B are the children of A, and C are the children of B. Each child holds the ID of its parent in a parentId field.

The end goal is to write an aggregation that fleshes out every layer of the tree:

A: {
  B: [
    { C: [ … ] }
  ]
}

How should I approach this? Thanks.

Hello friend,

I feel your pain. Writing MongoDB aggregations feels like an under-documented dark art. In newer versions of Mongo you can write sub-pipelines under $lookup stages. I think this will get you where you want to go:

db.a.aggregate([
  {
    $lookup: {
      from: 'b',
      let: { "id": '$_id' },
      as: 'b',
      pipeline: [
        { $match: { $expr: { $eq: ['$$id', '$parentId'] } } },
        {
          $lookup: {
            from: 'c',
            let: { "id": '$_id' },
            as: 'c',
            pipeline: [
              { $match: { $expr: { $eq: ['$$id', '$parentId'] } } }
            ]
          }
        }
      ]
    }
  }
]);

You can keep adding sub-pipelines until you get as deep as you need.

I hope that helps.


December 23, 2020

Oliver Charles (ocharles)

Monad Transformers and Effects with Backpack December 23, 2020 12:00 AM

A good few years ago Edward Yang gifted us an implementation of Backpack - a way for us to essentially abstract modules over other modules, allowing us to write code independently of implementation. A big benefit of doing this is that it opens up new avenues for program optimization. When we provide concrete instantiations of signatures, GHC compiles them as if they were the original code we wrote, and we can benefit from a lot of specialization. So aside from organizational concerns, Backpack gives us the ability to write some really fast code. This benefit isn’t just theoretical - Edward Kmett gave us unpacked-containers, removing a level of indirection from all keys, and Oleg Grenrus showed us how we can use Backpack to “unroll” fixed sized vectors. In this post, I want to show how we can use Backpack to give us the performance benefits of explicit transformers, but without having library code commit to any specific stack. In short, we get the ability to have multiple interpretations of our program, but without paying the performance cost of abstraction.

The Problem

Before we start looking at any code, let’s look at some requirements, and understand the problems that come with some potential solutions. The main requirement is that we are able to write code that requires some effects (in essence, writing our code to an effect interface), and then run this code with different interpretations. For example, in production I might want to run as fast as possible, in local development I might want further diagnostics, and in testing I might want a pure or in memory solution. This change in representation shouldn’t require me to change the underlying library code.

Seasoned Haskellers might be familiar with the use of effect systems to solve these kinds of problems. Perhaps the most familiar is the mtl approach - perhaps unfortunately named, as the technique itself doesn’t have much to do with the library. In the mtl approach, we write our interfaces as type classes abstracting over some Monad m, and then provide instances of these type classes - either by stacking transformers (“plucking constraints”, in the words of Matt Parson), or by a “mega monad” that implements many of these instances at once (e.g., Tweag’s capability approach).

Despite a few annoyances (e.g., the “n+k” problem, the lack of implementations being first-class, and a few other things), this approach can work well. It also has the potential to generate great code, but in practice it’s rarely possible to achieve maximal performance. In her excellent talk “Effects for Less”, Alexis King hits the nail on the head - despite being able to provide good code for the implementations of particular parts of an effect, the majority of effectful code is really just threading around inside the Monad constraint. When we’re being polymorphic over any Monad m, GHC is at a loss to do any further optimization - and how could it? We know nothing more than “there will be some >>= function when you get here, promise!” Let’s look at this in a bit more detail.

Say we have the following:

foo :: Monad m => m Int
foo = go 0 1_000_000_000
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

This is obviously “I needed an example for my blog” levels of contrived, but at least small. How does it execute? What are the runtime consequences of this code? To answer, we’ll go all the way down to the STG level with -ddump-stg:

$wfoo =
    \r [ww_s2FA ww1_s2FB]
        let {
          Rec {
          $sgo_s2FC =
              \r [sc_s2FD sc1_s2FE]
                  case eqInteger# sc_s2FD lvl1_r2Fp of {
                    __DEFAULT ->
                        let {
                          sat_s2FK =
                              \u []
                                  case +# [sc1_s2FE 1#] of sat_s2FJ {
                                    __DEFAULT ->
                                        case minusInteger sc_s2FD lvl_r2Fo of sat_s2FI {
                                          __DEFAULT -> $sgo_s2FC sat_s2FI sat_s2FJ;
                                  }; } in
                        let {
                          sat_s2FH =
                              \u []
                                  let { sat_s2FG = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FG;
                        } in  ww_s2FA sat_s2FH sat_s2FK;
                    1# ->
                        let { sat_s2FL = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FL;
          end Rec }
        } in  $sgo_s2FC lvl2_r2Fq 0#;

foo =
    \r [w_s2FM]
        case w_s2FM of {
          C:Monad _ _ ww3_s2FQ ww4_s2FR -> $wfoo ww3_s2FQ ww4_s2FR;

In STG, whenever we have a let we have to do a heap allocation - and this code has quite a few! Of particular interest is what’s going on inside the actual loop $sgo_s2FC. This loop first compares i to see if it’s 0. In the case that it’s not, we allocate two objects and call ww_s2FA. If you squint, you’ll notice that ww_s2FA is the first argument to $wfoo, and it ultimately comes from unpacking a C:Monad dictionary. I’ll save you the labor of working out what this is - ww_s2FA is the >>. We can see that every iteration of our loop incurs two allocations - one for each argument to >>. A heap allocation doesn’t come for free - not only do we have to do the allocation, the entry into the heap incurs a pointer indirection (as heap objects have an info table that points to their entry), and merely being on the heap increases our GC time, as we have a bigger heap to traverse. While my STG knowledge isn’t great, my understanding of this code is that every time we want to call >>, we need to supply it with its arguments. This means we have to allocate two closures for this function call - which happens basically whenever we pressed “return” on our keyboard when we wrote the code. This seems crazy - can you imagine being told in C that merely using ; would cost time and memory?

If we compile this code in a separate module, mark it as {-# NOINLINE #-}, and then call it from main - how’s the performance? Let’s check!

module Main (main) where

import Foo

main :: IO ()
main = print =<< foo

$ ./Main +RTS -s
 176,000,051,368 bytes allocated in the heap
       8,159,080 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     169836 colls,     0 par    0.358s   0.338s     0.0000s    0.0001s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   54.589s  ( 54.627s elapsed)
  GC      time    0.358s  (  0.338s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   54.947s  ( 54.965s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,224,078,302 bytes per MUT second

  Productivity  99.3% of total user, 99.4% of total elapsed

OUCH. My i7 laptop took almost a minute to iterate a loop 1 billion times.

A little disclaimer: I’m intentionally painting a severe picture here - in practice this cost is irrelevant to all but the most performance sensitive programs. Also, notice where the let bindings are in the STG above - they are nested within the loop. This means that we’re essentially allocating “as we go” - these allocations are incredibly cheap, and the growth in GC work is equally trivial, resulting in more like constant GC pressure, rather than impending doom. For code that is likely to do any IO, this cost is likely negligible compared to the rest of the work. Nonetheless, it is there, and when it’s there, it’s nice to know if there are alternatives.

So, is the TL;DR that Haskell is completely incapable of writing effectful code? No, of course not. There is another way to compile this program, but we need a bit more information. If we happen to know what m is and we have access to the Monad dictionary for m, then we might be able to inline >>=. When we do this, GHC can be a lot smarter. The end result is code that now doesn’t allocate for every single >>=, and instead just gets on with doing work. One trivial way to witness this is to define everything in a single module (Alexis rightly points out this is a trap for benchmarking that many fall into, but for our uses it’s the behavior we actually want).

This time, let’s write everything in one module:

module Main ( main ) where
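The original post elides the body; a minimal sketch of the rest of the single-module program follows. Note one assumption of this sketch: the iteration count is reduced to 100000 so it runs quickly (the post’s measurements use 1,000,000,000).

```haskell
-- `foo` now lives in the same module as `main`, so GHC sees that m = IO
-- and can specialize away the Monad dictionary.
-- (Iteration count reduced from the post's 1,000,000,000 for a quick run.)
foo :: Monad m => m Int
foo = go 0 100000
  where
    go :: Monad m => Int -> Int -> m Int
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

main :: IO ()
main = print =<< foo
```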

And the STG:

lvl_r4AM = CCS_DONT_CARE S#! [0#];

lvl1_r4AN = CCS_DONT_CARE S#! [1#];

Rec {
main_$sgo =
    \r [void_0E sc1_s4AY sc2_s4AZ]
        case eqInteger# sc1_s4AY lvl_r4AM of {
          __DEFAULT ->
              case +# [sc2_s4AZ 1#] of sat_s4B2 {
                __DEFAULT ->
                    case minusInteger sc1_s4AY lvl1_r4AN of sat_s4B1 {
                      __DEFAULT -> main_$sgo void# sat_s4B1 sat_s4B2;
          1# -> let { sat_s4B3 = CCCS I#! [sc2_s4AZ]; } in  Unit# [sat_s4B3];
end Rec }

main2 = CCS_DONT_CARE S#! [1000000000#];

main1 =
    \r [void_0E]
        case main_$sgo void# main2 0# of {
          Unit# ipv1_s4B7 ->
              let { sat_s4B8 = \s [] $fShowInt_$cshow ipv1_s4B7;
              } in  hPutStr' stdout sat_s4B8 True void#;

main = \r [void_0E] main1 void#;

main3 = \r [void_0E] runMainIO1 main1 void#;

main = \r [void_0E] main3 void#;

The same program compiles down to a much tighter loop that is almost entirely free of allocations. In fact, the only allocation that happens is when the loop terminates, and it’s just boxing the unboxed integer that’s been accumulating in the loop.

As we might hope, the performance of this is much better:

$ ./Main +RTS -s
  16,000,051,312 bytes allocated in the heap
         128,976 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15258 colls,     0 par    0.031s   0.029s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.402s  (  9.405s elapsed)
  GC      time    0.031s  (  0.029s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    9.434s  (  9.434s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,701,712,595 bytes per MUT second

  Productivity  99.7% of total user, 99.7% of total elapsed

Our time in the garbage collector dropped by a factor of 10, from 0.3s to 0.03s. Our total allocation dropped from 176GB (yes, you read that right) to 16GB (I’m still not entirely sure what this means, maybe someone can enlighten me). Most importantly our total runtime dropped from 54s to just under 10s. All this from just knowing what m is at compile time.

So GHC is capable of producing excellent code for monads - what are the circumstances under which this happens? We need, at least:

  1. The source code of the thing we’re compiling must be available. This means it’s either defined in the same module, or is available with an INLINABLE pragma (or GHC has chosen to add this itself).

  2. The definitions of >>= and friends must also be available in the same way.

These constraints start to feel a lot like needing whole program compilation, and in practice are unreasonable constraints to reach. To understand why, consider that most real world programs have a small Main module that opens some connections or opens some file handles, and then calls some library code defined in another module. If this code in the other module was already compiled, it will (probably) have been compiled as a function that takes a Monad dictionary, and just calls the >>= function repeatedly in the same manner as our original STG code. To get the allocation-free version, this library code needs to be available to the Main module itself - as that’s the module choosing what type to instantiate m with - which means the library code has to have marked that code as being inlinable. While we could add INLINE everywhere, this leads to an explosion in the amount of code produced, and can skyrocket compilation times.

Alexis’ eff library works around this by not being polymorphic in m. Instead, it chooses a concrete monad with all sorts of fancy continuation features. Likewise, if we commit to a particular monad (a transformer stack, or maybe using RIO), we again avoid this cost. Essentially, if the monad is known a priori at time of module compilation, GHC can go to town. However, the latter also commits to semantics - by choosing a transformer stack, we’re choosing a semantics for our monadic effects.

With the scene set, I now want to present you with another approach to solving this problem using Backpack.

A Backpack Primer

Vanilla GHC has a very simple module system - modules are essentially a method for name-spacing and separate compilation, they don’t do much more. The Backpack project extends this module system with a new concept - signatures. A signature is like the “type” of a module - a signature might mention the presence of some types, functions and type class instances, but it says nothing about what the definitions of these entities are. We’re going to (ab)use this system to build up transformer stacks at configuration time, and allow our library to be abstracted over different monads. By instantiating our library code with different monads, we get different interpretations of the same program.

I won’t sugar coat - what follows is going to be pretty miserable. Extremely fun, but miserable to write in practice. I’ll let you decide if you want to inflict this misery on your coworkers in practice - I’m just here to show you it can be done!

A Signature for Monads

The first thing we’ll need is a signature for data types that are monads. This is essentially the “hole” we’ll rely on with our library code - it will give us the ability to say “there exists a monad”, without committing to any particular choice.

In our Cabal file, we have:

library monad-sig
  hs-source-dirs:   src-monad-sig
  signatures:       Control.Monad.Signature
  default-language: Haskell2010
  build-depends:    base

The important line here is signatures: Control.Monad.Signature which shows that this library is incomplete and exports a signature. The definition of Control/Monad/Signature.hsig is:

signature Control.Monad.Signature where

data M a
instance Functor M
instance Applicative M
instance Monad M

This simply states that any module with this signature has some type M with instances of Functor, Applicative and Monad.
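To make the idea concrete, here is a hypothetical module that could fill this signature, instantiating M with Identity (the post later does the same trick with type M = IO). In a real Backpack setup this would live in its own library; it is shown here as an ordinary runnable module for illustration.

```haskell
module Main where

import Data.Functor.Identity (Identity, runIdentity)

-- A type synonym is enough to fill the signature's abstract `data M a`,
-- provided the required instances exist (Identity has them all).
type M = Identity

demo :: M Int
demo = pure 42

main :: IO ()
main = print (runIdentity demo)
```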

Next, we’ll put that signature to use in our library code.

Library Code

For our library code, we’ll start with a new library in our Cabal file:

library business-logic
  hs-source-dirs:   lib
  signatures:       BusinessLogic.Monad
  exposed-modules:  BusinessLogic
  build-depends:
      base
    , fused-effects
    , monad-sig
  default-language: Haskell2010
  mixins:
    monad-sig requires (Control.Monad.Signature as BusinessLogic.Monad)

Our business-logic library itself exports a signature, which is really just a re-export of the Control.Monad.Signature, but we rename it something more meaningful. It’s this module that will provide the monad that has all of the effects we need. Along with this signature, we also export the BusinessLogic module:

{-# language FlexibleContexts #-}
module BusinessLogic where

import BusinessLogic.Monad ( M )
import Control.Algebra ( Has )
import Control.Effect.Empty ( Empty, guard )

businessCode :: Has Empty sig M => Bool -> M Int
businessCode b = do
  guard b
  return 42

In this module I’m using fused-effects as a framework to say which effects my monad should have (though this is not particularly important, I just like it!). Usually Has would be applied to a type variable m, but here we’re applying it to the type M. This type comes from BusinessLogic.Monad, which is a signature (you can confirm this by checking against the Cabal file). Other than that, this is all pretty standard!

Backpack-ing Monad Transformers

Now we get into the really fun stuff - providing implementations of effects. I mentioned earlier that one possible way to do this is with a stack of monad transformers. Generally speaking, one would write a single newtype T m a for each effect type class, and have that transformer dispatch any effects in that class, and lift any effects from other classes - deferring their implementation to m.

We’re going to take the same approach here, but we’ll absorb the idea of a transformer directly into the module itself. Let’s look at an implementation of the Empty effect. The Empty effect gives us a special empty :: m a function, which serves the purpose of stopping execution immediately. As a monad transformer, one implementation is MaybeT:

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
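As a quick standalone reminder of how this transformer short-circuits, here is a small example using MaybeT from the transformers package; stop is a hypothetical helper standing in for the Empty effect’s empty.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))

-- `stop` plays the role of `empty`: it halts the rest of the computation.
stop :: Monad m => MaybeT m a
stop = MaybeT (return Nothing)

demo :: Monad m => Bool -> MaybeT m Int
demo b = do
  if b then return () else stop   -- behaves like `guard b`
  return 42

main :: IO ()
main = do
  print =<< runMaybeT (demo True)    -- Just 42
  print =<< runMaybeT (demo False)   -- Nothing
```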

But we can also write this using Backpack. First, our Cabal library:

library fused-effects-empty-maybe
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:
      base
    , fused-effects
    , monad-sig
  exposed-modules:  Control.Carrier.Backpack.Empty.Maybe
  mixins:
    monad-sig requires (Control.Monad.Signature as Control.Carrier.Backpack.Empty.Maybe.Base)

Our library exports the module Control.Carrier.Backpack.Empty.Maybe, but also has a hole - the type of base monad this transformer stacks on top of. As a monad transformer, this would be the m parameter, but when we use Backpack, we move that out into a separate module.

The implementation of Control.Carrier.Backpack.Empty.Maybe is short, and almost identical to the body of Control.Monad.Trans.Maybe - we just change any occurrences of m to instead refer to M from our .Base module:

{-# language BlockArguments, FlexibleContexts, FlexibleInstances, LambdaCase,
      MultiParamTypeClasses, TypeOperators, UndecidableInstances #-}

module Control.Carrier.Backpack.Empty.Maybe where

import Control.Algebra
import Control.Effect.Empty
import qualified Control.Carrier.Backpack.Empty.Maybe.Base as Base

type M = EmptyT

-- We could also write: newtype EmptyT a = EmptyT { runEmpty :: MaybeT Base.M a }
newtype EmptyT a = EmptyT { runEmpty :: Base.M (Maybe a) }

instance Functor EmptyT where
  fmap f (EmptyT m) = EmptyT $ fmap (fmap f) m

instance Applicative EmptyT where
  pure = EmptyT . pure . Just
  EmptyT f <*> EmptyT x = EmptyT do
    f >>= \case
      Nothing -> return Nothing
      Just f' -> x >>= \case
        Nothing -> return Nothing
        Just x' -> return (Just (f' x'))

instance Monad EmptyT where
  return = pure
  EmptyT x >>= f = EmptyT do
    x >>= \case
      Just x' -> runEmpty (f x')
      Nothing -> return Nothing

Finally, we make sure that EmptyT can handle the Empty effect:

instance Algebra sig Base.M => Algebra (Empty :+: sig) EmptyT where
  alg handle sig context = case sig of
    L Empty -> EmptyT $ return Nothing
    R other -> EmptyT $ thread (maybe (pure Nothing) runEmpty ~<~ handle) other (Just context)

Base Monads

Now that we have a way to run the Empty effect, we need a base case to our transformer stack. As our transformer is now built out of modules that conform to the Control.Monad.Signature signature, we need some modules for each monad that we could use as a base. For this POC, I’ve just added the IO monad:

library fused-effects-lift-io
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:    base
  exposed-modules:  Control.Carrier.Backpack.Lift.IO

The module itself just equates M with IO:

module Control.Carrier.Backpack.Lift.IO where
type M = IO

That’s it!

Putting It All Together

Finally we can put all of this together into an actual executable. We’ll take our library code, instantiate the monad to be a combination of EmptyT and IO, and write a little main function that unwraps this all into an IO type. First, here’s the Main module:

module Main where

import BusinessLogic
import qualified BusinessLogic.Monad

main :: IO ()
main = print =<< BusinessLogic.Monad.runEmptyT (businessCode True)

The BusinessLogic module we’ve seen before, but previously BusinessLogic.Monad was a signature (remember, we renamed Control.Monad.Signature to BusinessLogic.Monad). In executables, you can’t have signatures - executables can’t be depended on, so it doesn’t make sense for them to have holes, they must be complete. The magic happens in our Cabal file:

executable test
  main-is:          Main.hs
  hs-source-dirs:   exe
  build-depends:
      base
    , business-logic
    , fused-effects-empty-maybe
    , fused-effects-lift-io
    , transformers
  default-language: Haskell2010
  mixins:
    fused-effects-empty-maybe (Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad)
      requires (Control.Carrier.Backpack.Empty.Maybe.Base as BusinessLogic.Monad.Base),
    fused-effects-lift-io (Control.Carrier.Backpack.Lift.IO as BusinessLogic.Monad.Base)

Wow, that’s a mouthful! The work is really happening in mixins. Let’s take this step by step:

  1. First, we can see that we need to mixin the fused-effects-empty-maybe library. The first (X as Y) section specifies a list of modules from fused-effects-empty-maybe and renames them for the test executable that’s currently being compiled. Here, we’re renaming Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad. By doing this, we satisfy the hole in the business-logic library, which was otherwise incomplete.

  2. But fused-effects-empty-maybe itself has a hole - the base monad for the transformer. The requires part lets us rename this hole, but we’ll still need to plug it. For now, we rename Control.Carrier.Backpack.Empty.Maybe.Base to BusinessLogic.Monad.Base.

  3. Next, we mixin the fused-effects-lift-io library, and rename Control.Carrier.Backpack.Lift.IO to be BusinessLogic.Monad.Base. We’ve now satisfied the hole for fused-effects-empty-maybe, and our executable has no more holes and can be compiled.

We’re Done!

That’s “all” there is to it. We can finally run our program:

$ cabal run
Just 42

If you compare against businessCode you’ll see that we got past the guard and returned 42. Because we instantiated BusinessLogic.Monad with a MaybeT-like transformer, this 42 got wrapped up in Just.

Is This Fast?

The best check here is to just look at the underlying code itself. If we add

{-# options -ddump-simpl -ddump-stg -dsuppress-all #-}

to BusinessLogic and recompile, we’ll see the final code output to STDERR. The core is:

  = \ @ sig_a2cM _ b_a13P eta_B1 ->
      case b_a13P of {
        False -> (# eta_B1, Nothing #);
        True -> (# eta_B1, lvl1_r2NP #)

and the STG:

businessCode1 =
    \r [$d(%,%)_s2PE b_s2PF eta_s2PG]
        case b_s2PF of {
          False -> (#,#) [eta_s2PG Nothing];
          True -> (#,#) [eta_s2PG lvl1_r2NP];



In this post, I’ve hopefully shown how we can use Backpack to write effectful code without paying the cost of abstraction. What I didn’t answer is the question of whether or not you should. There’s a lot more to effectful code than I’ve presented, and it’s unclear to me whether this approach can scale to those needs. For example, if we needed something like mmorph’s MFunctor, what do we do? Are we stuck? I don’t know! Beyond these technical challenges, it’s clear that Backpack, as is, is not remotely ergonomic. We’ve had to write five components just to get this done, and I pray for anyone who comes to read this code and has to orientate themselves.

Nonetheless, I think this an interesting point of the effect design space that hasn’t been explored, and maybe I’ve motivated some people to do some further exploration.

The code for this blog post can be found at

Happy holidays, all!

December 21, 2020

Sevan Janiyan (sevan)

LFS, round #4 December 21, 2020 09:49 PM

Haven’t made any progress for a couple of weeks but things came together and instrumenting libc works as expected. One example demonstrated in section 12.2.2, chapter 12 of the BPF Performance book is attempting to instrument bash compiled without frame pointers where you only see a call to the read function. Compiling with -fno-omit-frame-pointer produces …

December 20, 2020

Derek Jones (derek-jones)

Many coding mistakes are not immediately detectable December 20, 2020 10:04 PM

Earlier this week I was reading a paper discussing one aspect of the legal fallout around the UK Post Office’s Horizon IT system, and was surprised by the view cited in the Law Commission’s Evidence in Criminal Proceedings: Hearsay and Related Topics on the subject of computer evidence (page 204): “most computer error is either immediately detectable or results from error in the data entered into the machine”.

What? Do I need to waste any time explaining why this is nonsense? It’s obvious nonsense to anybody working in software development, but this view is being expressed in law-related documents; then again, what do lawyers know about software development?

Sometimes fallacies become accepted as fact, and a lot of effort is required to expunge them from cultural folklore. Regular readers of this blog will have seen some of my posts on long-standing fallacies in software engineering. It’s worth collecting together some primary evidence that most software mistakes are not immediately detectable.

A paper by Professor Tapper of Oxford University is cited as the source (yes, Oxford, home of mathematical orgasms in software engineering). Tapper’s job title is Reader in Law, and on page 248 he does say: “This seems quite extraordinarily lax, given that most computer error is either immediately detectable or results from error in the data entered into the machine.” So this is not a case of his words being misinterpreted or taken out of context.

Detecting many computer errors is resource intensive, in elapsed time, manpower, and compute time. The following general summary provides some of the evidence for this assertion.

Two events need to occur for a fault experience to occur when running software:

  • a mistake has been made when writing the source code. Mistakes include: a misunderstanding of what the behavior should be, using an algorithm that does not have the desired behavior, or a typo,
  • the program processes input values that interact with a coding mistake in a way that produces a fault experience.

That people can make different mistakes is general knowledge. It is my experience that people underestimate the variability of the range of values that are presented as inputs to a program.
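As a toy illustration of the two-event model, here is a hypothetical example (in Haskell, invented for this sketch, not from any study cited here): the coding mistake is present from day one, but only a small fraction of input values interact with it to produce a fault experience.

```haskell
-- The mistake: the developer only handled the common case of the
-- Gregorian leap-year rule.
isLeap :: Int -> Bool
isLeap y = y `mod` 4 == 0          -- ignores the 100- and 400-year rules

-- A correct reference implementation, used here as the oracle.
isLeapRef :: Int -> Bool
isLeapRef y = (y `mod` 4 == 0 && y `mod` 100 /= 0) || y `mod` 400 == 0

main :: IO ()
main = do
  -- Only century years not divisible by 400 trigger the mistake.
  let faults = [ y | y <- [1 .. 2000], isLeap y /= isLeapRef y ]
  putStrLn (show (length faults) ++ " of the first 2000 inputs produce a fault experience")
```

Most inputs behave correctly (1996 is handled fine); the mistake only becomes a fault experience for inputs like 1900, i.e., 15 of the first 2000 values.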

A study by Nagel and Skrivan shows how variability in input values results in faults being experienced at different times, and how different people make different coding mistakes. The study had three experienced developers independently implement the same specification. Each of the three implementations was then tested, multiple times. The iteration sequence was: 1) run the program until a fault is experienced, 2) fix the fault, 3) if fewer than five faults have been experienced, goto step (1). This process was repeated 50 times, always starting with the original (uncorrected) implementation; the replications varied this, along with the number of inputs used.
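
The run/fix/repeat loop is easy to simulate. The per-fault trigger probabilities below are made-up values for illustration, not numbers from the Nagel and Skrivan study:

```python
import random

# Hypothetical per-fault trigger probabilities: the chance that a random
# input interacts with each coding mistake (illustrative values only).
FAULT_TRIGGER_PROB = {"A": 0.05, "B": 0.01, "C": 0.001}

def run_until_fault(active_faults, rng):
    """Step 1: process inputs until one of the remaining faults is triggered;
    return (fault_id, number_of_inputs_processed)."""
    inputs = 0
    while True:
        inputs += 1
        for fault, p in FAULT_TRIGGER_PROB.items():
            if fault in active_faults and rng.random() < p:
                return fault, inputs

def one_replication(rng, max_faults=3):
    """Steps 1-3: run until a fault is experienced, fix it, repeat."""
    active = set(FAULT_TRIGGER_PROB)
    history = []
    while active and len(history) < max_faults:
        fault, inputs = run_until_fault(active, rng)
        history.append((fault, inputs))
        active.remove(fault)  # "fix" the mistake before the next run
    return history

# Each replication starts from the original (uncorrected) implementation.
print(one_replication(random.Random(1)))
```

Faults with low trigger probabilities (fault "C" here) typically need orders of magnitude more inputs before they are experienced, which is the pattern the plot of the Nagel and Skrivan data shows.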

How many input values needed to be processed, on average, before a particular fault is experienced? The plot below (code+data) shows the numbers of inputs processed, by one of the implementations, before individual faults were experienced, over 50 runs (sorted by number of inputs needed before the fault was experienced):

Number of inputs processed before particular fault experienced

The plot illustrates that some coding mistakes are more likely to produce a fault experience than others (because they are more likely to interact with input values in a way that generates a fault experience), and it also shows how the number of inputs values processed before a particular fault is experienced varies between coding mistakes.

Real-world evidence of the impact of user input on reported faults is provided by the Ultimate Debian Database, which provides information on the number of reported faults and the number of installs for 14,565 packages. The plot below shows how the number of reported faults increases with the number of times a package has been installed; one interpretation is that with more installs there is a wider variety of input values (increasing the likelihood of a fault experience), another is that with more installs there is a larger pool of people available to report a fault experience. Green line is a fitted power law, faultsReported=1.3*installs^{0.3}, blue line is a fitted loess model.

Number of reported faults against number of package installs
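
A power law such as the fitted faultsReported=1.3*installs^{0.3} is linear in log space, so ordinary least squares on the logs recovers it. The data below is synthetic, generated from the fitted curve, since the Ultimate Debian Database extract is not included here:

```python
import math
import random

rng = random.Random(0)

# Synthetic (installs, faultsReported) pairs drawn from the article's fitted
# power law, faults = 1.3 * installs**0.3, with multiplicative noise.
data = []
for _ in range(2000):
    installs = 10 ** rng.uniform(1, 6)
    faults = 1.3 * installs ** 0.3 * math.exp(rng.gauss(0, 0.2))
    data.append((installs, faults))

# log(faults) = log(a) + b*log(installs): fit with simple least squares.
xs = [math.log(i) for i, _ in data]
ys = [math.log(f) for _, f in data]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = math.exp(my - b * mx)
print(f"faultsReported ~ {a:.2f} * installs^{b:.2f}")
```

With enough data points the fit recovers the exponent and constant closely; a loess model, as used for the blue line, makes no such functional-form assumption.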

The source containing a mistake may be executed without a fault being experienced; reasons for this include:

  • the input values don’t result in the incorrect code behaving differently from the correct code. For instance, given the made-up incorrect code if (x < 8) (i.e., 8 was typed rather than 7), the comparison only produces behavior that differs from the correct code when x has the value 7,
  • the input values result in the incorrect code behaving differently than the correct code, but the subsequent path through the code produces the intended external behavior.
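
The first point can be checked directly; a sketch of the article's made-up mistake:

```python
def correct(x):
    return x < 7    # intended condition

def incorrect(x):
    return x < 8    # the typo: 8 typed rather than 7

# The two versions disagree on exactly one input value.
differing = [x for x in range(-1000, 1000) if correct(x) != incorrect(x)]
print(differing)  # [7]
```

Any input distribution that rarely produces x == 7 will rarely expose this mistake, which is why such faults can survive a long time in production.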

Some of the studies that have investigated the program behavior after a mistake has deliberately been introduced include:

  • checking the later behavior of a program after modifying the value of a variable in various parts of the source; the results found that some parts of a program were more susceptible to behavioral modification (i.e., runtime behavior changed) than others (i.e., runtime behavior did not change),
  • checking whether a program compiles and if its runtime behavior is unchanged after random changes to its source code (in this study, short programs written in 10 different languages were used),
  • 80% of radiation induced bit-flips have been found to have no externally detectable effect on program behavior.
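
The second study's idea, mutating source code and checking whether it still compiles and whether runtime behaviour changes, can be sketched in miniature. This is a toy single-operator mutation, far cruder than the studies cited:

```python
SRC = "def mean(xs):\n    return sum(xs) / len(xs)\n"

def behaviour(fn, inputs):
    """Record the function's output (or exception name) on each input."""
    out = []
    for xs in inputs:
        try:
            out.append(fn(xs))
        except Exception as exc:
            out.append(type(exc).__name__)
    return out

INPUTS = [[1, 2, 3], [10], [2, 4, 6, 8]]
ns = {}
exec(SRC, ns)
original = behaviour(ns["mean"], INPUTS)

# Apply one small mutation at a time, recompile, and compare behaviour.
results = {}
for old, new in [("/", "*"), ("sum", "max"), ("/", "//")]:
    mutated_ns = {}
    try:
        exec(SRC.replace(old, new, 1), mutated_ns)  # "does it still compile?"
    except SyntaxError:
        results[f"{old}->{new}"] = "does not compile"
        continue
    changed = behaviour(mutated_ns["mean"], INPUTS) != original
    results[f"{old}->{new}"] = "changed" if changed else "unchanged"

print(results)
```

Note that the `/` to `//` mutation leaves observable behaviour unchanged on these inputs (the quotients happen to be whole numbers), illustrating how a real coding mistake can go undetected by a given set of test inputs.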

What are the economic costs and benefits of finding and fixing coding mistakes before shipping vs. waiting to fix just those faults reported by customers?

Checking that a software system exhibits the intended behavior takes time and money, and the organization involved may not be receiving any benefit from its investment until the system starts being used.

In some applications the cost of a fault experience is very high (e.g., lowering the landing gear on a commercial aircraft), and it is cost-effective to make a large investment in gaining a high degree of confidence that the software behaves as expected.

In a changing commercial world software systems can become out of date, or superseded by new products. Given the lifetime of a typical system, it is often cost-effective to ship a system expected to contain many coding mistakes, provided the mistakes are unlikely to be executed by typical customer input in a way that produces a fault experience.

Beta testing provides selected customers with an early version of a new release. The benefit to the software vendor is targeted information about remaining coding mistakes that need to be fixed to reduce customer fault experiences, and the benefit to the customer is checking compatibility of their existing work practices with the new release (also, some people enjoy being able to brag about being a beta tester).

  • One study found that source containing a coding mistake was less likely to be changed due to fixing the mistake than changed for other reasons (that had the effect of causing the mistake to disappear),
  • Software systems don't live forever; systems are replaced or cease being used. The plot below shows the lifetime of 202 Google applications (half-life 2.9 years) and 95 Japanese mainframe applications from the 1990s (half-life 5 years; code+data).

    Number of software systems having a given lifetime, in days

Not only are most coding mistakes not immediately detectable, there may be sound economic reasons for not investing in detecting many of them.

December 19, 2020

Jeff Carpenter (jeffcarp)

How to clone a Google Source Repository in Working Copy iOS December 19, 2020 09:00 AM

I recently went through this process and couldn’t find a guide (though I swear one existed at some point in the past). Here’s how to clone a git repository from Google Source Repositories in the Working Copy iOS or iPadOS app:

  1. Navigate to Google Source Repositories, pick out the repo you want to clone, and open the clone dialog (it looks like a “+” icon).
  2. Go to the “Manually generated credentials” tab.
  3. Click on “Generate and store your Git credentials” and go through the authentication flow.
  4. You should be prompted to copy a shell command onto your clipboard.
  5. Go to the Working Copy iOS app.
  6. Go to Settings (the gear icon) > Authentication Cookies.
  7. Tap the “+” icon and import from clipboard.

You should now be able to clone the repository using the https://source.

December 16, 2020

Pierre Chapuis (catwell)

Personal news, 2020 edition December 16, 2020 01:50 PM

I haven't posted anything here for 6 months so I thought it would be a good idea to post a personal news update before the end of 2020.

Shutting down Chilli

I haven't posted about it here yet, but my co-founder Julien and I decided to shut down Chilli at the end of last year, in agreement with eFounders. Basically, some of our initial hypotheses about the state of the market we were in (French SMBs) were wrong and the business couldn't be profitable enough.

In retrospect, it was an interesting experience. I really appreciated being part of eFounders, I do recommend it for people who want to start a B2B SaaS company.

However, I wanted to get out of my comfort bubble too much with this one, by tackling a problem in a market I didn't know. Because of that, I had to rely on others regarding core business choices, and while that was fine with me initially it ended up being frustrating in the end when things didn't work well and I couldn't help much. So if I start another company someday, it will be one where I know the problem domain better.

Joining Inch

After we shut down Chilli, I decided to join a small startup called Inch, which builds a SaaS CRM and ticketing solution for property managers, currently available only in the French market.

I didn't pick Inch randomly: I have known one of the founders for years, since he was a Moodstocks user! Fun fact: when I met him back then, he was looking for a technical co-founder for his project. I told him he should learn to code instead and gave him Ruby sample code... and this is why I am back to Rails today. Karma? :)

Anyway, I had been following Inch with interest since its creation, because it is the kind of company I like: solving a real need in a market where tools were either terrible or non-existent. Now they have a unique place in the market and some interesting technical challenges to solve, so I decided to join and help.

Starting a family

And here comes the last piece of news: despite all the madness that happened this year, the biggest change for me is that I am now a father! My son was born last week, and I took a holiday until the end of the year to spend time with him and his mother. I don't intend to post too much personal news here, but this one deserved it. =)

Awn Umar (awn)

rosen: censorship-resistant proxy tunnel December 16, 2020 12:00 AM


Many governments and other well-resourced actors around the world implement some form of censorship to assert control over the flow of information on the Internet, either because it is deemed “sensitive” or because it is inconvenient for those with power.

Suppose there is some adversary, Eve, that wants to prevent users from accessing some content. There are many ways of implementing such censorship but they broadly fall into one of two categories: endpoint-based or flow-fingerprinting attacks.

A user attempting to access censored material through an adversarial Internet service provider.

Eve could maintain a list of banned services and refuse to serve any network request she receives if it is on this list. Here Eve is deciding based on the destination of the request: this is endpoint-based censorship. In response a user, Alice, could use a VPN or TOR to disguise the destination so that from Eve’s perspective, the destination will appear to be the VPN server or the TOR entry node instead of the censored service.

This is a working solution in many places, but Eve is not beaten. In response she can add the IP addresses of known VPN providers as well as public TOR nodes to her blocklist, reasoning that only a user who wants to bypass her blocking efforts would use these services. Alice could then set up her own VPN server or access the TOR network through a non-public TOR bridge that is not blocked.

Eve could actively probe the servers that Alice connects to in order to find out if they are TOR entry nodes, for example, but apart from this she has stretched endpoint-based censorship to its limits. An alternative is to censor a connection based on characteristics of the network flow instead of its destination: this is flow-fingerprinting. This is usually accomplished using some kind of deep packet inspection engine that can detect and group traffic into protocols and applications. With this capability Eve can block any traffic that is detected as a proxy regardless of whether any particular server is known.

An adversary using a deep packet inspection engine to decide whether to censor traffic.

To bypass this technique, Alice must disguise the fingerprint of her traffic so that a DPI engine does not classify it as a blocked protocol or application. There are a few approaches to this:

  1. Randomisation. The goal here is to make the traffic look indistinguishable from randomness, or put another way, to make it look like “nothing”. This would successfully hide which category traffic belongs to, but a lack of a fingerprint is a fingerprint itself and that’s a vulnerability.

    Examples of randomising obfuscators include Obfsproxy and ScrambleSuit.

  2. Mimicry. Instead of making traffic look like random noise, mimicry-based obfuscation makes packets look like they belong to a specific protocol or application that is assumed to be unblocked. For example, StegoTorus and SkypeMorph produce traffic that looks like HTTP and Skype, respectively, but they are prohibitively slow.

    Another option is LibFTE which is roughly a cryptographic cipher that produces ciphertext conforming to a given regular expression. DPI engines also commonly use regular expressions so with LibFTE it is possible to precisely force misclassification of a protocol.

    Mimicry only tries to make packet payloads look like some cover protocol and so the syntax and semantics of the overall network flow can deviate substantially from the protocol specification or any known implementation. This makes mimicry-based obfuscators easily detectable and results in the approach being fundamentally flawed.

  3. Tunnelling. A tunnelling obfuscator encapsulates traffic within some cover protocol using an actual implementation of the cover protocol instead of simply trying to mimic the way its packets look. An example is meek which uses HTTPS to communicate between the client and the server and domain fronting to hide the true destination of the traffic, but since domain fronting relied on an undocumented feature of major CDNs, it no longer works.

    Tunnelling obfuscators have to be careful to look like a specific and commonly used implementation of a cover protocol since a custom implementation may be distinguishable. China and Iran managed to distinguish TOR TLS and non-TOR TLS first-hop connections even though TOR used a real implementation of TLS to tunnel traffic.

An important metric to consider is the false positive detection rate associated with each method. This is the proportion of traffic that a DPI engine falsely detects as coming from an obfuscation tool. A high false-positive rate results in lots of innocent traffic being blocked which will cause frustration for ordinary users. Therefore the goal of an obfuscator should be to look as much like innocent traffic as possible to maximise the “collateral damage” of any attempted censorship. Overall, it seems like tunnelling is the best approach.

This brings us to Rosen, a modular, tunnelling proxy that I have developed as part of my ongoing master's thesis. It currently only implements HTTPS as a cover protocol, but this has been tested against nDPI and a commercial DPI engine developed by Palo Alto Networks, both of which detected TOR traffic encapsulated by Rosen as ordinary HTTPS. The goals of Rosen are:

  1. Unobservability. It should be difficult to distinguish obfuscated traffic from innocent background traffic using the same protocol.
  2. Endpoint-fingerprinting resistance. It should be difficult to use active probing to ascertain that a given server is actually a proxy server. This is accomplished by responding as a proxy if and only if a valid key is provided and falling back to some default behaviour otherwise. For example, the HTTPS implementation serves some static content in this case.
  3. Modularity. It should be relatively easy to add support for another cover protocol or configure the behaviour of an existing protocol to adapt to changing adversarial conditions. This is facilitated by a modular architecture.
  4. Compatibility. It should be possible to route most application traffic through the proxy. This is why a SOCKS interface was chosen, but TUN support is also a goal.
  5. Usability. It should be easy to use.

High-level overview of Rosen's architecture.

HTTPS was chosen as the first cover protocol to be implemented as it provides confidentiality, authenticity, and integrity; and it is ubiquitous on the Internet, making it infeasible for an adversary to block. The implementation is provided by the Go standard library and most configuration options are set to their defaults so that it blends in with other applications. There is an option to disable TLS 1.3, as I was informed that censors are blocking ESNI specifically. The server will automatically provision a TLS certificate from LetsEncrypt and the client pins LetsEncrypt’s root by default.

It’s difficult to know how effective this truly is without further battle-testing by security researchers and users, but we can theorise to some extent.

  1. Endpoint-based censorship. Users are able to setup Rosen on their own servers behind their own domains so there is no generic firewall rule that can block all of them. An adversary could instead try to actively probe a Rosen server in order to detect it.

    One option is to provide a key and detect a timing difference as the server checks it. The delta I measured between providing a 32 byte key and not providing a key is 29ns (on an AMD Ryzen 3700X). Since network requests have a latency in the milliseconds, I assume this attack is practically infeasible.

    A simpler attack is to look at the static files that the HTTPS server responds with. If the user does not replace the default files with their own, an easy distinguishing attack is possible. This could be easier to avoid with a different protocol. For example, if an incorrect SSH password is provided to an SSH server, it simply refuses the connection and there are no other obvious side-effects for an adversary to analyse.

  2. Flow-fingerprinting. The cover protocol uses the standard library implementation of HTTPS which should be widely used by many different applications in various contexts. Default cipher suites are chosen and other aspects of the implementation are deliberately very typical.

    However, this does not cover the behaviour of Rosen clients. For example, HTTP requests to an ordinary website are usually a lot smaller than responses. Also, an adversary could compare the traffic between Alice and a Rosen HTTPS server with the static content available on that server to ascertain if something else is going on.

    To handle these attacks, the protocol could use some kind of random padding, limit the size and frequency of round trips, or replace the static decoy handler with a custom one that has different traffic characteristics.

    Timing patterns are of particular importance. Currently the client waits a random interval between 0 and 100ms before polling the server for data. This choice was made to minimise latency, but it is not typical of an ordinary website. Analysing timing patterns is what allowed researchers to detect meek, for example. There’s no evidence that this attack is employed by real-world censors, but a configuration flag that implements a tradeoff between performance and behaving “more typically” will be added in the future.
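
The nanosecond-scale timing delta from the key check can also be removed at the source rather than relying on network latency to mask it. I don't know how Rosen implements its comparison, but the standard defence is a constant-time comparison; a Python sketch with a made-up key:

```python
import hmac

STORED_KEY = b"0123456789abcdef0123456789abcdef"  # hypothetical 32-byte key

def is_authorised(presented: bytes) -> bool:
    # hmac.compare_digest takes time independent of where the first
    # mismatching byte occurs, defeating byte-by-byte timing probes
    # against the comparison itself.
    return hmac.compare_digest(presented, STORED_KEY)

print(is_authorised(STORED_KEY))   # True
print(is_authorised(b"x" * 32))    # False
```

A naive `presented == STORED_KEY` on raw bytes can short-circuit at the first differing byte, which is exactly the signal an active prober would try to measure.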

If you have the capability to test out Rosen, especially if you are behind a firewall that implements censorship, I would greatly appreciate you telling me about your experiences at my email address (available on GitHub and on this website’s homepage). If you want to contribute, you can open an issue or pull request on the project’s GitHub page.

December 14, 2020

Caius Durling (caius)

Disable Google Autoupdater on macOS December 14, 2020 04:14 PM

From reading Chrome is Bad, it seems in some situations the updater (also known as keystone) can chew up CPU cycles. Whilst I’m not 100% convinced keystone continuously chews CPU, its launchctl configuration suggests it runs at least once an hour. Given I don’t use Chrome as my main browser, this is undesirable behaviour for me.

With that in mind, I’ve decided to disable the background services rather than delete Chrome entirely. (I need it occasionally.) Stopping/unloading the services and fettling the config files to do nothing achieves this aim (and stops Chrome re-enabling them next launch), whilst leaving Chrome fully functional when needed.

  1. Unload the currently loaded services

    launchctl unload -w ~/Library/LaunchAgents/
    launchctl unload -w ~/Library/LaunchAgents/
  2. Empty the config files, so if launchd ever tries to launch them they’ll just error out

    echo > ~/Library/LaunchAgents/
    echo > ~/Library/LaunchAgents/
  3. Change ownership and permissions of these files so only root can write to the files

    chmod 644 ~/Library/LaunchAgents/
    chmod 644 ~/Library/LaunchAgents/
    sudo chown root ~/Library/LaunchAgents/
    sudo chown root ~/Library/LaunchAgents/

Now when I want to update Chrome once in a blue moon when I need it, I can navigate to chrome://settings/help to update (or from the UI, Chrome -> About Chrome.)

December 13, 2020

Derek Jones (derek-jones)

Survival rate of WG21 meeting attendance December 13, 2020 10:55 PM

WG21, the C++ Standards committee, has a very active membership, with lots of people attending the regular meetings; there are three or four meetings a year, with an average meeting attendance of 67 (between 2004 and 2016).

The minutes of WG21 meetings list those who attend, and a while ago I downloaded these for meetings between 2004 and 2016. Last night I scraped the data and cleaned it up (or at least the attendee names).

WG21 had its first meeting in 1992, and continues to have meetings (eleven physical meetings at the time of writing). This means the data is both left and right censored; known as interval censored. Some people will have attended many meetings before the scraped data starts, and some people listed in the data may not have attended another meeting since.

What can we say about the survival rate of a person being a WG21 attendee in the future, e.g., what is the probability they will attend another meeting?

Most regular attendees are likely to miss a meeting every now and again (six people attended all 30 meetings in the dataset, with 22 attending more than 25), and I assumed that anybody who attended a meeting after 1 January 2015 was still attending. Various techniques are available to estimate the likelihood that known attendees were attending meetings prior to those in the dataset (I’m going with whatever R’s survival package does). The default behavior of R’s Surv function is to handle right censoring, the common case. Extra arguments are needed to handle interval censored data, and I think I got these right (I had to cast a logical argument to numeric for some reason; see code+data).
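
For the simpler right-censored case, the product-limit (Kaplan-Meier) estimator behind survival curves like the one in this post is short enough to write out by hand; interval censoring is what needs the extra machinery in R's survival package. The data here is a made-up toy sample, not the WG21 attendance data:

```python
# (time, observed) pairs: observed=False means right-censored, i.e. the
# attendee was still attending when the data ends.
data = [(2, True), (3, True), (3, False), (5, True), (8, False), (8, False)]

def kaplan_meier(data):
    """Product-limit estimate of S(t) at each observed event time."""
    times = sorted({t for t, observed in data if observed})
    surv, curve = 1.0, []
    for t in times:
        at_risk = sum(1 for u, _ in data if u >= t)       # still in the study
        events = sum(1 for u, obs in data if u == t and obs)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve

for t, s in kaplan_meier(data):
    print(f"S({t}) = {s:.3f}")
```

Censored observations only contribute to the at-risk counts, which is how the estimator avoids treating "no event seen yet" as "no event will ever happen".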

The survival curves in days since 1 Jan 2004, and meetings based on the first meeting in 2004, with 95% confidence bounds, look like this:

Meeting survival curve of WG21 attendees.

I was expecting a sharper initial reduction, and perhaps wider confidence bounds. Of the 374 people listed as attending a meeting, 177 (47%) only appear once and 36 (10%) appear twice; there is a long tail, with 1.6% appearing at every meeting. But what do I know, my experience of interval censored data is rather limited.

The half-life of attendance is 9 to 10 years, suspiciously close to the interval of the data. Perhaps a reader will scrape the minutes from earlier meetings :-)

Within the time interval of the data, new revisions of the C++ standard occurred in 2007, 2011, and 2014; there had also been a new release in 2003, and one was being worked on for 2017. I know some people stop attending meetings after a major milestone, such as a new standard being published. A fancier analysis would investigate the impact of standards being published on meeting attendance.

People also change jobs. Do WG21 attendees change jobs to ones that also require/allow them to attend WG21 meetings? The attendee’s company is often listed in the minutes (and is in the data). Something for intrepid readers to investigate.

Ponylang (SeanTAllen)

Last Week in Pony - December 13, 2020 December 13, 2020 09:07 PM

Version 0.2.2 of ponylang/http_server has been released.

Gonçalo Valério (dethos)

Mirroring GitHub Repositories December 13, 2020 03:58 PM

Git by itself is a distributed version control system (a very popular one), but over the years organizations started to rely on some internet services to manage their repositories and those services eventually become the central/single source of truth for their code.

The most well-known service out there is GitHub (now owned by Microsoft), which nowadays is synonymous with git for a huge number of people. Many other services exist, such as GitLab and Bitbucket, but GitHub gained notoriety above all others, especially for hosting small (and some large) open source projects.

These centralized services provide many more features that help managing, testing and deploying software. Functionality not directly related to the main purpose of git.

Relying on these central services is very useful but, as with everything in life, it is a trade-off. Many large open source organizations (such as KDE, Gnome, and Debian) don’t rely on these companies, because the risks involved are not worth the convenience of letting these platforms host their code and other data.

Over time we have been witnessing some of these risks, such as your project (and all the related data) being taken down without you having any chance to defend yourself (Example 1 and Example 2). Very similar to what some content creators have been experiencing with Youtube (I really like this one).

When this happens, you or your organization don’t lose the code itself, since you almost certainly have copies on your own devices (thanks to git), but you lose everything else: issues, projects, automated actions, documentation and, essentially, the well-known URL of your project.

Since Github is just too convenient to collaborate with other people, we can’t just leave. In this post I explain an easy alternative to minimize the risks described above, that I implemented myself after reading many guides and tools made by others that also tried to address this problem before.

The main idea is to automatically mirror everything in a machine that I own and make it publicly available side by side with the GitHub URLs, the work will still be done in Github but can be easily switched over if something happens.

The software

To achieve the desired outcome I researched a few tools, and the one that seemed to fit all my requirements (work with git and be lightweight) was “Gitea”. Next I will describe the steps I took.

The Setup

This part was very simple, I just followed the instructions present on the documentation for a docker based install. Something like this:

version: "3"

networks:
  gitea:
    external: false

services:
  server:
    image: gitea/gitea:latest
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
    restart: always
    networks:
      - gitea
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "222:22"

If you are doing the same, don’t copy the snippet above. Take a look here for updated instructions.

Since my website is not supposed to have much concurrent activity, using an SQLite database is more than enough. So after launching the container, I chose this database type and made sure I disabled all the functionality I won’t need.

Part of Gitea’s configuration page (server and third-party service settings).

After this step, you should be logged in as an admin. The next step is to create a new migration from the top-right menu. We just need to choose the “GitHub” option and continue. You should see the screen below:

Creating a new GitHub migration/mirror in Gitea.

If you choose This repository will be a mirror option, Gitea will keep your repository and wiki in sync with the original, but unfortunately it will not do the same for issues, labels, milestones and releases. So if you need that information, the best approach is to uncheck this field and do a normal migration. To keep that information updated you will have to repeat this process periodically.

Once migrated, do the same for your other repositories.
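
Repeating the migration periodically looks scriptable against Gitea's HTTP API. The sketch below is an assumption on my part, not something the post describes: the endpoint and field names are from the Gitea swagger documentation as I understand it (verify against your instance's /api/swagger), and the instance URL, token, and repo names are placeholders:

```python
import json
import urllib.request

GITEA = "https://git.example.com"   # hypothetical Gitea instance
TOKEN = "REPLACE_ME"                # Gitea API token

def migrate_payload(owner, repo, gh_token):
    """Request body for POST /api/v1/repos/migrate (field names per the
    Gitea swagger docs at the time of writing)."""
    return {
        "clone_addr": f"https://github.com/{owner}/{repo}.git",
        "repo_name": repo,
        "service": "github",
        "auth_token": gh_token,
        "mirror": False,   # full migration, so issues, labels, etc. come over
        "issues": True,
        "labels": True,
        "milestones": True,
        "releases": True,
        "wiki": True,
    }

def migrate(owner, repo, gh_token):
    req = urllib.request.Request(
        f"{GITEA}/api/v1/repos/migrate",
        data=json.dumps(migrate_payload(owner, repo, gh_token)).encode(),
        headers={"Authorization": f"token {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)  # network call; run against a real instance

if __name__ == "__main__":
    print(migrate_payload("dethos", "example-repo", "gh_xxx")["clone_addr"])
```

Deleting the previously migrated copy before re-running (or migrating under a dated name) avoids name collisions; the mirror option is left off precisely because, as noted above, mirrors don't sync issues, labels, milestones, or releases.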


Having an alternative with a backup of the general Github data ended up being quite easy to set up. However the mirror feature would be much more valuable if it included the other items available on the standard migration.

During my research for solutions, I found Fossil, which looks very interesting and something that I would like to explore in the future, but at the moment all repositories are based on Git and for practical reasons that won’t change for the time being.

With this change, my public repositories can be found in:

December 07, 2020

Ponylang (SeanTAllen)

Last Week in Pony - December 6, 2020 December 07, 2020 03:57 AM

The audio recording of the December 1, 2020 Pony development sync call is available.

December 06, 2020

Derek Jones (derek-jones)

Christmas books for 2020 December 06, 2020 10:54 PM

A very late post on the interesting books I read this year (only one of which was actually published in 2020). As always the list is short because I did not read many books and/or there is lots of nonsense out there, but this year I have the new excuses of not being able to spend much time on trains and having my own book to finally complete.

I have already reviewed The Weirdest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous, and it is the must-read of 2020 (after my book, of course :-).

The True Believer by Eric Hoffer. This small, short book provides lots of interesting insights into the motivational factors involved in joining/following/leaving mass movements. Possible connections to software engineering might appear somewhat tenuous, but bits and pieces keep bouncing around my head. There are clearer connections to movements going mainstream this year.

The following two books came from asking what-if questions about the future of software engineering. The books I read suggesting utopian futures did not ring true.

“Money and Motivation: An Analysis of Incentives in Industry” by William Whyte provides lots of first-hand experience of worker motivation on the shop floor, along with worker response to management incentives (from the pre-automation 1940s and 1950s). Developer productivity is a common theme in discussions I have around evidence-based software engineering, and this book illustrates the tangled mess that occurs when management and worker aims are not aligned. It is easy to imagine the factory-floor events described playing out in web design companies, with some web-page metric used by management as a proxy for developer productivity.

Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century by Harry Braverman, to quote from Wikipedia, is an “… examination of the nature of ‘skill’ and the finding that there was a decline in the use of skilled labor as a result of managerial strategies of workplace control.” It may also have discussed management assault of blue-collar labor under capitalism, but I skipped the obviously political stuff. Management do want to deskill software development, if only because it makes it easier to find staff, with the added benefit that the larger pool of less skilled staff increases management control, e.g., low skilled developers knowing they can be easily replaced.