Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

July 13, 2020

Pete Corey (petecorey)

Suggesting Chord Names with Glorious Voice Leader July 13, 2020 12:00 AM

Glorious Voice Leader, my chord-obsessed side project, now has the ability to turn a collection of notes played on the guitar fretboard into a list of possible chord names. Deciding on a specific chord name is still a very human, very context dependent task, but we can let the computer do a lot of the heavy lifting for us.

I’ve included a simplified version of this chord namer to the left. Feel free to click on the frets to enter any guitar chord you’d like the name of. Glorious Voice Leader will crunch the numbers and come up with a list of possible names that exactly describes the chord you’ve entered, sorted alphabetically.

In the full-fledged Glorious Voice Leader application, this functionality is accessible by simply clicking on the fretboard without first selecting the name of the chord you want. This felt like an intuitive design decision. You might know the shape of a specific chord you want to play in a progression, but you’re not sure of its name.

Enter it into the fretboard and Glorious Voice Leader will give you a corresponding list of names. When you click on one of those names, it’ll automatically suggest alternative voicings that voice lead smoothly from the previous chord.

The actual code behind this feature is straightforward: we filter over the set of all possible chord roots and qualities, comparing the set of notes in each resulting chord against the set of notes entered by the user:

// Assumes each quality object carries its intervals (quality.quality)
// and a display name (quality.name).
let possibleNames = _.chain(qualities)
  .flatMap(quality =>
    _.map(_.keys(roots), root => ({ root, quality }))
  )
  .filter(({ root, quality }) => {
    if (_.isEmpty(chord.notes)) {
      return false;
    }
    let chordNotes = _.chain(chord.notes)
      .map(([string, fret]) => (tuning[string] + fret) % 12)
      .uniq()
      .sortBy()
      .value();
    let qualityNotes = _.chain(quality.quality)
      .map(note => (roots[root] + note) % 12)
      .uniq()
      .sortBy()
      .value();
    return _.isEqual(chordNotes, qualityNotes);
  })
  .map(({ root, quality }) => `${root}${quality.name}`)
  .sortBy()
  .value();

From there we simply present the list of possible chord names to the user in some meaningful or actionable way.
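To make the note comparison concrete, here is a minimal, dependency-free sketch of the pitch-class reduction the comparison relies on. The tuning offsets and the chord shape are my own illustrative values, not Glorious Voice Leader's internals:

```javascript
// Standard tuning, low E to high E, as semitone offsets from C
// (E=4, A=9, D=2, G=7, B=11, E=4).
const tuning = [4, 9, 2, 7, 11, 4];

// A fretted note's pitch class: the open-string offset plus the fret, modulo 12.
const pitchClass = ([string, fret]) => (tuning[string] + fret) % 12;

// An open C major shape as [string, fret] pairs (x32010).
const cMajorShape = [[1, 3], [2, 2], [3, 0], [4, 1], [5, 0]];

// Reduce a shape to a sorted, de-duplicated set of pitch classes.
const pitchClasses = shape =>
  [...new Set(shape.map(pitchClass))].sort((a, b) => a - b);

console.log(pitchClasses(cMajorShape)); // C, E, G → [ 0, 4, 7 ]
```

Two shapes name the same chord quality exactly when their pitch-class sets are equal, which is all the filter above needs to check.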

For future work, it would be nice to sort the list of name suggestions according to the lowest note the user entered on the fretboard. For example, if they entered the notes C, E, G, and B in ascending order, we should sort the Cmaj7 suggestion before the Am9 no 1 suggestion. As with all of the items on my future work list, there are many subtleties and nuances here that would have to be addressed before it becomes a reality.
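One hedged sketch of that sorting idea (the candidate data and helper names here are illustrative, not Glorious Voice Leader's actual code): rank candidates whose root matches the pitch class of the lowest fretted note first, falling back to alphabetical order.

```javascript
// Pitch classes: C=0, C#=1, ..., B=11 (only the roots needed for this example).
const roots = { C: 0, A: 9 };

// Hypothetical candidate names for the notes C, E, G, B (lowest note: C = 0).
const candidates = [
  { root: "A", name: "Am9 no 1" },
  { root: "C", name: "Cmaj7" },
];

// Names whose root equals the lowest sounding pitch class sort first;
// ties fall back to alphabetical order.
const sortByLowestNote = (names, lowestPitchClass) =>
  [...names].sort((a, b) => {
    const aMatch = roots[a.root] === lowestPitchClass ? 0 : 1;
    const bMatch = roots[b.root] === lowestPitchClass ? 0 : 1;
    return aMatch - bMatch || a.name.localeCompare(b.name);
  });

console.log(sortByLowestNote(candidates, 0).map(c => c.name));
// → [ 'Cmaj7', 'Am9 no 1' ]
```

A real implementation would also have to handle inversions and slash chords, which is exactly where the subtleties come in.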

I hope you find this helpful. If you find Glorious Voice Leader interesting or useful in any way, please let me know!


July 12, 2020

Derek Jones (derek-jones)

No replies to 135 research data requests: paper titles+author emails July 12, 2020 09:05 PM

I regularly email researchers referring to a paper of theirs I have read, and asking for a copy of the data to use as an example in my evidence-based software engineering book; of course their work is cited as the source.

Around a third of emails don’t receive any reply (a small number ask why they should spend time sorting out the data for me, and I wrote a post to answer this question). If there is no reply after roughly six months, I follow up with a reminder, saying that I am still interested in their data (maybe 15% respond). If the data looks really interesting, I might email again after 6-12 months (I have outstanding requests going back to 2013).

I put some effort into checking that a current email address is being used. Sometimes the work was done by somebody who has moved into industry, and if I cannot find what looks like a current address I might email their supervisor.

I have had replies to later emails, apologizing, saying that the first email was caught by their spam filter (the number of links in the email template was reduced to make it look less like spam). Sometimes the original email simply never percolated to the top of their todo list.

There are around 135 unreplied email requests (the data was automatically extracted from my email archive and is not perfect); the list of papers is below (the title is sometimes truncated because of the extraction process).

Given that I have collected around 620 software engineering datasets (there are several ways of counting a dataset), another 135 would make a noticeable difference. I suspect that much of the data is now lost, but even 10 more datasets would be nice to have.

After the following list of titles is a list of the 254 last known author email addresses. If you know any of these people, please ask them to get in touch.

If you are an author of one of these papers: ideally send me the data, otherwise email to tell me the status of the data (I’m summarising responses, so others can get some idea of what to expect).

50 CVEs in 50 Days: Fuzzing Adobe Reader
A Change-Aware Per-File Analysis to Compile Configurable Systems
A Design Structure Matrix Approach for Measuring Co-Change-Modularity
A Foundation for the Accurate Prediction of the Soft Error
A Large Scale Evaluation of Automated Unit Test Generation Using
A large-scale study of the time required to compromise
A Large-Scale Study On Repetitiveness, Containment, and
Analysing Humanly Generated Random Number Sequences: A Pattern-Based
Analysis of Software Aging in a Web Server
Analyzing and predicting effort associated with finding & fixing
Analyzing CAD competence with univariate and multivariate
Analyzing Differences in Risk Perceptions between Developers
Analyzing the Decision Criteria of Software Developers Based on
An analysis of the effect of environmental and systems complexity on
An Empirical Analysis of Software-as-a-Service Development
An Empirical Comparison of Forgetting Models
An empirical study of the textual similarity between
An error model for pointing based on Fitts' law
An Evolutionary Study of Linux Memory Management for Fun and Profit
An examination of some software development effort and
An Experimental Survey of Energy Management Across the Stack
Anomaly Trends for Missions to Mars: Mars Global Surveyor
A Quantitative Evaluation of the RAPL Power Control System
Are Information Security Professionals Expected Value Maximisers?:
A replicated and refined empirical study of the use of friends in
A Study of Repetitiveness of Code Changes in Software Evolution
A Study on the Interactive Effects among Software Project Duration, Risk
Bias in Proportion Judgments: The Cyclical Power Model
Capitalization of software development costs
Configuration-aware regression testing: an empirical study of sampling
Cost-Benefit Analysis of Technical Software Documentation
Decomposing the problem-size effect: A comparison of response
Determinants of vendor profitability in two contractual regimes:
Diagnosing organizational risks in software projects:
Early estimation of users’ perception of Software Quality
Empirical Analysis of Factors Affecting Confirmation
Estimating Agile Software Project Effort: An Empirical Study
Estimating computer depreciation using online auction data
Estimation fulfillment in software development projects
Ethical considerations in internet code reuse: A
Evaluating. Heuristics for Planning Effective and
Evaluating Pair Programming with Respect to System Complexity and
Evidence-Based Decision Making in Lean Software Project Management
Explaining Multisourcing Decisions in Application Outsourcing
Exploring defect correlations in a major. Fortran numerical library
Extended Comprehensive Study of Association Measures for
Eye gaze reveals a fast, parallel extraction of the syntax of
Factorial design analysis applied to the performance of
Frequent Value Locality and Its Applications
Historical and Impact Analysis of API Breaking Changes:
How do i know whether to trust a research result?
How do OSS projects change in number and size?
How much is “about” ? Fuzzy interpretation of approximate
Humans have evolved specialized skills of
Identifying and Classifying Ambiguity for Regulatory Requirements
Identifying Technical Competences of IT Professionals. The Case of
Impact of Programming and Application-Specific Knowledge
Individual-Level Loss Aversion in Riskless and Risky Choices
Industry Shakeouts and Technological Change
Inherent Diversity in Replicated Architectures
Initial Coin Offerings and Agile Practices
Interpreting Gradable Adjectives in Context: Domain
Is Branch Coverage a Good Measure of Testing Effectiveness?
JavaScript Developer Survey Results
Knowledge Acquisition Activity in Software Development
Language matters
Learning from Evolution History to Predict Future Requirement Changes
Learning from Experience in Software Development:
Learning from Prior Experience: An Empirical Study of
Links Between the Personalities, Views and Attitudes of Software Engineers
Making root cause analysis feasible for large code bases:
Making-Sense of the Impact and Importance of Outliers in Project
Management Aspects of Software Clone Detection and Analysis
Managing knowledge sharing in distributed innovation from the
Many-Core Compiler Fuzzing
Measuring Agility
Mining for Computing Jobs
Mining the Archive of Formal Proofs.
Modeling Readability to Improve Unit Tests
Modeling the Occurrence of Defects and Change
Modelling and Evaluating Software Project Risks with Quantitative
Moore’s Law and the Semiconductor Industry: A Vintage Model
More Testers – The Effect of Crowd Size and Time Restriction in
Motivations for self-assembling into project teams
Networks, social influence and the choice among competing innovations:
Nonliteral understanding of number words
Nonstationarity and the measurement of psychophysical response in
Occupations in Information Technology
On information systems project abandonment
On the Positive Effect of Reactive Programming on Software
On Vendor Preferences for Contract Types in Offshore Software Projects:
Peer Review on Open Source Software Projects:
Parameter-based refactoring and the relationship with fan-in/fan-out
Participation in Open Knowledge Communities and Job-Hopping:
Pipeline management for the acquisition of industrial projects
Predicting the Reliability of Mass-Market Software in the Marketplace
Prototyping A Process Monitoring Experiment
Quality vs risk: An investigation of their relationship in
Quantitative empirical trends in technical performance
Reported project management effort, project size, and contract type.
Reproducible Research in the Mathematical Sciences
Semantic Versioning versus Breaking Changes
Software Aging Analysis of the Linux Operating System
Software reliability as a function of user execution patterns
Software Start-up failure An exploratory study on the
Spatial estimation: a non-Bayesian alternative
System Life Expectancy and the Maintenance Effort: Exploring
Testing as an Investment
The enigma of evaluation: benefits, costs and risks of IT in
The impact of size and volatility on IT project performance
The Influence of Size and Coverage on Test Suite
The Marginal Value of Increased Testing: An Empirical Analysis
The nature of the times to flight software failure during space missions
Theoretical and Practical Aspects of Programming Contest Ratings
The Performance of the N-Fold Requirement Inspection Method
The Reaction of Open-Source Projects to New Language Features:
The Role of Contracts on Quality and Returns to Quality in Offshore
The Stagnating Job Market for Young Scientists
Time Pressure — A Controlled Experiment of Test-case Development and
Turnover of Information Technology Professionals:
Unconventional applications of compiler analysis
Unifying DVFS and offlining in mobile multicores
Use of Structural Equation Modeling to Empirically Study the Turnover
Use Two-Level Rejuvenation to Combat Software Aging and
Using Function Points in Agile Projects
Using Learning Curves to Mine Student Models
Virtual Integration for Improved System Design
Which reduces IT turnover intention the most: Workplace characteristics
Why Did Your Project Fail?
Within-Die Variation-Aware Dynamic-Voltage-Frequency

Author emails (automatically extracted and manually checked to remove people who have replied on other issues; I hope I have caught them all).

Ponylang (SeanTAllen)

Last Week in Pony - July 12, 2020 July 12, 2020 03:05 PM

Sync audio for July 7 is available. RFC PR #175 is ready for a vote at the next sync meeting.

Gustaf Erikson (gerikson)

[SvSe] Söndagsvägen - berättelsen om ett mord by Peter Englund July 12, 2020 10:22 AM

Englund reflects on Sweden’s 1960s through the mirror of a long-forgotten murder. By examining the phenomena of the period, he shows a country in transition, above all how “the modern project” begins to crack.

Andreas Zwinkau (qznc)

Crossing the Chasm July 12, 2020 12:00 AM

The book describes the dangerous transition from early adopters to an early majority market

Read full article!

July 11, 2020

Andrew Montalenti (amontalenti)

Learning about babashka (bb), a minimalist Clojure for building CLI tools July 11, 2020 06:25 PM

A few years back, I wrote Clojonic: Pythonic Clojure, which compares Clojure to Python, and concluded:

My exploration of Clojure so far has made me realize that the languages share surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course).

That said, as that article discussed, Clojure is a very different language than Python. As Rich Hickey, the creator of Clojure, put it in his “A History of Clojure”:

Most developers come to Clojure from Java, JavaScript, Python, Ruby and other OO languages. [… T]he most significant […] problem  [in adopting Clojure] is learning functional programming. Clojure is not multiparadigm, it is FP or nothing. None of the imperative techniques they are used to are available. That said, the language is small and the data structure set evident. Clojure has a reputation for being opinionated, opinionated languages being those that somewhat force a particular development style or strategy, which I will graciously accept as meaning the idioms are clear, and somewhat inescapable.

There is one area in which Clojure and Python seem to have a gulf between them, for a seemingly minor (but, in practice, major) technical reason. Clojure, being a JVM language, inherits the JVM’s slow startup time, especially for short-lived scripts, as is common for UNIX CLI tools and scripts.

As a result, though Clojure is a relatively popular general purpose programming language — and, indeed, one of the most popular dynamic functional programming languages in existence — it is still notably unpopular for writing quick scripts and commonly-used CLI tools. But, in theory, this needn’t be the case!

If you’re a regular UNIX user, you probably have come across hundreds of scripts with a “shebang”, e.g. something like #!/usr/bin/env python3 at the top of Python 3 scripts or #!/bin/bash for bash scripts. But I bet you have rarely, perhaps never, come across something like #!/usr/bin/env java or #!/usr/bin/env clojure. It’s not that either of these is impossible or unworkable. No, they are simply unergonomic. Thus, they aren’t preferred.

The lack of ergonomics stems from a number of reasons inherent to the JVM, notably slow startup time and complex system-level classpath/dependency management.

Given Clojure’s concision, readability, and dynamism, it might be a nice language for scripting and CLI tools, if we could only get around that slow startup time problem. Could we somehow leverage the Clojure standard library and a subset of the Java standard library as a “batteries included” default environment, and have it all compiled into a fast-launching native binary?

Well, it turns out, someone else had this idea, and went ahead and implemented it. Enter babashka.


To quote the README:

Babashka is implemented using the Small Clojure Interpreter. This means that a snippet or script is not compiled to JVM bytecode, but executed form by form by a runtime which implements a sufficiently large subset of Clojure. Babashka is compiled to a native binary using GraalVM. It comes with a selection of built-in namespaces and functions from Clojure and other useful libraries. The data types (numbers, strings, persistent collections) are the same. Multi-threading is supported (pmap, future). Babashka includes a pre-selected set of Java classes; you cannot add Java classes at runtime.

Wow! That’s a pretty neat trick. If you install babashka — which is available as a native binary for Windows, macOS, and Linux — you’ll be able to run bb to try it out. For example:

$ bb
Babashka v0.1.3 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (+ 2 2)
4
user=> (println (range 5))
(0 1 2 3 4)
user=> :repl/quit

And, the fast startup time is legit. For example, here’s a simple “Hello, world!” in Clojure stored in hello.clj:

(println "Hello, world!")

Now compare:

$ multitime -n 10 -s 1 clojure hello.clj
        Mean        Std.Dev.    Min         Median      Max
user    1.753       0.090       1.613       1.740       1.954       
$ multitime -n 10 -s 1 bb hello.clj
        Mean        Std.Dev.    Min         Median      Max
user    0.004       0.005       0.000       0.004       0.012       

That’s a pretty big difference on my modern machine! That’s a median startup time of 1.7 seconds using the JVM version, and a median startup time of 0.004 seconds — that is, four one-thousandths of a second, or 4 milliseconds — using bb, the Babashka version! The JVM version is more than 400x slower!

How does this compare to Python?

$ multitime -n 10 -s 1 python3
        Mean        Std.Dev.    Min         Median      Max
user    0.012       0.004       0.006       0.011       0.018       

So, bb’s startup is as fast as, perhaps even a little faster than, Python 3’s. Pretty cool!
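That startup speed is what finally makes the shebang pattern ergonomic for Clojure. For illustration, a hypothetical sketch (assuming bb is installed and on your PATH):

```clojure
#!/usr/bin/env bb
;; greet.clj: a tiny CLI script. In babashka, *command-line-args*
;; holds the command-line arguments as a seq of strings.
(let [who (or (first *command-line-args*) "world")]
  (println (str "Hello, " who "!")))
```

Mark it executable (chmod +x greet.clj) and running ./greet.clj Clojurists prints its greeting in milliseconds, just like a shell or Python script would.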

All that said, the creator of Babashka has said, publicly:

It’s not targeted at Python programmers or Go programmers. I just want to use Clojure. The target audience for Babashka is people who want to use Clojure to build scripts and CLI tools.

Fair enough. But, as Rich Hickey said, there can be really good reasons for Python, Ruby, and Go programmers to take a peek at Clojure. There are some situations in which it could really simplify your code or approach. Not always, but there are certainly some strengths. Here’s what Hickey had to say about it:

[New Clojure users often] find the amount of code they have to write is significantly reduced, 2—5x or more. A much higher percentage of the code they are writing is related to their problem domain.

Aside from being a useful tool for this niche, bb is also just a fascinating F/OSS research project. For example, the way it manages to pull off native binaries across platforms is via the GraalVM native-image facility. Studying GraalVM native-image is interesting in itself, but bb makes use of this facility and makes its benefit accessible to Clojure programmers without resorting to complex build toolchains.

With bb now stable, its creator took a stab at rewriting the clojure wrapper script itself in Babashka. That is, Clojure programmers may not have realized that when they invoke clojure on Linux, what’s really happening is that they are calling out to a bash script that then detects the local JVM and classpath, and then execs out to the java CLI for the JVM itself. On Windows, that same clojure wrapper script is implemented in PowerShell, pretty much by necessity, and serves the same purpose as the Linux bash script, but is totally different code. Well, now there’s something called deps.clj, which eliminates the need to use bash and PowerShell here, and uses Babashka-flavored Clojure code instead. See the deps.clj rationale in the README for more on that.

If you want a simple real-world example of a full-fledged Babashka-flavored Clojure program that does something useful at the command-line, you can take a look at clj-kondo, a simple command-line Clojure linter (akin to pyflakes or flake8 in the Python community), which is also by the same author.

Overall, Babashka is not just a really cool hack, but also a very useful tool in the Clojurist’s toolbelt. I’ve become a convert and evangelist, as well as a happy user. Congrats to Michiel Borkent on a very interesting and powerful piece of open source software!

Note: Some of my understanding of Babashka solidified when hearing Michiel describe his project at the Clojure NYC virtual meetup. The meeting was recorded, so I’ll update this blog post when the talk is available.

Gustaf Erikson (gerikson)

June July 11, 2020 04:45 PM

Telemedicine is the only light in the darkness of COVID

This pic was supposed to be part of a pictorial depicting one day in my life during Corona, but I got bored of the concept. I just added it here so I don’t have an embarrassing gap for June 2020.

Jun 2019 | Jun 2018 | Jun 2017 | Jun 2016 | Jun 2015 | Jun 2014 | Jun 2013 | Jun 2012 | Jun 2011 | Jun 2010 | Jun 2009

Gonçalo Valério (dethos)

Why you shouldn’t remove your package from PyPI July 11, 2020 11:26 AM

Nowadays most software developed using the Python language relies on external packages (dependencies) to get the job done. Correctly managing this “supply chain” ends up being very important and having a big impact on the end product.

As a developer you should be cautious about the dependencies you include on your project, as I explained in a previous post, but you are always dependent on the job done by the maintainers of those packages.

As a public package owner/maintainer you also have to be aware that your decisions, actions and the code you write will have an impact on the projects that depend directly or indirectly on your package.

With this small introduction we arrive at the topic of this post: “What should I do as a maintainer when I no longer want to support a given package?” or “How do I properly rename my package?”.

In both of these situations you might think “I will start by removing the package from PyPI”, so I hope the next lines will convince you that this is the worst thing you can do, for two reasons:

  • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
  • You will free the namespace for others to use and if your package is popular enough this might become a juicy target for any malicious actor.

TL;DR: you will screw your “users”.

The left-pad incident, while it didn’t happen in the python ecosystem, is a well known example of the first point and shows what happens when a popular package gets removed from the public index.

Malicious actors usually register packages using names that are similar to other popular packages, with the hope that a user will end up installing them by mistake, something that has already been found multiple times on PyPI. Now imagine if that package name suddenly becomes available and is already trusted by other projects.

What should you do, then?

Just don’t delete the package.

I admit that on some rare occasions it might be required, but most of the time the best thing to do is to leave it there (especially for open-source packages).

Adding a warning to the code and informing the users in the README that the package is no longer maintained or safe to use is also a nice thing to do.
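As a minimal sketch of that advice (a hypothetical package name, not the actual model_mommy code), the warning can be emitted from the package’s top-level module, so every import surfaces it:

```python
# mypackage/__init__.py -- hypothetical deprecated package (illustrative sketch).
import warnings

warnings.warn(
    "mypackage is no longer maintained; please use mynewpackage instead.",
    DeprecationWarning,
    stacklevel=2,  # attribute the warning to the importing module, not this file
)
```

Projects running their test suites with warnings enabled will then see the message on every run, which is exactly the gentle, persistent nudge described below for model-bakery.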

One good example I would like to mention is the transition from model-mommy to model-bakery. As a user, I felt it was done properly. Here is an overview of the steps they took:

  1. A new source code repository was created with the same contents. (This step is optional)
  2. After doing the required changes a new package was uploaded to PyPI.
  3. Deprecation warnings were added to the old code, mentioning the new package.
  4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
  5. A new release of the old package was created, so the user could see the deprecation warnings.
  6. All further development was done on the new package.
  7. The old code repository was archived.

So here is what is shown every time the test suite of an affected project is executed:

/lib/python3.7/site-packages/model_mommy/ DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead:

In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

July 10, 2020

Robin Schroer (sulami)

Keyboardio Atreus Review July 10, 2020 12:00 AM

I recently received my early bird Keyboardio Atreus from the Kickstarter (promotional photo courtesy of Keyboardio) and have now been using it for about three weeks, so I am writing a review for folks considering buying one after release.

A Bit of History

Most of this is also outlined on the Atreus website, but here is the short version: my colleague Phil Hagelberg designed the original Atreus keyboard in 2014, and has been selling kits for self-assembly ever since.

In 2019 Keyboardio, the company which created the Model 01, got together with Phil to build a pre-assembled commercial version of the Atreus. Their Kickstarter ran earlier in 2020 and collected almost $400k.

Phil’s original 42-key version can be built with either a PCB or completely hand-wired, and uses a wooden, acrylic, or completely custom (e.g. 3D-printed) case.

Keyboardio split the two larger thumb keys into two regular-size keys, bringing the total up to 44, and uses a PCB and Cherry MX-style switches mounted on an aluminium plate inside a black ABS case.


At a first impression, it is incredibly small (dimensions taken from the product page: 24.3 × 10 × 2.8 cm, 310 g), noticeably smaller still than the small Apple Magic Keyboard. At the same time, it uses regular key spacing, so once your hands are in place it does not feel cramped at all. On the contrary, every time I use a different keyboard now, I feel that half the keys are too far away to reach comfortably. It is also flat enough that I can use it without a wrist rest.

Mine has Kailh Speed Copper switches, which require 40g of force to actuate, with very early actuation points. They are somewhat comparable to Cherry MX Browns without the dead travel before the tactile bump. As mentioned above, the switches are mounted on an aluminium plate, and can be swapped without disassembly.

The early actuation point of the switches does require some getting used to; I keep experiencing some key chatter, especially on my weaker fingers, though Jesse from Keyboardio is working hard on alleviating that.

When it comes to noise, you can hear that it is a mechanical keyboard. Even with relatively quiet switches, the open construction means that the sound of the keys getting released is audible in most environments. I would hesitate to bring it to a public space, like a café or a co-working space. An open office depends on the general noise level and how tolerant your coworkers are; I have not had anyone complain about the sound level in video conferences.

The keycaps are XDA-profile (symmetrical and the same height across the keyboard, like a lower-profile SDA, which means you can rearrange them between rows), laser-engraved PBT of medium thickness. Apparently there have been a lot of issues with the durability of the labels, so the specifics of that might change. I personally have had a single key start to fade a bit over three weeks of use, but I do not actually care.

The keyboard is powered by the ATmega32U4, which is a pretty standard controller for a keyboard, it is also used in the Teensy 2.0 for example.

I would judge the overall build quality as good. While it does not feel like an ultra-premium product, there is nothing specific I can actually complain about, no rough edges or manufacturing artefacts.


Out of the box, the keyboard uses the open-source Kaleidoscope firmware, which can be configured with the (also open-source) Chrysalis graphical configurator. (Screenshot: Chrysalis with my Layer 0.)

Supposedly it is also possible to use QMK, and Phil has recently written Menelaus, a firmware in Microscheme.

I have stuck with (pre-release versions of) Kaleidoscope so far, which has worked out fairly well. Chrysalis is an Electron app, and doing sweeping changes in it can be a bit cumbersome compared to using text-based, declarative configuration, but it does the job. Flashing a new version onto the keyboard only takes a few seconds. I also have to mention the extensive documentation available. Kaleidoscope has a rich plugin infrastructure, very little of which I actually use, but it does seem to rival QMK in flexibility.

I am using the Atreus with Colemak, the same layout I have been using for almost a decade now, and compared to trying the Ergodox (when I first tried an Ergodox, the ortholinear layout really threw me off, and I kept hitting right in between keys), the switch was much smoother. I am mostly back to my regular typing speed of 80-90 WPM after three weeks, and I can still use a regular staggered layout keyboard without trouble.

The modifier keys at the bottom are unusual, but work for me. I use the three innermost keys with my thumbs, and the bottom edges by just pushing down with my palm. It does require some careful arrangement to avoid often having to press two modifiers at the same time.

With only 44 physical keys, the keyboard makes heavy use of layers, which can be temporarily shifted to when holding a key, or switched to permanently. By default the first extra layer has common special characters on the left half, and a numpad on the right, which works better than a regular keyboard for me.

The only problem I sometimes have is the lack of a status indicator. This means I have to keep track of the keyboard state in my head when switching layers. Not a big problem though.


My conclusion is quite simple: if you are in the market for a keyboard like this, this might be the keyboard for you. It does what it does well, and is much cheaper than anything comparable that does not require manual assembly. I personally enjoy the small form factor, the flexible (set of) firmware, and the RSI-friendly layout.

I also want to highlight the truly amazing effort Keyboardio puts into supporting their customers. You can browse the Kickstarter or their GitHub projects to see how much effort they put into this, and I have been in contact with Jesse myself while trying to debug a debouncing issue in the firmware. I am very happy to support them with my wallet.

July 09, 2020

Tobias Pfeiffer (PragTob)

Guest on Parallel Passion Podcast July 09, 2020 08:24 PM

Hey everyone, yes yes, I should blog more. The world is just in a weird place right now, affecting all of us, and I hope you’re all safe & sound. I do many interesting things while freelancing, but sadly didn’t allocate the time to blog about them yet. What I did get the time to do is […]

July 06, 2020

Frederik Braun (freddyb)

Hardening Firefox against Injection Attacks – The Technical Details July 06, 2020 10:00 PM

This blog post has first appeared on the Mozilla Attack & Defense blog and was co-authored with Christoph Kerschbaumer and Tom Ritter

In a recent academic publication titled Hardening Firefox against Injection Attacks (to appear at SecWeb – Designing Security for the Web) we describe techniques which we have incorporated into Firefox …

Andreas Zwinkau (qznc)

Wardley Maps July 06, 2020 12:00 AM

A book which presents a map visualization for business strategy

Read full article!

July 05, 2020

Ponylang (SeanTAllen)

Last Week in Pony - July 5, 2020 July 05, 2020 10:40 PM

There is a new set of public Docker images for Pony with SSL system libraries installed. These will be replacing the previous “x86-64-unknown-linux-builder-with-ssl” image.

Derek Jones (derek-jones)

Algorithms are now commodities July 05, 2020 10:14 PM

When I first started writing software, developers had to implement most of the algorithms they used; yes, hardware vendors provided libraries, but the culture was one of self-reliance (except for maths functions, which were technical and complicated).

Developers read Donald Knuth’s The Art of Computer Programming, it was the reliable source for step-by-step algorithms. I vividly remember seeing a library copy of one volume, where somebody had carefully hand-written, in very tiny letters, an update to one algorithm, and glued it to the page over the previous text.

Algorithms were important because computers were not yet fast enough to solve common problems at an acceptable rate; developers knew the time taken to execute common instructions and instruction timings were a topic of social chit-chat amongst developers (along with the number of registers available on a given cpu). Memory capacity was often measured in kilobytes, every byte counted.

This was the age of the algorithm.

Open source commoditized algorithms, and computers got a lot faster with memory measured in megabytes and then gigabytes.

When it comes to algorithm implementation, developers are now spoilt for choice; why waste time implementing the ‘low’ level stuff when there are plenty of other problems waiting to be solved.

Algorithms are now like the bolts in a bridge: very important, but nobody talks about them. Today developers talk about story points, features, business logic, etc. Given a well-defined problem, many are now likely to search for an existing package, rather than write code from scratch (I certainly work this way).

New algorithms are still being invented, and researchers continue to look for improvements to existing algorithms. This is a niche activity.

There are companies where algorithms are not commodities. Google operates at a scale where what appears to others as a small improvement can save the company millions (purely because a small percentage of a huge amount can be a lot). A company’s core competency may include an algorithmic component (whose non-commodity nature gives the company its edge over the competition), while its non-core competencies treat algorithms as a commodity.

Knuth’s The Art of Computer Programming played an important role in making viable algorithms generally available; while the volumes are frequently cited, I suspect they are rarely read (I have not taken any of my three volumes off the shelf, to read, for years).

A few years ago, I suddenly realised that I was working on a book about software engineering that not only did not contain an algorithms chapter, but whose 103 uses of the word algorithm all refer to it as a concept.

Today, we are in the age of the ecosystem.

Algorithms have not yet completed their journey to obscurity, which has to wait until people can tell computers what they want and not be concerned about the implementation details (or genetic algorithm programming gets a lot better).

Patrick Louis (venam)

D-Bus and Polkit, No More Mysticism and Confusion July 05, 2020 09:00 PM

freedesktop logo

Dbus and Polkit are two technologies that emanate an aura of confusion. While their names are omnipresent in discussions, and the internet has its share of criticism and rants about them, not many have a grasp of what they actually do. In this article I’ll give an overview of these technologies.

D-Bus, or Desktop Bus, is often described as software that allows processes to communicate with one another, to perform inter-process communication (IPC). However, this description is generic and doesn’t convey what D-Bus is used for. Many technologies can perform IPC, from plain sockets to message queues, so what differentiates D-Bus from them?

D-Bus can be considered a middleware, a software glue that sits in the middle to provide services to software through a sort of plugin/microkernel architecture. That’s what the bus metaphor represents: it replicates the functionality of hardware buses, with components attaching themselves to known interfaces that they implement, and it provides a means of communication between them. With D-Bus these can be either procedure calls, aka methods, or signals, aka notifications.

While D-Bus does offer 1-to-1 and 1-to-many IPC, that is more a byproduct of its original purpose than a means of efficient process-to-process data transfer; it isn’t meant to be fast. D-Bus emerged from the world of desktop environments, where the building blocks are well known and each implements a functionality that should be accessible from other processes, without having to reinvent the transfer mechanism for each and every piece of software.
This is the problem it tackles: a desktop environment has components distributed across many processes, each fulfilling a specific job. In such a case, if a process needs a behavior that another process already implements, it can harness that feature instead of reimplementing it.

Its design is heavily influenced by Service Oriented Architectures (SOA), Enterprise Service Buses (ESB), and microkernel architectures.
A bus permits abstracting communication between software, replacing all direct contact, and only allowing them to happen on the bus instead.
Additionally, the SOA allows software to expose objects that have methods that can be called remotely, and also allows other software to subscribe/publish events happening in remote objects residing in other software.
Moreover, D-Bus provides easy plug-and-play, a loose coupling, where any software can detach itself from the bus and let another process be plugged in, containing objects that implement the same features the previous process implemented.
In sum, it’s an abstraction layer for functionalities that could be implemented by any software, a standardized way to create pluggable desktop components. This is what D-Bus is about, this is the role it plays, and it explains the difficulty in grasping the concepts that gave rise to it.

The big conceptual picture goes as follows.
We have a D-Bus daemon running at an address and services that implement well known behaviors. These services attach to the D-Bus daemon and the attachment edge has a name, a bus name.
Inside these services, there are objects that implement the well known behavior. These objects also have a path leading to them so that you can target which object within that service implements the specific interface needed.
Then, the interface methods and events can be called or registered on this object inside this service, connected to this bus name, from another service that requires the behavior implemented by that interface to be executed.

This is how these particular nested components interact with one another, and it gives rise to the following:

Address of D-Bus daemon ->
Bus Name that the service attached to ->
Path of the object within this service ->
Interface that this object implements ->
Method or Signal concrete implementation

Or in graphical form:

D-Bus ecosystem

Instead of having everyone talk to one another:

p2p interaction

Let’s take a method call example that shows these 3 required pieces of information.

$ dbus-send --session --print-reply \
--dest=org.gnome.SessionManager \
/org/gnome/SessionManager \
org.gnome.SessionManager.CanShutdown

   boolean true

Here, we have the service bus name org.gnome.SessionManager, the object path /org/gnome/SessionManager, and the interface plus method name org.gnome.SessionManager.CanShutdown, all separated by spaces. If the object at /org/gnome/SessionManager implemented only a single interface, we could call the method simply as CanShutdown, but here it doesn’t.

Let’s dive deeper into the pieces we’ve mentioned. They are akin to the ones in an SOA ecosystem, but with the addition of the bus name, bus daemon, and the abstraction for the plug-and-play.

  • Objects

An object is an entity that resides in a process/service and performs some work. It is identified by a path name. The path name is usually written, though it isn’t mandatory, in a namespaced format, grouped and divided by slashes /, just like a Unix file system path.

For example: /org/gnome/Nautilus/window/1.

Objects have methods and signals, methods take input and return output, while signals are events that processes can subscribe to.

  • Interfaces

These methods and signals are concrete implementations of interfaces, the same definition as in OOP.
As with OOP, interfaces are a group of abstractions that have to be defined in the object that implements them. The members, methods and signals, are also namespaced under this interface name.


interface=org.gnome.Shell.Introspect
member method=GetRunningApplications
absolute name of method=org.gnome.Shell.Introspect.GetRunningApplications

Some interfaces are commonly implemented by objects, such as the org.freedesktop.DBus.Introspectable interface, which, as the name implies, makes the object introspectable. It allows clients to query the object about its capabilities, features, and the other interfaces it implements. This is very useful because it enables discovery.
It’s also worth mentioning that D-Bus can be used in a generic way to set and get properties of services’ objects, through the org.freedesktop.DBus.Properties interface.

Interfaces can be described, both as a standard and for documentation, in D-Bus XML configuration files, so that other programmers can use the reference to implement them properly. These files can also be used to auto-generate classes from the XML, making implementation quicker and less error-prone.
These files can usually be found under /usr/share/dbus-1/interfaces/. Our org.gnome.Shell.Introspect of earlier is there in the file org.gnome.Shell.Introspect.xml along with our method GetRunningApplications. Here’s an excerpt of the relevant section.

	@short_description: Retrieves the description of all running applications

	Each application is associated by an application ID. The details of
	each application consists of a varlist of keys and values. Available
	keys are listed below.

	'active-on-seats' - (as)   list of seats the application is active on
								(a seat only has at most one active
<method name="GetRunningApplications">
	<arg name="apps" direction="out" type="a{sa{sv}}" />

Notice the type= part, which describes the format of the output; we’ll come back to what this means in the message format section, but in short each letter represents a basic type. The out direction means it’s the type of an output value of the method; similarly, in is for method parameters. See the following example taken from org.gnome.Shell.Screenshot.xml.

	@x: the X coordinate of the area to capture
	@y: the Y coordinate of the area to capture
	@width: the width of the area to capture
	@height: the height of the area to capture
	@flash: whether to flash the area or not
	@filename: the filename for the screenshot
	@success: whether the screenshot was captured
	@filename_used: the file where the screenshot was saved

	Takes a screenshot of the passed in area and saves it
	in @filename as png image, it returns a boolean
	indicating whether the operation was successful or not.
	@filename can either be an absolute path or a basename, in
	which case the screenshot will be saved in the $XDG_PICTURES_DIR
	or the home directory if it doesn't exist. The filename used
	to save the screenshot will be returned in @filename_used.
<method name="ScreenshotArea">
	<arg type="i" direction="in" name="x"/>
	<arg type="i" direction="in" name="y"/>
	<arg type="i" direction="in" name="width"/>
	<arg type="i" direction="in" name="height"/>
	<arg type="b" direction="in" name="flash"/>
	<arg type="s" direction="in" name="filename"/>
	<arg type="b" direction="out" name="success"/>
	<arg type="s" direction="out" name="filename_used"/>
  • Proxies

Proxies are the nuts and bolts of an RPC ecosystem: they represent remote objects, along with their methods, in your native code as if they were local. Basically, they are wrappers that make it simpler to manipulate things on D-Bus programmatically, instead of worrying about all the components we’ve mentioned above. Programming with proxies might look like this.

Proxy proxy = new Proxy(getBusConnection(), "/remote/object/path");
Object returnValue = proxy.MethodName(arg1, arg2);
  • Bus names

The bus name, sometimes called the connection name, is the name of the connection that an application gets assigned when it connects to D-Bus. Because D-Bus is a bus architecture, each assigned name must be unique: you can’t have two applications using the same bus name. Usually, the D-Bus daemon generates this random unique value, one that begins with a colon by convention; however, applications may ask to own well-known names instead. These well-known names, written as reverse domain names, are for cases when people want to agree on a standard, unique application that should implement a certain behavior. Imagine, for instance, a specification for a com.mycompany.TextEditor bus name, where the mandatory object path should be /com/mycompany/TextFileManager, supporting the interface org.freedesktop.FileHandler. This would make the desktop environment more predictable and stable. However, today this is still only a dream and has nothing to do with current desktop environment implementations.
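To make the unique vs. well-known distinction concrete, here is a short Python sketch (chosen just for illustration; it talks to no bus) that classifies a name according to the naming rules in the D-Bus specification: unique names begin with a colon and their elements may start with digits, while well-known names are dot-separated reverse-domain elements that must not start with a digit.

```python
import re

def classify_bus_name(name):
    """Classify a D-Bus bus name as 'unique', 'well-known', or 'invalid'.

    A rough sketch of the naming rules from the D-Bus specification:
    names are at most 255 bytes, contain at least two dot-separated
    elements built from [A-Za-z0-9_-], and well-known name elements
    must not begin with a digit.
    """
    if len(name) == 0 or len(name) > 255:
        return "invalid"
    if name.startswith(":"):
        # Unique, daemon-assigned names like ":1.42"; elements here
        # are allowed to start with digits.
        elements = name[1:].split(".")
        if len(elements) >= 2 and all(
                re.fullmatch(r"[A-Za-z0-9_-]+", e) for e in elements):
            return "unique"
        return "invalid"
    elements = name.split(".")
    if len(elements) >= 2 and all(
            re.fullmatch(r"[A-Za-z_-][A-Za-z0-9_-]*", e) for e in elements):
        return "well-known"
    return "invalid"

print(classify_bus_name(":1.42"))                     # unique
print(classify_bus_name("com.mycompany.TextEditor"))  # well-known
print(classify_bus_name("TextEditor"))                # invalid: one element
```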

  • Connection and address of D-Bus daemon

The D-Bus daemon is the core of D-Bus; it is what everything else attaches itself to. Thus, the address that the daemon uses and listens to should be well known to clients. The means of communication can vary from UNIX domain sockets to TCP/IP sockets if used remotely.
In normal scenarios, there are two daemons running, a system-wide daemon and a per-session daemon, one for system-level applications and one for session related applications such as desktop environments. The address of the session bus can be discovered by reading the environment variable $DBUS_SESSION_BUS_ADDRESS, while the address of the system D-Bus daemon is discovered by checking a predefined UNIX domain socket path, though it can be overridden by using another environment variable, namely $DBUS_SYSTEM_BUS_ADDRESS.
Keep in mind that it’s always possible to start private buses, private daemons for non-standard use.

  • Service

A service is an application daemon connected to a bus that provides some utility to clients via the objects it contains, which implement some interfaces. Normally we talk of services when the bus name is well-known, as in not auto-generated but using a reverse domain name. Due to D-Bus’s nature, services are singletons and owners of their bus name, and thus are the only applications that can fulfill specific requests. If any other application wants to use that particular bus name, it has to wait in a queue of aspiring owners until the current one relinquishes it.

Within the D-Bus ecosystem, you can request that the D-Bus daemon automatically start a program, if not already started, that provides a given service (well-known name) whenever it’s needed. We call this service activation. It’s quite convenient as you don’t have to remember what application does what, nor care if it’s already running, but instead send a generic request to D-Bus and rely on it to launch it.

To do this we have to define a service file in the /usr/share/dbus-1/services/ directory that describes what and how the service will run.
A simple example goes as follows.

[D-BUS Service]
Name=org.gnome.ServiceName
Exec=/usr/bin/service-name

You can also specify the user with which the command will be executed using a User= line, and even specify if it’s in relation with a systemd service using SystemdService=.

Additionally, if you are creating a full service, it’s a good practice to define its interfaces explicitly in the /usr/share/dbus-1/interfaces as we previously mentioned.

Now, when calling org.gnome.ServiceName, D-Bus will check whether the service already exists on the bus. If not, it will block the method call, search for the service file in that directory, and if one matches, start the service as specified so it takes ownership of the bus name, then continue with the method call. If there’s no service file, an error is returned. Programmatically, it’s possible to make such a call asynchronous to avoid blocking.
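The activation flow just described can be modeled in a few lines. This is a simplified Python sketch of the decision only, not the real daemon; the org.example.Service name and executable path are hypothetical.

```python
def activate(requested_name, names_on_bus, service_files):
    """Simplified model of the dbus-daemon's service activation decision.

    names_on_bus: set of bus names currently owned on the bus.
    service_files: dict mapping a service file's Name= to its Exec= command.
    """
    if requested_name in names_on_bus:
        return "deliver method call"  # service already running
    if requested_name in service_files:
        # The daemon holds the call, spawns the executable, waits for it
        # to take ownership of the name, then delivers the pending call.
        return "launch " + service_files[requested_name] + ", then deliver"
    return "error: org.freedesktop.DBus.Error.ServiceUnknown"

# Hypothetical service file entry
files = {"org.example.Service": "/usr/bin/example-service"}
print(activate("org.example.Service", set(), files))
# launch /usr/bin/example-service, then deliver
print(activate("org.example.Missing", set(), files))
# error: org.freedesktop.DBus.Error.ServiceUnknown
```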

This is actually a mechanism that systemd can use for service activation, when the application acquires a name on D-Bus (service Type=dbus); polkit and wpa_supplicant are examples. When the dbus daemon is started with --systemd-activation, as shown below, systemd services can be started on the fly whenever they are needed. That’s also related to the SystemdService= line we previously mentioned, as both a systemd unit file and a dbus daemon service file are required in tandem.

dbus         498       1  0 Jun05 ?        00:01:41 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
vnm          810     795  0 Jun05 ?        00:00:19 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

And the systemd unit file for polkit.

[Unit]
Description=Authorization Manager

[Service]
Type=dbus
BusName=org.freedesktop.PolicyKit1
ExecStart=/usr/lib/polkit-1/polkitd --no-debug

Here’s an exploratory example of service activation.
Let’s say we found a service file for Cheese (a webcam app) in the /usr/share/dbus-1/services/ directory, called org.gnome.Cheese.service.

We have no clue what interfaces and methods it implements because its interfaces aren’t described in the /interfaces directory, so we send it any message.

$ dbus-send --session \
--dest=org.gnome.Cheese \
/ org.gnome.Cheese.nonexistent

If we now take a look at the processes, we can clearly see it has been started by the dbus daemon.

$ ps -ef | grep cheese
vnm       514882  514877  0 09:43 pts/21   00:00:00 grep cheese

Cheese probably implements introspect so let’s try to see which methods it has.

$ gdbus introspect --session \
--dest org.gnome.Cheese \
--object-path /org/gnome/Cheese | less

We can see that it implements the org.freedesktop.Application interface that is described here, though I couldn’t find its interface description in /usr/share/dbus-1/interfaces/. So let’s try to call one of its methods; org.freedesktop.Application.Activate seems interesting, it should start the application for us.

$ gdbus call --session --dest org.gnome.Cheese \
--object-path /org/gnome/Cheese \
--method org.freedesktop.Application.Activate  '{}'

NB: I’m using gdbus instead of dbus-send because dbus-send has limitations with complex types such as (a{sv}), a dictionary with keys of type “string” and values of type “variant”. We’ll explain the types in the next section.

And cheese will open.
So this call is based on pure service activation.

So what kind of messages are sent, and what’s up with the types we mentioned?

Messages, the unit of data transfer in D-Bus, are composed of header and data. The header contains information regarding the sender, receiver, and the message type, while the data is the payload of the message.

The D-Bus message type, not to be confused with the type format of the data payload, can be a signal (DBUS_MESSAGE_TYPE_SIGNAL), a method call (DBUS_MESSAGE_TYPE_METHOD_CALL), a method return (DBUS_MESSAGE_TYPE_METHOD_RETURN), or an error (DBUS_MESSAGE_TYPE_ERROR).

D-Bus is fully typed and type-safe as far as the payload is concerned, that means the types are predefined and are checked to see if they fit the signatures.

The following types are available:

<contents>   ::= <item> | <container> [ <item> | <container>...]
<item>       ::= <type>:<value>
<container>  ::= <array> | <dict> | <variant>
<array>      ::= array:<type>:<value>[,<value>...]
<dict>       ::= dict:<type>:<type>:<key>,<value>[,<key>,<value>...]
<variant>    ::= variant:<type>:<value>
<type>       ::= string | int16 | uint16 | int32 | uint32 | int64 | uint64 | double | byte | boolean | objpath

This is what the type= in the previous interface definitions represents. Here are some of the type codes.

b           ::= boolean
s           ::= string
i           ::= int
u           ::= uint
d           ::= double
o           ::= object path
v           ::= variant (could be different types)
a{kv}       ::= array of dictionary entries (key type k, value type v)
a(...)      ::= array of structs containing the given types
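These codes compose into full signatures such as the a{sa{sv}} from the earlier method definition. As a sketch of how they nest, here is a small Python validator for the single-complete-type grammar; it covers only the codes discussed above plus structs, and for simplicity ignores finer rules such as dict entries being allowed only directly inside arrays.

```python
def parse_single_type(sig, i=0):
    """Consume one complete D-Bus type starting at sig[i]; return the
    index just past it. Raises ValueError on malformed input."""
    basic = set("ybnqiuxtdsov")  # byte, bool, ints, double, string, objpath, variant
    c = sig[i]
    if c in basic:
        return i + 1
    if c == "a":                 # array of one complete type
        return parse_single_type(sig, i + 1)
    if c == "(":                 # struct: one or more complete types
        i += 1
        while sig[i] != ")":
            i = parse_single_type(sig, i)
        return i + 1
    if c == "{":                 # dict entry: key type then value type
        i = parse_single_type(sig, i + 1)   # key
        i = parse_single_type(sig, i)       # value
        if sig[i] != "}":
            raise ValueError("unterminated dict entry")
        return i + 1
    raise ValueError("unknown type code %r" % c)

def is_valid_signature(sig):
    i = 0
    try:
        while i < len(sig):
            i = parse_single_type(sig, i)
    except (ValueError, IndexError):
        return False
    return True

print(is_valid_signature("a{sa{sv}}"))  # True: GetRunningApplications' output type
```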

As was said, the actual method of transfer of the information isn’t mandated by the protocol, but it can usually be done locally via UNIX sockets, pipes, or via TCP/IP.

It wouldn’t be very secure to let anyone on the machine send messages to the dbus daemon, do service activation, or call any and every method; some of them could deal with sensitive data and activities. It wouldn’t be very secure either to have this data sent in plain text.
That is why, on the transfer side, D-Bus implements a simple protocol based on SASL profiles to authenticate one-to-one connections. For authorization, the dbus daemon controls access to interfaces through a security system of policies.

The policies are read from XML files that can be found in multiple places, including /usr/share/dbus-1/session.conf, /usr/share/dbus-1/system.conf, /usr/share/dbus-1/session.d/*, and /usr/share/dbus-1/system.d/*.
These files mainly control which user can talk to which interface. If you are not able to talk to a D-Bus service, or get an org.freedesktop.DBus.Error.AccessDenied error, it’s probably due to one of these files.

For example:

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
	<policy user="vnm">
		<allow own="net.nixers"/>
		<allow send_destination="net.nixers"/>
		<allow send_interface="net.nixers.Blog" send_member="GetPosts"/>
	</policy>
</busconfig>

In this example, the user “vnm” can:

  • Own the bus name net.nixers
  • Send messages to the owner of the given service
  • Call GetPosts from interface net.nixers.Blog

If services need more granularity when it comes to permission, then polkit can be used instead.

There’s a lot more that can be configured in the dbus daemon, namely in the configuration files for the session-wide daemon in /usr/share/dbus-1/session.conf and the system-wide daemon in /usr/share/dbus-1/system.conf: the way it listens to connections, the limits on messages, and where it reads other files from.

So how do we integrate and harness dbus in our client or service programs?

libdbus schema

We do this using libraries, of course, of which there are many. The most low-level one is libdbus, the reference implementation of the specification. However, it’s quite hard to use, so people rely on other libraries such as GDBus (part of GLib in GNOME), QtDBus (part of Qt, so KDE too), dbus-java, and sd-bus (part of systemd).
Some of these libraries offer the proxy capability we’ve talked about, namely manipulating dbus objects as if they were local. They can also offer ways to generate classes in the programming language of choice from an interface definition file (see gdbus-codegen and qdbusxml2cpp for an idea).

Let’s name a few projects that rely on D-Bus.

  • KDE: A desktop environment based on Qt
  • GNOME: A desktop environment based on gtk
  • Systemd: An init system
  • Bluez: A project adding Bluetooth support under Linux
  • Pidgin: An instant messaging client
  • Network-manager: A daemon to manage network interfaces
  • Modem-manager: A daemon to provide an API to dial with modems - works with Network-Manager
  • Connman: Same as Network-Manager but works with Ofono for modem
  • Ofono: A daemon exposing features provided by telephony devices such as modems

One thing that is nice about D-Bus is that there is a lot of tooling to interact with it, it’s very exploratory.

Here’s a bunch of useful ones:

  • dbus-send: send messages to dbus
  • dbus-monitor: monitor all messages
  • gdbus: manipulate dbus with GLib
  • qdbus: manipulate dbus with Qt
  • QDBusViewer: exploratory gui
  • D-Feet: exploratory gui

I’ll list some examples.

Monitor the method calls to the org.freedesktop.Notifications interface.

$ dbus-monitor --session \
"type=method_call,interface=org.freedesktop.Notifications"

For instance, we can debug what happens when we use the command line tool notify-send(1).

This is equivalent to this line of gdbus(1).

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.Notifications.Notify \
my_app_name 42 \
gtk-dialog-info "The Summary" \
"Here's the body of the notification" '[]' '{}' 5000

Or as we’ve seen, we can use dbus-send(1), however it has some limitations with dictionaries and variant types. Here are some more examples of it.

$ dbus-send --system --print-reply \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1/unit/apache2_2eservice \
org.freedesktop.DBus.Properties.Get \
string:'org.freedesktop.systemd1.Unit' \
string:'ActiveState'

$ dbus-send --system --print-reply --type=method_call \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1 \
org.freedesktop.systemd1.Manager.GetUnit \
string:'apache2.service'

D-Feet QDBusViewer

D-Feet and QDBusViewer are GUIs driven by the introspectability of objects. You can also introspect using gdbus and qdbus.

Either through calling org.freedesktop.DBus.Introspectable.Introspect.

With gdbus:

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.DBus.Introspectable.Introspect

With dbus-send:

$ dbus-send --session --print-reply \
--dest=org.freedesktop.Notifications \
/org/freedesktop/Notifications \
org.freedesktop.DBus.Introspectable.Introspect

Or by using the introspect feature of the tool, here gdbus, which will output it in a fancy colored way:

$ gdbus introspect --session \
--dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications

D-Bus is not without limitations and critics. As we said in the introduction, it isn’t meant for high-performance IPC; it’s meant for control, not data transfer. It’s fine to use it to activate a chat application, for instance, but not to pass a whole media stream over it.
D-Bus has also been criticized as bloated and over-engineered, though those claims are often unsubstantiated and come mostly from online rants. It remains that D-Bus is heavily popular and that no replacement is a real contender.

Now, let’s turn our attention to Polkit.

Polkit, formerly PolicyKit, is a service running on dbus that offers clients a way to perform granular system-wide privilege authentication, something neither dbus default policies nor sudo are able to do.
Unlike sudo, which switches the user and grants permission to the whole process, polkit delimits distinct actions, categorizes users by group or name, and decides whether the action is allowed or not. This is all offered system-wide, so that dbus services can query polkit to know whether clients have privileges or not.
In polkit parlance, we talk of MECHANISMS, privileged services, that offer actions to SUBJECTS, which are unprivileged programs.

The polkit authority is a system daemon, usually dbus service activated, named “polkitd”, and running as the polkitd user UID.

$ ps -ef | grep polkitd
polkitd   904  1  0 Jun05 ?  00:00:34 /usr/lib/polkit-1/polkitd --no-debug

The privileged services (MECHANISMS) can define a set of actions for which authentication is required. If another process wants to access a method of such a privileged service, perhaps through a dbus method call, the privileged service will query polkit. Polkit will then consult two things: the action policy defined by that service, and a set of programmatic rules that apply in general. If needed, polkit will initiate an authentication agent to verify that the user is who they say they are. Finally, polkit sends its result back to the privileged service to let it know whether the user is allowed to perform the action or not.

In summary, the following definitions apply:

  • Subject - a user
  • Action - a privileged duty that (generally) requires some authentication.
  • Result - the action to take given a subject/action pair and a set of rules. This may be to continue, to deny, or to prompt for a password.
  • Rule - a piece of logic that maps a subject/action pair to a result.

And they materialize in these files:

  • /usr/share/polkit-1/actions - Default policies for each action. These tell polkit whether to allow, deny, or prompt for a password.
  • /etc/polkit-1/rules.d - User-supplied rules. These are JavaScript scripts.
  • /usr/share/polkit-1/rules.d - Distro-supplied rules. Do not change these because they will be overwritten by the next upgrade.

Which can be summarized in this picture:

polkit architecture

Thus, polkit works along with a per-session authentication agent, usually started by the desktop environment. This is another service, used whenever a user needs to be prompted for a password to prove their identity.
The polkit package contains a textual authentication agent called pkttyagent, which is used as a general fallback but lacks features. I advise anyone trying the examples in this post to install a proper authentication agent instead.

Here’s a list of popular ones:

  • lxqt-policykit - which provides /usr/bin/lxqt-policykit-agent
  • lxsession - which provides /usr/bin/lxpolkit
  • mate-polkit - which provides /usr/lib/mate-polkit/polkit-mate-authentication-agent-1
  • polkit-efl - which provides /usr/bin/polkit-efl-authentication-agent-1
  • polkit-gnome - which provides /usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1
  • polkit-kde-agent - which provides /usr/lib/polkit-kde-authentication-agent-1
  • ts-polkitagent - which provides /usr/lib/ts-polkitagent
  • xfce-polkit - which provides /usr/lib/xfce-polkit/xfce-polkit

Authentication agent

Services/mechanisms have to define the set of actions for which clients require authentication. This is done through defining a policy XML file in the /usr/share/polkit-1/actions/ directory. The actions are defined in a namespaced format, and there can be multiple ones per policy file.
A simple grep '<action id' * | less in this directory should give an idea of the type of actions that are available. You can also list all the installed polkit actions using the pkaction(1) command.

For example:

org.xfce.thunar.policy: <action id="org.xfce.thunar">
org.freedesktop.policykit.policy:  <action id="org.freedesktop.policykit.exec">

NB: File names aren’t required to be the same as the action id namespace.

This file defines metadata information for each action, such as the vendor, the vendor URL, the icon name, the message that will be displayed when requiring authentication in multiple languages, and the description. The important sections in the action element are the defaults and annotate elements.

The defaults element is the one polkit inspects to know whether a client is authorized or not. It is composed of three mandatory sub-elements: allow_any, the authorization policy that applies to any client; allow_inactive, the policy for clients in inactive sessions on local consoles; and allow_active, the policy for clients in the currently active session on local consoles.
These elements take as value one of the following:

  • no - Not authorized
  • yes - Authorized.
  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • auth_admin - Authentication by the admin is required (root)
  • auth_self_keep - Same as auth_self but the authentication is kept for some time that is defined in polkit configurations.
  • auth_admin_keep - Same as auth_admin but also keeps it for some time

The annotate element is used to pass extra key-value pair to the action. There can be multiple key-value that are passed. Some annotations/key-values are well known, such as the org.freedesktop.policykit.exec.path which, if passed to the pkexec program that is shipped by default with polkit, will tell it how to execute a certain program.
Another defined annotation is the org.freedesktop.policykit.imply which will tell polkit that if a client was authorized for the action it should also be authorized for the action in the imply annotation.
One last interesting annotation is the org.freedesktop.policykit.owner, which will let polkitd know who has the right to interrogate it about whether other users are currently authorized to do certain actions or not.
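Putting these elements together, a minimal policy file might look like the following sketch. The org.example.foo action id, paths, and messages here are illustrative, not taken from any real package:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policyconfig PUBLIC
 "-//freedesktop//DTD PolicyKit Policy Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/PolicyKit/1.0/policyconfig.dtd">
<policyconfig>
  <vendor>Example</vendor>
  <vendor_url>https://example.org</vendor_url>
  <action id="org.example.foo">
    <description>Run foo as root</description>
    <message>Authentication is required to run foo as root</message>
    <defaults>
      <allow_any>no</allow_any>
      <allow_inactive>no</allow_inactive>
      <allow_active>auth_admin_keep</allow_active>
    </defaults>
    <annotate key="org.freedesktop.policykit.exec.path">/usr/bin/foo</annotate>
  </action>
</policyconfig>
```

Installed under /usr/share/polkit-1/actions/, this would let an active local user run pkexec foo after admin authentication, with the authorization kept for a while.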

Other than policy actions, polkit also offers a rule system that is applied every time it needs to resolve authentication. The rules are defined in two directories, /etc/polkit-1/rules.d/ and /usr/share/polkit-1/rules.d/. As users, we normally add custom rules to the /etc/ directory and leave the /usr/share/ for distro packages rules.
Rules within these files are defined in JavaScript and come with a preset of helper methods that live under the polkit object.

The polkit javascript object comes with the following methods, which are self-explanatory.

  • void addRule( polkit.Result function(action, subject) {...});
  • void addAdminRule( string[] function(action, subject) {...}); called when administrator authentication is required
  • void log( string message);
  • string spawn( string[] argv);

The polkit.Result object is defined as follows:

polkit.Result = {
    NO              : "no",
    YES             : "yes",
    AUTH_SELF       : "auth_self",
    AUTH_SELF_KEEP  : "auth_self_keep",
    AUTH_ADMIN      : "auth_admin",
    AUTH_ADMIN_KEEP : "auth_admin_keep",
    NOT_HANDLED     : null
};

Note that the rule files are processed in alphabetical order, so if an earlier rule returns any value other than polkit.Result.NOT_HANDLED, for example polkit.Result.YES, polkit won’t bother processing the remaining files. Thus, the file naming convention does matter.
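This “first decision wins” dispatch can be sketched in plain JavaScript. Note this only illustrates the evaluation order; Result, addRule, and resolve are stand-ins, not polkitd’s actual implementation:

```javascript
// Sketch of polkitd's rule dispatch: rules run in order (files are read
// alphabetically), and the first rule returning anything other than
// NOT_HANDLED decides the outcome.
const Result = { YES: "yes", NO: "no", AUTH_ADMIN: "auth_admin", NOT_HANDLED: null };

const rules = [];
const addRule = (fn) => rules.push(fn);

const resolve = (action, subject) => {
  for (const rule of rules) {
    const r = rule(action, subject);
    if (r !== Result.NOT_HANDLED) return r; // first decision wins
  }
  return Result.NOT_HANDLED; // fall through to the policy file defaults
};

// As if loaded from 10-admin.rules, then 20-deny.rules:
addRule((action, subject) =>
  action.id === "org.example.foo" && subject.isInGroup("admin")
    ? Result.YES
    : Result.NOT_HANDLED);
addRule((action) =>
  action.id === "org.example.foo" ? Result.NO : Result.NOT_HANDLED);

console.log(resolve({ id: "org.example.foo" }, { isInGroup: (g) => g === "admin" }));
// → "yes": the earlier rule decided, so the later deny rule never ran
```

This is why a deny rule placed in a file sorting after a permissive one never takes effect for the same action.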

The functions polkit.addRule and polkit.addAdminRule take the same arguments: an action and a subject. The action is the action being requested; it has an id attribute and a lookup() method to fetch annotation values. The subject has attributes such as pid, user, groups, seat, and session, and methods such as isInGroup() and isInNetGroup().

Here are some examples taken from the official documentation:

Log the action and subject whenever the action org.freedesktop.policykit.exec is requested.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec") {
        polkit.log("action=" + action);
        polkit.log("subject=" + subject);
    }
});

Allow all users in the admin group to perform user administration without changing policy for other users.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.accounts.user-administration" &&
        subject.isInGroup("admin")) {
        return polkit.Result.YES;
    }
});

Define administrative users to be the users in the wheel group:

polkit.addAdminRule(function(action, subject) {
    return ["unix-group:wheel"];
});

Run an external helper to determine if the current user may reboot the system:

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.login1.reboot" &&
        subject.isInGroup("users")) {
        try {
            // user-may-reboot exits with success (exit code 0)
            // only if the passed username is authorized
            polkit.spawn(["/opt/company/bin/user-may-reboot",
                          subject.user]);
            return polkit.Result.YES;
        } catch (error) {
            // Nope, but do allow admin authentication
            return polkit.Result.AUTH_ADMIN;
        }
    }
});

The following example shows how the authorization decision can depend on variables passed by the pkexec(1) mechanism:

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec" &&
        action.lookup("program") == "/usr/bin/cat") {
        return polkit.Result.AUTH_ADMIN;
    }
});

Keep in mind that polkit will track changes in both the policy and rules directories, so there’s no need to worry about restarting polkit, changes will appear immediately.

We’ve mentioned a tool called pkexec(1) that comes pre-installed along with polkit. This program lets you execute a command as another user, by default executing it as root. It is a sort of sudo replacement, but one that may appear confusing to users who have no idea about polkit. However, the integration with the authentication agent is quite nice.

So how do we integrate and harness polkit in our subject and mechanism software? We do this using libraries, of course, of which there are many, integrating with different desktop environments.
The libpolkit-agent-1 and libpolkit-gobject-1 (GTK) libraries are used by the mechanisms, and this is most of what is needed. The portion of code that requires authorization can be wrapped with a check against polkit.
For instance, the polkit_authority_check_authorization() is used to check whether a subject is authorized.

As for writing an authentication agent, it will have to implement the registration methods to be able to receive requests from polkit.

Remember, polkit is a dbus service, and thus all its interfaces are well known and can be introspected. That means that you can possibly interact with it directly through dbus instead of using a helper library.

Polkit also offers some excellent manpages that are extremely useful, be sure to check polkit(8), polkitd(8), pkcheck(1), pkaction(1), pkexec(1).

The following tools are of help:

  • polkit-explorer or polkitex - a GUI to inspect policy files
  • pkcreate - a WIP tool to easily create policy files, but it seems it is lacking
  • pkcheck - Check whether a subject has privileges or not
  • pkexec - Execute a command as another user

Let’s test through some examples.

First pkaction(1), to query the policy file.

$ pkaction -a org.xfce.thunar -v

  description:       Run Thunar as root
  message:           Authentication is required to run Thunar as root.
  vendor:            Thunar
  icon:              system-file-manager
  implicit any:      auth_self_keep
  implicit inactive: auth_self_keep
  implicit active:   auth_self_keep
  annotation:        org.freedesktop.policykit.exec.path -> /usr/bin/thunar
  annotation:        org.freedesktop.policykit.exec.allow_gui -> true

Compared to polkitex:

[polkitex screenshot]

We can get the current shell PID.

$ ps
    PID TTY          TIME CMD
 421622 pts/21   00:00:00 zsh
 421624 pts/21   00:00:00 ps

And then give ourselves temporary privileges for the org.freedesktop.systemd1.manage-units action.

$ pkcheck --action-id 'org.freedesktop.systemd1.manage-units' --process 421622 -u
$ pkcheck --list-temp
authorization id: tmpauthz10
action:           org.freedesktop.systemd1.manage-units
subject:          unix-process:421622:195039910 (zsh)
obtained:         26 sec ago (Sun Jun 28 10:53:39 2020)
expires:          4 min 33 sec from now (Sun Jun 28 10:58:38 2020)

As you can see, if the auth_admin_keep or auth_self_keep are set, the authorization will be kept for a while and can be listed using pkcheck.

You can try to exec a process as another user, just like sudo:

$ pkexec /usr/bin/thunar

If you want to override the currently running authentication agent, you can test having pkttyagent running in another terminal passing it the -p argument for the process it will listen to.

# terminal 1
$ pkttyagent -p 423619
# terminal 2
$ pkcheck --action-id 'org.xfce.thunar' --process 423619 -u
# will display in terminal 1
==== AUTHENTICATING FOR org.xfce.thunar ====
Authentication is required to run Thunar as root.
Authenticating as: vnm

So this is it for polkit. But what’s the deal with consolekit and systemd logind, and what is their relation to polkit?

Remember we’ve talked about sessions when discussing the defaults element of polkit policy files; this is where these two come in. Let’s quote again:

  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • allow_active - for client in the currently active session on local consoles

The purpose of both consolekit and systemd logind is to be services on D-Bus that can be interrogated about the status of the current session, its users, its seats, and its logins. They can also be used to manage the session, with methods for shutting down, suspending, restarting, and hibernating the machine.

$ loginctl show-session $XDG_SESSION_ID
Timestamp=Fri 2020-06-05 21:06:43 EEST

# in another terminal we monitor using
$ dbus-monitor --system
# and the output
method call time=1593360621.762509 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=2 \
path=/org/freedesktop/login1; \
interface=org.freedesktop.login1.Manager; \

method call time=1593360621.763069 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=3 \
path=/org/freedesktop/login1/session/_32; \
interface=org.freedesktop.DBus.Properties; \

As can be seen, this is done through the org.freedesktop.login1.Manager bus name.

And so, polkit uses data gathered from systemd logind or consolekit to evaluate the 3 domain rules we’ve seen: allow_any, allow_inactive, and allow_active. This is where these two interact with one another.
The following conditions apply to the values returned by systemd logind:

  • allow_any means any session (even remote sessions)
  • allow_inactive means Remote == false and Active == false
  • allow_active means Remote == false and Active == true
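As a sketch, the mapping from session state to implicit-authorization domain could be expressed like this (bucketFor is a hypothetical helper; the remote/active fields mirror the Remote and Active properties reported by logind):

```javascript
// Pick which implicit-authorization domain applies, given the
// Remote/Active session properties reported by systemd logind.
const bucketFor = (session) => {
  if (session.remote) return "allow_any"; // remote sessions only match allow_any
  return session.active ? "allow_active" : "allow_inactive";
};

console.log(bucketFor({ remote: false, active: true }));  // → "allow_active"
console.log(bucketFor({ remote: false, active: false })); // → "allow_inactive"
console.log(bucketFor({ remote: true, active: true }));   // → "allow_any"
```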

In conclusion, all these technologies, D-Bus, polkit, and systemd logind, are inherently intertwined, and this is as much a positive aspect as it is a fragile point of failure. They each complete one another, but if one goes down, issues could echo all across the system.
I hope this post has removed the mystification around them and helped you understand what they stand for: yet another piece of glue in desktop environments, similar to the subject of a previous post but solving another problem.


July 04, 2020

Jeff Carpenter (jeffcarp)

Building a Markov Chain Sentence Generator in 20 Lines of Python July 04, 2020 06:44 PM

A bot who can write a long letter with ease, cannot write ill. —Jane Austen, Pride and Prejudice

This post will walk you step by step through writing a Markov chain from scratch in Python, to generate brand-new English sentences that read as though a real person wrote them. Pride and Prejudice by Jane Austen is the text we’ll use to build the Markov chain. A runnable notebook version is available on Colab. Read the English version of this post here.

Setup

First, download the full text of Pride and Prejudice.

# Download Pride and Prejudice and cut off the header.
!curl | tail -n+32 > /content/pride-and-prejudice.txt
# Preview the file.
!head -n 10 /content/pride-and-prejudice.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  707k  100  707k    0     0  1132k      0 --:--:-- --:--:-- --:--:-- 1130k

PRIDE AND PREJUDICE

By Jane Austen

Chapter 1

It is a truth universally acknowledged, that a single man in possession

Then add some necessary imports.

July 03, 2020

Unrelenting Technology (myfreeweb)

Wow, about a month ago Spot (ex-Spotinst), the service that can... July 03, 2020 12:36 AM

Wow, about a month ago Spot (ex-Spotinst), the service that can auto-restore an EC2 spot instance after it gets killed, fixed their arm64 support! (Used to be that it would always set the AMI’s “architecture” metadata to amd64, haha.)

And of course their support didn’t notify me that it was fixed, the service didn’t auto-notify me that an instance finally was successfully restored after months of trying and failing, and AWS didn’t notify me either (it probably can, but I haven’t set anything up?), so I wasted a few bucks running a spare inaccessible clone server of my website. Oh well, at least now I can use a spot instance again without worrying about manual restore.

UPD: hmm, it still tried i386 on another restore! dang it.

Pete Corey (petecorey)

Recursive Roguelike Tiles July 03, 2020 12:00 AM

The roguelike dungeon generator we hacked together in a previous post sparked something in my imagination. What kind of flora and fauna lived there, and how could we bring them to life?

The first thing that came to mind was grass. We should have some way of algorithmically generating grass throughout the open areas of our dungeon. A quick stab at adding grass could be to randomly colorize floor tiles as we render them:

But this isn’t very aesthetically pleasing. The grass tiles should be smaller than the walkable tiles to give us some visual variety. We could model this by giving every ground tile a set of grass tiles. All of the grass tiles in a given area live entirely within their parent ground tile.

This is better, but we can go further. To spice up our grass, let’s inject some life into it. We’ll model our grass cells as a basic cellular automaton that changes its state over time, looking to its immediate neighbors to decide what changes to make.

Because of how we recursively modeled our tiles, finding all of the neighbors of a single grass tile takes some work:

const getGrass = (x1, y1, x2, y2) => {
    let ground = state[y1 * w + x1];
    return _.get(ground, `grass.${y2 * grassWidth + x2}`);
};

const getGrassNeighbors = (i, x, y) => {
    let ix = i % w;
    let iy = Math.floor(i / w);
    return _.chain([
        [-1, -1],
        [0, -1],
        [1, -1],
        [-1, 0],
        [1, 0],
        [-1, 1],
        [0, 1],
        [1, 1]
    ])
        .map(([dx, dy]) => {
            let nx = x + dx;
            let ny = y + dy;
            if (nx >= 0 && nx < grassWidth && ny >= 0 && ny < grassWidth) {
                return getGrass(ix, iy, nx, ny);
            } else if (nx < 0 && ny >= 0 && ny < grassWidth) {
                return getGrass(ix - 1, iy, grassWidth - 1, ny);
            } else if (nx >= grassWidth && ny >= 0 && ny < grassWidth) {
                return getGrass(ix + 1, iy, 0, ny);
            } else if (nx >= 0 && nx < grassWidth && ny < 0) {
                return getGrass(ix, iy - 1, nx, grassWidth - 1);
            } else if (nx >= 0 && nx < grassWidth && ny >= grassWidth) {
                return getGrass(ix, iy + 1, nx, 0);
            } else if (nx < 0 && ny < 0) {
                return getGrass(ix - 1, iy - 1, grassWidth - 1, grassWidth - 1);
            } else if (nx < 0 && ny >= grassWidth) {
                return getGrass(ix - 1, iy + 1, grassWidth - 1, 0);
            } else if (nx >= grassWidth && ny < 0) {
                return getGrass(ix + 1, iy - 1, 0, grassWidth - 1);
            } else if (nx >= grassWidth && ny >= grassWidth) {
                return getGrass(ix + 1, iy + 1, 0, 0);
            }
        })
        .value();
};

Once we can get each grass cell’s neighbors (sometimes dipping into a neighboring ground cell’s grass tiles), we can start modeling a basic cellular automaton.

In this example, if a grass tile has more than four neighbors that are “alive”, we set its value to the average of all of its neighbors, smoothing the area out. Otherwise, we square its value, effectively darkening the tile:

for (let y = 0; y < grassWidth; y++) {
    for (let x = 0; x < grassWidth; x++) {
        let grass = cell.grass[y * grassWidth + x];
        let neighbors = getGrassNeighbors(i, x, y);
        let alive = _.filter(neighbors, ({ value }) => value > 0.5);
        if (_.size(alive) > 4) {
            cell.grass[y * grassWidth + x].value = _.chain(neighbors)
                .map("value")
                .mean()
                .value();
        } else {
            cell.grass[y * grassWidth + x].value =
                cell.grass[y * grassWidth + x].value *
                cell.grass[y * grassWidth + x].value;
        }
    }
}
There is no rhyme or reason for choosing these rules, but they produce interesting results:

We can even take this idea of recursive tiles further. What if every grass tile had a set of flower tiles? Again, those flower tiles could be driven by cellular automata rules, or simply randomly generated.

Now I’m even more pulled in. What else lives in these caves? How do they change over time? Refresh the page for more dungeons!


July 01, 2020

Jan van den Berg (j11g)

How I read 52 books in a year July 01, 2020 07:30 PM

My book tracking app alerted me that I read 52 books over the last twelve months. So, *frantically crunching numbers* yes, indeed, that averages to one book per week!

This brings the book average to 226 pages per book.

I follow a couple of blogs of people that read way more than I do. Like these guys, respectively read 116, 105, 74 and 58 books in 2019. I don’t know how they managed to do so, but 52 is definitely a personal best for me and this blogpost is about how I did this.

When I say that I have read a book, I mean: I read it cover to cover. No skimming or skipping, or glossing through. That’s not reading. And no audio books. Nothing against that, but my point is to read a book as the author intended it (of course, this is different when you study a subject and need to pick and choose parts).
Full disclosure, I am currently experimenting reading Moby Dick with the book in hand and the audio book playing along. It’s fun, and a good way to get your teeth into such a classic. But I still need my eyes to follow the words and I don’t think listening to an audiobook while doing other things is the same experience. A book is not a podcast.

Getting serious

I’ve always liked reading, but if I had to state a regret it would be that I wish I had read more. There is always a certain anxiety when I enter a library or bookstore. The average human, or even a frantic reader, will never read more than a few thousand books in their lifetime. So I could never read even just what my local library has in stock, not if it took a lifetime. There are just too.many.books. With this in mind, a minute watching TV is a minute not spent reading.

I realised I find few activities more rewarding than reading. With this realisation in mind I consciously decided that I would take reading more seriously. And of course I still watch a little bit of TV and movies, but just a bit more consciously.

Here are some principles I developed around reading to keep me on track.

Principle 1: Track broadly

For me, this is key. So much so, that last year I wrote my own book tracking app, to exactly fit my needs. In my app I cannot only track what I have read, or am currently reading, but also what I want to read.

I used to use a spreadsheet, whatever works for you, but I was often getting lost in what I was reading (see Principle 2). So having this app definitely helps.

Principle 2: Read widely

This may be the most important principle on multiple levels. It not only means that I want to read many different books or genres but also that I like to read them simultaneously.

Of course I have favorite genres or subjects, but I try to be open-minded about every book (I wouldn’t snub Danielle Steele). You never know what you might learn about yourself.


And before I meticulously kept track, this is usually where I got lost. Not every book demands the same energy or attention level and you should be able to switch it up without regret.

Which I do. So at a certain point last year I was reading 11 different books at once: diaries, biographies, novels, management books, historical books. You name it. Because my app allows me to directly see what I started it’s easy to keep track of this and — most importantly — switch it up when I am not feeling a certain book. Instead of dreading picking up a certain book for months or a half read book getting lost on my bookshelf I just move on to a different book, and know I will eventually get to that book. My app tracks it. And I always do! Some books I haven’t touched in months but I pick em up again after some time when I feel like it, and more often than not it’s usually a better experience. I have now had this experience more than once. And it was quite the revelation. The lesson is: different moods ask for different books.

So far I only actively stopped reading two books, with no intention of reading any further ever (this is fine!). So this is rare. Most books I start, I have already done a little bit of research, to know enough that I want to read them.

Another benefit when you switch a lot between books is that I noticed it helps to retain what the books are about. It’s a different experience when you read a book over two months as opposed to two days. Because you have to actively remind yourself of what the book was about again.

Principle 3: Buy loosely

The app allows me to add books to my wish list, and as you can see in the screenshot I bought 90 books last year. Mostly from thrift stores, they are absolute goldmines. And yes, I don’t read e-books. I need to feel paper.

The ‘Books I want‘ list from my app is a guideline for thrift store visits, but mostly I just look all over the place. And I used to be a bit hesitant to buy a book, as it would indicate a future commitment to myself to read it. But since reading Nassim Nicholas Taleb’s Black Swan and his thoughts on famous writer Umberto Eco’s personal library (here and here), I have been able to shake this habit a bit. So if a book looks interesting: buy it!

Bookmark stickies.


So those are the three main principles. Here are some other tips that help to keep your reading on track.

  • I dislike using a highlighter. It ruins books. Even if it’s just paper I got for 50 cents at a thrift store.
  • I have used the classic highlighters, and last year I moved to a pencil highlighter, a little bit less permanent but still not great. So for the past couple of months I have used bookmark stickies. They are great! It doesn’t matter what type of book it is, I read every book with a stack of sticky bookmarks and annotate what I like or want to remember. (This would definitely be my number one reason to move to ebooks at some point.)
  • To retain things, I usually read the sticky parts again after finishing or when picking up a book if it has been a while.
  • Read every day. Even if it’s just a couple of minutes. Don’t break the chain. Create a habit.
  • Put your phone on mute. I do most of my reading between 8 and 10 pm. If you text or call me between those hours, I probably won’t see or hear it.
  • Write! After all, what good is reading if you don’t write? I tend to blog about every book I read (few exceptions: i.e. when it’s a really small book). This helps with retention and thinking about what you liked or want to remember. And also you create your own little archive. I often look up my own posts, to see what I was thinking.

So there you have it! Now, let’s see what’s on TV.

The post How I read 52 books in a year appeared first on Jan van den Berg.

Pete Corey (petecorey)

Hello Roguelike July 01, 2020 12:00 AM

Like a lot of folks, the desire to create my own video games is what originally got me into computer programming. While I don’t play many video games these days, video game development still holds a special place in my heart. I’m especially fascinated by procedural generation techniques used in video games and many forms of computer art, so you’ll often find me creeping around the /r/roguelikedev and /r/generative subreddits.

Inspired by a recent post on /r/roguelikedev, I decided to try my hand at implementing a very basic dungeon generator using a random walk algorithm.

After getting our canvas set up, we can implement our basic random walk algorithm by starting in the center of our grid and moving in random directions, filling in each square we encounter as we come to it:

let pixelSize = 32;
let w = Math.floor(width / pixelSize);
let h = Math.floor(height / pixelSize);

let state = [];
let x = Math.floor(w / 2);
let y = Math.floor(h / 2);
let filled = 0;
let path = [];
let maxFilled = 500;

while (filled < maxFilled) {
    path.push(y * w + x);
    if (!state[y * w + x]) {
        state[y * w + x] = true;
        filled++;
    }
    let [nx, ny] = getNextDirection(x, y);
    x += nx;
    y += ny;
}

Notice that we’re also keeping track of the sequence of steps, or the path we took to as we moved through our grid. Also notice that this isn’t particularly “good” code. That doesn’t matter as long as we’re having fun.

The getNextDirection function just returns a random direction, with a little added fanciness to keep our path from falling off our grid:

let getNextDirection = (cx, cy) => {
    let [x, y] = _.sample([[0, 1], [0, -1], [1, 0], [-1, 0]]);
    if (cx + x < 1 || cy + y < 1 || cx + x >= w - 1 || cy + y >= h - 1) {
        return getNextDirection(cx, cy);
    } else {
        return [x, y];
    }
};
Animating this algorithm is its own microcosm of interesting divergences…

Once we have our fully filled out grid, we can flip our perspective and render the walls around the steps we took through the grid, rather than rendering the steps themselves:

We can add a path through our dungeon by removing the cycles from our path and tracing its newly simplified form through our grid. We could even hint at up and down stairways with orange dots at the beginning and end of our path.
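The original post doesn’t show the cycle-removal step, but one way to strip loops from a random walk is to rewind whenever a cell is revisited (simplifyPath here is a hypothetical helper, not code from the post):

```javascript
// Strip cycles from a walk: whenever we revisit a cell, drop the loop
// we just made between the two visits, leaving a simple path.
const simplifyPath = (path) => {
  const simplified = [];
  const seenAt = new Map(); // cell -> index of its first visit in simplified
  for (const cell of path) {
    if (seenAt.has(cell)) {
      // Cut the loop: rewind back to the first visit of this cell.
      const first = seenAt.get(cell);
      while (simplified.length - 1 > first) {
        seenAt.delete(simplified.pop());
      }
    } else {
      seenAt.set(cell, simplified.length);
      simplified.push(cell);
    }
  }
  return simplified;
};

console.log(simplifyPath([1, 2, 3, 2, 4])); // → [1, 2, 4]: the 2→3→2 loop is gone
```

Tracing the simplified path through the grid then gives a direct route between the two endpoints, where the stairway dots could go.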

This is a ridiculously simple algorithm, but what comes out of it absolutely pulls me in. What else lives in these dungeons? What kinds of things could we expect to find, and how would we bring those things to life?

Refresh the page to get more dungeons!


June 30, 2020

Derek Jones (derek-jones)

beta: Evidence-based Software Engineering – book June 30, 2020 10:12 PM

My book, Evidence-based software engineering: based on the publicly available data is now out on beta release (pdf, and code+data). The plan is for a three-month review, with the final version available in the shops in time for Christmas (I plan to get a few hundred printed, and made available on Amazon).

The next few months will be spent responding to reader comments, and adding material from the remaining 20 odd datasets I have waiting to be analysed.

You can either email me with any comments, or add an issue to the book’s Github page.

While the content is very different from my original thoughts, 10-years ago, the original aim of discussing all the publicly available software engineering data has been carried through (in some cases more detailed data, in greater quantity, has supplanted earlier less detailed/smaller datasets).

The aim of only discussing a topic if public data is available, has been slightly bent in places (because I thought data would turn up, and it didn’t, or I wanted to connect two datasets, or I have not yet deleted what has been written).

The outcome of these two aims is that the flow of discussion is very disjoint, even disconnected. Another reason might be that I have not yet figured out how to connect the material in a sensible way. I’m the first person to go through this exercise, so I have no idea where it’s going.

The roughly 620+ datasets is three to four times larger than I thought was publicly available. More data is good news, but required more time to analyse and discuss.

Depending on the quantity of issues raised, updates of the beta release will happen.

As always, if you know of any interesting software engineering data, please tell me.

Jan van den Berg (j11g)

Bono on Bono – Michka Assayas June 30, 2020 08:19 PM

I have a soft spot for Bono. The megalomaniac lead singer of probably the world’s most commercial band (“the only band with their own iPod”). The Irish humanitarian multi-millionaire. Yes, I get all the criticism. Still, few singers can belt it out like Bono can. And I will forever stand by that.

On May 10th this year, Bono turned 60. So I thought it would be a good time to (re)read his 2005 biography.

I got this book, with a bunch of others, in 2006 at an HMV in Manchester. Good times.

Bono on Bono – Michka Assayas (2005) – 368 pages

It took me aback a bit when I realised that most of this book was written in 2003 and 2004, when Bono was only a couple of years older than I am now😲. By then he was of course already a very well established and very famous person. The book is written somewhere between two U2 albums: All That You Can’t Leave Behind and How To Dismantle An Atomic Bomb. So it finds Bono in a sort of musical lull, but with VERY high energy on issues like humanitarian aid and debt-relief causes.


The book is written as a dialogue, which is a very interesting concept! But I don’t think the chemistry between this Irishman and Frenchman works all the time. Or, I just don’t get their banter, because it’s cringy at times and the questions often go in directions I don’t want them to go (I would have asked different things!). It is also strange that there seems to be an effort to put everything down verbatim (with inserts like “Bono laughs” or “pauses reflectively”) while clearly this book and the interviews have been edited. Which is fine! But why the emphasis on this fake realness?

I am also not sure of the reason for this biography, other than to emphasize Bono’s humanitarian efforts. This biography therefore also suffers what so many biographies suffer from: high on current events, low on what actually made the subject into the person they are now (Neil Young’s biography is the worst example of this).

Granted, Bono is very vulnerable in discussing his youth and parents. This was probably the most revealing and most interesting part. Also because these are few of the actual biographical parts of this biography. I also enjoyed how Bono talked about his religious beliefs. You don’t always get this from the music. But the tête-à-têtes Bono had with Bush and Clinton were probably very on topic in 2005, but they seem like something from another lifetime in 2020 and less relevant.

So I get this is not a book about U2 but about Bono, but I would have expected a little bit more stories about music. And this is not like Keith Richards or Bruce Springsteen‘s tremendous biographies, which were written when they were much older and are much more about the music.

So now that I am done complaining, I could just say that this book is less of a book than more of a collection of what could be magazine interviews, but HOWEVER: I still liked it!

I mean, it’s about Bono. And he definitively is one of a kind. How could you not like it!

The post Bono on Bono – Michka Assayas appeared first on Jan van den Berg.

Who moved my cheese? – Spencer Johnson June 30, 2020 08:03 PM

People like stories, people remember stories. So, tell stories! This is what I learned from Seth Godin. But Spencer Johnson clearly understands this concept too.

Who moved my cheese? – Spencer Johnson (1998) – 95 pages

This little book embodies the concepts of how to deal with change in one memorable parable.

Johnson probably wasn’t the first to do so, but this concept — packaging management theories as stories — is everywhere now. And this little book probably has a lot to do with that trend. It was, after all, the bestselling book EVER at Amazon’s tenth anniversary. Go figure.

The post Who moved my cheese? – Spencer Johnson appeared first on Jan van den Berg.

Marx – Peter Singer June 30, 2020 08:01 PM

This was the third book in a twelve part series of introductions to famous thinkers/philosophers (previously I read Plato and Kierkegaard). You might expect these books to be small (check) and comprehensible (not so much). So like the other two books, this book suffers from the same problems.

Marx – Peter Singer (1999) – 111 pages

Sure, you’ll get an introduction on Marx, and you get a better understanding of what influenced his thinking and what his special relation to Hegel was. Interesting, enlightening, great!

However, for an introduction I find the language, specifically in the critical parts, way too scientific. So I am always struggling with the question of for whom these books are written. Clearly an experienced philosopher would not pick up an introduction like this? And for someone just dipping their toes in — it is after all an introduction — I think the language can be a bit overwhelming. So, writer, who are you trying to impress? The material is there, but it could do with a bit of editing.

The post Marx – Peter Singer appeared first on Jan van den Berg.

Dan Luu (dl)

How do cars fare in crash tests they're not specifically optimized for? June 30, 2020 07:06 AM

Any time you have a benchmark that gets taken seriously, some people will start gaming the benchmark. Some famous examples in computing are the CPU benchmark specfp and video game benchmarks. With specfp, Sun managed to increase its score on one sub-benchmark of specfp by 12x with a compiler tweak that essentially re-wrote the benchmark kernel, which increased the Sun UltraSPARC’s overall specfp score by 20%. At times, GPU vendors have added specialized benchmark-detecting code to their drivers that lowers image quality during benchmarking to produce higher benchmark scores. Of course, gaming the benchmark isn't unique to computing and we see people do this in other fields. It’s not surprising that we see this kind of behavior since improving benchmark scores by cheating on benchmarks is much cheaper (and therefore higher ROI) than improving benchmark scores by actually improving the product.

As a result, I'm generally suspicious when people take highly specific and well-known benchmarks too seriously. Without other data, you don't know what happens when conditions aren't identical to the conditions in the benchmark. With GPU and CPU benchmarks, it’s possible for most people to run the standard benchmarks with slightly tweaked conditions. If the results change dramatically for small changes to the conditions, that’s evidence that the vendor is, if not cheating, at least shading the truth.

Benchmarks of physical devices can be more difficult to reproduce. Vehicle crash tests are a prime example of this -- they're highly specific and well-known benchmarks that use up a car for some test runs.

While there are multiple organizations that do crash tests, they each have particular protocols that they follow. Car manufacturers, if so inclined, could optimize their cars for crash test scores instead of actual safety. Checking to see if crash tests are being gamed with hyper-specific optimizations isn't really feasible for someone who isn't a billionaire. The easiest way we can check is by looking at what happens when new tests are added since that lets us see a crash test result that manufacturers weren't optimizing for just to get a good score.

While having car crash test results is obviously better than not having them, the results themselves don't tell us what happens when we get into an accident that doesn't exactly match a benchmark. Unfortunately, if we get into a car accident, we don't get to ask the driver of the vehicle we're colliding with to change their location, angle of impact, and speed so that the collision complies with an IIHS, NHTSA, or *NCAP test protocol.

For this post, we're going to look at IIHS test scores when they added the (driver-side) small overlap and passenger-side small overlap tests, which were added in 2012 and 2018, respectively. We'll start with a summary of the results and then discuss what those results mean and other factors to consider when evaluating car safety, followed by details of the methodology.


Results

The ranking below is mainly based on how well vehicles scored when the driver-side small overlap test was added in 2012 and how well models scored when they were modified to improve test results.

  • Tier 1: good without modifications
    • Volvo
  • Tier 2: mediocre without modifications; good with modifications
    • None
  • Tier 3: poor without modifications; good with modifications
    • Mercedes
    • BMW
  • Tier 4: poor without modifications; mediocre with modifications
    • Honda
    • Toyota
    • Subaru
    • Chevrolet
    • Tesla
    • Ford
  • Tier 5: poor with modifications or modifications not made
    • Hyundai
    • Dodge
    • Nissan
    • Jeep
    • Volkswagen

These descriptions are approximations. Honda, Ford, and Tesla are the poorest fits: Ford arguably sits halfway between Tier 4 and Tier 5, but could also be considered better than Tier 4 and outside the classification entirely, while Honda and Tesla don't properly fit into any category (their tier is just the closest fit). Some others are also imperfect fits. Details below.

General commentary

If we look at overall mortality in the U.S., there's a pretty large age range for which car accidents are the leading cause of death. Although the numbers will vary depending on what data set we look at, when the driver-side small overlap test was added, the IIHS estimated that 25% of vehicle fatalities came from small overlap crashes. It's also worth noting that small overlap crashes were thought to be implicated in a significant fraction of vehicle fatalities at least since the 90s; this was not a novel concept in 2012.

Despite the importance of small overlap crashes, from looking at the results when the IIHS added the driver-side and passenger-side small overlap tests in 2012 and 2018, it looks like almost all car manufacturers were optimizing for the benchmark and not overall safety. Except for Volvo, all carmakers examined produced cars that fared poorly on driver-side small overlap crashes until the driver-side small overlap test was added.

When the driver-side small overlap test was added in 2012, most manufacturers modified their vehicles to improve driver-side small overlap test scores. However, until the IIHS added a passenger-side small overlap test in 2018, most manufacturers skimped on the passenger side. When the new test was added, they beefed up passenger safety as well. To be fair to car manufacturers, some of them got the hint about small overlap crashes when the driver-side test was added in 2012 and did not need to make further modifications to score well on the passenger-side test, including Mercedes, BMW, and Tesla (and arguably a couple of others, but the data is thinner in the other cases; Volvo didn't need a hint).

Other benchmark limitations

There are a number of other areas where we can observe that most car makers are optimizing for benchmarks at the expense of safety.

Gender, weight, and height

Another issue is crash test dummy overfitting. For a long time, adult NHTSA and IIHS tests used a 1970s 50%-ile male dummy, which is 5'9" and 171lbs. Regulators called for a female dummy in 1980, but due to budget cutbacks during the Reagan era, initial plans were shelved and the NHTSA didn't put one in a car until 2003. The female dummy is a scaled-down version of the male dummy, scaled down to 5%-ile 1970s height and weight (4'11", 108lbs; another model is 4'11", 97lbs). In frontal crash tests, when a female dummy is used, it's always a passenger (a 5%-ile woman is in the driver's seat in one NHTSA side crash test and the IIHS side crash test). For reference, in 2019, the average weight of a U.S. adult male was 198 lbs and the average weight of a U.S. adult female was 171 lbs.

Using a 1970s U.S. adult male crash test dummy causes a degree of overfitting for 1970s 50%-ile men. For example, starting in the 90s, manufacturers started adding systems to protect against whiplash. Volvo and Toyota use a kind of system that reduces whiplash in men and women and appears to have slightly more benefit for women. Most car makers use a kind of system that reduces whiplash in men but, on average, has little impact on whiplash injuries in women.

It appears that we also see a similar kind of optimization for crashes in general and not just whiplash. We don't have crash test data on this, and looking at real-world safety data is beyond the scope of this post, but I'll note that, until around the time the NHTSA put the 5%-ile female dummy into some crash tests, most car manufacturers not named Volvo had a significant fatality rate differential in side crashes based on gender (with men dying at a lower rate and women dying at a higher rate).

Volvo claims to have been using computer models to simulate what would happen if women (including pregnant women) are involved in a car accident for decades.

Other crashes

Volvo is said to have a crash test facility where they do a number of other crash tests that aren't done by testing agencies. A reason that they scored well on the small overlap tests when they were added is that they were already doing small overlap crash tests before the IIHS started doing small overlap crash tests.

Volvo also says that they test rollovers (the IIHS tests roof strength and the NHTSA computes how difficult a car is to roll based on properties of the car, but neither tests what happens in a real rollover accident), rear collisions (Volvo claims these are especially important to test if there are children in the 3rd row of a 3-row SUV), and driving off the road (Volvo has a "standard" ditch they use; they claim this test is important because running off the road is implicated in a large fraction of vehicle fatalities).

If other car makers do similar tests, I couldn't find much out about the details. Based on crash test scores, it seems like they weren't doing or even considering small overlap crash tests before 2012. Based on how many car makers had poor scores when the passenger side small overlap test was added in 2018, I think it would be surprising if other car makers had a large suite of crash tests they ran that aren't being run by testing agencies, but it's theoretically possible that they do and just didn't include a passenger side small overlap test.


We shouldn't overgeneralize from these test results. As we noted above, crash test results test very specific conditions. As a result, what we can conclude when a couple new crash tests are added is also very specific. Additionally, there are a number of other things we should keep in mind when interpreting these results.

Limited sample size

One limitation of this data is that we don't have results for a large number of copies of the same model, so we're unable to observe intra-model variation, which could occur due to minor, effectively random, differences in test conditions as well as manufacturing variations between different copies of same model. We can observe that these do matter since some cars will see different results when two copies of the same model are tested. For example, here's a quote from the IIHS report on the Dodge Dart:

The Dodge Dart was introduced in the 2013 model year. Two tests of the Dart were conducted because electrical power to the onboard (car interior) cameras was interrupted during the first test. In the second Dart test, the driver door opened when the hinges tore away from the door frame. In the first test, the hinges were severely damaged and the lower one tore away, but the door stayed shut. In each test, the Dart’s safety belt and front and side curtain airbags appeared to adequately protect the dummy’s head and upper body, and measures from the dummy showed little risk of head and chest injuries.

It looks like, had electrical power to the interior car cameras not been disconnected, there would have been only one test and it wouldn't have become known that there's a risk of the door coming off due to the hinges tearing away. In general, we have no direct information on what would happen if another copy of the same model were tested.

Using IIHS data alone, one thing we might do here is to also consider results from different models made by the same manufacturer (or built on the same platform). Although this isn't as good as having multiple tests for the same model, test results between different models from the same manufacturer are correlated and knowing that, for example, a 2nd test of a model that happened by chance showed significantly worse results should probably reduce our confidence in other test scores from the same manufacturer. There are some things that complicate this, e.g., if looking at Toyota, the Yaris is actually a re-branded Mazda2, so perhaps that shouldn't be considered as part of a pooled test result, and doing this kind of statistical analysis is beyond the scope of this post.
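For illustration, one simple form of this kind of pooling is a precision-weighted average between a model's own result and a manufacturer-level mean. This is only a sketch with invented numbers: the numeric scores, the `prior_strength` parameter, and the function name are all hypothetical, not anything the IIHS publishes.

```python
def pooled_estimate(model_score, maker_mean, n_tests, prior_strength):
    """Partial pooling: blend a model's own (numeric) score with the
    manufacturer-level mean. With few tests per model, the estimate
    leans on the manufacturer mean; with many tests, on the model's
    own data. prior_strength says how many tests' worth of evidence
    the manufacturer-level information counts for."""
    w = n_tests / (n_tests + prior_strength)
    return w * model_score + (1 - w) * maker_mean

# A model with a single test scoring 2.0 (on some hypothetical numeric
# scale), from a maker whose models average 3.0:
print(pooled_estimate(2.0, 3.0, n_tests=1, prior_strength=3))  # 2.75
```

With only one test, the estimate sits much closer to the manufacturer mean than to the single observed score, which is the behavior we'd want given how much variance a single test can have.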

Actual vehicle tested may be different

Although I don't think this should impact the results in this post, another issue to consider when looking at crash test results is how results are shared between models. As we just saw, different copies of the same model can have different results. Vehicles that are somewhat similar are often considered the same for crash test purposes and will share the same score (only one of the models will be tested).

For example, this is true of the Kia Stinger and the Genesis G70. The Kia Stinger is 6" longer than the G70 and a fully loaded AWD Stinger is about 500 lbs heavier than a base-model G70. The G70 is the model that IIHS tested -- if you look up a Kia Stinger, you'll get scores for a Stinger with a note that a base model G70 was tested. That's a pretty big difference considering that cars that are nominally identical (such as the Dodge Darts mentioned above) can get different scores.

Quality may change over time

We should also be careful not to overgeneralize temporally. If we look at recent Volvos (vehicles on the Volvo P3 and Volvo SPA platforms), crash test scores are outstanding. However, if we look at Volvo models based on the older Ford C1 platform[1], crash test scores for some of these aren't as good (in particular, while the S40 doesn't score poorly, it scores Acceptable in some categories instead of Good across the board). Although Volvo has had stellar crash test scores recently, this doesn't mean that they have always had or will always have stellar crash test scores.

Models may vary across markets

We also can't generalize across cars sold in different markets, even for vehicles that sound like they might be identical. For example, see this crash test of a Nissan NP300 manufactured for sale in Europe vs. a Nissan NP300 manufactured for sale in Africa. Since European cars undergo EuroNCAP testing (similar to how U.S. cars undergo NHTSA and IIHS testing), vehicles sold in Europe are optimized to score well on EuroNCAP tests. Crash testing cars sold in Africa has only been done relatively recently, so car manufacturers haven't had PR pressure to optimize their cars for benchmarks and they'll produce cheaper models or cheaper variants of what superficially appear to be the same model. This appears to be no different from what most car manufacturers do in the U.S. or Europe -- they're optimizing for cost as long as they can do that without scoring poorly on benchmarks. It's just that, since there wasn't an African crash test benchmark, that meant they could go all-in on the cost side of the cost-safety tradeoff[2].

This report compared U.S. and European car models and found differences in safety due to differences in regulations. They found that European models had lower injury risk in frontal/side crashes and that driver-side mirrors were designed in a way that reduced the risk of lane-change crashes relative to U.S. designs and that U.S. vehicles were safer in rollovers and had headlamps that made pedestrians more visible.

Non-crash tests

Over time, more and more of the "low hanging fruit" from crash safety has been picked, making crash avoidance relatively more important. Tests of crash mitigation are relatively primitive compared to crash tests and we've seen that crash tests had and have major holes. One might expect, based on what we've seen with crash tests, that Volvo has a particularly good set of tests they use for their crash avoidance technology (traction control, stability control, automatic braking, etc.), but I don't know of any direct evidence for that.

Crash avoidance becoming more important might also favor Tesla, since they seem more aggressive about pushing software updates (so people wouldn't have to buy a newer model to get improved crash avoidance) and it's plausible that they use real-world data from their systems to inform crash avoidance in a way that most car companies don't, but I also don't know of any direct evidence of this.

Scores of vehicles of different weights aren't comparable

A 2700lb subcompact vehicle that scores Good may fare worse than a 5000lb SUV that scores Acceptable. This is because the small overlap tests involve driving the vehicle into a fixed obstacle, as opposed to a reference vehicle or vehicle-like obstacle of a specific weight. This is, in some sense, equivalent to crashing the vehicle into a vehicle of the same weight, so it's as if the 2700lb subcompact was tested by running it into a 2700lb subcompact and the 5000lb SUV was tested by running it into another 5000 lb SUV.
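The equivalence can be made concrete with a back-of-the-envelope calculation. Treating the crash as perfectly inelastic and applying conservation of momentum (the speeds and weights below are illustrative, not from any test protocol), the speed change each vehicle experiences depends on the mass ratio:

```python
def delta_v(m_self, m_other, closing_speed):
    """Speed change (delta-v) experienced by a vehicle of mass m_self
    in a perfectly inelastic head-on collision with a vehicle of mass
    m_other, given the combined closing speed. From conservation of
    momentum: delta_v = m_other / (m_self + m_other) * closing_speed."""
    return m_other / (m_self + m_other) * closing_speed

# Hitting a fixed rigid barrier at 40 mph (barrier ~ immovable, i.e.
# effectively infinite mass) gives the same delta-v as hitting an
# identical car head-on at a closing speed of 80 mph:
print(delta_v(2700, 10**9, 40))   # ~40 mph
print(delta_v(2700, 2700, 80))    # 40 mph

# With mismatched weights, the lighter vehicle takes the larger delta-v:
print(delta_v(2700, 5000, 80))    # ~51.9 mph for the subcompact
print(delta_v(5000, 2700, 80))    # ~28.1 mph for the SUV
```

This is why a Good score on a light vehicle and an Acceptable score on a heavy one aren't directly comparable: in a collision between the two, the lighter vehicle's occupants experience a substantially larger speed change than either barrier test implies.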

How to increase confidence

We've discussed some reasons we should reduce our confidence in crash test scores. If we wanted to increase our confidence in results, we could look at test results from other test agencies and aggregate them and also look at public crash fatality data (more on this later). I haven't looked at the terms and conditions of scores from other agencies, but one complication is that the IIHS does not allow you to display the result of any kind of aggregation if you use their API or data dumps (I, time consumingly, did not use their API for this post because of that).

Using real life crash data

Public crash fatality data is complex and deserves its own post. In this post, I'll note that, if you look at the easiest relevant data for people in the U.S., this data does not show that Volvos are particularly safe (or unsafe). For example, if we look at this report from 2017, which covers models from 2014, two Volvo models made it into the report and both score roughly middle of the pack for their class. In the previous report, one Volvo model is included and it's among the best in its class; in the next, one Volvo model is included and it's among the worst in its class. We can observe this kind of variance for other models as well. For example, among 2014 models, the Volkswagen Golf had one of the highest fatality rates of all vehicles (not just in its class), but among 2017 vehicles, it had among the lowest fatality rates of all vehicles. It's unclear how much of that change is from random variation and how much is because of differences between a 2014 and a 2017 Volkswagen Golf.

Overall, it seems like noise is a pretty important factor in results. And if we look at the information that's provided, we can see a few things that are odd. First, there are a number of vehicles where the 95% confidence interval for the fatality rate runs from 0 to N. We should have pretty strong priors that there was no 2014 model vehicle that was so safe that the probability of being killed in a car accident was zero. If we were taking a Bayesian approach (though I believe the authors of the report are not), and someone told us that the uncertainty interval for the true fatality rate of a vehicle had a >= 5% chance of including zero, we would say that either we should use a more informative prior or we should use a model that can incorporate more data (in this case, perhaps we could try to understand the variance between fatality rates of different models in the same class and then use the base rate of fatalities for the class as a prior, or we could incorporate information from other models under the same make if those are believed to be correlated).

Some people object to using informative priors as a form of bias laundering, but we should note that the prior that's used for the IIHS analysis is not completely uninformative. All of the intervals reported stop at zero because they're using the fact that a vehicle cannot create life to bound the interval at zero. But we have information that's nearly as strong that no 2014 vehicle is so safe that the expected fatality rate is zero; using that information is not fundamentally different from capping the interval at zero and not reporting negative numbers for the uncertainty interval of the fatality rate.
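As a minimal sketch of what using an informative prior looks like here, consider a conjugate Gamma-Poisson model. Every number below is invented for illustration and is not real fatality data:

```python
# Model driver deaths as Poisson with unknown rate lam (deaths per
# million registered vehicle years). A Gamma(a, b) prior with mean a/b
# encodes the class base rate; after observing k deaths in E million
# vehicle years, the posterior is Gamma(a + k, b + E) by conjugacy.
def posterior_mean(a, b, k, E):
    return (a + k) / (b + E)

# Hypothetical: a class base rate of ~30 deaths per million vehicle
# years, encoded as Gamma(15, 0.5), and a model with 0 observed deaths
# in 0.2 million vehicle years of exposure.
print(posterior_mean(15.0, 0.5, k=0, E=0.2))  # ~21.4, not 0
```

Unlike an interval clipped at zero, the posterior here never collapses to zero on a small sample with zero deaths; it shrinks toward the class base rate, with the data pulling it down only as exposure accumulates.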

Also, the IIHS data only includes driver fatalities. This is understandable since that's the easiest way to normalize for the number of passengers in the car, but it means that we can't possibly see the impact of car makers not improving passenger small-overlap safety until the passenger-side small overlap test was added in 2018, or of the lack of rear crash testing for the case Volvo considers important (kids in the third row of a 3-row SUV), etc.

We can also observe that, in the IIHS analysis, many factors that one might want to control for aren't (e.g., miles driven isn't controlled for, which will make trucks look relatively worse and luxury vehicles look relatively better; rural vs. urban miles driven also isn't controlled for, which will have the same directional impact). One way to see that the numbers are heavily influenced by confounding factors is by looking at AWD or 4WD vs. 2WD versions of cars. They often have wildly different fatality rates even though the safety differences are not very large (and the difference is often in favor of the 2WD vehicle). Some plausible causes of that are random noise, differences in who buys different versions of the same vehicle, and differences in how the vehicles are used.

If we'd like to answer the question "which car makes or models are more or less safe", I don't find any of the aggregations that are publicly available to be satisfying and I think we need to look at the source data and do our own analysis to see if the data are consistent with what we see in crash test results.


Conclusion

We looked at 12 different car makes and how they fared when the IIHS added small overlap tests. We saw that only Volvo was taking this kind of accident seriously before companies were publicly shamed for having poor small overlap safety by the IIHS even though small overlap crashes were known to be a significant source of fatalities at least since the 90s.

Although I don't have the budget to do other tests, such as a rear crash test in a fully occupied vehicle, it appears plausible and perhaps even likely that most car makers that aren't Volvo would have mediocre or poor test scores if a testing agency decided to add another kind of crash test.

Bonus: "real engineering" vs. programming

As Hillel Wayne has noted, although programmers often have an idealized view of what "real engineers" do, when you compare what "real engineers" do with what programmers do, it's frequently not all that different. In particular, a common lament of programmers is that we're not held liable for our mistakes or poor designs, even in cases where that costs lives.

Although automotive companies can, in some cases, be held liable for unsafe designs, just optimizing for a small set of benchmarks, which must've resulted in extra deaths over optimizing for safety instead of benchmark scores, isn't something that engineers or corporations were, in general, held liable for.

Bonus: reputation

If I look at what people in my extended social circles think about vehicle safety, Tesla has the best reputation by far. If you look at broad-based consumer polls, that's a different story, and Volvo usually wins there, with other manufacturers fighting for a distant second.

I find the Tesla thing interesting since their responses are basically the opposite of what you'd expect from a company that was serious about safety. When serious problems have occurred (with respect to safety or otherwise), they often have a very quick response that's basically "everything is fine". I would expect an organization that's serious about safety or improvement to respond with "we're investigating", followed by a detailed postmortem explaining what went wrong, but that doesn't appear to be Tesla's style.

For example, on the driver-side small overlap test, Tesla had one model with a relevant score and it scored Acceptable (below Good, but above Poor and Marginal) even after modifications were made to improve the score. Tesla disputed the results, saying they make "the safest cars in history" and implying that IIHS should be ignored in favor of NHTSA test scores:

While IIHS and dozens of other private industry groups around the world have methods and motivations that suit their own subjective purposes, the most objective and accurate independent testing of vehicle safety is currently done by the U.S. Government which found Model S and Model X to be the two cars with the lowest probability of injury of any cars that it has ever tested, making them the safest cars in history.

As we've seen, Tesla isn't unusual in optimizing for a specific set of crash tests and achieving a mediocre score when an unexpected type of crash occurs, but their response is unusual. However, it makes sense from a cynical PR perspective. As we've seen over the past few years, loudly proclaiming something, regardless of whether or not it's true, even when there's incontrovertible evidence that it's untrue, seems not only to work; that kind of bombastic rhetoric appears to attract superfans who will aggressively defend the brand. If you watch car reviewers on YouTube, they'll sometimes mention that they get hate mail for reviewing Teslas just like they review any other car, and that they don't see anything like it for any other make.

Apple also used this playbook to good effect in the 90s and early '00s, when they were rapidly falling behind in performance and responded not by improving performance but by running a series of ad campaigns saying that they had the best performance in the world and that they were shipping "supercomputers" on the desktop.

Another reputational quirk is that I know a decent number of people who believe that the safest cars they can buy are "American Cars from the 60's and 70's that aren't made of plastic". We don't have directly relevant small overlap crash test scores for old cars, but the test data we do have on old cars indicates that they fare extremely poorly in overall safety compared to modern cars. For a visually dramatic example, see this crash test of a 1959 Chevrolet Bel Air vs. a 2009 Chevrolet Malibu.

Appendix: methodology summary

The top-line results section uses scores from the driver-side small overlap test both because it's the test where I think it's most difficult to justify skimping on safety as measured by the test, and because it's been around long enough that we can see the impact of modifications to existing models and of changes to subsequent models, which isn't true of the passenger-side small overlap test (where many models are still untested).

For the passenger side small overlap test, someone might argue that the driver side is more important because you virtually always have a driver in a car accident and may or may not have a front passenger. Also, for small overlap collisions (which simulates a head-to-head collision where the vehicles only overlap by 25%), driver's side collisions are more likely than passenger side collisions.

Except to check Volvo's scores, I didn't look at roof crash test scores (which were added in 2009). I'm not going to describe the roof test in detail, but for the roof test, someone might argue that the roof test score should be used in conjunction with scoring the car for rollover probability since the roof test just tests roof strength, which is only relevant when a car has rolled over. I think, given what the data show, this objection doesn't hold in many cases (the vehicles with the worst roof test scores are often vehicles that have relatively high rollover rates), but it does in some cases, which would complicate the analysis.

In most cases, we only get one reported test result for a model. However, there can be multiple versions of a model -- including before and after making safety changes intended to improve the test score. If changes were made to the model to improve safety, the test score is usually from after the changes were made and we usually don't get to see the score from before the model was changed. However, there are many exceptions to this, which are noted in the detailed results section.

For this post, scores only count if the model was introduced before or near when the new test was introduced, since models introduced later could have design changes that optimize for the test.

Appendix: detailed results

On each test, IIHS gives an overall rating (from worst to best) of Poor, Marginal, Acceptable, or Good. The tests have sub-scores, but we're not going to use those for this analysis. In each sub-section, we'll look at how many models got each score when the small overlap tests were added.


Volvo

All Volvo models examined scored Good (the highest possible score) on the new tests when they were added (roof, driver-side small overlap, and passenger-side small overlap). One model, the 2008-2017 XC60, had a change made to trigger its side curtain airbag during a small overlap collision in 2013. Other models were tested without modifications.


Mercedes

Of three pre-existing models with test results for driver-side small overlap, one scored Marginal without modifications and two scored Good after structural modifications. The model where we only have unmodified test scores (Mercedes C-Class) was fully re-designed after 2014, shortly after the driver-side small overlap test was introduced.

As mentioned above, we often only get to see public results for models without modifications to improve results xor with modifications to improve results, so, for the models that scored Good, we don't actually know how they would've scored if you bought a vehicle before Mercedes updated the design, but the Marginal score from the one unmodified model we have is a negative signal.

Also, when the passenger-side small overlap test was added, the Mercedes vehicles generally scored Good, indicating that Mercedes didn't only increase protection on the driver's side in order to improve test scores.


BMW

Of the two models where we have relevant test scores, both scored Marginal before modifications. In one of the cases, there's also a score after structural changes were made in the 2017 model (recall that the driver-side small overlap test was introduced in 2012), and the model scored Good afterwards. The other model was fully redesigned after 2016.

For the five models where we have relevant passenger-side small overlap scores, all scored Good, indicating that the changes made to improve driver-side small overlap test scores weren't only made on the driver's side.


Honda

Of the five Honda models where we have relevant driver-side small overlap test scores, two scored Good, one scored Marginal, and two scored Poor. The model that scored Marginal had structural changes plus a seatbelt change in 2015 that improved its score to Good; other models weren't updated or don't have updated IIHS scores.

Of the six Honda models where we have passenger-side small overlap test scores, two scored Good without modifications, two scored Acceptable without modifications, and one scored Good with modifications to the bumper.

All of those models scored Good on the driver side small overlap test, indicating that when Honda increased the safety on the driver's side to score Good on the driver's side test, they didn't apply the same changes to the passenger side.


Toyota

Of the six Toyota models where we have relevant driver-side small overlap test scores for unmodified models, one scored Acceptable, four scored Marginal, and one scored Poor.

The model that scored Acceptable had structural changes made to improve its score to Good, but on the driver's side only. The model was later tested in the passenger-side small overlap test and scored Acceptable. Of the four models that scored Marginal, one had structural modifications made in 2017 that improved its score to Good and another had airbag and seatbelt changes that improved its score to Acceptable. The vehicle that scored Poor had structural changes made that improved its score to Acceptable in 2014, followed by later changes that improved its score to Good.

There are four additional models where we only have scores from after modifications were made. Of those, one scored Good, one scored Acceptable, one scored Marginal, and one scored Poor.

In general, changes appear to have been made to the driver's side only and, on introduction of the passenger side small overlap test, vehicles had passenger side small overlap scores that were the same as the driver's side score before modifications.


Ford

Of the two models with relevant driver-side small overlap test scores for unmodified models, one scored Marginal and one scored Poor. Both of those models were produced into 2019 and neither has an updated test result. Of the three models where we have relevant results for modified vehicles, two scored Acceptable and one scored Marginal. Also, one model was released the year the small overlap test was introduced and one the year after; both of those scored Acceptable. It's unclear if those should be considered modified or not since the design may have had last-minute changes before release.

We only have three relevant passenger-side small overlap tests. One is Good (for a model released in 2015) and the other two are Poor; these are the two models mentioned above as having scored Marginal and Poor, respectively, on the driver-side small overlap test. It appears that these models continued to be produced into 2019 without safety changes. Both of the unmodified models were trucks; this isn't very unusual for trucks and is one of a number of reasons that fatality rates are generally higher in trucks: until recently, many were based on old platforms that hadn't been updated in a long time.


Of the three Chevrolet models where we have relevant driver-side small overlap test scores before modifications, one scored Acceptable and two scored Marginal. One of the Marginal models had structural changes plus a change that caused side curtain airbags to deploy sooner in 2015, which improved its score to Good.

Of the four Chevrolet models where we only have relevant driver-side small overlap test scores after the model was modified (all had structural modifications), two scored Good and two scored Acceptable.

We only have one relevant score for the passenger-side small overlap test, that score is Marginal. That's on the model that was modified to improve its driver-side small overlap test score from Marginal to Good, indicating that the changes were made to improve the driver-side test score and not to improve passenger safety.


We don't have any models where we have relevant passenger-side small overlap test scores for models before they were modified.

One model had a change to cause its airbag to deploy during small overlap tests; it scored Acceptable. Two models had some kind of structural changes, one of which scored Good and one of which scored Acceptable.

The model that had airbag changes had structural changes made in 2015 that improved its score from Acceptable to Good.

For the one model where we have relevant passenger-side small overlap test scores, the score was Marginal. Also, for one of the models with structural changes, it was noted that the changes included changes to the left part of the firewall, indicating that changes were made to improve the driver's side test score without improving safety for a passenger in a passenger-side small overlap crash.


There's only one model with relevant results for the driver-side small overlap test. That model scored Acceptable before and after modifications were made to improve test scores.


Of the five vehicles where we have relevant driver-side small overlap test scores, one scored Acceptable, three scored Marginal, and one scored Poor. We don't have any indication that models were modified to improve their test scores.

Of the two vehicles where we have relevant passenger-side small overlap test scores for unmodified models, one scored Good and one scored Acceptable.

We also have one score for a model that had structural modifications to score Acceptable, which later had further modifications that allowed it to score Good. That model was introduced in 2017 and had a Good score on the driver-side small overlap test without modifications, indicating that it was designed to achieve a good test score on the driver's side test without similar consideration for a passenger-side impact.


Of the five models where we have relevant driver-side small overlap test scores for unmodified models, two scored Acceptable, one scored Marginal, and two scored Poor. There are also two models where we have test scores after structural changes were made for safety in 2015; both of those models scored Marginal.

We don't have relevant passenger-side small overlap test scores for any model, but even if we did, the dismal scores on the modified models means that we might not be able to tell if similar changes were made to the passenger side.


Of the seven models where we have relevant driver-side small overlap test scores for unmodified models, two scored Acceptable and five scored Poor.

We have one model that only has test scores for a modified model; the frontal airbags and seatbelts were modified in 2013 and the side curtain airbags were modified in 2017. The score after modifications was Marginal.

One of the models that scored Poor had structural changes made in 2015 that improved its score to Good.

Of the four models where we have relevant passenger-side small overlap test scores, two scored Good, one scored Acceptable (that model scored Good on the driver-side test), and one scored Marginal (that model also scored Marginal on the driver-side test).


Of the two models where we have relevant driver-side small overlap test scores for unmodified models, one scored Marginal and one scored Poor.

There's one model where we only have test scores after modifications; that model had changes to its airbags and seatbelts and it scored Marginal after the changes. This model was also later tested on the passenger-side small overlap test and scored Poor.

One other model has a relevant passenger-side small overlap test score; it scored Good.


The two models where we have relevant driver-side small overlap test scores for unmodified models both scored Marginal.

Of the two models where we only have scores after modifications, one was modified in 2013 and scored Marginal after modifications. It was then modified again in 2015 and scored Good. That model was later tested on the passenger-side small overlap test, where it scored Acceptable, indicating that the modifications differentially favored the driver's side. The other scored Acceptable after changes made in 2015 and then scored Good after further changes made in 2016. The 2016 model was later tested on the passenger-side small overlap test and scored Marginal, once again indicating that changes differentially favored the driver's side.

We have passenger-side small overlap test scores for two other models, both of which scored Acceptable. These were models introduced in 2015 (well after the introduction of the driver-side small overlap test) and both scored Good on the driver-side small overlap test.

Appendix: miscellania

A number of name brand car makes weren't included. Some because their U.S. sales are relatively low and/or declining rapidly (Mitsubishi, Fiat, Alfa Romeo, etc.), some because there's very high overlap in what vehicles are tested (Kia, Mazda, Audi), and some because there aren't relevant models with driver-side small overlap test scores (Lexus). When a corporation owns an umbrella of makes, like FCA with Jeep, Dodge, Chrysler, Ram, etc., these weren't pooled, since most people who aren't car nerds won't recognize FCA but may recognize Jeep, Dodge, and Chrysler.

If the terms of service of the API allowed you to use IIHS data however you wanted, I would've included smaller makes. But the API comes with very restrictive terms on how you can display or discuss the data, which aren't compatible with exploratory data analysis, and I couldn't know how I would want to display or discuss the data before looking at it. So I pulled all of these results by hand (and didn't click through any EULAs, etc.), which was fairly time consuming; there was a trade-off between more comprehensive coverage and the rest of my life.

Appendix: what car should I buy?

That depends on what you're looking for; there's no way to make a blanket recommendation. For practical information about particular vehicles, Alex on Autos is the best source that I know of. I don't generally like videos as a source of practical information, but car magazines tend to be much less informative than youtube car reviewers. There are car reviewers that are much more popular, but their popularity appears to come from witty banter between charismatic co-hosts or other things that not only aren't directly related to providing information, they actually detract from providing information. If you just want to know how cars work, Engineering Explained is also quite good, but the information there is generally not practical.

For reliability information, Consumer Reports is probably your best bet (you can also look at J.D. Power, but the way they aggregate information makes it much less useful to consumers).

Thanks to Leah Hanson, Travis Downs, Prabin Paudel, and Justin Blank for comments/corrections/discussion

  1. this includes the 2004-2012 Volvo S40/V50, 2006-2013 Volvo C70, and 2007-2013 Volvo C30, which were designed during the period when Ford owned Volvo. Although the C1 platform was a joint venture between Ford, Volvo, and Mazda engineers, the work was done under a Ford VP at a Ford facility. [return]
  2. to be fair, as we saw with the IIHS small overlap tests, not every manufacturer did terribly. In 2017 and 2018, nine vehicles sold in Africa were crash tested. One got what we would consider a mediocre to bad score in the U.S. or Europe, five got what we would consider to be a bad score, and "only" three got what we would consider to be an atrocious score. The Nissan NP300, Datsun Go, and Chery QQ3 were the three vehicles that scored the worst. Datsun is a sub-brand of Nissan and Chery is a Chinese brand, also known as Qirui.

    We see the same thing if we look at cars sold in India. Recently, some tests have been run on cars sent to the Indian market and a number of vehicles from Datsun, Renault, Chevrolet, Tata, Honda, Hyundai, Suzuki, Mahindra, and Volkswagen came in with atrocious scores that would be considered impossibly bad in the U.S. or Europe.


June 29, 2020

Phil Hagelberg (technomancy)

in which a compiler takes steps towards strapping its boots June 29, 2020 11:30 PM

One of the biggest milestones in a programming language is when the language gets to the point where it can be used to write its own implementation, which is called self-hosting. This is seen as a sign of maturity since reaching this point requires getting a lot of common problems shaken out first.

The compiler for the Fennel programming language was written using Lua, and it emits Lua code as output. Over time, certain parts of the compiler were added that were written in Fennel, starting with fennelview, which is the pretty-printer for Fennel data structures. Once the macro system stabilized, many built-in forms that had originally been hard-coded into the compiler using Lua got ported to the macro system. After that the REPL was ported to Fennel as a relatively independent piece of code, followed by the command-line launcher script and a helper module to explain and identify compiler errors. The parser had already seen an impressive port to Fennel using a literate programming approach, but we hadn't incorporated this into the mainline repository yet because the literate approach made it a bit tricky to bring in.

As you might expect, any attempt at self-hosting can easily run into "chicken or egg" problems—how do you use the language to write the implementation if the language hasn't been finished being defined yet? Sometimes this requires simply limiting yourself to a subset; for instance, the built-in macros in Fennel cannot themselves use any macros but must be written in a macroless subset of Fennel. In other cases, such as the launcher, we keep a copy of the old pre-self-hosted version around in order to build the new version.

lake union/lake washington canal

That's about as far as we could get on the path to self-hosting without changing the approach, because most of the remaining code was fairly entangled, and we didn't have clear boundaries to port it one piece at a time. At this stage there were 2250 lines of Lua and 1113 lines of Fennel. I recently took some time to reorganize the compiler into four independent "pseudo-modules" with clear dependencies between the pieces. But even with the independent modules broken out, we were still looking at porting 800 lines of intricate compiler code and 900 lines of special forms all in two fell swoops.

That's when I started to consider an alternate approach. The Fennel compiler takes Fennel code as input and produces Lua code as output. We have a big pile of Lua code in the compiler that we want turned into Fennel code. What if we could reverse the process? That's when Antifennel was born.

;; compile a Lua `return` in non-tail position into Fennel's `lua` escape hatch
(fn early-return [compile {: arguments}]
  (let [args (map arguments compile)]
    (if (any-complex-expressions? arguments 1)
        (early-return-complex compile args)
        (list (sym :lua)
              (.. "return " (table.concat (map args view) ", "))))))

;; map a Lua binary-operator node onto the corresponding Fennel prefix form
(fn binary [compile {: left : right : operator} ast]
  (let [operators {:== := "~=" :not= "#" :length "~" :bnot}]
    (list (sym (or (. operators operator) operator))
          (compile left)
          (compile right))))

Antifennel takes Lua code and parses[1] it, then walks the abstract syntax tree of Lua and builds up an abstract syntax tree of Fennel code based on it. I had to add some features to fnlfmt, the formatter for Fennel, in order to get the output to look decent, but the overall approach is rather straightforward since Fennel and Lua have a great deal of overlap in their semantics.
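The shape of this kind of AST-to-AST translation can be sketched in Python (a toy illustration of the general approach, not Antifennel's actual code; the node format and names here are invented):

```python
# Toy AST-to-AST translation in the spirit of Antifennel: walk a
# Lua-like infix tree and emit a Fennel-like prefix s-expression.
# The dict-based node shape is invented for this sketch.

# Lua operators whose Fennel spellings differ.
OPERATORS = {"==": "=", "~=": "not=", "#": "length", "~": "bnot"}

def compile_node(node):
    kind = node["kind"]
    if kind == "number":
        return str(node["value"])
    if kind == "binary":
        # Translate the operator name, then recurse on both operands.
        op = OPERATORS.get(node["operator"], node["operator"])
        return f"({op} {compile_node(node['left'])} {compile_node(node['right'])})"
    raise ValueError(f"unhandled node kind: {kind}")

# (1 + 2) == 3  becomes  (= (+ 1 2) 3)
ast = {"kind": "binary", "operator": "==",
       "left": {"kind": "binary", "operator": "+",
                "left": {"kind": "number", "value": 1},
                "right": {"kind": "number", "value": 2}},
       "right": {"kind": "number", "value": 3}}
print(compile_node(ast))  # (= (+ 1 2) 3)
```

The real compiler handles dozens of node kinds, but each one follows this pattern: dispatch on the Lua node type and build the equivalent Fennel form.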

The main difficulties came from supporting features which are present in the Lua language but not in Fennel. Fennel omits some things that are normal in Lua, usually because the code becomes easier to understand if you can guarantee certain things never happen. For instance, when you read a Fennel function, you don't have to think about where in the code the possible return values can be found; these can only occur in tail positions because there is no early return. But Lua allows you to return (almost) anywhere in the function!

Fennel has one "secret" feature to help with this: the lua special form:

(lua "return nextState, value")

Included specifically to make the task of porting existing code easier, the lua form allows you to emit Lua code directly without the compiler checking its validity. This is an "escape hatch" that can allow you to port Lua code as literally as possible first, then come back once you have it working and clean up the ugly bits once you have tests and things in place. It's not pretty, but it's a practical compromise that can help you get things done.

Unfortunately it's not quite as simple as just calling (lua "return x"), because if you put this in the output every time there's a return in the Lua code, most of it will be in the tail position. But Fennel doesn't understand that the lua call is actually a return value; it thinks that it's just a side-effect, and it will helpfully insert a return nil after it for consistency. In order to solve this I needed to track which returns occurred in the tail position and which were early returns, so I could use normal Fennel methods for the tail ones and use this workaround hack only for early returns[2]. But that ended up being easier than it sounds.
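The tail-position bookkeeping described above can be illustrated with a toy Python sketch (hypothetical and much simpler than the real compiler, which walks a full AST rather than a flat statement list):

```python
# Classify each `return` in a flat list of statements: only a return
# in the final position is a tail return; any other return is "early"
# and needs the (lua "return ...") workaround described in the post.
def classify_returns(statements):
    out = []
    for i, stmt in enumerate(statements):
        if stmt == "return":
            out.append("tail" if i == len(statements) - 1 else "early")
    return out

print(classify_returns(["if", "return", "work", "return"]))  # ['early', 'tail']
```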

Other incompatibilities were the lack of a break form (which could easily be addressed with the (lua "break") hack because it only happens in a non-tail position), the lack of a repeat form (compiled into a while loop with a break at the end), and the fact that locals default to being immutable in Fennel and mutability is opt-in. This last one I am currently handling by emitting all locals as var regardless of whether they are mutated or not, but I plan on adding tracking to allow the compiler to emit the appropriate declaration based on how each local is used.
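The repeat rewrite mentioned above is mechanical; in Python terms (a generic sketch of the transformation, not compiler output):

```python
# Lua's `repeat body until cond` runs the body at least once and stops
# when cond becomes true; it is equivalent to a while-true loop with a
# conditional break at the end, which is the form the compiler emits.
def repeat_until(body, cond):
    while True:
        body()
        if cond():
            break

counter = {"n": 0}
repeat_until(lambda: counter.update(n=counter["n"] + 1),
             lambda: counter["n"] >= 3)
print(counter["n"])  # 3
```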

While it's still too early to swap out the canonical implementation of the Fennel compiler, the Antifennel-compiled version works remarkably well, passing the entire language test suite across every supported version of the Lua runtime, at 79% of the length of the Lua version. I'm looking forward to finishing the job and having the Fennel codebase written purely in Fennel itself.

[1] Antifennel uses the parser from the LuaJIT Language Toolkit, which is another self-hosted compiler that takes Lua code as input and emits LuaJIT bytecode without requiring any C code to be involved. (Of course, in order to run the bytecode, you have to use the full LuaJIT VM, which is mostly written in C.) I had to make one small change to the parser in order to help it "mangle" identifiers that were found to conflict with built-in special forms and macros in Fennel, but other than that it worked great with no changes. The first big test of Antifennel was making sure it could compile its own parser dependency from Lua into Fennel, which it could do on the second day.

[2] Even that is a slight oversimplification, because the lua return hack only works on literals and identifiers, not complex expressions. When a complex expression is detected being returned, we compile it to a wrapping let expression and only pass in the bound local name to the return.

June 28, 2020

Ponylang (SeanTAllen)

Last Week in Pony - June 28, 2020 June 28, 2020 03:52 PM

Ponyup and ponylang-mode have new releases.

Bogdan Popa (bogdan)

Deploying Racket Web Apps June 28, 2020 12:00 PM

Someone recently asked about how to deploy Racket web apps on the Racket Slack. The most common answers were "install Racket on the target machine, then ship your code there" or "use Docker" (basically a "portable" variant of option 1). I wanted to take a few minutes today and write about my preferred way of deploying Racket apps: build an executable with the application code, libraries and assets embedded into it and ship that around.

June 23, 2020

Marc Brooker (mjb)

Code Only Says What it Does June 23, 2020 12:00 AM

Code Only Says What it Does

Only loosely related to what it should do.

Code says what it does. That's important for the computer, because code is the way that we ask the computer to do something. It's OK for humans, as long as we never have to modify or debug the code. As soon as we do, we have a problem. Fundamentally, debugging is an exercise in changing what a program does to match what it should do. It requires us to know what a program should do, which isn't captured in the code. Sometimes that's easy: What it does is crash, what it should do is not crash. Outside those trivial cases, discovering intent is harder.

Debugging when what it should do is subtle, such as when building distributed systems protocols, is especially difficult. In our Millions of Tiny Databases paper, we say:

Our code reviews, simworld tests, and design meetings frequently referred back to the TLA+ models of our protocols to resolve ambiguities in Java code or written communication.

The problem is that the implementation (in Physalia's case the Java code) is both an imperfect implementation of the protocol, and an overly-specific implementation of the protocol. It's overly-specific because it needs to be fully specified. Computers demand that, and no less, while the protocol itself has some leeway and wiggle room. It's also overly-specific because it has to address things like low-level performance concerns that the specification can't be bothered with.

Are those values in an ArrayList because order is actually important, or because O(1) random seeks are important, or some other reason? Was it just the easiest thing to write? What happens when I change it?

Business logic code, while lacking the cachet of distributed protocols, has even more of these kinds of problems. Code both over-specifies the business logic, and specifies it inaccurately. I was prompted to write this by a tweet from @mcclure111 where she hits the nail on the head:

This is a major problem with code: You don't know which quirks are load-bearing. You may remember, or be able to guess, or be able to puzzle it out from first principles, or not care, but all of those things are slow and error-prone. What can we do about it?

Design Documentation

Documentation is uncool. Most software engineers seem to come out of school thinking that documentation is below them (tech writer work), or some weird thing their SE professor talked about that is as archaic as Fortran. Part of this is understandable. My own software engineering courses emphasized painstakingly documenting the implementation in UML. No other mention of documentation was made. Re-writing software in UML helps basically nobody. I finished my degree thinking that documentation was unnecessary busywork. Even the Agile Manifesto agreed with me1:

Working software over comprehensive documentation

What I discovered later was that design documentation, encoding the intent and decisions made during developing a system, helps teams be successful in the short term, and people be successful in the long term. Freed from fitting everything in my head, emboldened by the confidence that I could rediscover forgotten facts later, I could move faster. The same applies to teams.

One thing I see successful teams doing is documenting not only the what and why behind their designs, but also how they decided. When it comes time to make changes to the system—either for debugging or in response to changing requirements—these documents are invaluable. It's hard to decide whether it's safe to change something when you don't know why it's like that in the first place. The record of how you decided is important because you are a flawed human, and understanding how you came to a decision is useful when that decision seems strange, or surprising.

This documentation process doesn't have to be heavyweight. You don't have to draw painstaking ER diagrams unless you think they are helpful. You should probably ignore UML entirely. Instead, describe the system in prose as clearly and succinctly as you can. One place to start is by building an RFC template for your team, potentially inspired by one that you find on the web. SquareSpace's template seems reasonable. Some designs will fit well into that RFC format, others won't. Prefer narrative writing where you can.

Then, keep the documents. Store them somewhere safe. Soak them in vinegar and tie them around your chest. You're going to want to make sure that the people who need to maintain the system can find them. As they are spelunking through history, help them feel more like a library visitor and less like Lara Croft.

I'm not advocating for Big Design Up Front. Many of the most important things we learn about a project we learn during the implementation. Some of the most important things we learn years after the implementation is complete. Design documentation isn't a static one-time ahead-of-time deliverable, but an ongoing process. Most importantly, design documentation is not a commitment to bad ideas. If it's wrong, fix it and move forward. Documentation is not a deal with the devil.


Comments

Few topics invite a programmer flame war like comments. We're told that comments are silly, or childish, or make it hard to show how manly you are in writing that convoluted mess of code. If it was hard to write, it should be hard to read. After all, you're the James Joyce of code.

That silliness aside, back to @mcclure111's thread:

Comments allow us to encode authorial intent into our code in a way that programming languages don't always. Types, traits, interfaces, and variable names do put intent into code, but not completely (I see you, type system maximalists). These same things allow us to communicate a lack of intent—consider RandomAccess vs ArrayList—but are also incomplete. Well-commented code should make the intent of the author clear, especially in cases where that intent is either lost in the translation to code, or where implementation constraints hide the intent of the design. Code comments that link back to design documents are especially useful.

Some languages need comments more than others. Some, like SQL, I find to nearly always obscure the intent of the design behind implementation details.

Formal Specification

In Who Builds a House Without Drawing Blueprints? Leslie Lamport writes:

The need for specifications follows from two observations. The first is that it is a good idea to think about what we are going to do before doing it, and as the cartoonist Guindon wrote: "Writing is nature's way of letting you know how sloppy your thinking is."

The second observation is that to write a good program, we need to think above the code level.

I've found that specification, from informal specification with narrative writing to formal specification with TLA+, makes writing programs faster and helps reduce mistakes. As much as I like that article, I think Lamport misses a key part of the value of formal specification: it's a great communication tool. In developing some of the trickiest systems I've built, I've found that heavily-commented formal specifications are fantastically useful documentation. Specification languages are all about intent, and some make it easy to clearly separate intent from implementation.

Again, from our Millions of Tiny Databases paper:

We use TLA+ extensively at Amazon, and it proved exceptionally useful in the development of Physalia. Our team used TLA+ in three ways: writing specifications of our protocols to check that we understand them deeply, model checking specifications against correctness and liveness properties using the TLC model checker, and writing extensively commented TLA+ code to serve as the documentation of our distributed protocols. While all three of these uses added value, TLA+'s role as a sort of automatically tested (via TLC), and extremely precise, format for protocol documentation was perhaps the most useful.

Formal specifications make excellent documentation. Like design docs, they aren't immutable artifacts, but a reflection of what we have learned about the problem.


Building long-lasting, maintainable, systems requires not only communicating with computers, but also communicating in space with other people, and in time with our future selves. Communicating, recording, and indexing the intent behind our designs is an important part of that picture. Make time for it, or regret it later.


  1. To be charitable to the Agile folks, comprehensive does seem to be load-bearing.

June 22, 2020

Henry Robinson (henryr)

Network Load Balancing with Maglev June 22, 2020 07:52 PM

Maglev: A Fast and Reliable Software Network Load Balancer Eisenbud et al., NSDI 2016 [paper] Load balancing is a fundamental primitive in modern service architectures - a service that assigns requests to servers so as to, well, balance the load on each server. This improves resource utilisation and ensures that servers aren't unnecessarily overloaded. Maglev is - or was, sometime before 2016 - Google's network load-balancer that managed load-balancing duties for search, GMail and other high-profile Google services.

June 21, 2020

Derek Jones (derek-jones)

How should involved if-statement conditionals be structured? June 21, 2020 10:43 PM

Which of the following two if-statements do you think will be processed by readers in less time, and with fewer errors, when given the value of x, and asked to specify the output?

// First - sequence of subexpressions
if (x > 0 && x < 10 || x > 20 && x < 30)
   print("a");
else
   print("b");

// Second - nested ifs
if (x > 0 && x < 10)
   print("a");
else if (x > 20 && x < 30)
   print("b");

Ok, the behavior is not identical, in that the else if-arm produces different output than the preceding if-arm.

The paper Syntax, Predicates, Idioms — What Really Affects Code Complexity? analyses the results of an experiment that asked this question, including more deeply nested if-statements, the use of negation, and some for-statement questions (this post only considers the number of conditions/depth of nesting components). A total of 1,583 questions were answered by 220 professional developers, with 415 incorrect answers.

Based on the coefficients of regression models fitted to the results, subjects processed the nested form both faster and with fewer incorrect answers (code+data). As expected performance got slower, and more incorrect answers given, as the number of intervals in the if-condition increased (up to four in this experiment).

I think short-term memory is involved in this difference in performance; or at least I can concoct a theory that involves a capacity-limited memory. Comprehending an expression (such as the conditional in an if-statement) requires maintaining information about the various components of the expression in working memory. When the first subexpression of x > 0 && x < 10 || x > 20 && x < 30 is false, and the subexpression after the || is processed, there is no forget-what-went-before point like there is for the nested if-statements. I think that the single-expression form consumes more working memory than the nested form.
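To make the comparison concrete, here are the two forms rendered in Python (a hypothetical translation of the experiment's condition, checking that the two forms agree as membership tests):

```python
# The flat and nested forms of the two-interval test from the
# experiment, rendered in Python. As pure membership tests they agree
# on every input; the experiment's versions differed only in which
# arm produced which output.
def flat(x):
    return x > 0 and x < 10 or x > 20 and x < 30

def nested(x):
    if x > 0 and x < 10:
        return True
    elif x > 20 and x < 30:
        return True
    return False

# The two forms are equivalent across the whole range of interest.
assert all(flat(x) == nested(x) for x in range(-5, 40))
print(flat(5), flat(15), flat(25))  # True False True
```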

Does the result of this experiment (assuming it is replicated) mean that developers should be recommended to write sequences of conditions (e.g., the first if-statement example) in the nested form, as:

if (x > 0 && x < 10)
else if (x > 20 && x < 30)

Duplicating code is not good, because both arms have to be kept in sync; ok, a function could be created, but this is extra effort. As other factors are taken into account, the costs of the nested form start to build up, so is the benefit really worth the cost?

Answering this question is likely to need a lot of work, and it would be a more efficient use of resources to address questions about more commonly occurring conditions first.

A commonly occurring use is testing a single range; some of the ways of writing the range test include:

if (x > 0 && x < 10) ...

if (0 < x && x < 10) ...

if (10 > x && x > 0) ...

if (x > 0 && 10 > x) ...

Does one way of testing the range require less effort for readers to comprehend, and be more likely to be interpreted correctly?
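As an aside, some languages let a range test read the way it does in mathematics. A quick Python check of the four spellings above, plus Python's chained-comparison form, which puts the extremes at the two ends of the expression:

```python
# Four spellings of the same open-interval test, plus Python's
# chained comparison, which reads like the mathematical 0 < x < 10.
x = 5
a = x > 0 and x < 10
b = 0 < x and x < 10
c = 10 > x and x > 0
d = x > 0 and 10 > x
e = 0 < x < 10  # chained comparison
assert a == b == c == d == e == True
print(e)  # True
```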

There have been some experiments showing that people are more likely to give correct answers to questions involving information expressed as linear syllogisms, if the extremes are at the start/end of the sequence, such as in the following:

     A is better than B
     B is better than C

and not the following (which got the lowest percentage of correct answers):

     B is better than C
     B is worse than A

Your author ran an experiment to find out whether developers were more likely to give correct answers for particular forms of range tests in if-conditions.

Out of a total of 844 answers, 40 were answered incorrectly (roughly one per subject; it was a paper and pencil experiment, so no timings). It's good to see that the subjects were so competent, but with so few mistakes made the error bars are very wide, i.e., too few mistakes were made to be able to say that one representation was less mistake-prone than another.

I hope this post has got other researchers interested in understanding developer performance when processing if-statements, and that they will be running more experiments to help shed light on the processes involved.

Ponylang (SeanTAllen)

Last Week in Pony - June 21, 2020 June 21, 2020 05:11 PM

We have new releases for ponyup, ponylang-mode, and release-notes-bot-action.

June 19, 2020

Pierre Chapuis (catwell)

[Quora] Transparency in distributed systems UX June 19, 2020 06:52 PM

Yet another Quora answer, this time with this question, which I answered on August 24, 2016:

Why is Transparency a major issue in distributed databases?

First, a few words about what "transparency" is. Transparency is a UX term about the user not noticing that they are using a distributed system. We actually talk about transparencies in the plural, because there are several kinds of transparencies: fault transparency, location transparency, concurrency transparency, access transparency, etc. In my opinion, the choice of wording is not so good: when we talk about "transparency" we actually mean we hide things from the user, so "opacity" would be more logical. (The name comes from the fact that, if we replace a single node system by a distributed system, it will be transparent for the user, i.e. they will not notice it.)

The reason why transparency is important is usability. The more transparencies our system has, the less cognitive burden there is on the user. In other words: transparencies simplify the API of the system.

However, what transparencies we implement or not is a trade-off between that simplicity of API and things like flexibility, performance, and sometimes correctness. Years ago (when Object Oriented programming à la Java was booming) it was fashionable to abstract everything and make the user forget that they were using an actual distributed system. For instance, we had RPC everywhere, which kind of hid the network from the user. Since then we learnt that abstracting the network entirely is a bad idea.

On the other hand, exposing too many knobs to the user is dangerous as well: they might turn them without really understanding what they do and set the system on fire.

So, determining what to expose to the user and what to implement "transparently" is a crucial point in all distributed systems work, not only databases.

In databases in particular, conflict resolution is a contentious point. Do we only provide the user with databases that are consistent, knowing that this severely impacts performance and availability? Do we let them tweak the parameters (the R and W parameters in a quorum system, for instance)? Do we tolerate divergence, detect it, inform the user and let them reconcile (à la CouchDB)? Do we provide the user with constrained datastructures that resolve conflicts by themselves (CRDTs)?
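As an aside on those R and W knobs: in a quorum system with N replicas, a write acknowledged by W nodes and a read contacting R nodes are only guaranteed to overlap when R + W > N. A minimal sketch of that rule (the function name is mine, purely illustrative):

```python
def quorum_overlap(n: int, r: int, w: int) -> bool:
    """True when any read quorum must intersect any write quorum (R + W > N)."""
    return r + w > n

# Classic N=3 configuration: R=W=2 guarantees reads see the latest write.
assert quorum_overlap(3, 2, 2)

# R=W=1 is fast and highly available, but reads may return stale data.
assert not quorum_overlap(3, 1, 1)
```

This is exactly the kind of knob that is dangerous to expose without explanation: turning R and W down improves latency while silently weakening consistency.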

Some people have gone as far as saying that Distributed Systems Are a UX Problem and I tend to agree with this line of reasoning.

Frederic Cambus (fcambus)

Viewing ANSI art in MS-DOS virtual machines June 19, 2020 04:28 PM

I sometimes get reports about Ansilove rendering some artworks differently than other ANSI art editors and viewers for modern platforms.

Ansilove tries to be faithful to ANSI.SYS and MS-DOS based editors and viewers rendering, as the vast majority of artworks were created during the DOS era. Most of the time, using ACiDDraw and ACiD View in DOSBox is enough, but when in doubt, it can be useful to verify how ANSI.SYS rendered a particular piece.

Once we have MS-DOS installed and working in a virtual machine, the next step is accessing files within the VM. The easiest way to do so is to create and use virtual floppy images to transfer files.

On a Linux machine, one can use mkfs.msdos to create an empty floppy image:

mkfs.msdos -C floppy.img 1440

The image can then be mounted on the host to copy the desired content, then attached to the virtual machine.

In the MS-DOS guest, we need to enable ANSI.SYS in CONFIG.SYS (the exact path depends on the installation):

DEVICE=C:\DOS\ANSI.SYS
We can then render the files we want to verify by using TYPE with the file name:

TYPE FILENAME.ANS
80x50 mode can be enabled this way:

MODE CON COLS=80 LINES=50
Wesley Moore (wezm)

Working Around GitHub Browser Sniffing to Get Better Emoji on Linux June 19, 2020 08:03 AM

I have my system configured1 to use JoyPixels for emoji, which I consider vastly more attractive than Noto Color Emoji. Sadly GitHub uses browser sniffing to detect Linux user-agents and replaces emoji with (badly aligned) images of Noto Color Emoji. They don't do this on macOS and Windows. In this post I explain how I worked around this.

Screenshot of GitHub showing two comments, one with emoji set in the Noto Color Emoji font, the other in the JoyPixels Font.

The solution is simple: make GitHub think you're using a Mac or Windows PC. There are various ways to change the User-Agent string of Firefox. The easiest is via about:config but I didn't want it to be a global change — I want sites to know that I'm using Linux in logs/privacy respecting analytics (I block most trackers).

I ended up using the User-Agent Switcher and Manager browser add-on. I configured its allow list so that the override applies only to GitHub, and used the User-Agent string for Firefox on macOS. The end result? JoyPixels, just like I wanted.

P.S. If anyone from GitHub sees this: please stop browser sniffing Linux visitors. Linux desktops and browsers have had working emoji support for years now.


I use the term, "configured", loosely here as all I really did was install the ttf-joypixels package.

June 17, 2020

Gustaf Erikson (gerikson)

Closing Time by Joe Queenan June 17, 2020 09:13 PM

An unflinching but entertaining memoir about growing up with an alcoholic father in working-class Philadelphia.

Britain’s War Machine by David Edgerton June 17, 2020 08:12 PM

A revisionist look at the material grounding of Great Britain (and its Empire) in World War II.

Unlike most contemporary views, Edgerton sees Dunkirk not as a low point but as a temporary setback. The real setback was Japan’s entry into the war and Britain’s need to divert forces and treasure to defend the Empire.

In the post-war years, with the Empire gone and Britain’s relative standing diminished, Dunkirk grows in stature, and the myth of the small island sacrificing itself for peace and democracy grows with it.

5,000 dead in Sweden June 17, 2020 12:17 PM

Pages From The Fire (kghose)

Install exiftool on Synology June 17, 2020 02:19 AM

  • Install Perl from the Synology package manager
  • Use curl to download the *nix distribution to a folder (instructions here)
  • make is not present on the Synology, but we don’t need it
  • Copy exiftool and the lib directory to a directory of your choosing, or leave it where it is, if you won’t erase it
  • Use vi…

June 16, 2020

Benjamin Pollack (gecko)

Distributed Perfection June 16, 2020 08:20 PM

One of the hardest things for me to come to terms with in my adult life has been that…well, no one is perfect. The civil rights leader you laud turns out to be a womanizer. A different one held up as a herald of nonviolence turns out to be violently antisemitic. An author who is unquestionably a major supporter of women’s rights and the LGB part of the acronym turns out to be an enemy of the TQ part of the acronym.

But that doesn’t mean we just absolutely write them all off. When it comes to people, we understand that no one is perfect. People are complex, nuanced individuals. It’s entirely reasonable that someone who is a true rights champion in some arena might be straight-up retrograde in another. Sometimes they improve, sometimes they don’t, but either way, they’re people, who deserve to be lauded for their wins and criticized for their faults. They should be allowed faults. They shouldn’t have to be perfect.

I feel like we’re really slow to adopt this attitude to tech, even though tech is, ultimately, made by people. Jepsen tests are awesome, but I’ve become numb to people seeing a single Jepsen test failure as an indicator that the entire database is broken. An amazing hobbyist graphical OS might be almost impossibly impressive, but people write it off because the kernel interface isn’t quite what they want. GitHub sucks because it needs JavaScript and bloated browsers to operate, even though it demonstrably has made it much easier for the average person to contribute to free and open-source software.

There’s so much work on trying to get a more diverse community into tech, but I feel like we lose a lot of our potential diversity right there, in our insistence that everything be straight-up perfect or be thrown out. Of course I’d like my tech stack to be perfect. But it’s written by people, and people are notoriously complicated, unreliable, and nuanced. And it’s important I meet them where they are, as people.

There’s no call to action in this post, and I’m deliberately not linking anything because I don’t want to fan the flames. But I do want to ask that, before you write a project or its people off as incompetent, lazy, offensive, or stupid, that you take a moment to explore that they’re people with strengths and weaknesses, and the tech they produce will likely be along similar axes.

June 14, 2020

Gokberk Yaltirakli (gkbrk)

Status update, June 2020 June 14, 2020 11:00 PM

After seeing other people (like Drew and emersion) publish these for a while, I decided to write my own “Status Update” for the first time. I can’t always find enough time to write blog posts, so these monthly status updates should be useful both for other people to keep an eye on what I’m working on, and for me as a historical record.

A full blog post needs to have a decent amount of content, and has a lot of time costs related to research, writing, and editing. This means for small updates, there might be too much friction in getting a post out. On the other hand, a monthly update needs a lot less polish for each individual item you want to talk about. As long as you pay attention to the post as a whole, individual items are allowed to lack the substance expected of a standalone post.

These reasons, combined with my not-so-great track record in posting regularly, make me think I will be doing this in the future too. So let’s get on with it.

This month I migrated my kernel’s build system from a Makefile to the Ninja build system. Other than depending on a non-standard tool, I am really happy with this decision. I think Ninja is easier to maintain unless your Makefile is really simple.

Aside from the build system migration, and some small fix-ups, I have completely removed the C-based drivers and migrated them all to the new C++ driver system. This code cleanup made it easier to work on device drivers; and as a result of that we now have a graphics driver that works with QEMU, along with a framebuffer to avoid partial frames.

Speaking of Ninja, I also wrote a Python implementation of Ninja. This was both a way to learn about how build systems work, and a way to build my projects in environments without Ninja. While it doesn’t have full feature parity with the original implementation, it can build my projects and even bootstrap Ninja itself.

Fed up with the state of privacy on the Modern Web™, I started working on a browser extension that aims to plug some privacy leaks. It is still not polished enough, and occasionally breaks some JS-heavy websites. But despite that I’ve been using it daily and it’s not too problematic. I’ll try to make an announcement once it’s fully polished and ready to use.

Just as in the previous months, I’m still learning more about DSP. Last month I created a software modulator for APT. This month I decided to go a little old-school and made a similar tool for Hellschreiber. Unfortunately, I haven’t had the chance to put the code online yet.

I wrote a couple pages on my wiki about DSP, and I managed to contribute a little to the Signal Identification Wiki. I’d recommend checking it out, it’s a great resource to sink some time into.

Short update about university: It’s finally over. Once it’s marked and graded, I’d love to write about my dissertation project here.

That’s all for this month! Thanks for reading.

Derek Jones (derek-jones)

An experiment involving matching regular expressions June 14, 2020 10:40 PM

Recommendations for/against particular programming constructs have one thing in common: there is no evidence backing up any of the recommendations. Running experiments to measure the impact of particular language features on developer performance is not something that researchers do (there have been a handful of experiments looking at the impact of strong typing on developer performance; the effect measured was tiny).

In February I discovered two groups researching regular expressions. In the first post on duplicate regexs, I promised to say something about the second group. This post discusses an experiment comparing developer comprehension of various regular expressions; the paper is: Exploring Regular Expression Comprehension.

The experiment involved 180 workers on Mechanical Turk (to be accepted, workers had to correctly answer four or five questions about regular expressions). Workers/subjects performed two different tasks, matching and composition.

  • In the matching task workers saw a regex and a list of five strings, and had to specify whether the regex matched (or not) each string (there was also an unsure response).
  • In the composition task workers saw a regular expression, and had to create a string matched by this regex. Each worker saw 10 different regexs, which were randomly drawn from a set of 60 regexs (which had been created to be representative of various regex characteristics). I have not analysed this data yet.

What were the results?

For the matching task: given each of the pairs of regexs below, which one (of each pair) would you say workers were most likely to get correct?

         R1                  R2
1.     tri[a-f]3         tri[abcdef]3
2.     no[w-z]5          no[wxyz]5
3.     no[w-z]5          no(w|x|y|z)5
4.     [^0-9]            [\D]
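For readers who want to convince themselves that each pair really is semantically equivalent, here is a quick check in Python (my own sketch, not part of the paper's materials):

```python
import re

# Each pair from the table above should match exactly the same strings.
pairs = [
    (r"tri[a-f]3", r"tri[abcdef]3"),
    (r"no[w-z]5", r"no[wxyz]5"),
    (r"no[w-z]5", r"no(w|x|y|z)5"),
    (r"[^0-9]", r"[\D]"),
]

probes = ["tria3", "trib3", "trig3", "now5", "noz5", "nov5", "a", "5", ""]
for r1, r2 in pairs:
    for s in probes:
        assert bool(re.fullmatch(r1, s)) == bool(re.fullmatch(r2, s))
```

So any difference in subject performance is down to the notation, not the semantics.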

The percentages correct for (1) were essentially the same, at 94.0 and 93.2 respectively. The percentages for (2) were 93.3 and 87.2, which is odd given that the regexs are essentially the same form as (1). Is this amount of variability in subject response to be expected? Is the difference caused by those letters being much less common in text, so people have had less practice using them (sounds a bit far-fetched, but it’s all I could think of)? The percentages for (3) are virtually identical, at 93.3 and 93.7.

The percentages for (4) were 58 and 73.3, which surprised me. But then I have been using regexs since before \D support was generally available. The MTurk generation have it easy not having to use the ‘hard stuff’ 😉

See Table III in the paper for more results.

This matching data might be analysed using Item Response theory, which can take into account differences in question difficulty and worker/subject ability. The plot below looks complicated, but only because there are so many lines. Each numbered colored line is a different regex, worker ability is on the x-axis (greater ability on the right), and the y-axis is the probability of giving a correct answer (code+data; thanks to Peipei Wang for fixing the bugs in my code):

Probability of giving a correct answer, by subject ability, for 60 regex matching questions

Yes, for question 51 the probability of a correct answer decreases with worker ability. Heads are being scratched about this.

There might be some patterns buried in amongst all those lines, e.g., particular kinds of patterns require a given level of ability to handle, or correct response to some patterns varying over the whole range of abilities. These are research questions, and this is a blog article: answers in the comments :-)

This is the first experiment of its kind, so it is bound to throw up more questions than answers. Are more incorrect responses given for longer regexs, particularly if they cannot be completely held in short-term memory? It is convenient for the author to use a short-hand for a range of characters (e.g., a-f), and I was expecting a difference in performance when all the letters were enumerated (e.g., abcdef); I had theories for either one being less error-prone (I obviously need to get out more).

Ponylang (SeanTAllen)

Last Week in Pony - June 14, 2020 June 14, 2020 03:00 PM

We have some new RFCs and more updates to the Emacs ponylang-mode. This week’s sync meeting includes a discussion about build systems and package management.

Bogdan Popa (bogdan)

Announcing http-easy June 14, 2020 03:00 PM

Yesterday I released http-easy, a high-level HTTP client for Racket. I started working on it after getting annoyed at some of the code in my racket-sentry package. The same day I wrote that code, someone started a mailing list thread asking for a “practical” HTTP client, so that served as additional motivation to spend some time on this problem. Here’s a basic example:

(require net/http-easy)
(response-xexpr (get "https://example.

June 11, 2020

Oleg Kovalov (olegkovalov)

kakkaka June 11, 2020 05:25 PM


kakkaka was originally published in testlolkek on Medium, where people are continuing the conversation by highlighting and responding to this story.

June 10, 2020

Robin Schroer (sulami)

LISP1 Has Won June 10, 2020 12:00 AM

I am currently working on a compiler for a new programming language which has been in the making for a few months at this point. There is nothing public to show yet, everything is very early stage, and there are plenty of decisions to make and work to be done before I will publish anything.

That being said, I will write about both the progress as well as different topics I come across, so stay tuned if you are interested in that.

The language I am writing currently has a Lisp-like syntax, because that is easy to parse and work with (I just really don’t want to deal with operator precedence, and honestly I like writing Lisp, so I find it unlikely that I’ll actually change to a non-Lisp-like syntax), which is why I am sharing some thoughts on one of the big bike sheds in software history.


LISP1 and LISP2 are terms to describe the way symbol namespaces work in different LISP-like programming languages. (The distinction can also be applied to other languages with first-class functions, e.g. Python & Ruby.)

The explanation is actually very simple: LISP1 has a single shared namespace for functions and variables. This means a symbol can refer to either a function or a variable, but not both. Consider the following Racket code:

(define (double x)
  (* 2 x))

(define triple (* 3 4))

double
;; => #<procedure:double>

triple
;; => 12

(double triple)
;; => 24

When you resolve a symbol to a variable, you cannot know if it will resolve to a function or not.

LISP2 on the other hand has a separate namespace for functions. This has the advantage that every name can be used twice, once for a function, and once for a variable. (I’m glossing over the fact that Common Lisp actually has more than two namespaces, depending on your definition of the term namespace.) The tradeoff is that the user has to specify in which namespace they want to resolve a symbol. Consider the following Emacs Lisp code:

(defun double (x)
  (* 2 x))

(defvar double (* 2 4))

(funcall #'double double)
;; => (funcall <function double> <variable double>)
;; => (double 8)
;; => 16

Note the added punctuation to denote the first double as a symbol resolving to a function.


LISP is one of the oldest programming languages that is still used commercially today in some form, if you accept Common Lisp in its lineage. It appears that the namespace separation in the original LISP 1.5 was mostly incidental, and has been regretted since.

The set of LISP2 languages is quite small these days. Besides Common Lisp and Emacs Lisp, both of which are over three decades old at this point, there are also Ruby and Perl. (Honourable mention: Lisp Flavoured Erlang is a LISP2.)

The other ancient LISP-like language, Scheme, is a LISP1, and so is its popular modern dialect Racket (as demonstrated above). Almost every other somewhat popular language chooses to share a single namespace between functions and variables. Examples include Clojure, Janet, Python, Java, JavaScript, and even Rust.
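Since Python appears in that list, here is a small illustration of the LISP1 behaviour in it (my own example, not from the original post): because functions and variables share one namespace, rebinding a name shadows the function entirely.

```python
def double(x):
    return 2 * x

# "double" is just a binding in the single shared namespace.
assert double(3) == 6

double = 8  # rebinding the name shadows the function

try:
    double(3)
except TypeError:
    pass  # the function is no longer reachable through this name
else:
    raise AssertionError("expected TypeError")
```

This is the flip side of the LISP1 trade-off: no `#'` punctuation is ever needed, but a name can only mean one thing at a time.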

Clearly the benefits of less syntactic clutter and cognitive overhead have won in the popular arena, to the point that the established de facto standard itself becomes a good reason to stick with a single unified namespace. Of course improvement, by its very definition, always requires change, but language designers need to be acutely aware of the cost incurred by diverging from the established norm.

June 09, 2020

Gokberk Yaltirakli (gkbrk)

Faux-DEFLATE June 09, 2020 11:00 PM

I was working on a proof-of-concept implementation of a file format that uses DEFLATE compression. Since it was supposed to be a self-contained example, I didn’t want to bring in a fully-featured compressor like zlib. I skimmed the DEFLATE RFC and noticed that it supports raw / uncompressed data blocks. I wrote a simple encoder that stores uncompressed blocks to solve my problem, and wanted to document it on my blog for future reference.

Structure of a DEFLATE stream

A DEFLATE stream is made up of blocks. Each block has a 3-bit header. The first bit signifies the last block of the stream, and the other two make up the block type.

The block types are

  • 00 - no compression
  • 01 - compressed with fixed Huffman codes
  • 10 - compressed with dynamic Huffman codes
  • 11 - reserved (error)

In this post, we’re only interested in the first one (00).

Structure of an uncompressed block

In the uncompressed block, the 3-bit header is contained in a byte. A good property of the uncompressed block type being 00 is the ease of constructing the header.

  • If the block is the final one, the header is 1
  • If the block is not the final one, the header is 0

After the header byte, there is the length and negated length. These are both encoded as little endian uint16_t’s.

  • length is the number of data bytes in the block
  • negated length is one’s complement of length, ~len & 0xFFFF

After the header, length and the negated length, length bytes of data follow. If there are no more blocks after this one, the final bit is set.
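Putting the pieces together, the bytes of a final stored block carrying b"Hi" look like this (a sketch using Python's struct module, matching the layout described above):

```python
import struct

data = b"Hi"
_len = len(data)
nlen = ~_len & 0xFFFF  # one's complement of the length

# Header byte 1 = final block (bit 0 set), block type 00.
block = struct.pack("<BHH", 1, _len, nlen) + data
assert block == b"\x01\x02\x00\xfd\xff" + b"Hi"
```

That is: header 01, length 0002 (little endian), negated length FFFD, then the two data bytes.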

Python implementation

Here’s a simple DEFLATE implementation in Python. The output should be a valid DEFLATE stream for real decoders.

import struct

def faux_deflate(reader, writer, bufsize=2048):
    while True:
        chunk = reader.read(bufsize)

        header = 0

        # An empty read means we are at EOF: emit a final, empty block.
        if not chunk:
            header = 1

        _len = len(chunk)
        nlen = ~_len & 0xFFFF
        writer.write(struct.pack("<BHH", header, _len, nlen))
        writer.write(chunk)

        if not chunk:
            break

There’s also a decoder that can only decode uncompressed blocks.

def faux_inflate(reader, writer):
    while header := reader.read(5):
        header, _len, nlen = struct.unpack("<BHH", header)

        assert header in [0, 1]
        assert nlen == ~_len & 0xFFFF

        writer.write(reader.read(_len))
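To check that the encoder's output really is a valid DEFLATE stream, we can round-trip it through zlib, which accepts raw streams when given a negative wbits value (the encoder is restated here so the snippet is self-contained):

```python
import io
import struct
import zlib

def faux_deflate(reader, writer, bufsize=2048):
    # Store-only DEFLATE encoder, as sketched above.
    while True:
        chunk = reader.read(bufsize)
        header = 0 if chunk else 1  # 1 marks the final (empty) block
        _len = len(chunk)
        writer.write(struct.pack("<BHH", header, _len, ~_len & 0xFFFF))
        writer.write(chunk)
        if not chunk:
            break

data = b"Hello, DEFLATE!" * 100
out = io.BytesIO()
faux_deflate(io.BytesIO(data), out)

# wbits=-15 tells zlib to expect a raw DEFLATE stream without a zlib header.
assert zlib.decompress(out.getvalue(), -15) == data
```

A real decoder accepting the output is a much stronger check than round-tripping through faux_inflate alone.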



Frederik Braun (freddyb)

Understanding Web Security Checks in Firefox (Part 1) June 09, 2020 10:00 PM

This blog post has first appeared on the Mozilla Attack & Defense blog and was co-authored with Christoph Kerschbaumer

This is the first part of a blog post series that will allow you to understand how Firefox implements Web Security fundamentals, like the Same-Origin Policy. This first post of the series …

Jeremy Morgan (JeremyMorgan)

Get and Store Temperature from a Raspberry Pi with Go June 09, 2020 05:08 PM

In this tutorial, I’ll show you how to grab temperature from a Raspberry Pi and build an endpoint to store the data, with Go. You will learn:

  • How to retrieve the temperature from a sensor
  • How to send that data in JSON
  • How to build an API endpoint to receive it
  • How to store the data in an SQLite database

And we’ll do it all with Go. I did a live stream of the entire process that you can watch here.

June 08, 2020

Marc Brooker (mjb)

Some Virtualization Papers Worth Reading June 08, 2020 12:00 AM

Some Virtualization Papers Worth Reading

A short, and incomplete, survey.

A while back, Cindy Sridharan asked on Twitter for pointers to papers on the past, present and future of virtualization. I picked a few of my favorites, and given the popularity of that thread I decided to collect some of them here. This isn't a literature survey by any means, just a collection of some papers I've found particularly interesting or useful. As usual, I'm biased towards papers I enjoyed reading, rather than those I had to slog through.

Popek and Goldberg's 1974 paper Formal Requirements for Virtualizable Third Generation Architectures is rightfully a classic. They lay out a formal framework of conditions that a computer architecture must fulfill to support virtual machines. It's 45 years old, so some of the information is dated, but the framework and core ideas have stood the test of time.

Xen and the Art of Virtualization, from 2003, described the Xen hypervisor and a novel technique for running secure virtualization on commodity x86 machines. The exact techniques are less interesting than they were then, mostly because of hardware virtualization features on x86 like VT-x, but the discussion of the field and trade-offs is enlightening. Xen's influence on the industry has been huge, especially because it was used as the foundation of Amazon EC2, which triggered the following decade's explosion in cloud computing. Disco: Running Commodity Operating Systems on Scalable Multiprocessors from 1997 is very useful from a similar perspective (and thanks to Pekka Enberg for the tip on that one). Any paper that has "our approach brings back an idea popular in the 1970s" in its abstract gets my attention immediately.

A Comparison of Software and Hardware Techniques for x86 Virtualization, from 2006, looks at some of the early versions of that x86 virtualization hardware and compares it to software virtualization techniques. As above, hardware has moved on since this was written, but the criticisms and comparisons are still useful to understand.

The security, compatibility and performance trade-offs of different approaches to isolation are complex. On compatibility, A study of modern Linux API usage and compatibility: what to support when you're supporting is a very nice study of how much of the Linux kernel surface area actually gets touched by applications, and what is needed to be truly compatible with Linux. Randal's The Ideal Versus the Real: Revisiting the History of Virtual Machines and Containers surveys the history of isolation, and what that means in the modern world. Anjali's Blending Containers and Virtual Machines: A Study of Firecracker and gVisor is another of a related genre, with some great data comparing three methods of isolation.

My VM is Lighter (and Safer) than your Container from SOSP'17 has also been influential in changing the way a lot of people think about virtualization. A lot of people I talk to see virtualization as a heavy tool with multi-second boot times and very limited density, mostly because that's the way it's typically used in industry. Manco et al's work wasn't the first to burst that bubble, but they do it very effectively.

Our own paper Firecracker: Lightweight Virtualization for Serverless Applications describes Firecracker, a new open-source Virtual Machine Monitor (VMM) specialized for serverless workloads. The paper also covers how we use it in AWS Lambda, and some of what we see as the future challenges in this space. Obviously I'm biased here, being an author of that paper.

June 07, 2020

Derek Jones (derek-jones)

C++ template usage June 07, 2020 10:29 PM

Generics are a programming construct that allow an algorithm to be coded without specifying the types of some variables, which are supplied later when a specific instance (for some type(s)) is instantiated. Generics sound like a great idea; who hasn’t had to write the same function twice, with the only difference being the types of the parameters.

All of today’s major programming languages support some form of generic construct, and developers have had the opportunity to use them for many years. So, how often are generics used in practice?

In C++, templates are the language feature supporting generics.

The paper: How C++ Templates Are Used for Generic Programming: An Empirical Study on 50 Open Source Systems contains lots of interesting data :-) The following analysis applies to the five largest projects analysed: Chromium, Haiku, Blender, LibreOffice and Monero.

As its name suggests, the Standard Template Library (STL) is a collection of templates implementing commonly used algorithms+other stuff (some algorithms were commonly used before the STL was created, and perhaps some are now commonly used because they are in the STL).

It is to be expected that most uses of templates will involve those defined in the STL, because these implement commonly used functionality, are documented and generally known about (code can only be reused when its existence is known about, and it has been written with reuse in mind).

The template instantiation measurements show a 17:1 ratio for STL vs. developer-defined templates (i.e., 149,591 vs. 8,887).

What are the usage characteristics of developer defined templates?

Around 25% of developer defined function templates are only instantiated once, while 15% of class templates are instantiated once.

Most templates are defined by a small number of developers. This is not surprising given that most of the code on a project is written by a small number of developers.

The plot below shows the percentage instantiations (of all developer defined function templates) of each developer defined function template, in rank order (code+data):

Percentage instantiations of each developer defined function template, in rank order.

Lines are each a fitted power law, whose exponents vary between -1.5 and -2. Is it just me, or are these exponents surprisingly close?

The following is for developer defined class templates. Lines are fitted power law, whose exponents vary between -1.3 and -2.6. Not so close here.

Percentage instantiations of each developer defined class template, in rank order.

What processes are driving use of developer defined templates?

Every project has its own specific few templates that get used everywhere, by all developers. I imagine these are tailored to the project, and are widely advertised to developers who work on the project.

Perhaps some developers don’t define templates, because that’s not what they do. Is this because they work on stuff where templates don’t offer much benefit, or is it because these developers are stuck in their ways (if so, is it really worth trying to change them?)

Ponylang (SeanTAllen)

Last Week in Pony - June 7, 2020 June 07, 2020 03:16 PM

This week’s sync meeting includes discussions about type system soundness and dependency management.

June 06, 2020

Patrick Louis (venam)

Evolutionary Software Architecture June 06, 2020 09:00 PM

Building Evolutionary Architectures

In a previous post, I’ve underlined the philosophy behind Domain Driven Design, DDD, and now I’d like to move to a practical approach that handles real issues in software development and architecture: requirements that constantly change, and models that are never precise, never current, and/or never using the best technology available. One of the solutions to such problems is to build an evolutionary architecture.

To be able to have a discussion we have to understand the part that software architecture plays, which is not straightforward considering the many definitions and re-definitions of it. I’ll note the particularly fascinating ones.

The architecture of a software system (at a given point in time) is its organization or structure of significant components interacting through interfaces, those components being composed of successively smaller components and interfaces. — IEEE definition of software architecture

In most successful software projects, the expert developers working on that project have a shared understanding of the design. This shared understanding is called “architecture.” This understanding includes how the system is divided into components and how the components interact through interfaces. — Ralph Johnson

Going towards more abstract definitions such as the following.

Architecture is the stuff that’s hard to change later. And there should be as little of that stuff as possible. — Martin Fowler

Architecture is about the important stuff. Whatever that is. — Martin Fowler

Stuff that’s hard to change later. — Neal Ford

These definitions barely overlap but there’s still a vague essence joining them which we can extract. We can say that architecture is concerned with the important decisions in a software project, the objects of those decisions, the shared knowledge of them, and how to reason about them. If we view this from an evolutionary architecture standpoint, the best architecture is one where decisions are flexible, easily replaceable, reversible, and deferred as late as possible so that they can be substituted for alternatives that recent experiences have shown to be superior.
Because architecture is about decision-making, it is inherently tied with the concept of technical debt, the compromise of trading time for a design that is not perfect. Keep in mind that debt accumulates and often leads to architectural decay as changes keep coming and entropy increases.

Similarly, due to the vague definition of architecture, the role of architect is hard to describe. Whether it should be a completely separate role, or whether everyone in a team acts as one, is ambiguous. The vociferous software architecture evangelist Martin Fowler prefers the term Architectus Oryzus, referring to architects that are also active contributors on the projects, thus getting direct insights from their involvement.

The software architecture thought process can be applied at two broad levels: the application level and the enterprise level. Application architecture is about describing the structure of an application and how its components fit together, usually using design patterns, while enterprise architecture is about organizational-level software issues such as practices, information flow, methodology standards, release mechanisms, personnel related activities, technology stacks enforced, etc.

Design relates to all the well known development design patterns, refactoring techniques, the usage of frameworks, how to bundle components together, and other daily concerns. In an evolutionary architecture, it’s preferable to have an emergent design instead of one that is set up front.

This gives us a good idea of what software architecture is about, so what’s the current state of it, and why do we need a solution such as building evolutionary architectures?

The usual way we develop software today is by fighting the incoming changes we want to incorporate into the current architecture. Software development is a dynamic equilibrium, and today we find software in a constantly unbalanced and unstable state whenever there are changes to be included. That is because even though we'd like to do the right things at the right time, we can't predict what those decisions should be; predictability is almost impossible. For example, we can't predict disruptive technologies that don't exist yet. As the software ages, we juggle changes and new requirements; there's no room for experimentation, we only respond.
Stakeholders want the software to fulfill architecturally significant requirements, also known as the "ilities" and KPIs, such as auditability, performance, security, scalability, privacy, legality, productivity, portability, stability, etc. They expect those not to degrade. Hence, we have to find the least-worst trade-off between them while not introducing anything that could hinder them. This is hard to do, be it because of a business-driven change, such as new features, new customers, or new markets, or because of ecosystem changes such as advances in technology, library upgrades, frameworks, operating systems, etc.


In recent years, we've seen the rise of agile development methodologies meant to replace the waterfall approach. They are more apt at facing this challenge: they create an iterative and dynamic way to control the change process. What we call evolutionary architecture starts from the idea of embracing change and constant feedback but applies it across the whole architecture spectrum, on multiple dimensions. It's not the strongest that survive, it's the ones that are the most responsive to change.

So what is evolutionary software architecture?

Evolutionary architecture is a meta-architecture, a way of thinking about software in evolutionary terms. A guide, a first derivative, dictating design principles that promote change as a first-class citizen. Here is Neal Ford's, Rebecca Parsons's, and Patrick Kua's definition from their book "Building Evolutionary Architectures", which we'll dissect.

An evolutionary architecture supports guided, incremental change across multiple dimensions.

  • Multiple dimensions

There are no separate systems. The world is a continuum. Where to draw a boundary around a system depends on the purpose of the discussion. — Donella H. Meadows

While the agile methodology is only concerned with people and processes, evolutionary architecture encompasses the whole spectrum: the technical, data, domain, security, organizational, and operational aspects. We want different perspectives, all of them evolvable; those are our dimensions. The evolutionary mindset should surround it all in a holistic view of software systems. For this, we add a new requirement, an "-ility", which we call the evolvability of a dimension. It measures how easily a change in that dimension can evolve the architecture, that is, how easily it can be included in the dynamic equilibrium.
For example, the big ball of mud architecture, with its extreme coupling and architectural rot, has an evolvability of 0 in every dimension, because any change in any dimension is daunting.
The layered architecture has a one-dimensional structural evolvability, because a change at one layer ripples only through the layer below it. However, its domain-dimension evolvability is often 0: when domain concepts are smeared and coupled across layer boundaries, a domain change requires major refactoring and ripples through all the layers.
The microservice style of architecture, which hinges on the post-devops and agile revolution, has a structural and domain evolvability of n, n being the number of isolated services running. Each service in a microservice architecture represents a domain bounded context, which can be changed independently of the others because of its boundary. In the world of evolutionary architecture we call such a disjoint piece a quantum. An architectural quantum is an independently deployable component with high functional cohesion, which includes all the structural elements required for the system to function properly. In a monolith architecture, the whole monolith is the quantum. From a temporal coupling perspective, however, a transaction may span multiple services in a microservice architecture, giving an evolvability of 0 in the transactional dimension.

  • Incremental change

It is not enough to have a measure of how easily change can be applied; we also need to apply change continually and incrementally. This applies both to how teams build software, as in the agile methodology, and to how the software is deployed, through continuous integration, continuous delivery, and continuous verification/validation.
These rely on good devops practices that let you take back control in complex systems, such as automated deployment pipelines, automated machine provisioning, good monitoring, gradual migration of new services by controlling routes, using database migration tools, using chaos engineering to facilitate the management of services, and more.

  • Guided Change

With evolvability across multiple dimensions and incremental change, we can experiment trivially, reversibly, and without hassle. But to start the evolutionary process we need one more thing: a guide that will push the architecture in the direction we want, using experiments as the main stressors. We call this selector an evolutionary fitness function, borrowing the term genetic algorithms use for their optimization function.

An architectural fitness function provides an objective integrity assessment of some architectural characteristic(s).

Fitness functions are metrics that can cover one or multiple dimensions we care about and want to optimize. There's a wide range of such functions, and this is where evolutionary architecture shines: it encourages testing, forming hypotheses, and gathering data in every manner possible to see how these metrics evolve, and the software along with them. Experimentation and hypothesis-driven development are superpowers that evolutionary architectures deliver.
This isn't limited to the usual unit tests and static analysis but extends way beyond simple code quality metrics. Fitness functions can be automated or not, global or not, continuous or not, dynamic or not, domain specific or not, etc. Let's mention some interesting techniques that they facilitate and that can be used for experimentation.

  • A/B testing.
  • Canary Releases aka phased rollout.
  • TDD to find emergent design.
  • Security as code, especially in the deployment pipeline.
  • Architecture as code, also in the deployment pipeline, with test frameworks such as ArchUnit.
  • Licenses as code, surprisingly this works too.
  • Test in production: through instrumentation and metrics, or direct interaction with users.
  • Feature flags/feature toggles, to toggle behavior on and off.
  • Chaos engineering, for example using the simian army as a continuous fitness function. “The facilitation of experiments to uncover systemic weakness”.
  • Social code analysis to find hotspots in code.
  • GitHub Scientist, to test hypotheses in production while keeping normal behavior. It:
    • Decides whether or not to run the try block.
    • Randomizes the order in which use and try blocks are run.
    • Measures the durations of all behaviors.
    • Compares the result of try to the result of use.
    • Swallows (but records) any exceptions raised in the try.
    • Publishes all the information.
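Those six bullets translate almost directly into code. Here is a hypothetical toy sketch in Python (the real GitHub Scientist library is written in Ruby; the names `experiment` and `publish` are invented for illustration):

```python
import random
import time

def experiment(use, try_, publish, enabled=lambda: True):
    """Toy Scientist: run control (`use`) and candidate (`try_`), compare, publish."""
    if not enabled():                       # decides whether or not to run the try block
        return use()
    results = {}
    blocks = [("use", use), ("try", try_)]
    random.shuffle(blocks)                  # randomizes the order in which blocks run
    for name, fn in blocks:
        start = time.perf_counter()
        try:
            results[name] = ("ok", fn())    # records the result of each behavior
        except Exception as exc:
            results[name] = ("error", exc)  # swallows (but records) exceptions
        results[name + "_ms"] = (time.perf_counter() - start) * 1000  # measures durations
    results["mismatch"] = results["use"] != results["try"]  # compares try to use
    publish(results)                        # publishes all the information
    status, value = results["use"]          # the caller always gets the control's result
    if status == "error":
        raise value
    return value
```

A mismatch is only recorded and published, never raised, which is what makes it safe to run the candidate path against production traffic.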

The benefits of all this experimentation are soon seen: a real interactive feedback loop with users, a buffet of options. The dynamic equilibrium takes care of itself and there are fewer surprises.
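Several of these techniques need no framework to get started. A feature toggle, for instance, can begin as small as this sketch (the `FEATURE_` environment-variable convention and the flag names are invented for illustration):

```python
import os

class FeatureFlags:
    """Minimal feature-toggle store; unknown flags default to off."""
    def __init__(self, flags=None):
        self.flags = dict(flags or {})

    def enabled(self, name):
        # An environment variable overrides the static configuration,
        # so a flag can be flipped without a redeploy.
        env = os.environ.get("FEATURE_" + name.upper())
        if env is not None:
            return env == "1"
        return self.flags.get(name, False)

flags = FeatureFlags({"new_checkout": True})
```

Behavior behind a toggle can then be turned on for a canary group, measured, and rolled back instantly if the fitness functions degrade.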

This is enabled by the team building the evolutionary architecture. Like with DDD, Conway's law applies: the shape of the organization is directly reflected in the software. You can't affect the architecture without affecting the people that build it.

So far, we’ve seen that such team should embrace devops and agile development, that’s a given. Additionally, the team should itself be a cocoon for evolution and experimentation. By making it cross-functional, that is every role and expertise should be found in it, and responsible for a single project, we remove the bottlenecks in the organization. We need a team that resembles our architectural quantum.
A small team, one that can be fed by two pizzas — a two-pizzas team — avoids the separation between who decides what needs to be done and who decides how it’s going to be done. Everyone is there and decides together. We talk of teams in charge of products rather than projects.
The size of the team also allows information to flow seamlessly. All can share the architectural and domain knowledge. Methods that can be used are the usual documentation, architectural decision records, pair programming, and even mob programming.

As nice as it is to have teams that are single working units taking the best decisions for their projects, it’s also important to limit their boundaries. Many companies prefer giving loose recommendations about the software stacks teams can use instead of letting them have their own silos of specialized and centralized knowledge. Again, we face the dynamic equilibrium but this time at the team level. The parallel in enterprise architecture is called the “classic alternatives” strategy.
Human governance in these teams shouldn’t be restrictive because it would make it hard to move. However, the teams are guided by their own fitness functions, an automatic architectural governance. The continuous verifications in the delivery pipeline act as the guard-rail mechanism.
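As a concrete taste of such automated governance, a fitness function can be an ordinary test that fails the build when an architectural rule is broken. Here is a small Python sketch (the layer names `domain` and `infrastructure` are invented for illustration; on the JVM, ArchUnit plays this role):

```python
import ast

def forbidden_imports(source, banned_prefixes):
    """List the imports in `source` that violate the architectural rule."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        violations += [n for n in names
                       if any(n == p or n.startswith(p + ".")
                              for p in banned_prefixes)]
    return violations

# Fitness function: the domain layer must never import the infrastructure layer.
violations = forbidden_imports(
    "import infrastructure.db\nfrom domain.model import Order\n",
    banned_prefixes=["infrastructure"],
)
```

Run in CI against every file of the protected package, a non-empty result fails the build, acting as exactly the guard-rail mechanism described above.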

There are two big principles that should be kept in mind while applying all the above: “last responsible moment” and “bring the pain forward”. Together, they have the effect of making team members less hesitant and more prone to experiment.

The last responsible moment, an idea from the LEAN methodology, is about postponing decisions that are not immediately required, gathering as much information as possible in the meantime so that the best possible choice can emerge.
This is especially useful for structural designs and technological decisions, where insights and clear contexts appear late. It helps avoid the potential cost and technical debt of useless abstractions and of vendor-locking the code to frameworks. That is in direct opposition to the classical way of doing software architecture, where those decisions are taken upfront.

What to remember when taking decisions

Bringing the pain forward, an idea inspired by the extreme programming methodology, is about facing difficult, long, painful tasks instead of postponing them. The more often we encounter them, the better we'll know their ins and outs, and the more we'll be motivated to automate the pain away. It's the dynamic equilibrium of pain versus time: the pain increases exponentially if we wait.
This is why it's encouraged to test in production, apply techniques of chaos engineering, reboot often, garbage collect services, merge code often, use database migration tools, etc. Eventually, the known-unknowns become known-knowns, and the common, predictable pain is gone.

In a world where software keeps getting more complex, building evolutionary architectures leads into the topic of building robust, resilient, adaptive, and rugged systems. Software is now an intimate part of our lives; we rely on it, and it has real-world effects. We could get inspired by the aerospace world and take a look at the checklist manifesto, or we could embrace statelessness and a throwable/disposable architecture (disposable software, erase your darlings), or maybe go the way of flexible reactive architectures with their self-healing mechanisms. Anything is possible.

In this post, I've given an overview of the way I perceive evolutionary software architecture and its place in software architecture as a whole. It is clearly a step forward from the typical static view of architecture and offers a novel and organic approach, as the name implies. None of what is described is necessarily new, but putting all these methods and ways of thinking together is. If you want an in-depth explanation, take a look at the O'Reilly book "Building Evolutionary Architectures" by Neal Ford, Rebecca Parsons, and Patrick Kua. I hope this article kick-starts your journey.


Frederic Cambus (fcambus)

OpenBSD framebuffer console and custom color palettes June 06, 2020 04:33 PM

On framebuffer consoles, OpenBSD uses the rasops(9) subsystem, which was imported from NetBSD in March 2001.

The RGB values for the ANSI color palette in rasops have been chosen to match the ones in Open Firmware, and are different from those in the VGA text mode color palette.

Rasops palette:

Rasops palette

VGA text mode palette:

VGA text mode palette

As one can see, the difference is quite significant, and decades of exposure to MS-DOS and Linux consoles make it quite difficult to adapt to a different palette.

RGB values for the ANSI color palette are defined in sys/dev/rasops/rasops.c, and here are the proper ones to use to match the VGA text mode palette:

#define	NORMAL_BLACK	0x000000
#define	NORMAL_RED	0xaa0000
#define	NORMAL_GREEN	0x00aa00
#define	NORMAL_BROWN	0xaa5500
#define	NORMAL_BLUE	0x0000aa
#define	NORMAL_MAGENTA	0xaa00aa
#define	NORMAL_CYAN	0x00aaaa
#define	NORMAL_WHITE	0xaaaaaa

#define	HILITE_BLACK	0x555555
#define	HILITE_RED	0xff5555
#define	HILITE_GREEN	0x55ff55
#define	HILITE_BROWN	0xffff55
#define	HILITE_BLUE	0x5555ff
#define	HILITE_MAGENTA	0xff55ff
#define	HILITE_CYAN	0x55ffff
#define	HILITE_WHITE	0xffffff
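These values are not arbitrary: they follow the classic RGBI scheme, where each set channel contributes 0xaa, the intensity bit adds 0x55 to every channel, and brown is the traditional special case (dark yellow with its green channel halved). A short Python sketch reproducing the table above:

```python
def ansi_vga(i, bright=False):
    """RGB value for ANSI color i (0-7) in the VGA text mode palette."""
    hi = 0x55 if bright else 0x00           # intensity bit adds 0x55 per channel
    r = (0xaa if i & 1 else 0) + hi
    g = (0xaa if i & 2 else 0) + hi
    b = (0xaa if i & 4 else 0) + hi
    if i == 3 and not bright:               # brown: dark yellow with green halved
        g = 0x55
    return (r << 16) | (g << 8) | b

NORMAL = [ansi_vga(i) for i in range(8)]
HILITE = [ansi_vga(i, bright=True) for i in range(8)]
```

This generates exactly the sixteen #define values listed above, in the same black/red/green/brown/blue/magenta/cyan/white order.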

And here is a diff doing just that, which I sent to tech@ back in January 2017.

June 05, 2020

Stjepan Golemac (stjepangolemac)

Building an easy on the eyes IKEA style blog, in no time, for free, again June 05, 2020 05:58 PM

Desktop after work. Photo by Luca Bravo on Unsplash.

It’s been over a year since I bought a new domain and redesigned my blog. I was planning to write about many things, I set up a nice Markdown editor on my Mac, everything was ready to go. Then I did nothing, for a year. 😴

During this lockdown, I felt the urge to do it all over again but I successfully resisted it. Then I caved in one evening and rebuilt everything. This time it’s much simpler and I’m much more pleased with the tech stack.

The future is static! 🚀 Keep reading here.

If you’re wondering why I’m not writing on Medium anymore I wrote a post about that too. Check it out.

June 04, 2020

Aaron Bieber (qbit)

OpenSSH - Configuring FIDO2 Resident Keys June 04, 2020 11:43 PM

Table of Contents

  1. The Setup
  2. Creating keys
    1. Generating the non-resident handle
    2. Generating the resident handle
  3. Using the token
    1. Resident
      1. Transient usage with ssh-add
      2. Permanent usage with ssh-agent
    2. Non-resident

The Setup

If you haven’t heard, OpenSSH recently ([2020-02-14 Fri]) gained support for FIDO2/U2F hardware authenticators like the YubiKey 5!

This allows one to log into remote hosts with the touch of a button and it makes me feel like I am living in the future!

Some of these hardware tokens even support multiple slots, allowing one to have multiple keys!

On top of all that, the tokens can do “resident” and “non-resident” keys. “Resident” means that the key is effectively retrievable from the token (it doesn’t actually get the key - it’s a handle that lets one use the hardware key on the device).

This got me thinking about how I could use a single token (with two keys) to access the various machines I use.

In my use case, I have two types of machines I want to connect to:

  • greater security: machines I want to grant access to from a very select number of devices.

The greater key will require me to copy the “key handle” to the machines I want to use it from.

  • lesser security: machines I want to access from devices that may not be as secure.

The lesser key will be “resident” to the YubiKey. This means it can be downloaded from the YubiKey itself. Because of this, it should be trusted a bit less.

Creating keys

When creating FIDO keys (really they are key handles) one needs to explicitly tell the tool being used that it needs to pick the next slot. Otherwise generating the second key will clobber the first!

Generating the non-resident handle

greater will require me to send the ~/.ssh/ed25519_sk_greater handle to the various hosts I want to use it from.

We will be using ssh-keygen to create our non-resident key.

ssh-keygen -t ed25519-sk -Oapplication=ssh:greater -f ~/.ssh/ed25519_sk_greater

Generating the resident handle

Because resident keys allow for the handle to be downloaded from the token, I have changed the PIN on my token. The PIN is the only defense against a stolen key. Note: the PIN can be a full passphrase!

Again via ssh-keygen.

ssh-keygen -t ed25519-sk -Oresident -Oapplication=ssh:lesser -f ~/.ssh/ed25519_sk_lesser

Using the token

Resident

The resident key can be used by adding it to ssh-agent or by downloading the handle / public key using ssh-keygen:

Transient usage with ssh-add

ssh-add -K

This will prompt for the PIN (which should be set as it’s the only defense against a stolen key!)

No handle files will be placed on the machine you run this on. Handy for machines you want to ssh from but don’t fully trust.

Permanent usage with ssh-agent

ssh-keygen -K

This will also prompt for the PIN, however, it will create the private key handle and corresponding public key and place them in $CWD.

Non-resident

The non-resident key will only work from hosts that have the handle (in our case ~/.ssh/ed25519_sk_greater). As such, the handle must be copied to the machines you want to allow access from.

Once the handle is in place, you can specify its usage in ~/.ssh/config:

Host secretsauce
    IdentityFile ~/.ssh/ed25519_sk_greater

June 02, 2020

Jeremy Morgan (JeremyMorgan)

How to Install Go on the Raspberry Pi June 02, 2020 06:20 PM

If you want to install Go on your Raspberry Pi you have a few options. In the past there was a lot of cross compiling and hacking to get it done, but now you can install it through Apt. However, you’re likely to find an older version. For instance at the time of this writing, an updated Raspberry Pi OS shows a version of 1.11.1 in the repositories. However the current version is 1.

Dan Luu (dl)

Finding the Story June 02, 2020 07:05 AM

This is an archive of an old pseudonymously written post from the 90s that seems to have disappeared from the internet.

I see that Star Trek: Voyager has added a new character, a Borg. (From the photos, I also see that they're still breeding women for breast size in the 24th century.) What ticked me off was the producer's comment (I'm paraphrasing), "The addition of Seven of Nine will give us limitless story possibilities."

Uh-huh. Riiiiiight.

Look, they didn't recognize the stories they had. I watched the first few episodes of Voyager and quit when my bullshit meter went off the scale. (Maybe that's not fair, to judge them by only a few episodes. But it's not fair to subject me to crap like the holographic lungs, either.)

For those of you who don't watch Star Trek: Voyager, the premise is that the Voyager, sort of a space corvette, gets transported umpteen zillions of light years from where it should be. It will take over seventy years at top speed for them to get home to their loved ones. For reasons we needn't go into here, the crew consists of a mix of loyal Federation members and rebels.

On paper, this looks good. There's an uneasy alliance in the crew, there's exploration as they try to get home, there's the whole "island in space" routine. And the Voyager is nowhere near as big as the Enterprise -- it's not mentally healthy for people to stay aboard for that long.

But can this idea actually sustain a whole series? Would it be interesting to watch five years of "the crew bickers" or "they find a new clue to faster interstellar travel but it falls through"? I don't think so.

(And, in fact, the crew settled down awfully quickly.)

The demands of series television subvert the premise. The basic demand of series television is that our regular characters are people we come to know and to care about -- we want them to come into our living rooms every week. We must care about their changes, their needs, their desires. We must worry when they're put in jeopardy. But we know it's a series, so it's hard to make us worry. We know that the characters will be back next week.

The demands of a story require someone to change of their own accord, to recognize some difference. The need to change can be imposed from without, but the actual change must be self-motivated. (This is the fundamental paradox of series television: the only character allowed to change is a guest, but the instrument of that change has to be a series regular, therefore depriving both characters of the chance to do something interesting.)

Series with strict continuity of episodes (episode 2 must follow episode 1) allow change -- but they're harder to sell in syndication after the show goes off the air. Economics favour unchanging regular characters.

Some series -- such as Hill Street Blues -- get around the jeopardy problem by actually making characters disposable. Some characters show up for a few episodes and then die, reminding us that it could happen to the regulars, too. Sometimes it does happen to the regulars.

(When the characters change in the pilot, there may be a problem. A writer who was approached to work on Mary Tyler Moore's last series saw from the premise that it would be brilliant for six episodes and then have no place to go. The first Fox series starring Tea Leoni, Flying Blind, had a very funny pilot and set up an untenable situation.)

I'm told the only interesting character on Voyager has been the doctor, who can change. He's the only character allowed to grow.

The first problem with Voyager, then, is that characters aren't allowed to change -- or the change is imposed from outside. (By the way, an imposed change is a great way to start a story. The character then fights it, and that's interesting. It's a terrible way to end a story.)

The second problem is that they don't make use of the elements they have. Let's go back to the first season. There was an episode in which there's a traitor on board who is as smart as Janeway herself. (How psychiatric testing missed this, I don't know, but the Trek universe has never had really good luck with psychiatry.) After leading Janeway by the nose for fifty minutes, she figures out who it is, and confronts him. He says yes -- and beams off the ship, having conveniently made a deal with the locals.

Perfect for series television. We've got a supposedly intelligent villain out there who could come back and Janeway's been given a run for her money -- except that I felt cheated. Where's the story? Where's the resolution?

Here's what I think they should have done. It's not traditional series television, but I think it would have been better stories.

First of all, the episode ends when Janeway confronts the bad guy and arrests him. He's put in the brig -- and stays there. The viewer gets some sense of victory here.

But now there's someone as smart as Janeway in the brig. Suddenly we've set up Silence of the Lambs. (I don't mind stealing if I steal from good sources.) Whenever a problem is big enough, Janeway has this option: she can go to the brig and try and make a deal with the bad guy. "The ship dies, you die." Not only that, here's someone on board ship with whom she has a unique relationship -- one not formally bounded by rank. What does the bad guy really want?

And whenever Janeway's feeling low, he can taunt her. "By the way, I thought of a way to get everyone home in one-tenth the time. Have you, Captain?"

You wouldn't put him in every episode. But any time you need that extra push, he's there. Remember, we can have him escape any time we want, through the same sleight used in the original episode.

Furthermore, it's one thing to catch him; it's another thing to keep him there. You can generate another entire episode out of an escape attempt by the prisoner. But that would be an intermediate thing. Let's talk about the finish I would have liked to have seen.

Let's invent a crisis. The balonium generator explodes; we're deep in warp space; our crack engineering crew has jury-rigged a repair to the sensors and found a Class M planet that might do for the repairs. Except it's just too far away. The margin is tight -- but it can't be done. There are two too many people on board ship. Each requires a certain amount of food, air, water, etc. Under pressure, Neelix admits that his people can go into suspended animation, so he does. The doctor tries heroically but the engineer who was tending the balonium generator dies. (Hmmm. Power's low. The doctor can only be revived at certain critical moments.) Looks good -- but they were using air until they died; one more crew member must die for the rest to live.

And somebody remembers the guy in the brig. "The question of his guilt," says Tuvok, "is resolved. The authority of the Captain is absolute. You are within your rights to hold a summary court martial and sentence him to death."

And Janeway says no. "The Federation doesn't do that."

Except that everyone will die if she doesn't. The pressure is on Janeway, now. Janeway being Janeway, she's looking for a technological fix. "Find an answer, dammit!" And the deadline is coming up. After a certain point, the prisoner has to die, along with someone else.

A crewmember volunteers to die (a regular). Before Janeway can accept, yet another (regular) crewmember volunteers, and Janeway is forced to decide. -- And Tuvok points out that while morally it's defensible if that member volunteered to die, the ship cannot continue without either of those crewmembers. It can continue without the prisoner. Clearly the prisoner is not worth as much as those crewmembers, but she is the captain. She must make this decision.

Our fearless engineering crew thinks they might have a solution, but it will use nearly everything they've got, and they need another six hours to work on the feasibility. Someone in the crew tries to resolve the problem for her by offing the prisoner -- the failure uses up more valuable power. Now the deadline moves up closer, past the six hours deadline. The engineering crew's idea is no longer feasible.

For his part, the prisoner is now bargaining. He says he's got ideas to help. Does he? He's tried to destroy the ship before. And he won't reveal them until he gets a full pardon.

(This is all basic plotting: keep piling on difficulties. Put a carrot in front of the characters, keep jerking it away.)

The tricky part is the ending. It's a requirement that the ending derive logically from what has gone before. If you're going to invoke a technological fix, you have to set the groundwork for it in the first half of the show. Otherwise it's technobabble. It's deus ex machina. (Any time someone says just after the last commercial break, "Of course! If we vorpalize the antibogon flow, we're okay!" I want to smack a writer in the head.)

Given the situation set up here, we have three possible endings:

  • Some member of the crew tries to solve the problem by sacrificing themselves. (Remember, McCoy and Spock did this.) This is a weak solution (unless Janeway does it) because it takes the focus off Janeway's decision.
  • Janeway strikes a deal with the prisoner, and together they come up with a solution (which doesn't involve the antibogon flow). This has the interesting repercussions of granting the prisoner his freedom -- while everyone else on ship hates his guts. Grist for another episode, anyway.
  • Janeway kills the prisoner but refuses to hold the court martial. She may luck out -- the prisoner might survive; that million-to-one-shot they've been praying for but couldn't rely on comes through -- but she has decided to kill the prisoner rather than her crew.

My preferred ending is the third one, even though the prisoner need not die. The decision we've set up is a difficult one, and it is meaningful. It is a command decision. Whether she ends up killing the prisoner is not relevant; what is relevant is that she decides to do it.

John Gallishaw once categorized all stories as either stories of achievement or of decision. A decision story is much harder to write, because both choices have to matter.

June 01, 2020

Nikita Voloboev (nikivi)

Carlos Fenollosa (carlesfe)

Seven years later, I bought a new Macbook. For the first time, I don't love it June 01, 2020 02:31 PM

The 2013 Macbook Air is the best computer I have ever owned. My wish has always been that Apple did nothing more than update the CPU and the screen, touching nothing else. I was afraid the day of upgrading my laptop would come.

But it came.

My Air was working flawlessly, if unbearably slowly under load. Let me dig a bit deeper into this problem, because this is not just the result of using old hardware.

When video conferencing or under high stress, like running multiple VMs, the system would miss key presses or mouse clicks. I'm not saying that the system was laggy, which it was, and that is expected. Rather, I would type the word "macbook" and the system would register "mok", for example. Or I would start a dragging event, the MouseUp would never register but the MouseMove would continue working, so I ended up flailing an icon around the screen or moving a window to some unexpected place.

This is mostly macOS's fault. I own a contemporary x230 with similar specs running Linux and it doesn't suffer from this issue. Look, I was a computer user in the 90s and I perfectly understand that an old computer will be slow to the point of freezing, but losing random input events is a serious bug on a modern multitasking system.

Point #1: My old computer became unusable due to macOS, not hardware, issues.


As I mentioned, I had been holding off on my purchase due to the terrible product lineup that Apple maintained from 2016 to 2019. Then Apple atoned, and things changed with the 2019 16". Since I prefer smaller footprints, I decided that I would buy the 13" they updated next.

So here I am, with my 2020 Macbook Pro, i5, 16 GB RAM, 1 TB SSD. But I can't bring myself to love it like I loved my 2013 Air.

Let me explain why. Maybe I can bring in a fresh perspective.

Most reviewers evaluate the 2020 lineup with the 2016-2019 versions in mind. But I'm just some random person, not a reviewer. I have not had the chance to even touch any Mac since 2015. I am not conditioned towards a positive judgement just because the previous generation was so much worse.

Of course the new ones are better. But the true test is to compare them to the best laptops ever made: 2013-2015 Airs and Pros.

Point #2: this computer is not a net win over a 2013 Air.

Let me explain the reasons why.

The webcam

You will see the webcam reviewed as an afterthought in most pieces. I will cover it first. I feel like Apple is mocking us by including the worst possible webcam on the most expensive laptop.

Traditionally, this has been a non-issue for most people. However, due to covid-19 and working from home, this topic has become more prominent.

In my case, even before the pandemic I used to do 2-3 video conferences every day. Nowadays I spend the day in front of my webcam.

What infuriates me is that the camera quality in the 2013 Air is noticeably better. Why couldn't they use at least the same part, if not a modern one?

See for yourself. It really feels like a ripoff. Apple laughing at us.

A terrible quality picture from the macbook pro webcam
The 2020 macbook pro webcam looks horrible, and believe me, it is not only due to Yours Truly's face.

A reasonable quality picture from the 2013 Air
A reasonable quality picture from the 2013 Air

For reference, this is the front facing camera of the 2016 iPhone SE
For reference, this is the front facing camera of the 2016 iPhone SE, same angle and lighting conditions.

For reference, a picture taken with my 2006 Nokia 5200
As a second reference, a picture taken with the 640x480 VGA camera of my 2006 Nokia 5200. Which of the above looks the most like this?

I would have paid extra money to have a better webcam on my macbook.

The trackpad

The mechanism and tracking are excellent, but the trackpad itself is too large and the palm rejection algorithm is not good enough.

Point #3: The large trackpad single-handedly ruins the experience of working on this laptop for me.

I am constantly moving the cursor accidentally. This is especially annoying for a touch typist: my fingers are always on hjkl and my thumb on the spacebar, so my thumb knuckle constantly brushes the trackpad and activates it.

I really, really need to fix this, because I have found myself unconsciously raising my palms and placing them at a different angle. This may lead to RSI, which I have suffered from in the past.

This is a problem that Apple created on their own. Having an imperfect palm rejection algorithm is not an issue unless you irrationally enlarge the trackpad so much that it extends to the area where the palm of touch typists typically rests.

Video: Nobody uses the trackpad like this
Is it worth it to antagonize touch typists in order to be able to move the cursor from this tiny corner?

I would accept this tradeoff if the trackpad was Pencil-compatible and we could use it as some sort of handwriting tablet. That would actually be great!

Another very annoying side effect of it being so large is that, when your laptop is in your lap, sometimes your clothes accidentally brush the trackpad. The software then registers spurious movements or prevents some gestures from happening because it thinks there is a finger there.

In summary, it's too big for no reason, which makes it an annoyance with no benefit. This trackpad offers a bad user experience and, not only that, it also ruins the keyboard (read below).

I would have paid extra money to have a smaller trackpad on my macbook.

The keyboard

The 2015 keyboard was very good, and this one is better. The keyswitch mechanism is fantastic, the layout is perfect, and this is probably the best keyboard on a laptop.

Personally, I did not mind the Escape key shenanigans because I remapped it to dual Ctrl/Escape years ago, which I recommend you do too.
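If you want to try that remap yourself, the usual way is a Karabiner-Elements "complex modification" that makes Caps Lock act as Control when held and Escape when tapped. A minimal sketch of such a rule (not necessarily my exact configuration):

```json
{
  "title": "Dual-role Caps Lock",
  "rules": [
    {
      "description": "Caps Lock: Control when held, Escape when tapped",
      "manipulators": [
        {
          "type": "basic",
          "from": { "key_code": "caps_lock", "modifiers": { "optional": ["any"] } },
          "to": [ { "key_code": "left_control" } ],
          "to_if_alone": [ { "key_code": "escape" } ]
        }
      ]
    }
  ]
}
```

Drop a file like this into `~/.config/karabiner/assets/complex_modifications/` and enable the rule from the Karabiner-Elements UI.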

Touch ID is nice, even though I'm proficient at typing my password, so it was not such a big deal for me. Face ID would have been much more convenient, I envy Windows Hello users.

Unfortunately, the large trackpad torpedoes the typing experience. Writing on this Macbook Pro is worse than on my 2013 Air.

I will keep searching for a tool which disables trackpad input within X milliseconds of a key press, or disables some areas of the trackpad. I have not had any luck with either Karabiner or BetterTouchTool.

The Touchbar

After having read mostly negative feedback about it, I was determined to drill myself to like it, you know, just to be a bit contrarian.

"I will use tools to customize it so much that it will be awesome as a per-application custom function layer!"

Unfortunately, the critics are right. It's an anti-feature. I gave it an honest try, I swear. It is just bad, though it could have been better with a bit more effort.

I understand why it's there. Regular users probably find it useful and cute. It is, ironically, a feature present in pro laptops meant for non-pro users: slow typists and people who don't know the regular keyboard shortcuts.

That being said, I would not mind it, probably would even like it, if it weren't for three major drawbacks:

First and foremost, it is distracting to the point that the first thing I did was to search how to completely turn it off.

This is because, by default, it offers typing suggestions. Yes, while you are typing and trying to concentrate, there is something in your field of vision constantly flashing words that you didn't mean to type and derailing your train of thought.

Easy to fix, but it makes me wonder what Apple product managers were thinking.

Secondly, it is placed in such a way that resting your fingers on top of the keyboard triggers accidental key presses.

I can and will retrain my hand placement habits. After all, this touchbar-keyboard-trackpad combo is forcing many people to learn to place their hands in unnatural positions to accommodate these poorly designed peripherals.

However, Apple could have mitigated this by implementing a pressure sensor to make it more difficult to generate involuntary key presses. It would be enough to distinguish a brush from a tap.

Finally, and this is also ironic because it contradicts the previous point, due to the lack of feedback, sometimes you're not sure whether you successfully pressed a touchbar key. And, in my experience, there is an unjustifiably large number of times when you have to press twice, or press very deliberately, to activate the key you want.

There are some redeeming features, though.

As stated above, I am determined to make it bearable, and even slightly useful for me, by heavily modifying it. I suggest you go to System Preferences > Keyboard and use the "Expanded Control Strip".

Then, customize the touchbar buttons, remove keys you don't use, and add others. Consider paying for BetterTouchTool for even more customization options.

Then, in the same window, go to the Shortcuts tab and select Function Keys on the left. This allows you to use function keys by default in some apps, which is useful for Terminal and other pro apps like Pycharm.

(Get the third irony? To make the touchbar, a pro feature, useful for pro apps, the best setup is to make it behave like normal function keys)

Finally, if you're registering accidental key presses, just leave an empty space in the touchbar to let your fingers rest safely until you re-train your hands to rest somewhere else. This is ridiculous, but hey, better than having your brightness suddenly dim to zero accidentally.

Leave an empty space in the touchbar
Leave an empty space in the touchbar in the area where you usually rest your fingers.

I would have paid extra money to not have a touchbar on my macbook.

The ports

Another much-debated feature, and one where I resigned myself to accepting this new era of USB-C.

I did some research online and bought the "best" USB-C hub, along with new dongles. I don't mind dongles, because I was already using some with my Air. It's not like I swim in money, but there is no need to blow this out of proportion.

Well, I won't point fingers at any review site, but that "best" hub is going back to Amazon as I write these lines. Some of my peripherals disconnect randomly, plus I get an "electric arc" noise when I disconnect the hub cable. I don't know how that is even possible.

The USB-C situation is terrible. Newly bought peripherals still come with USB-A cables. Regarding hubs, it took me a few years to find a reliable USB3 hub for my 2013 Air. I will keep trying, wish me luck.

About Magsafe, even though I really liked it, I don't miss it as much as I expected. I do miss the charging light, though. No reason not to have it integrated in the official cable, like the XPS does.

Some people say that charging via USB-C is actually better due to standardization across devices, but I don't know what peripherals these people use. My iPhone and Airpods charge via Lightning, my Apple Watch charges via a puck, and other minor peripherals like cameras and external batteries all charge via micro-USB. Now I have to carry the same number of cables as before; I just swapped the Magsafe cable and charger for a USB-C cable and charger.

Another poorly thought-out decision is the headphone jack. It is on the wrong side. Most of the population is right-handed, so there is usually a notebook, mouse, or other stuff to the right of the laptop, and the headphone cable gets in the way. The port should have been on the left, and close to the user, not far away from them, to gain a few extra centimeters of cable.

By the way, not including the extension cord is unacceptable. This cord is not only a convenience, but it increases safety, because it's the only way to have earth grounding for the laptop. Without it, rubbing your fingers on the surface of the computer generates this weird vibration due to current. I have always recommended Mac users that they use their chargers with the extension cable even if they don't need the extra length.

I would have paid extra money to purchase an Apple-guaranteed proper USB-C hub. Alternatively, I would have paid extra money for this machine to have a couple of USB-A ports so I can keep using my trusty old hub.

I would not have paid extra money to have the extension cord, because it should have come included with this 2,200€ laptop. I am at a loss for words. Enough of paying extra money for things that Apple broke on purpose.

Battery life

8-9 hours with all apps closed except Safari. Browsing lightly, with an occasional video, and brightness at the literal minimum. This brightness level is only realistic if it's night time. In a normally lit environment you need to set the brightness level at around 50%.

It's not that great. My Air, when it was new, easily got 12 hours of light browsing. Of course, it was not running Catalina, but come on.

When I push the laptop a bit more, with a few Docker containers, Pycharm running, Google Chrome with some Docs opened, and brightness near the maximum, I get around 4 hours. In comparison, that figure is reasonable.

Overall, it's not bad, but I expected more.

While we wait for a Low Power Mode on the mac, do yourself a favor and install Turbo Boost Switcher Pro.

The screen

Coming from having never used a Retina screen on a computer, this Macbook Pro impressed me.

Since I don't edit photos or videos professionally, I can only appreciate it for its very crisp text. The rest of features are lost on me, but this does not devalue my opinion of the screen.

The 500-nit brightness is not noticeable in a side-by-side test with my 2013 Air. For some reason, both screens seem equally bright when used in direct daylight.

This new Retina technology comes with a few drawbacks, though.

First, it's impossible to get a terminal without anti-aliasing. My favorite font, IBM VGA8, is unreadable when anti-aliased, which is a real shame, because I've been using it since the 90s, and I prefer non-anti-aliased fonts on terminals.

Additionally, many pictures on websites appear blurry because they are not "retina-optimized". The same happens with some old applications which display crappy icons or improperly proportioned layouts. This is not Apple's fault, but it affects the user experience.

Finally, the bezels are not tiny like those in the XPS 13, but they are acceptable. I don't mind them.

To summarize, I really like this screen, but like everything else in this machine, it is not a net gain. You win some, you lose some.


Performance

This is the reason why I had to switch from my old laptop, and the 2020 MBP delivers.

It allows me to perform tasks that were very painful in my old computer. Everything is approximately three times faster than it was before, which really is a wow experience, like upgrading your computer in the 90s.

Not much to add. This is a modern computer and, as such, it is fast.

Build quality

Legendary, as usual.

To nitpick on a minor issue, I'd like Apple to make the palm rest area edges a bit less sharp. After typing for some time I get pressure marks on my wrists. They are not painful, but definitely discomforting.

Likewise, when typing on my lap, especially when wearing sports shorts in summer like I'm doing right now, the chassis also leaves marks on my legs near the hinge area. This could have been reduced by rounding the edges too.

One Thousand Papercuts

In terms of software, Apple also needs to get its stuff together.

Catalina is meh. Not terrible, but with just too many annoyances.

  • Mail keeps opening by itself while I'm doing video conferences and sharing my screen. I have to remind myself to close Mail before any video conference, because if I don't, other people will read my inbox. It's ridiculous that this bug has not been fixed yet. Do you remember when Apple mocked Microsoft because random alert windows would steal your focus while you were typing? This is 100x worse.
  • My profile picture appears squished on the login screen, and there is no way to fix it. The proportions are correctly displayed on the iCloud settings window.
  • Sometimes, after resuming from sleep, the laptop doesn't detect its own keyboard. I can assure you, the keyboard was there indeed, and note how the dock is still the default one. This happened to me minutes after setting up the computer for the first time, before I had any chance to install software or change any settings.
  • I get constant alerts to re-enter my password for some internet account, but my password is correct. Apple's services need to differentiate a timeout from a rejected password, or maybe retry a couple times before prompting.
  • Critical software I used doesn't run anymore and I have to look for alternatives. This includes Safari 13 breaking extensions that were important for me. Again, I was prepared for this, but it's worth mentioning.

Praise worthy

Here are a few things that Apple did really well and don't fit into any other category.

  • Photos has "solved" the photos problem. It is that great. As a person who has 50k photos in their library, going back to pictures of their great grandparents: thank you, Apple!
  • Continuity features have been adding up, and the experience is now outstanding. The same goes for iCloud. If you have an iPhone and a Mac, things are magical.
  • Fan and thermal configuration is very well crafted on this laptop. It runs totally silent, and when the fans kick in, the system cools down very quickly and goes back to silent again.
  • The speakers are crisp and they have very nice bass. They don't sound like a tin can like most laptops, including the 2013 Air, do.


This computer is bittersweet.

I'm happy that I can finally perform tasks which were severely limited on my previous laptop. But this has nothing to do with the design of the product, it is just due to the fact that the internals are more modern.

Maybe loving your work tools is a privilege that only computer nerds have. Do taxi drivers love their cars? Do baristas love their coffee machines? Do gardeners love their leaf blowers? Do surgeons love their scalpels?

Yes, I have always loved my computer. Why wouldn't I? We developers spend at least eight hours a day touching and looking at our silicon partners. We earn our daily bread thanks to them. This is why we chose our computers carefully with these considerations in mind, why we are so scrupulous when evaluating them.

This is why it's so disappointing that this essential tool comes with so many tradeoffs.

Even though this review was exhaustive, don't get me wrong: most annoyances are minor, except for the one deal-breaker, the typing experience. I have written this review on the laptop keyboard and it's been a continuous annoyance. Look, another irony. Apple struggled so much to fix their keyboard, yet it's still ruined by a comically large trackpad. The forest for the trees.

Point #4: For the first time since using Macs, I do not love this machine.

Going back to what "Pro" means

Apple engineers, do you know who is the target audience for these machines?

This laptop has been designed for casual users, not pro users. Regular users enjoy large trackpads and Touch Bars because they spend their day scrolling through Twitter and typing short sentences.

Do you know who doesn't, because it gets in the way of them typing their essays, source code, or inputting their Photoshop keyboard shortcuts? Pro users.

In 2016 I wrote:

However, in the last three to five years, everybody seemed to buy a Mac, even friends of mine who swore they would never do it. They finally caved in, not because of my advice, but because their non-nerd friends recommended MBPs. And that makes sense. In a 2011 market saturated by ultraportables, Windows 8, and laptops which break every couple of years, Macs were a great investment. You can even resell them after five years for 50% of their price, essentially renting them for half price.

So what happened? Right now, it is not only Pros using the Macbook Pro. It's not a professional tool anymore; it's a consumer product. Apple collects usage analytics for their machines and, I suppose, makes informed decisions, like removing less-used ports or not increasing iPhone storage for a long time.

What if Apple is being fed overwhelmingly non-Pro user data for their Pro machines and, as a consequence, their decisions don't serve Pro users anymore, but rather the general public?

The final irony: Apple uses "Pro" in their product marketing as a synonym for "the more expensive tier", and they are believing their own lies. Their success with consumer products is fogging their understanding of what a real Pro needs.

We don't need a touchbar that we have to disable for Pro apps.

We don't need a large trackpad that gets in the way of typing.

We need more diverse ports to connect peripherals that don't work well with adapters.

We need a better webcam to increase productivity and enhance communication with our team.

We need you to include the effin' extension cable so that there is no current on the chassis.

We need you to not splash our inbox contents in front of guests while sharing our screens.

We need a method to extend the battery as long as possible while we are on the road—hoping that comes back some day.

Point #5: Apple needs to continue course-correcting their design priorities for power users

Being optimistic for the future

I have made peace with the fact that, unlike my previous computer, this one will not last me for 7 years. This was a very important factor in my purchase decision. I know this mac is just bridging a gap between the best lineup in Apple's history (2015) and what will come in the future. It was bought out of necessity, not out of desire.

14" laptop? ARM CPUs? We will be awaiting new hardware eagerly, hoping that Apple keeps rolling back some anti-features like they did with the butterfly keyboard. Maybe the Touchbar and massive trackpad will be next. And surely the laggy and unresponsive OS will have been fixed by then.

What about the alternatives?

Before we conclude I want to anticipate a question that will be in some people's mind. Why didn't you buy another laptop?

Well, prior to my purchase I spent two months trying to use a Linux setup full-time. It was close, but not 100% successful. Critical software for my job had no real alternatives, or those were too inconvenient.

Regarding Windows, I had my eye on the XPS 13 and the X1 Carbon, which are extremely similar to this macbook in most regards. I spent some time checking whether Windows 10 had improved since the last time I used it, and it turns out it hasn't. I just hate Windows so much it is irrational. Surely some people prefer it and feel the same way about the Mac. To each their own.

Point #6: Despite its flaws, macOS is the OS that best balances convenience with productive work. When combined with an iPhone it makes for an unbeatable user experience.

I decided that purchasing this new Mac was the least undesirable option, and I still stand by that decision. I will actively try to fix the broken trackpad, which will increase my customer satisfaction from a 6 —tolerate— to an 8 or 9 —like, even enjoy—.

But that will still be far away from the perfect, loving 10/10 experience I had with the 2013 Air.

Tags: apple, hardware


May 31, 2020

Derek Jones (derek-jones)

Estimating in round numbers May 31, 2020 10:20 PM

People tend to use round numbers. When asked the time, the response is often rounded to the nearest 5-minute or 15-minute value, even when using a digital watch; the speaker is using what they consider to be a relevant level of accuracy.

When estimating how long it will take to perform a task, developers tend to use round numbers (based on three datasets). Giving what appears to be an overly precise value could be taken as communicating extra information, e.g., an estimate of 1-hr 3-minutes communicates a high degree of certainty (or incompetence, or making a joke). If the consumer of the estimate is working in round numbers, it makes sense to give a round number estimate.
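To make "round number" concrete, here is a small Python sketch (the granularities and sample values are my own illustrative choices, not taken from the datasets): classify each estimate in minutes by the largest conventional granularity it falls on.

```python
# Classify an estimate (in minutes) by the largest conventional
# granularity it falls on; the thresholds are illustrative choices.
def roundness(minutes):
    for base in (60, 30, 15, 5):
        if minutes % base == 0:
            return base
    return 1  # a "precise" value, e.g. 63 minutes

# A made-up sample of estimates; real data clusters on round values.
estimates = [30, 45, 63, 120, 90, 7, 480]
counts = {}
for e in estimates:
    g = roundness(e)
    counts[g] = counts.get(g, 0) + 1
```

Under this classification, values like 63 or 7 stand out as overly precise, which is exactly the signal (certainty, incompetence, or a joke) discussed above.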

Three large software related effort estimation datasets are now available: the SiP data contains estimates made by many people, the Renzo Pomodoro data is one person’s estimates, and now we have the Brightsquid data (via the paper “Utilizing product usage data for requirements evaluation” by Hemmati, Didar Al Alam and Carlson; I cannot find an online pdf at the moment).

The plot below shows the total number of tasks (out of the 1,945 tasks in the Brightsquid data) for which a given estimate value was recorded; peak values shown in red (code+data):

Number of tasks having a given estimate.

Why are there estimates for tasks taking less than 30 minutes? What are those 1-minute tasks (are they typos, where the second digit was omitted and the person involved simply created a new estimate without deleting the original)? How many of those estimate values appearing once are really typos, e.g., 39 instead of 30? Does the task logging system used require an estimate before anything can be done? Unfortunately I don’t have access to the people involved. It does look like this data needs some cleaning.

There are relatively few 7-hour estimates, but lots for 8-hours. I’m assuming the company works an 8-hour day (the peak at 4-hours, rather than three, adds weight to this assumption).

Ponylang (SeanTAllen)

Last Week in Pony - May 31, 2020 May 31, 2020 02:40 PM

A bunch of updates to ponylang-mode. The Pony Zulip now has a ‘jobs’ stream for posting Pony-related job opportunities. The ‘Add maybe to itertools’ RFC will be voted on in the next sync meeting.

Dan Luu (dl)

A simple way to get more value from tracing May 31, 2020 07:06 AM

A lot of people seem to think that distributed tracing isn't useful, or at least not without extreme effort that isn't worth it for companies smaller than FB. For example, here are a couple of public conversations that sound like a number of private conversations I've had. Sure, there's value somewhere, but it costs too much to unlock.

I think this overestimates how much work it is to get a lot of value from tracing. At Twitter, Rebecca Isaacs was able to lay out a vision for how to get value from tracing and executed on it (with help from a number of other folks, including Jonathan Simms, Yuri Vishnevsky, Ruben Oanta, Dave Rusek, Hamdi Allam, and many others) such that the work easily paid for itself. This post is going to describe the tracing "infrastructure" we've built and describe some use cases where we've found it to be valuable. Before we get to that, let's start with some background about the situation before Rebecca's vision came to fruition.

At a high level, we could say that we had a trace-view oriented system and ran into all of the issues that one might expect from that. Those issues are discussed in more detail in this article by Cindy Sridharan. However, I'd like to discuss the particular issues we had in more detail since I think it's useful to look at what specific things were causing problems.

Taken together, the issues were problematic enough that tracing was underowned and arguably unowned for years. Some individuals did work in their spare time to keep the lights on or improve things, but the lack of obvious value from tracing led to a vicious cycle where the high barrier to getting value out of tracing made it hard to fund organizationally, which made it hard to make tracing more usable.

Some of the issues that made tracing low ROI included:

  • Schema made it impossible to run simple queries "in place"
  • No real way to aggregate info
    • No way to find interesting or representative traces
  • Impossible to know actual sampling rate, sampling highly non-representative
  • Time


Schema

The schema was effectively a set of traces, where each trace was a set of spans and each span was a set of annotations. Each span that wasn't a root span had a pointer to its parent, so that the graph structure of a trace could be determined.

For the purposes of this post, we can think of each trace as either an external request including all sub-RPCs or a subset of a request, rooted downstream instead of at the top of the request. We also trace some things that aren't requests, like builds and git operations, but for simplicity we're going to ignore those for this post even though the techniques we'll discuss also apply to those.

Each span corresponds to an RPC and each annotation is data that a developer chose to record on a span (e.g., the size of the RPC payload, queue depth of various queues in the system at the time of the span, or GC pause time for GC pauses that interrupted the RPC).
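To make the cost concrete, here is a toy model of that schema (the names are illustrative, not the actual code): because each non-root span carries only a parent pointer, recovering the call graph requires touching every span in the trace.

```python
from collections import defaultdict

# Toy model of the schema described above (illustrative names only):
# a trace is a bag of spans, each non-root span holds only a parent
# pointer, so the tree structure must be rebuilt span by span.
class Span:
    def __init__(self, span_id, parent_id=None, annotations=None):
        self.span_id = span_id
        self.parent_id = parent_id            # None means root span
        self.annotations = annotations or {}  # e.g. {"payload_bytes": 512}

def build_tree(spans):
    """Reconstruct the call graph; this has to read every span."""
    children = defaultdict(list)
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)
    return root, children

# A three-span trace: one root RPC with two child RPCs.
trace = [Span("a"), Span("b", "a"), Span("c", "a")]
root, children = build_tree(trace)
```

Any graph-structure query pays this full reconstruction cost per trace, which is why ad hoc queries were impractical.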

Some issues that came out of having a schema that was a set of sets (of bags) included:

  • Executing any query that used information about the graph structure inherent in a trace required reading every span in the trace and reconstructing the graph
  • Because there was no index or summary information of per-trace information, any query on a trace required reading every span in a trace
  • Practically speaking, because the two items above are too expensive to do at query time in an ad hoc fashion, the only query people ran was some variant of "give me a few spans matching a simple filter"


Until about a year and a half ago, the only supported way to look at traces was to go to the UI, filter by a service name from a combination search box + dropdown, and then look at a list of recent traces, where you could click on any trace to get a "trace view". Each search returned the N most recent results, which wouldn't necessarily be representative of all recent results (for reasons mentioned below in the Sampling section), let alone representative of all results over any other time span.

Per the problems discussed above in the schema section, since it was too expensive to run queries across a non-trivial number of traces, it was impossible to ask questions like "are any of the traces I'm looking at representative of common traces or am I looking at weird edge cases?" or "show me traces of specific tail events, e.g., when a request from service A to service B times out or when write amplification from service A to some backing database is > 3x", or even "only show me complete traces, i.e., traces where we haven't dropped spans from the trace".

Also, if you clicked on a trace that was "too large", the query would time out and you wouldn't be able to view the trace -- this was another common side effect of the lack of any kind of rate limiting logic plus the schema.


Sampling

There were multiple places where a decision was made to sample or not. There was no document that listed all of these places, making it impossible to even guess at the sampling rate without auditing all code to figure out where sampling decisions were being made.

Moreover, there were multiple places where an unintentional sampling decision would be made due to the implementation. Spans were sent from services that had tracing enabled to a local agent, then to a "collector" service, and then from the collector service to our backing DB. Spans could be dropped at any of these points: in the local agent; in the collector, which would have nodes fall over and lose all of their data regularly; and at the backing DB, which would reject writes due to hot keys or high load in general.

This design where the trace id is the database key, with no intervening logic to pace out writes, meant that a 1M span trace (which we have) would cause 1M writes to the same key over a period of a few seconds. Another problem would be requests with a fanout of thousands (which exists at every tech company I've worked for), which could cause thousands of writes with the same key over a period of a few milliseconds.

Another sampling quirk was that, in order to avoid missing traces that didn't start at our internal front end, there was logic that caused an independent sampling decision in every RPC. If you do the math on this, if you have a service-oriented architecture like ours and you sample at what might naively sound like a moderately low rate, you'll end up with the vast majority of your spans starting at a leaf RPC, resulting in single-span traces. Of the non-leaf RPCs, the vast majority will start at the second level from the leaf, and so on. The vast majority of our load and our storage costs were from these virtually useless traces that started at or near a leaf, and if you wanted to do any kind of analysis across spans to understand the behavior of the entire system, you'd have to account for this sampling bias on top of accounting for all of the other independent sampling decisions.


Time

There wasn't really any kind of adjustment for clock skew (there was something, but it attempted to do a local pairwise adjustment, which didn't really improve things and actually made it more difficult to reasonably account for clock skew).

If you just naively computed how long a span took, even using timestamps from a single host, which removes many sources of possible clock skew, you'd get a lot of negative duration spans, which is of course impossible because a result can't get returned before the request for the result is created. And if you compared times across different hosts, the results were even worse.
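A sanity check along these lines is easy to sketch (the field names and timestamps below are made up for illustration): a naively computed duration that comes out negative cannot be a real measurement, so negative values flag skew artifacts.

```python
# Illustrative skew check; field names and values are made up.
def span_duration_ms(send_ms, recv_ms):
    return recv_ms - send_ms

def skew_suspects(spans):
    """Spans whose naive duration is negative: clock-skew artifacts."""
    return [s for s in spans if span_duration_ms(s["send"], s["recv"]) < 0]

spans = [
    {"id": 1, "send": 100, "recv": 180},  # plausible 80ms span
    {"id": 2, "send": 500, "recv": 460},  # "negative duration": skew
]
```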


The solutions to these problems fall into what I think of as two buckets. For problems like dropped spans due to collector nodes falling over or the backing DB dropping requests, there's some straightforward engineering solution using well understood and widely used techniques. For that particular pair of problems, the short term bandaid was to do some GC tuning that reduced the rate of collector nodes falling over by about a factor of 100. That took all of two minutes, and then we replaced the collector nodes with a real queue that could absorb larger bursts in traffic and pace out writes to the DB. For the issue where we oversampled leaf-level spans due to rolling the sampling dice on every RPC, that's one of these little questions that most people would get right in an interview that can sometimes get lost as part of a larger system that has a number of solutions, e.g., since each span has a parent pointer, we must be able to know if an RPC has a parent or not in a relevant place and we can make a sampling decision and create a traceid iff a span has no parent pointer, which results in a uniform probability of each span being sampled, with each sampled trace being a complete trace.
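The root-only sampling fix can be sketched in a few lines (the names here are mine, not the production code): roll the dice only when an RPC has no parent, and let every downstream span inherit the decision via the propagated trace id, so every sampled trace is complete.

```python
import random

# Sketch of root-only sampling (illustrative, not the production code).
def new_trace_id():
    return random.getrandbits(64)

def sampling_decision(parent_trace_id, sample_rate, rng=random.random):
    """Return a trace id if this RPC should be traced, else None."""
    if parent_trace_id is not None:
        return parent_trace_id        # child RPC: inherit root's decision
    if rng() < sample_rate:
        return new_trace_id()         # root RPC: sampled, start a trace
    return None                       # root RPC: not sampled
```

Because the only random decision happens at the root, each trace is sampled with uniform probability and no partial traces are produced.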

The other bucket is building up datasets and tools (and adding annotations) that allow users to answer questions they might have. This isn't a new idea, section 5 of the Dapper paper discussed this and it was published in 2010.

Of course, one major difference is that Google has probably put at least two orders of magnitude more effort into building tools on top of Dapper than we've put into building tools on top of our tracing infra, so a lot of our tooling is much rougher, e.g., figure 6 from the Dapper paper shows a trace view that displays a set of relevant histograms, which makes it easy to understand the context of a trace. We haven't done the UI work for that yet, so the analogous view requires running a simple SQL query. While that's not hard, presenting the user with the data would be a better user experience than making the user query for the data.

Of the work that's been done, the simplest obviously high ROI thing we've done is build a set of tables that contain information people might want to query, structured such that common queries that don't inherently have to do a lot of work don't have to do a lot of work.

We have, partitioned by day, the following tables:

  • trace_index
    • high-level trace-level information, e.g., does the trace have a root; what is the root; if relevant, what request endpoint was hit, etc.
  • span_index
    • information on the client and server
  • anno_index
    • "standard" annotations that people often want to query, e.g., request and response payload sizes, client/server send/recv timestamps, etc.
  • span_metrics
    • computed metrics, e.g., span durations
  • flat_annotation
    • All annotations, in case you want to query something not in anno_index
  • trace_graph
    • For each trace, contains a graph representation of the trace, for use with queries that need the graph structure

Just having this set of tables, queryable with SQL queries (or a Scalding or Spark job in cases where Presto SQL isn't ideal, like when doing some graph queries) is enough for tracing to pay for itself, to go from being difficult to justify to being something that's obviously high value.

Some of the questions we've been able to answer with this set of tables include:

  • For this service that's having problems, give me a representative set of traces
  • For this service that has elevated load, show me which upstream service is causing the load
  • Give me the list of all services that have unusual write amplification to downstream service X
    • Is traffic from a particular service or for a particular endpoint causing unusual write amplification? For example, in some cases, we see nothing unusual about the total write amplification from B -> C, but we see very high amplification from B -> C when B is called by A.
  • Show me how much time we spend on serdes vs. "actual work" for various requests
  • Show me how much different kinds of requests cost in terms of backend work
  • For requests that have high latency, as determined by mobile client instrumentation, show me what happened on the backend
  • Show me the set of latency critical paths for this request endpoint (with the annotations we currently have, this has a number of issues that probably deserve their own post)
  • Show me the CDF of services that this service depends on
    • This is a distribution because whether or not a particular service calls another service is data dependent; it's not uncommon to have a service that will only call another one every 1000 calls (on average)
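To make one of these concrete, here's a toy sketch of the "write amplification from B -> C depends on the root caller" query, using sqlite and an invented miniature schema in place of the real Presto tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table trace_index (trace_id text, root text);
create table trace_graph (trace_id text, caller text, callee text, calls int);

-- Two traces rooted at A, one rooted at Z; B -> C amplification is only
-- high when the trace is rooted at A.
insert into trace_index values ('t1', 'A'), ('t2', 'A'), ('t3', 'Z');
insert into trace_graph values
  ('t1', 'A', 'B', 1), ('t1', 'B', 'C', 50),
  ('t2', 'A', 'B', 1), ('t2', 'B', 'C', 40),
  ('t3', 'Z', 'B', 1), ('t3', 'B', 'C', 1);
""")

# Average B -> C call count per trace, grouped by the trace's root:
rows = con.execute("""
  select ti.root, avg(tg.calls)
  from trace_graph tg
  join trace_index ti on ti.trace_id = tg.trace_id
  where tg.caller = 'B' and tg.callee = 'C'
  group by ti.root
  order by ti.root
""").fetchall()
print(rows)  # high amplification under root A, ~1x under root Z
```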

We have built and are building other tooling, but just being able to run queries and aggregations against trace data, both recent and historical, easily pays for all of the other work we'd like to do. This is analogous to what we saw when we looked at metrics data: taking data we already had and exposing it in a way that lets people run arbitrary queries immediately paid dividends. Doing that for tracing is less straightforward than doing that for metrics because the data is richer, but it's not a fundamentally different idea.

I think that having something to look at other than the raw data is also more important for tracing than it is for metrics since the metrics equivalent of a raw "trace view" of traces, a "dashboard view" of metrics where you just look at graphs, is obviously and intuitively useful. If that's all you have for metrics, people aren't going to say that it's not worth funding your metrics infra because dashboards are really useful! However, it's a lot harder to see how to get value out of a raw view of traces, which is where a lot of the comments about tracing not being valuable come from. This difference between the complexity of metrics data and tracing data makes the value add for higher-level views of tracing larger than it is for metrics.

Having our data in a format that's not just blobs in a NoSQL DB has also allowed us to more easily build tooling on top of trace data that lets users who don't want to run SQL queries get value out of our trace data. An example of this is the Service Dependency Explorer (SDE), which was primarily built by Yuri Vishnevsky, Rebecca Isaacs, and Jonathan Simms, with help from Yihong Chen. If we try to look at the RPC call graph for a single request, we get something that's pretty large. In some cases, the depth of the call tree can be hundreds of levels deep and it's also not uncommon to see a fanout of 20 or more at some levels, which makes a naive visualization difficult to interpret.

In order to see how SDE works, let's look at a smaller example where it's relatively easy to understand what's going on. Imagine we have 8 services, A through H, that call each other as shown in the tree below: service A is called 10 times and calls service B a total of 10 times; B calls D, D, and E 50, 20, and 10 times respectively, where the two Ds are distinguished by being different RPC endpoints (calls) even though they're the same service; and so on:

Diagram of RPC call graph; this will be implicitly described in the relevant sections, although the entire SDE section is showing off a visual tool and will probably be unsatisfying if you're just reading the alt text; the tables described in the previous section are more likely to be what you want if you want a non-visual interpretation of the data, since the SDE is a kind of visualization

If we look at SDE from the standpoint of node E, we'll see the following: SDE centered on service E, showing callers and callees, direct and indirect

We can see the direct callers and callees: 100% of calls of E are from C, 100% of calls of E also call C, and we have 20x load amplification when calling C (200/10 = 20), the same as we see if we look at the RPC tree above. If we look at indirect callees, we can see that D has a 4x load amplification (40 / 10 = 4).

If we want to see what's directly called by C downstream of E, we can select it and we'll get arrows to the direct descendants of C, which in this case is every indirect callee of E.

SDE centered on service E, with callee C highlighted

For a more complicated example, we can look at service D, which shows up in orange in our original tree, above.

In this case, our summary box reads:

  • On May 28, 2020 there were...
    • 10 total TFE-rooted traces
    • 110 total traced RPCs to D
    • 2.1 thousand total traced RPCs caused by D
    • 3 unique call paths from TFE endpoints to D endpoints

The fact that we see D three times in the tree is indicated in the summary box, where it says we have 3 unique call paths from our front end, TFE to D.

We can expand out the calls to D and, in this case, see both of the calls and what fraction of traffic is to each call.

SDE centered on service D, with different calls to D expanded by having clicked on D

If we click on one of the calls, we can see which nodes are upstream and downstream dependencies of a particular call; call4 is shown below, and we can see that it never hits services C, H, and G downstream even though service D does for call3. Similarly, we can see that its upstream dependencies consist of being called directly by C, and indirectly by B and E but not A and C:

SDE centered on service D, with call4 of D highlighted by clicking on call 4; shows only upstream and downstream load that are relevant to call4

Some things we can easily see from SDE are:

  • What load a service or RPC call causes
    • Where we have unusual load amplification, whether that's generally true for a service or if it only occurs on some call paths
  • What causes load to a service or RPC call
  • Where and why we get cycles (very common for Strato, among other things)
  • What's causing weird super deep traces

These are all things a user could get out of queries to the data we store, but having a tool with a UI that lets you click around in real time to explore things lowers the barrier to finding these things out.

In the example shown above, there are a small number of services, so you could get similar information out of the more commonly used sea of nodes view, where each node is a service, with some annotations on the visualization, but when we've looked at real traces, showing thousands of services in a global view makes it very difficult to see what's going on. Some of Rebecca's early analyses used a view like that, but we've found that you need to have a lot of implicit knowledge to make good use of it; a view that discards a lot more information and highlights a few things makes it easier for users who don't happen to have the right implicit knowledge to get value out of looking at traces.

Although we've demo'd a view of RPC count / load here, we could also display other things, like latency, errors, payload sizes, etc.


More generally, this is just a brief description of a few of the things we've built on top of the data you get if you have basic distributed tracing set up. You probably don't want to do exactly what we've done since you probably have somewhat different problems and you're very unlikely to encounter the exact set of problems that our tracing infra had. From backchannel chatter with folks at other companies, I don't think the level of problems we had was unique; if anything, our tracing infra was in a better state than at many or most peer companies (which excludes behemoths like FB/Google/Amazon) since it basically worked and people could and did use the trace view we had to debug real production issues. But, as they say, unhappy systems are unhappy in their own way.

Like our previous look at metrics analytics, this work was done incrementally. Since trace data is much richer than metrics data, a lot more time was spent doing ad hoc analyses of the data before writing the Scalding (MapReduce) jobs that produce the tables mentioned in this post, but the individual analyses were valuable enough that there wasn't really a time when this set of projects didn't pay for itself after the first few weeks it took to clean up some of the worst data quality issues and run an (extremely painful) ad hoc analysis with the existing infra.

Looking back at discussions on whether or not it makes sense to work on tracing infra, people often point to the numerous failures at various companies to justify a buy (instead of build) decision. I don't think that's exactly unreasonable; the base rate of failure of similar projects shouldn't be ignored. But, on the other hand, most of the work described wasn't super tricky, beyond getting organizational buy-in and having a clear picture of the value that tracing can bring.

One thing that's a bit beyond the scope of this post that probably deserves its own post is that tracing and metrics, while not fully orthogonal, are complementary, and having only one or the other leaves you blind to a lot of problems. You're going to pay a high cost for that in a variety of ways: unnecessary incidents, extra time spent debugging incidents, generally higher monetary costs due to running infra inefficiently, etc. Also, while either metrics or tracing alone gives you much better visibility than having neither, some problems require looking at both together; some of the most interesting analyses I've done involve joining (often with a literal SQL join) trace data and metrics data.

To make it concrete, an example of something that's easy to see with tracing but annoying to see with logging unless you add logging to try to find this in particular (which you can do for any individual case, but probably don't want to do for the thousands of things tracing makes visible) is something we looked at above: show me cases where a specific call path from the load balancer to A causes high load amplification on some service B, which may be multiple hops away from A in the call graph. In some cases, this will be apparent because A generally causes high load amplification on B, but if it only happens in some cases, that's still easy to handle with tracing but very annoying if you're just looking at metrics.

An example of something where you want to join tracing and metrics data is when looking at the performance impact of something like a bad host on latency. You will, in general, not be able to annotate the appropriate spans that pass through the host as bad because, if you knew the host was bad at the time of the span, the host wouldn't be in production. But you can sometimes find, with historical data, a set of hosts that are bad, and then look up latency critical paths that pass through the host to determine the end-to-end impact of the bad host.

Everyone has their own biases; with respect to tracing, mine come from generally working on things that try to directly improve cost, reliability, and latency, so the examples are focused on that, but there are also a lot of other uses for tracing. You can check out Distributed Tracing in Practice or Mastering Distributed Tracing for some other perspectives.


Thanks to Rebecca Isaacs, Leah Hanson, Yao Yue, and Yuri Vishnevsky for comments/corrections/discussion.

  1. this will almost certainly be an incomplete list, but some other people who've pitched in include Moses, Tiina, Rich, Rahul, Ben, Mike, Mary, Arash, Feng, Jenny, Andy, Yao, Yihong, Vinu, and myself.

    Note that this relatively long list of contributors doesn't contradict this work being high ROI. I'd estimate that there's been less than 2 person-years worth of work on everything discussed in this post. Just for example, while I spend a fair amount of time doing analyses that use the tracing infra, I think I've only spent on the order of one week on the infra itself.

    In case it's not obvious from the above, even though I'm writing this up, I was a pretty minor contributor to this. I'm just writing it up because I sat next to Rebecca as this work was being done and was super impressed by both her process and the outcome.


May 30, 2020

Dan Luu (dl)

A simple way to get more value from metrics May 30, 2020 07:06 AM

We spent one day1 building a system that immediately found a mid 7 figure optimization (which ended up shipping). In the first year, we shipped mid 8 figures per year worth of cost savings as a result. The key feature this system introduces is the ability to query metrics data across all hosts and all services and over any period of time (since inception), so we've called it LongTermMetrics (LTM) internally since I like boring, descriptive, names.

This got started when I was looking for a starter project that would both help me understand the Twitter infra stack and also have some easily quantifiable value. Andy Wilcox suggested looking at JVM survivor space utilization for some large services. If you're not familiar with what survivor space is, you can think of it as a configurable, fixed-size buffer in the JVM (at least if you use the GC algorithm that's default at Twitter). At the time, if you looked at a random large service, you'd usually find that either:

  1. The buffer was too small, resulting in poor performance, sometimes catastrophically poor when under high load.
  2. The buffer was too large, resulting in wasted memory, i.e., wasted money.

But instead of looking at random services, there's no fundamental reason that we shouldn't be able to query all services and get a list of which services have room for improvement in their configuration, sorted by performance degradation or cost savings. And if we write that query for JVM survivor space, this also goes for other configuration parameters (e.g., other JVM parameters, CPU quota, memory quota, etc.). Writing a query that worked for all the services turned out to be a little more difficult than I was hoping due to a combination of data consistency and performance issues. Data consistency issues included things like:

  • Any given metric can have ~100 names, e.g., I found 94 different names for JVM survivor space
    • I suspect there are more, these were just the ones I could find via a simple search
  • The same metric name might have a different meaning for different services
    • Could be a counter or a gauge
    • Could have different units, e.g., bytes vs. MB or microseconds vs. milliseconds
  • Metrics are sometimes tagged with an incorrect service name
  • Zombie shards can continue to operate and report metrics even though the cluster manager has started up a new instance of the shard, resulting in duplicate and inconsistent metrics for a particular shard name
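Cleaning up issues like these is mostly table-driven normalization. Here's a minimal sketch; the name table, units, and function names are all invented for illustration:

```python
# Hypothetical mapping from the many observed names for one logical metric
# to a canonical name plus a scale factor into canonical units (here, bytes).
CANONICAL = {
    "jvm_survivor_used":       ("jvmSurvivorUsed", 1),            # already bytes
    "jvm/gc/survivor/used_mb": ("jvmSurvivorUsed", 1024 * 1024),  # MB -> bytes
    "survivorSpaceUsed":       ("jvmSurvivorUsed", 1),
}

def normalize(name, value):
    """Map a raw metric sample to (canonical_name, value_in_canonical_units).

    Returns None for metrics we don't recognize; in a real job those would
    be dropped or written to a bad-entries table rather than silently mixed
    in with metrics that have known semantics.
    """
    entry = CANONICAL.get(name)
    if entry is None:
        return None
    canonical_name, scale = entry
    return canonical_name, value * scale

print(normalize("jvm/gc/survivor/used_mb", 2))  # ('jvmSurvivorUsed', 2097152)
```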

Our metrics database, MetricsDB, was specialized to handle monitoring, dashboards, alerts, etc. and didn't support general queries. That's totally reasonable, since monitoring and dashboards are lower on Maslow's hierarchy of observability needs than general metrics analytics. In backchannel discussions from folks at other companies, the entire set of systems around MetricsDB seems to have solved a lot of the problems that plague people at other companies with similar scale, but the specialization meant that we couldn't run arbitrary SQL queries against metrics in MetricsDB.

Another way to query the data is to use the copy that gets written to HDFS in Parquet format, which allows people to run arbitrary SQL queries (as well as write Scalding (MapReduce) jobs that consume the data).

Unfortunately, due to the number of metric names, the data on HDFS can't be stored in a columnar format with one column per name -- Presto gets unhappy if you feed it too many columns and we have enough different metrics that we're well beyond that limit. If you don't use a columnar format (and don't apply any other tricks), you end up reading a lot of data for any non-trivial query. The result was that you couldn't run any non-trivial query (or even many trivial queries) across all services or all hosts without having it time out. We don't have similar timeouts for Scalding, but Scalding performance is much worse and a simple Scalding query against a day's worth of metrics will usually take between three and twenty hours, depending on cluster load, making it unreasonable to use Scalding for any kind of exploratory data analysis.

Given the data infrastructure that already existed, an easy way to solve both of these problems was to write a Scalding job to store the 0.1% to 0.01% of metrics data that we care about for performance or capacity related queries and re-write it into a columnar format. I would guess that at least 90% of metrics are things that almost no one will want to look at in almost any circumstance, and of the metrics anyone really cares about, the vast majority aren't performance related. A happy side effect of this is that since such a small fraction of the data is relevant, it's cheap to store it indefinitely. The standard metrics data dump is deleted after a few weeks because it's large enough that it would be prohibitively expensive to store it indefinitely; a longer metrics memory will be useful for capacity planning or other analyses that prefer to have historical data.

The data we're saving includes (but isn't limited to) the following things for each shard of each service:

  • utilizations and sizes of various buffers
  • CPU, memory, and other utilization
  • number of threads, context switches, core migrations
  • various queue depths and network stats
  • JVM version, feature flags, etc.
  • GC stats
  • Finagle metrics

And for each host:

  • various things from procfs, like iowait time, idle, etc.
  • what cluster the machine is a part of
  • host-level info like NIC speed, number of cores on the host, memory,
  • host-level stats for "health" issues like thermal throttling, machine checks, etc.
  • OS version, host-level software versions, host-level feature flags, etc.
  • Rezolus metrics

For things that we know change very infrequently (like host NIC speed), we store these daily, but most of these are stored at the same frequency and granularity as our other metrics. In some cases, this is obviously wasteful (e.g., for JVM tenuring threshold, which is typically identical across every shard of a service and rarely changes), but this was the easiest way to handle this given the infra we have around metrics.

Although the impetus for this project was figuring out which services were under or over configured for JVM survivor space, it started with GC and container metrics since those were very obvious things to look at and we've been incrementally adding other metrics since then. To get an idea of the kinds of things we can query for and how simple queries are if you know a bit of SQL, here are some examples:

Very High p90 JVM Survivor Space

This is part of the original goal of finding under/over-provisioned services. Any service with a very high p90 JVM survivor space utilization is probably under-provisioned on survivor space. Similarly, anything with a very low p99 or p999 JVM survivor space utilization when under peak load is probably overprovisioned (query not displayed here, but we can scope the query to times of high load).

A Presto query for very high p90 survivor space across all services is:

with results as (
  select servicename,
    approx_distinct(source, 0.1) as approx_sources, -- number of shards for the service
    -- real query uses coalesce and nullif to handle edge cases, omitted for brevity
    approx_percentile(jvmSurvivorUsed / jvmSurvivorMax, 0.90) as p90_used,
    approx_percentile(jvmSurvivorUsed / jvmSurvivorMax, 0.50) as p50_used
  from ltm_service 
  where ds >= '2020-02-01' and ds <= '2020-02-28'
  group by servicename)
select * from results
where approx_sources > 100
order by p90_used desc

Rather than having to look through a bunch of dashboards, we can just get a list and then send diffs with config changes to the appropriate teams or write a script that takes the output of the query and automatically writes the diff. The above query provides a pattern for any basic utilization numbers or rates; you could look at memory usage, new or old gen GC frequency, etc., with similar queries. In one case, we found a service that was wasting enough RAM to pay my salary for a decade.
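As a sketch of the "script that takes the output of the query and automatically writes the diff" idea (the service names, threshold, and output format below are all invented):

```python
# Rows as they'd come back from the Presto query: (servicename, p90_used).
ROWS = [
    ("serviceA", 0.97),
    ("serviceB", 0.50),
    ("serviceC", 0.91),
]

def suggest_changes(rows, threshold=0.9, bump_factor=2):
    """Flag services whose p90 survivor utilization is running too hot and
    propose a config bump; a real script would emit an actual config diff."""
    suggestions = []
    for service, p90_used in rows:
        if p90_used > threshold:
            suggestions.append(
                f"{service}: p90 survivor utilization {p90_used:.0%}; "
                f"suggest {bump_factor}x survivor space"
            )
    return suggestions

for line in suggest_changes(ROWS):
    print(line)
```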

I've been moving away from using thresholds against simple percentiles to find issues, but I'm presenting this query because this is a thing people commonly want to do that's useful, and I can write it without having to spend a lot of space explaining why it's a reasonable thing to do; what I prefer to do instead is out of scope of this post and probably deserves its own post.

Network utilization

The above query was over all services, but we can also query across hosts. In addition, we can do queries that join against properties of the host, feature flags, etc.

Using one set of queries, we were able to determine that we had a significant number of services running up against network limits even though host-level network utilization was low. The compute platform team then did a gradual rollout of a change to network caps, which we monitored with queries like the one below to determine that we weren't seeing any performance degradation (theoretically possible if increasing network caps caused hosts or switches to hit network limits).

With the network change, we were able to observe smaller queue depths, smaller queue sizes (in bytes), fewer packet drops, etc.

The query below only shows queue depths for brevity; adding all of the quantities mentioned is just a matter of typing more names in.

The general thing we can do is, for any particular rollout of a platform or service-level feature, we can see the impact on real services.

with rolled as (
  select ipAddress,
    -- rollout was fixed for all hosts during the time period, can pick an arbitrary element from the time period
    arbitrary(element_at(misc, 'egress_rate_limit_increase')) as rollout
  from ltm_deploys
  where ds = '2019-10-10'
  and zone = 'foo'
  group by ipAddress
), host_info as (
  select ipAddress,
    arbitrary(hostId) as hostId,
    arbitrary(nicSpeed) as nicSpeed
  from ltm_host
  where ds = '2019-10-10'
  and zone = 'foo'
  group by ipAddress
), host_rolled as (
  select hostId, nicSpeed, rollout
  from rolled
  join host_info on rolled.ipAddress = host_info.ipAddress
), container_metrics as (
  select service, hostId, netTxQlen
  from ltm_container
  where ds >= '2019-10-10' and ds <= '2019-10-14'
  and zone = 'foo'
)
select service, nicSpeed, rollout,
  approx_percentile(netTxQlen, 1, 0.999, 0.0001) as p999_qlen,
  approx_percentile(netTxQlen, 1, 0.99, 0.001) as p99_qlen,
  approx_percentile(netTxQlen, 0.9) as p90_qlen,
  approx_percentile(netTxQlen, 0.68) as p68_qlen,
  count(*) as cnt
from container_metrics
join host_rolled on host_rolled.hostId = container_metrics.hostId
group by service, nicSpeed, rollout

Other questions that became easy to answer

  • What's the latency, CPU usage, CPI, or other performance impact of X?
    • Increasing or decreasing the number of performance counters we monitor per container
    • Tweaking kernel parameters
    • OS or other releases
    • Increasing or decreasing host-level oversubscription
    • General host-level load
    • Retry budget exhaustion
  • For relevant items above, what's the distribution of X, in general or under certain circumstances?
  • What hosts have unusually poor service-level performance for every service on the host, after controlling for load, etc.?
    • This has usually turned out to be due to a hardware misconfiguration or fault
  • Which services don't play nicely with other services aside from the general impact on host-level load?
  • What's the latency impact of failover, or other high-load events?
    • What level of load should we expect in the future given a future high-load event plus current growth?
    • Which services see more load during failover, which services see unchanged load, and which fall somewhere in between?
  • What config changes can we make for any fixed sized buffer or allocation that will improve performance without increasing cost or reduce cost without degrading performance?
  • For some particular host-level health problem, what's the probability it recurs if we see it N times?
  • etc., there are a lot of questions that become easy to answer if you can write arbitrary queries against historical metrics data

Design decisions

LTM is about as boring a system as is possible. Every design decision falls out of taking the path of least resistance.

  • Why use Scalding?
    • It's standard at Twitter and the integration made everything trivial. I tried Spark, which has some advantages. However, at the time, I would have had to do manual integration work that I got for free with Scalding.
  • Why use Presto and not something that allows for live slice & dice queries like Druid?
    • Rebecca Isaacs and Jonathan Simms were doing related work on tracing and we knew that we'd want to do joins between LTM and whatever they created. That's trivial with Presto but would have required more planning and work with something like Druid, at least at the time.
    • George Sirois imported a subset of the data into Druid so we could play with it and the facilities it offers are very nice; it's probably worth re-visiting at some point
  • Why not use Postgres or something similar?
    • The amount of data we want to store makes this infeasible without a massive amount of effort; even though the cost of data storage is quite low, it's still a "big data" problem
  • Why Parquet instead of a more efficient format?
    • It was the most suitable of the standard supported formats (the other major supported format is raw thrift); introducing a new format would be a much larger project than this project
  • Why is the system not real-time (with delays of at least one hour)?
    • Twitter's batch job pipeline is easy to build on, all that was necessary was to read some tutorial on how it works and then write something similar, but with different business logic.
    • There was a nicely written proposal to build a real-time analytics pipeline for metrics data written a couple years before I joined Twitter, but that never got built because (I estimate) it would have been one to four quarters of work to produce an MVP and it wasn't clear what team had the right mandate to work on it and also had four quarters of headcount available. But adding a batch job took one day; you don't need to have roadmap and planning meetings for a day of work, you can just do it and then do follow-on work incrementally.
    • If we're looking for misconfigurations or optimization opportunities, these rarely go away within an hour (and if they did, they must've had small total impact); in fact, they often persist for months to years, so we don't lose much by giving up on real-time (we do lose the ability to use the output of this for some monitoring use cases)
    • The real-time version would've been a system with significant operational cost that couldn't be operated by one person without undue burden. This system has more operational/maintenance burden than I'd like, probably 1-2 days of my time per month on average, which at this point makes that a pretty large fraction of the total cost of the system, but it never pages, and the amount of work can easily be handled by one person.

Boring technology

I think writing about systems like this, that are just boring work is really underrated. A disproportionate number of posts and talks I read are about systems using hot technologies. I don't have anything against hot new technologies, but a lot of useful work comes from plugging boring technologies together and doing the obvious thing. Since posts and talks about boring work are relatively rare, I think writing up something like this is more useful than it has any right to be.

For example, a couple years ago, at a local meetup that Matt Singer organizes for companies in our size class to discuss infrastructure (basically, companies that are smaller than FB/Amazon/Google), I asked if anyone was doing something similar to what we'd just done. No one there was (or no one who'd admit to it, anyway), and engineers from two different companies expressed shock that we could store so much data, and not just the average per time period, but some histogram information as well. This work is too straightforward and obvious to be novel; I'm sure people have built analogous systems in many places. It's literally just storing metrics data on HDFS (or, if you prefer a more general term, a data lake) indefinitely in a format that allows interactive queries.

If you do the math on the cost of metrics data storage for a project like this in a company in our size class, the storage cost is basically a rounding error. We've shipped individual diffs that easily pay for the storage cost for decades. I don't think there's any reason storing a few years or even a decade worth of metrics should be shocking when people deploy analytics and observability tools that cost much more all the time. But it turns out this was surprising, in part because people don't write up work this boring.

An unrelated example is that, a while back, I ran into someone at a similarly sized company who wanted to get similar insights out of their metrics data. Instead of starting with something that would take a day, like this project, they started with deep learning. While I think there's value in applying ML and/or stats to infra metrics, they turned a project that could return significant value to the company after a couple of person-days into a project that took person-years. And if you're only going to either apply simple heuristics guided by someone with infra experience and simple statistical models or naively apply deep learning, I think the former has much higher ROI. Applying both sophisticated stats/ML and practitioner guided heuristics together can get you better results than either alone, but I think it makes a lot more sense to start with the simple project that takes a day to build out and maybe another day or two to start to apply than to start with a project that takes months or years to build out and start to apply. But there are a lot of biases towards doing the larger project: it makes a better resume item (deep learning!), in many places, it makes a better promo case, and people are more likely to give a talk or write up a blog post on the cool system that uses deep learning.

The above discusses why writing up work is valuable for the industry in general. We covered why writing up work is valuable to the company doing the write-up in a previous post, so I'm not going to re-hash that here.

Appendix: stuff I screwed up

I think it's unfortunate that you don't get to hear about the downsides of systems without backchannel chatter, so here are things I did that are pretty obvious mistakes in retrospect. I'll add to this when something else becomes obvious in retrospect.

  • Not using a double for almost everything
    • In an ideal world, some things aren't doubles, but everything in our metrics stack goes through a stage where basically every metric is converted to a double
    • I stored most things that "should" be an integral type as an integral type, but doing the conversion from long -> double -> long is never going to be more precise than just doing the long -> double conversion and it opens the door to other problems
    • I stored some things that shouldn't be an integral type as an integral type, which causes small values to unnecessarily lose precision
      • Luckily this hasn't caused serious errors for any actionable analysis I've done, but there are analyses where it could cause problems
  • Using asserts instead of writing bad entries out to some kind of "bad entries" table
    • For reasons that are out of scope of this post, there isn't really a reasonable way to log errors or warnings in Scalding jobs, so I used asserts to catch things that shouldn't happen, which causes the entire job to die every time something unexpected happens; a better solution would be to write bad input entries out into a table and then have that table emailed out as a soft alert if the table isn't empty
      • An example of a case where this would've saved some operational overhead is where we had an unusual amount of clock skew (3600 years), which caused a timestamp overflow. If I had a table that was a log of bad entries, the bad entry would've been omitted from the output, which is the correct behavior, and it would've saved an interruption plus having to push a fix and re-deploy the job.
  • Longterm vs. LongTerm in the code
    • I wasn't sure which way this should be capitalized when I was first writing this and, when I made a decision, I failed to grep for and squash everything that was written the wrong way, so now this pointless inconsistency exists in various places

These are the kind of thing you expect when you crank out something quickly and don't think it through enough. The last item is trivial to fix and not much of a problem since the ubiquitous use of IDEs at Twitter means that basically anyone who would be impacted will have their IDE supply the correct capitalization for them.

The first item is more problematic, both in that it could actually cause incorrect analyses and in that fixing it will require doing a migration of all the data we have. My guess is that, at this point, this will be half a week to a week of work, which I could've easily avoided by spending thirty more seconds thinking through what I was doing.

The second item is somewhere in between. Between the first and second items, I think I've probably signed up for roughly double the amount of direct work on this system (so, not including time spent on data analysis on data in the system, just the time spent to build the system) for essentially no benefit.

Thanks to Leah Hanson, Andy Wilcox, Lifan Zeng, and Matej Stuchlik for comments/corrections/discussion

  1. The actual work involved was about a day, but it was spread over a week since I had to learn Scala as well as Scalding, the general Twitter stack, the metrics stack, etc.

    One day is also just an estimate for the work for the initial data sets. Since then, I've done probably a couple more weeks of work and Wesley Aptekar-Cassels and Kunal Trivedi have probably put in another week or two of time. The operational cost is probably something like 1-2 days of my time per month (on average), bringing the total cost to on the order of a month or two.

    I'm also not counting time spent using the dataset, or time spent debugging issues, which will include a lot of time that I can only roughly guess at, e.g., when the compute platform team changed the network egress limits as a result of some data analysis that took about an hour, that exposed a latent mesos bug that probably cost a day of Ilya Pronin's time; David Mackey has spent a fair amount of time tracking down weird issues where the data shows something odd is going on, but we don't know what it is; etc. If you wanted to fully account for time spent on work that came out of some data analysis on the data sets discussed in the post, I suspect, between service-level teams plus platform-level teams like our JVM, OS, and HW teams, we're probably at roughly 1 person-year of time.

    But, because the initial work it took to create a working and useful system was a day (plus time spent working on orientation material) and the system returned seven figures, it's been very easy to justify all of this additional time spent, which probably wouldn't have been the case if a year of up-front work was required. Most of the rest of the time isn't the kind of thing that's usually "charged" on roadmap reviews when creating a system (time spent by users, operational overhead), but perhaps the ongoing operational cost should be "charged" when creating the system (I don't think it makes sense to "charge" time spent by users to the system: the more useful a system is, the more time users will spend using it, and that doesn't really seem like a cost).

    There's also been work to build tools on top of this: Kunal Trivedi has spent a fair amount of time building a layer on top to make the presentation more user-friendly than SQL queries, which could arguably be charged to this project.


Andreas Zwinkau (qznc)

One Letter Programming Languages May 30, 2020 12:00 AM

If you are looking for a free name, there is none.

Read full article!

May 29, 2020

Jeremy Morgan (JeremyMorgan)

How to Build Your First JAMstack Site May 29, 2020 10:51 PM

Are you wondering what all this new hype is over JAMstack? What is a JAMstack site? How do I build one? Where do I deploy it? If you’ve asked any of these questions over the last couple of months, this article is for you. We’re going to learn what JAMstack is, and how to build our first JAMstack blog. If you already have an idea what a JAMstack site is, you can skip this section and go directly to:

Wesley Moore (wezm)

Setting the amdgpu HDMI Pixel Format on Linux May 29, 2020 10:48 PM

This week I discovered some details of digital display technology that I was previously unaware of: pixel formats. I have two Dell P2415Q displays connected to my computer. One via DisplayPort, the other via HDMI. The HDMI connected one was misbehaving and showing a dull picture. It turned out I needed to force the HDMI port of my RX560 graphics card to use RGB output instead of YCbCr. However, the amdgpu driver does not expose a means to do this. So, I used an EDID hack to make it look like the display only supported RGB.

tl;dr You can't easily configure the pixel format of the Linux amdgpu driver but you can hack the EDID of your display so the driver chooses RGB. Jump to the instructions.

Previously I had one display at work and one at home, both using DisplayPort, and all was well. However, since I started working from home at the start of 2020 (pre-pandemic), the HDMI connected one has always been a bit flakey. The screen would go blank for a second, then come back on. I tried 3 different HDMI cables, each more premium (and hopefully better shielded) than the last, without success.

This week the frustration boiled over and I vented to some friends. I was on the brink of just rage buying a new graphics card with multiple DisplayPorts, since I'd never had any trouble with that connection. I received one suggestion to swap the cables between the two, to rule out a fault with the HDMI connected display. I was quite confident the display was ok but it was a sensible thing to try before dropping cash on a new graphics card. So I swapped the cables over.

After performing the magical incantation to enable HDMI 2.0 and get 4K 60Hz on the newly HDMI connected display I immediately noticed lag. I even captured it in a slow motion video on my phone to prove I wasn't going crazy. Despite xrandr reporting a 60Hz connection it seemed as though it was updating at less than that. This led me to compare the menus of the two displays. It was here I noticed that the good one reported an input colour format of RGB, the other YPbPr.

This led to more reading about pixel formats in digital displays — a thing I was not previously aware of. Turns out that ports like HDMI support multiple ways of encoding the pixel data, some sacrificing dynamic range for lower bandwidth. I found this article particularly helpful, DisplayPort vs. HDMI: Which Is Better For Gaming?.

My hypothesis at this point was that the lag was being introduced by my display converting the YPbPr input to its native RGB. So, I looked for a way to change the pixel format output from the HDMI port of my RX560 graphics card. Turns out this is super easy on Windows, but the amdgpu driver on Linux does not support changing it.

In trying various suggestions in that bug report I rebooted a few times and the lag mysteriously went away, but the pixel format remained the same. At this point I noticed the display had a grey cast to it, especially on areas of white. This had been present on the other display when it was connected via HDMI too, but I just put it down to it being a couple of years older than the other one. With my new pixel format knowledge in hand I knew this was the source of the lack of brightness. So, I was still determined to find a way to force the HDMI output to RGB.

The Fix

It was at this point I found this Reddit post describing a terrible hack, originally described by Parker Reed in this YouTube video: Copy the EDID of the display and modify it to make it seem like the display only supports RGB. The amdgpu driver then chooses that format instead. Amazingly enough it worked! I also haven't experienced the screen blanking issue since swapping cables. I can't say for sure if that is fixed but the HDMI cable is now further away from interference from my Wi-Fi router, so perhaps that helped.

The following are the steps I took on Arch Linux to use a modified EDID:

  1. Install wxEDID from the AUR.
  2. Make a copy of the EDID data: cp /sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0/drm/card0/card0-HDMI-A-1/edid Documents/edid.bin
  3. Edit edid.bin with wxEDID and change these values:
    1. Find SPF: Supported features -> vsig_format -> replace 0b01 with 0b00
    2. Find CHD: CEA-861 header -> change the value of YCbCr420 and YCbCr444 to 0
    3. Recalculate the checksum: Options > Recalc Checksum.
    4. Save the file.

Note: I had to attempt editing the file a few times as wxEDID kept segfaulting. Eventually it saved without crashing though.

Now we need to get the kernel to use the modified file:

  1. sudo mkdir /lib/firmware/edid

  2. sudo mv edid.bin /lib/firmware/edid/edid.bin

  3. Edit the kernel command line. I use systemd-boot, so I edited /boot/loader/entries/arch.conf and added drm_kms_helper.edid_firmware=edid/edid.bin to the command line, making the full file look like this:

     title   Arch Linux
     linux   /vmlinuz-linux
     initrd  /amd-ucode.img
     initrd  /initramfs-linux.img
     options root=PARTUUID=2f693946-c278-ed44-8ba2-67b07c3b6074 resume=UUID=524c0604-c307-4106-97e4-1b9799baa7d5 resume_offset=4564992 drm_kms_helper.edid_firmware=edid/edid.bin rw
  4. Regenerate the initial RAM disk: sudo mkinitcpio -p linux

  5. Reboot

After rebooting the display confirmed it was now using RGB and visually it was looking much brighter! 🤞 the display blanking issue remains fixed as well.

May 27, 2020

Frederic Cambus (fcambus)

OpenBSD/armv7 on the CubieBoard2 May 27, 2020 10:39 PM

I bought the CubieBoard2 back in 2016 with the idea to run OpenBSD on it, but because of various reliability issues with the onboard NIC, it ran NetBSD for a few weeks before ending up in a drawer.

Back in October, Mark Kettenis committed code to allow switching to the framebuffer "glass" console in the bootloader on OpenBSD/armv7, making it possible to install the system without using a serial cable.

>> OpenBSD/armv7 BOOTARM 1.14
boot> set tty fb0
switching console to fb0

This prompted me to plug the board in again, and having support for the framebuffer console is a game changer. It also allows running Xenocara, if that's your thing.

Here is the output of running file on executables:

ELF 32-bit LSB shared object, ARM, version 1

And this is the result of the md5 -t benchmark:

MD5 time trial.  Processing 10000 10000-byte blocks...
Digest = 52e5f9c9e6f656f3e1800dfa5579d089
Time   = 1.340000 seconds
Speed  = 74626865.671642 bytes/second

For the record, LibreSSL speed benchmark results are available here.

System message buffer (dmesg output):

OpenBSD 6.7-current (GENERIC) #299: Sun May 24 18:25:45 MDT 2020
real mem  = 964190208 (919MB)
avail mem = 935088128 (891MB)
random: good seed from bootblocks
mainbus0 at root: Cubietech Cubieboard2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A7 r0p4
cpu0: 32KB 32b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 256KB 64b/line 8-way L2 cache
cortex0 at mainbus0
psci0 at mainbus0: PSCI 0.0
sxiccmu0 at mainbus0
agtimer0 at mainbus0: tick rate 24000 KHz
simplebus0 at mainbus0: "soc"
sxiccmu1 at simplebus0
sxipio0 at simplebus0: 175 pins
sxirtc0 at simplebus0
sxisid0 at simplebus0
ampintc0 at simplebus0 nirq 160, ncpu 2: "interrupt-controller"
"system-control" at simplebus0 not configured
"interrupt-controller" at simplebus0 not configured
"dma-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"video-codec" at simplebus0 not configured
sximmc0 at simplebus0
sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma
"usb" at simplebus0 not configured
"phy" at simplebus0 not configured
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci0 at simplebus0: version 1.0
"crypto-engine" at simplebus0 not configured
"hdmi" at simplebus0 not configured
sxiahci0 at simplebus0: AHCI 1.1
scsibus0 at sxiahci0: 32 targets
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at simplebus0: version 1.0
"timer" at simplebus0 not configured
sxidog0 at simplebus0
"ir" at simplebus0 not configured
"codec" at simplebus0 not configured
sxits0 at simplebus0
com0 at simplebus0: ns16550, no working fifo
sxitwi0 at simplebus0
iic0 at sxitwi0
axppmic0 at iic0 addr 0x34: AXP209
sxitwi1 at simplebus0
iic1 at sxitwi1
"gpu" at simplebus0 not configured
dwge0 at simplebus0: address 02:0a:09:03:27:08
rlphy0 at dwge0 phy 1: RTL8201L 10/100 PHY, rev. 1
"hstimer" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
gpio0 at sxipio0: 32 pins
gpio1 at sxipio0: 32 pins
gpio2 at sxipio0: 32 pins
gpio3 at sxipio0: 32 pins
gpio4 at sxipio0: 32 pins
gpio5 at sxipio0: 32 pins
gpio6 at sxipio0: 32 pins
gpio7 at sxipio0: 32 pins
gpio8 at sxipio0: 32 pins
usb2 at ohci0: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
usb3 at ohci1: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
simplefb0 at mainbus0: 1920x1080, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
scsibus1 at sdmmc0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <SD/MMC, SC64G, 0080> removable
sd0: 60906MB, 512 bytes/sector, 124735488 sectors
uhidev0 at uhub2 port 1 configuration 1 interface 0 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub2 port 1 configuration 1 interface 1 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev1: iclass 3/1, 22 report ids
ums0 at uhidev1 reportid 1: 5 buttons, Z and W dir
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 16: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 17: input=2, output=0, feature=0
uhid2 at uhidev1 reportid 19: input=8, output=8, feature=8
uhid3 at uhidev1 reportid 21: input=2, output=0, feature=0
uhid4 at uhidev1 reportid 22: input=2, output=0, feature=0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootfile: sd0a:/bsd
boot device: sd0
root on sd0a (f7b555b0fa0e8c49.a) swap on sd0b dump on sd0b

Sensors output:

$ sysctl hw.sensors
hw.sensors.sxits0.temp0=39.50 degC
hw.sensors.axppmic0.temp0=30.00 degC
hw.sensors.axppmic0.volt0=4.95 VDC (ACIN)
hw.sensors.axppmic0.volt1=0.03 VDC (VBUS)
hw.sensors.axppmic0.volt2=4.85 VDC (APS)
hw.sensors.axppmic0.current0=0.11 A (ACIN)
hw.sensors.axppmic0.current1=0.00 A (VBUS)
hw.sensors.axppmic0.indicator0=On (ACIN), OK
hw.sensors.axppmic0.indicator1=Off (VBUS)

May 25, 2020

Gustaf Erikson (gerikson)

4,000 dead in Sweden May 25, 2020 09:48 AM

Alsing dead May 25, 2020 09:48 AM

3,000 dead in Sweden May 25, 2020 09:47 AM

2,000 dead in Sweden May 25, 2020 09:47 AM

Marc Brooker (mjb)

Reading Research: A Guide for Software Engineers May 25, 2020 12:00 AM


Don't be afraid.

One thing I'm known for at work is reading research papers, and referring to results in technical conversations. People ask me if, and how, they should read papers themselves. This post is a long-form answer to that question. The intended audience is working software engineers.

Why read research?

I read research in one of three mental modes.

The first mode is solution finding: I’m faced with a particular problem, and am looking for solutions. This isn’t too different from the way that you probably use Stack Overflow, but for more esoteric or systemic problems. Solution finding can work directly from papers, but I tend to find books more useful in this mode, unless I know an area well and am looking for something specific.

A more productive mode is what I call discovery. In this case, I’ve been working on a problem or in a space, and know something about it. In discovery mode, I want to explore around the space I know and see if there are better solutions. For example, when I was building a system using Paxos, I read a lot of literature about consensus protocols in general (including classics like Viewstamped Replication1, and newer papers like Raft). The goal in discovery mode is to find alternative solutions, opportunities for optimization, or new ways to think about a problem.

The most intellectually gratifying mode for me is curiosity mode. Here, I'll read papers that just seem interesting to me, but aren't related to anything I'm currently working on. I'm constantly surprised by how reading broadly has helped me solve problems, or just informed my approach. For example, reading about misuse-resistant cryptography primitives like GCM-SIV has deeply informed my approach to API design. Similarly, reading about erasure codes around 2005 helped me solve an important problem for my team just this year.

I’ve found reading for discovery and curiosity very helpful to my career. It has also given me tools that makes reading for solution finding more efficient. Sometimes, reading for curiosity leads to new paths. About five years ago I completely changed what I was working on after reading Latency lags bandwidth, which I believe is one of the most important trends in computing.

Do I need a degree to read research papers?

No. Don’t expect to be able to pick up every paper and understand it completely. You do need a certain amount of background knowledge, but no credentials. Try to avoid being discouraged when you don't understand a paper, or sections of a paper. I'm often surprised when I revisit something after a couple years and find I now understand it.

Learning a new field from primary research can be very difficult. When tackling a new area, books, blogs, talks, and courses are better options.

How do I find papers worth reading?

That depends on the mode you’re in. In solution finding and discovery modes, search engines like Google Scholar are a great place to start. One challenge with searching is that you might not even know the right things to search for: it’s not unusual for researchers to use different terms from the ones you are used to. If you run into this problem, picking up a book on the topic can often help bridge the gap, and the references in books are a great way to discover papers.

Following particular authors and researchers can be great for discovery and curiosity modes. If there’s a researcher who’s working in a space I’m interested in, I’ll follow them on Twitter or add search alerts to see when they’ve published something new.

Conferences and journals are another great place to go. Most of the computer science research you’ll read is probably published at conferences. There are some exceptions. For example, I followed ACM Transactions on Storage when I was working in that area. Pick a couple of conferences in areas that you’re interested in, and read through their programs when they come out. In my area, NSDI and Eurosys happened earlier this year, and OSDI is coming up. Jeff Huang has a nice list of best paper winners at a wide range of CS conferences.

A lot of research involves going through the graph of references. Most papers include a list of references, and as I read I note down which ones I’d like to follow up on and add them to my reading list. References form a directed (mostly) acyclic graph of research going into the past.

Finally, some research bloggers are worth following. Adrian Colyer's blog is worth its weight in gold. I've written about research from Leslie Lamport, Nancy Lynch, and others, too.

That’s quite a fire hose! How do I avoid drowning?

You don’t have to drink that whole fire hose. I know I can’t. Titles and abstracts can be a good way to filter out papers you want to read. Don’t be afraid to scan down a list of titles and pick out one or two papers to read.

Another approach is to avoid reading new papers at all. Focus on the classics, and let time filter out papers that are worth reading. For example, I often find myself recommending Jim Gray's 1986 paper on The 5 Minute Rule and Lisanne Bainbridge's 1983 paper on Ironies of Automation2.

Who writes research papers?

Research papers in the areas of computer science I work in are generally written by one of three groups. First, researchers at universities, including professors, post docs, and graduate students. These are people whose job it is to do research. They have a lot of freedom to explore quite broadly, and do foundational and theoretical work.

Second, engineering teams at companies publish their work. Amazon’s Dynamo, Firecracker, Aurora and Physalia papers are examples. Here, work is typically more directly aimed at a problem to be solved in a particular context. The strength of industry research is that it’s often been proven in the real world, at scale.

In the middle are industrial research labs. Bell Labs was home to some of the foundational work in computing and communications. Microsoft Research does a great deal of impressive work. Industry labs, as a broad generalization, also tend to focus on concrete problems, but can operate over longer time horizons.

Should I trust the results in research papers?

The right answer to this question is no. Nothing about being in a research paper guarantees that a result is right. Results can range from simply wrong, to flawed in more subtle ways3.

On the other hand, the process of peer review does help set a bar of quality for published results, and results published in reputable conferences and journals are generally trustworthy. Reviewers and editors put a great deal of effort into this, and it’s a real strength of scientific papers over informal publishing.

My general advice is to read methods carefully, and verify results for yourself if you’re going to make critical decisions based on them. A common mistake is to apply a correct result too broadly, and assume it applies to contexts or systems it wasn’t tested on.

Should I distrust results that aren’t in research papers?

No. The process of peer review is helpful, but not magical. Results that haven't been peer reviewed, or that were rejected in peer review, aren't necessarily wrong. Some important papers have been rejected from traditional publishing, and were published in other ways. This happened to Leslie Lamport's classic paper introducing Paxos:

I submitted the paper to TOCS in 1990. All three referees said that the paper was mildly interesting, though not very important, but that all the Paxos stuff had to be removed. I was quite annoyed at how humorless everyone working in the field seemed to be, so I did nothing with the paper.

It was eventually published 8 years later, and quite well received:

This paper won an ACM SIGOPS Hall of Fame Award in 2012.

There's a certain dance one needs to know, and follow, to get published in a top conference or journal. Some of the steps are necessary, and lead to better research and better communities. Others are just for show.

What should I look out for in the methods section?

That depends on the field. In distributed systems, one thing to look out for is scale. Due to the constraints of research, systems may be tested and validated at a scale below what you’ll need to run in production. Think carefully about how the scale assumptions in the paper might impact the results. Both academic and industry authors have an incentive to talk up the strengths of their approach, and avoid highlighting the weaknesses. This is very seldom done to the point of dishonesty, but worth paying attention to as you read.

How do I get time to read?

This is going to depend on your personal circumstances, and your job. It's not always easy. Long-term learning is one of the keys to a sustainable and successful career, so it's worth making time to learn. One of the ways I like to learn is by reading research papers. You might find other ways more efficient, effective or enjoyable. That's OK too.


Pekka Enberg pointed me at How to Read a Paper by Srinivasan Keshav. It describes a three-pass approach to reading a paper that I like very much:

The first pass gives you a general idea about the paper. The second pass lets you grasp the paper’s content, but not its details. The third pass helps you understand the paper in depth.

Murat Demirbas shared his post How I Read a Research Paper which contains a lot of great advice. Like Murat, I like to read on paper, although I have taken to doing my lighter-weight reading using LiquidText.


  1. I wrote a blog post about Viewstamped Replication back in 2014. It's a pity VR isn't more famous, because it's an interestingly different framing that helped me make sense of a lot of what Paxos does.
  2. Obviously stuff like maths is timeless, but even in fast-moving fields like systems there are papers worth reading from the 50s and 60s. I think about Sayre's 1969 paper Is automatic “folding” of programs efficient enough to displace manual? when people talk about how modern programmers don't care about efficiency.
  3. There's a lot of research that looks at the methods and evidence of other research. For a start, and to learn interesting things about your own benchmarking, take a look at Is Big Data Performance Reproducible in Modern Cloud Networks? and A Nine Year Study of File System and Storage Benchmarking

Joe Nelson (begriffs)

Logging TLS session keys in LibreSSL May 25, 2020 12:00 AM

LibreSSL is a fork of OpenSSL that improves code quality and security. It was originally developed for OpenBSD, but has since been ported to several platforms (Linux, *BSD, HP-UX, Solaris, macOS, AIX, Windows) and is now the default TLS provider for some of them.

When debugging a program that uses LibreSSL, it can be useful to see decrypted network traffic. Wireshark can decrypt TLS if you provide the secret session key; however, the session key is difficult to obtain. It is different from the private key used for functions like tls_config_set_keypair_file(), which merely secures the initial TLS handshake with asymmetric cryptography. The handshake establishes the session key between client and server using a method such as Diffie-Hellman (DH). The session key is then used for efficient symmetric cryptography for the remainder of the communication.

Web browsers, from their Netscape provenance, will log session keys to a file specified by the environment variable SSLKEYLOGFILE when present. Netscape packaged this behavior in its Network Security Services library.

OpenSSL and LibreSSL don’t implement that NSS behavior, although OpenSSL allows code to register a callback for when TLS key material is generated or received. The callback receives a string in the NSS Key Log Format.

In addition to refactoring OpenSSL code, LibreSSL offers a simplified TLS interface called libtls. The simplicity makes it more likely that applications will use it safely. However, I couldn’t find an easy way to log session keys for my libtls connection.

I found a somewhat hacky way to do it, and asked their development list whether there’s a better way. From the lack of response, I assume there isn’t yet. Posting the solution here in case it’s helpful for anyone else.

This module provides a tls_dump_keylog() function that appends to the file specified in SSLKEYLOGFILE.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <openssl/ssl.h>

/* A copy of the tls structure from libtls/tls_internal.h
 * This is a fragile hack! When the structure changes in libtls
 * then it will be Undefined Behavior to alias it with this.
 * See C99 section 6.5 (Expressions), paragraph 7
 */
struct tls_internal {
	struct tls_config *config;
	struct tls_keypair *keypair;

	struct {
		char *msg;
		int num;
		int tls;
	} error;

	uint32_t flags;
	uint32_t state;

	char *servername;
	int socket;

	SSL *ssl_conn;
	SSL_CTX *ssl_ctx;

	struct tls_sni_ctx *sni_ctx;

	X509 *ssl_peer_cert;
	STACK_OF(X509) *ssl_peer_chain;

	struct tls_conninfo *conninfo;

	struct tls_ocsp *ocsp;

	tls_read_cb read_cb;
	tls_write_cb write_cb;
	void *cb_arg;
};

static void printhex(FILE *fp, const unsigned char *s, size_t len)
{
	while (len-- > 0)
		fprintf(fp, "%02x", *s++);
}

bool tls_dump_keylog(struct tls *tls)
{
	FILE *fp;
	SSL_SESSION *sess;
	size_t len_key;
	unsigned int len_id;
	unsigned char key[256];
	const unsigned char *id;

	const char *path = getenv("SSLKEYLOGFILE");
	if (!path)
		return false;

	/* potentially nonstrict aliasing */
	sess = SSL_get_session(((struct tls_internal *)tls)->ssl_conn);
	if (!sess) {
		fprintf(stderr, "Failed to get session for TLS\n");
		return false;
	}
	len_key = SSL_SESSION_get_master_key(sess, key, sizeof key);
	id      = SSL_SESSION_get_id(sess, &len_id);

	if ((fp = fopen(path, "a")) == NULL) {
		fprintf(stderr, "Unable to write keylog to '%s'\n", path);
		return false;
	}
	fputs("RSA Session-ID:", fp);
	printhex(fp, id, len_id);
	fputs(" Master-Key:", fp);
	printhex(fp, key, len_key);
	fputs("\n", fp);
	fclose(fp);
	return true;
}
To use the logfile in Wireshark, right click on a TLS packet, and select Protocol Preferences → (Pre)-Master-Secret log filename.

(Pre)-Master-Secret log filename menu item


In the resulting dialog, enter the filename of the logfile. Then you can view the decrypted traffic with Follow → TLS Stream.

Follow TLS stream menu item


May 24, 2020

Derek Jones (derek-jones)

New users generate more exceptions than existing users (in one dataset) May 24, 2020 10:42 PM

Application usage data is one of the rarest kinds of public software engineering data.

Even data that might be used to approximate application usage is rare. Server logs might be used as a proxy for browser usage or operating system usage, and number of Debian package downloads as a proxy for usage of packages.

Usage data is an important component of fault prediction models, and the failure to incorporate such data is one reason why existing fault models are almost completely worthless.

The paper Deriving a Usage-Independent Software Quality Metric appeared a few months ago (it’s a bit of a kitchen sink of a paper), and included lots of usage data! As far as I know, this is a first.

The data relates to a mobile based communications App that used Google analytics to log basic usage information, i.e., daily totals of: App usage time, uses by existing users, uses by new users, operating system+version used by the mobile device, and number of exceptions raised by the App.

Working with daily totals means there is likely to be a non-trivial correlation between usage time and number of uses. Given that this is the only public data of its kind, it has to be handled (in my case, ignored for the time being).

I’m expecting to see a relationship between number of exceptions raised and daily usage (the data includes a count of fatal exceptions, which are less common; because lots of data is needed to build a good model, I went with the more common kind). So a’fishing I went.

On most days no exception occurred (zero is the ideal case for the vendor, but I want lots of exceptions to build a good model). Daily exception counts are likely to be small integers, which suggests a Poisson error model.

It is likely that the same set of exceptions were experienced by many users, rather like the behavior that occurs when fuzzing a program.

Applications often have an initial beta testing period, intended to check that everything works. Lucky for me the beta testing data is included (i.e., more exceptions are likely to occur during beta testing, which get sorted out prior to official release). This is the data on which I concentrated my modeling.

The model I finally settled on has the form (code+data):

Exceptions ≈ uses^0.1 × newUserUses^0.54 × e^(0.002×sqrt(usagetime)) × AndroidVersion

Yes, newUserUses had a much bigger impact than uses. This was true for all the models I built using data for all Android/iOS Apps, and the exponent difference was always greater than two.
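Plugging numbers into the fitted form shows why the newUserUses term dominates: doubling new-user uses scales predicted exceptions by 2^0.54 ≈ 1.45, while doubling total uses scales them by only 2^0.1 ≈ 1.07. A minimal sketch (the function name and all input values are made up for illustration):

```python
import math

def predicted_exceptions(uses, new_user_uses, usage_time, android_factor):
    """Evaluate the fitted model:
    Exceptions ~ uses^0.1 * newUserUses^0.54 * e^(0.002*sqrt(usagetime)) * AndroidVersion
    android_factor stands in for the per-Android-version multiplier."""
    return ((uses ** 0.1)
            * (new_user_uses ** 0.54)
            * math.exp(0.002 * math.sqrt(usage_time))
            * android_factor)

base = predicted_exceptions(1000, 100, 3600, 1.0)
# Doubling new-user uses has a much larger effect than doubling total uses.
more_new = predicted_exceptions(1000, 200, 3600, 1.0)    # ratio = 2**0.54
more_total = predicted_exceptions(2000, 100, 3600, 1.0)  # ratio = 2**0.1
```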

Why square-root, rather than log? The model fit was much better for square-root; too much better for me to be willing to go with a model which had usagetime as a power-law.

The impact of AndroidVersion varied by several orders of magnitude (which won’t come as a surprise to developers using earlier versions of Android).

There were not nearly as many exceptions once the App became generally available, and there were a lot fewer exceptions for the iOS version.

The outsized impact of new users on exceptions experienced is easily explained by developers failing to check for users doing nonsensical things (which users new to an App are prone to do). Existing users have a better idea of how to drive an App, and tend to do the kind of things that developers expect them to do.

As always, if you know of any interesting software engineering data, please let me know.

Ponylang (SeanTAllen)

Last Week in Pony - May 24, 2020 May 24, 2020 02:33 PM

Damon Kwok has been doing a ton of awesome work on the Emacs ponylang-mode. Sean T Allen gave an informal presentation on Pony to the Houston Functional Programmers Users Group.

May 22, 2020

Wesley Moore (wezm)

Software Bounties May 22, 2020 10:07 AM

I don't have time to build all the things I'd like to build, so I'm offering bounties on the following work.


  • Payment will be made via PayPal when the criteria are met. If you would prefer another mechanism, feel free to suggest it, but no guarantees.
  • Amounts are in Australian dollars.
  • I will not pay out a bounty after the expiration date.
  • I may choose to extend the expiration date.
  • How can I trust you'll pay me? I like to think that I'm a trustworthy person. However, if you would like to discuss a partial payment prior to starting work please get in touch.
  • You have to be the primary contributor to claim the bounty. If someone else does all the work and you just nudge it over the line the other person is the intended recipient.
  • If in doubt contact me.

Cairo user-font With Colour Bitmap Always Comes Out Black

The Evince PDF viewer uses Poppler to render PDFs, which in turn uses Cairo. Emoji embedded in a PDF using a Type 3 font always come out as a black silhouette instead of the colour image when viewed in Evince, due to a limitation in Cairo's user-font functionality.


Criteria: Implement support for colour user-fonts in Cairo, resulting in the Cairo issue being closed as completed.
Language: C
Amount: AU$500
Expires: 2021-01-01T00:00:00Z

Emoji Reactions in Fractal

Fractal is a Matrix client written in Rust using GTK.

Criteria: Implement emoji reactions in Fractal to the satisfaction of the maintainers, resulting in the issue being closed as completed.
Language: Rust
Amount: AU$500
Expires: 2021-01-01T00:00:00Z

Update Mattermost Server to Support Emoji Added After Unicode 9.0

Mattermost's emoji picker is stuck on emoji from Unicode 9. We're now up to Unicode 13 and many emoji added in the last few years are missing. This bounty pertains only to the work required in the Mattermost server, not the desktop and mobile apps.

Criteria: Update the list of emoji in the Mattermost server to Unicode 13.0, resulting in the issue being closed as completed.
Language: Go
Amount: AU$200
Expires: 2021-01-01T00:00:00Z

Carlos Fenollosa (carlesfe)

No more Google Analytics May 22, 2020 09:24 AM

I have removed the GA tracking code from this website. It does not use any tracking technique: no cookies, no JS, no image pixels.

Even though this was one of the first sites to actually implement consent-based GA tracking, the current situation with cookie banners is terrible.

We are back to the flash era where every site had a "home page" and you needed to perform some extra clicks to view the actual content. Now those extra clicks are spent in disabling all the tracking code.

I hate the current situation so much that I just couldn't be a part of it any more. So, no banner, no cookies, no js, nothing. Any little traffic I get I'll analyze with a log parser like webalizer. I wasn't checking it anyways.

Tags: internet, web, security


May 21, 2020

Gonçalo Valério (dethos)

Dynamic DNS using Cloudflare Workers May 21, 2020 11:09 PM

In this post I’ll try to describe a simple solution, that I came up with, to solve the issue of dynamically updating DNS records when the IP addresses of your machines/instances changes frequently.

While Dynamic DNS isn’t a new thing and many services/tools around the internet already provide solutions to this problem (for more than 2 decades), I had a few requirements that ruled out most of them:

  • I didn’t want to sign up to a new account in one of these external services.
  • I would prefer to use a domain name under my control.
  • I don’t trust the machine/instance that executes the update agent, so following the principle of least privilege, the client should only be able to update one DNS record.

The first and second points rule out the usual DDNS service providers, and the third point forbids me from using the Cloudflare API as is (like it is done in other blog posts), since the permissions we are allowed to set up for a new API token aren’t granular enough to only allow access to a single DNS record; at best I would have to give access to all records under that domain.

My solution to the problem at hand was to put a worker in front of the API, basically delegating half of the work to this “serverless function”. The flow is the following:

  • agent gets IP address and timestamp
  • agent signs the data using a previously known key
  • agent contacts the worker
  • worker verifies signature, IP address and timestamp
  • worker fetches DNS record info of a predefined subdomain
  • If the IP address is the same, nothing needs to be done
  • If the IP address is different, worker updates DNS record
  • worker notifies the agent of the outcome

Nothing too fancy or clever, right? But it works like a charm.
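The sign-and-verify steps in the flow above can be sketched with a shared-secret HMAC over the IP address and timestamp. This is only an illustration of the idea, not the actual implementation; the function names, payload shape, and the 5-minute replay window are all assumptions:

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"example-shared-key"  # hypothetical; agreed on out of band

def make_update_request(ip, key=SHARED_KEY):
    """Agent side: sign the IP address and timestamp before contacting the worker."""
    payload = {"ip": ip, "timestamp": int(time.time())}
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, signature

def verify_update_request(body, signature, key=SHARED_KEY, max_age=300):
    """Worker side: check the signature, then reject stale timestamps (replay protection)."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False
    payload = json.loads(body)
    return time.time() - payload["timestamp"] <= max_age
```

Only after both checks pass would the worker compare the submitted IP against the predefined DNS record and update it via the Cloudflare API.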

I’ve published my implementation on GitHub with a FOSS license, so anyone can modify and reuse it. It doesn’t require any extra dependencies; it consists of only two files, and you just need to drop them at the right locations and you’re ready to go. The repository can be found here and the README contains the detailed steps to deploy it.

There are other small features that could be implemented, such as using the same worker with several agents that need to update different records, so only one of these “serverless functions” would be required. But these improvements will have to wait for another time; for now I just needed something that worked well for this particular case and that could be easily deployed in a short time.

Robin Schroer (sulami)

Literate Calculations in Emacs May 21, 2020 12:00 AM

It is no secret that I am a big fan of literate programming for many use cases. I think it is a great match for investigative or exploratory notes, research, and configuration.

On a Friday evening about two weeks ago, my flatmate came up with an idea for doing calculations in a literate way. Of course, if you really wanted to, you could use a Jupyter Notebook, but we were looking for something more lightweight, and ideally integrated into Emacs.

A quick search came up empty, so on Saturday morning I got started writing what came to be Literate Calc Mode. The features I wanted included named references to earlier results, spreadsheet-like recalculations on every change, and the ability to save my calculations to a file. And then of course the ability to interlace calculations with explanatory text.

It was in part inspired by the iOS app Tydlig, which also provides calculations with automatically updating references to earlier results, but does not allow saving the workspaces as files, which I find very limiting.

But enough talk, this is what the result looks like in action:

This is literate-calc-minor-mode, running alongside org-mode. As you can see, it automatically picks up calculations and inserts the results as overlays at the end of the line. It allows the user to bind results to variables, which can even have names with spaces. Any change causes all values to be recalculated, similar to a spreadsheet.

Because it uses Emacs’ built-in calc-eval behind the scenes, it supports almost everything M-x calc does, including formulas, complex units, and unresolved mathematical variables.

Of course there are also other convenience functions, such as evaluating just a single line, or inserting the results into the file for sharing. I do have some more plans for the future, which are outlined in the documentation.

In addition to hopefully providing some value to other Emacs users, this was also a great learning experience. On a meta-level, writing this post has taught me how to use <video> on my blog.

I have learned a lot about overlays in Emacs, and I published my first package on MELPA, which was a thoroughly pleasant experience.

eta (eta)

Writing as a form of relief May 21, 2020 12:00 AM

The content on this blog does not get updated very frequently.

This is largely because, as a somewhat permanent and public sort of thing, I have to be quite careful about what stuff I write on here, since it could come back to bite me later, right? We’ve all heard the stories about people getting turned down from job offers due to some embarrassing stuff they posted on the social network du jour [1] [2], and I generally get the impression that being careful with what you choose to write into your permanent online record is generally a good thing. (Well, when you phrase it like that…)

What are blogs for, though? As far as I can tell, almost nobody reads this one; there are a few stragglers who come here through Google because I posted about something like microG or Rust programming (even though there are now much better resources to learn Rust out there, given the language has changed a whole load since I started learning it…).

Furthermore, I don’t consider myself the kind of person who’s happy to go and do lots of writing about technical topics (for the moment, at least). Some people can sustain an entire blogging habit by packing things full of interesting technical content / deep dives / whatever. This is great, because then the content is at least ‘useful’ to the person reading it (in some sense [3]), instead of being some poor sap whining on about random other things happening in their life.

Unfortunately, though, I’m not one of these people. So, either I go about my life and don’t write any part of it up on the blog, or I do the converse, and end up spewing things I’ll probably regret later out into the web at large.

There’s a pandemic on. Everyone’s feeling miserable, a lot of people have very tragically lost their lives to the COVID-19 virus, and people are beginning to question all sorts of things about the existence we led before this all started.

So, what the hell, let’s just get on with it then.

The utility of personal content

Some people might be of the opinion that personal content (like somewhat soppy blog posts) is not worth reading, and should perhaps be gotten rid of entirely. However – and obviously I’m going to be biased here! – I don’t really think so. Okay, I think the sort of content where people talk about whatever mundane things they’ve been getting up to (“so, this week, I washed my bike, went out for a run, tinkered with Node.js a bit…”) is perhaps a bit of a waste of time – I’d call that ‘oversharing’, perhaps. But I do think it’s possible to read stuff about someone else’s problems and gain some insight into how you might be able to solve your own, so I don’t really want to dismiss personal content entirely.

I guess there’s a distinction between content that is purely descriptive – explaining how much you hate yourself, or how annoying thing X is, or whatever – and content that has an analytical or empathetic component as well – trying to figure out the reasons why this is the case and provide some advice to people feeling the same way, or otherwise attempting to connect your own personal experience to what others may feel. The former has no value to the reader, really – oh, poor random internet commentator. How sad. Imagine an agony aunt column without the agony aunt’s responses. How awful would that be? But the latter kind of stuff can definitely be of some value; I’ve read things on the web that have influenced the way I look at the world and respond to things – most people probably have. So it’s not entirely worthless!

The title of this post

In fact, I think writing about things is a great way to process and deal with said things. I’m not just talking about personal or emotional matters – way back in 2016 when I wrote a short Rust tutorial series, the aim was as much to inform others as to force me to be honest about my own Rust abilities; writing something up gets you to specify what exactly you mean in plain English, which can be great for identifying gaps in your knowledge, or areas of flawed thinking.

This is partially why, as it says in the title, writing can be a form of relief; there’s something about putting pen to paper that makes you feel just a bit better about whatever it is you’re writing about, be that your frustrations learning a new programming language or something more personal.

It’s also, in some ways, a lot lower friction than talking to someone about something. If you start calling up your friends and ranting to them about how much asynchronous programming paradigms suck, you eventually lose most of your friends – whereas you aren’t going to annoy anyone, or take up anyone’s time, by writing about things [4]. (Unless you do something crazy like start sending your friends letters in the post containing your rants. This is also a good way to lose most of your friends.)

‘Mental health awareness’

Now, of course, you don’t actually have to publish anything to get these benefits; simply writing something up should be enough. (This is the idea behind journaling, I think.) In fact, as I discussed at the start, publishing things can be harmful to your career.

However, I think it’s still worth doing: I’m a human being, and you are too. There are thousands of tech blogs that just talk about tech and don’t talk about anything personal or human; there are thousands of people who only talk about technical topics on their website and never mention a thing about their private lives. I’m not saying they should – but I do tend to think that seeing other people talk about their problems publicly can be a great motivator for you to do the same (for example, I’m a big fan of one writer and her occasional posts about toxic Silicon Valley culture). To me, that’s what this concept of ‘mental health awareness’ is about (at least in part): recognizing that other people are people too, and trying to get people to talk more openly about their thoughts and feelings, instead of just keeping them to themselves.

So, yeah. Write (somewhat critically) about things that bother you, even if they aren’t technical. It’s helpful for you, and you never know what impact it’ll have on somebody else!

Or, you know, just don’t, if you’re not into that sort of thing. But I’m going to give it a try.

Also, the hope is that just trying to get into a semi-regular pattern of writing about /anything/ without much of a filter will mean that more technical stuff seeps out as well. We’ll see what happens!

  1. Well, people tell me this happens. I’m not, ehm, experienced enough to actually have heard of this happening first-hand. 

  2. On a related note, if you’re someone who might have the capability to make me a job offer, just… do me a solid and don’t read the blog, okay? :p 

  3. More on this later. 

  4. You also are probably not going to annoy your friends by talking about personal issues if you really feel the need to talk to someone about them, since that’s what friends are for! However, it doesn’t feel too great having to do this a lot (where ‘a lot’ is subjectively defined) – in other words, even though your friends might not actually get annoyed, your fear of them getting annoyed (and perhaps not telling you) might be enough to make you not want to talk to them. 

May 20, 2020

Benjamin Pollack (gecko)

The Deprecated *nix API May 20, 2020 07:31 PM

I realized the other day that, while I do almost all of my development “in *nix”, I don’t actually meaningfully program in what I traditionally have thought of as “*nix” anymore. And, if things like Hacker News, Lobsters, and random dotfiles I come across on GitHub are any indication, then there are many developers like me.

“I work on *nix” can mean a lot of very different things, depending on who you ask. To some, it honestly just means they’re on the command line: being in cmd.exe on Windows, despite utterly different (and not necessarily inferior!) semantics, might qualify. To others, it means a rigid adherence to POSIX, even if GNU’s incompatible variants might rule the day on the most common Linux distros. To others, it truly means working on an actual, honest-to-goodness Unix derivative, such as some of the BSDs—or perhaps a SunOS or Solaris derivative, like OpenIndiana.

To me, historically, it’s meant that I build on top of the tooling that Unix provides. Even if I’m on Windows, I might be developing “in *nix” as long as I’m using sed, awk, shell scripts, and so on, to get what I need to do done. The fact I’m on Windows doesn’t necessarily matter; what matters is the underlying tooling.

But the other day, I realized that I’ve replaced virtually all of the traditional tooling. I don’t use find; I use fd. I don’t use sed; I use sd. du is gone for dust, bash for fish, vim for kakoune, screen for tmux, and so on. Even the venerable grep and awk are replaced by not one, but two tools, and not in a one-for-one: depending on my ultimate goal, ripgrep and angle-grinder replace either or both tools, sometimes in concert, and sometimes alone.

I’m not particularly interested in a discussion on whether these tools are “better”; they work better for me, so I use them. Based on what I see on GitHub, enough other people feel similarly that all of these incompatible variations on a theme must be heavily used.

My concern is that, in that context, I think the meaning of “I write in *nix” is starting to blur a bit. The API for Windows is defined in terms of C (or perhaps C++, if you squint). For Linux, it’s syscalls. For macOS, some combo of C and Objective-C. But for “*nix”, without any clarifying context, I for one think in terms of shell scripts and their utilities. And the problem is that my own naïve scripts, despite being written on a legit *nix variant, simply will not run on a vanilla Linux, macOS, or *BSD installation. They certainly can—I can install fish, and sd, and ripgrep, and whatever else I’m using, very easily—but those tools aren’t available out-of-the-box, any more than, I dunno, the PowerShell 6 for Linux is. (Or MinGW is for Windows, to turn that around.) It amounts to a gradual ad-hoc breakage of the traditional ad-hoc “*nix” API, in favor of my own, custom, bespoke variant.

I think, in many ways, what we’re seeing is a good thing. sed, awk, and the other traditional tools all have (let’s be honest) major failings. There’s a reason that awk, despite recent protestations, was legitimately replaced by Perl. (At least, until people forgot why that happened in the first place.) But I do worry about the API split, and our poor ability to handle it. Microsoft, the paragon of backwards compatibility, has failed repeatedly to actually ensure that compatibility, even when armed with much richer metadata than vague, non-version-pinned plain-text shell-scripts calling ad-hoc, non-standard tooling. If we all go to our own variants of traditional Unix utilities, I worry that none of my scripts will meaningfully run in a decade.

Or maybe they will. Maybe my specific preferred forks of Unix utilities rule the day and all of my scripts will go through unscathed.

May 19, 2020

Jan van den Berg (j11g)

Unorthodox – Netflix miniseries May 19, 2020 06:19 PM

I was impressed by the Netflix miniseries Unorthodox. Specifically with the talented actors, the believable authentic world-building and the spot-on casting (so good). With regards to all of these aspects this is a very good show.

Huge parts of the show are in Yiddish which is a unique experience (especially when you speak a little bit of German). It felt genuine and intimate.

Moishe and Esty – Two main characters with similar but ultimately opposite experiences

I like that the story works with flashbacks and you are thrown right in the middle. And a lot is left unexplained, specifically Hasidic customs. Some, you intuitively understand (e.g. consistently touching/kissing doorposts) while others left me puzzled (an entire kitchen wrapped in tin-foil?). The show does not over-explain and it keeps the story going, but it does provide enough pointers to dig deeper.

The final audition scene tied a lot of things together — families, friends and worlds — while at the same time it made it clear that some bridges were definitely burned. Wonderfully done.

Loose ends?

However, there were some things that could have been better, or that made little sense. Spoilers ahead.

  • How long was Esty in Berlin, are we watching days, weeks or even months? Sometimes I thought this was a couple of days. But that didn’t always make sense.
  • Why did the grandmother die? Esty was never made aware of this, so what was the purpose of this tragic subplot?
  • What was the meaning of Moishe’s successful gambling scene? The fact that he won in a poker game didn’t add anything new to his character (we already knew he had an ambivalent personality) or the story, but they made it seem significant — including his full monty dive into a Berlin river.
  • I understand that a miniseries that is almost shorter than the latest Scorsese does not have time for everything. However, the relationship Esty quickly gets with the coffee-guy felt a bit forced and far-fetched for her character arc. You don’t go from removing your sheitel to sleeping with a guy in two (?) days.

This being said, maybe some of these are setups; loose ends for another season? It could very well be, because there are still some story lines left to explore (specifically Moishe). I would watch it.

The post Unorthodox – Netflix miniseries appeared first on Jan van den Berg.

Mark J. Nelson (mjn)

Newsgames of the 2000s May 19, 2020 12:00 PM

During grad school in the late 2000s, I used to maintain a list of newsgames, i.e. games that commented in some kind of timely and usually editorial way on current events. I remembered it existed while looking through an archive of my old website, and decided to reproduce it here in case it helps anyone looking for information on newsgames of that period. You can also browse the original list on the Wayback Machine if you prefer.

The list covers games released in 1999–2008, with a bit of ad-hoc summarization, analysis, and categorization, and link to more in-depth commentary elsewhere, where available. It's almost certainly missing other games of the same era that I didn't happen to run across. I've replaced dead links with updated ones or Wayback Machine links, where possible, but I've otherwise left the entries unedited as I originally wrote them. Even in cases where the Wayback Machine has a copy of the game, unfortunately, many of them run on the Flash platform, which is tricky to get working in modern browsers. (Note that since my commentary is unedited, when an entry says that a link is to the Internet Archive's copy or a game is unavailable, it means this was already the case as of the late 2000s.)


Pico's School

Released a few months after the Columbine school shootings, Pico's School drops you, as Pico, into a school that's just been shot up and taken over by angsty goth kids who like KMFDM. You play a graphical-adventure game to defeat them, interspersed with arcade-style boss fights. There are aliens also. Somewhat ambiguous what the commentary is; it was controversial at the time for supposedly making light of the tragedy through farcical elements or even appealing to disaffected teens. On a less newsgamey note, it was also the game that launched the reputation of Newgrounds, as well as an impressive technical achievement given the limitations of 1999-era Flash 3 that had made previous Flash games not nearly as interactive.


Kabul Kaboom!

An arcade-style game where you're an Afghani civilian who has to catch U.S. aid packages (hamburgers) while dodging U.S. bombs. Released during the U.S. war against the Taliban, in which it was also dropping humanitarian aid to Afghani civilians. A bit more discussion from designer Gonzalo Frasca can be found here.


Al Quaidamon

  • URL:
  • Author: Tom Fulp
  • Date: February 2002
  • Platform: Web
  • Types of commentary: direct criticism, satire, rhetoric of failure
  • Types of gameplay: classic arcade, whack a mole
  • Subjects: terrorism, human rights, police

A criticism of liberal-minded criticisms of U.S. treatment of its war-on-terror prisoners. There's a prisoner, and you can choose to either punch him, or feed him donuts, brush his hair, or attend to his wounds, each of which impacts a meter that shows how well he's being treated. The meter steadily decreases if you do nothing, and even with non-stop coddling it's hard to get up to Geneva Convention standards, which are in any case portrayed as being better than the lives of most Americans. Part of the War on Terror collection at newgrounds, although one of the few games in the collection with editorial content.


  • URL:
  • Author: Josh On
  • Date: May 2002
  • Platform: Web
  • Types of commentary: direct criticism, pointing out tradeoffs
  • Types of gameplay: strategy, resource management
  • Subjects: international affairs, corporations, war

A simulation of the war on terror that critiques war through the simulation rules. For example, business and military spending is basically the same; not using your troops enough reduces their effectiveness; not spending enough on domestic affairs makes you unpopular; not spending enough on the military can get you assassinated; and so on. The commentary isn't particularly subtle, but the way it's built into the simulation rules is a nice use of games' procedurality.

See this blog post by Michael Mateas for a more detailed rundown of the rules and commentary they produce, and, for those not afraid of books, pgs. 82-84 of Ian Bogost's Persuasive Games (MIT Press, 2007) for more discussion.


September 12

A critique of the "war on terror"'s use of missile strikes that cause civilian casualties. You can fire a missile periodically at terrorists, or not. If you do, you'll almost certainly cause the landscape to get increasingly battle-scarred, while causing an increase in the number of terrorists through civilian casualties. If you don't, terrorists will stay present at some default background level. The first game to call itself a "newsgame".

The Howard Dean for Iowa Game

  • URL:
  • Authors: Persuasive Games and Gonzalo Frasca
  • Date: December 2003
  • Platform: Web
  • Type of commentary: electioneering
  • Types of gameplay: strategy, social
  • Subjects: elections, activism

Somewhat of a milestone in political games, since it was commissioned officially by the Howard Dean campaign for his run in the 2004 Democratic primary. Has a map-level strategic view in which you place supporters, and once you place a supporter, goes into a short real-time segment where you try to wave your campaign sign at people. There were initially some social effects, with the map changing colors based on the level of Dean support in various regions created by other players of the game, plus even some instant-messaging integration, though the IM integration has since been disabled, and the social effects are hard to see since few people still play the game. The main aim of the game seems to have been to raise some vague awareness about how the caucus system works, plus just create some buzz.

The creators give a detailed retrospective account in this essay.

Kimdom Come

You play North Korean leader Kim Jong Il in a farcical game of brinksmanship, threatening South Korea with your missiles, staging parades, gaining concessions, and so on. Gameplay is a mixture of strategy and stuff like missile-command style arcade action. Released during one of many periods where North Korea was making belligerent noises and negotiating concessions from the West. Available for the Mac.

Reelect Bush?

  • URL:
  • Author: EllaZ Systems
  • Date: December 2003
  • Platform: Standalone executable
  • Type of commentary: satire
  • Types of gameplay: sim, quiz
  • Subjects: elections, famous people

One of several games bundled with the chatbot "AI Bush", a Bush-specific version of a chatbot from a group that seems to have state-of-the-art technology as far as chatbots go. This one mocks George W. Bush's reputed lack of knowledge on various subjects by having you play an advisor who whispers answers to him, though he stops listening to you if you feed him too much bad advice. Was available for Windows for purchase, but no longer seems to be, so summary based on the official blurb rather than playing it myself.



  • URL:
  • Author: Gonzalo Frasca
  • Date: March 2004
  • Platform: Web
  • Type of commentary: memorial
  • Type of gameplay: whack a mole
  • Subject: terrorism

Released two days after the March 2004 Madrid subway bombings, this is a simple game where you click on candles' flames to make them burn brighter, raising an overall light meter. They diminish some time after you click them, so gameplay is to keep clicking to keep them as bright as possible.

Useful Voter Guide

  • URL:
  • Author: LoveInWar
  • Date: March 2004
  • Platform: Web
  • Type of commentary: satire
  • Type of gameplay: quiz
  • Subjects: elections, media

A parody of both polarized politics and the voter questionnaires that ask you a series of questions and give you a political position or candidate based on your replies. In this one, you're presented with a series of people representing opposite stereotypes, and shoot the one you hate most: Would you rather put a bullet in the flag-waving guy, or the America-hating one? Based on your choices, you get a political position, invariably described in insulting terms. The original site seems to have disappeared, so the link is to the Internet Archive's copy.

John Kerry Tax Invaders

  • URL:
  • Author: The Republican Party
  • Date: April 2004
  • Platform: Web
  • Types of commentary: direct criticism, electioneering
  • Type of gameplay: classic arcade
  • Subjects: elections, taxes

A skin of Space Invaders with you playing George W. Bush shooting down John Kerry tax proposals. Not very much game rhetoric here beyond a skin, but it gets included in the list since it was an earlyish official political game. The original site is no longer up, so the link is to the Internet Archive's copy.

2004 Everybody Fight

  • URL:
  • Author: Digital Extreme
  • Date: May 2004
  • Platform: Standalone executable
  • Type of commentary: satire
  • Type of gameplay: first person shooter
  • Subject: elections

A first-person shooter featuring personalities from Taiwan's 2004 election. It's tempting to consider this just an opportunistic skin of an FPS, but the manufacturer claims it's a parody of violence and acrimony between supporters of the two main political camps. Summary based on an article in the Taipei Times.


  • URL:
  • Author: The Republican Party
  • Date: May 2004
  • Platform: Web
  • Types of commentary: direct criticism, electioneering, rhetoric of failure
  • Type of gameplay: non videogame
  • Subjects: elections, wealth, famous people

A stripped down representation of Monopoly, where you start out with $40,000, labeled as the average household income, and roll the dice to move around a board filled with properties owned by Kerry, which inevitably bankrupt you. The gameplay is a bit weak, but I suppose the point that Kerry is rich comes across. The original site is no longer up, so the link is to the Internet Archive's copy, which is somewhat broken unfortunately.

Escape from Woomera

A Half-Life mod that puts you inside Australia's controversial Woomera Detention Centre for asylum applicants. You can try to get yourself asylum; navigate the daily routines of what's essentially prison life; or try to escape from the razor-wire compound. Of course, you can't get asylum, and you can't escape either. The inevitable gameplay failure highlights the no-win situation asylum applicants are put in, and the portrayal of prison-like conditions aims to highlight to the game-playing public why they should oppose the detention center (the game is unapologetically part of an anti-Woomera campaign). Got a good bit of media coverage during development.


  • URL:
  • Author: Persuasive Games
  • Date: September 2004
  • Platform: Web
  • Types of commentary: electioneering, pointing out tradeoffs
  • Types of gameplay: resource management, whack a mole
  • Subjects: elections, activism

A political game commissioned by the Democratic Congressional Campaign Committee (DCCC) for the 2004 elections. You allocate your 10,000 activists between six public-policy areas, and that allocation combined with some whack-a-mole gameplay on your part affects the three factors of money, peace, and quality of life. Mainly a simulation and encouragement of activism itself and promoting the view that there are lots of ways to balance priorities rather than direct commentary on any of the six policy areas (though some references to Democratic policy proposals are thrown in). It also allows players to share their "activism plans", which are indexed by demographic information.

Take Back Illinois

  • URL:
  • Author: Persuasive Games
  • Date: September 2004
  • Platform: Web
  • Types of commentary: electioneering, pointing out tradeoffs
  • Types of gameplay: resource management, sim
  • Subjects: elections, health care, tort reform, education, economy, activism

A political game commissioned by the Illinois state Republican Party for the 2004 election. Actually a sequence of four games, released one per week, about medical malpractice reform, education, participation, and economic development. All except for the participation game have resource-management gameplay, where the game's rhetoric is built into the simulation rules, giving a particular view of the effects of various policy choices. The participation game is a bit different, and has you going around in a simulated world to involve people in politics.

A few comments from designer Ian Bogost at the Persuasive Games page on the game and Water Cooler Games.


  • URL:
  • Author: Gonzalo Frasca
  • Date: October 2004
  • Platform: Web
  • Type of commentary: electioneering
  • Type of gameplay: non videogame
  • Subject: elections

A game created for the October 2004 Uruguayan presidential elections, commissioned by the left-wing coalition Frente Amplio - Encuentro Progresista. You put together tiles to complete a puzzle that shows positive, uplifting images of Uruguay's future. Some commentary from BBC News is available here. The game itself doesn't seem to be online anymore; let me know if you know of or have a copy.



A Monopoly-inspired board game (though on the computer) with inverted goals: you compete to destroy the economy and tear down the properties, satirizing the ongoing conflict and mismanagement of Robert Mugabe in Zimbabwe. Available for the Mac.

Airport Insecurity

A parodic simulation of airport security practices. Models 138 airports, with varying degrees of inconvenience that nonetheless fail to provide very good security. Mocks your annoying fellow travelers for good measure. Sells for $3.99 for certain Nokia mobile phones, so you can play it while in a security line.


Darfur is Dying

  • URL:
  • Author: USC students
  • Date: April 2006
  • Platform: Web
  • Types of commentary: direct criticism, rhetoric of failure
  • Types of gameplay: resource management, sim
  • Subjects: international affairs, war, human rights

A game designed to raise awareness about the humanitarian situation in Darfur. There are two gameplay modes: In one, you try to manage a camp, which is short on resources and under constant threat of attack. In the other, you send children to go fetch water, amidst threat of attack and abduction. The main point is that the situation is hopeless and intolerable without outside assistance. Winner of a digital-activism competition that led to it being distributed by mtvU.

There are writeups at Gameology and at Serious Games Source.

Airport Security

Another parodic simulation of airport security practices, from the makers of Airport Insecurity, but this time from the perspective of the security personnel who have to enforce absurd changing regulations. Released shortly after a rule change banned liquids in carry-on luggage at U.S. airports.

So You Think You Can Drive, Mel?

You play Mel Gibson trying to drive his car while getting increasingly drunk, without running over state troopers or getting hit by the Stars of David that Hasidic Jews on the side of the road throw at you. Mocks an incident in which Gibson was pulled over for drunk driving and went on a tirade about Jews. The editorial content isn't particularly strong, though Zach Whalen over at Gameology thinks it does have a reasonable amount.

Bacteria Salad

This game was released shortly after a bagged-spinach recall in the U.S. due to E. coli contamination, which affected a surprisingly large number of retail brands due to common sourcing. The game has you manage a farm operation, and trades off running separate small farms (less profitable, but less spread of contamination) versus consolidating them into a gigantic farm (very profitable, but contamination spreads more easily). The actual active gameplay is whack-a-mole style cleaning up of contamination and issuing recalls before anyone gets sick.


Food Import Folly

Released shortly after a contaminated food import scandal in the U.S., this game puts you in an impossible role inspecting food imports with few resources. The gameplay is whack-a-mole style, where you click on imports as they come in to inspect them before a contaminated one can get through, but you have only one guy, who takes some time to inspect a shipment and can't do anything else in the meantime. So the game mainly consists of sitting there losing regardless of what you do—a good example of what designer Ian Bogost calls a "rhetoric of failure". Also notable for being the first "playable editorial cartoon" published by a major newspaper (The New York Times).

Operation: Pedopriest

  • URL:
  • Author: Molleindustria
  • Date: June 2007
  • Platform: Web
  • Types of commentary: direct criticism, satire, rhetoric of failure
  • Type of gameplay: sim
  • Subjects: bureaucracy, religion, human rights

Based on a late-2006 BBC documentary alleging the Vatican had a secret process to deal with priest sex-abuse allegations quietly, this game puts you in the role of trying to protect children from pedophilic priests while also warding off police, parents, and so on. It turns out not to be possible to succeed at protecting the children, though just protecting the priests is possible.

Points of Entry

  • URL: (dead link)
  • Author: Persuasive Games
  • Date: June 2007
  • Platform: Web
  • Type of commentary: satire
  • Type of gameplay: sim
  • Subjects: immigration, bureaucracy

A commentary on a proposed system that would give prospective U.S. immigrants points based on various criteria such as job status, age, English skills, and so on. You play an immigration clerk who has to adjust the stats of a prospective immigrant so that they're better than those of the clerk next to you, but by as small a margin as possible. It's also timed. The scenario seems a bit silly, but it succeeds in making you learn what the proposed point allocations are, in addition to portraying the process as arbitrary and bureaucratic.

Presidential Pong

A fairly simple game, but the first newsgame published by CNN, in which presidential debates are played out as a game of pong. Could be interpreted as a fairly boring shallow skin on pong, or as satirizing the quality and level of earnestness of presidential debates.


  • URL:
  • Author: INM Inter Network Marketing
  • Date: October 2007
  • Platform: Web
  • Types of commentary: direct criticism, electioneering, rhetoric of failure
  • Types of gameplay: classic arcade, whack a mole
  • Subjects: elections, immigration, taxes

An almost comically racist game by the Swiss People's Party (SVP). Has good production values and made some news, so probably one of the more successful political games. It features the party's mascot, Zottel, a very Swiss goat, facing off in four games against abuse of naturalization, illegal immigration, EU tax collectors, and federal government waste. The party had previously courted controversy with a poster that showed Zottel kicking out a black sheep, and the theme reappears here, where in one game you need to keep the black sheep off Switzerland's green pastures without harassing the friendly white sheep. Oh, and watch out for the dastardly Green Party, trying to smuggle illegal immigrants in their party buses! The game seems to have disappeared from the internet, but there are pretty good writeups here and here. Let me know if you have or know of an archived copy.

Matt Blunt Document Destroyer

  • URL:
  • Author: The Democratic Party
  • Date: November 2007
  • Platform: Web
  • Types of commentary: direct criticism, electioneering
  • Type of gameplay: whack a mole
  • Subject: corruption

A whack-a-mole game where you try to stop Missouri governor Matt Blunt from deleting emails, apparently as a response to some sort of email-deleting scandal. The editorial content is rather weak.



  • URL:
  • Author: Conor O'Kane
  • Date: January 2008
  • Platform: Standalone executable
  • Type of commentary: satire
  • Type of gameplay: classic arcade
  • Subject: whaling

A shmup billed as a "Japanese Cetacean Research Simulator". The political commentary is of course in the discontinuity between the billing (cetacean research) and the actual gameplay (a whale-harpooning shmup), paralleling the disconnect between the official and actual purposes of Japan's whaling program. Available for Windows.

I Can End Deportation

  • URL:
  • Author: Breakthrough
  • Date: February 2008
  • Platform: Standalone executable
  • Types of commentary: direct criticism, rhetoric of failure
  • Types of gameplay: sim, quiz
  • Subjects: immigration, bureaucracy

A game released amidst ongoing debate over immigration reform in the United States that aims to highlight the brokenness of the immigration and enforcement systems, and raise some sympathy for immigrants' situation. Gameplay is an open-world game where immigrants need to carry on life while avoiding police and making various choices, some of which are presented explicitly in pop-up quiz type boxes, which then give vaguely didactic correct answers to try to educate the player on the complexities of immigration law. Some gameplay rules make rhetorical points as well, such as the results of immigration trials being basically random.

There's an interesting exchange about the game at Water Cooler Games: Ian Bogost posts an extended critique that gives it a mostly lukewarm review, and lead designer Heidi Boisvert responds, also at some length (fifth comment down).

Available for the Mac and Windows.

Police Brutality

A response to the "Don't tase me, bro!" incident in which a student protestor at a John Kerry event was tasered. You organize students by knocking them out of their torpor and blocking the police. Sort of the opposite of the more common "rhetoric of failure", in which an impossible-to-win game points out the impossibility of a situation—here a possible-to-win game aims to emphasize the possibility of successfully organizing and resisting the police in such a situation.

Available for Windows.

Sevan Janiyan (sevan)

Heads up for RSS subscribers May 19, 2020 11:40 AM

I’m going to be experimenting with migrating from WordPress to Hugo this week. If you subscribe to the RSS feeds on this site and wish to continue to do so, you might want to check everything is ok at your end after Monday the 25th. One of the key factors of migrating to Hugo …

May 18, 2020

Jan van den Berg (j11g)

Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak May 18, 2020 12:06 PM

I have a lot of respect for Bill Gates and tend to follow what he does. So this book, just like the one on Steve Jobs, is a nice reminder of the man’s personality and his thinking process.

As it spans some 30+ years, there are mild variations noticeable, but overall, with Bill Gates what you see is what you get, and that is head-on, rational straightforwardness and a passion for software.

Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak (2012) – 160 pages

The post Impatient Optimist: Bill Gates in His Own Words – Lisa Rogak appeared first on Jan van den Berg.

iSteve – George Beahm en Wim Zefat May 18, 2020 12:05 PM

This is a book just with quotes from late Apple founder Steve Jobs. I already knew most of them, having read more than one book about Steve Jobs. Nonetheless, seeing his most salient quotes in one place is a good indication and reminder of the man’s personality and vision.

iSteve – George Beahm en Wim Zefat (2011) – 160 pages

Since the quotes are all dated I particularly noticed 3 types of Steve.

  • The brash, cocky, young Steve (everything up until 1985, before his Apple exit)
  • The reflective, contemplating Steve (1985–2000, the in-between NeXT/Pixar years)
  • The seasoned, wise Steve (2000–2011)

You can probably date the quotes to one of these three periods based on their spirit.

The timeline after the quotes was a great plus for this book, as were the references! However, the book was not without mistakes: there never was an iPhone 4GS (a 4S, sure), and the iPod was introduced on October 23, 2001 (not in November).

The post iSteve – George Beahm en Wim Zefat appeared first on Jan van den Berg.

May 17, 2020

Jeff Carpenter (jeffcarp)

Machine Learning Reference May 17, 2020 08:49 PM

I often need to look up random bits of ML-related information. This post is a work-in-progress attempt to collect common machine learning terms and formulas into one central place. I plan on updating this post as I come across further useful pieces of information. This reference is not intended to be exhaustive—in fact the opposite—it is intended only to be a concise, opinionated collection of the most relevant bits of ML knowledge for quick lookup.

Derek Jones (derek-jones)

Happy 60th birthday: Algol 60 May 17, 2020 08:40 PM

Report on the Algorithmic Language ALGOL 60 is the title of a 16-page paper appearing in the May 1960 issue of the Communications of the ACM. Probably one of the most influential programming languages, and a language that readers may never have heard of.

During the 1960s there were three well known, widely used, programming languages: Algol 60, Cobol, and Fortran.

When somebody created a new programming language, Algol 60 tended to be their role model. A few of the authors of the Algol 60 report cited beauty as one of their aims, a romantic notion that captured some users' imaginations. Also, the language was full of quirky, out-there features; plenty of scope for pin-head discussions.

Cobol appears visually clunky, is used by business people and focuses on data formatting (a deadly dull, but very important issue).

Fortran spent 20 years catching up with features supported by Algol 60.

Cobol and Fortran are still with us because they never had any serious competition within their target markets.

Algol 60 had lots of competition, and its successor language, Algol 68, was groundbreaking only within its academic niche, i.e., not in a way useful to developers.

Language family trees ought to have Algol 60 at, or close to, their root. But the Algol 60 descendants have been so successful that the creators of these family trees have rarely heard of it.

In the US the ‘military’ language was Jovial, and in the UK it was Coral 66, both derived from Algol 60 (Coral 66 was the first language I used in industry after graduating). I used to hear people saying that Jovial was derived from Fortran; another example of people citing the popular language they know.

Algol compiler implementers documented their techniques (probably because they were often academics); ALGOL 60 Implementation is a real gem of a book, and still worth a read today (as an introduction to compiling).

Algol 60 was ahead of its time in supporting undefined behaviors 😉 Such as: “The effect, of a go to statement, outside a for statement, which refers to a label within the for statement, is undefined.”

One feature of Algol 60 rarely adopted by other languages is its parameter passing mechanism, call-by-name (now that lambda expressions are starting to appear in widely used languages, call-by-name has made a kind of comeback). Call-by-name essentially has the same effect as textual substitution. Given the following procedure (it's not a function because it does not return a value):

procedure swap (a, b);
   integer a, b;
begin
   integer temp;
   temp := a;
   a := b;
   b := temp
end

the effect of the call: swap(i, x[i]) is:

  temp := i;
  i := x[i];
  x[i] := temp

which might come as a surprise to some.

Needless to say, programmers came up with ‘clever’ ways of exploiting this behavior; the most famous being Jensen’s device.
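To make the mechanism concrete, call-by-name's re-evaluate-on-every-access behavior can be simulated in a modern language by passing thunks instead of values. The following JavaScript sketch is an illustration (not Algol, and not anyone's production code); the getter/setter pairing is one of several ways to model assignable call-by-name parameters:

```javascript
// Each argument is a pair of functions that re-evaluate the original
// argument expression on every access, mimicking call-by-name.
function swap(a, b) {
  const temp = a.get(); // temp := a
  a.set(b.get());       // a := b   (re-evaluates b's expression now)
  b.set(temp);          // b := temp (re-evaluates b's expression again)
}

let i = 1;
const x = [10, 20, 30];

// The call swap(i, x[i]) under call-by-name semantics:
swap(
  { get: () => i,    set: v => { i = v; } },
  { get: () => x[i], set: v => { x[i] = v; } }
);

// Trace: temp := i  -> temp = 1
//        i := x[i]  -> i = x[1] = 20
//        x[i] := temp -> x[20] = 1, because i has already changed!
console.log(i);    // 20, not the swapped value you might expect
```

The surprise is exactly the one described above: by the time `x[i] := temp` runs, `i` has already been reassigned, so a different array element is written.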

The following example of go to usage appears in: International Standard 1538 Programming languages – ALGOL 60 (the first and only edition appeared in 1984, after most people had stopped using the language):

go to if Ab  c then L17
  else g[if w  0 then 2 else n]

Orthogonality of language use won out over the goto FUD.

The Software Preservation Group is a great resource for Algol 60 books and papers.

Ponylang (SeanTAllen)

Last Week in Pony - May 17, 2020 May 17, 2020 03:54 PM

Pony 0.35.0 and 0.35.1 have been released! Stable, our little dependency manager that could, has been deprecated in favor of our new dependency manager, Corral.

May 16, 2020

asrpo (asrp)

Speed improvements using hash tables May 16, 2020 03:11 PM

I wrote the Forth Lisp Python Continuum (Flpc)'s self-hosted compiler in stages. When I completed the parser and gave it larger and larger pieces of its own source code, it was running too slow. I tried many things to speed it up; one that helped was using hash tables.

They helped make dictionaries [names] which can

An example of a dictionary:

May 15, 2020

Jeremy Morgan (JeremyMorgan)

What Is Deno and Why Is Everyone Talking About It? May 15, 2020 03:48 PM

Deno is a hot new runtime that may replace Node.js. Everyone’s talking about it like it’s the next big thing. It likely is. Here’s why. What Is Deno? From the manual: Deno is a JavaScript/TypeScript runtime with secure defaults and a great developer experience. It’s built on V8, Rust, and Tokio. Deno is designed to be a replacement for our beloved Node.js, and it’s led by Ryan Dahl, who started the Node.

May 14, 2020

Andrew Montalenti (amontalenti)

Python 3 is here and the sky is not falling May 14, 2020 08:38 PM

James Bennett, a long-time Python developer, blogger, and contributor to Django, recently wrote a nice post about the “end” of Python 2.x, entitled “Variations on the Death of Python 2.” It’s a great read for anyone who, like me, has been in the Python community a long time.

I’ve been a Python user since the early 2.x days, first discovering Python in a print copy of Linux Journal in the year 2000, where a well-known open source developer and advocate described his transition from Perl to Python. He wrote:

I was generating working code nearly as fast as I could type. When I realized this, I was quite startled.

An important measure of effort in coding is the frequency with which you write something that doesn’t actually match your mental representation of the problem, and have to backtrack on realizing that what you just typed won’t actually tell the language to do what you’re thinking. An important measure of good language design is how rapidly the percentage of missteps of this kind falls as you gain experience with the language. When you’re writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you’ve achieved mastery of the language.

But that didn’t make sense, because it was still day one and I was regularly pausing to look up new language and library features!

This was my first clue that, in Python, I was actually dealing with an exceptionally good design.

Python’s wonderful design as a language has always been a source of inspiration for me. I even wrote “The Elements of Python Style”, as an ode to how good Python code, to me, felt like good written prose. And, of course, many of my personal and professional projects are proudly Python Powered.


Thus, I was always a little worried about the Python 2 to 3 transition. I was concerned that this one big risk, taken on by the core team, could imperil the entire language, and thus the entire community. Perl 5 had embarked on a language schism toward Perl 6 (now Raku), and many believe that both communities (Perl 5 and Raku) became weaker as a result.

But, here we are in 2020, and Python 2 is EOL, and Python 3 is here to stay. A lot of the internet debates about Python 2 vs Python 3 (like this flame war) now seem to boil down to this question: was Python 3 a good idea, in retrospect?

Python 3 caused a split in the Python community. It caused confusion about what Python actually is. It also caused a lot of porting pain for many companies, and a decade-long migration effort among major open source libraries.

If you are a relatively recent Python user and have not heard much about the Python 2 vs 3 community-wide migration effort, you can get pointers to some of the history and technical details in this wiki page. There’s a nice tl;dr summary in this Python 3 Q&A. To understand some of the porting pain involved, a somewhat common 2-to-3 porting approach is covered in the official Python 2 to 3 porting guide.

Regardless of the amount of pain caused, ultimately, Python 3 is here now. It works, it’s popular, and the Python core team has officially made its final Python 2.7 release. The community has survived the transition. To quote the Python release manager in a recent email announcing the release of Python 2.7.18:

Python 2.7.18 is a special release. I refer, of course, to the fact that “2.7.18” is the closest any Python version number will ever approximate e, Euler’s number. Simply exquisite! A less transcendent property of Python 2.7.18 is that it is the last Python 2.7 release and therefore the last Python 2 release. It’s time for the CPython community to say a fond but firm farewell to Python 2. Users still on Python 2 can use e to compute the instantaneously compounding interest on their technical debt.

Ubuntu 20.04 LTS — the “Long-Term Support” April 2020 release of one of the world’s most popular desktop and server Linux distributions — includes Python 3.8.2 by default, and includes Python 2.7.18 only optionally (via the python2 and python2-dev packages), for compatibility with old scripts.

On Ubuntu 20.04, you can install a package, python-is-python3, to ensure that the python interpreter and associated commands on your Linux machine run Python 3, which means you can then only access Python 2.7 via commands like python2, pydoc2, pdb2, and so on. The default download links on are for Python 3.x. If a Windows 10 user runs python on a stock install, they're directed to the Windows Store to install Python 3.x. We can only assume the same change will be coming to Mac OS X soon.

Support for Python 2.7 and Django 1.11 ends in 2020, according to the Django project FAQ. Major Python open source project maintainers — such as those behind TensorFlow, scikit-learn, pandas, tornado, PyTorch, PySpark, IPython, and NumPy — have signed a pledge to drop Python 2 support from their projects in 2020.

The Python library “wall of shame” has become a “wall of superpowers”. And it is no longer maintained, since its mission has been accomplished.

So, it’s worth asking some questions.

Now that we’re here, is there any point in resisting any longer? Short answer: no. Python 3 is here to stay, and Python 2 is really, truly end-of-life.

Will Python 4.x ever happen, creating a disruptive 3-to-4 transition akin to 2-to-3? Short answer: no. That’s very unlikely. To quote Brett Cannon:

We will never do this kind of backwards-incompatible change again. We have decided as a team that a change as big as unicode/str/bytes will never happen so abruptly again. […] We don’t see any shortcomings in the fundamental design of the language that could warrant the need to make such a major change.

What can we learn from this experience? What has the grand experiment in language evolution taught us? Short answer: that big changes like this always take longer than you think, even when you take into account Hofstadter’s Law. Breaking backwards compatibility in a large inter-connected open source community has real cost that will test the strength of that community.

We are fortunate that Python’s community was very strong indeed at the time of this transition, and it even grew rapidly during the transition, thanks to the explosion of Python-based web programming (via Django, Flask, Tornado, etc.), numerical computing (via PyData libraries, like NumPy, Pandas), machine learning (via scikit-learn, TensorFlow, PyTorch, etc.), distributed computing (via PySpark, Dask, etc.), cloud computing (boto, google-cloud, libcloud). The network effects driven by these popular communities were like a perpetual motion machine that ensured Python’s adoption and the freshness of libraries with regard to Python 3 support.

The community is even learning to evolve beyond the direction of its creator, the BDFL, who resigned in 2018 and laid the groundwork for a smooth transition to a Python Steering Council.

So, here we are. Where do we go from here? Can the Python community continue to evolve in a positive direction atop the foundation of Python 3.x? Short answer: yes!

Python has never been healthier, and the community has learned many lessons.

So, let’s get on with it! If you’ve been holding back on using Python 3 features for some time, you can get a nice summary from this Python 3 feature round-up, which goes up through Python 3.6. Then, you can check out the official “What’s New” guides for 3.7, 3.8, and 3.9. Some of my favorite new features include:

So, what are you waiting for? It's time to get hacking! Here's to the next Python releases, 3.9 and 3.10 (not 4.0)! And to 3.11, 3.12, 3.13, … thereafter!

Pete Corey (petecorey)

Adding More Chords to Glorious Voice Leader May 14, 2020 12:00 AM

Prior to its latest release, Glorious Voice Leader only let you choose from a pitifully small selection of chords to build a progression with. For a tool whose primary purpose is to guide you through the wondrous world of guitar harmony, this was inexcusable.

Glorious Voice Leader was in dire need of more chord types.

That said, faced with the enormous data entry task of manually adding every chord quality I could think of (of which, here are a few), my programmer instincts kicked in. “Music theory is an organized, systematic area of study,” I told myself. “There has to be a way to algorithmically generate all possible chord qualities,” my naive past self believed.

What a poor fool.

What’s the Problem?

I’ve written and re-written this post a handful of times now, and each time I’ve failed to sufficiently capture the complexity of the task at hand. Regardless of the direction I’ve tackled it from, be it generating names directly and inferring notes from that name, or inferring a name from a collection of notes, this is an incredibly complicated problem.

Music is art, and music theory exists to describe it. And unfortunately for me, people have been describing music in various ways for a very long time. This means that music theory is deeply cultural, deeply rooted in tradition, and not always as systematic as we’d like to believe it to be.

The first thing we need to do when coming up with “all possible chord qualities” is decide which tradition we want to follow. For the purposes of Glorious Voice Leader, I'm concerned largely with the jazz tradition of chord naming, which evolved to describe the chords used in modern popular music.

But even within a single niche, ambiguities and asymmetries abound!

A “maj7/6” chord has the same notes as a “maj13 no 9”, assuming your “maj13” chords don’t have an 11th.

Some folks assume that a “maj13” chord includes a natural 11th. Some assume it includes a sharpened 11th. Others still assume that the 11th is omitted entirely from a “maj13” chord.

Is “aug9” an acceptable chord name, or should it be “9#5”? Both qualities share the same set of notes, and both should be understandable to musicians, but only the latter is the culturally accepted name.

Speaking of alterations like “#5” and “b9”, which order should these appear in the chord name? Sorted by the degree being altered? Or sorted by importance? More concretely, is it a “7b9#5” chord, or a “7#5b9”?

Many notes in a chord are optional, including the root note! A Cmaj13 without a 1st or 5th is perfectly acceptable. Even the third can be optional. But is a Cmaj13 without a 1st, 3rd, and 5th still a Cmaj13? At what point does a chord with missing notes cease to be that chord?

The subtleties and nuances go on and on.

A More Human Approach

Rather than fully automating the generation of chord qualities and names through algorithmic means, I decided to take a more human approach. I start with a large set of human-accepted chord formulas and their corresponding names:

const baseQualities = [
  ["maj", "1 3 5"],
  ["maj6", "1 3 5 6"],
  ["maj/9", "1 3 5 9"],
  ["maj6/9", "1 3 5 6 9"],
  ["maj7", "1 3 5 7"],
  // ...
];

From there, we can modify our formulas to specify which notes in the chord are optional. It’s important to note that when specifying optional notes, any or all of those notes may be missing and the name must still make sense.

const baseQualities = [
  ["maj", "1 3 5"],
  ["maj6", "1 3 (5) 6"],
  ["maj/9", "1 3 (5) 9"],
  ["maj6/9", "(1) 3 (5) 6 9"],
  ["maj7", "(1) 3 (5) 7"],
  // ...
];

So a chord with a formula of “1 3 5 6 9”, “3 5 6 9”, “1 3 6 9”, or “3 6 9” can still be considered a “maj6/9” chord.
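That expansion amounts to removing every subset of the optional degrees. As a minimal plain-JavaScript sketch of the idea (the real implementation, shown next, uses lodash):

```javascript
// Expand "(1) 3 (5) 6 9" into every formula it can stand for
// by removing each subset of the parenthesized (optional) degrees.
const formula = "(1) 3 (5) 6 9";
const tokens = formula.split(" ");
const degrees = tokens.map(token => token.replace(/[()]/g, ""));
const optionals = tokens
  .filter(token => token.startsWith("("))
  .map(token => token.replace(/[()]/g, ""));

// All subsets of the optionals (the power set).
const subsets = optionals.reduce(
  (acc, degree) => acc.concat(acc.map(subset => [...subset, degree])),
  [[]]
);

const formulas = subsets.map(missing =>
  degrees.filter(degree => !missing.includes(degree)).join(" ")
);

console.log(formulas);
// → ["1 3 5 6 9", "3 5 6 9", "1 3 6 9", "3 6 9"]
```

Two optional degrees yield 2² = 4 formulas; a formula with n optionals yields 2ⁿ.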

For every [name, formula] pair, we’ll tease out the full set of scale degrees and the set of optionals. From there, we find all _.combinations of those optionals that we’ll remove from the list of degrees. For each combination, removing those missing degrees from the full list produces a new formula, along with a name specifying which degrees are missing:

export const qualities = _.chain(baseQualities)
  .flatMap(([name, formula]) => {
    let degrees = _.chain(formula)
      .split(" ")
      .map(degree => _.trim(degree, "()"))
      .value();
    let optionals = _.chain(formula)
      .split(" ")
      .filter(degree => _.startsWith(degree, "("))
      .map(degree => _.trim(degree, "()"))
      .value();
    // _.combinations is a lodash mixin, not part of lodash core.
    return _.chain(_.range(_.size(optionals) + 1))
      .flatMap(i => _.combinations(optionals, i))
      .map(missing => {
        let remaining = _.difference(degrees, missing);
        let result = {
          name: _.chain(missing)
            .map(degree => `no ${_.replace(degree, /#|b/, "")}`)
            .join(" ")
            .thru(missingString => _.trim(`${name} ${missingString}`))
            .value(),
          formula: _.join(remaining, " "),
          degrees: remaining,
          missing
        };
        result.value = JSON.stringify(result);
        return result;
      })
      .value();
  })

Some formulas have so many optional notes that removing enough of them results in a chord with fewer than three notes. We don’t want that, so we’ll add one final filter to our qualities chain:

export const qualities = _.chain(baseQualities)
  .flatMap(([name, formula]) => { ... })
  .reject(({ degrees }) => {
    return _.size(degrees) < 3;
  })
  .value();

And that’s all there is to it.
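As a sanity check on that final guard, here’s the same rejection step sketched in plain JavaScript — the entries below are illustrative stand-ins, not actual Glorious Voice Leader output:

```javascript
// Reject any generated quality whose remaining degrees would
// describe a "chord" of fewer than three notes.
const generated = [
  { name: "maj6/9", degrees: ["1", "3", "5", "6", "9"] },
  { name: "maj6/9 no 1 no 5", degrees: ["3", "6", "9"] },
  { name: "hypothetical no 1 no 5", degrees: ["3", "7"] } // too thin to keep
];

const kept = generated.filter(({ degrees }) => degrees.length >= 3);

console.log(kept.map(({ name }) => name));
// → ["maj6/9", "maj6/9 no 1 no 5"]
```

Anything with only two degrees is an interval, not a chord, so it never reaches the user.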

Final Thoughts

From a set of eighty-two baseQualities that I entered and maintain by hand, this algorithm generates three hundred twenty-four total qualities that users of Glorious Voice Leader are free to choose from.

This list is by no means exhaustive, but with this approach I can easily change and add to it, without concern for the oddities and asymmetries of how actual humans name the chords they play.

A part of me still believes that an algorithmic approach can generate chord quality names that fall in line with human expectations, but I haven’t found it. I imagine this is one of those problems that will live in the back of my mind for years to come.

May 13, 2020

Gustaf Erikson (gerikson)

May 12, 2020

Jan van den Berg (j11g)

I, Robot – Isaac Asimov May 12, 2020 03:33 PM

If one writer is responsible for how we think about robots it is, of course, Isaac Asimov. The terrifically prolific writer and groundbreaking author of the science-fiction genre produced numerous works with remarkable futuristic insight — and some were about robots. I, Robot is a seminal work in this oeuvre. But this book is of course not really about robots, or the famous law of robotics.

I, Robot – Isaac Asimov (1950) – 271 pages

No, this law is a vehicle, for these 9 loosely connected stories to present — very clever — logical puzzles often with a philosophical or ethical undertone. And this is what makes this work hold up, even after 70 years (this was written in 1950 🤯).

Our views on robots might have changed but the questions remain valid. And it is not so much the robots that Asimov makes us think about, but what it means to be human.

The post I, Robot – Isaac Asimov appeared first on Jan van den Berg.

May 11, 2020

Jeremy Morgan (JeremyMorgan)

Setting Up Pop!_OS for Front End Development May 11, 2020 08:29 PM

If you’ve heard all the chatter lately about Pop!_OS and have been wanting to try it out, here’s a good guide to setting up a front end development machine. If you’re relatively new to Linux and are just trying it out, I suggest building this in a Virtual Machine. I have the full instructions for installing Pop!_OS in a virtual machine here. This is the best way to dip your toes in without significant modifications to your computer.

Bogdan Popa (bogdan)

Continuations in Racket's Web Server May 11, 2020 08:55 AM

In The Missing Guide to Racket’s Web Server, I said that dispatch/servlet is equivalent to: 1 2 3 (lambda (start) (lambda (conn req) (output-response conn (start req)))) That was an oversimplification. It does apply its start argument to incoming requests and it does take care of writing the responses to the appropriate connections, but it has another important job: to handle responses returned from continuations and to dispatch incoming requests to captured continuations.

May 10, 2020

Derek Jones (derek-jones)

Having all the source code in one file May 10, 2020 10:27 PM

An early, and supposedly influential, analysis of the Coronavirus outbreak was based on results from a model whose 15,000 line C implementation was contained in a single file. There has been lots of tut-tutting from the peanut gallery, about the code all being in one file rather than distributed over many files. The source on Github has been heavily reworked.

Why do programmers work with all the code in one file, rather than split across multiple files? What are the costs and benefits of having the 15K of source in one file, compared to distributing it across multiple files?

There are two kinds of people who work with code all in one file: novices and really capable developers. Richard Stallman is an example of a very capable developer who worked using files containing huge amounts of code, as anybody who has looked at the early sources of gcc will know all too well.

The benefit of having all the code in one file is that it is easy to find stuff and make global changes. If the source is scattered over multiple files, then working on the code entails knowing which file to look in to find whatever is needed; there is a learning curve (these days screens have lots of pixels, and editors support multiple windows with a different file in each window; I’m sure lots of readers work like this).

Many years ago, when 64K was a lot of memory, I sometimes had to do developer support: people would come to me complaining that the computer was preventing them writing a larger program. What had happened was they had hit the capacity limit of the editor. The source now had to be spread over multiple files to get over this ‘limitation’. In practice people experienced the benefits of using multiple files, e.g., editor loading files faster (because they were a lot smaller) and reduced program build time (because only the code that changed needed to be recompiled).

These days, 15K of source can be loaded or compiled in a blink of an eye (unless a really cheap laptop is being used). Computing power has significantly reduced these benefits that used to exist.

What costs might be associated with keeping all the source in one file?

Monolithic code makes sharing difficult. I don’t know anything about the development environment within which these researchers worked. If there were lots of different programs using the same algorithms, or reading/writing the same file formats, then code reuse often provides a benefit that makes it worthwhile splitting off the common functionality. But then the researchers have to learn how to build a program from multiple source files, which a surprising number are unwilling to do (at least it has always been surprising to me).

Within a research group, sharing across researchers might be possible (assuming they are making some use of the same algorithms and file formats). Involving multiple people in the ongoing evolution of software creates a need for some coordination. At the individual level it may be more cost-efficient for people to have their own private copies of the source, with savings only occurring at the group level. With software development having a low status in academia, I don’t see any of the senior researchers willingly taking on a management role for this code. Perhaps one of the people working on the code is much better than the others (it often happens), but are they going to volunteer themselves as chief dogsbody for the code?

In the world of Open Source, where source code is available, cut-and-paste is rampant (along with wholesale copying of files). Working with a copy of somebody else’s source removes a dependency, and if their code works well enough, then go for it.

A cost often claimed by the peanut gallery is that having all the code in a single file is a signal of buggy code. Given that most of the programmers who do this are novices, rather than really capable developers, such code is likely to contain many mistakes. But splitting the code up into multiple files will not reduce the number of mistakes it contains, just distribute them among the files. Correlation is not causation.

For an individual developer, the main benefit of splitting code across multiple files is that it makes developers think about the structure of their code.

For multi-person projects there are the added potential benefits of reusing code, and reducing the time spent reading other people’s code (it’s no fun having to deal with 10K lines when only a few functions are of interest).

I’m not saying that the original code is good, bad, or indifferent. What I am saying is that the having all the source in one file may, or may not, be the most effective way of working. It’s complicated, and I have no problem going with the flow (and limiting the size of the source files I write), but let’s not criticise others for doing what works for them.

Carlos Fenollosa (carlesfe)