Planet Crustaceans

This is a Planet instance for lobste.rs community feeds. To add/update an entry or otherwise improve things, fork this repo.

August 06, 2020

Frederic Cambus (fcambus)

NetBSD on the NanoPi NEO2 August 06, 2020 08:41 PM

The NanoPi NEO2 from FriendlyARM has been serving me well since 2018, being my test machine for OpenBSD/arm64 related things.

As NetBSD/evbarm finally gained support for AArch64 in NetBSD 9.0, released back in February, I decided to give it a try on this device. The board only has 512MB of RAM, and this is where NetBSD really shines. Things have become a lot easier since jmcneill@ now provides bootable ARM images for a variety of devices, including the NanoPi NEO2.

On first boot, the system will resize the filesystem to automatically expand to the size of the SD card.

Growing ld0 MBR partition #1 (1052MB -> 60810MB)
Growing ld0 disklabel (1148MB -> 60906MB)
Resizing /
/dev/rld0a: grow cg |************************************                 |  69%

Once the system is up and running, we can add a regular user in the wheel group:

useradd -m -G wheel username

And set a password for the newly created user:

passwd username

From there we do not need the serial console anymore and can connect to the device using SSH.

NetBSD has binary packages available for this architecture, and installing and configuring pkgin can be done as follows:

export PKG_PATH=https://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/aarch64/9.0/All/
pkg_add pkgin
echo $PKG_PATH > /usr/pkg/etc/pkgin/repositories.conf
pkgin update

The base system can be kept up to date using sysupgrade, which can be installed via pkgin:

pkgin in sysupgrade

The following variable needs to be set in /usr/pkg/etc/sysupgrade.conf:

RELEASEDIR="https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/evbarm-aarch64"
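
With RELEASEDIR configured, a full upgrade should then be a matter of running sysupgrade's auto mode:

sysupgrade auto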

Lastly, the device has two user-controllable LEDs which can be toggled on and off using sysctl.

To switch both LEDs on:

sysctl -w hw.led.nanopi_green_pwr=1
sysctl -w hw.led.nanopi_blue_status=1

To switch off the power LED automatically at boot time:

echo "hw.led.nanopi_green_pwr=0" >> /etc/sysctl.conf

Here is a dmesg for reference purposes:

[     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[     1.000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[     1.000000]     2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
[     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[     1.000000]     The Regents of the University of California.  All rights reserved.

[     1.000000] NetBSD 9.0_STABLE (GENERIC64) #0: Wed Aug  5 15:20:21 UTC 2020
[     1.000000] 	mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64
[     1.000000] total memory = 497 MB
[     1.000000] avail memory = 479 MB
[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] armfdt0 (root)
[     1.000000] simplebus0 at armfdt0: FriendlyARM NanoPi NEO 2
[     1.000000] simplebus1 at simplebus0
[     1.000000] simplebus2 at simplebus0
[     1.000000] cpus0 at simplebus0
[     1.000000] simplebus3 at simplebus0
[     1.000000] psci0 at simplebus0: PSCI 1.1
[     1.000000] cpu0 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu0: package 0, core 0, smt 0
[     1.000000] cpu0: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000000] cpu0: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.000000] cpu0: Dcache line 64, Icache line 64
[     1.000000] cpu0: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.000000] cpu0: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.000000] cpu0: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.000000] cpu0: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.000000] cpu0: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.000000] cpu1 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu1: package 0, core 1, smt 0
[     1.000000] cpu2 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu2: package 0, core 2, smt 0
[     1.000000] cpu3 at cpus0: Cortex-A53 r0p4 (Cortex V8-A core)
[     1.000000] cpu3: package 0, core 3, smt 0
[     1.000000] gic0 at simplebus1: GIC
[     1.000000] armgic0 at gic0: Generic Interrupt Controller, 224 sources (215 valid)
[     1.000000] armgic0: 16 Priorities, 192 SPIs, 7 PPIs, 16 SGIs
[     1.000000] fclock0 at simplebus2: 24000000 Hz fixed clock (osc24M)
[     1.000000] sunxisramc0 at simplebus1: SRAM Controller
[     1.000000] fclock1 at simplebus2: 32768 Hz fixed clock (ext_osc32k)
[     1.000000] gtmr0 at simplebus0: Generic Timer
[     1.000000] gtmr0: interrupting on GIC irq 27
[     1.000000] armgtmr0 at gtmr0: Generic Timer (24000 kHz, virtual)
[     1.000000] timecounter: Timecounter "armgtmr0" frequency 24000000 Hz quality 500
[     1.000010] sun8ih3ccu0 at simplebus1: H3 CCU
[     1.000010] sun8ih3rccu0 at simplebus1: H3 PRCM CCU
[     1.000010] sunxide2ccu0 at simplebus1: DE2 CCU
[     1.000010] sunxigpio0 at simplebus1: PIO
[     1.000010] gpio0 at sunxigpio0: 94 pins
[     1.000010] sunxigpio0: interrupting on GIC irq 43
[     1.000010] sunxigpio1 at simplebus1: PIO
[     1.000010] gpio1 at sunxigpio1: 12 pins
[     1.000010] sunxigpio1: interrupting on GIC irq 77
[     1.000010] fregulator0 at simplebus0: vcc3v3
[     1.000010] fregulator1 at simplebus0: usb0-vbus
[     1.000010] fregulator2 at simplebus0: gmac-3v3
[     1.000010] sun6idma0 at simplebus1: DMA controller (12 channels)
[     1.000010] sun6idma0: interrupting on GIC irq 82
[     1.000010] com0 at simplebus1: ns16550a, working fifo
[     1.000010] com0: console
[     1.000010] com0: interrupting on GIC irq 32
[     1.000010] sunxiusbphy0 at simplebus1: USB PHY
[     1.000010] sunxihdmiphy0 at simplebus1: HDMI PHY
[     1.000010] sunximixer0 at simplebus1: Display Engine Mixer
[     1.000010] sunxilcdc0 at simplebus1: TCON1
[     1.000010] sunxilcdc0: interrupting on GIC irq 118
[     1.000010] sunxirtc0 at simplebus1: RTC
[     1.000010] emac0 at simplebus1: EMAC
[     1.000010] emac0: Ethernet address 02:01:f7:f9:2f:67
[     1.000010] emac0: interrupting on GIC irq 114
[     1.000010] rgephy0 at emac0 phy 7: RTL8211E 1000BASE-T media interface
[     1.000010] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[     1.000010] h3codec0 at simplebus1: H3 Audio Codec (analog part)
[     1.000010] sunximmc0 at simplebus1: SD/MMC controller
[     1.000010] sunximmc0: interrupting on GIC irq 92
[     1.000010] motg0 at simplebus1: 'otg' mode not supported
[     1.000010] ehci0 at simplebus1: EHCI
[     1.000010] ehci0: interrupting on GIC irq 104
[     1.000010] ehci0: EHCI version 1.0
[     1.000010] ehci0: 1 companion controller, 1 port
[     1.000010] usb0 at ehci0: USB revision 2.0
[     1.000010] ohci0 at simplebus1: OHCI
[     1.000010] ohci0: interrupting on GIC irq 105
[     1.000010] ohci0: OHCI version 1.0
[     1.000010] usb1 at ohci0: USB revision 1.0
[     1.000010] ehci1 at simplebus1: EHCI
[     1.000010] ehci1: interrupting on GIC irq 110
[     1.000010] ehci1: EHCI version 1.0
[     1.000010] ehci1: 1 companion controller, 1 port
[     1.000010] usb2 at ehci1: USB revision 2.0
[     1.000010] ohci1 at simplebus1: OHCI
[     1.000010] ohci1: interrupting on GIC irq 111
[     1.000010] ohci1: OHCI version 1.0
[     1.000010] usb3 at ohci1: USB revision 1.0
[     1.000010] sunxiwdt0 at simplebus1: Watchdog
[     1.000010] sunxiwdt0: default watchdog period is 16 seconds
[     1.000010] /soc/gpu@1e80000 at simplebus1 not configured
[     1.000010] gpioleds0 at simplebus0: nanopi:green:pwr nanopi:blue:status
[     1.000010] /soc/timer@1c20c00 at simplebus1 not configured
[     1.000010] /soc/video-codec@1c0e000 at simplebus1 not configured
[     1.000010] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.000010] cpu2: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.000010] cpu2: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.040229] cpu2: Dcache line 64, Icache line 64
[     1.040229] cpu2: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.050220] cpu2: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.060220] cpu2: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.070220] cpu2: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.070220] cpu2: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.090221] cpu1: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.090221] cpu1: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.100222] cpu1: Dcache line 64, Icache line 64
[     1.110221] cpu1: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.110221] cpu1: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.120222] cpu1: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.130222] cpu1: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.140223] cpu1: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.150222] cpu3: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
[     1.160223] cpu3: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
[     1.160223] cpu3: Dcache line 64, Icache line 64
[     1.170223] cpu3: L1 32KB/64B 2-way read-allocate VIPT Instruction cache
[     1.180223] cpu3: L1 32KB/64B 4-way write-back read-allocate write-allocate PIPT Data cache
[     1.180223] cpu3: L2 512KB/64B 16-way write-back read-allocate write-allocate PIPT Unified cache
[     1.190223] cpu3: revID=0x180, PMCv3, 4k table, 64k table, 16bit ASID
[     1.200224] cpu3: auxID=0x11120, FP, CRC32, SHA1, SHA256, AES+PMULL, NEON, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
[     1.210224] sdmmc0 at sunximmc0
[     1.240225] uhub0 at usb0: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.240225] uhub0: 1 port with 1 removable, self powered
[     1.240225] uhub1 at usb2: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[     1.250226] uhub1: 1 port with 1 removable, self powered
[     1.250226] uhub2 at usb1: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.260226] uhub2: 1 port with 1 removable, self powered
[     1.260226] uhub3 at usb3: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[     1.275641] uhub3: 1 port with 1 removable, self powered
[     1.275641] IPsec: Initialized Security Association Processing.
[     1.350228] sdmmc0: SD card status: 4-bit, C10, U1, A1
[     1.350228] ld0 at sdmmc0: <0x03:0x5344:SC64G:0x80:0x0cd9141d:0x122>
[     1.360690] ld0: 60906 MB, 7764 cyl, 255 head, 63 sec, 512 bytes/sect x 124735488 sectors
[     1.370228] ld0: 4-bit width, High-Speed/SDR25, 50.000 MHz
[     1.990242] boot device: ld0
[     1.990242] root on ld0a dumps on ld0b
[     2.000243] root file system type: ffs
[     2.010242] kern.module.path=/stand/evbarm/9.0/modules

Marc Brooker (mjb)

Surprising Economics of Load-Balanced Systems August 06, 2020 12:00 AM

The M/M/c model may not behave like you expect.

I have a system with c servers, each of which can only handle a single concurrent request, and has no internal queuing. The servers sit behind a load balancer, which contains an infinite queue. An unlimited number of clients offer c * 0.8 requests per second to the load balancer on average. In other words, we increase the offered load linearly with c to keep the per-server load constant. Once a request arrives at a server, it takes one second to process, on average. How does the client-observed mean request latency vary with c?

Option A is that the mean latency decreases super-linearly, approaching one second as c increases (in other words, the time spent in queue approaches zero). Option B is constant. Option C is a linear improvement, and D is a linear degradation in latency. Which curve do you, intuitively, think that the latency will follow?

I asked my Twitter followers the same question, and got an interestingly mixed result:

Breaking down the problem a bit will help figure out which is the right answer. First, names. In the terminology of queueing theory, this is an M/M/c queuing system: Poisson arrival process, exponentially distributed client service time, and c backend servers. In teletraffic engineering, it's Erlang's delay system (or, because terminology is fun, M/M/n). We can use one of the classic results of queueing theory to analyze this system, Erlang's C formula E2,n(A), which calculates the probability that an incoming customer request is enqueued (rather than handled immediately), based on the number of servers (n aka c), and the offered traffic A. For the details, see page 194 of the Teletraffic Engineering Handbook. Here's the basic shape of the curve (using our same parameters):

Follow the blue line up to half the saturation point, at 2.5 rps offered load, and see how the probability is around 13%. Now look at the purple line at half its saturation point, at 5 rps. Just 3.6%. So at half load the 5-server system is handling 87% of traffic without queuing; with double the load and double the servers, we handle 96.4% without queuing. Which means only 3.6% see any additional latency.
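
To make those numbers concrete, here's a small Python sketch (my own, not from the original post) that evaluates Erlang's C formula for the two cases above:

from math import factorial

def erlang_c(c, a):
    # Erlang's C formula: probability that an arriving request has to queue
    # in an M/M/c system with c servers and offered load a (in erlangs).
    top = (a ** c / factorial(c)) * (c / (c - a))
    return top / (sum(a ** k / factorial(k) for k in range(c)) + top)

print(erlang_c(5, 2.5))   # ~0.130 -> about 13% of requests queue
print(erlang_c(10, 5.0))  # ~0.036 -> about 3.6% queue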

It turns out this improvement is, indeed, better than linear. The right answer to the Twitter poll is A.

Using the mean to measure latency is controversial (although perhaps it shouldn't be). To avoid that controversy, we need to know whether the percentiles get better at the same rate. Doing that in closed form is somewhat complicated, but this system is super simple, so we can plot them out using a Monte-Carlo simulation. The results look like this:

That's entirely good news. The median (p50) follows the mean line nicely, and the high percentiles (99th and 99.9th) have a similar shape. No hidden problems.
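
For reference, a bare-bones version of that kind of simulation might look something like the following Python sketch (my own construction, assuming FIFO queueing behind the load balancer and exponential service times):

import heapq, random

def simulate_mmc(c, lam, mu=1.0, n=200_000, seed=1):
    # Event-driven M/M/c: Poisson arrivals at rate lam, c servers,
    # exponential service at rate mu, FIFO queue at the load balancer.
    random.seed(seed)
    free_at = [0.0] * c                          # when each server next frees up
    heapq.heapify(free_at)
    t, latencies = 0.0, []
    for _ in range(n):
        t += random.expovariate(lam)             # next arrival
        start = max(t, heapq.heappop(free_at))   # wait for the earliest free server
        done = start + random.expovariate(mu)
        heapq.heappush(free_at, done)
        latencies.append(done - t)               # queueing delay plus service time
    return sorted(latencies)

lat = simulate_mmc(c=10, lam=8.0)    # 0.8 requests per second per server, as above
print(sum(lat) / len(lat))           # mean latency
print(lat[len(lat) // 2])            # median (p50)
print(lat[int(0.99 * len(lat))])     # p99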

It's also good news for cloud and service economics. With larger c we get better latency at the same utilization, or better utilization for the same latency, all at the same per-server throughput. That's not good news only for giant services, because most of this goodness happens at relatively modest c. There are few problems related to scale and distributed systems that get easier as c increases. This is one of them.

There are some reasonable follow-up questions. Are the results robust to our arbitrary choice of 0.8? Yes, they are. Are the M/M/c assumptions of Poisson arrivals and exponential service time reasonable for typical services? I'd say they are reasonable, albeit wrong. Exponential service time is especially wrong: realistic services tend to be something more like log-normal. It may not matter. More on that another time.

August 05, 2020

Andrew Owen (yumaikas)

Art Challenge: The Middle Grind August 05, 2020 10:50 PM

The story so far

Emily came across an art challenge on Pinterest, and suggested that we could both do each prompt for it.

An art challenge that lists out 30 days of art prompts

Her medium of preference is pencil and ink, and mine is pixel art. This, unlike the previous post, covers 16 entries, because I fell behind in blog posts.

It’s also longer, and it definitely represents both Emily and me being ready for the art challenge to be done.

Day 9: Urban Legend

Emily

An ink sketch of a wendigo

Andrew

A pixel picture of a weeping Mary statue

Day 10: Insect

Emily

A drawing of an iridescent beetle with a blue shell

Andrew

A pixel art drawing of a dragonfly

Day 11: Something you ate today

Emily

A nice looking ink sketch of a bagel, with a pen and an eraser on the sketch book

Andrew

SomethingIAteToday.png

Day 12: Your Spirit Animal

Emily

A detailed ink drawing of a bat

Andrew

A pixel-art picture of a squirrel sitting on a porch (or jumping over a log)

Day 13: Song Lyrics / Your Happy Place

Emily

A picture of Emily wrapped in a blanket on a couch, with a lamp, tissue box, phone, and Nintendo Switch

Andrew

A pixel art picture of my laptop, with Aseprite open.

Day 14: Historical Figure

Emily

An ink picture of a corset

Andrew

An attempt to make a pixel art photo of Ada Lovelace

Day 15: Guilty Pleasure

Emily

A sketch of a Yellow Nintendo Switch with Stardew Valley on the screen

Andrew

An abstract grid of white, blue and brown grid squares, representative of a Scrabble Board

Day 16: Zodiac Sign

Emily

A picture of a Capricorn goat with horns and a webbed mane

Andrew

The Aquarius sign is superimposed over a big yellow moon over the waves, with a small lighthouse in the background

Day 17: Favorite TV Show

Emily

A picture of a naked Homer Simpson, his butt facing the viewer.

Andrew

A picture of

Day 18: Something with Wings

Emily

A picture of a bat with 3 jack-o-lanterns, which is nibbling on the largest jack-o-lantern

Andrew

A picture of a bat

Day 19: Famous Landmark

Emily

One of the sections from Stonehenge

Andrew

A pixel-art picture of the pyramids of Giza

Day 20: Beverage

Emily

A drawing of a cup of water

Andrew

A picture of a cup of water

Day 21: Teeth

Emily

A picture of a Zombie Skull with prominent teeth

Andrew

A pixel-art picture of an alligator skull

Day 22: Earth Day

Emily

A picture of the earth, with clouds, being held up by a pair of hands

Andrew

A pixel-art picture of the earth

Day 23: Dessert

Emily

A cupcake with sprinkles

Andrew

An ice cream cone on a metal stand with little chocolate chips

Day 24: Movie Prop

Emily

A drawing of the cat from Kiki's Delivery Service.

Andrew

A pixel-drawing of Wilson the volleyball from Cast Away

August 04, 2020

Pepijn de Vos (pepijndevos)

A Rust HAL for your LiteX FPGA SoC August 04, 2020 12:00 AM

ULX3S demo

FPGAs are amazing in their versatility, but can be a real chore when you have to map out a giant state machine just to talk to some chip over SPI. For such cases, nothing beats just downloading an Arduino library and quickly hacking some example code. Or would there be a way to combine the versatility of an FPGA with the ease of Arduino libraries? That is the question I want to explore in this post.

Of course you can use an f32c softcore on your FPGA as an Arduino, but that’s a precompiled core, and basically doesn’t give you the ability to use your FPGA powers. Or you can build your own SoC with custom HDL components, but then you’re back to bare-metal programming.

Unless you can tap into an existing library ecosystem by writing a hardware abstraction layer for your SoC. And that is exactly what I’ve done by writing a Rust embedded HAL crate that works for any LiteX SoC!

LiteX allows you to assemble a SoC by connecting various components to a common Wishbone bus. It supports various RISC-V CPUs (and more), and has a library of useful components such as GPIO and SPI, but also USB and Ethernet. These all get memory-mapped and can be accessed via the Wishbone bus by the CPU and other components.

The amazing thing is that LiteX can generate an SVD file for the SoC, which contains all the registers of the components you added to the SoC. This means that you can use svd2rust to compile this SVD file into a peripheral access crate.

This PAC crate abstracts away memory addresses, and since the peripherals themselves are reusable components, it is possible to build a generic HAL crate on top of it that supports a certain LiteX peripheral in any SoC that uses it. Once the embedded HAL traits are implemented, you can use these LiteX peripherals with every existing Rust crate.

The first step is to install LiteX. Due to a linker bug in Rust 1.45, I used the 1.46 beta. I’m also installing into a virtualenv to keep my system clean. While we’re going to use Rust, gcc is still needed for compiling the LiteX BIOS and for some objcopy action.

#rustup default beta
virtualenv env
source env/bin/activate
wget https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
chmod +x litex_setup.py
./litex_setup.py init install
./litex_setup.py gcc
export PATH=$PATH:$(echo $PWD/riscv64-*/bin/)

Now we need to make some decisions about which FPGA board and CPU we’re going to use. I’m going to be using my ULX3S, but LiteX supports many FPGA boards out of the box, and others can of course be added. For the CPU we have to pay careful attention to match it with an architecture that Rust supports. For example, VexRiscv supports the im feature set by default, which is not a supported Rust target, but it also supports i and imac variants, both of which Rust supports. PicoRV32 only supports i or im, so it can only be used in combination with the Rust i target.

So let’s go ahead and make one of those. I’m going with the VexRiscv imac variant, but on a small iCE40 you might want to try the PicoRV32 (or even Serv) to save some space. Of course substitute the correct FPGA and SDRAM module on your board.

VexRiscv:

cd litex-boards/litex_boards/targets
python ulx3s.py --cpu-type vexriscv --cpu-variant imac --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32imac-unknown-none-elf

PicoRV32:

python ulx3s.py --cpu-type picorv32 --cpu-variant minimal --csr-data-width 32 --device LFE5U-85F --sdram-module AS4C32M16 --csr-svd ulx3s.svd --build --load
rustup target add riscv32i-unknown-none-elf

Most parameters should be obvious. The --csr-data-width 32 parameter sets the register width, which I’m told will be the default in the future, and saves a bunch of bit shifting later on. --csr-svd ulx3s.svd tells LiteX to generate an SVD file for your SoC. You can omit --build and --load and manually do these steps by going to the build/ulx3s/gateware/ folder and running build_ulx3s.sh. I also prefer to use the awesome openFPGALoader rather than the funky ujprog with a sweet openFPGALoader --board ulx3s ulx3s.bit.

Now it is time to generate the PAC crate with svd2rust. This crate is completely unique to your SoC, so there is no point in sharing it. As long as the HAL crate can find it you’re good. Follow these instructions to create a Cargo.toml with the right dependencies. In my experience you may want to update the version numbers a bit. I had to use the latest riscv and riscv-rt to make stuff work, but keep the other versions to not break the PAC crate.

cargo new --lib litex-pac
cd litex-pac/src
svd2rust -i ulx3s.svd --target riscv
cd ..
vim Cargo.toml

Now we can use these instructions to create our first Rust app that uses the PAC crate. I pushed my finished example to this repo. First create the app as usual, and add dependencies. You can refer to the PAC crate as follows.

litex-pac = { path = "../litex-pac", features = ["rt"]}

Then you need to create a linker script that tells the Rust compiler where to put stuff. Luckily LiteX generated the important parts for us, and we only have to define the correct REGION_ALIAS expressions. Since we will be using the BIOS, all our code will get loaded in main_ram, so I set all my aliases to that. It is possible to load code in other regions, but my attempts to put the stack in SRAM failed horribly when the stack grew too large, so better start with something safe and then experiment.

REGION_ALIAS("REGION_TEXT", main_ram);
REGION_ALIAS("REGION_RODATA", main_ram);
REGION_ALIAS("REGION_DATA", main_ram);
REGION_ALIAS("REGION_BSS", main_ram);
REGION_ALIAS("REGION_HEAP", main_ram);
REGION_ALIAS("REGION_STACK", main_ram);

Next, you need to actually tell the compiler about your architecture and linker scripts. This is done with the .cargo/config file. This should match the Rust target you installed, so be mindful if you are not using imac. Note the regions.ld file that LiteX generated, we’ll get to that in the next step.

[target.riscv32imac-unknown-none-elf]
rustflags = [
  "-C", "link-arg=-Tregions.ld",
  "-C", "link-arg=-Tmemory.x",
  "-C", "link-arg=-Tlink.x",
]

[build]
target = "riscv32imac-unknown-none-elf"

The final step before jumping in with the Rust programming is writing a build.rs file that copies the linker scripts to the correct location for the compiler to find them. I mostly used the example provided in the instructions, but added a section to copy the LiteX file. export BUILD_DIR to the location where you generated the LiteX SoC.

    let mut f = File::create(&dest_path.join("regions.ld"))
        .expect("Could not create file");
    f.write_all(include_bytes!(concat!(env!("BUILD_DIR"), "/software/include/generated/regions.ld")))
        .expect("Could not write file");

That’s it. Now the code you compile will actually get linked correctly. I found these iCEBreaker LiteX examples very useful to get started. This code will actually run with minimal adjustment on our SoC, and is a good start to get a feel for how the PAC crate works. Another helpful command is to run cargo doc --open in the PAC crate to see the generated documentation.

To actually upload the code, you have to convert the binary first.

cargo build --release
cd target/riscv32imac-unknown-none-elf/release
riscv64-unknown-elf-objcopy litex-example -O binary litex-example.bin
litex_term --kernel litex-example.bin /dev/ttyUSB0

From here we “just” need to implement HAL traits on top of the PAC to be able to use almost any embedded library in the Rust ecosystem. However, one challenge is that the peripherals and their names are not exactly set in stone. The way that I solved it is that the HAL crate only exports macros that generate HAL trait implementations. This way your SoC can have 10 SPI cores and you just have to call the spi macro to generate a HAL for them. I uploaded the code in this repo.

Of course so far we’ve only used the default SoC defined for the ULX3S. The real proof is if we can add a peripheral, write a HAL layer for it, and then use an existing library with it. I decided to add an SPI peripheral for the OLED screen. First I added the following pin definition

    ("oled_spi", 0,
        Subsignal("clk",  Pins("P4")),
        Subsignal("mosi", Pins("P3")),
        IOStandard("LVCMOS33"),
    ),
    ("oled_ctl", 0,
        Subsignal("dc",   Pins("P1")),
        Subsignal("resn", Pins("P2")),
        Subsignal("csn",  Pins("N2")),
        IOStandard("LVCMOS33"),
    ),

and then the peripheral itself

    def add_oled(self):
        pads = self.platform.request("oled_spi")
        pads.miso = Signal()
        self.submodules.oled_spi = SPIMaster(pads, 8, self.sys_clk_freq, 8e6)
        self.oled_spi.add_clk_divider()
        self.add_csr("oled_spi")

        self.submodules.oled_ctl = GPIOOut(self.platform.request("oled_ctl"))
        self.add_csr("oled_ctl")

This change has actually been accepted upstream, so now you can just add the --add-oled command line option and you get a brand new SoC with an SPI controller for the OLED display. Once the PAC is generated again and the FullDuplex trait has been implemented for it, it is simply a matter of adding the SSD1306 or SSD1331 crate, and copy-pasting some example code. Just as easy as an Arduino, but on your own custom SoC!

August 03, 2020

Jeremy Morgan (JeremyMorgan)

Better Title Case in Go August 03, 2020 05:15 PM

In this article, I’ll show you how you can create better titles in Go. We’ll be using the strings library from the Go Standard Library for this tutorial. You’ll often have a string input that you want to change the casing of, and it’s easy with Go.

Lower Case

If you want to change your text to lowercase, use the strings.ToLower method: package main import ( "fmt" "strings" ) func main() { fmt.

Anish Athalye (anishathalye)

Organizing Data Through the Lens of Deduplication August 03, 2020 04:00 AM

Our home file server has been running since 2008, and over the last 12 years, it has accumulated more than 4 TB of data. The storage is shared between four people, and it tends to get disorganized over time. We also had a problem with duplicated data (over 500 GB of wasted space), an issue that is intertwined with disorganization. I wanted to solve both of these problems at once, and without losing any of our data. Existing tools didn’t work the way I wanted, so I wrote Periscope to help me clean up our file server.

Periscope works differently from most other duplicate file finders. It’s designed to be used interactively to explore the filesystem, understand which files are duplicated and where duplicates live, and safely delete duplicates, all without losing any data. Periscope enables exploring the filesystem with standard tools — the shell, and commands like cd, ls, tree, and so on — while providing additional duplicate-aware commands that mirror core filesystem utilities. For example, psc ls gives a directory listing that highlights duplicates, and psc rm deletes files only if a duplicate copy exists. Here is Periscope in action on a demo dataset:

The demo uses a small synthetic dataset. For the real thing, there were a lot more duplicates; here are the stats prior to the cleanup:

$ psc summary
  tracked 669,718
   unique 175,672
duplicate 494,046
 overhead  515 GB

Early attempts

The first time I tried to clean up our file server, I used well-known duplicate file finders like fdupes and its enhanced fork jdupes. At a high level, these programs scan the filesystem and output a list of duplicates. After that, you’re on your own. When I scanned the server, the tools found 494,046 duplicate files wasting a total of 515 GB of space. Going through these manually, one at a time, would be infeasible.

Many tools have a mode where they can prompt the user and delete files; with such a large number of duplicates, this would not be useful. Some tools have features that help with space savings but not with organization: hard linking duplicates, automatically deleting duplicates chosen arbitrarily, and automatically deleting duplicates chosen based on a heuristic like path depth. These features wouldn’t work for me.

I had a hypothesis that a lot of duplicate data was the result of entire directories being copied, so if the duplicate summary could merge duplicate directories rather than listing individual files, the output might be more manageable. I tried implementing this functionality, and I soon found out that this merging strategy works well for perfect copies, but it does not work well when folders have partial overlap, and most of the duplicate data on the server was like that. I tried to work around the issue and handle partial overlap through analyzing subset relationships between directories, but I basically ended up with a gigantic Venn diagram; I couldn’t figure out a clean and useful way to visualize the information.

Patterns of disorganization

I manually inspected some of the data on our file server to understand where duplicates came from and how they should be cleaned up, and I started noticing patterns:

  • A directory of organized data alongside a “to organize” directory. For example, we had organized media in “/Photos/{year}/{month}/{event name}”, and unorganized media in “/Unorganized”, in directories like “D300S Temp Copy Feb 11 2012”. In some cases the data inside the copies was fully represented in the organized photos directory hierarchy, but in other cases there were unique files that needed to be preserved and organized.
  • A directory snapshotted at different times. In many cases, it wasn’t necessary to keep multiple backups; we just needed the full set of unique files.
  • A redundant backup of an old machine. Nowadays we use Borg for machine backups, but in the past, we had cases where entire machines were backed up temporarily, such as before migrating to a new machine. Most of this data was copied to the new machine and subsequently backed up as part of that machine’s backups, but the old copy remained. Most of this data could be deleted, but in some cases there were files that were unique and needed to be preserved.
  • Duplicates in individuals’ personal directories. We organize some shared data like photos in a shared location, and other data in personal folders. We had some data that was copied in both locations.
  • Manually versioned documents. We had documents like “Essay.doc”, “Essay v2.doc”, “Essay v3.doc”, where some of the versions were identical to each other.

Generalizing from these patterns, I felt that an interactive tool would work best for cleaning up the data. The tool should support organizing data one directory at a time, listing directories and inspecting files to understand where duplicates live. I also wanted a safe wrapper around rm that would let me delete duplicates but not accidentally lose data by deleting a unique file. Additionally, I wanted a way to delete files in one directory only if they were present in another, so I could recursively delete everything in “/Unorganized” that was already present in “/Photos”.

Periscope

Periscope implements the functionality summarized above. A psc scan searches for duplicate files in the same way as other duplicate file finders but it caches the information in a database. After that, commands like psc ls can run fast by leveraging the database. Commands like psc summary and psc report show high-level information on duplicates, psc ls and psc info enable interactively exploring the filesystem, and psc rm safely deletes duplicates.

More information on the Periscope workflow and commands is available in the documentation.

Related work

There are tons of duplicate file finders out there — fdupes, jdupes, rmlint, ddh, rdfind, fslint, duff, fddf, and fclones — to name a few. These tools find and print out duplicates; some have additional features like prompting for deletion or automatically deleting dupes based on heuristics. They were not suitable for my use case.

dupd is a utility that scans for duplicates, saves information to a database, and then allows for exploring the filesystem while querying the duplicate database for information. It was a source of inspiration for Periscope. The tools have somewhat differing philosophies and currently have two key differences: Periscope aims to provide commands that mirror coreutils counterparts (e.g. psc ls is not recursive, unlike dupd), and Periscope provides commands to safely delete files (one of dupd’s design goals is to not delete files). These seem essential for “scaling up” and handling a large volume of duplicates.

Download

Periscope is free and open source software. Documentation, code, and binaries are available on GitHub.

Ponylang (SeanTAllen)

Last Week in Pony - August 2, 2020 August 03, 2020 01:46 AM

Pony 0.36.0 has been released! We recommend upgrading as soon as possible.

August 02, 2020

Derek Jones (derek-jones)

Scientific management of software production August 02, 2020 10:04 PM

When Frederick Taylor investigated the performance of workers in various industries, at the start of the 1900s, he found that workers organise their work to suit themselves; workers were capable of producing significantly more than they routinely produced. This was hardly news. What made Taylor’s work different was that having discovered the huge difference between actual worker output and what he calculated could be achieved in practice, he was able to change work practices to achieve close to what he had calculated to be possible. Changing work practices took several years, and the workers did everything they could to resist it (Taylor’s The principles of scientific management is an honest and revealing account of his struggles).

Significantly increasing worker output pushed company profits through the roof, and managers everywhere wanted a piece of the action; scientific management took off. Note: scientific management is not a science of work, it is a science of the management of other people’s work.

The scientific management approach has been successfully applied to production where most of the work can be reduced to purely manual activities (i.e., requiring little thinking by those who performed them). The essence of the approach is to break down tasks into the smallest number of component parts, to simplify these components so they can be performed by less skilled workers, and to rearrange tasks in a way that gives management control over the production process. Deskilling tasks increases the size of the pool of potential workers, decreasing labor costs and increasing the interchangeability of workers.

Given the almost universal use of this management technique, it is to be expected that managers will attempt to apply it to the production of software. The software factory was tried, but did not take off. The use of chief programmer teams had its origins in the scarcity of skilled staff; the idea is that somebody who knows what they are doing divides up the work into chunks that can be implemented by less skilled staff. This approach is essentially the early stages of scientific management, but it did not gain traction (see “Programmers and Managers: The Routinization of Computer Programming in the United States” by Kraft).

The production of software is different in that once the first copy has been created, the cost of reproduction is virtually zero. The human effort invested in creating software systems is primarily cognitive. The division between management and workers is along the lines of what they think about, not between thinking and physical effort.

Software systems can be broken down into simpler components (assuming all the requirements are known), but can the implementation of these components be simplified such that they can be implemented by less skilled developers? The process of simplification is practical when designing a system for repetitive reproduction (e.g., making the same widget again and again), but the first implementation of anything is unlikely to be simple (and only one implementation is needed for software).

If it is not possible to break down the implementation such that most of the work is easy to do, can we at least hire the most productive developers?

How productive are different developers? Programmer productivity has been a hot topic since people started writing software, but almost no effective research has been done.

I have no idea how to measure programmer productivity, but I do have some ideas about how to measure their performance (a high performance programmer can have zero productivity by writing programs, faster than anybody else, that don’t do anything useful, from the client’s perspective).

When the same task is repeatedly performed by different people it is possible to obtain some measure of average/minimum/maximum individual performance.

Task performance improves with practice, and an individual’s initial task performance will depend on their prior experience. Measuring performance based on a single implementation of a task provides some indication of minimum performance. To obtain information on an individual’s maximum performance they need to be measured over multiple performances of the same task (and of course working in a team affects performance).

Should high performance programmers be paid more than low performance programmers (ignoring the issue of productivity)? I am in favour of doing this.

What about productivity payments, e.g., piece work?

This question is a minefield of issues. Manual workers have been repeatedly found to set informal quotas amongst themselves, i.e., setting a maximum on the amount they will produce during a shift (see “Money and Motivation: An Analysis of Incentives in Industry” by William Whyte). Thankfully, I don’t think I will be in a position to have to address this issue anytime soon (i.e., I don’t see a reliable measure of programmer productivity being discovered in the foreseeable future).

Frederik Braun (freddyb)

Reference Sheet for Principals in Mozilla Code August 02, 2020 10:00 PM

Note: This is the reference sheet version. The details and the big picture are covered in Understanding Web Security Checks in Firefox (Part 1).

Principals as a level of privilege

A security context is always using one of these four kinds of Principals:

  • ContentPrincipal: This principal is used for typical …

Eric Faehnrich (faehnrich)

WarRiding August 02, 2020 06:52 PM

I recently found the StreetComplete app for Android. It turns filling in OpenStreetMap data into a Pokemon Go type game where you go to "Quests" that are locations that need more info.

Not too long ago I also found WiGLE which is a network discovery and mapping tool.

I could always use a reason to ride my bike, and helping add data to OpenStreetMap is a good one. Do that while I have WiGLE going and my Pwnagotchi along, and I have myself a WarRiding setup.

Gustaf Erikson (gerikson)

July August 02, 2020 04:58 PM

Pete Corey (petecorey)

Descending Dungeon Numbers in J August 02, 2020 12:00 AM

Neil Sloane of the On-Line Encyclopedia of Integer Sequences has graced us with another appearance on Numberphile. In this video he discusses the sequence of “descending dungeon” numbers, their origins, how to construct them, and why they’re interesting.

I suggest you watch the video for a full explanation, but as a quick introduction, the sequence of descending dungeon numbers is generated by repeatedly reinterpreting a number as if it were written in a higher and higher base.

The sequence begins at 10. From there, we interpret 10 as if it were written in base 11, which gives us 11 in decimal (1 * 11 + 0). The next number in the sequence is 11 interpreted in base 12, or 13 (1 * 12 + 1). 13 is followed by 16. 16 is followed by 20, and so on.

Let’s try our hand at modeling the descending dungeon using the J programming language.

At the heart of our descending dungeon is the interpretation of a number in a given base. J’s “base” verb (#.) lets us do just this, assuming our number is split into a list of its digits:


   11 #. 1 0
11
   12 #. 1 1
13
   13 #. 1 3
16
   14 #. 1 6
20

So it looks like we’re already halfway there! Now we need a way of splitting a number into its component digits so we can feed it into #.. It turns out that we can do this using the inverse of the base verb, #.inv or #.^:_1:


   10 #.inv 10
1 0
   10 #.inv 11
1 1
   10 #.inv 13
1 3
   10 #.inv 16
1 6

Great! We can now write a verb that accepts our current descending dungeon number and the next base to interpret it in, and spits out the next descending dungeon number in our sequence:


   11 (#.10&#.inv) 10
11
   12 (#.10&#.inv) 11
13
   13 (#.10&#.inv) 13
16
   14 (#.10&#.inv) 16
20

We’ll probably want to flip the order of our arguments to make it more ergonomic to feed in our list of bases, and reduce down to our sequence values:


   10 (#.10&#.inv)~ 11
11
   11 (#.10&#.inv)~ 12
13
   13 (#.10&#.inv)~ 13
16
   16 (#.10&#.inv)~ 14
20

With that change, we can easily “insert” (/) our verb between each element of our list of bases, and come up with the descending dungeon number at that point in the sequence:


   (#.10&#.inv)~/ 10
10
   (#.10&#.inv)~/ 10 11
11
   (#.10&#.inv)~/ 10 11 12
13
   (#.10&#.inv)~/ 10 11 12 13
16

We can “infix” (\) this verb to apply it to all successive parts of our list to build our full list of descending dungeon numbers:


   (#.10&#.inv)~/\ 10 11 12 13
10 11 13 16
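
If J isn’t your thing, here’s a rough Python translation of that fold (a sketch of my own; note that J’s insert (/) folds from the right, so the digits of each earlier number are read in the base built up from the rest of the list):

def read_in_base(n, base):
    # Reinterpret the decimal digits of n as digits in the given base.
    value = 0
    for digit in str(n):
        value = value * base + int(digit)
    return value

def dungeon(bases):
    # Right fold over the list of bases, mirroring J's (#.10&#.inv)~/ .
    acc = bases[-1]
    for b in reversed(bases[:-1]):
        acc = read_in_base(b, acc)
    return acc

print([dungeon(list(range(10, 10 + k))) for k in range(1, 5)])
# [10, 11, 13, 16]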

Now all that’s left is to clean up how we build our list of bases and plot our descent into the dungeon:


   plot (#.10&#.inv)~/\ 10 + i. 20

The video asks the question, “how quickly do these numbers grow?” The answer seems to be “very slowly, and then very quickly.” But where does that shift happen? The numbers of the sequence that we’ve seen so far seem to be increasing by a linearly increasing amount.

10 + 1 is 11. 11 + 2 is 13. 13 + 3 is 16. 16 + 4 is 20, and so on…

Let’s use J to calculate the difference between successive numbers in our descending dungeons sequence:


   sequence =: (#.10&#.inv)~/\ 10 + i. 20
   2(-~/;._3) sequence
1 2 3 4 5 6 7 8 9 10 22 48 104 224 480 1024 2176 4608 9728

We can look at this as the discrete derivative of our sequence, or the difference between each successive element. We can see that for the first ten steps into our descending dungeon, this delta increases linearly. However once we get to the eleventh step, things go off the rails.

So it seems like we can stand by our answer and definitively say that the descending dungeons sequence grows “very slowly, and then very quickly.” Good work, everyone. Mystery solved.

Overall, this was an interesting exercise in exploring numerical bases, and a fun excuse to practice our J skills. Here’s hoping Neil Sloane stars in another Numberphile video very soon!

August 01, 2020

Mark J. Nelson (mjn)

US universities announcing online Fall 2020 August 01, 2020 12:00 PM

A list of four-year universities in the United States that announced before August that their fall 2020 undergraduate classes will be taught all or almost all online, sorted by date of announcement.

There's variation within the plans in the list. Some are exclusively online, while others plan to have a limited number of in-person courses, e.g. science labs. Also, some universities plan to have the dorms and on-campus services open at reduced capacity, while others plan to have the campus mostly closed.


May 5 – California State University system: "our planning approach will result in CSU courses primarily being delivered virtually for the fall 2020 term, with limited exceptions".

June 11 – University of California Irvine (Irvine, CA): "Almost all undergraduate courses will be delivered in a remote format in the fall quarter. A few exceptions are being evaluated, and consist of specialized upper-division labs, specific clinical and experiential courses, and some design courses in Engineering."

June 15 – Harvard University (Cambridge, MA): "regardless of where our students are living, whether on campus or at home, learning will continue to be remote next year, with only rare exceptions" (Faculty of Arts and Sciences, which includes the undergraduate college).

June 16 – University of California Davis (Davis, CA): "When fall quarter instruction starts Sept. 30, the campus plans to offer most courses remotely, though some courses will also be available in person, depending on health guidelines and instructor preference."

June 17 – University of California Riverside (Riverside, CA): "No instructor will be required to teach in-person and no student will be required to participate in-person until the campus returns to normal operations".

June 17 – University of California Santa Cruz (Santa Cruz, CA): "UC Santa Cruz will offer most courses remotely or online and provide in-person instruction for a small number of courses that cannot be delivered remotely, as is the case for some laboratory, studio and field study courses".

June 22 – Bowdoin College (Brunswick, ME): "In order to provide the best learning experience possible, nearly all classes, including those on campus, will be taught online."

June 22 – University of Massachusetts Boston (Boston, MA): "Certain lab courses in the sciences and nursing courses that require the use of the simulation center will remain on campus. The rest of the curriculum will be delivered to you via remote instruction."

June 25 – Haskell Indian Nations University (Lawrence, KS): "Haskell President Ronald Graham told the Journal-World Thursday that all classes would be held virtually for the fall semester".

June 26 – New School (New York, NY): "All classes will be online this fall. Given what we know today, we believe that remote learning is the best option for the health and safety of the entire New School community and for preventing the spread of the virus".

June 29 – University of Massachusetts Amherst (Amherst, MA): "Only essential face-to-face labs, studios, performance, and other courses involving hands-on work will be conducted on campus and in-person. ... All other courses will be delivered remotely."

June 30 – Wilmington University (New Castle, DE): "No one knows if COVID-19 will continue to spread at its current pace. The virus is still in its first phase, and there is growing concern that numbers will increase, or possibly result in a second wave. Due to these uncertainties and the need to keep our community safe, courses will remain online for the fall 2020 semester."

June 30 – Zaytuna College (Berkeley, CA): "Zaytuna's leadership has decided that instruction for the Fall 2020 semester will be conducted online for both the undergraduate and graduate programs".

July 1 – Hampton University (Hampton, VA): "out of an abundance of caution for the health, safety and welfare of our students as well as the faculty, administrative staff, administrators, maintenance and custodial staff, and others with whom students might interact, Hampton University will provide remote instruction only for the first semester of academic year 2020-2021".

July 1 – Texas College (Tyler, TX): "Our efforts are not to compete with other entities and how they respond, but rather give consideration to the needs of our students, faculty and staff and internally assess what is needed for a safe environment pursuant to the resources we have available to us. With this as the backdrop of our planning, a decision has been made to offer online instruction only for the fall term."

July 1 – University of Southern California (Los Angeles, CA): "our undergraduate students primarily or exclusively will be taking their courses online in the fall term".

July 6 – Princeton University (Princeton, NJ): "Based on the information now available to us, we believe ... we will need to do much of our teaching online and remotely."

July 6 – Rutgers University (New Brunswick, NJ): "I am writing today to inform you that after careful consideration of all possible models for safely and effectively delivering instruction during the ongoing coronavirus pandemic, Rutgers is planning for a Fall 2020 semester that will combine a majority of remotely delivered courses with a limited number of in-person classes".

July 7 – Marymount Manhattan College (New York, NY): "Given new developments in the COVID-19 global pandemic, we have adopted a Virtual Classes/Open Campus model in which all classes will be offered in an online format".

July 8 – Pomona College (Claremont, CA): "As the public health situation deteriorated over the last two weeks, we had to look at the facts and make a responsible decision: In this unfolding emergency, we will not be able to bring students back to campus in the fall."

July 8 – Scripps College (Claremont, CA): "The Administration and Board of Trustees of the College have determined that our community can best achieve its mission and maintain safety by offering Scripps classes online during the fall 2020 semester".

July 10 – Jarvis Christian College (Hawkins, TX): "To limit exposure to the virus, we will continue our online classes for fall 2020".

July 10 – Savannah College of Art and Design (Savannah/Atlanta, GA): "Following careful deliberation of all reasonable options, the university is announcing that, as of right now, Fall 2020 on-ground courses will be delivered primarily virtually for SCAD Atlanta and SCAD Savannah students — with some exceptions to address the needs of certain programs and students".

July 10 – West Chester University (Chester County, PA): "My leadership team and I have made the decision to continue remote learning through the fall 2020 semester, with a few courses delivered in a hybrid format, meaning both in-person and remote, in order to assist those students with clinical placements, student teaching, performance obligations, internship sites, and similar academic responsibilities."

July 13 – Loyola University Chicago (Chicago, IL): "we have decided that the best plan for the upcoming fall semester is to shift most of our class offerings online".

July 14 – Bennett College (Greensboro, NC): "After careful research, analysis and consideration, we have made a decision to operate remotely for the Fall semester."

July 14 – Loyola Marymount University (Los Angeles, CA): "This fall will be unlike any other semester: Our undergraduate courses will be principally and primarily conducted remotely".

July 14 – Simmons University (Boston, MA): "Our thoughtful process has led us to decide that all of our teaching and activities will be online for the Fall 2020 semester, with very few exceptions".

July 14 – University of San Francisco (San Francisco, CA): "Based on the surge of COVID-19 cases in San Francisco and California, Gov. Newsom’s announcements yesterday about rolling back the state’s reopening plans, and specific instructions issued to higher education institutions today by the San Francisco Department of Public Health (SFDPH), it is clear that we need to pivot to USF’s operations being primarily remote for the fall 2020 semester. This means nearly all academic courses will be online — save for certain exceptions such as those in clinical nursing programs."

July 15 – Dickinson College (Carlisle, PA): "we have come to the very difficult decision that the fall 2020 semester will be remote".

July 15 – Lesley University (Cambridge, MA): "We do not want to repeat the disruptive experience of last spring where we had to shut down our campus on short notice, so most campus facilities will remain closed at least through the end of 2020".

July 15 – Occidental College (Los Angeles, CA): "Today we are announcing that for the Fall 2020 semester all instruction will be remote".

July 15 – Rhodes College (Memphis, TN): "I write with a heavy heart to let you know that despite our hopes and plans, the external health conditions in Memphis do not support an on-campus fall semester".

July 17 – University of the Pacific (Stockton, CA): "Unfortunately, our regions’ flat and comparatively low rates of COVID-19 cases experienced through the spring have rapidly accelerated over the past month. Therefore, we have determined that it would be unwise to reopen our campuses as we had hoped and planned."

July 20 – California College of the Arts (San Francisco/Oakland, CA): "We write today with the unfortunate news that our fall semester courses will be conducted entirely remotely (online)."

July 20 – Clark Atlanta University (Atlanta, GA): "Today, Clark Atlanta University (CAU) announced the move to remote learning for all students during the Fall 2020 semester, taking the necessary precautions to ensure the safety of its students and entire CAU community."

July 20 – Grinnell College (Grinnell, IA): "It is thus with a heavy heart that, after consulting with campus leaders and the Board of Trustees Executive Committee, and discussing a multitude of options, we have determined that Fall Term 1 classes will be offered remotely".

July 20 – Morehouse College (Atlanta, GA): "I am writing to inform you of the difficult decision I have made to remain virtual for the Fall 2020 semester".

July 20 – Spelman College (Atlanta, GA): "Because of the worsening health crisis, we have reluctantly come to the realization that we can no longer safely sustain a residential campus and in-person instruction. With a sense of great disappointment, I now share with you our decision that all instruction for the fall of 2020 at Spelman will be virtual."

July 21 – University of California Berkeley (Berkeley, CA): "The increase in cases in the local community is of particular concern. Given this development, as well as it being unlikely that there will be a dramatic reversal in the public health situation before the fall semester instruction begins on Aug. 26, we have made the difficult decision to begin the fall semester with fully remote instruction."

July 22 – Azusa Pacific University (Azusa, CA): "we will now pivot to remote learning in an online modality this fall".

July 22 – Clemson University (Clemson, SC): "Clemson University will begin the Fall semester online and will delay in-person instruction until Sept. 21".

July 22 – Edinboro University (Edinboro, PA): "after careful consideration, we have decided to move most of our courses online for the fall semester".

July 22 – Lafayette College (Easton, PA): "I am sorry to say that we will not be reconvening as a community on Aug. 17. Instead, all fall semester courses at Lafayette will be offered online".

July 22 – Pepperdine University (Malibu, CA): "we have decided we can best protect the health and well-being of our students, faculty, and staff by conducting our fall semester online".

July 22 – University of Delaware (Newark, DE): "we feel it is necessary to shift our plan until conditions improve. The majority of our academic courses in the fall 2020 semester will be delivered online".

July 23 – Randolph College (Lynchburg, VA): "The simple truth is that we do not see the situation in our country improving before our campus opens to our full student body in a month's time. Because of this, we are not confident the College would be able to remain in-person the entire semester without serious COVID-19-caused disruptions. ... the College has decided to move its instruction online for the fall semester."

July 23 – South Carolina State University (Orangeburg, SC): "The recent significant escalation in infections in South Carolina and the Orangeburg community has caused us to revisit all of our plans to date for this coming fall semester. As a result, we will start the Fall Semester 2020 with all classes being delivered remotely".

July 23 – Washington State University system: "For the Fall 2020 semester, all undergraduate courses at WSU, with very few exceptions, will be delivered at a distance and will be completed remotely, with extremely limited exceptions for in-person instruction."

July 24 – Claremont McKenna College (Claremont, CA): "Given the recent, substantial increases in COVID-19 infection, hospitalization, and death rates in California and Los Angeles County and, even more decisively, the absence of necessary state and county authorization for residential, in-person higher education programs to reopen, we will not be allowed to resume on-campus learning in the fall."

July 24 – Lyon College (Batesville, AR): "With a heavy heart, I am reaching out to inform you that on Thursday, the Board of Trustees determined remote instruction for the fall would be in the best interest of the College".

July 24 – Pitzer College (Claremont, CA): "Recently ... it became abundantly clear that in spite of the challenges and financial pain, the wisest and most responsible action was to shift our focus and devote all of our energy into creating the most robust and engaging on-line learning communities possible".

July 24 – Whitman College (Walla Walla, WA): "we have made the extremely difficult decision that the fall 2020 semester will primarily be via remote learning".

July 27 – Agnes Scott College (Decatur, GA): "It is with a profound sense of sadness and disappointment that I write to inform you that we have made the painful decision to move to fully online courses for the fall semester."

July 27 – George Washington University (Washington, DC): "we have made the difficult decision to hold all undergraduate courses online for the fall semester, with limited exceptions".

July 29 – Georgetown University (Washington, DC): "Courses for all undergraduate and graduate students will begin in virtual mode. Due to the acceleration of the spread of the virus and increasing restrictions on interstate travel we cannot proceed with our original plans for returning to campus this fall."

July 30 – American University (Washington, DC): "These evolving health conditions and government requirements now compel us to adjust our plan and offer fall semester undergraduate and graduate courses online with no residential experience."

July 30 – California Baptist University (Riverside, CA): "Today I am announcing that courses will be delivered primarily through live/synchronous remote instruction when CBU's fall semester begins".

July 30 – Johnson C. Smith University (Charlotte, NC): "Because the rate of transmission of the coronavirus shows no sign of slowing down and in the interest of the health and safety of everyone in the JCSU family and our community, the Board of Trustees, the Administration and I have made the difficult decision to deliver instruction solely online for the fall 2020 semester."

July 31 – Goucher College (Towson, MD): "It is with a great deal of disappointment that I write to inform you that, in consultation with the Fall Reopening Task Force and the Board of Trustees, I have made the difficult decision that our undergraduate students should not return back to campus this fall, and instead we should prepare to deliver this semester’s courses entirely online with the majority of our students studying from home."

July 31 – Queens University of Charlotte (Charlotte, NC): "It is with profound sadness and disappointment that I let you know we have made the decision to move to 100% virtual instruction for the fall semester, with no residential experience."

* * *

I collected the dates above either from dated announcements on the university's website, or in cases where announcements were undated (surprisingly common), from the date they were posted on the university's Twitter feed, mentioned in news articles, etc.

There's a bit of a judgment call in what I've counted as an announcement of being "almost" entirely online. I've included universities that say there are "limited exceptions" or similar, but not those that aim for a significant percentage of classes to be in-person. For example, I didn't include UCLA's June 15 announcement, which some news stories reported as "mostly online", because their stated 15-20% of classes in-person or hybrid seems to me to be too high to count as almost-all online.

See also: Here’s a List of Colleges’ Plans for Reopening in the Fall from the Chronicle of Higher Education.

Opening for a funded Masters student August 01, 2020 12:00 PM

I'm recruiting a funded Masters student to study how AI bots play games. The goal is to systematically understand the kinds of difficulty posed by games to AI algorithms, as well as the robustness of any conclusions. Some example experiments include: looking at how performance scales with parameters such as CPU time and problem size; how sensitive results are to rule variations, choice of algorithm parameters, etc.; and identification of games that maximally differentiate algorithm performance. Two previous papers of mine that give some flavor of this kind of research: [1], [2].

The primary desired skill is ability to run computational simulations, and to collect and analyze data from them. The available funding would pay for four semesters of full-ride Masters tuition, plus 15-20 hours/week of a work-study job during the academic year. The American University Game Lab offers three Masters-level degrees: the MS in Computer Science's Game & Computational Media track, the MA in Game Design, and the MFA in Games and Interactive Media.

The successful applicant would be funded on the National Science Foundation grant Characterizing Algorithm-Relative Difficulty of Agent Benchmarks. This does not have any citizenship/nationality requirements.

Anyone interested should both apply for the desired Masters program through the official application linked above (deadline July 1, though earlier is better), and email me to indicate that they would like to be considered for this scholarship. It's also fine to email me with inquiries before applying.

August 2020 update: This position has now been filled!

Eric Faehnrich (faehnrich)

The Embroidered Computer August 01, 2020 04:00 AM

RSS-only test post!

The Embroidered Computer.

An 8-bit computer made with embroidery. Conductive threads serve as the circuits, with sewn-in relays used as switches to create the digital logic.

2 Beads moving

July 31, 2020

Aaron Bieber (qbit)

Unlocking SSH FIDO keys on device connect. July 31, 2020 11:43 PM

The problem

As a lazy type, I often find it trying to type “ssh-add -K” over and over. I even felt depleted typing it here!

Fortunately for me, OpenBSD makes it trivial to resolve this issue. All we need is:

The adder

This script will run our …hnnnssh-add -K.. command:

#!/bin/sh

trap 'ssh-add -K' USR1

while true; do
	sleep 1;
done

Notice the trap line there? More on that later! This script should be called via /usr/local/bin/fido & from ~/.xsession or similar. The important thing is that it runs after you log in.
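As a concrete sketch (the path is the one used throughout this post; adjust it if you keep the script elsewhere), the relevant ~/.xsession line would be:

# ~/.xsession — start the unlock helper in the background at login
/usr/local/bin/fido &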

The watcher

hotplugd (in OpenBSD base) does things when stuff happens. That’s just what we need!

This script (/etc/hotplugd/attach) will be called every time we attach a device:

#!/bin/sh

DEVCLASS=$1
DEVNAME=$2

case "$DEVNAME" in
	fido0)
		pkill -USR1 -xf "/bin/sh /usr/local/bin/fido"
	;;
esac

Notice that pkill command with USR1? That’s the magic that hits our trap line in the adder script!

Now enable / start up hotplugd:

# rcctl enable hotplugd
# rcctl start hotplugd

That’s it!

If you have all these bits in place, you should see ssh-askpass pop up when you connect a FIDO key to your machine!
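If you want to test the plumbing without physically re-plugging the key, you can send the adder the same signal hotplugd would (this just reuses the pkill line from the attach script above):

pkill -USR1 -xf "/bin/sh /usr/local/bin/fido"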

Here is a video of it in action:


Thanks to kn@ for the USR1 suggestion! It really helped me be more lazy!

Eric Faehnrich (faehnrich)

Toward a Simpler Site July 31, 2020 05:00 AM

insert XKCD comic where they say most posts on an infrequently-updated site are about how they haven't posted much

I've been meaning to simplify this site. It had been static pages built with Jekyll. But I had been reading more about "indieweb" and just simpler web sites and realized I want this site even simpler.

Even though this site was made of static web pages, it was generated with Jekyll. Something even simpler would be to just create the pages directly. This became apparent when I did my last refresh install of my computer and didn't have Jekyll installed. I ran into issues re-installing/running it.

I've also been interested in simpler sites that don't need/have javascript. I have the Firefox addon NoScript installed to block javascript by default on sites. I get annoyed when a site doesn't work without javascript. And these are sites that just show text. They shouldn't need javascript just to show raw text!

I realized my site didn't fully work without javascript either. The main text showed, but the link icons to RSS or my twitter didn't work.

Also, I did like the theme I downloaded for Jekyll for my site. But since I look at text on computer screens all day, I find the dark themes in my code editors easier on my eyes, so I wanted that for my site as well.

The update to my site would then have very minimal styling and no javascript, except on pages where the point is some app on the page.

I've stolen the styling from several places; I don't even remember where. I particularly like the trick I found with how I did the favicon. I also realize the color scheme is kinda modern-day cyberpunky, which I'm alright with. The pagespeed test is pretty good.

Another reason for the update was how personal sites were back in the day. They were just a collection of web pages linked together. Hence the term web. It made for a good time exploring someone's site.

Now, even if a person has a site, it's mostly in the form of a blog so all pages are now ordered posts. I'm finding this too restricting and I want more of the old web of a bunch of linked sites.

This idea was explored in this post, The Garden and the Stream. They're talking more about how information is linked and not just pages or posts, but what I'm thinking of is similar. For instance, I could have a page of links I think are interesting. I'd update it with new links, but how to post that if all you have is a feed?

With completely hand-edited HTML pages, I have the control to just edit the page and then hand-edit the RSS feed with a note saying the page was updated. There wouldn't be a post on the site corresponding to that entry in the RSS, so if you subscribed to my RSS you'd get the update, but it wouldn't clutter my posts section.

All pages will be navigable from the home page; hopefully there will be a hierarchy that makes sense, but every page can be found through links.

Posts will be the traditional idea of a post, and the entire text of each will also appear in its RSS entry so it can be read in an RSS reader. Pages will be just that, and could be updated. Those updates won't make the entire page show up in RSS, but I might put a small entry in RSS about the update. I may also put in RSS-only entries that act like a social media post. I'm thinking I'll use those to share interesting links, maybe one each day.

So subscribe to my RSS feed if you want updates. I'm slowly getting all my pages converted and put up, but until then feel free to wander around.

July 28, 2020

Marc Brooker (mjb)

A Story About a Fish July 28, 2020 12:00 AM

A Story About a Fish

Nothing's more boring than a fishing story.

In the 1930s, Marjorie Latimer was working as a museum curator in East London. Not the eastern part of London as one may expect. This East London is a small city on South Africa's south coast, named so thanks to colonialism's great tradition of creative and culturally relevant place names. Latimer was a keen and knowledgeable naturalist, and had a deal with local fishermen that they would let her know if they found anything unusual in their nets. One morning in 1938, she got a call from a fishing boat captain named Hendrik Goosen. He'd found something very unusual indeed, and wanted Marjorie to look at it. The fish which Hendrik Goosen showed Marjorie Latimer was truly unusual. Unlike anything she had seen before.

Latimer knew just the person to identify it: professor JLB Smith at Rhodes University in nearby Grahamstown (now Makhanda). He was away, so she had the unusual fish gutted and taxidermied, and sent sketches to the professor. He replied (in all-caps, following the fashion at the time):

MOST IMPORTANT PRESERVE SKELETON AND GILLS

Smith had immediately identified the fish as something well known to science. Many like it had been seen before. This one, however, was particularly surprising. It was alive, nearly 66 million years after the last of its kin had been thought dead. Latimer had found a Coelacanth, a species of fish that had hardly evolved in the last 400 million years and was believed to exist only in the fossil record.

Marjorie Latimer and the Coelacanth

At the time, the Coelacanths were thought to be closely related to the Rhipidistia, which were thought to be an ancestor of all modern land-based vertebrates. The science on that topic has moved on, but Goosen's chance find, combined with Latimer's hard work in having it identified, created a special moment in the history of biology.

I was thinking about this story last night, because my daughter has been learning about Coelacanths at school. In the 1940s, JLB Smith and his wife Margaret wrote and illustrated a beautiful book called The Sea Fishes of Southern Africa. My grandmother studied biology at Rhodes during the time they were writing the book, and knew the Smiths and Marjorie Latimer. Margaret Smith gave her a signed copy of their book, sometime around 1950. I was fortunate to inherit the book, and to share the Smiths' description and drawings of the Coelacanths with my daughter.

I hadn't opened The Sea Fishes of Southern Africa in ten years, but re-reading Smith's description of it was like a visit with my late grandmother. She never failed to share her excitement about, and appreciation for, all living things. I vividly remember her telling the Coelacanth story, and her small part in it, sharing the wonder of discovery and the importance of paying attention to the things around us. You never know when you'll learn something new. Perhaps it is unwise to be too dogmatic.

July 26, 2020

Derek Jones (derek-jones)

Surveys are fake research July 26, 2020 10:29 PM

For some time now, my default position has been that software engineering surveys, of the questionnaire kind, are fake research (surveys of a particular research field used to be worth reading, but not so often these days; that issue is for another post). Every now and again a non-fake survey paper pops up, but I don’t consider the cost of scanning all the fake stuff to be worth the benefit of finding the rare non-fake survey.

In theory, surveys could be interesting and worth reading about. Some of the things that often go wrong in practice include:

  • poorly thought out questions. Questions need to be specific and applicable to the target audience. General questions are good for starting a conversation, but analysis of the answers is a nightmare. Perhaps the questions are non-specific because the researcher is looking for direction: well, please don’t inflict your search for direction on the rest of us (a pointless plea in the fling-it-at-the-wall-to-see-if-it-sticks world of academic publishing).

    Questions that demonstrate how little the researcher knows about the topic serve no purpose. The purpose of a survey is to provide information of interest to those in the field, not as a means of educating a researcher about what they should already know,

  • little effort is invested in contacting a representative sample. Questionnaires tend to be sent to the people that the researcher has easy access to, i.e., a convenience sample. The quality of answers depends on the quality and quantity of those who replied. People who run surveys for a living put a lot of effort into targeting as many of the right people as possible,
  • sloppy and unimaginative analysis of the replies. I am so fed up with seeing an extensive analysis of the demographics of those who replied. Tables containing response break-down by age, sex, type of degree (who outside of academia cares about this) create a scientific veneer hiding the lack of any meaningful analysis of the issues that motivated the survey.

Although I have taken part in surveys in the past, these days I recommend that people ignore requests to take part in surveys. Your replies only encourage more fake research.

The aim of this post is to warn readers about the growing use of this form of fake research. I don’t expect anything I say to have any impact on the number of survey papers published.

Ponylang (SeanTAllen)

Last Week in Pony - July 26, 2020 July 26, 2020 08:00 PM

The July 21 sync includes discussions about ponyup and the String API.

Gustaf Erikson (gerikson)

Two more novels by Paul McAuley July 26, 2020 07:51 PM

(Previously.)

  • War of the Maps
  • Austral

McAuley has a wide range. These books were read in reverse publication order.

War of the Maps is a far-future SF story. After our sun has become a white dwarf, post-modern humans construct a Dyson sphere around it and seed it with humans and Earth life. According to the internal legends, they play around a bit then buzz off, leaving the rest of the environment to bumble along as best they can.

The tech level is more or less Victorian, but people contend with unique challenges, such as a severe lack of metallic iron and malevolent AIs buried here and there.

Austral is a near-future crime story. A genetically modified young woman gets dragged into a kidnapping plot in a post-AGW Antarctica.

Both are well worth reading!

July 25, 2020

Gustaf Erikson (gerikson)

[SvSe] Söndagsvägen - berättelsen om ett mord av Peter Englund July 25, 2020 09:51 AM

Englund reflects on Sweden's 1960s through the mirror of a long-forgotten murder. By picking up on phenomena of the period, he shows a country in transition, above all how "the modern project" is beginning to crack.

July 23, 2020

Andreas Zwinkau (qznc)

Peopleware July 23, 2020 12:00 AM

Leaders of software developer teams should care more about sociology.

Read full article!

July 21, 2020

Pete Corey (petecorey)

The Progression That Led Me to Build Glorious Voice Leader July 21, 2020 12:00 AM

This specific chord progression, played in this specific way, completely changed how I approach playing the guitar, opened my eyes to the beauty and elegance of voice leading seventh chords, and ultimately inspired me to build Glorious Voice Leader:

The progression is simply the C major scale, played in diatonic fourths, and harmonized as diatonic seventh chords.

To break that down further, we’re starting on C. Harmonizing C as a diatonic seventh chord in the scale of C major gives us a Cmaj7 chord. Next we’d move down a fourth to F. Harmonizing F as a seventh chord gives us Fmaj7. Next we’d move to B and a Bm7b5 chord, and continue until we arrive back at C.
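Carried all the way around until we land back on C, the cycle of harmonized chords is:

C – Cmaj7
F – Fmaj7
B – Bm7b5
E – Em7
A – Am7
D – Dm7
G – G7
C – Cmaj7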

For something that’s basically a glorified scale exercise, this chord progressions sounds good. It’s almost… musical.

One of the most interesting aspects of this chord progression is how, when properly voice led, the voicings fall smoothly down the scale. Try it for yourself. Pick any starting voicing of Cmaj7, and this miniature version of Glorious Voice Leader will fill in the rest of the progression:

Explore this chord progression in Glorious Voice Leader

At every transition in the progression, the root and third of the current chord stay where they are, but become the fifth and seventh of the following chord. The fifth and seventh move down a scale degree and become the root and third of the next chord.

Compare that to the same chord progression, but harmonizing each scale note as a triad, rather than a four-note seventh chord:

Explore this chord progression in Glorious Voice Leader

Without the gravity of the added seventh to pull it down, the progression tends to rise upwards. The third of the current chord moves up to become the root of the next chord, and the fifth moves up to become the third of the next chord. Only the root stays stationary, becoming the third in the next chord.

If we look closely, there isn’t much difference in the voice movement between the seventh chord and triad versions of these progressions. In fact, of the voices that move, there may be more total movement in the seventh chords. However, the two stationary voices help make the seventh chords feel more cohesive and interlocked.

This chord progression is a world unto itself, and can act as a jumping point into almost every area of music theory and study. I found the voice leading in these chords so fascinating that I dedicated hundreds of hours of my life to building Glorious Voice Leader, a tool designed to help you study and explore voice leading on the guitar.

What does this progression inspire in you?


July 19, 2020

Derek Jones (derek-jones)

Effort estimation’s inaccurate past and the way forward July 19, 2020 10:07 PM

Almost since people started building software systems, effort estimation has been a hot topic for researchers.

Effort estimation models are necessarily driven by the available data (the Putnam model is one of few whose theory is based on more than arm waving). General information about source code can often be obtained (e.g., size in lines of code), and before package software and open source, software with roughly the same functionality was being implemented in lots of organizations.

Estimation models based on source code characteristics proliferated, e.g., COCOMO. What these models overlooked was human variability in implementing the same functionality (a standard deviation that is 25% of the actual size is going to introduce a lot of uncertainty into any effort estimate), along with the more obvious assumption that effort was closely tied to source code characteristics.

The advent of high-tech, clueless, button-pushing machine learning created a resurgence of new effort estimation models; actually they are estimation adjustment models, because they require an initial estimate as one of the input variables. Creating a machine learned model requires a list of estimated/actual values, along with any other available information, to build a mapping function.

The sparseness of the data to learn from (at most a few hundred observations of half-a-dozen measured variables, and usually less) has not prevented a stream of puffed-up publications making all kinds of unfounded claims.

Until a few years ago the available public estimation data did not include any information about who made the estimate. Once estimation data contained the information needed to distinguish the different people making estimates, the uncertainty introduced by human variability was revealed (some consistently underestimating, others consistently overestimating, with 25% difference between two estimators being common, and a factor of two difference between some pairs of estimators).

How much accuracy is it realistic to expect with effort estimates?

At the moment we don’t have enough information on the software development process to be able to create a realistic model; without a realistic model of the development process, it’s a waste of time complaining about the availability of information to feed into a model.

I think a project simulation model is the only technique capable of creating a good enough model for use in industry; something like Abdel-Hamid’s tour de force PhD thesis (he also ignores my emails).

We are still in the early stages of finding out the components that need to be fitted together to build a model of software development, e.g., round numbers.

Even if all attempts to build such a model fail, there will be payback from a better understanding of the development process.

Ponylang (SeanTAllen)

Last Week in Pony - July 19, 2020 July 19, 2020 09:53 PM

We have some nice improvements to the website FAQ and corral documentation. RFC 67 has been approved and implemented!

July 18, 2020

Andrew Owen (yumaikas)

Art Challenge: First 8 days July 18, 2020 10:40 PM

Premise

Emily came across an art challenge on Pinterest, and suggested that we could both do each prompt for it.

An art challenge that lists out 30 days of art prompts

Her medium of preference is pencil and ink, and mine is pixel art.

Day 1: Bones

Emily

A picture of a squirrel skull with some acorns and oak leaves behind it

Andrew

A picture of a skull and crossbones

Day 2: Exotic Pet

Emily

An ink sketch of a long-necked tortoise

Andrew

A pixel art image of a hedgehog, named Pokey

Day 3: Something with two heads

Emily

A picture of a two headed rat

Andrew

A picture of a yellow lizard in two scenes, back to back against a window, the right scene is sunny and the lizard looks a little cheery, the left scene has rain and clouds in the background, and the lizard looks glum

Day 4: Flowers

Emily

An ink picture of a flower

Andrew

A pixel art image of 3 yellow flowers in a porch planter, with some flowers on a trellis behind the planter attached to some Ivy

Day 5: Childhood Toy

Emily

An ink picture of a lego minifigure with a blank face

Andrew

A pixel art rendering of a blue and yellow toy plane from 3 angles, front, top and side, with the final corner having a cloud

Day 6: Eyeball

Emily

An ink picture of an eyeball that has a wick and is melting, like a candle. It looks mad at its predicament

Andrew

A pixel art picture of an eyeball with eyelids. It is a little unsettling due to being mostly disembodied

Day 7: Crystals

Emily

An ink picture of a crystal formation that is growing a mushroom, and some small plants

Andrew

An animation of a lumpy green mass cracking. As the cracks progress, a crystal heart is revealed, and the text

Day 8: Something from the sea

Emily

An ink picture of a ray that has a lot of cool repeating patterns

Andrew

A pixel

Postscript so far

Funnily enough, Emily’s been able to to finish her pictures much more quickly than I have. I suppose I let the pixels give me an excuse to be fussy. It’s been a good way to practice working with Asperite, and to be creative and let out some of the visuals I’ve had in my head for years, but have never expressed.

July 16, 2020

Gokberk Yaltirakli (gkbrk)

Status update, July 2020 July 16, 2020 09:00 PM

This has been a fast and chaotic month for my life and career, and a rather slow month for my blog and personal projects. I severely underestimated the effort it takes to pack up all my belongings while figuring out everything about my future employment. Because of this, the status update is both later than I intended, and has less content.


I added a search box to my website. You should be able to find it on the right hand side. I am not using a custom solution for search for now, so the search will just redirect you to DuckDuckGo.

I pushed a breaking update for JustIRC. JustIRC is one of the first Python modules I’ve written. It was created while I was just learning about network protocols and socket programming. These two facts combined resulted in an API design that was less than ideal.

A new version that fixed a lot of the problems with the API was pushed. Additionally, documentation of the module was greatly improved. Some missing functionality, such as TLS support, was added. These improvements should make it much easier to use the library in new projects.

On the kernel side, things are mostly quiet aside from some bugfixes and internal changes.

A filesystem API was introduced. This API is implemented by TarFS, and will be implemented by any future filesystems. This abstraction over file operations will mainly be used for the upcoming Virtual File System implementation. A VFS will allow the kernel to mount multiple filesystems into the same file tree, and open the path for some cool FS tricks such as exposing devices and kernel internals as files and directories.

The dynamic memory allocator of the kernel was switched from a best-fit allocator to an exact-fit allocator. This ended up making the memory system more flexible. The previous allocator used to create fixed-size memory blocks of different sizes up to a maximum size. This meant larger allocations would fail, and you would have to waste some memory if they were not used.

The new allocator handles allocations of the same size much better, and the odd large allocation (like a framebuffer) does not waste any unnecessary memory. Along with the allocation algorithm, some bugs related to memory alignment were fixed. All the work on the memory allocator improved the system stability a lot.

My privacy and browser hardening extension, browser-harden, got some fixes that allow certain websites to work. Some JS-heavy frameworks were overriding a lot of browser APIs and interacting in a bad way with the extension. The issues I came across were fixed and the fix was pushed to Firefox Addons.

In order to learn desktop GUI programming, and to get familiar with the GTK framework, I made a simple GTK app in Python. It is an imageboard viewer in a single file.

That’s all for this month, thanks for reading!

Caius Durling (caius)

Tailscale, RFC1918, and DNS Rebinding Protection July 16, 2020 08:00 PM

Edit: Originally this post was written to be a workaround for Tailscale routing all DNS traffic over its own link when you configured it to push out existing DNS Server IPs. This turned out to be a bad assumption on my part. Thanks to apenwarr for helping me understand that shouldn’t be the case, and encouraging me to debug it properly rather than making assumptions.

Naturally it turned out to be a PEBKAC. I’d pushed out 162.159.25.4 as the DNS Server IP which is a nameserver rather than a forwarder. This in turn meant people were getting empty answers back to DNS queries, which stopped once they quit tailscale. (Go figure, Tailscale removes the resolver from the network stack when it quits.) The post has been updated to remove that invalid assumption. 🤦🏻‍♂️

Imagine we have a fleet of machines sat in a private network somewhere on a 172.16.20.0/24 IP range, with entries pointing at them published on public DNS servers. Eg, dig +short workhorse.fake.tld returns 172.16.20.21.

Initially this all works swimmingly, until someone comes along who is using a DNS forwarder with DNS rebinding protection enabled. Daniel Miessler has a wonderfully succinct explanation on his blog about DNS Rebinding attacks, but to protect against it you stop your resolver returning answers to DNS queries from public servers which resolve to IP addresses within standard internal network ranges. (ie, rfc1918.)

This means for those users they can successfully connect to our Tailscale network and access everything by IPs directly, but can’t access any of the internal infrastructure by hostname. eg, dig +short workhorse.fake.tld will return an empty answer for them.
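A quick way to see the difference is to compare the answer from a resolver that doesn't filter private addresses against the affected forwarder. A sketch (8.8.8.8 is just an example here, on the assumption that it returns RFC1918 answers unfiltered):

# Asking a public resolver directly
$ dig +short workhorse.fake.tld @8.8.8.8
172.16.20.21

# Asking the local forwarder with rebinding protection enabled
$ dig +short workhorse.fake.tld
(empty answer)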

Once we figured out the root cause of that, for workarounds we figured we could either run a DNS forwarder within our own infrastructure, or get all our staff to change their home DNS settings and hope they were never on locked down networks ever again.

We chose the former, and thankfully dnsmasq is really easy to configure in this fashion and we already have a node which is acting as the tailscale subnet relay, so we dropped the following config in /etc/dnsmasq.conf on there:

# Only listen for requests from VPN/local for debugging
interface=tailscale0
interface=lo
# Google DNS
server=8.8.8.8
server=8.8.4.4
# Quad9
server=9.9.9.9
# Cloudflare
server=1.1.1.1
server=1.0.0.1
# Race all servers to see which wins
all-servers
# Try and stop DNS rebinding, except where we expect it to happen
bogus-priv
stop-dns-rebind
rebind-localhost-ok
rebind-domain-ok=/fake.tld/
domain-needed
filterwin2k
no-poll
no-resolv
cache-size=10000

One quick puppet run later, and our Tailscale subnet relays are happily running both tailscale and dnsmasq, serving out answers as fast as they can to other Tailscale nodes. Add port 53 to the Tailscale ACL and away we went.
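From a machine on the Tailscale network you can then confirm the whole path works; as a sketch, with 100.x.y.z standing in for the subnet relay's Tailscale address:

$ dig +short workhorse.fake.tld @100.x.y.z
172.16.20.21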

Unrelenting Technology (myfreeweb)

Wow. micro HDMI is the worst connector ever. (well, at least this... July 16, 2020 05:00 PM

Wow. micro HDMI is the worst connector ever.

(well, at least this particular adapter is terrible.. or the Pi 4 grabs too hard?)

Joe Nelson (begriffs)

Create impeccable MIME email from markdown July 16, 2020 12:00 AM

The goal

I want to create emails that look their best in all mail clients, whether graphical or text based. Ideally I’d write a message in a simple format like Markdown, and generate the final email from the input file. Additionally, I’d like to be able to include fenced code snippets in the message, and make them available as attachments.

Demo

I created a utility called mimedown that reads markdown through stdin and prints multipart MIME to stdout.
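In the simplest case, usage is just a pipe. A sketch, assuming the built mimedown binary is on your PATH and the message lives in message.md:

$ mimedown < message.md > message.eml

The resulting message.eml can then be handed to whatever you normally use to add headers and send mail.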

Let’s see it in action. Here’s an example message:

## This is a demo email with code

Hey, does this code look fishy to you?

```crash.c
#include <stdio.h>

int main(void)
{
	char a[] = "string literal";
	char *p  = "string literal";

	/* capitalize first letter */
	p[0] = a[0] = 'S';
	printf("a: %s\np: %s\n", a, p);
	return 0;
}
```

It blows up when I compile it and run it:

```compile.txt
$ cc -std=c99 -pedantic -Wall -Wextra crash.c -o crash
$ ./crash
Bus error: 10
```

Turns out we're invoking undefined behavior.

* The C99 spec, appendix J.2 Undefined Behavior mentions this case:
  > The program attempts to modify a string literal (6.4.5).
* Steve Summit's C FAQ [question 1.32](http://c-faq.com/decl/strlitinit.html)
  covers the difference between an array initialized with string literal vs a
  pointer to a string literal constant.
* The SEI CERT C Coding standard
  [STR30-C](https://wiki.sei.cmu.edu/confluence/display/c/STR30-C.+Do+not+attempt+to+modify+string+literals)
  demonstrates the problem with non-compliant code, and compares with compliant
  fixes.

After running it through the generator and emailing it to myself, here’s how the result looks in the Fastmail web interface:

rendered in fastmail

Notice how the code blocks are displayed inline and are available as attachments with the correct MIME type.

I intentionally haven’t configured Mutt to render HTML, so it falls back to the text alternative in the message, which also looks good. Notice how the message body is interleaved with Content-Disposition: inline attachments for each code snippet.

code and text in Mutt

The email generator also creates references for external urls. It substitutes the urls in the original body text with references, and consolidates the links into a bibliography of type text/uri-list at the end of the message. Here’s another Mutt screenshot of the end of the message, with red circles added.

links as references

The generated MIME structure of our sample message looks like this:

  I     1 <no description>          [multipa/alternativ, 7bit, 3.1K]
  I     2 ├─><no description>            [multipa/mixed, 7bit, 1.7K]
  I     3 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.1K]
  I     4 │ ├─>crash.c                 [text/x-c, 7bit, utf-8, 0.2K]
  I     5 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.1K]
  I     6 │ ├─>compile.txt           [text/plain, 7bit, utf-8, 0.1K]
  I     7 │ ├─><no description>      [text/plain, 7bit, utf-8, 0.5K]
  I     8 │ └─>references.uri     [text/uri-list, 7bit, utf-8, 0.2K]
  I     9 └─><no description>         [text/html, 7bit, utf-8, 1.3K]

At the outermost level, the message is split into two alternatives: HTML and multipart/mixed. Within the multipart/mixed part is a succession of message text and code snippets, all with inline disposition. The final mixed item is the list of referenced urls (if necessary).

Other niceties

Lines of the message body are re-flowed to at most 72 characters, to conform to historical length constraints. Additionally, to accommodate narrow terminal windows, mimedown uses a technique called format=flowed. This is a clever standard (RFC 3676) which adds trailing spaces to any lines that we would like the client reader to re-flow, such as those in paragraphs.
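As a rough illustration of what a flowed paragraph looks like on the wire (trailing soft-break spaces are shown as · because they are otherwise invisible; the exact headers mimedown emits may differ):

Content-Type: text/plain; charset=utf-8; format=flowed

This paragraph was wrapped at 72 columns by the generator, and every·
line that the reading client may re-flow ends in a single trailing·
space, while the last line of the paragraph does not.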

Neither hard wrapping nor format=flowed is applied to code block fences in the original markdown. Code snippets are turned into verbatim attachments and won’t be mangled.

Finally, the HTML version of the message is tasteful and conservative. It should display properly on any HTML client, since it validates with ISO HTML (ISO/IEC 15445:2000, based on HTML 4.01 Strict).

Try it yourself

Clone it here: github.com/begriffs/mimedown. It’s written in portable C99. The only build dependency is the cmark library for parsing markdown.

July 15, 2020

Pages From The Fire (kghose)

Too many open files July 15, 2020 04:05 AM

One puzzling error that happens, albeit rarely, is having too many file handles open at the same time. You can check the limit allowed per process on your system; the key value is “open files” which, in my case, is limited to 256. In my application I was running into this. (As a…)
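On most Unix-like systems the per-process limit can be inspected with ulimit (an illustration; the exact invocation and output vary by shell and platform):

# All per-process resource limits, including "open files"
$ ulimit -a

# Just the open-file limit (256 in the case described above)
$ ulimit -n
256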

July 13, 2020

Pete Corey (petecorey)

Suggesting Chord Names with Glorious Voice Leader July 13, 2020 12:00 AM

Glorious Voice Leader, my chord-obsessed side project, now has the ability to turn a collection of notes played on the guitar fretboard into a list of possible chord names. Deciding on a specific chord name is still a very human, very context dependent task, but we can let the computer do a lot of the heavy lifting for us.

I’ve included a simplified version of this chord namer to the left. Feel free to click on the frets to enter any guitar chord you’d like the name of. Glorious Voice Leader will crunch the numbers and come up with a list of possible names that exactly describes the chord you’ve entered, sorted alphabetically.

In the full-fledged Glorious Voice Leader application, this functionality is accessible by simply clicking on the fretboard without first selecting the name of the chord you want. This felt like an intuitive design decision. You might know the shape of a specific chord you want to play in a progression, but you’re not sure of its name.

Enter it into the fretboard and Glorious Voice Leader will give you a corresponding list of names. When you click on one of those names, it’ll automatically suggest alternative voicings that voice lead smoothly from the previous chord.

The actual code behind this feature is dead simple. We simply filter over our set of all possible chord roots and qualities, and compare the set of notes in each resulting chord with the set of notes entered by the user:


// Build every (root, quality) pair, keep the ones whose pitch classes
// exactly match the notes entered on the fretboard, and format the names.
let possibleNames = _.chain(qualities)
  .flatMap(quality =>
    _.map(Object.keys(roots), root => {
      return {
        root,
        quality
      };
    })
  )
  .filter(({ root, quality }) => {
    if (_.isEmpty(chord.notes)) {
      return false;
    }
    // Pitch classes (0-11) of the notes the user entered, deduplicated.
    let chordNotes = _.chain(chord.notes)
      .map(([string, fret]) => (tuning[string] + fret) % 12)
      .uniq()
      .sortBy(_.identity)
      .value();
    // Pitch classes of this candidate chord, built from its root and quality.
    let qualityNotes = _.chain(quality.quality)
      .map(note => (roots[root] + note) % 12)
      .sortBy(_.identity)
      .value();
    return _.isEqual(chordNotes, qualityNotes);
  })
  .map(({ root, quality }) => {
    return `${root}${quality.name}`;
  })
  .sortBy(_.identity)
  .value();

From there we simply present the list of possible chord names to the user in some meaningful or actionable way.

For future work, it would be nice to sort the list of name suggestions in order of the lowest notes they entered on the fretboard. For example, if they entered the notes C, E, G, and B in ascending order, we should sort the Cmaj7 suggestion before the Am9 no 1 suggestion. As with all of the items on my future work list, there are many subtleties and nuances here that would have to be addressed before it becomes a reality.

I hope you find this helpful. If you find Glorious Voice Leader interesting or useful in any way, please let me know!


July 12, 2020

Derek Jones (derek-jones)

No replies to 135 research data requests: paper titles+author emails July 12, 2020 09:05 PM

I regularly email researchers referring to a paper of theirs I have read, and asking for a copy of the data to use as an example in my evidence-based software engineering book; of course their work is cited as the source.

Around a third of emails don’t receive any reply (a small number ask why they should spend time sorting out the data for me, and I wrote a post to answer this question). If there is no reply after roughly 6-months, I follow up with a reminder, saying that I am still interested in their data (maybe 15% respond). If the data looks really interesting, I might email again after 6-12 months (I have outstanding requests going back to 2013).

I put some effort into checking that a current email address is being used. Sometimes the work was done by somebody who has moved into industry, and if I cannot find what looks like a current address I might email their supervisor.

I have had replies to later email, apologizing, saying that the first email was caught by their spam filter (the number of links in the email template was reduced to make it look less like spam). Sometimes the original email never percolated to the top of their todo list.

There are around 135 unreplied email requests (the data was automatically extracted from my email archive and is not perfect); the list of papers is below (the title is sometimes truncated because of the extraction process).

Given that I have collected around 620 software engineering datasets (there are several ways of counting a dataset), another 135 would make a noticeable difference. I suspect that much of the data is now lost, but even 10 more datasets would be nice to have.

After the following list of titles is a list of the 254 authors' last known email addresses. If you know any of these people, please ask them to get in touch.

If you are an author of one of these papers: ideally send me the data, otherwise email to tell me the status of the data (I’m summarising responses, so others can get some idea of what to expect).

50 CVEs in 50 Days: Fuzzing Adobe Reader
A Change-Aware Per-File Analysis to Compile Configurable Systems
A Design Structure Matrix Approach for Measuring Co-Change-Modularity
A Foundation for the Accurate Prediction of the Soft Error
AGENT-BASED SIMULATION OF THE SOFTWARE DEVELOPMENT PROCESS: A CASE STUDY
A Large Scale Evaluation of Automated Unit Test Generation Using
A large-scale study of the time required to compromise
A Large-Scale Study On Repetitiveness, Containment, and
Analysing Humanly Generated Random Number Sequences: A Pattern-Based
Analysis of Software Aging in a Web Server
Analyzing and predicting effort associated with finding & fixing
Analyzing CAD competence with univariate and multivariate
Analyzing Differences in Risk Perceptions between Developers
Analyzing the Decision Criteria of Software Developers Based on
An analysis of the effect of environmental and systems complexity on
An Empirical Analysis of Software-as-a-Service Development
An Empirical Comparison of Forgetting Models
An empirical study of the textual similarity between
An error model for pointing based on Fitts' law
An Evolutionary Study of Linux Memory Management for Fun and Profit
An examination of some software development effort and
An Experimental Survey of Energy Management Across the Stack
Anomaly Trends for Missions to Mars: Mars Global Surveyor
A Quantitative Evaluation of the RAPL Power Control System
Are Information Security Professionals Expected Value Maximisers?:
A replicated and refined empirical study of the use of friends in
ARRAY LAYOUTS FOR COMPARISON-BASED SEARCHING
A Study of Repetitiveness of Code Changes in Software Evolution
A Study on the Interactive Effects among Software Project Duration, Risk
Bias in Proportion Judgments: The Cyclical Power Model
Capitalization of software development costs
Configuration-aware regression testing: an empirical study of sampling
Cost-Benefit Analysis of Technical Software Documentation
Decomposing the problem-size effect: A comparison of response
Determinants of vendor profitability in two contractual regimes:
Diagnosing organizational risks in software projects:
Early estimation of users’ perception of Software Quality
MEASURING USER’S PERCEPTION AND OPINION OF SOFTWARE QUALITY
Empirical Analysis of Factors Affecting Confirmation
Estimating Agile Software Project Effort: An Empirical Study
Estimating computer depreciation using online auction data
Estimation fulfillment in software development projects
Ethical considerations in internet code reuse: A
Evaluating. Heuristics for Planning Effective and
Evaluating Pair Programming with Respect to System Complexity and
Evidence-Based Decision Making in Lean Software Project Management
Explaining Multisourcing Decisions in Application Outsourcing
Exploring defect correlations in a major. Fortran numerical library
Extended Comprehensive Study of Association Measures for
Eye gaze reveals a fast, parallel extraction of the syntax of
Factorial design analysis applied to the performance of
Frequent Value Locality and Its Applications
Historical and Impact Analysis of API Breaking Changes:
How do i know whether to trust a research result?
How do OSS projects change in number and size?
How much is “about” ? Fuzzy interpretation of approximate
Humans have evolved specialized skills of
Identifying and Classifying Ambiguity for Regulatory Requirements
Identifying Technical Competences of IT Professionals. The Case of
Impact of Programming and Application-Specific Knowledge
Individual-Level Loss Aversion in Riskless and Risky Choices
Industry Shakeouts and Technological Change
Inherent Diversity in Replicated Architectures
Initial Coin Offerings and Agile Practices
Interpreting Gradable Adjectives in Context: Domain
Is Branch Coverage a Good Measure of Testing Effectiveness?
JavaScript Developer Survey Results
Knowledge Acquisition Activity in Software Development
Language matters
Learning from Evolution History to Predict Future Requirement Changes
Learning from Experience in Software Development:
Learning from Prior Experience: An Empirical Study of
Links Between the Personalities, Views and Attitudes of Software Engineers
Making root cause analysis feasible for large code bases:
Making-Sense of the Impact and Importance of Outliers in Project
Management Aspects of Software Clone Detection and Analysis
Managing knowledge sharing in distributed innovation from the
Many-Core Compiler Fuzzing
Measuring Agility
Mining for Computing Jobs
Mining the Archive of Formal Proofs.
Modeling Readability to Improve Unit Tests
Modeling the Occurrence of Defects and Change
Modelling and Evaluating Software Project Risks with Quantitative
Moore’s Law and the Semiconductor Industry: A Vintage Model
More Testers – The Effect of Crowd Size and Time Restriction in
Motivations for self-assembling into project teams
Networks, social influence and the choice among competing innovations:
Nonliteral understanding of number words
Nonstationarity and the measurement of psychophysical response in
Occupations in Information Technology
On information systems project abandonment
On the Positive Effect of Reactive Programming on Software
ON THE USE OF REPLACEMENT MESSAGES IN API DEPRECATION:
On Vendor Preferences for Contract Types in Offshore Software Projects:
Peer Review on Open Source Software Projects:
Parameter-based refactoring and the relationship with fan-in/fan-out
Participation in Open Knowledge Communities and Job-Hopping:
Pipeline management for the acquisition of industrial projects
Predicting the Reliability of Mass-Market Software in the Marketplace
Prototyping A Process Monitoring Experiment
Quality vs risk: An investigation of their relationship in
Quantitative empirical trends in technical performance
Reported project management effort, project size, and contract type.
Reproducible Research in the Mathematical Sciences
Semantic Versioning versus Breaking Changes
Software Aging Analysis of the Linux Operating System
Software reliability as a function of user execution patterns
Software Start-up failure An exploratory study on the
Spatial estimation: a non-Bayesian alternative
System Life Expectancy and the Maintenance Effort: Exploring
Testing as an Investment
The enigma of evaluation: benefits, costs and risks of IT in
THE IMPACT OF PLANNING AND OTHER ORGANIZATIONAL FACTORS
The impact of size and volatility on IT project performance
The Influence of Size and Coverage on Test Suite
The Marginal Value of Increased Testing: An Empirical Analysis
The nature of the times to flight software failure during space missions
Theoretical and Practical Aspects of Programming Contest Ratings
The Performance of the N-Fold Requirement Inspection Method
The Reaction of Open-Source Projects to New Language Features:
The Role of Contracts on Quality and Returns to Quality in Offshore
The Stagnating Job Market for Young Scientists
Time Pressure — A Controlled Experiment of Test-case Development and
Turnover of Information Technology Professionals:
Unconventional applications of compiler analysis
Unifying DVFS and offlining in mobile multicores
Use of Structural Equation Modeling to Empirically Study the Turnover
Use Two-Level Rejuvenation to Combat Software Aging and
Using Function Points in Agile Projects
Using Learning Curves to Mine Student Models
Virtual Integration for Improved System Design
Which reduces IT turnover intention the most: Workplace characteristics
Why Did Your Project Fail?
Within-Die Variation-Aware Dynamic-Voltage-Frequency

Author emails (automatically extracted and manually checked to remove people who have replied on other issues; I hope I have caught them all).

Aaron.Carroll@nicta.com.au   abaker@ucar.edu   abd_elzamly@yahoo.com
actjn@siu.edu   agopal@rhsmith.umd.edu   akbar.namin@ttu.edu
aken@nsuok.edu   akmassey@umbc.edu   alessandro.murgia@uantwerpen.be
alexander.budzier@sbs.ox.ac.uk   alinebrito@dcc.ufmg.br   allen.p.nikora@jpl.nasa.gov
Allen.P.Nikora@jpl.nasa.gov   Altaf.Ahmad@asu.edu   Ana.Aizcorbe@bea.gov
angel.garcia@uc3m.es   anhnt@iastate.edu   a.pinna@diee.unica.it
arho.suominen@vtt.fi   arie.vandeursen@tudelft.nl   asang@ntu.edu.sg
austen.rainer@canterbury.ac.nz   awfboh@ntu.edu.sg   bent.flyvbjerg@sbs.ox.ac.uk
bf@ul.ie   bjg@empiricalreality.com   bojan.spasic@avl.com
bramesh@gsu.edu   brent@cosc.canterbury.ac.nz   brent.martin@canterbury.ac.nz
briand@simula.no   brian.fitzgerald@lero.ie   bronevetsky1@llnl.gov
burairah@utem.edu.my   calikli@chalmers.se   canton@mnec.gr
cc05@vokac.org   celio.santana@gmail.com   cguo13@hawk.iit.edu
charngda@ccr.buffalo.edu   charngdalu@yahoo.com   chenyy@comp.nus.edu.sg
chris.sauer@sbs.ox.ac.uk   christian.korunka@univie.ac.at   christopher.lidbury10@imperial.ac.uk
clitecky@business.siu.edu   cmagee@mit.edu   corey.phelps@mcgill.ca
cotroneo@unina.it   cthompson@cs.berkeley.edu   dagsj@ifi.uio.no
daniela.munteanu@univ-provence.fr   daniel.milroy@colorado.edu   dan@silverthreadinc.com
david@merobe.com   david.nembhard@oregonstate.edu   der.herr@hofr.at
dgrtwo@princeton.edu   dhkim@astate.edu   director@scit.edu
discy@nus.edu.sg   djl68@pitt.edu   dlautner@hawk.iit.edu
dport@hawaii.edu   dprtchan@nus.edu.sg   dredman@avsi.aero
drobinson@stackoverflow.com   dskusumo.itt@gmail.com   dwheeler@ida.org
eherrman@eva.mpg.de   Enrique.Dans@ie.edu   erik.arisholm@testify.no
ermira.daka@sheffield.ac.uk   etovar@fi.upm.es   fjshull@sei.cmu.edu
foreverheart9@gmail.com   founders@triplebyte.com   fschweitzer@ethz.ch
ghs2@psu.edu   gleison.brito@dcc.ufmg.br   glpkm@hotmail.com
gordon.fraser@uni-passau.de   greg@bronevetsky.com   gul.calikli@gu.se
guschroko@student.gu.se   hankhoffmann@cs.uchicago.edu   hannes.holm@foi.se
hannes.holm@ics.kth.se   hata@is.naist.jp   hbarth@wesleyan.edu
hello@ponyfoo.com   hiroshi.igaki@oit.ac.jp   hirtle@pitt.edu
hoan@iastate.edu   hora@dcc.ufmg.br   hrideshg@iastate.edu
huang@umd.edu   huazhe@cs.uchicago.edu   hwu28@hawk.iit.edu
ichischneider@gmail.com   I.Deary@ed.ac.uk   ilaria.lunesu@diee.unica.it
info@targetprocess.com   james@jpallister.com   jarmo.ahonen@uef.fi
jasmin.blanchette@mpi-inf.mpg.de   jasonweiyi@gmail.com   javier.alonso@duke.edu
jean-luc.autran@univ-provence.fr   jfmendes@ua.pt   jgo@ua.pt
jianh@illinois.edu   jimbo@business.siu.edu   jmunson@uidaho.edu
jo-anne.lefevre@carleton.ca   john.krogstie@ntnu.no   john.zhang@business.uconn.edu
jordan.weissmann@slate.com   jose.campos@sheffield.ac.uk   josephborel@aol.com
jselby@maplesoft.com
June.Verner@gmail.com junyang@engr.pitt.edu justinek@alumni.stanford.edu justin.hollands@drdc-rddc.gc.ca j.visser@sig.eu kaisa.still@vtt.fi kantor@cs.technion.ac.il kevin.mcdaid@dkit.ie kewusi@lmu.edu klaas-jan.stol@lero.ie K.Markantonakis@rhul.ac.uk konstantinos.chronis@gmail.com ktrivedi@duke.edu laertexavier@dcc.ufmg.br larissanadja@copin.ufcg.edu.br lcao@odu.edu leo@susaventures.com lionel.briand@uni.lu lsarigia@pme.duth.gr lucia.2009@smu.edu.sg magnus@magnusdettmar.com mail@kaidence.org ma.khan@uleth.ca manuel.oriol@ch.abb.com marc.schulz@rwth-aachen.de Marek@gryting.biz marie-jeanne.lesot@lip6.fr mariusz.musial@ericpol.com maruyama@atr.jp matthias.biggeleben@open-xchange.com matthias.stuermer@iwi.unibe.ch mcknight@bus.msu.edu mdettmar@deloitte.com mdettmar@deloitte.se melanie@cs.columbia.edu Michael.english@lero.ie michael.english@ul.ie michael.grottke@fau.de Michael.Grottke@wiso.uni-erlangen.de Michael@targetprocess.com mika.mantyla@aalto.fi mika.mantyla@oulu.fi mingshu@iscas.ac.cn mischael.schill@inf.ethz.ch misof@ksp.sk mjaber@ryerson.ca monica.pais@ifgoiano.edu.br monicaspais@gmail.com morin@scs.carleton.ca mschermann@scu.edu mtov@dcc.ufmg.br mzhu@ets.org ncerpa@utalca.cl Neil.Stewart@warwick.ac.uk Nelson.W.Green@jpl.nasa.gov nick.wells@jobstats.co.uk o.alexy@tum.de oliver.krancher@iwi.unibe.ch Oliver.Laitenberger@horn-company.de olivier.gendreau@polymtl.ca paula.j.savolainen@uef.fi paulmcb@seas.upenn.edu paul@strassmann.com pchatzog@pme.duth.gr perry@mail.utexas.edu philippe.roche@st.com phoonakker@cqpi.engr.wisc.edu pierre.robillard@polymtl.ca ploaiza@lsm.in2p3.fr P.Love@curtin.edu.au pokech@uonbi.ac.ke psidhu@cmu.edu pvk@pvk.ca pyzychen@gmail.com ren@iit.edu rh13@aub.edu.lb ricardo.colomo@uc3m.es rkiyer@illinois.edu robert.benkoczi@uleth.ca roberto.natella@unina.it roberto.pietrantuono@unina.it salvaneschi@cs.tu-darmstadt.de saurabh.dighe@intel.com sdorogov@ua.pt sebastien.lefort@lip6.fr sebastien.sauze@l2mp.fr shaji@scit.edu shilin@itechs.iscas.ac.cn show@um.edu.my siegfrie@adelphi.edu simona.ibba@diee.unica.it simon.gaechter@nottingham.ac.uk simonk@rpi.edu simvrh@gmail.com sl@monochromata.de soenke.albers@the-klu.org songxue@microsoft.com s.raemaekers@sig.eu sriram.vangal@intel.com ssg@engr.uconn.edu stavrino@eap.gr stavrino@gmail.com stefan@garage-coding.com sterusso@unina.it steve.a.shogren@gmail.com svkbharathi@scit.edu swilson@tcd.ie tamada@cse.kyoto-su.ac.jp tien@iastate.edu tien.n.nguyen@utdallas.edu tjleffel@gmail.com tkabdelh@nps.edu tsunoda@info.kindai.ac.jp tung@iastate.edu victoria@stodden.net wangyi@us.ibm.com William.L.Taber@jpl.nasa.gov wmhan@takming.edu.tw wobbrock@uw.edu wq@itechs.iscas.ac.cn xenos@eap.gr xhua@hawk.iit.edu xiao.qu@us.abb.com yanglusi@comp.nus.edu.sg ychen200@cba.ua.edu yi.wang@rit.edu yiw@ics.uci.edu yoaval@checkpoint.com zhangx@nku.edu zhij@cs.toronto.edu Zhongju.Zhang@asu.edu zibran@cs.uno.edu

Ponylang (SeanTAllen)

Last Week in Pony - July 12, 2020 July 12, 2020 03:05 PM

Sync audio for July 7 is available. RFC PR #175 is ready for a vote at the next sync meeting.

Andreas Zwinkau (qznc)

Crossing the Chasm July 12, 2020 12:00 AM

The book describes the dangerous transition from early adopters to an early majority market

Read full article!

July 11, 2020

Andrew Montalenti (amontalenti)

Learning about babashka (bb), a minimalist Clojure for building CLI tools July 11, 2020 06:25 PM

A few years back, I wrote Clojonic: Pythonic Clojure, which compares Clojure to Python, and concluded:

My exploration of Clojure so far has made me realize that the languages share surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course).

That said, as that article discussed, Clojure is a very different language than Python. As Rich Hickey, the creator of Clojure, put it in his “A History of Clojure”:

Most developers come to Clojure from Java, JavaScript, Python, Ruby and other OO languages. [… T]he most significant […] problem  [in adopting Clojure] is learning functional programming. Clojure is not multiparadigm, it is FP or nothing. None of the imperative techniques they are used to are available. That said, the language is small and the data structure set evident. Clojure has a reputation for being opinionated, opinionated languages being those that somewhat force a particular development style or strategy, which I will graciously accept as meaning the idioms are clear, and somewhat inescapable.

There is one area in which Clojure and Python seem to have a gulf between them, for a seemingly minor (but, in practice, major) technical reason. Clojure, being a JVM language, inherits the JVM’s slow start-up time, especially for short-lived scripts, as is common for UNIX CLI tools and scripts.

As a result, though Clojure is a relatively popular general purpose programming language — and, indeed, one of the most popular dynamic functional programming languages in existence — it is still notably unpopular for writing quick scripts and commonly-used CLI tools. But, in theory, this needn’t be the case!

If you’re a regular UNIX user, you probably have come across hundreds of scripts with a “shebang”, e.g. something like #!/usr/bin/env python3 at the top of Python 3 scripts or #!/bin/bash for bash scripts. But I bet you have rarely, perhaps never, come across something like #!/usr/bin/env java or #!/usr/bin/env clojure. It’s not that either of these is impossible or unworkable. No, they are simply unergonomic. Thus, they aren’t preferred.

The lack of ergonomics stems from a number of reasons inherent to the JVM, notably slow start-up time and complex system-level classpath/dependency management.

Given Clojure’s concision, readability, and dynamism, it might be a nice language for scripting and CLI tools, if we could only get around that slow start-up time problem. Could we somehow leverage the Clojure standard library and a subset of the Java standard library as a “batteries included” default environment, and have it all compiled into a fast-launching native binary?

Well, it turns out, someone else had this idea, and went ahead and implemented it. Enter babashka.

babashka

To quote the README:

Babashka is implemented using the Small Clojure Interpreter. This means that a snippet or script is not compiled to JVM bytecode, but executed form by form by a runtime which implements a sufficiently large subset of Clojure. Babashka is compiled to a native binary using GraalVM. It comes with a selection of built-in namespaces and functions from Clojure and other useful libraries. The data types (numbers, strings, persistent collections) are the same. Multi-threading is supported (pmap, future). Babashka includes a pre-selected set of Java classes; you cannot add Java classes at runtime.

Wow! That’s a pretty neat trick. If you install babashka — which is available as a native binary for Windows, macOS, and Linux — you’ll be able to run bb to try it out. For example:

$ bb
Babashka v0.1.3 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (+ 2 2)
4
user=> (println (range 5))
(0 1 2 3 4)
nil
user=> :repl/quit
$
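
Babashka also makes the shebang story from earlier practical. As a quick sketch (the file name here is just for illustration), you can mark a Clojure script executable and point it at bb:

$ cat hello
#!/usr/bin/env bb
(println "Hello from a babashka script!")
$ chmod +x hello
$ ./hello
Hello from a babashka script!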

And, the fast start-up time is legit. For example, here’s a simple “Hello, world!” in Clojure stored in hello.clj:

(println "Hello, world!")

Now compare:

$ multitime -n 10 -s 1 clojure hello.clj
...
        Mean        Std.Dev.    Min         Median      Max
user    1.753       0.090       1.613       1.740       1.954       
...
$ multitime -n 10 -s 1 bb hello.clj
...
        Mean        Std.Dev.    Min         Median      Max
user    0.004       0.005       0.000       0.004       0.012       
...

That’s a pretty big difference on my modern machine! That’s a median start-up time of 1.7 seconds using the JVM version, and a median start-up time of 0.004 seconds — that is, four one-thousandths of a second, or 4 milliseconds — using bb, the Babashka version! The JVM version is almost 500x slower!

How does this compare to Python?

$ multitime -n 10 -s 1 python3 hello.py
...
        Mean        Std.Dev.    Min         Median      Max
user    0.012       0.004       0.006       0.011       0.018       
...

So, bb‘s start-up is as fast as, perhaps even a little faster than, Python 3. Pretty cool!

All that said, the creator of Babashka has said, publicly:

It’s not targeted at Python programmers or Go programmers. I just want to use Clojure. The target audience for Babashka is people who want to use Clojure to build scripts and CLI tools.

Fair enough. But, as Rich Hickey said, there can be really good reasons for Python, Ruby, and Go programmers to take a peek at Clojure. There are some situations in which it could really simplify your code or approach. Not always, but there are certainly some strengths. Here’s what Hickey had to say about it:

[New Clojure users often] find the amount of code they have to write is significantly reduced, 2—5x or more. A much higher percentage of the code they are writing is related to their problem domain.

Aside from being a useful tool for this niche, bb is also just a fascinating F/OSS research project. For example, the way it manages to pull off native binaries across platforms is via the GraalVM native-image facility. Studying GraalVM native-image is interesting in itself, but bb makes use of this facility and makes its benefit accessible to Clojure programmers without resorting to complex build toolchains.

With bb now stable, its creator took a stab at rewriting the clojure wrapper script itself in Babashka. That is, Clojure programmers may not have realized that when they invoke clojure on Linux, what’s really happening is that they are calling out to a bash script that then detects the local JVM and classpath, and then execs out to the java CLI for the JVM itself. On Windows, that same clojure wrapper script is implemented in PowerShell, pretty much by necessity, and serves the same purpose as the Linux bash script, but is totally different code. Well, now there’s something called deps.clj, which eliminates the need to use bash and PowerShell here, and uses Babashka-flavored Clojure code instead. See the deps.clj rationale in the README for more on that.

If you want a simple real-world example of a full-fledged Babashka-flavored Clojure program that does something useful at the command-line, you can take a look at clj-kondo, a simple command-line Clojure linter (akin to pyflakes or flake8 in the Python community), which is also by the same author.

Overall, Babashka is not just a really cool hack, but also a very useful tool in the Clojurist’s toolbelt. I’ve become a convert and evangelist, as well as a happy user. Congrats to Michiel Borkent on a very interesting and powerful piece of open source software!


Note: Some of my understanding of Babashka solidified when hearing Michiel describe his project at the Clojure NYC virtual meetup. The meeting was recorded, so I’ll update this blog post when the talk is available.

Gustaf Erikson (gerikson)

June July 11, 2020 04:45 PM

Telemedicine is the only light in the darkness of COVID

This pic was supposed to be part of a pictorial depicting one day in my life during Corona, but I got bored of the concept. I just added it here so I don’t have an embarrassing gap for June 2020.

Jun 2019 | Jun 2018 | Jun 2017 | Jun 2016 | Jun 2015 | Jun 2014 | Jun 2013 | Jun 2012 | Jun 2011 | Jun 2010 | Jun 2009

Gonçalo Valério (dethos)

Why you shouldn’t remove your package from PyPI July 11, 2020 11:26 AM

Nowadays most software developed using the Python language relies on external packages (dependencies) to get the job done. Correctly managing this “supply-chain” ends up being very important and having a big impact on the end product.

As a developer you should be cautious about the dependencies you include on your project, as I explained in a previous post, but you are always dependent on the job done by the maintainers of those packages.

As a public package owner/maintainer, you also have to be aware that the code you write, your decisions and your actions will have an impact on the projects that depend directly or indirectly on your package.

With this small introduction we arrive at the topic of this post, which is “What to do as a maintainer when you no longer want to support a given package?” or “How to properly rename my package?”.

In both of these situations you might think “I will start by removing the package from PyPI”, I hope the next lines will convince you that this is the worst you can do, for two reasons:

  • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
  • You will free the namespace for others to use and if your package is popular enough this might become a juicy target for any malicious actor.

TLDR: you will screw your “users”.

The left-pad incident, while it didn’t happen in the python ecosystem, is a well known example of the first point and shows what happens when a popular package gets removed from the public index.

Malicious actors usually register packages using names that are similar to other popular packages with the hope that a user will end up installing them by mistake, something that already has been found multiple times on PyPI. Now imagine if that package name suddenly becomes available and is already trusted by other projects.

What should you do then?

Just don’t delete the package.

I admit that on some rare occasions it might be required, but most of the time the best thing to do is to leave it there (especially for open-source ones).

Adding a warning to the code and informing the users in the README file that the package is no longer maintained or safe to use is also a nice thing to do.

A good example of this process being done properly was the renaming of model-mommy to model-bakery, as a user it was painless. Here’s an overview of the steps they took:

  1. A new source code repository was created with the same contents. (This step is optional)
  2. After doing the required changes a new package was uploaded to PyPI.
  3. Deprecation warnings were added to the old code, mentioning the new package.
  4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
  5. A new release of the old package was created, so the user could see the deprecation warnings.
  6. All further development was done on the new package.
  7. The old code repository was archived.

So here is what is shown every time the test suite of an affected project is executed:

/lib/python3.7/site-packages/model_mommy/__init__.py:7: DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead: https://pypi.org/project/model-bakery/
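
For reference, emitting a warning like that usually comes down to a couple of lines at the top of the old package; here is a minimal sketch in Python (the message is copied from the output above, the rest is illustrative):

# model_mommy/__init__.py (sketch of the deprecation shim)
import warnings

warnings.warn(
    "Important: model_mommy is no longer maintained. Please use model_bakery "
    "instead: https://pypi.org/project/model-bakery/",
    DeprecationWarning,
)

Test runners such as pytest surface DeprecationWarning by default, which is why the message shows up on every test run even though Python normally hides this category of warning.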

In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

July 10, 2020

Robin Schroer (sulami)

Keyboardio Atreus Review July 10, 2020 12:00 AM

I recently received my early bird Keyboardio Atreus (promotional photo courtesy of Keyboardio) from the Kickstarter and have now been using it for about three weeks, so I am writing a review for folks considering buying one after release.

A Bit of History

Most of this is also outlined on the Atreus website, but here is the short version: my colleague Phil Hagelberg designed the original Atreus keyboard in 2014, and has been selling kits for self-assembly ever since.

In 2019 Keyboardio, the company which created the Model 01, got together with Phil to build a pre-assembled commercial version of the Atreus. Their Kickstarter ran earlier in 2020 and collected almost $400k.

Phil’s original 42-key version can be built with either a PCB or completely hand-wired, and uses a wooden, acrylic, or completely custom (e.g. 3D-printed) case.

Keyboardio split the two larger thumb keys into two regular size keys, bringing the total up to 44, and uses a PCB and Cherry MX-style switches mounted on an Aluminium plate inside a black ABS case.

Hardware

At a first impression, it is incredibly small (dimensions taken from the product page: 24.3 × 10 × 2.8 cm, 310 g), noticeably smaller still than the small Apple Magic Keyboard. At the same time, it uses a regular key spacing, so once your hands are in place it does not feel cramped at all. On the contrary, every time I use a different keyboard now, I feel that half the keys are too far away to reach comfortably. It is also flat enough that I can use it without a wrist rest.

Mine has Kailh Speed Copper switches, which require 40g of force to actuate, with very early actuation points. They are somewhat comparable to Cherry MX Browns without the dead travel before the tactile bump. As mentioned above, the switches are mounted on an aluminium plate, and can be swapped without disassembly.

The early actuation point of the switches does require some getting used to, I keep experiencing some key chatter, especially on my weaker fingers, though Jesse from Keyboardio is working hard on alleviating that.

When it comes to noise, you can hear that it is a mechanical keyboard. Even with relatively quiet switches, the open construction means that the sound of the keys getting released is audible in most environments. I would hesitate to bring it to a public space, like a café or a co-working space. Open-office depends on the general noise level, and how tolerant your coworkers are, I have not had anyone complain about the sound level in video conferences.

The keycaps used are XDA-profile (symmetrical and the same height across the keyboard, like lower-profile SDA, which means you can rearrange them between rows), laser-engraved PBT of medium thickness. Apparently there have been a lot of issues with the durability of the labels, so the specifics of that might change. I personally have had a single key start to fade a bit over 3 weeks of use, but I do not actually care.

The keyboard is powered by the ATmega32U4, which is a pretty standard controller for a keyboard, it is also used in the Teensy 2.0 for example.

I would judge the overall build quality as good. While it does not feel like an ultra-premium product, there is nothing specific I can actually complain about, no rough edges or manufacturing artefacts.

Software

Out of the box, the keyboard uses the open-source Kaleidoscope firmware, which can be configured with the (also open-source) Chrysalis graphical configurator. Chrysalis with my Layer 0

Supposedly it is also possible to use QMK, and Phil has recently written Menelaus, a firmware in Microscheme.

I have stuck with (pre-release versions of) Kaleidoscope so far, which has worked out fairly well. Chrysalis is an Electron app, and doing sweeping changes in it can be a bit cumbersome compared to using text-based, declarative configuration, but it does the job. Flashing a new version onto the keyboard only takes a few seconds. I also have to mention the extensive documentation available. Kaleidoscope has a rich plugin infrastructure, very little of which I actually use, but it does seem to rival QMK in flexibility.

I am using the Atreus with Colemak, the same layout I have been using for almost a decade now, and compared to trying the Ergodox (when I tried using an Ergodox for the first time, the ortholinear layout really threw me off, and I kept hitting right in between keys), the switching was much smoother. I am mostly back to my regular typing speed of 80-90 WPM after three weeks, and I can still use a regular staggered layout keyboard without trouble.

The modifier keys at the bottom are unusual, but work for me. I use the three innermost keys with my thumbs, and the bottom edges by just pushing down with my palm. It does require some careful arrangement to avoid having to press two modifiers at the same time too often.

With only 44 physical keys, the keyboard makes heavy use of layers, which can be temporarily shifted to when holding a key, or switched to permanently. By default the first extra layer has common special characters on the left half, and a numpad on the right, which works better than a regular keyboard for me.

The only problem I sometimes have is the lack of a status indicator. This means I have to keep track of the keyboard state in my head when switching layers. Not a big problem though.

Conclusion

My conclusion is quite simple: if you are in the market for a keyboard like this, this might be the keyboard for you. It does what it does well, and is much cheaper than anything comparable that does not require manual assembly. I personally enjoy the small form factor, the flexible (set of) firmware, and the RSI-friendly layout.

I also want to highlight the truly amazing effort Keyboardio puts into supporting their customers. You can browse the Kickstarter or their GitHub projects to see how much effort they put into this, and I have been in contact with Jesse myself while trying to debug a debouncing issue in the firmware. I am very happy to support them with my wallet.

July 09, 2020

Tobias Pfeiffer (PragTob)

Guest on Parallel Passion Podcast July 09, 2020 08:24 PM

Hey everyone, yes yes I should blog more. The world is just in a weird place right now affecting all of us and I hope you’re all safe & sound. I do many interesting things while freelancing, but sadly didn’t allocate the time to blog about them yet. What did get the time to is […]

July 06, 2020

Frederik Braun (freddyb)

Hardening Firefox against Injection Attacks – The Technical Details July 06, 2020 10:00 PM

This blog post has first appeared on the Mozilla Attack & Defense blog and was co-authored with Christoph Kerschbaumer and Tom Ritter

In a recent academic publication titled Hardening Firefox against Injection Attacks (to appear at SecWeb – Designing Security for the Web) we describe techniques which we have incorporated into Firefox …

Andreas Zwinkau (qznc)

Wardley Maps July 06, 2020 12:00 AM

A book which presents a map visualization for business strategy

Read full article!

July 05, 2020

Ponylang (SeanTAllen)

Last Week in Pony - July 5, 2020 July 05, 2020 10:40 PM

There is a new set of public Docker images for Pony with SSL system libraries installed. These will be replacing the previous “x86-64-unknown-linux-builder-with-ssl” image.

Derek Jones (derek-jones)

Algorithms are now commodities July 05, 2020 10:14 PM

When I first started writing software, developers had to implement most of the algorithms they used; yes, hardware vendors provided libraries, but the culture was one of self-reliance (except for maths functions, which were technical and complicated).

Developers read Donald Knuth’s The Art of Computer Programming, it was the reliable source for step-by-step algorithms. I vividly remember seeing a library copy of one volume, where somebody had carefully hand-written, in very tiny letters, an update to one algorithm, and glued it to the page over the previous text.

Algorithms were important because computers were not yet fast enough to solve common problems at an acceptable rate; developers knew the time taken to execute common instructions and instruction timings were a topic of social chit-chat amongst developers (along with the number of registers available on a given cpu). Memory capacity was often measured in kilobytes, every byte counted.

This was the age of the algorithm.

Open source commoditized algorithms, and computers got a lot faster with memory measured in megabytes and then gigabytes.

When it comes to algorithm implementation, developers are now spoilt for choice; why waste time implementing the ‘low’ level stuff when there are plenty of other problems waiting to be solved?

Algorithms are now like the bolts in a bridge: very important, but nobody talks about them. Today developers talk about story points, features, business logic, etc. Given a well-defined problem, many are now likely to search for an existing package, rather than write code from scratch (I certainly work this way).

New algorithms are still being invented, and researchers continue to look for improvements to existing algorithms. This is a niche activity.

There are companies where algorithms are not commodities. Google operates on a scale where what appears to others as small improvements can save the company millions (purely because a small percentage of a huge amount can be a lot). A company’s core competency may include an algorithmic component (whose non-commodity nature gives the company its edge over the competition), while its non-core competencies treat algorithms as a commodity.

Knuth’s The Art of Computer Programming played an important role in making viable algorithms generally available; while the volumes are frequently cited, I suspect they are rarely read (I have not taken any of my three volumes off the shelf, to read, for years).

A few years ago, I suddenly realised that I was working on a book about software engineering that not only did not contain an algorithms chapter, but whose 103 uses of the word algorithm all refer to it as a concept.

Today, we are in the age of the ecosystem.

Algorithms have not yet completed their journey to obscurity, which has to wait until people can tell computers what they want and not be concerned about the implementation details (or genetic algorithm programming gets a lot better).

Patrick Louis (venam)

D-Bus and Polkit, No More Mysticism and Confusion July 05, 2020 09:00 PM

freedesktop logo

Dbus and Polkit are two technologies that emanate an aura of confusion. While their names are omnipresent in discussions, and the internet has its share of criticism and rants about them, not many have a grasp of what they actually do. In this article I’ll give an overview of these technologies.

D-Bus, or Desktop Bus, is often described as software that allows processes to communicate with one another, to perform inter-process communication (IPC). However, this description is generic and doesn't convey what it is used for. Many technologies exist that can perform IPC, from plain sockets to message queues, so what differentiates D-Bus from them?

D-Bus can be considered a middleware, a software glue that sits in the middle to provide services to software through a sort of plugin/microkernel architecture. That's what the bus metaphor represents: it replicates the functionality of hardware buses, with components attaching themselves to known interfaces that they implement, and providing a means of communication between them. With D-Bus these can be either procedure calls aka methods or signals aka notifications.

While D-Bus does offer 1-to-1 and 1-to-many IPC, that's more of a byproduct of its original purpose than a means of efficient process-to-process data transfer; it isn't meant to be fast. D-Bus emerges from the world of desktop environments, where the building blocks are well known and each implements a functionality that should be accessible from other processes if needed, without having to reinvent the transfer mechanism for each and every piece of software.
This is the problem it tackles: having components in a desktop environment that are distributed across many processes, each fulfilling a specific job. In such a case, if another process already implements the behavior needed, a program can harness that feature instead of reimplementing it.

Its design is heavily influenced by Service Oriented Architectures (SOA), Enterprise Service Buses (ESB), and microkernel architectures.
A bus permits abstracting communication between software, replacing all direct contact, and only allowing them to happen on the bus instead.
Additionally, the SOA allows software to expose objects that have methods that can be called remotely, and also allows other software to subscribe/publish events happening in remote objects residing in other software.
Moreover, D-Bus provides an easy plug-and-play, a loose coupling, where any software could detach itself from the bus and allow another process to be plugged, containing objects that implement the same features the previous process implemented.
In sum, it’s an abstraction layer for functionalities that could be implemented by any software, a standardized way to create pluggable desktop components. This is what D-Bus is about, this is the role it plays, and it explains the difficulty in grasping the concepts that gave rise to it.

The big conceptual picture goes as follows.
We have a D-Bus daemon running at an address and services that implement well known behaviors. These services attach to the D-Bus daemon and the attachment edge has a name, a bus name.
Inside these services, there are objects that implement the well known behavior. These objects also have a path leading to them so that you can target which object within that service implements the specific interface needed.
Then, the interface methods and events can be called or registered on this object inside this service, connected to this bus name, from another service that requires the behavior implemented by that interface to be executed.

This is how these particular nested components interact with one another, and it gives rise to the following:

Address of D-Bus daemon ->
Bus Name that the service attached to ->
Path of the object within this service ->
Interface that this object implements ->
Method or Signal concrete implementation

Or in graphical form:

D-Bus ecosystem

Instead of having everyone talk to one another:

p2p interaction

Let’s take a method call example that shows these 3 required pieces of information.

org.gnome.SessionManager \
/org/gnome/SessionManager \
org.gnome.SessionManager.CanShutdown

   boolean true

Here, we have the service bus name org.gnome.SessionManager, the object path /org/gnome/SessionManager, and the interface/method name org.gnome.SessionManager.CanShutdown, all separated by spaces. If the /org/gnome/SessionManager object only implemented a single interface, we could call the method simply as CanShutdown, but here it doesn't.
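
To reproduce this call yourself, one option is gdbus, one of the tools covered further down; note that it prints the reply as a tuple, and the boolean you get back depends on your setup:

$ gdbus call --session --dest org.gnome.SessionManager \
--object-path /org/gnome/SessionManager \
--method org.gnome.SessionManager.CanShutdown
(true,)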

Let’s dive deeper into the pieces we’ve mentioned. They are akin to the ones in an SOA ecosystem, but with the addition of the bus name, bus daemon, and the abstraction for the plug-and-play.

  • Objects

An object is an entity that resides in a process/service and that performs some work. It is identified by a path name. The path name is usually written, though this is not mandatory, in a namespace format where it is grouped and divided by slashes /, just like a Unix file system path.

For example: /org/gnome/Nautilus/window/1.

Objects have methods and signals, methods take input and return output, while signals are events that processes can subscribe to.

  • Interfaces

These methods and signals are concrete implementations of interfaces, the same definition as in OOP.
As with OOP, interfaces are a group of abstractions that have to be defined in the object that implements them. The members, methods and signals, are also namespaced under this interface name.

Example:

interface=org.gnome.Shell.Introspect
member method=GetRunningApplications
absolute name of method=org.gnome.Shell.Introspect.GetRunningApplications

Some interfaces are commonly implemented by objects, such as the org.freedesktop.DBus.Introspectable interface, which, as the name implies, makes the object introspectable. It allows querying the object about its capabilities, features, and the other interfaces it implements. This is a very useful feature because it allows discovery.
It’s also worth mentioning that dbus can be used in a generic way to set and get properties of services’ objects through the org.freedesktop.DBus.Properties interface.

Interfaces can be described as standard, and for documentation, in D-Bus XML configuration files so that other programmers can use the reference to implement them properly. These files can also be used to auto-generate classes from the XML, making it quicker to implement and less error-prone.
These files can usually be found under /usr/share/dbus-1/interfaces/. Our org.gnome.Shell.Introspect of earlier is there in the file org.gnome.Shell.Introspect.xml along with our method GetRunningApplications. Here’s an excerpt of the relevant section.

<!--
	GetRunningApplications:
	@short_description: Retrieves the description of all running applications

	Each application is associated by an application ID. The details of
	each application consists of a varlist of keys and values. Available
	keys are listed below.

	'active-on-seats' - (as)   list of seats the application is active on
								(a seat only has at most one active
								application)
-->
<method name="GetRunningApplications">
	<arg name="apps" direction="out" type="a{sa{sv}}" />
</method>

Notice the type= part, which describes the format of the output. We'll come back to what this means in the message format section, but in short each letter represents a basic type. The out direction means that it's the type of an output value of the method; similarly, in is for method parameters. See the following example taken from org.gnome.Shell.Screenshot.xml.

<!--
	ScreenshotArea:
	@x: the X coordinate of the area to capture
	@y: the Y coordinate of the area to capture
	@width: the width of the area to capture
	@height: the height of the area to capture
	@flash: whether to flash the area or not
	@filename: the filename for the screenshot
	@success: whether the screenshot was captured
	@filename_used: the file where the screenshot was saved

	Takes a screenshot of the passed in area and saves it
	in @filename as png image, it returns a boolean
	indicating whether the operation was successful or not.
	@filename can either be an absolute path or a basename, in
	which case the screenshot will be saved in the $XDG_PICTURES_DIR
	or the home directory if it doesn't exist. The filename used
	to save the screenshot will be returned in @filename_used.
-->
<method name="ScreenshotArea">
	<arg type="i" direction="in" name="x"/>
	<arg type="i" direction="in" name="y"/>
	<arg type="i" direction="in" name="width"/>
	<arg type="i" direction="in" name="height"/>
	<arg type="b" direction="in" name="flash"/>
	<arg type="s" direction="in" name="filename"/>
	<arg type="b" direction="out" name="success"/>
	<arg type="s" direction="out" name="filename_used"/>
</method>
  • Proxies

Proxies are the nuts and bolts of an RPC ecosystem; they represent remote objects, along with their methods, in your native code as if they were local. Basically, these are wrappers that make it simpler to manipulate things on D-Bus programmatically instead of worrying about all the components we've mentioned above. Programming with proxies might look like this pseudocode.

Proxy proxy = new Proxy(getBusConnection(), "/remote/object/path");
Object returnValue = proxy.MethodName(arg1, arg2);
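
As a more concrete sketch, here is roughly the same idea with the dbus-python bindings (assuming they are installed); the proxy hides the bus name, object path, and interface plumbing, and the arguments mirror the Notify call shown later in this post:

import dbus

# Get a proxy for the notifications service on the session bus
bus = dbus.SessionBus()
proxy = bus.get_object("org.freedesktop.Notifications",
                       "/org/freedesktop/Notifications")
notifications = dbus.Interface(proxy, "org.freedesktop.Notifications")

# Call the remote Notify method as if it were a local function
notifications.Notify("my_app_name", 42, "gtk-dialog-info",
                     "The Summary", "Here's the body of the notification",
                     [], {}, 5000)
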
  • Bus names

The bus name, or also sometimes called connection name, is the name of the connection that an application gets assigned when it connects to D-Bus. Because D-Bus is a bus architecture, it requires that each assigned name be unique, you can’t have two applications using the same bus name. Usually, it is the D-Bus daemon that generates this random unique value, one that begins with a colon by convention, however, applications may ask to own well-known names instead. These well-known names, as reverse domain names, are for cases when people want to agree on a standard unique application that should implement a certain behavior. Let’s say for instance a specification for a com.mycompany.TextEditor bus name, where the mandatory object path should be /com/mycompany/TextFileManager, and supporting interface org.freedesktop.FileHandler. This makes the desktop environment more predictable and stable. However, today this is still only a dream and has nothing to do with current desktop environment implementations.

  • Connection and address of D-Bus daemon

The D-Bus daemon is the core of D-Bus; it is what everything else attaches itself to. Thus, the address that the daemon uses and listens to should be well known to clients. The means of communication can vary from UNIX domain sockets to TCP/IP sockets if used remotely.
In normal scenarios, there are two daemons running, a system-wide daemon and a per-session daemon, one for system-level applications and one for session related applications such as desktop environments. The address of the session bus can be discovered by reading the environment variable $DBUS_SESSION_BUS_ADDRESS, while the address of the system D-Bus daemon is discovered by checking a predefined UNIX domain socket path, though it can be overridden by using another environment variable, namely $DBUS_SYSTEM_BUS_ADDRESS.
Keep in mind that it’s always possible to start private buses, private daemons for non-standard use.
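
For example, on a typical desktop session the variable looks something like this (the exact path varies per system and user):

$ echo $DBUS_SESSION_BUS_ADDRESS
unix:path=/run/user/1000/bus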

  • Service

A service is the application daemon connected to a bus that provides some utility to clients via the objects it contains, which implement some interfaces. Normally we talk of services when the bus name is well-known, as in not auto-generated but using a reverse domain name. Due to D-Bus's nature, services are singletons and owners of their bus name, and thus are the only applications that can fulfill specific requests. If any other application wants to use a particular bus name, it has to wait in a queue of aspiring owners until the current owner relinquishes it.

Within the D-Bus ecosystem, you can request that the D-Bus daemon automatically start a program, if not already started, that provides a given service (well-known name) whenever it’s needed. We call this service activation. It’s quite convenient as you don’t have to remember what application does what, nor care if it’s already running, but instead send a generic request to D-Bus and rely on it to launch it.

To do this we have to define a service file in the /usr/share/dbus-1/services/ directory that describes what and how the service will run.
A simple example goes as follows.

[D-BUS Service]
Name=org.gnome.ServiceName
Exec=program-providing-servicename

You can also specify the user with which the command will be executed using a User= line, and even specify if it’s in relation with a systemd service using SystemdService=.

Additionally, if you are creating a full service, it’s a good practice to define its interfaces explicitly in the /usr/share/dbus-1/interfaces as we previously mentioned.

Now, when org.gnome.ServiceName is called, D-Bus will check whether the service already exists on the bus. If not, it will block the method call, search for the service in the directory, and, if there's a match, start the service as specified so that it takes ownership of the bus name, and finally continue with the method call. If there's no service file, an error is returned. It's possible to programmatically make such a call asynchronous to avoid blocking.

This is actually a mechanism that systemd can use for service activation when the application acquires a name on dbus (Service Type=dbus). For example, polkit and wpa_supplicant. When the dbus daemon is started with --systemd-activation, as shown below, then systemd services can be started on the fly whenever they are needed. That’s also related to SystemdService= we previously mentioned, as both a systemd unit file and a dbus daemon service file are required in tandem.

dbus         498       1  0 Jun05 ?        00:01:41 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
vnm          810     795  0 Jun05 ?        00:00:19 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

And the systemd unit file for polkit.

[Unit]
Description=Authorization Manager
Documentation=man:polkit(8)

[Service]
Type=dbus
BusName=org.freedesktop.PolicyKit1
ExecStart=/usr/lib/polkit-1/polkitd --no-debug
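
The matching D-Bus service file lives under /usr/share/dbus-1/system-services/ and is what ties the well-known bus name to that unit; it typically looks roughly like this (details vary per distribution):

[D-BUS Service]
Name=org.freedesktop.PolicyKit1
Exec=/usr/lib/polkit-1/polkitd --no-debug
User=root
SystemdService=polkit.service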

Here’s an exploratory example of service activation.
Let’s say we found a service file for Cheese (A webcam app) in the /service directory that is called org.gnome.Cheese.service.

We have no clue what interfaces and methods it implements because its interfaces aren’t described in the /interfaces directory, so we send it any message.

$ dbus-send --session \
--dest=org.gnome.Cheese \
/ org.gnome.Cheese.nonexistent

If we now take a look at the processes, we can clearly see it has been started by the dbus daemon.

$ ps -ef | grep cheese
vnm        56841     716 11 20:53 ?        00:00:00 /usr/bin/cheese --gapplication-service
vnm        56852   56783  0 20:53 pts/4    00:00:00 grep -i cheese

Cheese probably implements introspect so let’s try to see which methods it has.

$ gdbus introspect --session \
--dest org.gnome.Cheese \
--object-path /org/gnome/Cheese | less

We can see that it implements the org.freedesktop.Application interface that is described here, but whose interface description I couldn't find in /usr/share/dbus-1/interfaces/. So let's try to call one of its methods; org.freedesktop.Application.Activate seems interesting, it should start the application for us.

$ gdbus call --session --dest org.gnome.Cheese \
--object-path /org/gnome/Cheese \
--method org.freedesktop.Application.Activate  '{}'

NB: I'm using gdbus instead of dbus-send because dbus-send has limitations with complex types such as (a{sv}), a dictionary with keys of type "string" and values of type "variant". We'll explain the types in the next section.

And cheese will open.
So this call is based on pure service activation.

So what kinds of messages are sent, and what's up with the types we mentioned?

Messages, the unit of data transfer in D-Bus, are composed of header and data. The header contains information regarding the sender, receiver, and the message type, while the data is the payload of the message.

The D-Bus message type, not to be confused with the type format of the data payload, can be either a signal (DBUS_MESSAGE_TYPE_SIGNAL), a method call (DBUS_MESSAGE_TYPE_METHOD_CALL), a method return (DBUS_MESSAGE_TYPE_METHOD_RETURN), or an error (DBUS_MESSAGE_TYPE_ERROR).

D-Bus is fully typed and type-safe as far as the payload is concerned, that means the types are predefined and are checked to see if they fit the signatures.

The following types are available:

<contents>   ::= <item> | <container> [ <item> | <container>...]
<item>       ::= <type>:<value>
<container>  ::= <array> | <dict> | <variant>
<array>      ::= array:<type>:<value>[,<value>...]
<dict>       ::= dict:<type>:<type>:<key>,<value>[,<key>,<value>...]
<variant>    ::= variant:<type>:<value>
<type>       ::= string | int16 | uint16 | int32 | uint32 | int64 | uint64 | double | byte | boolean | objpath

These types are what the type= fields in the earlier interface definitions represent. Here are some shorthand descriptions:

b           ::= boolean
s           ::= string
i           ::= int
u           ::= uint
d           ::= double
o           ::= object path
v           ::= variant (could be different types)
a{keyvalue} ::= dictionary of key-value type
a(type)     ::= array of value of type
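
With these shorthands, the output signature of the GetRunningApplications method we saw earlier can be read as follows:

a{sa{sv}} ::= dictionary of string (application ID) to
              dictionary of string (key) to variant (value)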

As was said, the actual method of transfer of the information isn’t mandated by the protocol, but it can usually be done locally via UNIX sockets, pipes, or via TCP/IP.

It wouldn't be very secure to have anyone on the machine be able to send messages to the dbus daemon and do service activation, or call any and every method; some of them could be dealing with sensitive data and activities. It wouldn't be very secure either to have this data sent in plain text.
On the transfer side, that is why D-Bus implements a simple protocol based on SASL profiles for authenticating one-to-one connections. For the authorization, the dbus daemon controls access to interfaces by a security system of policies.

The policies are read and represented in XML files that can be found in multiple places, including /usr/share/dbus-1/session.conf, /usr/share/dbus-1/system.conf, /usr/share/dbus-1/session.d/*, and /usr/share/dbus-1/system.d/*.
These files mainly control which user can talk to which interface. If you are not able to talk with a D-Bus service or get an org.freedesktop.DBus.Error.AccessDenied error, then it's probably due to one of these files.

For example:

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
	<policy user="vnm">
		<allow own="net.nixers"/>
		<allow send_destination="net.nixers"/>
		<allow send_interface="net.nixers.Blog" send_member="GetPosts"/>
	</policy>
</busconfig>

In this example, the user “vnm” can:

  • Own the bus name net.nixers
  • Send messages to the owner of the given service
  • Call GetPosts from interface net.nixers.Blog

If services need more granularity when it comes to permission, then polkit can be used instead.

There's a lot more that can be configured in the dbus daemon, namely in the configuration files for the session-wide daemon in /usr/share/dbus-1/session.conf, and the system-wide daemon in /usr/share/dbus-1/system.conf, such as the way it listens to connections, the limits regarding messages, and where other files are read from.

So how do we integrate and harness dbus in our client or service programs?

libdbus schema

We do this using libraries, of course, of which there are many. The most low-level one is libdbus, the reference implementation of the specification. However, it's quite hard to use, so people rely on other libraries such as GDBus (part of GLib in GNOME), QtDBus (part of Qt, so KDE too), dbus-java, and sd-bus (which is part of systemd).
Some of these libraries offer the proxy capability we've talked about, namely manipulating dbus objects as if they were local. They can also offer ways to generate classes in the programming language of choice by inputting an interface definition file (see gdbus-codegen and qdbusxml2cpp for an idea).

Let’s name a few projects that rely on D-Bus.

  • KDE: A desktop environment based on Qt
  • GNOME: A desktop environment based on gtk
  • Systemd: An init system
  • Bluez: A project adding Bluetooth support under Linux
  • Pidgin: An instant messaging client
  • Network-manager: A daemon to manage network interfaces
  • Modem-manager: A daemon to provide an API to dial with modems - works with Network-Manager
  • Connman: Same as Network-Manager but works with Ofono for modem
  • Ofono: A daemon exposing features provided by telephony devices such as modems

One thing that is nice about D-Bus is that there is a lot of tooling to interact with it, it’s very exploratory.

Here’s a bunch of useful ones:

  • dbus-send: send messages to dbus
  • dbus-monitor: monitor all messages
  • gdbus: manipulate dbus with gtk
  • qdbus: manipulate dbus with qt
  • QDBusViewer: exploratory gui
  • D-Feet: exploratory gui

I’ll list some examples.

Monitor all the method calls in the org.freedesktop namespace.

$ dbus-monitor --session type=method_call \
interface=org.freedesktop

For instance, we can debug what happens when we use the command line tool notify-send(1).
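
For example, with dbus-monitor running, a hypothetical invocation such as the following will show the resulting Notify method call on the session bus (flags vary between libnotify versions):

$ notify-send -i gtk-dialog-info -t 5000 \
"The Summary" "Here's the body of the notification"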

This is equivalent to this line of gdbus(1).

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.Notifications.Notify \
my_app_name 42 \
gtk-dialog-info "The Summary" \
"Here's the body of the notification" '[]' '{}' 5000

Or as we’ve seen, we can use dbus-send(1), however it has some limitations with dictionaries and variant types. Here are some more examples of it.

$ dbus-send --system --print-reply \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1/unit/apache2_2eservice \
org.freedesktop.DBus.Properties.Get \
string:'org.freedesktop.systemd1.Unit' \
string:'ActiveState'

$ dbus-send --system --print-reply --type=method_call \
--dest=org.freedesktop.systemd1 \
/org/freedesktop/systemd1 \
org.freedesktop.systemd1.Manager.GetUnit \
string:'apache2.service'

D-Feet QDBusViewer

D-Feet and QDBusViewer are GUIs that are driven by the introspectability of objects. You can also introspect using gdbus and qdbus.

Either through calling org.freedesktop.DBus.Introspectable.Introspect.

With gdbus:

$ gdbus call --session --dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications \
--method org.freedesktop.DBus.Introspectable.Introspect

With dbus-send:

$ dbus-send --session --print-reply \
--dest=org.freedesktop.Notifications \
/org/freedesktop/Notifications \
org.freedesktop.DBus.Introspectable.Introspect

Or by using the introspect feature of the tool, here gdbus, which will output it in a fancy colored way:

$ gdbus introspect --session \
--dest org.freedesktop.Notifications \
--object-path /org/freedesktop/Notifications

D-Bus is not without limitations and criticism. As we said in the introduction, it isn't meant for high-performance IPC; it's meant for control, not data transfer. So it's fine to use it to activate a chat application, for instance, but not to have a whole media stream pass over it.
D-Bus has also been criticized as being bloated and over-engineered, though those claims are often unsubstantiated and only come from online rants. It remains that D-Bus is still heavily popular and that there’s no replacement that is a real contender.

Now, let’s turn our attention to Polkit.

Polkit, formerly PolicyKit, is a service running on dbus that offers clients a way to perform granular system-wide privilege authentication, something that neither dbus default policies nor sudo are able to do.
Unlike sudo, which switches the user and grants permission to the whole process, polkit delimits distinct actions, categorizes users by group or name, and decides whether the action is allowed or not. This is all offered system-wide, so that dbus services can query polkit to know if clients have privileges or not.
In polkit parlance, we talk of MECHANISMS, privileged services, that offer actions to SUBJECTS, which are unprivileged programs.

The polkit authority is a system daemon, usually dbus service activated, named “polkitd”, and running as the polkitd user UID.

$ ps -ef | grep polkitd
polkitd   904  1  0 Jun05 ?  00:00:34 /usr/lib/polkit-1/polkitd --no-debug

The privileged services (MECHANISMS) can define a set of actions for which authentication is required. If another process wants to access a method of such a privileged service, maybe through a dbus method call, the privileged service will query polkit. Polkit will then consult two things: the action policy defined by that service and a set of programmatic rules that generally apply. If needed, polkit will initiate an authentication agent to verify that the user is who they say they are. Finally, polkit sends its result back to the privileged service and lets it know whether the user is allowed to perform the action or not.

In summary, the following definitions apply:

  • Subject - a user
  • Action - a privileged duty that (generally) requires some authentication.
  • Result - the action to take given a subject/action pair and a set of rules. This may be to continue, to deny, or to prompt for a password.
  • Rule - a piece of logic that maps a subject/action pair to a result.

And they materialize in these files:

  • /usr/share/polkit-1/actions - Default policies for each action. These tell polkit whether to allow, deny, or prompt for a password.
  • /etc/polkit-1/rules.d - User-supplied rules. These are JavaScript scripts.
  • /usr/share/polkit-1/rules.d - Distro-supplied rules. Do not change these because they will be overwritten by the next upgrade.

Which can be summarized in this picture:

polkit architecture

Thus, polkit works along with a per-session authentication agent, usually started by the desktop environment. This is another service that is used whenever a user needs to be prompted for a password to prove their identity.
The polkit package contains a textual authentication agent called pkttyagent, which is used as a general fallback but lacks features. I advise anyone who is trying the examples in this post to install a decent authentication agent instead.

Here’s a list of popular ones:

  • lxqt-policykit - which provides /usr/bin/lxqt-policykit-agent
  • lxsession - which provides /usr/bin/lxpolkit
  • mate-polkit - which provides /usr/lib/mate-polkit/polkit-mate-authentication-agent-1
  • polkit-efl - which provides /usr/bin/polkit-efl-authentication-agent-1
  • polkit-gnome - which provides /usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1
  • polkit-kde-agent - which provides /usr/lib/polkit-kde-authentication-agent-1
  • ts-polkitagent - which provides /usr/lib/ts-polkitagent
  • xfce-polkit - which provides /usr/lib/xfce-polkit/xfce-polkit

Authentication agent

Services/mechanisms have to define the set of actions for which clients require authentication. This is done through defining a policy XML file in the /usr/share/polkit-1/actions/ directory. The actions are defined in a namespaced format, and there can be multiple ones per policy file.
A simple grep '<action id' * | less in this directory should give an idea of the types of actions that are available. You can also list all the installed polkit actions using the pkaction(1) command.

For example:

org.xfce.thunar.policy: <action id="org.xfce.thunar">
org.freedesktop.policykit.policy:  <action id="org.freedesktop.policykit.exec">

NB: File names aren’t required to be the same as the action id namespace.

This file defines metadata for each action, such as the vendor, the vendor URL, the icon name, the message that will be displayed when authentication is required (in multiple languages), and the description. The important sections in the action element are the defaults and annotate elements.

The defaults element is the one that polkit inspects to know if a client is authorized or not. It is composed of 3 mandatory sub-elements: allow_any for the authorization policy that applies to any client, allow_inactive for the policy that applies to clients in an inactive session on a local console, and allow_active for clients in the currently active session on a local console.
These elements take as value one of the following:

  • no - Not authorized
  • yes - Authorized.
  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • auth_admin - Authentication by the admin is required (root)
  • auth_self_keep - Same as auth_self but the authentication is kept for some time that is defined in polkit configurations.
  • auth_admin_keep - Same as auth_admin but also keeps it for some time

The annotate element is used to pass extra key-value pairs to the action. There can be multiple key-value pairs passed. Some annotations/key-values are well known, such as org.freedesktop.policykit.exec.path which, if passed to the pkexec program that is shipped by default with polkit, will tell it which program the action applies to.
Another defined annotation is the org.freedesktop.policykit.imply which will tell polkit that if a client was authorized for the action it should also be authorized for the action in the imply annotation.
One last interesting annotation is the org.freedesktop.policykit.owner, which will let polkitd know who has the right to interrogate it about whether other users are currently authorized to do certain actions or not.
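
Putting these pieces together, a minimal policy file could look like the following sketch (the net.nixers action id and the helper path are made up for illustration):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policyconfig PUBLIC
 "-//freedesktop//DTD PolicyKit Policy Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/PolicyKit/1/policyconfig.dtd">
<policyconfig>
  <action id="net.nixers.blog.manage">
    <description>Manage the nixers blog</description>
    <message>Authentication is required to manage the blog</message>
    <defaults>
      <allow_any>no</allow_any>
      <allow_inactive>no</allow_inactive>
      <allow_active>auth_admin_keep</allow_active>
    </defaults>
    <annotate key="org.freedesktop.policykit.exec.path">/usr/bin/blog-manage</annotate>
  </action>
</policyconfig>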

Other than policy actions, polkit also offers a rule system that is applied every time it needs to resolve authentication. The rules are defined in two directories, /etc/polkit-1/rules.d/ and /usr/share/polkit-1/rules.d/. As users, we normally add custom rules to the /etc/ directory and leave the /usr/share/ for distro packages rules.
Rules within these files are defined in javascript and come with a preset of helper methods that live under the polkit object.

The polkit javascript object comes with the following methods, which are self-explanatory.

  • void addRule( polkit.Result function(action, subject) {...});
  • void addAdminRule( string[] function(action, subject) {...}); called when administrator authentication is required
  • void log( string message);
  • string spawn( string[] argv);

The polkit.Result object is defined as follows:

polkit.Result = {
    NO              : "no",
    YES             : "yes",
    AUTH_SELF       : "auth_self",
    AUTH_SELF_KEEP  : "auth_self_keep",
    AUTH_ADMIN      : "auth_admin",
    AUTH_ADMIN_KEEP : "auth_admin_keep",
    NOT_HANDLED     : null
};

Note that the rule files are processed in alphabetical order, and thus if a rule is processed before another and returns any value other than polkit.Result.NOT_HANDLED, for example polkit.Result.YES, then polkit won’t bother continuing processing the next files. Thus, file name convention does matter.

The functions polkit.addRule and polkit.addAdminRule take the same arguments, namely an action and a subject: respectively, the action being requested, which has an id attribute and a lookup() method to fetch annotation values, and the subject, which has attributes such as pid, user, groups, seat, and session, and methods such as isInGroup and isInNetGroup.

Here are some examples taken from the official documentation:

Log the action and subject whenever the action org.freedesktop.policykit.exec is requested.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec") {
        polkit.log("action=" + action);
        polkit.log("subject=" + subject);
    }
});

Allow all users in the admin group to perform user administration without changing policy for other users.

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.accounts.user-administration" &&
        subject.isInGroup("admin")) {
        return polkit.Result.YES;
    }
});

Define administrative users to be the users in the wheel group:

polkit.addAdminRule(function(action, subject) {
    return ["unix-group:wheel"];
});

Run an external helper to determine if the current user may reboot the system:

polkit.addRule(function(action, subject) {
    if (action.id.indexOf("org.freedesktop.login1.reboot") == 0) {
        try {
            // user-may-reboot exits with success (exit code 0)
            // only if the passed username is authorized
            polkit.spawn(["/opt/company/bin/user-may-reboot",
                          subject.user]);
            return polkit.Result.YES;
        } catch (error) {
            // Nope, but do allow admin authentication
            return polkit.Result.AUTH_ADMIN;
        }
    }
});

The following example shows how the authorization decision can depend on variables passed by the pkexec(1) mechanism:

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.policykit.exec" &&
        action.lookup("program") == "/usr/bin/cat") {
        return polkit.Result.AUTH_ADMIN;
    }
});

Keep in mind that polkit will track changes in both the policy and rules directories, so there’s no need to worry about restarting polkit, changes will appear immediately.

We've mentioned a tool called pkexec(1) that comes pre-installed along with polkit. This program lets you execute a command as another user, by default executing it as root. It is a sort of sudo replacement, but one that may appear confusing to most users who have no idea about polkit. However, the integration with the authentication agent is quite nice.

So how do we integrate and harness polkit in our subject and mechanism software? We do this using libraries, of course, of which there are many, integrating with different desktop environments.
The libpolkit-agent-1, or the libpolkit-gobject-1 (gtk), libraries are used by the mechanisms, and this is most of what is needed. The portion of code that requires authentication can be wrapped with a check on polkit.
For instance, polkit_authority_check_authorization() is used to check whether a subject is authorized.

As for writing an authentication agent, it will have to implement the registration methods to be able to receive requests from polkit.

Remember, polkit is a D-Bus service, and thus all its interfaces are well known and can be introspected. That means you can interact with it directly over D-Bus instead of using a helper library.

Polkit also offers some excellent manpages that are extremely useful; be sure to check polkit(8), polkitd(8), pkcheck(1), pkaction(1), and pkexec(1).

The following tools are of help:

  • polkit-explorer or polkitex - a GUI to inspect policy files
  • pkcreate - a WIP tool to easily create policy files, but it seems to be lacking
  • pkcheck - Check whether a subject has privileges or not
  • pkexec - Execute a command as another user

Let’s walk through some examples.

First, pkaction(1), to query the policy file.

$ pkaction -a org.xfce.thunar -v

org.xfce.thunar:
  description:       Run Thunar as root
  message:           Authentication is required to run Thunar as root.
  vendor:            Thunar
  vendor_url:        https://xfce.org/
  icon:              system-file-manager
  implicit any:      auth_self_keep
  implicit inactive: auth_self_keep
  implicit active:   auth_self_keep
  annotation:        org.freedesktop.policykit.exec.path -> /usr/bin/thunar
  annotation:        org.freedesktop.policykit.exec.allow_gui -> true

Compared to polkitex:

[screenshot: the same action displayed in polkitex]

We can get the current shell PID.

$ ps
    PID TTY          TIME CMD
 421622 pts/21   00:00:00 zsh
 421624 pts/21   00:00:00 ps

And then give ourselves temporary privileges for the org.freedesktop.systemd1.manage-units action.

$ pkcheck --action-id 'org.freedesktop.systemd1.manage-units' --process 421622 -u
$ pkcheck --list-temp
authorization id: tmpauthz10
action:           org.freedesktop.systemd1.manage-units
subject:          unix-process:421622:195039910 (zsh)
obtained:         26 sec ago (Sun Jun 28 10:53:39 2020)
expires:          4 min 33 sec from now (Sun Jun 28 10:58:38 2020)

As you can see, if auth_admin_keep or auth_self_keep is set, the authorization is kept for a while and can be listed using pkcheck.

You can try to execute a process as another user, just like with sudo:

$ pkexec /usr/bin/thunar

If you want to override the currently running authentication agent, you can test this by running pkttyagent in another terminal, passing it the -p argument with the PID of the process it should listen to.

# terminal 1
$ pkttyagent -p 423619
# terminal 2
$ pkcheck --action-id 'org.xfce.thunar' --process 423619 -u
# will display in terminal 1
polkit\56temporary_authorization_id=tmpauthz13
polkit\56retains_authorization_after_challenge=true
==== AUTHENTICATING FOR org.xfce.thunar ====
Authentication is required to run Thunar as root.
Authenticating as: vnm
Password: 
==== AUTHENTICATION COMPLETE ====

So that is it for polkit, but what is the deal with consolekit and systemd logind, and what is their relation with polkit?

Remember that we talked about sessions when discussing the <default> element of polkit policy files; this is where these two come in. Let’s quote again:

  • auth_self - The owner of the current session should authenticate (usually the user that logged in, your user password)
  • allow_active - for client in the currently active session on local consoles

The purpose of the two programs, consolekit and systemd logind, is to be services on D-Bus that can be interrogated about the status of the current session: its users, its seats, its logins. They can also be used to manage the session, with methods for shutting down, suspending, restarting, and hibernating the machine.

$ loginctl show-session $XDG_SESSION_ID
Id=2
Name=vnm
Timestamp=Fri 2020-06-05 21:06:43 EEST
[...snip...]
Remote=no
Active=yes
State=active

# in another terminal we monitor using
$ dbus-monitor --system
# and the output
method call time=1593360621.762509 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=2 \
path=/org/freedesktop/login1; \
interface=org.freedesktop.login1.Manager; \
member=GetSession

method call time=1593360621.763069 sender=:1.59516 \
-> destination=org.freedesktop.login1 serial=3 \
path=/org/freedesktop/login1/session/_32; \
interface=org.freedesktop.DBus.Properties; \
member=GetAll

As can be seen, this is done through the org.freedesktop.login1.Manager bus name.

And so, polkit uses data gathered from systemd logind or consolekit to evaluate the three defaults we’ve seen: allow_any, allow_inactive, and allow_active. This is where these two interact with one another.
The following conditions apply to the values returned by systemd logind:

  • allow_any means any session (even remote sessions)
  • allow_inactive means Remote == false and Active == false
  • allow_active means Remote == false and Active == true


In conclusion, all these technologies, D-Bus, polkit, and systemd logind, are inherently intertwined, and this is as much a strength as it is a fragile point of failure. They each complete one another, but if one goes down, issues can echo all across the system.
I hope this post has removed the mystification around them and helped you understand what they stand for: yet another piece of glue in the desktop environments, similar to this post but solving another problem.


July 04, 2020

Jeff Carpenter (jeffcarp)

Building a Markov Chain Sentence Generator in 20 Lines of Python July 04, 2020 06:44 PM

A bot who can write a long letter with ease, cannot write ill. —Jane Austen, Pride and Prejudice

This post walks you step by step through writing a Markov chain from scratch in Python to generate brand-new English sentences that read as if a real person wrote them. Jane Austen's Pride and Prejudice is the text we'll use to build the Markov chain. A runnable notebook version is available on Colab. Read the English version of this post here.

Setup

First, download the full text of Pride and Prejudice.

# Download Pride and Prejudice and cut off the header.
!curl https://www.gutenberg.org/files/1342/1342-0.txt | tail -n+32 > /content/pride-and-prejudice.txt

# Preview the file.
!head -n 10 /content/pride-and-prejudice.txt

PRIDE AND PREJUDICE

By Jane Austen

Chapter 1

It is a truth universally acknowledged, that a single man in possession

Add some necessary imports.

July 03, 2020

Unrelenting Technology (myfreeweb)

Wow, about a month ago Spot (ex-Spotinst), the service that can... July 03, 2020 12:36 AM

Wow, about a month ago Spot (ex-Spotinst), the service that can auto-restore an EC2 spot instance after it gets killed, fixed their arm64 support! (Used to be that it would always set the AMI’s “architecture” metadata to amd64, haha.)

And of course their support didn’t notify me that it was fixed, the service didn’t auto-notify me that an instance finally was successfully restored after months of trying and failing, AWS didn’t notify either (it probably can but I haven’t set anything up?), so I wasted a few bucks running a spare inaccessible clone server of my website. Oh well, at least now I can use a spot instance again without worrying about manual restore.

UPD: hmm, it still tried i386 on another restore! dang it.

Pete Corey (petecorey)

Recursive Roguelike Tiles July 03, 2020 12:00 AM

The roguelike dungeon generator we hacked together in a previous post sparked something in my imagination. What kind of flora and fauna lived there, and how could we bring them to life?

The first thing that came to mind was grass. We should have some way of algorithmically generating grass throughout the open areas of our dungeon. A quick stab at adding grass could be to randomly colorize floor tiles as we render them:
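
As a minimal sketch of that idea, assuming the state, w, h, and pixelSize variables from the previous post and a 2D canvas context named context, it could be as simple as:

for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
        if (state[y * w + x]) {
            // Give every walkable tile its own random shade of green.
            context.fillStyle = `hsl(100, 50%, ${_.random(30, 50)}%)`;
            context.fillRect(x * pixelSize, y * pixelSize, pixelSize, pixelSize);
        }
    }
}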

But this isn’t very aesthetically pleasing. The grass tiles should be smaller than the walkable tiles to give us some visual variety. We could model this by giving every ground tile a set of grass tiles. All of the grass tiles in a given area live entirely within their parent ground tile.
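
One way of modeling that nesting, sketched here with an arbitrary grassWidth of four, is to replace every walkable entry in state with an object that owns its own array of grass tiles, each starting with a random value:

let grassWidth = 4;

state = state.map(ground =>
    ground
        ? {
              // Each ground tile owns a grassWidth x grassWidth grid of grass tiles.
              grass: _.times(grassWidth * grassWidth, () => ({
                  value: Math.random()
              }))
          }
        : ground
);

This is the shape the getGrass helper below reads back out of state.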

This is better, but we can go further. To spice up our grass, let’s inject some life into it. We’ll model our grass cells as a basic cellular automaton that changes its state over time, looking to its immediate neighbors to decide what changes to make.

Because of how we recursively modeled our tiles, finding all of the neighbors of a single grass tile takes some work:


const getGrass = (x1, y1, x2, y2) => {
    let ground = state[y1 * w + x1];
    return _.get(ground, `grass.${y2 * grassWidth + x2}`);
};

const getGrassNeighbors = (i, x, y) => {
    let ix = i % w;
    let iy = Math.floor(i / w);
    return _.chain([
        [-1, -1],
        [0, -1],
        [1, -1],
        [-1, 0],
        [1, 0],
        [-1, 1],
        [0, 1],
        [1, 1],
    ])
        .map(([dx, dy]) => {
            let nx = x + dx;
            let ny = y + dy;
            if (nx >= 0 && nx < grassWidth && ny >= 0 && ny < grassWidth) {
                return getGrass(ix, iy, nx, ny);
            } else if (nx < 0 && ny >= 0 && ny < grassWidth) {
                return getGrass(ix - 1, iy, grassWidth - 1, ny);
            } else if (nx >= grassWidth && ny >= 0 && ny < grassWidth) {
                return getGrass(ix + 1, iy, 0, ny);
            } else if (nx >= 0 && nx < grassWidth && ny < 0) {
                return getGrass(ix, iy - 1, nx, grassWidth - 1);
            } else if (nx >= 0 && nx < grassWidth && ny >= grassWidth) {
                return getGrass(ix, iy + 1, nx, 0);
            } else if (nx < 0 && ny < 0) {
                return getGrass(ix - 1, iy - 1, grassWidth - 1, grassWidth - 1);
            } else if (nx < 0 && ny >= grassWidth) {
                return getGrass(ix - 1, iy + 1, grassWidth - 1, 0);
            } else if (nx >= grassWidth && ny < 0) {
                return getGrass(ix + 1, iy - 1, 0, grassWidth - 1);
            } else if (nx >= grassWidth && ny >= grassWidth) {
                return getGrass(ix + 1, iy + 1, 0, 0);
            }
        })
        .reject(_.isUndefined)
        .value();
};

Once we can get each grass cell’s neighbors (sometimes dipping into a neighboring ground cell’s grass tiles), we can start modeling a basic cellular automaton.

In this example, if a grass tile has more than four neighbors that are “alive”, we set its value to the average of all of its neighbors, smoothing the area out. Otherwise, we square its value, effectively darkening the tile:


for (let y = 0; y < grassWidth; y++) {
    for (let x = 0; x < grassWidth; x++) {
        let grass = cell.grass[y * grassWidth + x];
        let neighbors = getGrassNeighbors(i, x, y);
        let alive = _.filter(neighbors, ({ value }) => value > 0.5);
        if (_.size(alive) > 4) {
            cell.grass[y * grassWidth + x].value = _.chain(neighbors)
                .map("value")
                .mean()
                .value();
        } else {
            cell.grass[y * grassWidth + x].value =
                cell.grass[y * grassWidth + x].value *
                cell.grass[y * grassWidth + x].value;
        }
    }
}

There is no rhyme or reason for choosing these rules, but they produce interesting results:

We can take this idea of recursive tiles even further. What if every grass tile had a set of flower tiles? Again, those flower tiles could be driven by cellular automata rules, or simply randomly generated.

Now I’m even more pulled in. What else lives in these caves? How do they change over time? Refresh the page for more dungeons!


July 01, 2020

Jan van den Berg (j11g)

How I read 52 books in a year July 01, 2020 07:30 PM

My book tracking app alerted me that I read 52 books over the last twelve months. So, *frantically crunching numbers* yes, indeed, that averages to one book per week!

This brings the book average to 226 pages per book.

I follow a couple of blogs by people that read way more than I do. Like these guys, who respectively read 116, 105, 74 and 58 books in 2019. I don’t know how they managed to do so, but 52 is definitely a personal best for me and this blogpost is about how I did it.

When I say that I have read a book, I mean: I read it cover to cover. No skimming or skipping, or glossing through. That’s not reading. And no audio books. Nothing against that, but my point is to read a book as the author intended it (of course, this is different when you study a subject and need to pick and choose parts).
Full disclosure, I am currently experimenting with reading Moby Dick with the book in hand and the audio book playing along. It’s fun, and a good way to get your teeth into such a classic. But I still need my eyes to follow the words and I don’t think listening to an audiobook while doing other things is the same experience. A book is not a podcast.

Getting serious

I’ve always liked reading but if I had to state a regret it would still be that I wish I had read more. There is always a certain anxiety when I enter a library or bookstore. The average human, or even a frantic reader, will never read more than a few thousand books in their lifetime. So I can never read just what my local library has in stock: even if it takes a lifetime. There are just too.many.books. With this in mind, a minute watching TV is a minute wasted reading.

I realised I find few activities more rewarding than reading. With this realisation in mind I consciously decided that I would take reading more seriously. And of course I still watch a little bit of TV and movies, but just a bit more consciously.

Here are some principles I developed around reading to keep me on track.

Principle 1: Track broadly

For me, this is key. So much so that last year I wrote my own book tracking app, to exactly fit my needs. In my app I can not only track what I have read, or am currently reading, but also what I want to read.

I used to use a spreadsheet, whatever works for you, but I was often getting lost in what I was reading (see Principle 2). So having this app definitely helps.

Principle 2: Read widely

This may be the most important principle on multiple levels. It not only means that I want to read many different books or genres but also that I like to read them simultaneously.

Of course I have favorite genres or subjects, but I try to be open-minded about every book (I wouldn’t snuff Danielle Steele). You never know what you might learn about yourself.

Levels

And before I meticulously kept track, this is usually where I got lost. Not every book demands the same energy or attention level and you should be able to switch it up without regret.

Which I do. So at a certain point last year I was reading 11 different books at once: diaries, biographies, novels, management books, historical books. You name it. Because my app allows me to directly see what I started it’s easy to keep track of this and — most importantly — switch it up when I am not feeling a certain book. Instead of dreading picking up a certain book for months or a half read book getting lost on my bookshelf I just move on to a different book, and know I will eventually get to that book. My app tracks it. And I always do! Some books I haven’t touched in months but I pick em up again after some time when I feel like it, and more often than not it’s usually a better experience. I have now had this experience more than once. And it was quite the revelation. The lesson is: different moods ask for different books.

So far I have only actively stopped reading two books, with no intention of ever reading any further (this is fine!). So this is rare. For most books I start, I have already done a little bit of research, enough to know that I want to read them.

Another benefit of switching between books a lot is that, I’ve noticed, it helps to retain what the books are about. It’s a different experience when you read a book over two months as opposed to two days, because you have to actively remind yourself of what the book was about again.

Principle 3: Buy loosely

The app allows me to add books to my wish list, and as you can see in the screenshot I bought 90 books last year. Mostly from thrift stores, they are absolute goldmines. And yes, I don’t read e-books. I need to feel paper.

The ‘Books I want‘ list from my app is a guideline for thrift store visits, but mostly I just look all over the place. And I used to be a bit hesitant to buy a book, as it would indicate a future commitment to myself to read it. But since reading Nassim Nicholas Taleb’s Black Swan and his thoughts on famous writer Umberto Eco’s personal library (here and here), I have been able to shake this habit a bit. So if a book looks interesting: buy it!

Bookmark stickies.

Tips

So those are the three main principles. Here are some other tips that help to keep your reading on track.

  • I dislike using a highlighter. It ruins books. Even if it’s just paper that I got for 50 cents at a thrift store.
  • I have used the classic highlighters and last year I moved to a pencil highlighter, a little bit less permanent but still not great. So for a couple of months now I have been using TRANSPARENT bookmark stickies.
    • They are not permanent.
    • I can still read what I highlighted.
    • I can remove them without tearing the paper.
  • It doesn’t matter what type of book it is, I read every book with a stack of sticky bookmarks and annotate what I like or want to remember. (This would definitely be my number one reason to move to eBooks at some point..).
  • To retain things, I usually read the sticky parts again after finishing or when picking up a book if it has been a while.
  • Read every day. Even if it’s just a couple of minutes. Don’t break the chain. Create a habit.
  • Put your phone on mute. I do most of my reading between 8 and 10 pm. If you text or call me between those hours, I probably won’t see or hear it.
  • Write! After all, what good is reading if you don’t write? I tend to blog about every book I read (few exceptions: i.e. when it’s a really small book). This helps with retention and thinking about what you liked or want to remember. And also you create your own little archive. I often look up my own posts, to see what I was thinking.

So there you have it! Now, let’s see what’s on TV.

The post How I read 52 books in a year appeared first on Jan van den Berg.

Pete Corey (petecorey)

Hello Roguelike July 01, 2020 12:00 AM

Like a lot of folks, the desire to create my own video games is what originally got me into computer programming. While I don’t play many video games these days, video game development still holds a special place in my heart. I’m especially fascinated by procedural generation techniques used in video games and many forms of computer art, so you’ll often find me creeping around the /r/roguelikedev and /r/generative subreddits.

Inspired by a recent post on /r/roguelikedev, I decided to try my hand at implementing a very basic dungeon generator using a random walk algorithm.

After getting our canvas set up, we can implement our basic random walk algorithm by starting in the center of our grid and moving in random directions, filling in each square we encounter as we come to it:


let pixelSize = 32;
let w = Math.floor(width / pixelSize);
let h = Math.floor(height / pixelSize);

let state = [];
let x = Math.floor(w / 2);
let y = Math.floor(h / 2);
let filled = 0;
let path = [];
let maxFilled = 500;

while (filled < maxFilled) {
    path.push(y * w + x);
    if (!state[y * w + x]) {
        state[y * w + x] = true;
        filled++;
    }
    let [nx, ny] = getNextDirection(x, y);
    x += nx;
    y += ny;
}

Notice that we’re also keeping track of the sequence of steps, or the path we took as we moved through our grid. Also notice that this isn’t particularly “good” code. That doesn’t matter as long as we’re having fun.

The getNextDirection function just returns a random direction, with a little added fanciness to keep our path from falling off our grid:


let getNextDirection = (cx, cy) => {
    let [x, y] = _.sample([[0, 1], [0, -1], [1, 0], [-1, 0]]);
    if (cx + x < 1 || cy + y < 1 || cx + x >= w - 1 || cy + y >= h - 1) {
        return getNextDirection(cx, cy);
    } else {
        return [x, y];
    }
};

Animating this algorithm is its own microcosm of interesting divergences…

Once we have our fully filled out grid, we can flip our perspective and render the walls around the steps we took through the grid, rather than rendering the steps themselves:
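
As a rough sketch of that flip, assuming the state, w, h, and pixelSize variables from above and a 2D canvas context named context, we can paint a wall into every cell the walk never visited that touches a cell it did:

context.fillStyle = "#333";
for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
        if (!state[y * w + x]) continue;
        for (let dy = -1; dy <= 1; dy++) {
            for (let dx = -1; dx <= 1; dx++) {
                // Any neighbor the walk never stepped on becomes a wall tile.
                if (!state[(y + dy) * w + (x + dx)]) {
                    context.fillRect((x + dx) * pixelSize, (y + dy) * pixelSize, pixelSize, pixelSize);
                }
            }
        }
    }
}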

We can add a path through our dungeon by removing the cycles from our path and tracing its newly simplified form through our grid. We could even hint at up and down stairways with orange dots at the beginning and end of our path.
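
The cycle-removal step isn’t shown here, but one simple way to simplify the recorded path, using the path array of cell indices built up above, is to cut out everything walked between two visits to the same cell:

let simplifyPath = path => {
    let simplified = [];
    let seen = new Map(); // cell index -> position in simplified
    for (let cell of path) {
        if (seen.has(cell)) {
            // We looped back to an earlier cell, so drop the loop entirely.
            while (simplified.length > seen.get(cell) + 1) {
                seen.delete(simplified.pop());
            }
        } else {
            seen.set(cell, simplified.length);
            simplified.push(cell);
        }
    }
    return simplified;
};

Tracing simplifyPath(path) through the grid gives a single loop-free route from the first cell of the walk to the last.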

This is a ridiculously simple algorithm, but what comes out of it absolutely pulls me in. What else lives in these dungeons? What kinds of things could we expect to find, and how would we bring those things to life?

Refresh the page to get more dungeons!


June 30, 2020

Derek Jones (derek-jones)

beta: Evidence-based Software Engineering – book June 30, 2020 10:12 PM

My book, Evidence-based software engineering: based on the publicly available data, is now out on beta release (pdf, and code+data). The plan is for a three-month review, with the final version available in the shops in time for Christmas (I plan to get a few hundred printed, and made available on Amazon).

The next few months will be spent responding to reader comments, and adding material from the remaining 20 odd datasets I have waiting to be analysed.

You can either email me with any comments, or add an issue to the book’s Github page.

While the content is very different from my original thoughts, 10 years ago, the original aim of discussing all the publicly available software engineering data has been carried through (in some cases more detailed data, in greater quantity, has supplanted earlier less detailed/smaller datasets).

The aim of only discussing a topic if public data is available, has been slightly bent in places (because I thought data would turn up, and it didn’t, or I wanted to connect two datasets, or I have not yet deleted what has been written).

The outcome of these two aims is that the flow of discussion is very disjoint, even disconnected. Another reason might be that I have not yet figured out how to connect the material in a sensible way. I’m the first person to go through this exercise, so I have no idea where it’s going.

The roughly 620+ datasets are three to four times more than I thought were publicly available. More data is good news, but it required more time to analyse and discuss.

Depending on the quantity of issues raised, updates of the beta release will happen.

As always, if you know of any interesting software engineering data, please tell me.

Jan van den Berg (j11g)

Bono on Bono – Michka Assayas June 30, 2020 08:19 PM

I have a soft spot for Bono. The megalomaniac lead singer of probably the world’s most commercial band (“the only band with their own iPod”). The Irish humanitarian multi-millionaire. Yes, I get all the criticism. Still, few singers can belt it out like Bono can. And I will forever stand by that.

On May 10th this year, Bono turned 60. So I thought it would be a good time to (re)read his 2005 biography.

I got this book, with a bunch of others, in 2006 at an HMV in Manchester. Good times.

Bono on Bono – Michka Assayas (2005) – 368 pages

It sort of took me aback a bit when I realised that most of this book was written in 2003 and 2004, when Bono was only a couple years older than I am now😲. By then he was of course already a very well established and very famous person. The book is written somewhere between two U2 albums: All That You Can’t Leave Behind and How To Dismantle An Atomic Bomb. So it finds Bono in a sort of musical lull, but with VERY high energy on issues like humanitarian aid and debt-relief causes.

Banter

The book is written as a dialogue, which is a very interesting concept! But I don’t think the chemistry between this Irishman and Frenchman works all the time. Or, I just don’t get their banter, because it’s cringy at times and the questions often go in directions I don’t want them to go (I would have asked different things!). It is also strange that there seems to be an effort to put everything down verbatim (with inserts like “Bono laughs” or “pauses reflectively”) while clearly this book and the interviews have been edited. Which is fine! But why the emphasis on this fake realness?

I am also not sure of the reason for this biography, other than to emphasize Bono’s humanitarian efforts. This biography therefore also suffers from what so many biographies suffer from: high on current events, low on what actually made the subject into the person they are now (Neil Young’s biography is the worst example of this).

Granted, Bono is very vulnerable in discussing his youth and parents. This was probably the most revealing and most interesting part, also because these are among the few actually biographical parts of this biography. I also enjoyed how Bono talked about his religious beliefs. You don’t always get this from the music. The tête-à-têtes Bono had with Bush and Clinton were probably very on topic in 2005, but they seem like something from another lifetime in 2020 and less relevant.

So I get this is not a book about U2 but about Bono, but I would have expected a little bit more stories about music. And this is not like Keith Richards or Bruce Springsteen‘s tremendous biographies, which were written when they were much older and are much more about the music.

So now that I am done complaining, I could just say that this book is less of a book and more of a collection of what could be magazine interviews. HOWEVER: I still liked it!

I mean, it’s about Bono. And he definitely is one of a kind. How could you not like it!

The post Bono on Bono – Michka Assayas appeared first on Jan van den Berg.

Who moved my cheese? – Spencer Johnson June 30, 2020 08:03 PM

People like stories, people remember stories. So, tell stories! This is what I learned from Seth Godin. But Spencer Johnson clearly understands this concept too.

Who moved my cheese? – Spencer Johnson (1998) – 95 pages

This little book embodies the concepts of how to deal with change in one memorable parable.

Johnson probably wasn’t the first to do so, but this concept (packing management theories as stories) is everywhere now. And this little book probably has a lot to do with this trend. It was, after all, the bestselling book EVER at Amazon.com’s tenth anniversary. Go figure.

The post Who moved my cheese? – Spencer Johnson appeared first on Jan van den Berg.

Marx – Peter Singer June 30, 2020 08:01 PM

This was the third book in a twelve part series of introductions to famous thinkers/philosophers (previously I read Plato and Kierkegaard). You might expect these books to be small (check) and comprehensible (not so much). So like the other two books, this book suffers from the same problems.

Marx – Peter Singer (1999) – 111 pages

Sure, you’ll get an introduction to Marx, and you get a better understanding of what influenced his thinking and what his special relation to Hegel was. Interesting, enlightening, great!

However, for an introduction I find the language, specifically in the critical parts, way too scientific. So I am always struggling with the question of who these books are written for. Clearly an experienced philosopher would not pick up an introduction like this. And for someone just dipping their toes (it is after all an introduction) I think the language can be a bit overwhelming. So writer, who are you trying to impress? The material is there, but it could do with a bit of editing.

The post Marx – Peter Singer appeared first on Jan van den Berg.

Dan Luu (dl)

How do cars fare in crash tests they're not specifically optimized for? June 30, 2020 07:06 AM

Any time you have a benchmark that gets taken seriously, some people will start gaming the benchmark. Some famous examples in computing are the CPU benchmark specfp and video game benchmarks. With specfp, Sun managed to increase its score on 179.art (a sub-benchmark of specfp) by 12x with a compiler tweak that essentially re-wrote the benchmark kernel, which increased the Sun UltraSPARC’s overall specfp score by 20%. At times, GPU vendors have added specialized benchmark-detecting code to their drivers that lowers image quality during benchmarking to produce higher benchmark scores. Of course, gaming the benchmark isn't unique to computing and we see people do this in other fields. It’s not surprising that we see this kind of behavior since improving benchmark scores by cheating on benchmarks is much cheaper (and therefore higher ROI) than improving benchmark scores by actually improving the product.

As a result, I'm generally suspicious when people take highly specific and well-known benchmarks too seriously. Without other data, you don't know what happens when conditions aren't identical to the conditions in the benchmark. With GPU and CPU benchmarks, it’s possible for most people to run the standard benchmarks with slightly tweaked conditions. If the results change dramatically for small changes to the conditions, that’s evidence that the vendor is, if not cheating, at least shading the truth.

Benchmarks of physical devices can be more difficult to reproduce. Vehicle crash tests are a prime example of this -- they're highly specific and well-known benchmarks that use up a car for some test runs.

While there are multiple organizations that do crash tests, they each have particular protocols that they follow. Car manufacturers, if so inclined, could optimize their cars for crash test scores instead of actual safety. Checking to see if crash tests are being gamed with hyper-specific optimizations isn't really feasible for someone who isn't a billionaire. The easiest way we can check is by looking at what happens when new tests are added since that lets us see a crash test result that manufacturers weren't optimizing for just to get a good score.

While having car crash test results is obviously better than not having them, the results themselves don't tell us what happens when we get into an accident that doesn't exactly match a benchmark. Unfortunately, if we get into a car accident, we don't get to ask the driver of the vehicle we're colliding with to change their location, angle of impact, and speed, in order for the collision to comply with an IIHS, NHTSA, or *NCAP test protocol.

For this post, we're going to look at IIHS test scores when they added the (driver side) small overlap and passenger side small overlap tests, which were added in 2012, and 2018, respectively. We'll start with a summary of the results and then discuss what those results mean and other factors to consider when evaluating car safety, followed by details of the methodology.

Results

The ranking below is mainly based on how well vehicles scored when the driver-side small overlap test was added in 2012 and how well models scored when they were modified to improve test results.

  • Tier 1: good without modifications
    • Volvo
  • Tier 2: mediocre without modifications; good with modifications
    • None
  • Tier 3: poor without modifications; good with modifications
    • Mercedes
    • BMW
  • Tier 4: poor without modifications; mediocre with modifications
    • Honda
    • Toyota
    • Subaru
    • Chevrolet
    • Tesla
    • Ford
  • Tier 5: poor with modifications or modifications not made
    • Hyundai
    • Dodge
    • Nissan
    • Jeep
    • Volkswagen

These descriptions are approximations. Honda, Ford, and Tesla are the poorest fits for them: Ford is arguably halfway between Tier 4 and Tier 5, but also arguably better than Tier 4 and not fitting the classification at all, while Honda and Tesla don't really fit properly into any category (their listed category is the closest fit). Some others are also imperfect fits. Details below.

General commentary

If we look at overall mortality in the U.S., there's a pretty large age range for which car accidents are the leading cause of death. Although the numbers will vary depending on what data set we look at, when the driver-side small overlap test was added, the IIHS estimated that 25% of vehicle fatalities came from small overlap crashes. It's also worth noting that small overlap crashes were thought to be implicated in a significant fraction of vehicle fatalities at least since the 90s; this was not a novel concept in 2012.

Despite the importance of small overlap crashes, from looking at the results when the IIHS added the driver-side and passenger-side small overlap tests in 2012 and 2018, it looks like almost all car manufacturers were optimizing for benchmark and not overall safety. Except for Volvo, all carmakers examined produced cars that fared poorly on driver-side small overlap crashes until the driver-side small overlap test was added.

When the driver-side small overlap test was added in 2012, most manufacturers modified their vehicles to improve driver-side small overlap test scores. However, until the IIHS added a passenger-side small overlap test in 2018, most manufacturers skimped on the passenger side. When the new test was added, they beefed up passenger safety as well. To be fair to car manufacturers, some of them got the hint about small overlap crashes when the driver-side test was added in 2012 and did not need to make further modifications to score well on the passenger-side test, including Mercedes, BMW, and Tesla (and arguably a couple of others, but the data is thinner in the other cases; Volvo didn't need a hint).

Other benchmark limitations

There are a number of other areas where we can observe that most car makers are optimizing for benchmarks at the expense of safety.

Gender, weight, and height

Another issue is crash test dummy overfitting. For a long time, adult NHTSA and IIHS tests used a 1970s 50%-ile male dummy, which is 5'9" and 171lbs. Regulators called for a female dummy in 1980 but due to budget cutbacks during the Reagan era, initial plans were shelved and the NHTSA didn't put one in a car until 2003. The female dummy is a scaled down version of the male dummy, scaled down to 5%-ile 1970s height and weight (4'11", 108lbs; another model is 4'11", 97lbs). In frontal crash tests, when a female dummy is used, it's always a passenger (a 5%-ile woman is in the driver's seat in one NHTSA side crash test and the IIHS side crash test). For reference, in 2019, the average weight of a U.S. adult male was 198 lbs and the average weight of a U.S. adult female was 171 lbs.

Using a 1970s U.S. adult male crash test dummy causes a degree of overfitting for 1970s 50%-ile men. For example, starting in the 90s, manufacturers started adding systems to protect against whiplash. Volvo and Toyota use a kind of system that reduces whiplash in men and women and appears to have slightly more benefit for women. Most car makers use a kind of system that reduces whiplash in men but, on average, has little impact on whiplash injuries in women.

It appears that we also see a similar kind of optimization for crashes in general and not just whiplash. We don't have crash test data on this, and looking at real-world safety data is beyond the scope of this post, but I'll note that, until around the time the NHTSA put the 5%-ile female dummy into some crash tests, most car manufacturers not named Volvo had a significant fatality rate differential in side crashes based on gender (with men dying at a lower rate and women dying at a higher rate).

Volvo claims to have been using computer models to simulate what would happen if women (including pregnant women) are involved in a car accident for decades.

Other crashes

Volvo is said to have a crash test facility where they do a number of other crash tests that aren't done by testing agencies. A reason that they scored well on the small overlap tests when they were added is that they were already doing small overlap crash tests before the IIHS started doing small overlap crash tests.

Volvo also says that they test rollovers (the IIHS tests roof strength and the NHTSA computes how difficult a car is to roll based on properties of the car, but neither tests what happens in a real rollover accident), rear collisions (Volvo claims these are especially important to test if there are children in the 3rd row of a 3-row SUV), and driving off the road (Volvo has a "standard" ditch they use; they claim this test is important because running off the road is implicated in a large fraction of vehicle fatalities).

If other car makers do similar tests, I couldn't find much out about the details. Based on crash test scores, it seems like they weren't doing or even considering small overlap crash tests before 2012. Based on how many car makers had poor scores when the passenger side small overlap test was added in 2018, I think it would be surprising if other car makers had a large suite of crash tests they ran that aren't being run by testing agencies, but it's theoretically possible that they do and just didn't include a passenger side small overlap test.

Caveats

We shouldn't overgeneralize from these test results. As we noted above, crash test results test very specific conditions. As a result, what we can conclude when a couple new crash tests are added is also very specific. Additionally, there are a number of other things we should keep in mind when interpreting these results.

Limited sample size

One limitation of this data is that we don't have results for a large number of copies of the same model, so we're unable to observe intra-model variation, which could occur due to minor, effectively random, differences in test conditions as well as manufacturing variations between different copies of same model. We can observe that these do matter since some cars will see different results when two copies of the same model are tested. For example, here's a quote from the IIHS report on the Dodge Dart:

The Dodge Dart was introduced in the 2013 model year. Two tests of the Dart were conducted because electrical power to the onboard (car interior) cameras was interrupted during the first test. In the second Dart test, the driver door opened when the hinges tore away from the door frame. In the first test, the hinges were severely damaged and the lower one tore away, but the door stayed shut. In each test, the Dart’s safety belt and front and side curtain airbags appeared to adequately protect the dummy’s head and upper body, and measures from the dummy showed little risk of head and chest injuries.

It looks like, had electrical power to the interior car cameras not been disconnected, there would have been only one test and it wouldn't have become known that there's a risk of the door coming off due to the hinges tearing away. In general, we have no direct information on what would happen if another copy of the same model were tested.

Using IIHS data alone, one thing we might do here is to also consider results from different models made by the same manufacturer (or built on the same platform). Although this isn't as good as having multiple tests for the same model, test results between different models from the same manufacturer are correlated and knowing that, for example, a 2nd test of a model that happened by chance showed significantly worse results should probably reduce our confidence in other test scores from the same manufacturer. There are some things that complicate this, e.g., if looking at Toyota, the Yaris is actually a re-branded Mazda2, so perhaps that shouldn't be considered as part of a pooled test result, and doing this kind of statistical analysis is beyond the scope of this post.

Actual vehicle tested may be different

Although I don't think this should impact the results in this post, another issue to consider when looking at crash test results is how results are shared between models. As we just saw, different copies of the same model can have different results. Vehicles that are somewhat similar are often considered the same for crash test purposes and will share the same score (only one of the models will be tested).

For example, this is true of the Kia Stinger and the Genesis G70. The Kia Stinger is 6" longer than the G70 and a fully loaded AWD Stinger is about 500 lbs heavier than a base-model G70. The G70 is the model that IIHS tested -- if you look up a Kia Stinger, you'll get scores for a Stinger with a note that a base model G70 was tested. That's a pretty big difference considering that cars that are nominally identical (such as the Dodge Darts mentioned above) can get different scores.

Quality may change over time

We should also be careful not to overgeneralize temporally. If we look at crash test scores of recent Volvos (vehicles on the Volvo P3 and Volvo SPA platforms), crash test scores are outstanding. However, if we look at Volvo models based on the older Ford C1 platform1, crash test scores for some of these aren't as good (in particular, while the S40 doesn't score poorly, it scores Acceptable in some categories instead of Good across the board). Although Volvo has had stellar crash test scores recently, this doesn't mean that they have always had or will always have stellar crash test scores.

Models may vary across markets

We also can't generalize across cars sold in different markets, even for vehicles that sound like they might be identical. For example, see this crash test of a Nissan NP300 manufactured for sale in Europe vs. a Nissan NP300 manufactured for sale in Africa. Since European cars undergo EuroNCAP testing (similar to how U.S. cars undergo NHTSA and IIHS testing), vehicles sold in Europe are optimized to score well on EuroNCAP tests. Crash testing cars sold in Africa has only been done relatively recently, so car manufacturers haven't had PR pressure to optimize their cars for benchmarks and they'll produce cheaper models or cheaper variants of what superficially appear to be the same model. This appears to be no different from what most car manufacturers do in the U.S. or Europe -- they're optimizing for cost as long as they can do that without scoring poorly on benchmarks. It's just that, since there wasn't an African crash test benchmark, that meant they could go all-in on the cost side of the cost-safety tradeoff2.

This report compared U.S. and European car models and found differences in safety due to differences in regulations. They found that European models had lower injury risk in frontal/side crashes and that driver-side mirrors were designed in a way that reduced the risk of lane-change crashes relative to U.S. designs and that U.S. vehicles were safer in rollovers and had headlamps that made pedestrians more visible.

Non-crash tests

Over time, more and more of the "low hanging fruit" from crash safety has been picked, making crash avoidance relatively more important. Tests of crash mitigation are relatively primitive compared to crash tests and we've seen that crash tests had and have major holes. One might expect, based on what we've seen with crash tests, that Volvo has a particularly good set of tests they use for their crash avoidance technology (traction control, stability control, automatic braking, etc.), but I don't know of any direct evidence for that.

Crash avoidance becoming more important might also favor Tesla, since they seem more aggressive about pushing software updates (so people wouldn't have to buy a newer model to get improved crash avoidance) and it's plausible that they use real-world data from their systems to inform crash avoidance in a way that most car companies don't, but I also don't know of any direct evidence of this.

Scores of vehicles of different weights aren't comparable

A 2700lb subcompact vehicle that scores Good may fare worse than a 5000lb SUV that scores Acceptable. This is because the small overlap tests involve driving the vehicle into a fixed obstacle, as opposed to a reference vehicle or vehicle-like obstacle of a specific weight. This is, in some sense, equivalent to crashing the vehicle into a vehicle of the same weight, so it's as if the 2700lb subcompact was tested by running it into a 2700lb subcompact and the 5000lb SUV was tested by running it into another 5000 lb SUV.

How to increase confidence

We've discussed some reasons we should reduce our confidence in crash test scores. If we wanted to increase our confidence in results, we could look at test results from other test agencies and aggregate them and also look at public crash fatality data (more on this later). I haven't looked at the terms and conditions of scores from other agencies, but one complication is that the IIHS does not allow you to display the result of any kind of aggregation if you use their API or data dumps (I, time consumingly, did not use their API for this post because of that).

Using real life crash data

Public crash fatality data is complex and deserves its own post. In this post, I'll note that, if you look at the easiest relevant data for people in the U.S., this data does not show that Volvos are particularly safe (or unsafe). For example, if we look at this report from 2017, which covers models from 2014, two Volvo models made it into the report and both score roughly middle of the pack for their class. In the previous report, one Volvo model is included and it's among the best in its class, in the next, one Volvo model is included and it's among the worst in its class. We can observe this kind of variance for other models, as well. For example, among 2014 models, the Volkswagen Golf had one of the highest fatality rates for all vehicles (not just in its class). But among 2017 vehicles, it had among the lowest fatality rates for all vehicles. It's unclear how much of that change is from random variation and how much is because of differences between a 2014 and 2017 Volkswagen Golf.

Overall, it seems like noise is a pretty important factor in results. And if we look at the information that's provided, we can see a few things that are odd. First, there are a number of vehicles where the 95% confidence interval for the fatality rate runs from 0 to N. We should have pretty strong priors that there was no 2014 model vehicle that was so safe that the probability of being killed in a car accident was zero. If we were taking a Bayesian approach (though I believe the authors of the report are not), and someone told us that the uncertainty interval for the true fatality rate of a vehicle had a >= 5% of including zero, we would say that either we should use a more informative prior or we should use a model that can incorporate more data (in this case, perhaps we could try to understand the variance between fatality rates of different models in the same class and then use the base rate of fatalities for the class as a prior, or we could incorporate information from other models under the same make if those are believed to be correlated).

Some people object to using informative priors as a form of bias laundering, but we should note that the prior that's used for the IIHS analysis is not completely uninformative. All of the intervals reported stop at zero because they're using the fact that a vehicle cannot create life to bound the interval at zero. But we have information that's nearly as strong that no 2014 vehicle is so safe that the expected fatality rate is zero; using that information is not fundamentally different from capping the interval at zero and not reporting negative numbers for the uncertainty interval of the fatality rate.

Also, the IIHS data only includes driver fatalities. This is understandable since that's the easiest way to normalize for the number of passengers in the car, but it means that we can't possibly see the impact of car makers not improving passenger small-overlap safety until the passenger-side small overlap test was added in 2018, the result of lack of rear crash testing for the case Volvo considers important (kids in the back row of a 3rd row SUV), etc.

We can also observe that, in the IIHS analysis, many factors that one might want to control for aren't (e.g., miles driven isn't controlled for, which will make trucks look relatively worse and luxury vehicles look relatively better; rural vs. urban miles driven also isn't controlled for, which will also have the same directional impact). One way to see that the numbers are heavily influenced by confounding factors is by looking at AWD or 4WD vs. 2WD versions of cars. They often have wildly different fatality rates even though the safety differences are not very large (and the difference is often in favor of the 2WD vehicle). Some plausible causes of that are random noise, differences in who buys different versions of the same vehicle, and differences in how the vehicles are used.

If we'd like to answer the question "which car makes or models are more or less safe", I don't find any of the aggregations that are publicly available to be satisfying and I think we need to look at the source data and do our own analysis to see if the data are consistent with what we see in crash test results.

Conclusion

We looked at 12 different car makes and how they fared when the IIHS added small overlap tests. We saw that only Volvo was taking this kind of accident seriously before companies were publicly shamed for having poor small overlap safety by the IIHS even though small overlap crashes were known to be a significant source of fatalities at least since the 90s.

Although I don't have the budget to do other tests, such as a rear crash test in a fully occupied vehicle, it appears plausible and perhaps even likely that most car makers that aren't Volvo would have mediocre or poor test scores if a testing agency decided to add another kind of crash test.

Bonus: "real engineering" vs. programming

As Hillel Wayne has noted, although programmers often have an idealized view of what "real engineers" do, when you compare what "real engineers" do with what programmers do, it's frequently not all that different. In particular, a common lament of programmers is that we're not held liable for our mistakes or poor designs, even in cases where that costs lives.

Although automotive companies can, in some cases, be held liable for unsafe designs, just optimizing for a small set of benchmarks, which must've resulted in extra deaths over optimizing for safety instead of benchmark scores, isn't something that engineers or corporations were, in general, held liable for.

Bonus: reputation

If I look at what people in my extended social circles think about vehicle safety, Tesla has the best reputation by far. If you look at broad-based consumer polls, that's a different story, and Volvo usually wins there, with other manufacturers fighting for a distant second.

I find the Tesla thing interesting since their responses are basically the opposite of what you'd expect from a company that was serious about safety. When serious problems have occurred (with respect to safety or otherwise), they often have a very quick response that's basically "everything is fine". I would expect an organization that's serious about safety or improvement to respond with "we're investigating", followed by a detailed postmortem explaining what went wrong, but that doesn't appear to be Tesla's style.

For example, on the driver-side small overlap test, Tesla had one model with a relevant score and it scored Acceptable (below Good, but above Poor and Marginal) even after modifications were made to improve the score. Tesla disputed the results, saying they make "the safest cars in history" and implying that IIHS should be ignored in favor of NHTSA test scores:

While IIHS and dozens of other private industry groups around the world have methods and motivations that suit their own subjective purposes, the most objective and accurate independent testing of vehicle safety is currently done by the U.S. Government which found Model S and Model X to be the two cars with the lowest probability of injury of any cars that it has ever tested, making them the safest cars in history.

As we've seen, Tesla isn't unusual for optimizing for a specific set of crash tests and achieving a mediocre score when an unexpected type of crash occurs, but their response is unusual. However, it makes sense from a cynical PR perspective. As we've seen over the past few years, loudly proclaiming something, regardless of whether or not it's true, even when there's incontrovertible evidence that it's untrue, seems to not only work; that kind of bombastic rhetoric appears to attract superfans who will aggressively defend the brand. If you watch car reviewers on youtube, they'll sometimes mention that they get hate mail for reviewing Teslas just like they review any other car and that they don't see anything like it for any other make.

Apple also used this playbook to good effect in the 90s and early '00s, when they were rapidly falling behind in performance and responded not by improving performance, but by running a series of ad campaigns saying that they had the best performance in the world and that they were shipping "supercomputers" on the desktop.

Another reputational quirk is that I know a decent number of people who believe that the safest cars they can buy are "American Cars from the 60's and 70's that aren't made of plastic". We don't have directly relevant small overlap crash test scores for old cars, but the test data we do have on old cars indicates that they fare extremely poorly in overall safety compared to modern cars. For a visually dramatic example, see this crash test of a 1959 Chevrolet Bel Air vs. a 2009 Chevrolet Malibu.

Appendix: methodology summary

The top-line results section uses scores for the small overlap test both because it's the one where I think it's the most difficult to justify skimping on safety as measured by the test and it's also been around for long enough that we can see the impact of modifications to existing models and changes to subsequent models, which isn't true of the passenger side small overlap test (where many models are still untested).

For the passenger side small overlap test, someone might argue that the driver side is more important because you virtually always have a driver in a car accident and may or may not have a front passenger. Also, for small overlap collisions (which simulates a head-to-head collision where the vehicles only overlap by 25%), driver's side collisions are more likely than passenger side collisions.

Except to check Volvo's scores, I didn't look at roof crash test scores (which were added in 2009). I'm not going to describe the roof test in detail, but for the roof test, someone might argue that the roof test score should be used in conjunction with scoring the car for rollover probability since the roof test just tests roof strength, which is only relevant when a car has rolled over. I think, given what the data show, this objection doesn't hold in many cases (the vehicles with the worst roof test scores are often vehicles that have relatively high rollover rates), but it does in some cases, which would complicate the analysis.

In most cases, we only get one reported test result for a model. However, there can be multiple versions of a model -- including before and after making safety changes intended to improve the test score. If changes were made to the model to improve safety, the test score is usually from after the changes were made and we usually don't get to see the score from before the model was changed. However, there are many exceptions to this, which are noted in the detailed results section.

For this post, scores only count if the model was introduced before or near when the new test was introduced, since models introduced later could have design changes that optimize for the test.

Appendix: detailed results

On each test, IIHS gives an overall rating (from worst to best) of Poor, Marginal, Acceptable, or Good. The tests have sub-scores, but we're not going to use those for this analysis. In each sub-section, we'll look at how many models got each score when the small overlap tests were added.

Volvo

All Volvo models examined scored Good (the highest possible score) on the new tests when they were added (roof, driver-side small overlap, and passenger-side small overlap). One model, the 2008-2017 XC60, had a change made to trigger its side curtain airbag during a small overlap collision in 2013. Other models were tested without modifications.

Mercedes

Of three pre-existing models with test results for driver-side small overlap, one scored Marginal without modifications and two scored Good after structural modifications. The model where we only have unmodified test scores (Mercedes C-Class) was fully re-designed after 2014, shortly after the driver-side small overlap test was introduced.

As mentioned above, we often only get to see public results for models without modifications to improve results xor with modifications to improve results, so, for the models that scored Good, we don't actually know how they would've scored if you bought a vehicle before Mercedes updated the design, but the Marginal score from the one unmodified model we have is a negative signal.

Also, when the passenger side small overlap test was added, the Mercedes vehicles also generally scored Good, indicating that Mercedes didn't only increase protection on the driver's side in order to improve test scores.

BMW

Of the two models where we have relevant test scores, both scored Marginal before modifications. In one of the cases, there's also a score after structural changes were made in the 2017 model (recall that the driver-side small overlap test was introduced in 2012) and the model scored Good afterwards. The other model was fully-redesigned after 2016.

For the five models where we have relevant passenger-side small overlap scores, all scored Good, indicating that the changes made to improve driver-side small overlap test scores weren't only made on the driver's side.

Honda

Of the five Honda models where we have relevant driver-side small overlap test scores, two scored Good, one scored Marginal, and two scored Poor. The model that scored Marginal had structural changes plus a seatbelt change in 2015 that changed its score to Good; the other models weren't updated or don't have updated IIHS scores.

Of the six Honda models where we have passenger-side small overlap test scores, two scored Good without modifications, two scored Acceptable without modifications, and one scored Good with modifications to the bumper.

All of those models scored Good on the driver side small overlap test, indicating that when Honda increased the safety on the driver's side to score Good on the driver's side test, they didn't apply the same changes to the passenger side.

Toyota

Of the six Toyota models where we have relevant driver-side small overlap test scores for unmodified models, one scored Acceptable, four scored Marginal, and one scored Poor.

The model that scored Acceptable had structural changes made to improve its score to Good, but on the driver's side only. The model was later tested in the passenger-side small overlap test and scored Acceptable. Of the four models that scored Marginal, one had structural modifications made in 2017 that improved its score to Good and another had airbag and seatbelt changes that improved its score to Acceptable. The vehicle that scored Poor had structural changes made that improved its score to Acceptable in 2014, followed by later changes that improved its score to Good.

There are four additional models where we only have scores from after modifications were made. Of those, one scored Good, one scored Acceptable, one scored Marginal, and one scored Poor.

In general, changes appear to have been made to the driver's side only and, on introduction of the passenger side small overlap test, vehicles had passenger side small overlap scores that were the same as the driver's side score before modifications.

Ford

Of the two models with relevant driver-side small overlap test scores for unmodified models, one scored Marginal and one scored Poor. Both of those models were produced into 2019 and neither has an updated test result. Of the three models where we have relevant results for modified vehicles, two scored Acceptable and one scored Marginal. Also, one model was released the year the small overlap test was introduced and one the year after; both of those scored Acceptable. It's unclear if those should be considered modified or not since the design may have had last-minute changes before release.

We only have three relevant passenger-side small overlap tests. One is Good (for a model released in 2015) and the other two are Poor; these are the two models mentioned above as having scored Marginal and Poor, respectively, on the driver-side small overlap test. It appears that the models continued to be produced into 2019 without safety changes. Both of these unmodified models were trucks; this isn't very unusual for a truck and is one of a number of reasons that fatality rates are generally higher in trucks -- until recently, many of them were based on old platforms that hadn't been updated for a long time.

Chevrolet

Of the three Chevrolet models where we have relevant driver-side small overlap test scores before modifications, one scored Acceptable and two scored Marginal. One of the Marginal models had structural changes plus a change that caused side curtain airbags to deploy sooner in 2015, which improved its score to Good.

Of the four Chevrolet models where we only have relevant driver-side small overlap test scores after the model was modified (all had structural modifications), two scored Good and two scored Acceptable.

We only have one relevant score for the passenger-side small overlap test: that score is Marginal. That's on the model that was modified to improve its driver-side small overlap test score from Marginal to Good, indicating that the changes were made to improve the driver-side test score and not to improve passenger safety.

Subaru

We don't have any models where we have relevant driver-side small overlap test scores for models before they were modified.

One model had a change to cause its airbag to deploy during small overlap tests; it scored Acceptable. Two models had some kind of structural changes, one of which scored Good and one of which scored Acceptable.

The model that had airbag changes had structural changes made in 2015 that improved its score from Acceptable to Good.

For the one model where we have relevant passenger-side small overlap test scores, the score was Marginal. Also, for one of the models with structural changes, it was indicated that the changes included changes to the left part of the firewall, suggesting that changes were made to improve the driver's side test score without improving safety for a passenger in a passenger-side small overlap crash.

Tesla

There's only one model with relevant results for the driver-side small overlap test. That model scored Acceptable before and after modifications were made to improve test scores.

Hyundai

Of the five vehicles where we have relevant driver-side small overlap test scores, one scored Acceptable, three scored Marginal, and one scored Poor. We don't have any indication that models were modified to improve their test scores.

Of the two vehicles where we have relevant passenger-side small overlap test scores for unmodified models, one scored Good and one scored Acceptable.

We also have one score for a model that had structural modifications to score Acceptable, which later had further modifications that allowed it to score Good. That model was introduced in 2017 and had a Good score on the driver-side small overlap test without modifications, indicating that it was designed to achieve a good test score on the driver's side test without similar consideration for a passenger-side impact.

Dodge

Of the five models where we have relevant driver-side small overlap test scores for unmodified models, two scored Acceptable, one scored Marginal, and two scored Poor. There are also two models where we have test scores after structural changes were made for safety in 2015; both of those models scored Marginal.

We don't have relevant passenger-side small overlap test scores for any model, but even if we did, the dismal scores on the modified models mean that we might not be able to tell if similar changes were made to the passenger side.

Nissan

Of the seven models where we have relevant driver-side small overlap test scores for unmodified models, two scored Acceptable and five scored Poor.

We have one model that only has test scores for a modified model; the frontal airbags and seatbelts were modified in 2013 and the side curtain airbags were modified in 2017. The score after modifications was Marginal.

One of the models that scored Poor had structural changes made in 2015 that improved its score to Good.

Of the four models where we have relevant passenger-side small overlap test scores, two scored Good, one scored Acceptable (that model scored Good on the driver-side test), and one scored Marginal (that model also scored Marginal on the driver-side test).

Jeep

Of the two models where we have relevant driver-side small overlap test scores for unmodified models, one scored Marginal and one scored Poor.

There's one model where we only have a test score after modifications; that model had changes to its airbags and seatbelts and it scored Marginal after the changes. This model was also later tested on the passenger-side small overlap test and scored Poor.

One other model has a relevant passenger-side small overlap test score; it scored Good.

Volkswagen

The two models where we have relevant driver-side small overlap test scores for unmodified models both scored Marginal.

Of the two models where we only have scores after modifications, one was modified in 2013 and scored Marginal after modifications. It was then modified again in 2015 and scored Good after modifications. That model was later tested on the passenger side small-overlap test, where it scored Acceptable, indicating that the modifications differentially favored the driver's side. The other scored Acceptable after changes made in 2015 and then scored Good after further changes made in 2016. The 2016 model was later tested on the passenger-side small overlap test and scored Marginal, once again indicating that changes differentially favored the driver's side.

We have passenger-side small overlap test scores for two other models, both of which scored Acceptable. These were models introduced in 2015 (well after the introduction of the driver-side small overlap test) and scored Good on the driver-side small overlap test.

Appendix: miscellania

A number of name brand car makes weren't included. Some because their sales in the U.S. are relatively low and/or declining rapidly (Mitsubishi, Fiat, Alfa Romeo, etc.), some because there's very high overlap in what vehicles are tested (Kia, Mazda, Audi), and some because there aren't relevant models with driver-side small overlap test scores (Lexus). When a corporation owns an umbrella of makes, like FCA with Jeep, Dodge, Chrysler, Ram, etc., these weren't pooled since most people who aren't car nerds aren't going to recognize FCA, but may recognize Jeep, Dodge, and Chrysler.

If the terms of service of the IIHS API allowed you to use the data however you wanted, I would've included smaller makes. But the API comes with very restrictive terms on how you can display or discuss the data, which aren't compatible with exploratory data analysis, and I couldn't know how I would want to display or discuss the data before looking at it. So I pulled all of these results by hand (and didn't click through any EULAs, etc.), which was fairly time consuming, and there was a trade-off between more comprehensive coverage and the rest of my life.

Appendix: what car should I buy?

That depends on what you're looking for; there's no way to make a blanket recommendation. For practical information about particular vehicles, Alex on Autos is the best source that I know of. I don't generally like videos as a source of practical information, but car magazines tend to be much less informative than youtube car reviewers. There are car reviewers that are much more popular, but their popularity appears to come from having witty banter between charismatic co-hosts or other things that not only aren't directly related to providing information, they actually detract from providing information. If you just want to know about how cars work, Engineering Explained is also quite good, but the information there generally isn't practical.

For reliability information, Consumer Reports is probably your best bet (you can also look at J.D. Power, but the way they aggregate information makes it much less useful to consumers).

Thanks to Leah Hanson, Travis Downs, Prabin Paudel, and Justin Blank for comments/corrections/discussion


  1. this includes the 2004-2012 Volvo S40/V50, 2006-2013 Volvo C70, and 2007-2013 Volvo C30, which were designed during the period when Ford owned Volvo. Although the C1 platform was a joint venture between Ford, Volvo, and Mazda engineers, the work was done under a Ford VP at a Ford facility. [return]
  2. to be fair, as we saw with the IIHS small overlap tests, not every manufacturer did terribly. In 2017 and 2018, 8 vehicles sold in Africa were crash tested. One got what we would consider a mediocre to bad score in the U.S. or Europe, five got what we would consider to be a bad score, and "only" three got what we would consider to be an atrocious score. The Nissan NP300, Datsun Go, and Chery QQ3 were the three vehicles that scored the worst. Datsun is a sub-brand of Nissan and Chery is a Chinese brand, also known as Qirui.

    We see the same thing if we look at cars sold in India. Recently, some tests have been run on cars sent to the Indian market and a number of vehicles from Datsun, Renault, Chevrolet, Tata, Honda, Hyundai, Suzuki, Mahindra, and Volkswagen came in with atrocious scores that would be considered impossibly bad in the U.S. or Europe.

    [return]

June 29, 2020

Phil Hagelberg (technomancy)

in which a compiler takes steps towards strapping its boots June 29, 2020 11:30 PM

One of the biggest milestones in a programming language is when the language gets to the point where it can be used to write its own implementation, which is called self-hosting. This is seen as a sign of maturity since reaching this point requires getting a lot of common problems shaken out first.

The compiler for the Fennel programming language was written using Lua, and it emits Lua code as output. Over time, certain parts of the compiler were added that were written in Fennel, starting with fennelview, which is the pretty-printer for Fennel data structures. Once the macro system stabilized, many built-in forms that had originally been hard-coded into the compiler using Lua got ported to the macro system. After that the REPL was ported to Fennel as a relatively independent piece of code, followed by the command-line launcher script and a helper module to explain and identify compiler errors. The parser had already seen an impressive port to Fennel using a literate programming approach, but we hadn't incorporated this into the mainline repository yet because the literate approach made it a bit tricky to bring in.

As you might expect, any attempt at self-hosting can easily run into "chicken or egg" problems—how do you use the language to write the implementation if the language hasn't been finished being defined yet? Sometimes this requires simply limiting yourself to a subset; for instance, the built-in macros in Fennel cannot themselves use any macros but must be written in a macroless subset of Fennel. In other cases, such as the launcher, we keep a copy of the old pre-self-hosted version around in order to build the new version.

[photo: lake union/lake washington canal]

That's about as far as we could get on the path to self-hosting without changing the approach, because most of the remaining code was fairly entangled, and we didn't have clear boundaries to port it one piece at a time. At this stage there were 2250 lines of Lua and 1113 lines of Fennel. I recently took some time to reorganize the compiler into four independent "pseudo-modules" with clear dependencies between the pieces. But even with the independent modules broken out, we were still looking at porting 800 lines of intricate compiler code and 900 lines of special forms all in two fell swoops.

That's when I started to consider an alternate approach. The Fennel compiler takes Fennel code as input and produces Lua code as output. We have a big pile of Lua code in the compiler that we want turned into Fennel code. What if we could reverse the process? That's when Antifennel was born.

(fn early-return [compile {: arguments}]
  (let [args (map arguments compile)]
    (if (any-complex-expressions? arguments 1)
        (early-return-complex compile args)
        (list (sym :lua)
              (.. "return " (table.concat (map args view) ", "))))))

(fn binary [compile {: left : right : operator} ast]
  (let [operators {:== := "~=" :not= "#" :length "~" :bnot}]
    (list (sym (or (. operators operator) operator))
          (compile left)
          (compile right))))

Antifennel takes Lua code and parses[1] it, then walks the abstract syntax tree of Lua and builds up an abstract syntax tree of Fennel code based on it. I had to add some features to fnlfmt, the formatter for Fennel, in order to get the output to look decent, but the overall approach is rather straightforward since Fennel and Lua have a great deal of overlap in their semantics.

The main difficulties came from supporting features which are present in the Lua language but not in Fennel. Fennel omits some things which are normal in Lua, usually because the code becomes easier to understand if you can guarantee certain things never happen. For instance, when you read a Fennel function, you don't have to think about where in the code the possible return values can be found; these can only occur in tail positions because there is no early return. But Lua allows you to return (almost) anywhere in the function!

Fennel has one "secret" feature to help with this: the lua special form:

(lua "return nextState, value")

Included specifically to make the task of porting existing code easier, the lua form allows you to emit Lua code directly without the compiler checking its validity. This is an "escape hatch" that can allow you to port Lua code as literally as possible first, then come back once you have it working and clean up the ugly bits once you have tests and things in place. It's not pretty, but it's a practical compromise that can help you get things done.

Unfortunately it's not quite as simple as just calling (lua "return x"), because if you put this in the output every time there's a return in the Lua code, most of it will be in the tail position. But Fennel doesn't understand that the lua call is actually a return value; it thinks that it's just a side-effect, and it will helpfully insert a return nil after it for consistency. In order to solve this I needed to track which returns occurred in the tail position and which were early returns, so I could use normal Fennel methods for the tail ones and use this workaround hack only for early returns[2]. But that ended up being easier than it sounds.

Other incompatibilities were the lack of a break form (which could easily be addressed with the (lua "break") hack because it only happens in a non-tail position), the lack of a repeat form (compiled into a while with a break at the end), and the fact that locals default to being immutable in Fennel and mutability is opt-in. This last one I am currently handling by emitting all locals as var regardless of whether they are mutated or not, but I plan on adding tracking to allow the compiler to emit the appropriate declaration based on how it's used.

While it's still too early to swap out the canonical implementation of the Fennel compiler, the Antifennel-compiled version works remarkably well, passing the entire language test suite across every supported version of the Lua runtime at 79% the length of the Lua version. I'm looking forward to finishing the job and making the Fennel codebase written purely using Fennel itself.


[1] Antifennel uses the parser from the LuaJIT Language Toolkit, which is another self-hosted compiler that takes Lua code as input and emits LuaJIT bytecode without requiring any C code to be involved. (Of course, in order to run the bytecode, you have to use the full LuaJIT VM, which is mostly written in C.) I had to make one small change to the parser in order to help it "mangle" identifiers that were found to conflict with built-in special forms and macros in Fennel, but other than that it worked great with no changes. The first big test of Antifennel was making sure it could compile its own parser dependency from Lua into Fennel, which it could do on the second day.

[2] Even that is a slight oversimplification, because the lua return hack only works on literals and identifiers, not complex expressions. When a complex expression is detected being returned, we compile it to a wrapping let expression and only pass in the bound local name to the return.

June 28, 2020

Ponylang (SeanTAllen)

Last Week in Pony - June 28, 2020 June 28, 2020 03:52 PM

Ponyup and ponylang-mode have new releases.

Bogdan Popa (bogdan)

Deploying Racket Web Apps June 28, 2020 12:00 PM

Someone recently asked about how to deploy Racket web apps on the Racket Slack. The most common answers were install Racket on the target machine, then ship your code there or use Docker (basically a “portable” variant of option 1). I wanted to take a few minutes today and write about my preferred way of deploying Racket apps: build an executable with the application code, libraries and assets embedded into it and ship that around.
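
To make that concrete, here's a rough sketch of the commands involved (my own illustration of the approach, not code from the post; "my-app" and "main.rkt" are placeholder names):

raco exe -o my-app main.rkt
raco distribute dist my-app

Roughly speaking, raco exe compiles the program and its dependencies into a standalone executable, and raco distribute copies that executable plus the runtime files it needs into a directory (here dist) that can be shipped to the target machine.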

June 23, 2020

Marc Brooker (mjb)

Code Only Says What it Does June 23, 2020 12:00 AM

Code Only Says What it Does

Only loosely related to what it should do.

Code says what it does. That's important for the computer, because code is the way that we ask the computer to do something. It's OK for humans, as long as we never have to modify or debug the code. As soon as we do, we have a problem. Fundamentally, debugging is an exercise in changing what a program does to match what it should do. It requires us to know what a program should do, which isn't captured in the code. Sometimes that's easy: What it does is crash, what it should do is not crash. Outside those trivial cases, discovering intent is harder.

Debugging when the "should do" is subtle, such as when building distributed systems protocols, is especially difficult. In our Millions of Tiny Databases paper, we say:

Our code reviews, simworld tests, and design meetings frequently referred back to the TLA+ models of our protocols to resolve ambiguities in Java code or written communication.

The problem is that the implementation (in Physalia's case the Java code) is both an imperfect implementation of the protocol, and an overly-specific implementation of the protocol. It's overly-specific because it needs to be fully specified. Computers demand that, and no less, while the protocol itself has some leeway and wiggle room. It's also overly-specific because it has to address things like low-level performance concerns that the specification can't be bothered with.

Are those values in an ArrayList because order is actually important, or because O(1) random seeks are important, or some other reason? Was it just the easiest thing to write? What happens when I change it?

Business logic code, while lacking the cachet of distributed protocols, has even more of these kinds of problems. Code both over-specifies the business logic, and specifies it inaccurately. I was prompted to write this by a tweet from @mcclure111 where she hits the nail on the head:

This is a major problem with code: You don't know which quirks are load-bearing. You may remember, or be able to guess, or be able to puzzle it out from first principles, or not care, but all of those things are slow and error-prone. What can we do about it?

Design Documentation

Documentation is uncool. Most software engineers seem to come out of school thinking that documentation is below them (tech writer work), or some weird thing their SE professor talked about that is as archaic as Fortran. Part of this is understandable. My own software engineering courses emphasized painstakingly documenting the implementation in UML. No other mention of documentation was made. Re-writing software in UML helps basically nobody. I finished my degree thinking that documentation was unnecessary busywork. Even the Agile Manifesto agreed with me[1]:

Working software over comprehensive documentation

What I discovered later was that design documentation, encoding the intent and decisions made during developing a system, helps teams be successful in the short term, and people be successful in the long term. Freed from fitting everything in my head, emboldened by the confidence that I could rediscover forgotten facts later, I could move faster. The same applies to teams.

One thing I see successful teams doing is documenting not only the what and why behind their designs, but the how they decided. When it comes time to make changes to the system—either for debugging or in response to changing requirements—these documents are invaluable. It's hard to decide whether it's safe to change something, when you don't know why it's like that in the first place. The record of how you decided is important because you are a flawed human, and understanding how you came to a decision is useful to know when that decision seems strange, or surprising.

This documentation process doesn't have to be heavyweight. You don't have to draw painstaking ER diagrams unless you think they are helpful. You should probably ignore UML entirely. Instead, describe the system in prose as clearly and succinctly as you can. One place to start is by building an RFC template for your team, potentially inspired by one that you find on the web. SquareSpace's template seems reasonable. Some designs will fit well into that RFC format, others won't. Prefer narrative writing where you can.

Then, keep the documents. Store them somewhere safe. Soak them in vinegar and tie them around your chest. You're going to want to make sure that the people who need to maintain the system can find them. As they are spelunking through history, help them feel more like a library visitor and less like Lara Croft.

I'm not advocating for Big Design Up Front. Many of the most important things we learn about a project we learn during the implementation. Some of the most important things we learn years after the implementation is complete. Design documentation isn't a static one-time ahead-of-time deliverable, but an ongoing process. Most importantly, design documentation is not a commitment to bad ideas. If it's wrong, fix it and move forward. Documentation is not a deal with the devil.

Comments

Few topics invite a programmer flame war like comments. We're told that comments are silly, or childish, or make it hard to show how manly you are in writing that convoluted mess of code. If it was hard to write, it should be hard to read. After all, you're the James Joyce of code.

That silliness aside, back to @mcclure111's thread:

Comments allow us to encode authorial intent into our code in a way that programming languages don't always. Types, traits, interfaces, and variable names do put intent into code, but not completely (I see you, type system maximalists). These same things allow us to communicate a lack of intent—consider RandomAccess vs ArrayList—but are also incomplete. Well-commented code should make the intent of the author clear, especially in cases where that intent is either lost in the translation to code, or where implementation constraints hide the intent of the design. Code comments that link back to design documents are especially useful.
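
To make that concrete, here is a tiny hypothetical sketch (Python for brevity; the variable, the page, and the design doc are all invented for illustration) of a comment that carries intent rather than restating what the code does:

# Intent: results are kept in a list, in insertion order, because the audit
# page renders them chronologically; a set would be faster to query but would
# silently break that page. See the audit-log design doc before changing this.
recent_events = []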

Some languages need comments more than others. Some, like SQL, I find to nearly always obscure the intent of the design behind implementation details.

Formal Specification

In Who Builds a House Without Drawing Blueprints? Leslie Lamport writes:

The need for specifications follows from two observations. The first is that it is a good idea to think about what we are going to do before doing it, and as the cartoonist Guindon wrote: "Writing is nature's way of letting you know how sloppy your thinking is."

The second observation is that to write a good program, we need to think above the code level.

I've found that specification, from informal specification with narrative writing to formal specification with TLA+, makes writing programs faster and helps reduce mistakes. As much as I like that article, I think Lamport misses a key part of the value of formal specification: it's a great communication tool. In developing some of the trickiest systems I've built, I've found that heavily-commented formal specifications are fantastically useful documentation. Specification languages are all about intent, and some make it easy to clearly separate intent from implementation.

Again, from our Millions of Tiny Databases paper:

We use TLA+ extensively at Amazon, and it proved exceptionally useful in the development of Physalia. Our team used TLA+ in three ways: writing specifications of our protocols to check that we understand them deeply, model checking specifications against correctness and liveness properties using the TLC model checker, and writing extensively commented TLA+ code to serve as the documentation of our distributed protocols. While all three of these uses added value, TLA+’s role as a sort of automatically tested (via TLC), and extremely precise, format for protocol documentation was perhaps the most useful.

Formal specifications make excellent documentation. Like design docs, they aren't immutable artifacts, but a reflection of what we have learned about the problem.

Conclusion

Building long-lasting, maintainable, systems requires not only communicating with computers, but also communicating in space with other people, and in time with our future selves. Communicating, recording, and indexing the intent behind our designs is an important part of that picture. Make time for it, or regret it later.

Footnotes

  1. To be charitable to the Agile folks, comprehensive does seem to be load-bearing.

June 22, 2020

Henry Robinson (henryr)

Network Load Balancing with Maglev June 22, 2020 07:52 PM

Maglev: A Fast and Reliable Software Network Load Balancer Eisenbud et al., NSDI 2016 [paper] Load balancing is a fundamental primitive in modern service architectures - a service that assigns requests to servers so as to, well, balance the load on each server. This improves resource utilisation and ensures that servers aren’t unnecessarily overloaded. Maglev is - or was, sometime before 2016 - Google’s network load-balancer that managed load-balancing duties for search, GMail and other high-profile Google services.

June 21, 2020

Derek Jones (derek-jones)

How should involved if-statement conditionals be structured? June 21, 2020 10:43 PM

Which of the following two if-statements do you think will be processed by readers in less time, and with fewer errors, when given the value of x, and asked to specify the output?

// First - sequence of subexpressions
if (x > 0 && x < 10 || x > 20 && x < 30)
   print("a");
else
   print("b");

// Second - nested ifs
if (x > 0 && x < 10)
   print("c");
else if (x > 20 && x < 30)
   print("d");
else
   print("e");

Ok, the behavior is not identical, in that the else if-arm produces different output than the preceding if-arm.

The paper Syntax, Predicates, Idioms — What Really Affects Code Complexity? analyses the results of an experiment that asked this question, including more deeply nested if-statements, the use of negation, and some for-statement questions (this post only considers the number of conditions/depth of nesting components). A total of 1,583 questions were answered by 220 professional developers, with 415 incorrect answers.

Based on the coefficients of regression models fitted to the results, subjects processed the nested form both faster and with fewer incorrect answers (code+data). As expected, performance got slower and more incorrect answers were given as the number of intervals in the if-condition increased (up to four in this experiment).

I think short-term memory is involved in this difference in performance; or at least I can concoct a theory that involves a capacity limited memory. Comprehending an expression (such as the conditional in an if-statement) requires maintaining information about the various components of the expression in working memory. When the first subexpression of x > 0 && x < 10 || x > 20 && x < 30 is false, and the subexpression after the || is processed, there is now no forget-what-went-before point like there is for the nested if-statements. I think that the single expression form is consuming more working memory than the nested form.

Does the result of this experiment (assuming it is replicated) mean that developers should be recommended to write sequences of conditions (e.g., the first if-statement example) as follows:

if (x > 0 && x < 10)
   print("a");
else if (x > 20 && x < 30)
   print("a");
else
   print("b");

Duplicating code is not good, because both arms have to be kept in sync; ok, a function could be created, but this is extra effort. As other factors are taken into account, the costs of the nested form start to build up; is the benefit really worth the cost?

Answering this question is likely to need a lot of work, and it would be a more efficient use of resources to address questions about more commonly occurring conditions first.

A commonly occurring use is testing a single range; some of the ways of writing the range test include:

if (x > 0 && x < 10) ...

if (0 < x && x < 10) ...

if (10 > x && x > 0) ...

if (x > 0 && 10 > x) ...

Does one way of testing the range require less effort for readers to comprehend, and be more likely to be interpreted correctly?

There have been some experiments showing that people are more likely to give correct answers to questions involving information expressed as linear syllogisms, if the extremes are at the start/end of the sequence, such as in the following:

     A is better than B
     B is better than C

and not the following (which got the lowest percentage of correct answers):

     B is better than C
     B is worse than A

Your author ran an experiment to find out whether developers were more likely to give correct answers for particular forms of range tests in if-conditions.

Out of a total of 844 answers, 40 were answered incorrectly (roughly one per subject; it was a paper and pencil experiment, so no timings). It's good to see that the subjects were so competent, but with so few mistakes made the error bars are very wide, i.e., too few mistakes were made to be able to say that one representation was less mistake-prone than another.

I hope this post has got other researchers interested in understanding developer performance when processing if-statements, and that they will be running more experiments to help shed light on the processes involved.

Ponylang (SeanTAllen)

Last Week in Pony - June 21, 2020 June 21, 2020 05:11 PM

We have new releases for ponyup, ponylang-mode, and release-notes-bot-action.

June 19, 2020

Pierre Chapuis (catwell)

[Quora] Transparency in distributed systems UX June 19, 2020 06:52 PM

Yet another Quora answer, this time to with this question which I answered on August 24, 2016:

Why is Transparency a major issue in distributed databases?


First, a few words about what "transparency" is. Transparency is a UX term about the user not noticing that they are using a distributed system. We actually talk about transparencies in the plural, because there are several kinds of transparencies: fault transparency, location transparency, concurrency transparency, access transparency, etc. In my opinion, the choice of wording is not so good: when we talk about "transparency" we actually mean we hide things from the user, so "opacity" would be more logical. (The name comes from the fact that, if we replace a single node system by a distributed system, it will be transparent for the user, i.e. they will not notice it.)

The reason why transparency is important is usability. The more transparencies our system has, the less cognitive burden there is on the user. In other words: transparencies simplify the API of the system.

However, what transparencies we implement or not is a trade-off between that simplicity of API and things like flexibility, performance, and sometimes correctness. Years ago (when Object Oriented programming à la Java was booming) it was fashionable to abstract everything and make the user forget that they were using an actual distributed system. For instance, we had RPC everywhere, which kind of hid the network from the user. Since then we learnt that abstracting the network entirely is a bad idea.

On the other hand, exposing too many knobs to the user is dangerous as well: they might turn them without really understanding what they do and set the system on fire.

So, determining what to expose to the user and what to implement "transparently" is a crucial point in all distributed systems work, not only databases.

In databases in particular, conflict resolution is a contention point. Do we only provide the user with databases that are consistent, knowing that this severely impacts performance and availability? Do we let them tweak the parameters (the R and W parameters in a quorum system, for instance)? Do we tolerate divergence, detect it, inform the user and let them reconcile (à la CouchDB)? Do we provide the user with constrained datastructures that resolve conflicts by themselves (CRDTs)?
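
For readers who haven't met those knobs: with N replicas, a write waits for acknowledgements from W of them and a read consults R, and the classic rule of thumb is that the two sets must overlap for reads to see the latest write. A tiny illustrative check (a generic sketch, not tied to any particular database):

# If R + W > N, every read quorum intersects every write quorum,
# so a read is guaranteed to contact at least one replica that
# holds the most recently acknowledged write.
def quorums_overlap(n, r, w):
    return r + w > n

quorums_overlap(3, 2, 2)  # True: reads see the latest acknowledged write
quorums_overlap(3, 1, 1)  # False: a read may miss it entirely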

Some people have gone as far as saying that Distributed Systems Are a UX Problem and I tend to agree with this line of reasoning.

Frederic Cambus (fcambus)

Viewing ANSI art in MS-DOS virtual machines June 19, 2020 04:28 PM

I sometimes get reports about Ansilove rendering some artworks differently than other ANSI art editors and viewers for modern platforms.

Ansilove tries to be faithful to the rendering of ANSI.SYS and MS-DOS based editors and viewers, as the vast majority of artworks were created during the DOS era. Most of the time, using ACiDDraw and ACiD View in DOSBox is enough, but when in doubt, it can be useful to verify how ANSI.SYS rendered a particular piece.

Once we have MS-DOS installed and working in a virtual machine, the next step is accessing files within the VM. The easiest way to do so is to create and use virtual floppy images to transfer files.

On a Linux machine, one can use mkfs.msdos to create an empty floppy image:

mkfs.msdos -C floppy.img 1440

The image can then be mounted on the host to copy the desired content, then attached to the virtual machine.
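
For example, on a Linux host this can be done with a loop mount (run as root; /mnt is used here as a scratch mount point and ANSI.ANS stands in for the artwork to transfer):

mount -o loop floppy.img /mnt
cp ANSI.ANS /mnt/
umount /mnt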

In the MS-DOS guest, we need to enable ANSI.SYS in CONFIG.SYS:

DEVICE=C:\DOS\ANSI.SYS

We can then render the files we want to verify:

A:
TYPE ANSI.ANS

80x50 mode can be enabled this way:

MODE CON COLS=80 LINES=50

Wesley Moore (wezm)

Working Around GitHub Browser Sniffing to Get Better Emoji on Linux June 19, 2020 08:03 AM

I have my system configured[1] to use JoyPixels for emoji, which I consider vastly more attractive than Noto Color Emoji. Sadly GitHub uses browser sniffing to detect Linux user-agents and replaces emoji with (badly aligned) images of Noto Color Emoji. They don't do this on macOS and Windows. In this post I explain how I worked around this.

Screenshot of GitHub showing two comments, one with emoji set in the Noto Color Emoji font, the other in the JoyPixels Font.

The solution is simple: make GitHub think you're using a Mac or Windows PC. There are various ways to change the User-Agent string of Firefox. The easiest is via about:config but I didn't want it to be a global change — I want sites to know that I'm using Linux in logs/privacy respecting analytics (I block most trackers).

I ended up using the User-Agent Switcher and Manager browser add-on. I configured its allow list to only include github.com, and use the User-Agent string for Firefox on macOS. The end result? JoyPixels, just like I wanted.

P.S. If anyone from GitHub sees this. Please stop browser sniffing Linux visitors. Linux desktops and browsers have had working emoji support for years now.

[1] I use the term, "configured", loosely here as all I really did was install the ttf-joypixels package.

June 17, 2020

Gustaf Erikson (gerikson)

Closing Time by Joe Queenan June 17, 2020 09:13 PM

An unflinching but entertaining memoir about growing up with an alcoholic father in working-class Philadelphia.

Britain’s War Machine by David Edgerton June 17, 2020 08:12 PM

A revisionist look at the material grounding of Great Britain (and its Empire) in World War II.

Unlike most contemporary views, Edgerton sees Dunkirk not as a low point but as a temporary setback. The real setback was Japan’s entry into the war and Britain’s need to divert forces and treasure to defend the Empire.

In the post-war years, with the Empire gone and Britain’s relative standing diminished, Dunkirk grows in stature, and the myth of the small island sacrificing itself for peace and democracy grows with it.

5,000 dead in Sweden June 17, 2020 12:17 PM

June 16, 2020

Benjamin Pollack (gecko)

Distributed Perfection June 16, 2020 08:20 PM

One of the hardest things for me to come to terms with in my adult life has been that…well, no one is perfect. The civil rights leader you laud turns out to be a womanizer. A different one held up as a herald of nonviolence turns out to be violently antisemitic. An author who is unquestionably a major supporter of women’s rights and the LGB part of the acronym turns out to be an enemy of the TQ part of the acronym.

But that doesn’t mean we just absolutely write them all off. When it comes to people, we understand that no one is perfect. People are complex, nuanced individuals. It’s entirely reasonable that someone who is a true rights champion in some arena might be straight-up retrograde in another. Sometimes they improve, sometimes they don’t, but either way, they’re people, who deserve to be lauded for their wins and criticized for their faults. They should be allowed faults. They shouldn’t have to be perfect.

I feel like we’re really slow to adopt this attitude to tech, even though tech is, ultimately, made by people. Jepsen tests are awesome, but I’ve become numb to people seeing a single Jepsen test failure as an indicator the entire database is broken. An amazing hobbyist graphical OS might be almost impossibly impressive, but people write it off because the kernel interface isn’t quite what they want. GitHub sucks because it needs JavaScript and bloated browsers to operate, even though it demonstrably has made it much easier for the average person to contribute to free and open-source software.

There’s so much work on trying to get a more diverse community into tech, but I feel like we lose a lot of our potential diversity right there, in our insistence that everything be straight-up perfect or be thrown out. Of course I’d like my tech stack to be perfect. But it’s written by people, and people are notoriously complicated, unreliable, and nuanced. And it’s important I meet them where they are, as people.

There’s no call to action in this post, and I’m deliberately not linking anything because I don’t want to fan the flames. But I do want to ask that, before you write a project or its people off as incompetent, lazy, offensive, or stupid, that you take a moment to explore that they’re people with strengths and weaknesses, and the tech they produce will likely be along similar axes.

June 14, 2020

Derek Jones (derek-jones)

An experiment involving matching regular expressions June 14, 2020 10:40 PM

Recommendations for/against particular programming constructs have one thing in common: there is no evidence backing up any of the recommendations. Running experiments to measure the impact of particular language features on developer performance is not something that researchers do (there have been a handful of experiments looking at the impact of strong typing on developer performance; the effect measured was tiny).

In February I discovered two groups researching regular expressions. In the first post on duplicate regexs, I promised to say something about the second group. This post discusses an experiment comparing developer comprehension of various regular expressions; the paper is: Exploring Regular Expression Comprehension.

The experiment involved 180 workers on Mechanical Turk (to be accepted, workers had to correctly answer four or five questions about regular expressions). Workers/subjects performed two different tasks, matching and composition.

  • In the matching task workers saw a regex and a list of five strings, and had to specify whether the regex matched (or not) each string (there was also an unsure response).
  • In the composition task workers saw a regular expression, and had to create a string matched by this regex. Each worker saw 10 different regexs, which were randomly drawn from a set of 60 regexs (which had been created to be representative of various regex characteristics). I have not analysed this data yet.

What were the results?

For the matching task: given each of the pairs of regexs below, which one (of each pair) would you say workers were most likely to get correct?

         R1                  R2
1.     tri[a-f]3         tri[abcdef]3
2.     no[w-z]5          no[wxyz]5
3.     no[w-z]5          no(w|x|y|z)5
4.     [^0-9]            [\D]

The percentages correct for (1) were essentially the same, at 94.0 and 93.2 respectively. The percentages for (2) were 93.3 and 87.2, which is odd given that the regex is essentially the same as (1). Is this amount of variability in subject response to be expected? Is the difference caused by letters being much less common in text, so people have had less practice using them (sounds a bit far-fetched, but it's all I could think of). The percentages for (3) are virtually identical, at 93.3 and 93.7.

The percentages for (4) were 58 and 73.3, which surprised me. But then I have been using regexs since before \D support was generally available. The MTurk generation have it easy not having to use the ‘hard stuff’ 😉

See Table III in the paper for more results.
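
As an aside, it only takes a few lines of Python to convince yourself that each pair really is equivalent, at least on a handful of test strings (an illustrative check, not part of the experiment):

import re

pairs = [
    ("tri[a-f]3", "tri[abcdef]3"),
    ("no[w-z]5", "no[wxyz]5"),
    ("no[w-z]5", "no(w|x|y|z)5"),
    ("[^0-9]", r"[\D]"),
]

# Strings chosen so that some match and some don't.
tests = ["tria3", "trig3", "now5", "nov5", "x", "7"]

for r1, r2 in pairs:
    for s in tests:
        assert bool(re.fullmatch(r1, s)) == bool(re.fullmatch(r2, s))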

This matching data might be analysed using Item Response theory, which can take into account differences in question difficulty and worker/subject ability. The plot below looks complicated, but only because there are so many lines. Each numbered colored line is a different regex, worker ability is on the x-axis (greater ability on the right), and the y-axis is the probability of giving a correct answer (code+data; thanks to Peipei Wang for fixing the bugs in my code):

Probability of giving a correct answer, by subject ability, for 60 regex matching questions

Yes, for question 51 the probability of a correct answer decreases with worker ability. Heads are being scratched about this.

There might be some patterns buried in amongst all those lines, e.g., particular kinds of patterns require a given level of ability to handle, or correct response to some patterns varying over the whole range of abilities. These are research questions, and this is a blog article: answers in the comments :-)

This is the first experiment of its kind, so it is bound to throw up more questions than answers. Are more incorrect responses given for longer regexs, particularly if they cannot be completely held in short-term memory? It is convenient for the author to use a short-hand for a range of characters (e.g., a-f), and I was expecting a difference in performance when all the letters were enumerated (e.g., abcdef); I had theories for either one being less error-prone (I obviously need to get out more).

Gokberk Yaltirakli (gkbrk)

Status update, June 2020 June 14, 2020 09:00 PM

After seeing other people (like Drew and emersion) publish these for a while, I decided to write my own “Status Update” for the first time. I can’t always find enough time to write blog posts, so these monthly status updates should be useful both for other people to keep an eye on what I’m working on, and for me as a historical record.

A full blog post needs to have a decent amount of content, and has a lot of time costs related to research, writing, and editing. This means for small updates, there might be too much friction in getting a post out. On the other hand, a monthly update needs a lot less polish for each individual item you want to talk about. As long as you pay attention to the post as a whole, individual items are allowed to lack the substance expected of a standalone post.

These reasons, combined with my not-so-great track record in posting regularly, makes me think I will be doing this in the future too. So let’s get on with it.


This month I migrated my kernel’s build system from a Makefile to the Ninja build system. Other than depending on a non-standard tool, I am really happy with this decision. I think Ninja is easier to maintain unless your Makefile is really simple.

Aside from the build system migration, and some small fix-ups, I have completely removed the C-based drivers and migrated them all to the new C++ driver system. This code cleanup made it easier to work on device drivers; and as a result of that we now have a graphics driver that works with QEMU, along with a framebuffer to avoid partial frames.

Speaking of Ninja, I also wrote a Python implementation of Ninja. This was both a way to learn about how build systems works, and a way to build my projects in environments without Ninja. While it doesn’t have full feature parity with the original implementation, it can build my projects and even bootstrap Ninja itself.

Fed up with the state of privacy on the Modern Web™, I started working on a browser extension that aims to plug some privacy leaks. It is still not polished enough, and occasionally breaks some JS-heavy websites. But despite that I’ve been using it daily and it’s not too problematic. I’ll try to make an announcement once it’s fully polished and ready to use.

Just as in the previous months, I'm still learning more about DSP. Last month I created a software modulator for APT. This month I decided to go a little old-school and made a tool to create Hellschreiber transmissions. Unfortunately, I haven't had the chance to put up the code online yet.

I wrote a couple pages on my wiki about DSP, and I managed to contribute a little to the Signal Identification Wiki. I’d recommend checking it out, it’s a great resource to sink some time into.

Short update about university: It’s finally over. Once it’s marked and graded, I’d love to write about my dissertation project here.

That’s all for this month! Thanks for reading.

Ponylang (SeanTAllen)

Last Week in Pony - June 14, 2020 June 14, 2020 03:00 PM

We have some new RFCs and more updates to the Emacs ponylang-mode. This week’s sync meeting includes a discussion about build systems and package management.

Bogdan Popa (bogdan)

Announcing http-easy June 14, 2020 03:00 PM

Yesterday I released http-easy, a high-level HTTP client for Racket. I started working on it after getting annoyed at some of the code in my racket-sentry package. The same day I wrote that code, someone started a mailing list thread asking for a “practical” HTTP client so that served as additional motivation to spend some time on this problem. Here’s a basic example: (require net/http-easy) (response-xexpr (get "https://example.

June 11, 2020

Oleg Kovalov (olegkovalov)

kakkaka June 11, 2020 05:25 PM

lalalalal



June 10, 2020

Robin Schroer (sulami)

LISP<sub>1</sub> Has Won June 10, 2020 12:00 AM

I am currently working on a compiler for a new programming language which has been in the making for a few months at this point. There is nothing public to show yet, everything is very early stage, and there are plenty of decisions to make and work to be done before I will publish anything.

That being said, I will write about both the progress as well as different topics I come across, so stay tuned if you are interested in that.

The language I am writing currently has a Lisp-like syntax, because that is easy to parse and work with (I just really don’t want to deal with operator precedence, and honestly I like writing Lisp, so I find it unlikely that I’ll actually change to a non-Lisp-like syntax), which is why I am sharing some thoughts on one of the big bike sheds in software history.

LISPwhat?

LISP1 and LISP2 are terms to describe the way symbol namespaces work in different LISP-like programming languages. (The distinction can also be applied to other languages with first-class functions, e.g. Python & Ruby.)

The explanation is actually very simple, LISP1 has a single shared namespace for functions and variables. This means a symbol can refer to either a function or a variable, but not both. Consider the following Racket code:

(define (double x)
  (* 2 x))

(define triple (* 3 4))

double
;; => #<procedure:double>

triple
;; => 12

(double triple)
;; => 24

When you resolve a symbol to a variable, you cannot know if it will resolve to a function or not.

LISP2 on the other hand has a separate namespace for functions. This has the advantage that every name can be used twice: once for a function, and once for a variable. (I’m glossing over the fact that Common Lisp actually has more than two namespaces, depending on your definition of the term namespace.) The tradeoff is that the user has to specify in which namespace they want to resolve a symbol. Consider the following Emacs Lisp code:

(defun double (x)
  (* 2 x))

(defvar double (* 2 4))

(funcall #'double double)
;; => (funcall <function double> <variable double>)
;; => (double 8)
;; => 16

Note the added punctuation to denote the first double as a symbol resolving to a function.

LISPwhy?

LISP is one of the oldest programming languages that is still used commercially today in some form, if you accept Common Lisp in its lineage. It appears that the namespace separation in the original LISP 1.5 was mostly incidental, and has been regretted since.

The set of LISP2 languages is quite small these days. Besides Common Lisp and Emacs Lisp, both of which are over three decades old at this point, there are also Ruby and Perl. (Honourable mention: Lisp Flavoured Erlang is a LISP2.)

The other ancient LISP-like language, Scheme, is a LISP1, and so is its popular modern dialect Racket (as demonstrated above). Almost every other somewhat popular language chooses to share a single namespace between functions and variables. Examples include Clojure, Janet, Python, Java, JavaScript, and even Rust.

Clearly the benefits of less syntactic clutter and cognitive overhead have won in the popular arena, to the point that the established de facto standard itself becomes a good reason to stick with a single unified namespace. Of course improvement, by its very definition, always requires change, but language designers need to be acutely aware of the cost incurred by diverging from the established norm.

June 09, 2020

Frederik Braun (freddyb)

Understanding Web Security Checks in Firefox (Part 1) June 09, 2020 10:00 PM

This blog post has first appeared on the Mozilla Attack & Defense blog and was co-authored with Christoph Kerschbaumer

This is the first part of a blog post series that will allow you to understand how Firefox implements Web Security fundamentals, like the Same-Origin Policy. This first post of the series …

Gokberk Yaltirakli (gkbrk)

Faux-DEFLATE June 09, 2020 09:00 PM

I was working on a proof-of-concept implementation of a file format that uses DEFLATE compression. Since it was supposed to be a self-contained example, I didn’t want to bring in a fully-featured compressor like zlib. I skimmed the DEFLATE RFC and noticed that it supports raw / uncompressed data blocks. I wrote a simple encoder that stores uncompressed blocks to solve my problem, and wanted to document it on my blog for future reference.

Structure of a DEFLATE stream

A DEFLATE stream is made up of blocks. Each block has a 3-bit header. The first bit signifies the last block of the stream, and the other two make up the block type.

The block types are

  • 00 - no compression
  • 01 - compressed with fixed Huffman codes
  • 10 - compressed with dynamic Huffman codes
  • 11 - reserved (error)

In this post, we’re only interested in the first one (00).

Structure of an uncompressed block

In the uncompressed block, the 3-bit header is contained in a byte. A good property of the uncompressed block type being 00 is the ease of constructing the header.

  • If the block is the final one, the header is 1
  • If the block is not the final one, the header is 0

After the header byte, there is the length and negated length. These are both encoded as little endian uint16_t’s.

  • length is the number of data bytes in the block
  • negated length is one’s complement of length, ~len & 0xFFFF

After the header, length and the negated length, length bytes of data follow. If there are no more blocks after this one, the final bit is set.
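
As a worked example (my own addition, not from the original post), here is the byte layout of a final uncompressed block holding the two bytes b"AB":

import struct

data = b"AB"
block = struct.pack("<BHH", 1, len(data), ~len(data) & 0xFFFF) + data
print(block.hex(" "))  # 01 02 00 fd ff 41 42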

Python implementation

Here’s a simple DEFLATE implementation in Python. The output should be a valid DEFLATE stream for real decoders.

import struct


def faux_deflate(reader, writer, bufsize=2048):
    """Encode data from reader as a DEFLATE stream of uncompressed (type 00) blocks."""
    while True:
        chunk = reader.read(bufsize)

        # An empty read means the input is exhausted: emit a final, empty block.
        header = 0

        if not chunk:
            header = 1

        _len = len(chunk)
        nlen = ~_len & 0xFFFF
        writer.write(struct.pack("<BHH", header, _len, nlen))
        writer.write(chunk)

        if not chunk:
            break

There’s also a decoder that can only decode uncompressed blocks.

def faux_inflate(reader, writer):
    while header := reader.read(5):
        header, _len, nlen = struct.unpack("<BHH", header)

        assert header in [0, 1]
        assert nlen == ~_len & 0xFFFF

        writer.write(reader.read(_len))
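
As a quick sanity check (my own addition, not part of the original post), we can round-trip some data through faux_deflate and hand the result to a real DEFLATE decoder:

import io
import zlib

data = b"hello world" * 100
src, dst = io.BytesIO(data), io.BytesIO()
faux_deflate(src, dst)

# wbits=-15 tells zlib to expect a raw DEFLATE stream with no zlib/gzip framing.
assert zlib.decompress(dst.getvalue(), -15) == data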

Useful resources

Jeremy Morgan (JeremyMorgan)

Get and Store Temperature from a Raspberry Pi with Go June 09, 2020 05:08 PM

In this tutorial, I’ll show you how to grab temperature from a Raspberry Pi and build an endpoint to store the data, with Go. You will learn:

  • How to retrieve the temperature from a sensor
  • How to send that data in JSON
  • How to build an API endpoint to receive it
  • How to store the data in a SQLite database

And we’ll do it all with Go. I did a live stream of the entire process that you can watch here.

June 08, 2020

Marc Brooker (mjb)

Some Virtualization Papers Worth Reading June 08, 2020 12:00 AM

Some Virtualization Papers Worth Reading

A short, and incomplete, survey.

A while back, Cindy Sridharan asked on Twitter for pointers to papers on the past, present and future of virtualization. I picked a few of my favorites, and given the popularity of that thread I decided to collect some of them here. This isn't a literature survey by any means, just a collection of some papers I've found particularly interesting or useful. As usual, I'm biased towards papers I enjoyed reading, rather than those I had to slog through.

Popek and Goldberg's 1974 paper Formal Requirements for Virtualizable Third Generation Architectures is rightfully a classic. They lay out a formal framework of conditions that a computer architecture must fulfill to support virtual machines. It's 45 years old, so some of the information is dated, but the framework and core ideas have stood the test of time.

Xen and the Art of Virtualization, from 2003, described the Xen hypervisor and a novel technique for running secure virtualization on commodity x86 machines. The exact techniques are less interesting than they were then, mostly because of hardware virtualization features on x86 like VT-x, but the discussion of the field and trade-offs is enlightening. Xen's influence on the industry has been huge, especially because it was used as the foundation of Amazon EC2, which triggered the following decade's explosion in cloud computing. Disco: Running Commodity Operating Systems on Scalable Multiprocessors from 1997 is very useful from a similar perspective (and thanks to Pekka Enberg for the tip on that one). Any paper that has "our approach brings back an idea popular in the 1970s" in its abstract gets my attention immediately.

A Comparison of Software and Hardware Techniques for x86 Virtualization, from 2006, looks at some of the early versions of that x86 virtualization hardware and compares it to software virtualization techniques. As above, hardware has moved on since this was written, but the criticisms and comparisons are still useful to understand.

The security, compatibility and performance trade-offs of different approaches to isolation are complex. On compatibility, A study of modern Linux API usage and compatibility: what to support when you're supporting is a very nice study of how much of the Linux kernel surface area actually gets touched by applications, and what is needed to be truly compatible with Linux. Randal's The Ideal Versus the Real: Revisiting the History of Virtual Machines and Containers surveys the history of isolation, and what that means in the modern world. Anjali's Blending Containers and Virtual Machines: A Study of Firecracker and gVisor is another of a related genre, with some great data comparing three methods of isolation.

My VM is Lighter (and Safer) than your Container from SOSP'17 has also been influential in changing the way a lot of people think about virtualization. A lot of people I talk to see virtualization as a heavy tool with multi-second boot times and very limited density, mostly because that's the way it's typically used in industry. Manco et al's work wasn't the first to burst that bubble, but they do it very effectively.

Our own paper Firecracker: Lightweight Virtualization for Serverless Applications describes Firecracker, a new open-source Virtual Machine Monitor (VMM) specialized for serverless workloads. The paper also covers how we use it in AWS Lambda, and some of what we see as the future challenges in this space. Obviously I'm biased here, being an author of that paper.

June 07, 2020

Derek Jones (derek-jones)

C++ template usage June 07, 2020 10:29 PM

Generics are a programming construct that allow an algorithm to be coded without specifying the types of some variables, which are supplied later when a specific instance (for some type(s)) is instantiated. Generics sound like a great idea; who hasn’t had to write the same function twice, with the only difference being the types of the parameters.

All of today’s major programming languages support some form of generic construct, and developers have had the opportunity to use them for many years. So, how often are generics used in practice?

In C++, templates are the language feature supporting generics.

The paper: How C++ Templates Are Used for Generic Programming: An Empirical Study on 50 Open Source Systems contains lots of interesting data :-) The following analysis applies to the five largest projects analysed: Chromium, Haiku, Blender, LibreOffice and Monero.

As its name suggests, the Standard Template Library (STL) is a collection of templates implementing commonly used algorithms+other stuff (some algorithms were commonly used before the STL was created, and perhaps some are now commonly used because they are in the STL).

It is to be expected that most uses of templates will involve those defined in the STL, because these implement commonly used functionality, are documented and generally known about (code can only be reused when its existence is known about, and it has been written with reuse in mind).

The template instantiation measurements show a 17:1 ratio for STL vs. developer-defined templates (i.e., 149,591 vs. 8,887).

What are the usage characteristics of developer defined templates?

Around 25% of developer defined function templates are only instantiated once, while 15% of class templates are instantiated once.

Most templates are defined by a small number of developers. This is not surprising given that most of the code on a project is written by a small number of developers.

The plot below shows the percentage instantiations (of all developer defined function templates) of each developer defined function template, in rank order (code+data):

[Plot: percentage of instantiations of each developer-defined function template, in rank order]

Lines are each a fitted power law, whose exponents vary between -1.5 and -2. Is it just me, or are these exponents surprisingly close?

The following is for developer defined class templates. Lines are each a fitted power law, whose exponents vary between -1.3 and -2.6. Not so close here.

[Plot: percentage of instantiations of each developer-defined class template, in rank order]

What processes are driving use of developer defined templates?

Every project has its own specific few templates that get used everywhere, by all developers. I imagine these are tailored to the project, and are widely advertised to developers who work on the project.

Perhaps some developers don’t define templates, because that’s not what they do. Is this because they work on stuff where templates don’t offer much benefit, or is it because these developers are stuck in their ways (if so, is it really worth trying to change them?)

Ponylang (SeanTAllen)

Last Week in Pony - June 7, 2020 June 07, 2020 03:16 PM

This week's sync meeting includes discussions about type system soundness and dependency management.

June 06, 2020

Patrick Louis (venam)

Evolutionary Software Architecture June 06, 2020 09:00 PM

Building Evolutionary Architectures

In a previous post, I’ve underlined the philosophy behind Domain Driven Design, DDD, and now I’d like to move to a practical approach that handles real issues in software development and architecture: requirements that constantly change, and models that are never precise, never current, and/or never using the best technology available. One of the solutions to such problems is to build an evolutionary architecture.

To be able to have a discussion we have to understand the part that software architecture plays, which is not straightforward considering the many definitions and re-definitions of it. I’ll note the particularly fascinating ones.

The architecture of a software system (at a given point in time) is its organization or structure of significant components interacting through interfaces, those components being composed of successively smaller components and interfaces. — IEEE definition of software architecture

In most successful software projects, the expert developers working on that project have a shared understanding of the design. This shared understanding is called “architecture.” This understanding includes how the system is divided into components and how the components interact through interfaces. — Ralph Johnson

Going towards more abstract definitions such as the following.

Architecture is the stuff that’s hard to change later. And there should be as little of that stuff as possible. — Martin Fowler

Architecture is about the important stuff. Whatever that is. — Martin Fowler

Stuff that’s hard to change later. — Neal Ford

These definitions barely overlap but there’s still a vague essence joining them which we can extract. We can say that architecture is concerned with the important decisions in a software project, the objects of those decisions, the shared knowledge of them, and how to reason about them. If we view this from an evolutionary architecture standpoint, the best architecture is one where decisions are flexible, easily replaceable, reversible, and deferred as late as possible so that they can be substituted for alternatives that recent experiences have shown to be superior.
Because architecture is about decision-making, it is inherently tied with the concept of technical debt, the compromise of trading time for a design that is not perfect. Keep in mind that debt accumulates and often leads to architectural decay as changes keep coming and entropy increases.

Similarly, due to the vague definition of architecture, the role of architect is hard to describe. Whether it should be a completely separate role, or whether everyone in a team acts as one, is ambiguous. The vociferous software architecture evangelist Martin Fowler prefers the term Architectus Oryzus, referring to architects that are also active contributors on the projects, thus getting direct insights from their involvement.

The software architecture thought process can be applied at two broad levels: the application level and the enterprise level. Application architecture is about describing the structure of the application and how its components fit together, usually using design patterns, while enterprise architecture is about organizational-level software issues such as practices, information flow, methodology standards, release mechanisms, personnel related activities, technology stacks enforced, etc.

Design relates to all the well known development design patterns, refactoring techniques, the usage of frameworks, how to bundle components together, and other daily concerns. In an evolutionary architecture, it’s preferable to have an emergent design instead of one that is set up front.

This gives us a good idea of what software architecture is about, so what’s the current state of it, and why do we need a solution such as building evolutionary architectures?

The usual way we develop software today is by fighting incoming changes that we want to incorporate in the current architecture. Software development has a dynamic equilibrium, and currently we find that software is in a constantly unbalanced and unstable state whenever there are changes to be included. That is because even though we’d like to do the right things at the right time, we can’t predict what those decisions should be; predictability is almost impossible. For example, we can’t predict disruptive technologies that don’t exist yet. As the software ages, we juggle changes and new requirements; there’s no room for experimentation, we only respond.
Stakeholders want the software to fulfill architecturally significant requirements, also known as the “ilities” and KPIs, such as auditability, performance, security, scalability, privacy, legality, productivity, portability, stability, etc. They expect those not to degrade. Hence, we have to find the least-worst trade-off between them and not blindly introduce anything that could hinder them. This is hard to do, whether because of a business-driven change, such as new features, new customers, new markets, etc., or because of an ecosystem change such as advances in technology, library upgrades, frameworks, operating systems, etc.


In recent years, we’ve seen the rise of agile development methodologies that are meant to replace the waterfall approach. They are better suited to facing this challenge: they create an iterative and dynamic way to control the change process. What we call evolutionary architecture starts from the idea of embracing change and constant feedback but wants to apply it across the whole architecture spectrum, on multiple dimensions. It’s not the strongest that survive, it’s the ones that are most responsive to change. So what is evolutionary software architecture?

Evolutionary architecture is a meta-architecture, a way of thinking about software in evolutionary terms. A guide, a first derivative, dictating design principles that promote change as a first-class citizen. Here is Neal Ford, Rebecca Parsons, and Patrick Kua’s definition from their book “Building Evolutionary Architectures”, which we’ll dissect.

An evolutionary architecture supports guided, incremental change across multiple dimensions.

  • Multiple dimensions

There are no separate systems. The world is a continuum. Where to draw a boundary around a system depends on the purpose of the discussion. — Donella H. Meadows

While the agile methodology is only concerned with people and processes, evolutionary architecture encompasses the whole spectrum including the technical, the data, the domain, the security, the organizational, and the operational aspects. We want different perspectives, all of them evolvable; those are our dimensions. The evolutionary mindset should surround it all in a holistic view of software systems. For this, we add a new requirement, an “-ility”, which we call the evolvability of a dimension. This will help measure how easily change in a dimension can evolve the architecture — easily be included in the dynamic equilibrium.
For example, the big ball of mud architecture, with its extreme coupling and architectural rotting, has a dimension of evolvability of 0 because any change in any dimension is daunting.
The layered architecture has a one-dimensional structural evolvability because change at one layer ripples only through the lower one. However, the domain dimension evolvability is often 0 when domain concepts are smeared and coupled across layer boundaries, thus a domain change requires major refactoring and ripples through all the layers.
The microservice style of architecture, which hinges on the post-devops and agile revolution, has a structural and domain dimension evolvability of n, n being the number of isolated services running. Each service in a microservice architecture represents a domain bounded context, which can be changed independently of the others because of its boundary. In the world of evolutionary architecture we call such a disjunct piece a quantum. An architectural quantum is an independently deployable component with high functional cohesion, which includes all the structural elements required for the system to function properly. In a monolith architecture, the whole monolith is the quantum. However, from a temporal coupling perspective, transactions may ripple through multiple services in a microservice architecture, and thus have an evolvability in the transactional dimension of 0.

  • Incremental change

It is not enough to have a measure of how easy change can be applied, we also need to continually and incrementally do it. This applies both to how teams build software, such as the agile methodology, and how the software is deployed, things such as continuous integration, continuous delivery, and continuous verification/validation.
These rely on good devops practices that let you take back control in complex systems, such as automated deployment pipelines, automated machine provisioning, good monitoring, gradual migration of new services by controlling routes, using database migration tools, using chaos engineering to facilitate the management of services, and more.

  • Guided Change

We can experiment without hassle, trivially and reversibly, with evolvability across multiple dimensions and incremental change. But to start the evolutionary process this is what we need: a guide that will push the architecture in the direction we want, using experiments as the main stressors. We call this selector an evolutionary fitness function, similar to the language used in genetic algorithms for the optimization function.

An Architectural fitness function provides an objective integrity assessment of some architectural characteristic(s).

Fitness functions are metrics that can cover one or multiple dimensions we care about and want to optimize. There’s a wide range of such functions, and this is where evolutionary architecture shines: it encourages testing, hypotheses, and gathering data in every manner possible to see how these metrics evolve, and the software along with them. Experimentation and hypothesis-driven development are some of the superpowers that evolutionary architectures deliver.
This isn’t limited to the usual unit tests and static analysis but extends way beyond simple code quality metrics. These could be automated or not, global or not, continuous or not, dynamic or not, domain specific or not, etc. Let’s mention interesting techniques that can be used for experimentation and that are now facilitated.

  • A/B testing.
  • Canary Releases aka phased rollout.
  • TDD to find emergent design.
  • Security as code, especially in the deployment pipeline
  • Architecture as code, also in the deployment pipeline with test framework such as ArchUnit.
  • Licenses as code, surprisingly this works too.
  • Test in production: through instrumentation and metrics, or direct interaction with users.
  • Feature flags/feature toggles, to toggle behavior on and off.
  • Chaos engineering, for example using the simian army as a continuous fitness function. “The facilitation of experiments to uncover systemic weakness”.
  • Social code analysis to find hotspots in code.
  • GitHub Scientist, to test hypotheses in production while keeping normal behavior (a minimal sketch of the idea follows this list).
    • Decides whether or not to run the try block.
    • Randomizes the order in which use and try blocks are run.
    • Measures the durations of all behaviors.
    • Compares the result of try to the result of use.
    • Swallows (but records) any exceptions raised in the try.
    • Publishes all the information.
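
To make the mechanism concrete, here is a minimal hand-rolled sketch of a Scientist-style experiment in Python. It is my own illustration of the bullet points above, not the real library’s API:

import random
import time

def experiment(use, try_, publish, enabled=lambda: True):
    # Always return the result of the `use` block; optionally run the `try_`
    # block alongside it and publish a comparison of the two.
    if not enabled():                     # decides whether or not to run the try block
        return use()

    order = [("use", use), ("try", try_)]
    random.shuffle(order)                 # randomizes the order in which the blocks run

    results = {}
    for name, fn in order:
        start = time.monotonic()          # measures the duration of each behavior
        try:
            value, error = fn(), None
        except Exception as exc:          # swallows (but records) exceptions
            value, error = None, exc
        results[name] = (value, error, time.monotonic() - start)

    use_value, use_error, _ = results["use"]
    if use_error is not None:             # the control path must still fail loudly
        raise use_error

    publish({                             # publishes all the information
        "matched": use_value == results["try"][0],
        "durations": {name: r[2] for name, r in results.items()},
        "try_error": results["try"][1],
    })
    return use_value

Calling experiment(old_code, new_code, print) returns whatever old_code returns while recording how the candidate compares, which is the property that makes this safe to run in production.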

The benefits of all this experimentation are soon seen: a real interactive feedback loop with users, a buffet of options. The dynamic equilibrium takes care of itself and there are fewer surprises.

This is enabled by the team building the evolutionary architecture. As with DDD, Conway’s law applies: the shape of the organization is directly reflected in the software — you can’t affect the architecture without affecting the people who build it.

So far, we’ve seen that such a team should embrace devops and agile development; that’s a given. Additionally, the team should itself be a cocoon for evolution and experimentation. By making it cross-functional (that is, every role and expertise can be found in it) and responsible for a single project, we remove the bottlenecks in the organization. We need a team that resembles our architectural quantum.
A small team, one that can be fed by two pizzas — a two-pizza team — avoids the separation between who decides what needs to be done and who decides how it’s going to be done. Everyone is there and decides together. We talk of teams in charge of products rather than projects.
The size of the team also allows information to flow seamlessly. All can share the architectural and domain knowledge. Methods that can be used are the usual documentation, architectural decision records, pair programming, and even mob programming.

As nice as it is to have teams that are single working units taking the best decisions for their projects, it’s also important to limit their boundaries. Many companies prefer giving loose recommendations about the software stacks teams can use instead of letting them have their own silos of specialized and centralized knowledge. Again, we face the dynamic equilibrium but this time at the team level. The parallel in enterprise architecture is called the “classic alternatives” strategy.
Human governance in these teams shouldn’t be restrictive because it would make it hard to move. However, the teams are guided by their own fitness functions, an automatic architectural governance. The continuous verifications in the delivery pipeline act as the guard-rail mechanism.
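
As an illustration, such a fitness function can be as small as a test in the delivery pipeline. The sketch below assumes a hypothetical Python codebase with domain/ and infrastructure/ packages and fails the build whenever the domain layer imports from the infrastructure layer; it is my own example, not one from the book:

import ast
import pathlib

def forbidden_imports(package="domain", forbidden_prefix="infrastructure"):
    # Walk every module in the package and collect imports of the forbidden layer.
    violations = []
    for path in pathlib.Path(package).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                continue
            for name in names:
                if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                    violations.append(f"{path}:{node.lineno} imports {name}")
    return violations

def test_domain_does_not_depend_on_infrastructure():
    # Run by pytest (or any test runner) as part of continuous verification.
    assert forbidden_imports() == []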

There are two big principles that should be kept in mind while applying all the above: “last responsible moment” and “bring the pain forward”. Together, they have the effect of making team members less hesitant and more prone to experiment.

The last responsible moment, an idea from the LEAN methodology, is about postponing decisions that are not immediately required to find the time to gather as much information as possible to let the best possible choice emerge.
This is especially useful when it comes to structural designs and technological decisions, as insights and clear contexts appear late. It helps avoid the potential cost and technical debts of useless abstractions and vendor-locking the code to frameworks. That is in direct opposition to the classical way of doing software architecture where those decisions are taken upfront.

What to remember when taking decisions

Bringing the pain forward, an idea inspired by the extreme programming methodology, is about facing difficult, long, painful tasks instead of postponing them. The more often we encounter them, the more we’ll know their ins-and-outs, and the more we’ll be incited to automate the pain away. It’s the dynamic equilibrium of pain vs time: the pain increases exponentially if we wait.
This is why it’s encouraged to do things like test in production, apply techniques of chaos engineering, rebooting often, garbage collecting services, merging code often, using database migration tools, etc. Eventually, the known-unknowns become known-knowns, the common predictable pain is gone.

In a world where software keeps getting complex, building evolutionary architectures leads into the topic of building robust, resilient, adaptive, and rugged systems. Software is now an intimate part of our lives, and we rely on it, it has real world effects. We could get inspired by the aerospace world and take a look at the checklist manifesto, or we could embrace statelessness and a throwable/disposable architecture (disposable software, erase your darlings), or maybe go the way of the flexible reactive architectures with their self-healing mechanisms. Anything is possible.

In this post, I’ve given my overview of the way I perceive evolutionary software architecture and its place in software architecture as a whole. It is clearly a step forward from the typical static view of architecture and offers a novel and organic approach, as the name implies. None of what is described is necessarily novel but putting all these methods and thinking together is. If you want an in-depth explanation, you can take a look at the O’Reilly book “Building Evolutionary Architectures” by Neal Ford, Rebecca Parsons, and Patrick Kua. I hope this article kick-starts your journey.







Frederic Cambus (fcambus)

OpenBSD framebuffer console and custom color palettes June 06, 2020 04:33 PM

On framebuffer consoles, OpenBSD uses the rasops(9) subsystem, which was imported from NetBSD in March 2001.

The RGB values for the ANSI color palette in rasops have been chosen to match the ones in Open Firmware, and are different from those in the VGA text mode color palette.

Rasops palette:

Rasops palette

VGA text mode palette:

VGA text mode palette

As one can see, the difference is quite significant, and decades of exposure to MS-DOS and Linux consoles makes it quite difficult to adapt to a different palette.

RGB values for the ANSI color palette are defined in sys/dev/rasops/rasops.c, and here are the proper ones to use to match the VGA text mode palette:

#define	NORMAL_BLACK	0x000000
#define	NORMAL_RED	0xaa0000
#define	NORMAL_GREEN	0x00aa00
#define	NORMAL_BROWN	0xaa5500
#define	NORMAL_BLUE	0x0000aa
#define	NORMAL_MAGENTA	0xaa00aa
#define	NORMAL_CYAN	0x00aaaa
#define	NORMAL_WHITE	0xaaaaaa

#define	HILITE_BLACK	0x555555
#define	HILITE_RED	0xff5555
#define	HILITE_GREEN	0x55ff55
#define	HILITE_BROWN	0xffff55
#define	HILITE_BLUE	0x5555ff
#define	HILITE_MAGENTA	0xff55ff
#define	HILITE_CYAN	0x55ffff
#define	HILITE_WHITE	0xffffff

And here is a diff doing just that, which I sent to tech@ back in January 2017.
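
If you want to eyeball the difference before patching anything, the values above can be previewed in any truecolor-capable terminal. This little Python snippet is my own convenience and has nothing to do with the diff itself:

vga = {
    "black": 0x000000, "red": 0xaa0000, "green": 0x00aa00, "brown": 0xaa5500,
    "blue": 0x0000aa, "magenta": 0xaa00aa, "cyan": 0x00aaaa, "white": 0xaaaaaa,
    "hi_black": 0x555555, "hi_red": 0xff5555, "hi_green": 0x55ff55,
    "hi_brown": 0xffff55, "hi_blue": 0x5555ff, "hi_magenta": 0xff55ff,
    "hi_cyan": 0x55ffff, "hi_white": 0xffffff,
}

for name, rgb in vga.items():
    r, g, b = rgb >> 16, (rgb >> 8) & 0xff, rgb & 0xff
    # 48;2;R;G;B sets the background color with a 24-bit escape sequence
    print(f"\x1b[48;2;{r};{g};{b}m        \x1b[0m {name} #{rgb:06x}")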

EDIT: The enthusiasm around this article led me to make another try, which didn't fare any better.

June 05, 2020

Stjepan Golemac (stjepangolemac)

Building an easy on the eyes IKEA style blog, in no time, for free, again June 05, 2020 05:58 PM

Desktop after work. Photo by Luca Bravo on Unsplash.

It’s been over a year since I bought a new domain and redesigned my blog. I was planning to write about many things, I set up a nice Markdown editor on my Mac, everything was ready to go. Then I did nothing, for a year. 😴

During this lockdown, I felt the urge to do it all over again but I successfully resisted it. Then I caved in one evening and rebuilt everything. This time it’s much simpler and I’m much more pleased with the tech stack.

The future is static! 🚀 Keep reading here.

If you’re wondering why I’m not writing on Medium anymore I wrote a post about that too. Check it out.

June 04, 2020

Aaron Bieber (qbit)

OpenSSH - Configuring FIDO2 Resident Keys June 04, 2020 11:43 PM

Table of Contents

  1. The Setup
  2. Creating keys
    1. Generating the non-resident handle
    2. Generating the resident handle
  3. Using the token
    1. Resident
      1. Transient usage with ssh-add
      2. Permanent usage with ssh-agent
    2. Non-resident

The Setup

If you haven’t heard, OpenSSH recently ([2020-02-14 Fri]) gained support for FIDO2/U2F hardware authenticators like the YubiKey 5!

This allows one to log into remote hosts with the touch of a button and it makes me feel like I am living in the future!

Some of these hardware tokens even support multiple slots, allowing one to have multiple keys!

On top of all that, the tokens can do “resident” and “non-resident” keys. “Resident” means that the key is effectively retrievable from the token (it doesn’t actually get the key - it’s a handle that lets one use the hardware key on the device).

This got me thinking about how I could use a single token (with two keys) to access the various machines I use.

In my use case, I have two types of machines I want to connect to:

  • greater security: machines I want to grant access to from a very select number of devices.

The greater key will require me to copy the “key handle” to the machines I want to use it from.

  • lesser security: machines I want to access from devices that may not be as secure.

The lesser key will be “resident” to the YubiKey. This means it can be downloaded from the YubiKey itself. Because of this, it should be trusted a bit less.

Creating keys

When creating FIDO keys (really they are key handles) one needs to explicitly tell the tool being used that it needs to pick the next slot. Otherwise generating the second key will clobber the first!

Generating the non-resident handle

greater will require me to send the ~/.ssh/ed25519_sk_greater handle to the various hosts I want to use it from.

We will be using ssh-keygen to create our keys.

ssh-keygen -t ed25519-sk -Oapplication=ssh:greater -f ~/.ssh/ed25519_sk_greater

Generating the resident handle

Because resident keys allow for the handle to be downloaded from the token, I have changed the PIN on my token. The PIN is the only defense against a stolen key. Note: the PIN can be a full passphrase!

Again via ssh-keygen.

ssh-keygen -t ed25519-sk -Oresident -Oapplication=ssh:lesser -f ~/.ssh/ed25519_sk_lesser

Using the token

Resident

The resident key can be used by adding it to ssh-agent or by downloading the handle / public key using ssh-keygen:

Transient usage with ssh-add

ssh-add -K

This will prompt for the PIN (which should be set as it’s the only defense against a stolen key!)

No handle files will be placed on the machine you run this on. Handy for machines you want to ssh from but don’t fully trust.

Permanent usage with ssh-agent

ssh-keygen -K

This will also prompt for the PIN, however, it will create the private key handle and corresponding public key and place them in $CWD.

Non-resident

The non-resident key will only work from hosts that have the handle (in our case ~/.ssh/ed25519_sk_greater). As such, the handle must be copied to the machines you want to allow access from.

Once the handle is in place, you can specify its usage in ~/.ssh/config:

Host secretsauce
    IdentityFile ~/.ssh/ed25519_sk_greater

June 02, 2020

Jeremy Morgan (JeremyMorgan)

How to Install Go on the Raspberry Pi June 02, 2020 06:20 PM

If you want to install Go on your Raspberry Pi you have a few options. In the past there was a lot of cross compiling and hacking to get it done, but now you can install it through Apt. However, you’re likely to find an older version. For instance at the time of this writing, an updated Raspberry Pi OS shows a version of 1.11.1 in the repositories. However the current version is 1.

Dan Luu (dl)

Finding the Story June 02, 2020 07:05 AM

This is an archive of an old pseudonymously written post from the 90s that seems to have disappeared from the internet.

I see that Star Trek: Voyager has added a new character, a Borg. (From the photos, I also see that they're still breeding women for breast size in the 24th century.) What ticked me off was the producer's comment (I'm paraphrasing), "The addition of Seven of Nine will give us limitless story possibilities."

Uh-huh. Riiiiiight.

Look, they didn't recognize the stories they had. I watched the first few episodes of Voyager and quit when my bullshit meter went off the scale. (Maybe that's not fair, to judge them by only a few episodes. But it's not fair to subject me to crap like the holographic lungs, either.)

For those of you who don't watch Star Trek: Voyager, the premise is that the Voyager, sort of a space corvette, gets transported umpteen zillions of light years from where it should be. It will take over seventy years at top speed for them to get home to their loved ones. For reasons we needn't go into here, the crew consists of a mix of loyal Federation members and rebels.

On paper, this looks good. There's an uneasy alliance in the crew, there's exploration as they try to get home, there's the whole "island in space" routine. And the Voyager is nowhere near as big as the Enterprise -- it's not mentally healthy for people to stay aboard for that long.

But can this idea actually sustain a whole series? Would it be interesting to watch five years of "the crew bickers" or "they find a new clue to faster interstellar travel but it falls through"? I don't think so.

(And, in fact, the crew settled down awfully quickly.)

The demands of series television subvert the premise. The basic demand of series television is that our regular characters are people we come to know and to care about -- we want them to come into our living rooms every week. We must care about their changes, their needs, their desires. We must worry when they're put in jeopardy. But we know it's a series, so it's hard to make us worry. We know that the characters will be back next week.

The demands of a story require someone to change of their own accord, to recognize some difference. The need to change can be imposed from without, but the actual change must be self-motivated. (This is the fundamental paradox of series television: the only character allowed to change is a guest, but the instrument of that change has to be a series regular, therefore depriving both characters of the chance to do something interesting.)

Series with strict continuity of episodes (episode 2 must follow episode 1) allow change -- but they're harder to sell in syndication after the show goes off the air. Economics favour unchanging regular characters.

Some series -- such as Hill Street Blues -- get around the jeopardy problem by actually making characters disposable. Some characters show up for a few episodes and then die, reminding us that it could happen to the regulars, too. Sometimes it does happen to the regulars.

(When the characters change in the pilot, there may be a problem. A writer who was approached to work on Mary Tyler Moore's last series saw from the premise that it would be brilliant for six episodes and then had noplace to go. The first Fox series starring Tea Leoni, Flying Blind, had a very funny pilot and set up an untenable situation.)

I'm told the only interesting character on Voyager has been the doctor, who can change. He's the only character allowed to grow.

The first problem with Voyager, then, is that characters aren't allowed to change -- or the change is imposed from outside. (By the way, an imposed change is a great way to start a story. The character then fights it, and that's interesting. It's a terrible way to end a story.)

The second problem is that they don't make use of the elements they have. Let's go back to the first season. There was an episode in which there's a traitor on board who is as smart as Janeway herself. (How psychiatric testing missed this, I don't know, but the Trek universe has never had really good luck with psychiatry.) After leading Janeway by the nose for fifty minutes, she figures out who it is, and confronts him. He says yes -- and beams off the ship, having conveniently made a deal with the locals.

Perfect for series television. We've got a supposedly intelligent villain out there who could come back and Janeway's been given a run for her money -- except that I felt cheated. Where's the story? Where's the resolution?

Here's what I think they should have done. It's not traditional series television, but I think it would have been better stories.

First of all, the episode ends when Janeway confronts the bad guy and arrests him. He's put in the brig -- and stays there. The viewer gets some sense of victory here.

But now there's someone as smart as Janeway in the brig. Suddenly we've set up Silence of the Lambs. (I don't mind stealing if I steal from good sources.) Whenever a problem is big enough, Janeway has this option: she can go to the brig and try and make a deal with the bad guy. "The ship dies, you die." Not only that, here's someone on board ship with whom she has a unique relationship -- one not formally bounded by rank. What does the bad guy really want?

And whenever Janeway's feeling low, he can taunt her. "By the way, I thought of a way to get everyone home in one-tenth the time. Have you, Captain?"

You wouldn't put him in every episode. But any time you need that extra push, he's there. Remember, we can have him escape any time we want, through the same sleight used in the original episode.

Furthermore, it's one thing to catch him; it's another thing to keep him there. You can generate another entire episode out of an escape attempt by the prisoner. But that would be an intermediate thing. Let's talk about the finish I would have liked to have seen.

Let's invent a crisis. The balonium generator explodes; we're deep in warp space; our crack engineering crew has jury-rigged a repair to the sensors and found a Class M planet that might do for the repairs. Except it's just too far away. The margin is tight -- but can't be done. There are two too many people on board ship. Each requires a certain amount of food, air, water, etc. Under pressure, Neelix admits that his people can go into suspended animation, so he does. The doctor tries heroically but the engineer who was tending the balonium generator dies. (Hmmm. Power's low. The doctor can only be revived at certain critical moments.) Looks good -- but they were using air until they died; one more crew member must die for the rest to live.

And somebody remembers the guy in the brig. "The question of his guilt," says Tuvok, "is resolved. The authority of the Captain is absolute. You are within your rights to hold a summary court martial and sentence him to death."

And Janeway says no. "The Federation doesn't do that."

Except that everyone will die if she doesn't. The pressure is on Janeway, now. Janeway being Janeway, she's looking for a technological fix. "Find an answer, dammit!" And the deadline is coming up. After a certain point, the prisoner has to die, along with someone else.

A crewmember volunteers to die (a regular). Before Janeway can accept, yet another (regular) crewmember volunteers, and Janeway is forced to decide. -- And Tuvok points out that while morally it's defensible if that member volunteered to die, the ship cannot continue without either of those crewmembers. It can continue without the prisoner. Clearly the prisoner is not worth as much as those crewmembers, but she is the captain. She must make this decision.

Our fearless engineering crew thinks they might have a solution, but it will use nearly everything they've got, and they need another six hours to work on the feasibility. Someone in the crew tries to resolve the problem for her by offing the prisoner -- the failure uses up more valuable power. Now the deadline moves up closer, past the six hours deadline. The engineering crew's idea is no longer feasible.

For his part, the prisoner is now bargaining. He says he's got ideas to help. Does he? He's tried to destroy the ship before. And he won't reveal them until he gets a full pardon.

(This is all basic plotting: keep piling on difficulties. Put a carrot in front of the characters, keep jerking it away.)

The tricky part is the ending. It's a requirement that the ending derive logically from what has gone before. If you're going to invoke a technological fix, you have to set the groundwork for it in the first half of the show. Otherwise it's technobabble. It's deus ex machina. (Any time someone says just after the last commercial break, "Of course! If we vorpalize the antibogon flow, we're okay!" I want to smack a writer in the head.)

Given the situation set up here, we have three possible endings:

  • Some member of the crew tries to solve the problem by sacrificing themselves. (Remember, McCoy and Spock did this.) This is a weak solution (unless Janeway does it) because it takes the focus off Janeway's decision.
  • Janeway strikes a deal with the prisoner, and together they come up with a solution (which doesn't involve the antibogon flow). This has the interesting repercussions of granting the prisoner his freedom -- while everyone else on ship hates his guts. Grist for another episode, anyway.
  • Janeway kills the prisoner but refuses to hold the court martial. She may luck out -- the prisoner might survive; that million-to-one-shot they've been praying for but couldn't rely on comes through -- but she has decided to kill the prisoner rather than her crew.

My preferred ending is the third one, even though the prisoner need not die. The decision we've set up is a difficult one, and it is meaningful. It is a command decision. Whether she ends up killing the prisoner is not relevant; what is relevant is that she decides to do it.

John Gallishaw once categorized all stories as either stories of achievement or of decision. A decision story is much harder to write, because both choices have to matter.

June 01, 2020

Nikita Voloboev (nikivi)

Carlos Fenollosa (carlesfe)

Seven years later, I bought a new Macbook. For the first time, I don't love it June 01, 2020 02:31 PM

The 2013 Macbook Air is the best computer I have ever owned. My wish has always been that Apple did nothing more than update the CPU and the screen, touching nothing else. I was afraid the day of upgrading my laptop would come.

But it came.

My Air was working flawlessly, if only unbearably slow when under load. Let me dig a bit deeper into this problem, because this is not just the result of using old hardware.

When video conferencing or under high stress, like running multiple VMs, the system would miss key presses or mouse clicks. I'm not saying that the system was laggy, which it was, and that is expected. Rather, that I would type the word "macbook" and the system would register "mok", for example. Or I would start a dragging event where the MouseUp never registered but the MouseMove continued working, so I ended up flailing an icon around the screen or moving a window to some unexpected place.

This is mostly macOS's fault. I own a contemporary x230 with similar specs running Linux and it doesn't suffer from this issue. Look, I was a computer user in the 90s and I perfectly understand that an old computer will be slow to the point of freezing, but losing random input events is a serious bug on a modern multitasking system.

Point #1: My old computer became unusable due to macOS, not hardware, issues.

******

As I mentioned, I had been holding on my purchase due to the terrible product lineup that Apple held from 2016 to 2019. Then Apple atoned, and things changed with the 2019 16". Since I prefer smaller footprints, I decided that I would buy the 13" they updated next.

So here I am, with my 2020 Macbook Pro, i5, 16 GB RAM, 1 TB SSD. But I can't bring myself to love it like I loved my 2013 Air.

Let me explain why. Maybe I can bring in a fresh perspective.

Most reviewers evaluate the 2020 lineup with the 2016-2019 versions in mind. But I'm just some random person, not a reviewer. I have not had the chance to even touch any Mac since 2015. I am not conditioned towards a positive judgement just because the previous generation was so much worse.

Of course the new ones are better. But the true test is to compare them to the best laptops ever made: 2013-2015 Airs and Pros.

Point #2: this computer is not a net win over a 2013 Air.

Let me explain the reasons why.

The webcam

You will see the webcam reviewed as an afterthought in most pieces. I will cover it first. I feel like Apple is mocking us by including the worst possible webcam on the most expensive laptop.

Traditionally, this has been a non-issue for most people. However, due to covid-19 and working from home, this topic has become more prominent.

In my case, even before the pandemic I used to do 2-3 video conferences every day. Nowadays I spend the day in front of my webcam.

What infuriates me is that the camera quality in the 2013 Air is noticeably better. Why couldn't they use at least the same part, if not a modern one?

See for yourself. It really feels like a ripoff. Apple laughing at us.

A terrible quality picture from the macbook pro webcam
The 2020 macbook pro webcam looks horrible, and believe me, it is not only due to Yours Truly's face.

A reasonable quality picture from the 2013 Air

For reference, this is the front facing camera of the 2016 iPhone SE
For reference, this is the front facing camera of the 2016 iPhone SE, same angle and lighting conditions.

For reference, a picture taken with my 2006 Nokia 5200
As a second reference, a picture taken with the 640x480 VGA camera of my 2006 Nokia 5200. Which of the above looks the most like this?

I would have paid extra money to have a better webcam on my macbook.

The trackpad

The mechanism and tracking are excellent, but the trackpad itself is too large and the palm rejection algorithm is not good enough.

Point #3: The large trackpad single-handedly ruins the experience of working on this laptop for me.

I am constantly moving the cursor accidentally. This situation is very annoying, especially for a touch typist as my fingers are always on hjkl and my thumb on the spacebar. This makes my thumb knuckle constantly brush the trackpad and activate it.

I really, really need to fix this, because I have found myself unconsciously raising my palms and placing them at a different angle. This may lead to RSI, which I have suffered from in the past.

This is a problem that Apple created on their own. Having an imperfect palm rejection algorithm is not an issue unless you irrationally enlarge the trackpad so much that it extends to the area where the palms of touch typists typically rest.

Video: Nobody uses the trackpad like this
Is it worth it to antagonize touch typists in order to be able to move the cursor from this tiny corner?

I would accept this tradeoff if the trackpad was Pencil-compatible and we could use it as some sort of handwriting tablet. That would actually be great!

Another very annoying side effect of it being so large is that, when your laptop is in your lap, sometimes your clothes accidentally brush the trackpad. The software then registers spurious movements or prevents some gestures from happening because it thinks there is a finger there.

In summary, it's too big for no reason, which turns it into an annoyance for no benefit. This trackpad offers a bad user experience; not only that, it also ruins the keyboard—read below.

I would have paid extra money to have a smaller trackpad on my macbook.

The keyboard

The 2015 keyboard was very good, and this one is better. The keyswitch mechanism is fantastic, the layout is perfect, and this is probably the best keyboard on a laptop.

Personally, I did not mind the Escape key shenanigans because I remapped it to dual Ctrl/Escape years ago, which I recommend you do too.

Touch ID is nice, even though I'm proficient at typing my password, so it was not such a big deal for me. Face ID would have been much more convenient, I envy Windows Hello users.

Unfortunately, the large trackpad torpedoes the typing experience. Writing on this Macbook Pro is worse than on my 2013 Air.

I will keep searching for a tool which disables trackpad input within X milliseconds of a key press or disables some areas of the trackpad. I have not had any luck with either Karabiner or BetterTouchTool.

The Touchbar

After having read mostly negative feedback about it, I was determined to drill myself to like it, you know, just to be a bit contrarian.

"I will use tools to customize it so much that it will be awesome as a per-application custom function layer!"

Unfortunately, the critics are right. It's an anti-feature. I gave it an honest try, I swear. It is just bad, though it could have been better with a bit more effort.

I understand why it's there. Regular users probably find it useful and cute. It is, ironically, a feature present in pro laptops meant for non-pro users: slow typists and people who don't know the regular keyboard shortcuts.

That being said, I would not mind it, probably would even like it, if it weren't for three major drawbacks:

First and foremost, it is distracting to the point that the first thing I did was to search how to completely turn it off.

This is because, by default, it offers typing suggestions. Yes, while you are typing and trying to concentrate, there is something in your field of vision constantly flashing words that you didn't mean to type and derailing your train of thought.

Easy to fix, but it makes me wonder what were Apple product managers thinking.

Secondly, it is placed in such a way that resting your fingers on top of the keyboard triggers accidental key presses.

I can and will retrain my hand placement habits. After all, this touchbar-keyboard-trackpad combo is forcing many people to learn to place their hands in unnatural positions to accommodate these poorly designed peripherals.

However, Apple could have mitigated this by implementing a pressure sensor to make it more difficult to generate involuntary key presses. It would be enough to distinguish a brush from a tap.

Finally, and this is also ironic because it's in contradiction with the previous point, due to lack of feedback, sometimes you're not sure whether you successfully pressed a touchbar key. And, in my experience, there is an unjustifiably large number of times where you have to press them twice, or press very deliberately to activate the key you want.

There are some redeeming features, though.

As stated above, I am determined to make it bearable, and even slightly useful for me, by heavily modifying it. I suggest you go to System Preferences > Keyboard and use the "Expanded Control Strip".

Then, customize the touchbar buttons, remove keys you don't use, and add others. Consider paying for BetterTouchTool for even more customization options.

Then, on the same window, go to the Shortcuts tab, and select Function keys on the left. This allows you to use function keys by default in some apps, which is useful for Terminal and other pro apps like Pycharm.

(Get the third irony? To make the touchbar, a pro feature, useful for pro apps, the best setup is to make it behave like normal function keys)

Finally, if you're registering accidental key presses, just leave an empty space in the touchbar to let your fingers rest safely until you re-train your hands to rest somewhere else. This is ridiculous, but hey, better than having your brightness suddenly dim to zero accidentally.

Leave an empty space in the touchbar
Leave an empty space in the touchbar on the area where you are used to rest your fingers.

I would have paid extra money to not have a touchbar on my macbook.

The ports

Another much-debated feature where I resigned myself to just accept this new era of USB-C.

I did some research online and bought the "best" USB-C hub, along with new dongles. I don't mind dongles, because I was already using some with my Air. It's not like I swim in money, but there is no need to blow this out of proportion.

Well, I won't point any fingers to any review site, but that "best" hub is going back to Amazon as I write these lines. Some of my peripherals disconnect randomly, plus I get an "electric arc" noise when I disconnect the hub cable. I don't know how that is even possible.

The USB-C situation is terrible. Newly bought peripherals still come with USB-A cables. Regarding hubs, it took me a few years to find a reliable USB3 hub for my 2013 Air. I will keep trying, wish me luck.

About Magsafe, even though I really liked it, I don't miss it as much as I expected. I do miss the charging light, though. No reason not to have it integrated in the official cable, like the XPS does.

Some people say that charging via USB-C is actually better due to standardization of all devices, but I don't know what peripherals these people use. My iPhone and Airpods charge via Lightning, my Apple Watch charges via a puck, and other minor peripherals like cameras and external batteries all charge via micro-USB. Now I have to carry the same amount of cables as before, I just swapped the Magsafe cable and charger for the USB-C cable and charger.

Another poorly thought-out decision is the headphone jack. It is on the wrong side. Most of the population is right-handed, so there usually is a notebook, mouse, or other stuff to the right of the laptop. The headphone cable then gets in the way. The port should have been on the left, and close to the user, not far away from them, to gain a few extra centimeters of cable.

By the way, not including the extension cord is unacceptable. This cord is not only a convenience, but it increases safety, because it's the only way to have earth grounding for the laptop. Without it, rubbing your fingers on the surface of the computer generates this weird vibration due to current. I have always recommended Mac users that they use their chargers with the extension cable even if they don't need the extra length.

I would have paid extra money to purchase an Apple-guaranteed proper USB-C hub. Alternatively, I would have paid extra money for this machine to have a couple of USB-A ports so I can keep using my trusty old hub.

I would not have paid extra money to have the extension cord, because it should have come included with this 2,200€ laptop. I am at a loss for words. Enough of paying extra money for things that Apple broke on purpose.

Battery life

8-9 hours with all apps closed except Safari. Browsing lightly, with an occasional video, and brightness at the literal minimum. This brightness level is only realistic if it's night time. In a normally lit environment you need to set the brightness level at around 50%.

It's not that great. My Air, when it was new, easily got 12 hours of light browsing. Of course, it was not running Catalina, but come on.

When I push the laptop a bit more, with a few Docker containers, Pycharm running, Google Chrome with some Docs opened, and brightness near the maximum, I get around 4 hours. In comparison, that figure is reasonable.

Overall, it's not bad, but I expected more.

While we wait for a Low Power Mode on the mac, do yourself a favor and install Turbo Boost Switcher Pro.

The screen

Coming from having never used a Retina screen on a computer, this Macbook Pro impressed me.

Since I don't edit photos or videos professionally, I can only appreciate it for its very crisp text. The rest of features are lost on me, but this does not devalue my opinion of the screen.

The 500-nit brightness is not noticeable in a real-world test against my 2013 Air. For some reason, both screens seem equally bright when used in direct daylight.

This new Retina technology comes with a few drawbacks, though.

First, it's impossible to get a terminal screen without anti-aliasing. My favorite font, IBM VGA8, is unreadable when anti-aliased, which is a real shame, because I've been using it since the 90s, and I prefer non-anti-aliased fonts on terminals.

Additionally, many pictures on websites appear blurry because they are not "retina-optimized". The same happens with some old applications which display crappy icons or improperly proportioned layouts. This is not Apple's fault, but it affects the user experience.

Finally, the bezels are not tiny like those in the XPS 13, but they are acceptable. I don't mind them.

To summarize, I really like this screen, but like everything else in this machine, it is not a net gain. You win some, you lose some.

Performance

This is the reason why I had to switch from my old laptop, and the 2020 MBP delivers.

It allows me to perform tasks that were very painful in my old computer. Everything is approximately three times faster than it was before, which really is a wow experience, like upgrading your computer in the 90s.

Not much to add. This is a modern computer and, as such, it is fast.

Build quality

Legendary, as usual.

To nitpick on a minor issue, I'd like Apple to make the palm rest area edges a bit less sharp. After typing for some time I get pressure marks on my wrists. They are not painful, but definitely discomforting.

Likewise, when typing on my lap, especially when wearing sports shorts in summer like I'm doing right now, the chassis leaves marks on my legs near the hinge area. Could have been reduced by blunting the edges too.

One Thousand Papercuts

In terms of software, Apple also needs to get its stuff together.

Catalina is meh. Not terrible, but with just too many annoyances.

  • Mail keeps opening by itself while I'm doing video conferences and sharing my screen. I have to remind myself to close Mail before any video conference, because if I don't, other people will read my inbox. It's ridiculous that this bug has not been fixed yet. Do you remember when Apple mocked Microsoft because random alert windows would steal your focus while you were typing? This is 100x worse.
  • My profile picture appears squished on the login screen, and there is no way to fix it. The proportions are correctly displayed on the iCloud settings window.
  • Sometimes, after resuming from sleep, the laptop doesn't detect its own keyboard. I can assure you, the keyboard was there indeed, and note how the dock is still the default one. This happened to me minutes after setting up the computer for the first time, before I had any chance to install software or change any settings.
  • I get constant alerts to re-enter my password for some internet account, but my password is correct. Apple's services need to differentiate a timeout from a rejected password, or maybe retry a couple of times before prompting.
  • Critical software I used doesn't run anymore and I have to look for alternatives. This includes Safari 13 breaking extensions that were important for me. Again, I was prepared for this, but it's worth mentioning.

Praise worthy

Here are a few things that Apple did really well and don't fit into any other category.

  • Photos.app has "solved" the photos problem. It is that great. As a person who has 50k photos in their library, going back to pictures of their great-grandparents: Thank you, Apple!
  • Continuity features have been adding up, and the experience is now outstanding. The same goes for iCloud. If you have an iPhone and a Mac, things are magical.
  • Fan and thermal configuration is very well crafted on this laptop. It runs totally silent, and when the fans do kick in, the system cools down very quickly and goes back to silent again.
  • The speakers are crisp and they have very nice bass. They don't sound like a tin can like most laptops, including the 2013 Air, do.

Conclusion

This computer is bittersweet.

I'm happy that I can finally perform tasks which were severely limited on my previous laptop. But this has nothing to do with the design of the product, it is just due to the fact that the internals are more modern.

Maybe loving your work tools is a privilege that only computer nerds have. Do taxi drivers love their cars? Do baristas love their coffee machines? Do gardeners love their leaf blowers? Do surgeons love their scalpels?

Yes, I have always loved my computer. Why wouldn't I? We developers spend at least eight hours a day touching and looking at our silicon partners. We earn our daily bread thanks to them. This is why we choose our computers carefully, with these considerations in mind, and why we are so scrupulous when evaluating them.

This is why it's so disappointing that this essential tool comes with so many tradeoffs.

Even though this review was exhaustive, don't get me wrong: most annoyances are minor, except for the one deal-breaker, the typing experience. I have written this review on the laptop's keyboard and it's been a continuous annoyance. Look, another irony. Apple suffered so much to fix their keyboard, yet it's still ruined by a comically large trackpad. The forest for the trees.

Point #4: For the first time since I started using Macs, I do not love this machine.

Going back to what "Pro" means

Apple engineers, do you know who is the target audience for these machines?

This laptop has been designed for casual users, not pro users. Regular users enjoy large trackpads and Touch Bars because they spend their day scrolling through Twitter and typing short sentences.

Do you know who doesn't, because it gets in the way of them typing their essays, source code, or inputting their Photoshop keyboard shortcuts? Pro users.

In 2016 I wrote:

However, in the last three to five years, everybody seemed to buy a Mac, even friends of mine who swore they would never do it. They finally caved in, not because of my advice, but because their non-nerd friends recommend MBPs. And that makes sense. In a 2011 market saturated by ultraportables, Windows 8, and laptops which break every couple years, Macs were a great investment. You can even resell them after five years for 50% of their price, essentially renting them for half price.

So what happened? Right now, it's not only Pros who are using the MacBook Pro. It's not a professional tool anymore; it's a consumer product. Apple collects usage analytics for their machines and, I suppose, makes informed decisions, like removing less used ports or not increasing storage on iPhones for a long time.

What if Apple is being fed overwhelmingly non-Pro user data for their Pro machines and, as a consequence, their decisions don't serve Pro users anymore, but rather the general public?

The final irony: Apple uses "Pro" in their product marketing as a synonym for "the more expensive tier", and they have started believing their own lies. Their success with consumer products is clouding their understanding of what a real Pro needs.

We don't need a Touch Bar that we have to disable for Pro apps.

We don't need a large trackpad that gets in the way of typing.

We need more diverse ports to connect peripherals that don't work well with adapters.

We need a better webcam to increase productivity and enhance communication with our team.

We need you to include the effing extension cable so that there is no current on the chassis.

We need you to not splash our inbox contents in front of guests while sharing our screens.

We need a way to extend the battery as long as possible while we are on the road, and we hope that comes back some day.

Point #5: Apple needs to continue course-correcting their design priorities for power users.

Being optimistic for the future

I have made peace with the fact that, unlike my previous computer, this one will not last me for 7 years. This was a very important factor in my purchase decision. I know this Mac is just bridging a gap between the best lineup in Apple's history (2015) and what will come in the future. It was bought out of necessity, not out of desire.

14" laptop? ARM CPUs? We will be awaiting new hardware eagerly, hoping that Apple keeps rolling back some anti-features like they did with the butterfly keyboard. Maybe the Touchbar and massive trackpad will be next. And surely the laggy and unresponsive OS will have been fixed by then.

What about the alternatives?

Before we conclude, I want to anticipate a question that will be on some people's minds: why didn't you buy another laptop?

Well, prior to my purchase I spent two months trying to use a Linux setup full-time. It was close, but not 100% successful. Critical software for my job had no real alternatives, or the alternatives were too inconvenient.

Regarding Windows, I had my eye on the XPS 13 and the X1 Carbon, which are extremely similar to this MacBook in most regards. I spent some time checking whether Windows 10 had improved since the last time I used it, and it turns out it hasn't. I just hate Windows so much that it is irrational. Surely some people prefer it and feel the same way about the Mac. To each their own.

Point #6: Despite its flaws, macOS is the OS that best balances convenience with productive work. When combined with an iPhone it makes for an unbeatable user experience.

I decided that purchasing this new Mac was the least undesirable option, and I still stand by that decision. I will actively try to fix the broken trackpad, which will increase my customer satisfaction from a 6 (tolerate) to an 8 or 9 (like, even enjoy).

But that will still be far away from the perfect, loving 10/10 experience I had with the 2013 Air.

Tags: apple, hardware

&via=cfenollosa">&via=cfenollosa">Comments? Tweet  

May 31, 2020

Derek Jones (derek-jones)

Estimating in round numbers May 31, 2020 10:20 PM

People tend to use round numbers. When asked the time, the response is often rounded to the nearest 5-minute or 15-minute value, even when using a digital watch; the speaker is using what they consider to be a relevant level of accuracy.

When estimating how long it will take to perform a task, developers tend to use round numbers (based on three datasets). Giving what appears to be an overly precise value could be taken as communicating extra information, e.g., an estimate of 1-hr 3-minutes communicates a high degree of certainty (or incompetence, or making a joke). If the consumer of the estimate is working in round numbers, it makes sense to give a round number estimate.

Three large software related effort estimation datasets are now available: the SiP data contains estimates made by many people, the Renzo Pomodoro data is one person’s estimates, and now we have the Brightsquid data (via the paper “Utilizing product usage data for requirements evaluation” by Hemmati, Didar Al Alam and Carlson; I cannot find an online pdf at the moment).

The plot below shows the total number of tasks (out of the 1,945 tasks in the Brightsquid data) for which a given estimate value was recorded; peak values shown in red (code+data):

Number of tasks having a given estimate.

Why are there estimates for tasks taking less than 30 minutes? What are those 1-minute tasks (are they typos, where the second digit was omitted and the person involved simply created a new estimate without deleting the original)? How many of those estimate values appearing once are really typos, e.g., 39 instead of 30? Does the task logging system require an estimate before anything can be done? Unfortunately I don't have access to the people involved. It does look like this data needs some cleaning.

There are relatively few 7-hour estimates, but lots for 8-hours. I’m assuming the company works an 8-hour day (the peak at 4-hours, rather than three, adds weight to this assumption).
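
The underlying counting is simple. Here is a rough sketch of the kind of tally that produces such a plot; the file and column names are hypothetical, and the real code is in the code+data link above:

import pandas as pd

tasks = pd.read_csv("brightsquid_tasks.csv")             # hypothetical filename
counts = tasks["estimate_minutes"].value_counts().sort_index()

round_values = counts[counts.index % 30 == 0]            # multiples of half an hour
print(round_values.sum() / counts.sum())                 # share of tasks with "round" estimates
print(counts.nlargest(5))                                # the peak values highlighted in red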

Ponylang (SeanTAllen)

Last Week in Pony - May 31, 2020 May 31, 2020 02:40 PM

A bunch of updates to ponylang-mode. The Pony Zulip now has a ‘jobs’ stream for posting Pony-related job opportunities. The ‘Add maybe to itertools’ RFC will be voted on in the next sync meeting.

Dan Luu (dl)

A simple way to get more value from tracing May 31, 2020 07:06 AM

A lot of people seem to think that distributed tracing isn't useful, or at least not without extreme effort that isn't worth it for companies smaller than FB. For example, here are a couple of public conversations that sound like a number of private conversations I've had. Sure, there's value somewhere, but it costs too much to unlock.

I think this overestimates how much work it is to get a lot of value from tracing. At Twitter, Rebecca Isaacs was able to lay out a vision for how to get value from tracing and executed on it (with help from a number of other folks, including Jonathan Simms, Yuri Vishnevsky, Ruben Oanta, Dave Rusek, Hamdi Allam, and many others1) such that the work easily paid for itself. This post is going to describe the tracing "infrastructure" we've built and describe some use cases where we've found it to be valuable. Before we get to that, let's start with some background about the situation before Rebecca's vision came to fruition.

At a high level, we could say that we had a trace-view oriented system and ran into all of the issues that one might expect from that. Those issues are discussed in more detail in this article by Cindy Sridharan. However, I'd like to discuss the particular issues we had in more detail since I think it's useful to look at what specific things were causing problems.

Taken together, the issues were problematic enough that tracing was underowned and arguably unowned for years. Some individuals did work in their spare time to keep the lights on or improve things, but the lack of obvious value from tracing led to a vicious cycle where the high barrier to getting value out of tracing made it hard to fund organizationally, which made it hard to make tracing more usable.

Some of the issues that made tracing low ROI included:

  • Schema made it impossible to run simple queries "in place"
  • No real way to aggregate info
    • No way to find interesting or representative traces
  • Impossible to know actual sampling rate, sampling highly non-representative
  • Time

Schema

The schema was effectively a set of traces, where each trace was a set of spans and each span was a set of annotations. Each span that wasn't a root span had a pointer to its parent, so that the graph structure of a trace could be determined.

For the purposes of this post, we can think of each trace as either an external request including all sub-RPCs or a subset of a request, rooted downstream instead of at the top of the request. We also trace some things that aren't requests, like builds and git operations, but for simplicity we're going to ignore those for this post even though the techniques we'll discuss also apply to those.

Each span corresponds to an RPC and each annotation is data that a developer chose to record on a span (e.g., the size of the RPC payload, queue depth of various queues in the system at the time of the span, or GC pause time for GC pauses that interrupted the RPC).
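
To make the shape of that data concrete, here is a rough sketch of the schema in Python; the field names are illustrative, not the actual names used internally:

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Span:
    span_id: str
    trace_id: str
    parent_id: Optional[str]            # None only for a root span
    service: str
    # data a developer chose to record, e.g. payload size, queue depth, GC pause time
    annotations: Dict[str, float] = field(default_factory=dict)

@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    def children_of(self, span_id: str) -> List[Span]:
        # Reconstructing any graph structure means scanning every span in
        # the trace, which is exactly the cost described below.
        return [s for s in self.spans if s.parent_id == span_id]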

Some issues that came out of having a schema that was a set of sets (of bags) included:

  • Executing any query that used information about the graph structure inherent in a trace required reading every span in the trace and reconstructing the graph
  • Because there was no index or summary information of per-trace information, any query on a trace required reading every span in a trace
  • Practically speaking, because the two items above are too expensive to do at query time in an ad hoc fashion, the only query people ran was some variant of "give me a few spans matching a simple filter"

Aggregation

Until about a year and a half ago, the only supported way to look at traces was to go to the UI, filter by a service name from a combination search box + dropdown, and then look at a list of recent traces, where you could click on any trace to get a "trace view". Each search returned the N most recent results, which wouldn't necessarily be representative of all recent results (for reasons mentioned below in the Sampling section), let alone representative of all results over any other time span.

Per the problems discussed above in the schema section, since it was too expensive to run queries across a non-trivial number of traces, it was impossible to ask questions like "are any of the traces I'm looking at representative of common traces or am I looking at weird edge cases?" or "show me traces of specific tail events, e.g., when a request from service A to service B times out or when write amplification from service A to some backing database is > 3x", or even "only show me complete traces, i.e., traces where we haven't dropped spans from the trace".

Also, if you clicked on a trace that was "too large", the query would time out and you wouldn't be able to view the trace -- this was another common side effect of the lack of any kind of rate limiting logic plus the schema.

Sampling

There were multiple places where a decision was made to sample or not. There was no document that listed all of these places, making it impossible to even guess at the sampling rate without auditing all code to figure out where sampling decisions were being made.

Moreover, there were multiple places where an unintentional sampling decision would be made due to the implementation. Spans were sent from services that had tracing enabled to a local agent, then to a "collector" service, and then from the collector service to our backing DB. Spans could be dropped at any of these points: in the local agent; in the collector, which would have nodes fall over and lose all of their data regularly; and at the backing DB, which would reject writes due to hot keys or high load in general.

This design, where the trace id is the database key with no intervening logic to pace out writes, meant that a 1M span trace (which we have) would cause 1M writes to the same key over a period of a few seconds. Another problem would be requests with a fanout of thousands (which exist at every tech company I've worked for), which could cause thousands of writes with the same key over a period of a few milliseconds.

Another sampling quirk was that, in order to avoid missing traces that didn't start at our internal front end, there was logic that caused an independent sampling decision in every RPC. If you do the math on this, if you have a service-oriented architecture like ours and you sample at what naively might sound like a moderately low rate, you'll end up with the vast majority of your spans starting at a leaf RPC, resulting in a single-span trace. Of the non-leaf RPCs, the vast majority will start at the 2nd level from the leaf, and so on. The vast majority of our load and our storage costs were from these virtually useless traces that started at or near a leaf, and if you wanted to do any kind of analysis across spans to understand the behavior of the entire system, you'd have to account for this sampling bias on top of accounting for all of the other independent sampling decisions.

Time

There wasn't really any kind of adjustment for clock skew (there was something, but it attempted to do a local pairwise adjustment, which didn't really improve things and actually made it more difficult to reasonably account for clock skew).

If you just naively computed how long a span took, even using timestamps from a single host, which removes many sources of possible clock skew, you'd get a lot of negative duration spans, which is of course impossible because a result can't get returned before the request for the result is created. And if you compared times across different hosts, the results were even worse.
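
To make that concrete, a naive duration computation like the hypothetical sketch below will happily report impossible negative values when clocks are skewed; the field names are invented for the example:

# Toy spans with microsecond timestamps; the second one has a skewed clock.
spans = [
    {"id": "a", "client_send_us": 1_000, "client_recv_us": 1_250},
    {"id": "b", "client_send_us": 2_000, "client_recv_us": 1_900},
]

def naive_duration_us(span):
    # "How long did this RPC take?", computed directly from raw timestamps.
    return span["client_recv_us"] - span["client_send_us"]

negative = [s["id"] for s in spans if naive_duration_us(s) < 0]
print(f"{len(negative)} of {len(spans)} spans have impossible negative durations: {negative}")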

Solutions

The solutions to these problems fall into what I think of as two buckets. For problems like dropped spans due to collector nodes falling over or the backing DB dropping requests, there's some straightforward engineering solution using well understood and widely used techniques. For that particular pair of problems, the short term bandaid was to do some GC tuning that reduced the rate of collector nodes falling over by about a factor of 100. That took all of two minutes, and then we replaced the collector nodes with a real queue that could absorb larger bursts in traffic and pace out writes to the DB. For the issue where we oversampled leaf-level spans due to rolling the sampling dice on every RPC, that's one of those little questions that most people would get right in an interview but that can get lost as part of a larger system. It has a number of solutions; e.g., since each span has a parent pointer, we can tell whether an RPC has a parent in the relevant place, so we can make a sampling decision and create a trace id iff a span has no parent pointer. This results in a uniform probability of each span being sampled, with each sampled trace being a complete trace.
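
A minimal sketch of that sampling rule, assuming a hypothetical hook that sees the incoming trace context on every RPC; the rate and helper names are invented for illustration:

import random
from typing import Optional

SAMPLE_RATE = 0.001  # illustrative, not the real rate

def new_trace_id() -> str:
    return "%032x" % random.getrandbits(128)

def trace_id_for_rpc(parent_trace_id: Optional[str]) -> Optional[str]:
    """Roll the sampling dice only at the root of a request.

    If a trace id arrived with the RPC, the decision was already made
    upstream and is simply propagated, so every sampled trace is complete
    and each trace is sampled with uniform probability.
    """
    if parent_trace_id is not None:
        return parent_trace_id          # inherit the upstream decision
    if random.random() < SAMPLE_RATE:
        return new_trace_id()           # sampled: start a new trace
    return None                         # not sampled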

The other bucket is building up datasets and tools (and adding annotations) that allow users to answer questions they might have. This isn't a new idea, section 5 of the Dapper paper discussed this and it was published in 2010.

Of course, one major difference is that Google has probably put at least two orders of magnitude more effort into building tools on top of Dapper than we've put into building tools on top of our tracing infra, so a lot of our tooling is much rougher, e.g., figure 6 from the Dapper paper shows a trace view that displays a set of relevant histograms, which makes it easy to understand the context of a trace. We haven't done the UI work for that yet, so the analogous view requires running a simple SQL query. While that's not hard, presenting the user with the data would be a better user experience than making the user query for the data.

Of the work that's been done, the simplest obviously high ROI thing we've done is build a set of tables that contain information people might want to query, structured such that common queries that don't inherently have to do a lot of work don't have to do a lot of work.

We have, partitioned by day, the following tables:

  • trace_index
    • high-level trace-level information, e.g., does the trace have a root; what is the root; if relevant, what request endpoint was hit, etc.
  • span_index
    • information on the client and server
  • anno_index
    • "standard" annotations that people often want to query, e.g., request and response payload sizes, client/server send/recv timestamps, etc.
  • span_metrics
    • computed metrics, e.g., span durations
  • flat_annotation
    • All annotations, in case you want to query something not in anno_index
  • trace_graph
    • For each trace, contains a graph representation of the trace, for use with queries that need the graph structure

Just having this set of tables, queryable with SQL queries (or a Scalding or Spark job in cases where Presto SQL isn't ideal, like when doing some graph queries) is enough for tracing to pay for itself, to go from being difficult to justify to being something that's obviously high value.

Some of the questions we've been able to answer with this set of tables include the following (a rough sketch of a query for the first one appears after the list):

  • For this service that's having problems, give me a representative set of traces
  • For this service that has elevated load, show me which upstream service is causing the load
  • Give me the list of all services that have unusual write amplification to downstream service X
    • Is traffic from a particular service or for a particular endpoint causing unusual write amplification? For example, in some cases, we see nothing unusual about the total write amplification from B -> C, but we see very high amplification from B -> C when B is called by A.
  • Show me how much time we spend on serdes vs. "actual work" for various requests
  • Show me how much different kinds of requests cost in terms of backend work
  • For requests that have high latency, as determined by mobile client instrumentation, show me what happened on the backend
  • Show me the set of latency critical paths for this request endpoint (with the annotations we currently have, this has a number of issues that probably deserve their own post)
  • Show me the CDF of services that this service depends on
    • This is a distribution because whether or not a particular service calls another service is data dependent; it's not uncommon to have a service that will only call another one every 1000 calls (on average)
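
As an example of how the first question in the list might map onto these tables, here is a rough sketch; the column names are guesses based on the table descriptions above rather than the real schema, and run_presto is a stand-in for whatever Presto client you use:

def run_presto(sql: str):
    # Hypothetical helper; wire up your Presto client of choice here.
    raise NotImplementedError

# "For this service that's having problems, give me a representative set of
# traces": complete, root-having traces for the service, with durations so you
# can pick across the distribution instead of just the most recent N.
sql = """
select t.traceid, sum(m.duration_ms) as total_span_ms
from trace_index t
join span_metrics m on m.traceid = t.traceid
where t.ds = '2020-05-28'
  and t.has_root = true
  and t.root_servicename = 'example-service'
group by t.traceid
order by total_span_ms
"""
rows = run_presto(sql)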

We have built and are building other tooling, but just being able to run queries and aggregations against trace data, both recent and historical, easily pays for all of the other work we'd like to do. This is analogous to what we saw when we looked at metrics data: taking data we already had and exposing it in a way that lets people run arbitrary queries immediately paid dividends. Doing that for tracing is less straightforward than doing that for metrics because the data is richer, but it's not a fundamentally different idea.

I think that having something to look at other than the raw data is also more important for tracing than it is for metrics since the metrics equivalent of a raw "trace view" of traces, a "dashboard view" of metrics where you just look at graphs, is obviously and intuitively useful. If that's all you have for metrics, people aren't going to say that it's not worth funding your metrics infra because dashboards are really useful! However, it's a lot harder to see how to get value out of a raw view of traces, which is where a lot of the comments about tracing not being valuable come from. This difference between the complexity of metrics data and tracing data makes the value add for higher-level views of tracing larger than it is for metrics.

Having our data in a format that's not just blobs in a NoSQL DB has also allowed us to more easily build tooling on top of trace data that lets users who don't want to run SQL queries get value out of our trace data. An example of this is the Service Dependency Explorer (SDE), which was primarily built by Yuri Vishnevsky, Rebecca Isaacs, and Jonathan Simms, with help from Yihong Chen. If we try to look at the RPC call graph for a single request, we get something that's pretty large. In some cases, the depth of the call tree can be hundreds of levels deep and it's also not uncommon to see a fanout of 20 or more at some levels, which makes a naive visualization difficult to interpret.

In order to see how SDE works, let's look at a smaller example where it's relatively easy to understand what's going on. Imagine we have 8 services, A through H, and they call each other as shown in the tree below: service A is called 10 times, which calls service B a total of 10 times, which calls D, D, and E 50, 20, and 10 times respectively, where the two Ds are distinguished by being different RPC endpoints (calls) even though they're the same service, and so on:

Diagram of the RPC call graph; it is implicitly described in the relevant sections, although the entire SDE section is about showing off a visual tool and will probably be unsatisfying if you're just reading the alt text; the tables described in the previous section are more likely to be what you want if you want a non-visual interpretation of the data, the SDE is a kind of visualization

If we look at SDE from the standpoint of node E, we'll see the following: SDE centered on service E, showing callers and callees, direct and indirect

We can see the direct callers and callees, 100% of calls of E are from C, and 100% of calls of E also call C and that we have 20x load amplification when calling C (200/10 = 20), the same as we see if we look at the RPC tree above. If we look at indirect callees, we can see that D has a 4x load amplification (40 / 10 = 4).
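
The arithmetic SDE is doing there is simple enough to spell out, using only the counts quoted above:

calls_into_E = 10                       # E is called 10 times in the example
downstream_of_E = {"C": 200, "D": 40}   # traced calls caused by E, per the text

for svc, n in downstream_of_E.items():
    print(f"E -> {svc}: {n / calls_into_E:.0f}x load amplification")
# E -> C: 20x load amplification
# E -> D: 4x load amplification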

If we want to see what's directly called by C downstream of E, we can select it and we'll get arrows to the direct descendents of C, which in this case is every indirect callee of E.

SDE centered on service E, with callee C highlighted

For a more complicated example, we can look at service D, which shows up in orange in our original tree, above.

In this case, our summary box reads:

  • On May 28, 2020 there were...
    • 10 total TFE-rooted traces
    • 110 total traced RPCs to D
    • 2.1 thousand total traced RPCs caused by D
    • 3 unique call paths from TFE endpoints to D endpoints

The fact that we see D three times in the tree is indicated in the summary box, where it says we have 3 unique call paths from our front end, TFE to D.

We can expand out the calls to D and, in this case, see both of the calls and what fraction of traffic is to each call.

SDE centered on service D, with different calls to D expanded by having clicked on D

If we click on one of the calls, we can see which nodes are upstream and downstream dependencies of a particular call, call4 is shown below and we can see that it never hits services C, H, and G downstream even though service D does for call3. Similarly, we can see that its upstream dependencies consist of being called directly by C, and indirectly by B and E but not A and C:

SDE centered on service D, with call4 of D highlighted by clicking on call 4; shows only upstream and downstream load that are relevant to call4

Some things we can easily see from SDE are:

  • What load a service or RPC call causes
    • Where we have unusual load amplification, whether that's generally true for a service or if it only occurs on some call paths
  • What causes load to a service or RPC call
  • Where and why we get cycles (very common for Strato, among other things)
  • What's causing weird super deep traces

These are all things a user could get out of queries to the data we store, but having a tool with a UI that lets you click around in real time to explore things lowers the barrier to finding these things out.

In the example shown above, there are a small number of services, so you could get similar information out of the more commonly used sea-of-nodes view, where each node is a service, with some annotations on the visualization. But when we've looked at real traces, showing thousands of services in a global view makes it very difficult to see what's going on. Some of Rebecca's early analyses used a view like that, but we've found that you need a lot of implicit knowledge to make good use of it; a view that discards a lot more information and highlights a few things makes it easier for users who don't happen to have the right implicit knowledge to get value out of looking at traces.

Although we've demo'd a view of RPC count / load here, we could also display other things, like latency, errors, payload sizes, etc.

Conclusion

More generally, this is just a brief description of a few of the things we've built on top of the data you get if you have basic distributed tracing set up. You probably don't want to do exactly what we've done since you probably have somewhat different problems and you're very unlikely to encounter the exact set of problems that our tracing infra had. From backchannel chatter with folks at other companies, I don't think the level of problems we had was unique; if anything, our tracing infra was in a better state than at many or most peer companies (which excludes behemoths like FB/Google/Amazon) since it basically worked and people could and did use the trace view we had to debug real production issues. But, as they say, unhappy systems are unhappy in their own way.

Like our previous look at metrics analytics, this work was done incrementally. Since trace data is much richer than metrics data, a lot more time was spent doing ad hoc analyses of the data before writing the Scalding (MapReduce) jobs that produce the tables mentioned in this post, but the individual analyses were valuable enough that there wasn't really a time when this set of projects didn't pay for itself after the first few weeks it took to clean up some of the worst data quality issues and run an (extremely painful) ad hoc analysis with the existing infra.

Looking back at discussions on whether or not it makes sense to work on tracing infra, people often point to the numerous failures at various companies to justify a buy (instead of build) decision. I don't think that's exactly unreasonable; the base rate of failure of similar projects shouldn't be ignored. But, on the other hand, most of the work described here wasn't super tricky, beyond getting organizational buy-in and having a clear picture of the value that tracing can bring.

One thing that's a bit beyond the scope of this post, and that probably deserves its own post, is that tracing and metrics, while not fully orthogonal, are complementary, and having only one or the other leaves you blind to a lot of problems. You're going to pay a high cost for that in a variety of ways: unnecessary incidents, extra time spent debugging incidents, generally higher monetary costs due to running infra inefficiently, etc. Also, while metrics and tracing individually give you much better visibility than having neither, some problems require looking at both together; some of the most interesting analyses I've done involve joining (often with a literal SQL join) trace data and metrics data.

To make it concrete, an example of something that's easy to see with tracing but annoying to see with logging unless you add logging to try to find this in particular (which you can do for any individual case, but probably don't want to do for the thousands of things tracing makes visible) is something we looked at above: "show me cases where a specific call path from the load balancer to A causes high load amplification on some service B, which may be multiple hops away from A in the call graph". In some cases, this will be apparent because A generally causes high load amplification on B, but if it only happens in some cases, that's still easy to handle with tracing but very annoying if you're just looking at metrics.

An example of something where you want to join tracing and metrics data is when looking at the performance impact of something like a bad host on latency. You will, in general, not be able to annotate the appropriate spans that pass through the host as bad because, if you knew the host was bad at the time of the span, the host wouldn't be in production. But you can sometimes find, with historical data, a set of hosts that are bad, and then look up latency critical paths that pass through the host to determine the end-to-end impact of the bad host.

Everyone has their own biases; with respect to tracing, mine come from generally working on things that try to directly improve cost, reliability, and latency, so the examples are focused on that, but there are also a lot of other uses for tracing. You can check out Distributed Tracing in Practice or Mastering Distributed Tracing for some other perspectives.

Acknowledgements

Thanks to Rebecca Isaacs, Leah Hanson, Yao Yue, and Yuri Vishnevsky for comments/corrections/discussion.


  1. this will almost certainly be an incomplete list, but some other people who've pitched in include Moses, Tiina, Rich, Rahul, Ben, Mike, Mary, Arash, Feng, Jenny, Andy, Yao, Yihong, Vinu, and myself.

    Note that this relatively long list of contributors doesn't contradict this work being high ROI. I'd estimate that there's been less than 2 person-years worth of work on everything discussed in this post. Just for example, while I spend a fair amount of time doing analyses that use the tracing infra, I think I've only spent on the order of one week on the infra itself.

    In case it's not obvious from the above, even though I'm writing this up, I was a pretty minor contributor to this. I'm just writing it up because I sat next to Rebecca as this work was being done and was super impressed by both her process and the outcome.


May 30, 2020

Dan Luu (dl)

A simple way to get more value from metrics May 30, 2020 07:06 AM

We spent one day1 building a system that immediately found a mid 7 figure optimization (which ended up shipping). In the first year, we shipped mid 8 figures per year worth of cost savings as a result. The key feature this system introduces is the ability to query metrics data across all hosts and all services and over any period of time (since inception), so we've called it LongTermMetrics (LTM) internally since I like boring, descriptive, names.

This got started when I was looking for a starter project that would both help me understand the Twitter infra stack and also have some easily quantifiable value. Andy Wilcox suggested looking at JVM survivor space utilization for some large services. If you're not familiar with what survivor space is, you can think of it as a configurable, fixed-size buffer in the JVM (at least if you use the GC algorithm that's the default at Twitter). At the time, if you looked at a random large service, you'd usually find that either:

  1. The buffer was too small, resulting in poor performance, sometimes catastrophically poor when under high load.
  2. The buffer was too large, resulting in wasted memory, i.e., wasted money.

But instead of looking at random services, there's no fundamental reason that we shouldn't be able to query all services and get a list of which services have room for improvement in their configuration, sorted by performance degradation or cost savings. And if we write that query for JVM survivor space, this also goes for other configuration parameters (e.g., other JVM parameters, CPU quota, memory quota, etc.). Writing a query that worked for all the services turned out to be a little more difficult than I was hoping due to a combination of data consistency and performance issues. Data consistency issues included things like:

  • Any given metric can have ~100 names, e.g., I found 94 different names for JVM survivor space
    • I suspect there are more, these were just the ones I could find via a simple search
  • The same metric name might have a different meaning for different services
    • Could be a counter or a gauge
    • Could have different units, e.g., bytes vs. MB or microseconds vs. milliseconds
  • Metrics are sometimes tagged with an incorrect service name
  • Zombie shards can continue to operate and report metrics even though the cluster manager has started up a new instance of the shard, resulting in duplicate and inconsistent metrics for a particular shard name

Our metrics database, MetricsDB, was specialized to handle monitoring, dashboards, alerts, etc. and didn't support general queries. That's totally reasonable, since monitoring and dashboards are lower on Maslow's hierarchy of observability needs than general metrics analytics. In backchannel discussions with folks at other companies, the entire set of systems around MetricsDB seems to have solved a lot of the problems that plague people at other companies with similar scale, but the specialization meant that we couldn't run arbitrary SQL queries against metrics in MetricsDB.

Another way to query the data is to use the copy that gets written to HDFS in Parquet format, which allows people to run arbitrary SQL queries (as well as write Scalding (MapReduce) jobs that consume the data).

Unfortunately, due to the number of metric names, the data on HDFS can't be stored in a columnar format with one column per name -- Presto gets unhappy if you feed it too many columns and we have enough different metrics that we're well beyond that limit. If you don't use a columnar format (and don't apply any other tricks), you end up reading a lot of data for any non-trivial query. The result was that you couldn't run any non-trivial query (or even many trivial queries) across all services or all hosts without having it time out. We don't have similar timeouts for Scalding, but Scalding performance is much worse and a simple Scalding query against a day's worth of metrics will usually take between three and twenty hours, depending on cluster load, making it unreasonable to use Scalding for any kind of exploratory data analysis.

Given the data infrastructure that already existed, an easy way to solve both of these problems was to write a Scalding job to store the 0.1% to 0.01% of metrics data that we care about for performance or capacity related queries and re-write it into a columnar format. I would guess that at least 90% of metrics are things that almost no one will want to look at in almost any circumstance, and of the metrics anyone really cares about, the vast majority aren't performance related. A happy side effect of this is that since such a small fraction of the data is relevant, it's cheap to store it indefinitely. The standard metrics data dump is deleted after a few weeks because it's large enough that it would be prohibitively expensive to store it indefinitely; a longer metrics memory will be useful for capacity planning or other analyses that prefer to have historical data.
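
The real job is written in Scalding, but the core transformation is small enough to sketch with pandas; the metric names, column names, and paths below are placeholders, not the actual ones:

import pandas as pd

# Placeholder allow-list: the small fraction of metrics worth keeping long term.
KEEP = {"jvm/survivor_used", "jvm/survivor_max", "cpu/usage", "memory/rss"}

# One day of the raw, row-per-sample metrics dump; the path is illustrative.
raw = pd.read_parquet("raw_metrics/ds=2020-02-01")
kept = raw[raw["metric_name"].isin(KEEP)]

# Pivot to one column per metric so columnar queries only read what they need,
# then write the result back out partitioned by day.
wide = kept.pivot_table(index=["ds", "servicename", "source", "timestamp"],
                        columns="metric_name", values="value").reset_index()
wide.to_parquet("ltm_service/", partition_cols=["ds"])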

The data we're saving includes (but isn't limited to) the following things for each shard of each service:

  • utilizations and sizes of various buffers
  • CPU, memory, and other utilization
  • number of threads, context switches, core migrations
  • various queue depths and network stats
  • JVM version, feature flags, etc.
  • GC stats
  • Finagle metrics

And for each host:

  • various things from procfs, like iowait time, idle, etc.
  • what cluster the machine is a part of
  • host-level info like NIC speed, number of cores on the host, and memory
  • host-level stats for "health" issues like thermal throttling, machine checks, etc.
  • OS version, host-level software versions, host-level feature flags, etc.
  • Rezolus metrics

For things that we know change very infrequently (like host NIC speed), we store these daily, but most of these are stored at the same frequency and granularity as our other metrics. In some cases, this is obviously wasteful (e.g., for JVM tenuring threshold, which is typically identical across every shard of a service and rarely changes), but this was the easiest way to handle things given the infra we have around metrics.

Although the impetus for this project was figuring out which services were under or over configured for JVM survivor space, it started with GC and container metrics since those were very obvious things to look at and we've been incrementally adding other metrics since then. To get an idea of the kinds of things we can query for and how simple queries are if you know a bit of SQL, here are some examples:

Very High p90 JVM Survivor Space

This is part of the original goal of finding under/over-provisioned services. Any service with a very high p90 JVM survivor space utilization is probably under-provisioned on survivor space. Similarly, anything with a very low p99 or p999 JVM survivor space utilization when under peak load is probably overprovisioned (query not displayed here, but we can scope the query to times of high load).

A Presto query for very high p90 survivor space across all services is:

with results as (
  select servicename,
    approx_distinct(source, 0.1) as approx_sources, -- number of shards for the service
    -- real query uses [coalesce and nullif](https://prestodb.io/docs/current/functions/conditional.html) to handle edge cases, omitted for brevity
    approx_percentile(jvmSurvivorUsed / jvmSurvivorMax, 0.90) as p90_used,
    approx_percentile(jvmSurvivorUsed / jvmSurvivorMax, 0.50) as p50_used
  from ltm_service 
  where ds >= '2020-02-01' and ds <= '2020-02-28'
  group by servicename)
select * from results
where approx_sources > 100
order by p90_used desc

Rather than having to look through a bunch of dashboards, we can just get a list and then send diffs with config changes to the appropriate teams or write a script that takes the output of the query and automatically writes the diff. The above query provides a pattern for any basic utilization numbers or rates; you could look at memory usage, new or old gen GC frequency, etc., with similar queries. In one case, we found a service that was wasting enough RAM to pay my salary for a decade.

I've been moving away from using thresholds against simple percentiles to find issues, but I'm presenting this query because it's a thing people commonly want to do that's useful, and I can write it without having to spend a lot of space explaining why it's a reasonable thing to do; what I prefer to do instead is out of scope of this post and probably deserves its own post.

Network utilization

The above query was over all services, but we can also query across hosts. In addition, we can do queries that join against properties of the host, feature flags, etc.

Using one set of queries, we were able to determine that we had a significant number of services running up against network limits even though host-level network utilization was low. The compute platform team then did a gradual rollout of a change to network caps, which we monitored with queries like the one below to determine that we weren't seeing any performance degradation (theoretically possible if increasing network caps caused hosts or switches to hit network limits).

With the network change, we were able to observe smaller queue depths, smaller queue sizes (in bytes), fewer packet drops, etc.

The query below only shows queue depths for brevity; adding all of the quantities mentioned is just a matter of typing more names in.

The general thing we can do is, for any particular rollout of a platform or service-level feature, we can see the impact on real services.

with rolled as (
 select
   -- rollout was fixed for all hosts during the time period, can pick an arbitrary element from the time period
   arbitrary(element_at(misc, 'egress_rate_limit_increase')) as rollout,
   hostId
 from ltm_deploys
 where ds = '2019-10-10'
 and zone = 'foo'
 group by hostId
), host_info as(
 select
   arbitrary(nicSpeed) as nicSpeed,
   hostId
 from ltm_host
 where ds = '2019-10-10'
 and zone = 'foo'
 group by hostId
), host_rolled as (
 select
   rollout,
   nicSpeed,
   rolled.hostId
 from rolled
 join host_info on rolled.hostId = host_info.hostId
), container_metrics as (
 select
   service,
   netTxQlen,
   hostId
 from ltm_container
 where ds >= '2019-10-10' and ds <= '2019-10-14'
 and zone = 'foo'
)
select
 service,
 nicSpeed,
 approx_percentile(netTxQlen, 1, 0.999, 0.0001) as p999_qlen,
 approx_percentile(netTxQlen, 1, 0.99, 0.001) as p99_qlen,
 approx_percentile(netTxQlen, 0.9) as p90_qlen,
 approx_percentile(netTxQlen, 0.68) as p68_qlen,
 rollout,
 count(*) as cnt
from container_metrics
join host_rolled on host_rolled.hostId = container_metrics.hostId
group by service, nicSpeed, rollout

Other questions that became easy to answer

  • What's the latency, CPU usage, CPI, or other performance impact of X?
    • Increasing or decreasing the number of performance counters we monitor per container
    • Tweaking kernel parameters
    • OS or other releases
    • Increasing or decreasing host-level oversubscription
    • General host-level load
    • Retry budget exhaustion
  • For relevant items above, what's the distribution of X, in general or under certain circumstances?
  • What hosts have unusually poor service-level performance for every service on the host, after controlling for load, etc.?
    • This has usually turned out to be due to a hardware misconfiguration or fault
  • Which services don't play nicely with other services aside from the general impact on host-level load?
  • What's the latency impact of failover, or other high-load events?
    • What level of load should we expect in the future given a future high-load event plus current growth?
    • Which services see more load during failover, which services see unchanged load, and which fall somewhere in between?
  • What config changes can we make for any fixed sized buffer or allocation that will improve performance without increasing cost or reduce cost without degrading performance?
  • For some particular host-level health problem, what's the probability it recurs if we see it N times?
  • etc., there are a lot of questions that become easy to answer if you can write arbitrary queries against historical metrics data

Design decisions

LTM is about as boring a system as is possible. Every design decision falls out of taking the path of least resistance.

  • Why use Scalding?
    • It's standard at Twitter and the integration made everything trivial. I tried Spark, which has some advantages. However, at the time, I would have had to do manual integration work that I got for free with Scalding.
  • Why use Presto and not something that allows for live slice & dice queries like Druid?
    • Rebecca Isaacs and Jonathan Simms were doing related work on tracing and we knew that we'd want to do joins between LTM and whatever they created. That's trivial with Presto but would have required more planning and work with something like Druid, at least at the time.
    • George Sirois imported a subset of the data into Druid so we could play with it and the facilities it offers are very nice; it's probably worth re-visiting at some point
  • Why not use Postgres or something similar?
    • The amount of data we want to store makes this infeasible without a massive amount of effort; even though the cost of data storage is quite low, it's still a "big data" problem
  • Why Parquet instead of a more efficient format?
    • It was the most suitable of the standard supported formats (the other major supported format is raw thrift); introducing a new format would be a much larger project than this project
  • Why is the system not real-time (with delays of at least one hour)?
    • Twitter's batch job pipeline is easy to build on, all that was necessary was to read some tutorial on how it works and then write something similar, but with different business logic.
    • There was a nicely written proposal to build a real-time analytics pipeline for metrics data written a couple years before I joined Twitter, but that never got built because (I estimate) it would have been one to four quarters of work to produce an MVP and it wasn't clear what team had the right mandate to work on it and also had 4 quarters of headcount available. But adding a batch job took one day; you don't need to have roadmap and planning meetings for a day of work, you can just do it and then do follow-on work incrementally.
    • If we're looking for misconfigurations or optimization opportunities, these rarely go away within an hour (and if they did, they must've had small total impact) and, in fact, they often persist for months to years, so we don't lose much by giving up on real-time (we do lose the ability to use the output of this for some monitoring use cases)
    • The real-time version would've been a system with significant operational cost that couldn't be operated by one person without undue burden. This system has more operational/maintenance burden than I'd like, probably 1-2 days of my time per month on average, which at this point is a pretty large fraction of the total cost of the system, but it never pages, and the amount of work can easily be handled by one person.

Boring technology

I think writing about systems like this, ones that are just boring work, is really underrated. A disproportionate number of posts and talks I read are about systems using hot technologies. I don't have anything against hot new technologies, but a lot of useful work comes from plugging boring technologies together and doing the obvious thing. Since posts and talks about boring work are relatively rare, I think writing up something like this is more useful than it has any right to be.

For example, a couple years ago, at a local meetup that Matt Singer organizes for companies in our size class to discuss infrastructure (basically, companies that are smaller than FB/Amazon/Google), I asked if anyone was doing something similar to what we'd just done. No one there was (or no one who'd admit to it, anyway), and engineers from two different companies expressed shock that we could store so much data, and not just the average per time period, but some histogram information as well. This work is too straightforward and obvious to be novel; I'm sure people have built analogous systems in many places. It's literally just storing metrics data on HDFS (or, if you prefer a more general term, a data lake) indefinitely in a format that allows interactive queries.

If you do the math on the cost of metrics data storage for a project like this in a company in our size class, the storage cost is basically a rounding error. We've shipped individual diffs that easily pay for the storage cost for decades. I don't think there's any reason storing a few years or even a decade worth of metrics should be shocking when people deploy analytics and observability tools that cost much more all the time. But it turns out this was surprising, in part because people don't write up work this boring.

An unrelated example is that, a while back, I ran into someone at a similarly sized company who wanted to get similar insights out of their metrics data. Instead of starting with something that would take a day, like this project, they started with deep learning. While I think there's value in applying ML and/or stats to infra metrics, they turned a project that could return significant value to the company after a couple of person-days into a project that took person-years. And if you're only going to either apply simple heuristics guided by someone with infra experience and simple statistical models or naively apply deep learning, I think the former has much higher ROI. Applying both sophisticated stats/ML and practitioner guided heuristics together can get you better results than either alone, but I think it makes a lot more sense to start with the simple project that takes a day to build out and maybe another day or two to start to apply than to start with a project that takes months or years to build out and start to apply. But there are a lot of biases towards doing the larger project: it makes a better resume item (deep learning!), in many places, it makes a better promo case, and people are more likely to give a talk or write up a blog post on the cool system that uses deep learning.

The above discusses why writing up work is valuable for the industry in general. We covered why writing up work is valuable to the company doing the write-up in a previous post, so I'm not going to re-hash that here.

Appendix: stuff I screwed up

I think it's unfortunate that you don't get to hear about the downsides of systems without backchannel chatter, so here are things I did that are pretty obvious mistakes in retrospect. I'll add to this when something else becomes obvious in retrospect.

  • Not using a double for almost everything
    • In an ideal world, some things aren't doubles, but everything in our metrics stack goes through a stage where basically every metric is converted to a double
    • I stored most things that "should" be an integral type as an integral type, but doing the conversion from long -> double -> long is never going to be more precise than just doing the long -> double conversion, and it opens the door to other problems (a tiny example of the precision loss follows this list)
    • I stored some things that shouldn't be an integral type as an integral type, which causes small values to unnecessarily lose precision
      • Luckily this hasn't caused serious errors for any actionable analysis I've done, but there are analyses where it could cause problems
  • Using asserts instead of writing bad entries out to some kind of "bad entries" table
    • For reasons that are out of scope of this post, there isn't really a reasonable way to log errors or warnings in Scalding jobs, so I used asserts to catch things that shouldn't happen, which causes the entire job to die every time something unexpected happens; a better solution would be to write bad input entries out into a table and then have that table emailed out as a soft alert if the table isn't empty
      • An example of a case where this would've saved some operational overhead is where we had an unusual amount of clock skew (3600 years), which caused a timestamp overflow. If I had a table that was a log of bad entries, the bad entry would've been omitted from the output, which is the correct behavior, and it would've saved an interruption plus having to push a fix and re-deploy the job.
  • Longterm vs. LongTerm in the code
    • I wasn't sure which way this should be capitalized when I was first writing this and, when I made a decision, I failed to grep for and squash everything that was written the wrong way, so now this pointless inconsistency exists in various places
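
To make the first item concrete, here is the kind of precision loss the long -> double -> long round trip invites; a double has a 53-bit mantissa, so larger longs silently lose their low bits:

n = (1 << 53) + 1                    # fits comfortably in a 64-bit long
assert float(n) == float(1 << 53)    # the double rounds the +1 away
assert int(float(n)) != n            # long -> double -> long is not the identity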

These are the kind of thing you expect when you crank out something quickly and don't think it through enough. The last item is trivial to fix and not much of a problem since the ubiquitous use of IDEs at Twitter means that basically anyone who would be impacted will have their IDE supply the correct capitalization for them.

The first item is more problematic, both in that it could actually cause incorrect analyses and in that fixing it will require doing a migration of all the data we have. My guess is that, at this point, this will be half a week to a week of work, which I could've easily avoided by spending thirty more seconds thinking through what I was doing.

The second item is somewhere in between. Between the first and second items, I think I've probably signed up for roughly double the amount of direct work on this system (so, not including time spent on data analysis on data in the system, just the time spent to build the system) for essentially no benefit.

Thanks to Leah Hanson, Andy Wilcox, Lifan Zeng, and Matej Stuchlik for comments/corrections/discussion


  1. The actual work involved was about a day's work, but it was done over a week since I had to learn Scala as well as Scalding and the general Twitter stack, the metrics stack, etc.

    One day is also just an estimate for the work for the initial data sets. Since then, I've done probably a couple more weeks of work and Wesley Aptekar-Cassels and Kunal Trivedi have probably put in another week or two of time. The operational cost is probably something like 1-2 days of my time per month (on average), bringing the total cost to on the order of a month or two.

    I'm also not counting time spent using the dataset, or time spent debugging issues, which will include a lot of time that I can only roughly guess at, e.g., when the compute platform team changed the network egress limits as a result of some data analysis that took about an hour, that exposed a latent mesos bug that probably cost a day of Ilya Pronin's time, and David Mackey has spent a fair amount of time tracking down weird issues where the data shows something odd is going on but we don't know what it is, etc. If you wanted to fully account for time spent on work that came out of some data analysis on the data sets discussed in the post, I suspect, between service-level teams plus platform-level teams like our JVM, OS, and HW teams, we're probably at roughly 1 person-year of time.

    But, because the initial work it took to create a working and useful system was a day plus time spent working on orientation material, and the system returned seven figures, it's been very easy to justify all of this additional time spent, which probably wouldn't have been the case if a year of up-front work was required. Most of the rest of the time isn't the kind of thing that's usually "charged" in roadmap reviews for creating a system (time spent by users, operational overhead), but perhaps the ongoing operational cost should be "charged" when creating the system. I don't think it makes sense to "charge" time spent by users to the system since, the more useful a system is, the more time users will spend using it, so that doesn't really seem like a cost.

    There's also been work to build tools on top of this: Kunal Trivedi has spent a fair amount of time building a layer on top of this to make the presentation more user friendly than SQL queries, which could arguably be charged to this project.


Andreas Zwinkau (qznc)

One Letter Programming Languages May 30, 2020 12:00 AM

If you are looking for a free name, there is none.

Read full article!

May 29, 2020

Jeremy Morgan (JeremyMorgan)

How to Build Your First JAMstack Site May 29, 2020 10:51 PM

Are you wondering what all this new hype is over JAMstack? What is a JAMstack site? How do I build one? Where do I deploy it? If you’ve asked any of these questions over the last couple of months, this article is for you. We’re going to learn what JAMstack is, and how to build our first JAMstack blog. If you already have an idea what a JAMstack site is, you can skip this section and go directly to:

Wesley Moore (wezm)

Setting the amdgpu HDMI Pixel Format on Linux May 29, 2020 10:48 PM

This week I discovered some details of digital display technology that I was previously unaware of: pixel formats. I have two Dell P2415Q displays connected to my computer. One via DisplayPort, the other via HDMI. The HDMI connected one was misbehaving and showing a dull picture. It turned out I needed to force the HDMI port of my RX560 graphics card to use RGB output instead of YCbCr. However, the amdgpu driver does not expose a means to do this. So, I used an EDID hack to make it look like the display only supported RGB.

tl;dr You can't easily configure the pixel format of the Linux amdgpu driver but you can hack the EDID of your display so the driver chooses RGB. Jump to the instructions.

Previously I had one display at work and one at home, both using DisplayPort, and all was well. However, since I started working from home at the start of 2020 (pre-pandemic), the HDMI connected one has always been a bit flakey. The screen would go blank for a second, then come back on. I tried 3 different HDMI cables, each more premium (and hopefully better shielded) than the last, without success.

This week the frustration boiled over and I vented to some friends. I was on the brink of just rage buying a new graphics card with multiple DisplayPorts, since I'd never had any trouble with that connection. I received one suggestion to swap the cables between the two, to rule out a fault with the HDMI connected display. I was quite confident the display was ok but it was a sensible thing to try before dropping cash on a new graphics card. So I swapped the cables over.

After performing the magical incantation to enable HDMI 2.0 and get 4K 60Hz on the newly HDMI connected display I immediately noticed lag. I even captured it in a slow motion video on my phone to prove I wasn't going crazy. Despite xrandr reporting a 60Hz connection it seemed as though it was updating at less than that. This led me to compare the menus of the two displays. It was here I noticed that the good one reported an input colour format of RGB, the other YPbPr.
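
As an aside, if you want to see which mode and refresh rate the X server believes is active, xrandr will tell you directly; the active mode is marked with an asterisk. Your output names will of course differ depending on driver and port:

# Show connected outputs and their modes; the current mode is marked with '*'
xrandr --current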

That comparison led to more reading about pixel formats in digital displays — a thing I was not previously aware of. Turns out that ports like HDMI support multiple ways of encoding the pixel data, some sacrificing dynamic range for lower bandwidth. I found this article particularly helpful: DisplayPort vs. HDMI: Which Is Better For Gaming?

My hypothesis at this point was that the lag was being introduced by my display converting the YPbPr input to its native RGB. So, I looked for a way to change the pixel format output from the HDMI port of my RX560 graphics card. Turns out this is super easy on Windows, but the amdgpu driver on Linux does not support changing it.
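
You can see the gap from userspace too: xrandr can list the properties each driver exposes per output, and on amdgpu there is nothing resembling a pixel format knob (unlike, say, the "Broadcast RGB" property Intel's driver offers), so there is no clean way to ask for RGB:

# List per-output properties exposed by the driver; on amdgpu there is no
# pixel-format property here, so the format can't simply be forced from userspace
xrandr --props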

In trying various suggestions from that bug report I rebooted a few times and the lag mysteriously went away, but the pixel format remained the same. At this point I noticed the display had a grey cast to it, especially on areas of white. This had been present on the other display when it was connected via HDMI too, but I had just put it down to it being a couple of years older than the other one. With my new pixel format knowledge in hand, I knew this was the source of the lack of brightness. So, I was still determined to find a way to force the HDMI output to RGB.

The Fix

It was at this point I found this Reddit post describing a terrible hack, originally described by Parker Reed in this YouTube video: Copy the EDID of the display and modify it to make it seem like the display only supports RGB. The amdgpu driver then chooses that format instead. Amazingly enough it worked! I also haven't experienced the screen blanking issue since swapping cables. I can't say for sure if that is fixed but the HDMI cable is now further away from interference from my Wi-Fi router, so perhaps that helped.

The following are the steps I took on Arch Linux to use a modified EDID:

  1. Install wxEDID from the AUR.
  2. Make a copy of the EDID data: cp /sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0/drm/card0/card0-HDMI-A-1/edid Documents/edid.bin
  3. Edit edid.bin with wxEDID and change these values:
    1. Find SPF: Supported features -> vsig_format -> replace 0b01 with 0b00
    2. Find CHD: CEA-861 header -> change the value of YCbCr420 and YCbCr444 to 0
    3. Recalculate the checksum: Options > Recalc Checksum.
    4. Save the file.

Note: I had to attempt editing the file a few times as wxEDID kept segfaulting. Eventually it saved without crashing though.
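
Before going further it's worth sanity-checking the edited file. If you have edid-decode installed (it's a separate package), it will parse the binary and you can confirm the CEA extension block no longer advertises the YCbCr formats:

# Parse the modified EDID and check that YCbCr 4:4:4 / 4:2:0 support
# is no longer advertised
edid-decode Documents/edid.bin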

Now we need to get the kernel to use the modified file:

  1. sudo mkdir /lib/firmware/edid

  2. sudo mv edid.bin /lib/firmware/edid/edid.bin

  3. Edit the kernel command line. I use systemd-boot, so I edited /boot/loader/entries/arch.conf and added drm_kms_helper.edid_firmware=edid/edid.bin to the command line, making the full file look like this:

     title   Arch Linux
     linux   /vmlinuz-linux
     initrd  /amd-ucode.img
     initrd  /initramfs-linux.img
     options root=PARTUUID=2f693946-c278-ed44-8ba2-67b07c3b6074 resume=UUID=524c0604-c307-4106-97e4-1b9799baa7d5 resume_offset=4564992 drm_kms_helper.edid_firmware=edid/edid.bin rw
    
  4. Regenerate the initial RAM disk: sudo mkinitcpio -p linux

  5. Reboot

After rebooting the display confirmed it was now using RGB and visually it was looking much brighter! 🤞 the display blanking issue remains fixed as well.
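
If you want to double-check that the kernel actually picked up the override, the command line and the module parameter can both be inspected. Note the exact parameter path is an assumption here and may vary by kernel version; newer kernels use drm.edid_firmware instead:

# Verify the EDID override made it onto the kernel command line
cat /proc/cmdline

# drm_kms_helper should report the same value (path may differ on newer kernels)
cat /sys/module/drm_kms_helper/parameters/edid_firmware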

May 27, 2020

Frederic Cambus (fcambus)

OpenBSD/armv7 on the CubieBoard2 May 27, 2020 10:39 PM

I bought the CubieBoard2 back in 2016 with the idea of running OpenBSD on it, but because of various reliability issues with the onboard NIC, it ran NetBSD for a few weeks before ending up in a drawer.

Back in October, Mark Kettenis committed code to allow switching to the framebuffer "glass" console in the bootloader on OpenBSD/armv7, making it possible to install the system without using a serial cable.

>> OpenBSD/armv7 BOOTARM 1.14
boot> set tty fb0
switching console to fb0

This prompted me to plug the board in again, and having support for the framebuffer console is a game changer. It also allows running Xenocara, if that's your thing.
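
To avoid typing the command at the boot> prompt every time, the same setting can usually go in /etc/boot.conf, which the bootloader reads at startup:

# Make the framebuffer console the default at every boot (run as root)
echo "set tty fb0" >> /etc/boot.conf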

Here is the output of running file on executables:

ELF 32-bit LSB shared object, ARM, version 1

And this is the result of the md5 -t benchmark:

MD5 time trial.  Processing 10000 10000-byte blocks...
Digest = 52e5f9c9e6f656f3e1800dfa5579d089
Time   = 1.340000 seconds
Speed  = 74626865.671642 bytes/second

For the record, LibreSSL speed benchmark results are available here.
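
For anyone wanting to reproduce those numbers on their own board, they come from the speed benchmark built into LibreSSL's openssl command; the algorithm list below is just an example:

# Run the built-in LibreSSL speed benchmark for a few sample algorithms
openssl speed md5 sha256 aes-128-cbc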

System message buffer (dmesg output):

OpenBSD 6.7-current (GENERIC) #299: Sun May 24 18:25:45 MDT 2020
    deraadt@armv7.openbsd.org:/usr/src/sys/arch/armv7/compile/GENERIC
real mem  = 964190208 (919MB)
avail mem = 935088128 (891MB)
random: good seed from bootblocks
mainbus0 at root: Cubietech Cubieboard2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A7 r0p4
cpu0: 32KB 32b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 256KB 64b/line 8-way L2 cache
cortex0 at mainbus0
psci0 at mainbus0: PSCI 0.0
sxiccmu0 at mainbus0
agtimer0 at mainbus0: tick rate 24000 KHz
simplebus0 at mainbus0: "soc"
sxiccmu1 at simplebus0
sxipio0 at simplebus0: 175 pins
sxirtc0 at simplebus0
sxisid0 at simplebus0
ampintc0 at simplebus0 nirq 160, ncpu 2: "interrupt-controller"
"system-control" at simplebus0 not configured
"interrupt-controller" at simplebus0 not configured
"dma-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"lcd-controller" at simplebus0 not configured
"video-codec" at simplebus0 not configured
sximmc0 at simplebus0
sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma
"usb" at simplebus0 not configured
"phy" at simplebus0 not configured
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci0 at simplebus0: version 1.0
"crypto-engine" at simplebus0 not configured
"hdmi" at simplebus0 not configured
sxiahci0 at simplebus0: AHCI 1.1
scsibus0 at sxiahci0: 32 targets
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at simplebus0: version 1.0
"timer" at simplebus0 not configured
sxidog0 at simplebus0
"ir" at simplebus0 not configured
"codec" at simplebus0 not configured
sxits0 at simplebus0
com0 at simplebus0: ns16550, no working fifo
sxitwi0 at simplebus0
iic0 at sxitwi0
axppmic0 at iic0 addr 0x34: AXP209
sxitwi1 at simplebus0
iic1 at sxitwi1
"gpu" at simplebus0 not configured
dwge0 at simplebus0: address 02:0a:09:03:27:08
rlphy0 at dwge0 phy 1: RTL8201L 10/100 PHY, rev. 1
"hstimer" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-frontend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
"display-backend" at simplebus0 not configured
gpio0 at sxipio0: 32 pins
gpio1 at sxipio0: 32 pins
gpio2 at sxipio0: 32 pins
gpio3 at sxipio0: 32 pins
gpio4 at sxipio0: 32 pins
gpio5 at sxipio0: 32 pins
gpio6 at sxipio0: 32 pins
gpio7 at sxipio0: 32 pins
gpio8 at sxipio0: 32 pins
usb2 at ohci0: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
usb3 at ohci1: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1
simplefb0 at mainbus0: 1920x1080, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
scsibus1 at sdmmc0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <SD/MMC, SC64G, 0080> removable
sd0: 60906MB, 512 bytes/sector, 124735488 sectors
uhidev0 at uhub2 port 1 configuration 1 interface 0 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub2 port 1 configuration 1 interface 1 "Lenovo ThinkPad Compact USB Keyboard with TrackPoint" rev 2.00/3.30 addr 2
uhidev1: iclass 3/1, 22 report ids
ums0 at uhidev1 reportid 1: 5 buttons, Z and W dir
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 16: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 17: input=2, output=0, feature=0
uhid2 at uhidev1 reportid 19: input=8, output=8, feature=8
uhid3 at uhidev1 reportid 21: input=2, output=0, feature=0
uhid4 at uhidev1 reportid 22: input=2, output=0, feature=0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootfile: sd0a:/bsd
boot device: sd0
root on sd0a (f7b555b0fa0e8c49.a) swap on sd0b dump on sd0b

Sensors output:

$ sysctl hw.sensors
hw.sensors.sxits0.temp0=39.50 degC
hw.sensors.axppmic0.temp0=30.00 degC
hw.sensors.axppmic0.volt0=4.95 VDC (ACIN)
hw.sensors.axppmic0.volt1=0.03 VDC (VBUS)
hw.sensors.axppmic0.volt2=4.85 VDC (APS)
hw.sensors.axppmic0.current0=0.11 A (ACIN)
hw.sensors.axppmic0.current1=0.00 A (VBUS)
hw.sensors.axppmic0.indicator0=On (ACIN), OK
hw.sensors.axppmic0.indicator1=Off (VBUS)

May 25, 2020

Gustaf Erikson (gerikson)

4,000 dead in Sweden May 25, 2020 09:48 AM