Planet Crustaceans

This is a Planet instance for lobste.rs community feeds. To add/update an entry or otherwise improve things, fork this repo.

January 27, 2020

Derek Jones (derek-jones)

How useful are automatically generated compiler tests? January 27, 2020 12:17 AM

Over the last decade, testing compilers using automatically generated source code has been a popular research topic (for those working in the compiler field; Csmith kicked off this interest). Compilers are large complicated programs, and they will always contain mistakes that lead to faults being experienced. Previous posts of mine have raised two issues on the use of automatically generated tests: a financial issue (i.e., fixing reported faults costs money {most of the work on gcc and llvm is done by people working for large companies}, and is intended to benefit users, not researchers seeking bragging rights for their latest paper), and an applicability issue (i.e., human written code has particular characteristics, and unless automatically generated code has very similar characteristics the mistakes it finds are unlikely to commonly occur in practice).

My claim that mistakes in compilers found by automatically generated code are unlikely to be the kind of mistakes that often lead to a fault in the compilation of human written code is based on the following observations (I don’t have any experimental evidence): the characteristics of automatically generated source are very different from those of human written code (I know this from measurements of lots of code), and this difference results in parts of the compiler that are infrequently executed by human written code being more frequently executed (increasing the likelihood of a mistake being uncovered; an observation based on my years working on compilers).

An interesting new paper, Compiler Fuzzing: How Much Does It Matter?, investigated the extent to which fault experiences produced by automatically generated source are representative of fault experiences produced by human written code. The first author of the paper, Michaël Marcozzi, gave a talk about this work at the Papers We Love workshop last Sunday (videos available).

The question was attacked head on. The researchers instrumented the code in the LLVM compiler that was modified to fix 45 reported faults (27 from four fuzzing tools, 10 from human written code, and 8 from a formal verifier); the following is an example of instrumented code:

warn("Fixing patch reached");
if (Not.isPowerOf2()) {
   if (!(C->getValue().isPowerOf2()  // Check needed to fix fault
         && Not != C->getValue())) {
      warn("Fault possibly triggered");
   } else { /* CODE TRANSFORMATION */ }  // Original, unfixed code
}

The instrumented compiler was used to build 309 Debian packages (around 10 million lines of C/C++). The outputs from the builds were (possibly miscompiled) built versions of the packages, and log files (from which information could be extracted on the number of times the fixing patches were reached, and the number of cases where the check needed to fix the fault was triggered).

Each built package was then checked using its respective test suite; a package built from miscompiled code may successfully pass its test suite.

A bitwise compare was run on the program executables generated by the unfixed and fixed compilers.

The following (taken from Marcozzi’s slides) shows the percentage of packages where the fixing patch was reached during the build, the percentages of packages where code added to fix a fault was triggered, the percentage where a different binary was generated, and the percentages of packages where a failure was detected when running each package’s tests (0.01% is one failure):

Percentage of packages where patched code was reached during builds, and packages with failing tests.

The takeaway from the above figure is that many packages are affected by the coding mistakes that have been fixed, but that most package test suites are not affected by the miscompilations.

To find out whether there is a difference, in terms of impact on Debian packages, between faults reported in human and automatically generated code, we need to compare the number of occurrences of “Fault possibly triggered”. The table below shows the break-down by the detector of the coding mistake (i.e., Human and each of the automated tools used), and the number of fixed faults they contributed to the analysis.

Human, Csmith and EMI each contributed 10 faults to the analysis. The fixes for the 10 fault reports in human generated code were triggered 593 times when building the 309 Debian packages, while the 10 Csmith and the 10 EMI fixes were triggered 1,043 and 948 times respectively; a lot more than the Human triggers :-O. There are also a lot more bitwise compare differences for the non-Human fault-fixes.

Detector  Faults   Reached    Triggered   Bitwise-diff   Tests failed
Human       10      1,990         593         56              1
Csmith      10      2,482       1,043        318              0
EMI         10      2,424         948        151              1
Orange       5        293          35          8              0
yarpgen      2        608         257          0              0
Alive        8      1,059         327        172              0

Is the difference due to a few packages being very different from the rest?

The table below breaks things down by each of the 10 reported faults from the three Detectors.

Ok, two Human fault-fix locations are never reached when compiling the Debian packages (which is a bit odd), but when the locations are reached they are just not triggering the fault conditions as often as the automatic cases.

Detector   Reached    Triggered
Human
              300       278
              301         0
              305         0
                0         0
                0         0
              133        44
              286       231
              229         0
              259        40
               77         0
Csmith
              306         2
              301       118
              297       291
              284         1
              143         6
              291       286
              125       125
              245         3
              285        16
              205       205
EMI      
              130         0
              307       221
              302       195
              281        32
              175         5
              122         0
              300       295
              297       215
              306       191
              287        10

It looks like I am not only wrong, but that fault experiences from automatically generated source are more (not less) likely to occur in human written code (than fault experiences produced by human written code).

This is odd. At best I would expect fault experiences from human and automatically generated code to have the same characteristics.

Ideas and suggestions welcome.

Update: the morning after

I have untangled my thoughts on how to statistically compare the three sets of data.

The bootstrap is based on the idea of exchangeability; which items being measured might we consider to be exchangeable, i.e., be able to treat the measurement of one as equivalent to measuring the other?

In this experiment the coding mistakes are not exchangeable, i.e., different mistakes can have different outcomes.

But we might claim that the detection of mistakes is exchangeable; that is, a coding mistake is just as likely to be detected by source code produced by an automatic tool as source written by a Human.

The bootstrap needs to be applied without replacement, i.e., each coding mistake is treated as being unique. The results show that for the sum of the Triggered counts (code+data):

  • treating Human and Csmith as being equally likely to detect the same coding mistake, there is an 18% chance of the Human results being lower than 593.
  • treating Human and EMI as being equally likely to detect the same coding mistake, there is a 12% chance of the Human results being lower than 593.

So the lower Human Triggered total of 593 is expected to occur quite often (i.e., 12% and 18% of the time). Automatically generated code is not more likely to detect coding mistakes than human written code (at least based on this small sample set).
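
For readers who want to play with the idea, below is a minimal sketch of that resampling-without-replacement comparison, using the Human and Csmith Triggered columns from the table above (C is used purely for illustration; it is not the code+data linked above, and since the table values do not sum exactly to the quoted totals, the percentage printed will only be in the same ballpark as the 18% figure).

#include <stdio.h>
#include <stdlib.h>

/* Triggered counts per fixed fault, read off the table above. */
static int human[10]  = {278, 0, 0, 0, 0, 44, 231, 0, 40, 0};
static int csmith[10] = {2, 118, 291, 1, 6, 286, 125, 3, 16, 205};

int main(void)
{
    int pool[20];
    long observed = 0, runs = 100000, at_or_below = 0;

    for (int i = 0; i < 10; i++) {
        pool[i] = human[i];
        pool[10 + i] = csmith[i];
        observed += human[i];            /* observed Human total */
    }

    srand(42);
    for (long r = 0; r < runs; r++) {
        /* Shuffle the pooled counts (Fisher-Yates), then deal the first
           10 to "Human"; this treats the detection of each coding
           mistake as exchangeable between the two detectors. */
        for (int i = 19; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
        }
        long sum = 0;
        for (int i = 0; i < 10; i++)
            sum += pool[i];
        if (sum <= observed)
            at_or_below++;
    }

    printf("observed Human total: %ld\n", observed);
    printf("fraction of random splits at or below it: %.3f\n",
           (double)at_or_below / (double)runs);
    return 0;
}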

January 26, 2020

Patrick Louis (venam)

Loading of xinitrc,xserverrc,xresources,xdefaults,xprofile,xsession,xmodmap January 26, 2020 10:00 PM

X11 Logo

NB: This is a repost on this blog of a post made on nixers.net

We often hear discussions about X configuration files and their roles. Namely: xinitrc,xserverrc,xresources,xdefaults,xprofile,xsession,xmodmap. So let’s try to clear up this mumbo jumbo of words.

There are roughly two ways to start your X environment: one is via xinit and the other is via a display manager (a fancy login screen). Depending on which one you use, different configuration files will be loaded.

If starting via xinit, or startx (a wrapper over xinit), then ~/.xinitrc will be loaded; if it is not present, the global /etc/X11/xinit/xinitrc is used instead. All the lines found in it are run, interpreted by /bin/sh, stopping at the last one. The X session ends when that last program terminates.

If the globally available xinitrc is used, it will include, in alphabetical order, the sub-xinitrc scripts found in /etc/X11/xinit/xinitrc.d/.

That globally available xinitrc loads two more configurations:

  • Xresources, found in ~/.Xresources or /etc/X11/xinit/.Xresources, which consists of the key/value pairs accessible to all X clients, the resources. xinit executes xrdb -merge ~/.Xresources, or xrdb with the global file.
  • Xmodmap, locally in ~/.Xmodmap and globally in /etc/X11/xinit/.Xmodmap. This will run xmodmap $thefile. So in theory, instead of having all those xmodmap lines that we find so commonly in the .xinitrc file (I’m guilty of this too), we can move them into a .Xmodmap file instead.

xinit/startx finally starts the X server; it does so by executing the script found in ~/.xserverrc or, globally, /etc/X11/xinit/xserverrc. This simply consists of:

exec /usr/bin/X -nolisten tcp "$@"

However, replacing this xserverrc allows us to start X in different ways.

What about initiating a graphical session from a display manager?

Instead of xinitrc, the file loaded at login will be the Xsession file. So, similar to xinitrc, we have a global default script at /etc/X11/Xsession (with its behaviour configured in /etc/X11/Xsession.options), along with a directory of sub-xsession scripts to be loaded from /etc/X11/Xsession.d. Also similar to xinit, the default Xsession will load the Xresources. As for local configs, there are many of them depending on the type of session; I quote:

If the user has a ~/.xsessionrc file, read it. (used by all sessions types)
If a specific session was selected in the DM (GDM, KDM, WDM, LightDM, …) , run it.
Otherwise, if the user has a ~/.xsession or ~/.Xsession file, run it.
Otherwise, if the /usr/bin/x-session-manager command exists, run it.
Otherwise, if the /usr/bin/x-window-manager command exists, run it.
Otherwise, if the /usr/bin/x-terminal-emulator command exists, run it.

Some specific display managers include in their default Xsession an extra configuration called Xprofile.

For example:

  • GDM - /etc/gdm/Xsession
  • LightDM - /etc/lightdm/Xsession
  • LXDM - /etc/lxdm/Xsession
  • SDDM - /usr/share/sddm/scripts/Xsession

Otherwise, if you want the Xprofile, you have to source the file manually from startx/xinit or XDM or any other display manager.

Now for something unrelated, Xdefaults is the old version of Xresources.

The way it was done in the old days is that Xdefaults was read every single time a client program (Xlib) was started, unlike Xresources, whose properties are stored in the root window by the resource manager (think xrdb). That means the old method, Xdefaults, couldn’t be used over the network, because you needed direct access to the file.

Now it gets a bit complicated, because there can be multiple Xdefaults files, found in ways other than ~/.Xdefaults.

I quote:

There also is the $XENVIRONMENT variable, which defaults to ~/.Xdefaults-hostname ($XENVIRONMENT/.Xdefaults) if not set. This is used in the same way as .Xdefaults, but is always read regardless of whether RESOURCE_MANAGER is present. You can use .Xdefaults-hostname files to keep some settings machine-specific while using xrdb for the global ones

The fourth location is the directory pointed to by the $XAPPLRESDIR environment variable. (Oddly, if the variable is not set, $HOME is used as the default.) When a program is started, it looks if any of the following files exist (the file name being the same as the program’s class name):
$XAPPLRESDIR/$LC_CTYPE/XTerm
$XAPPLRESDIR/language/XTerm
$XAPPLRESDIR/XTerm

The fifth location is the system-wide “app-defaults” directories. Again, the app-defaults directories are checked on program startup if they have a file named after the program. For example, XTerm (on Arch Linux) uses:

/etc/X11/$LC_CTYPE/app-defaults/XTerm
/etc/X11/language/app-defaults/XTerm
/etc/X11/app-defaults/XTerm
/usr/share/X11/$LC_CTYPE/app-defaults/XTerm
/usr/share/X11/language/app-defaults/XTerm
/usr/share/X11/app-defaults/XTerm

The app-defaults files are usually installed into /usr/share along with the program itself; administrator overrides would go to /etc.

I hope that helps clear things up.

Key And Trust Store on Unix-like OS January 26, 2020 10:00 PM

lock drawing

NB: This is a repost on this blog of a post made on nixers.net

Let’s have a discussion about all the kinds of trust stores found on Unix-like operating systems.
For those not in the know, trust stores are places where the operating system generally, or a specific piece of software, stores private and public keys (asymmetric), trusted CAs, and symmetric keys (decryption keys).

There’s a lot to cover on this topic. I thought of writing an article about it because I couldn’t find anything online that covered it in a generic manner; that’s what gets me writing anyway.


Let’s tackle some of the stuff regarding TLS PKI (Public Key Infrastructure).

Mozilla maintains a list of trusted CAs in a certificate store that a lot of Unix-like operating systems fetch through the package manager and deploy at /etc/ssl/certs.

This location is accessed system-wide by a lot of utilities to check the trusted certificates. It has sort of become the standard, though as you’ll see in a bit it’s not really.

You may also find a symbolic link there pointing to /etc/ca-certificates/extracted/. It all points to the same thing; there’s even /usr/share/ca-certificates, or /usr/lib/mozilla/certificates, /usr/lib64/mozilla/certificates, ~/.mozilla/certificates.

OpenSSL also stores/reads certificates from that location, /etc/ssl; that’s where you’ll find openssl.cnf, for example. In this directory you can choose to store the private keys associated with certificates you’ve generated yourself, in /etc/ssl/private. For obvious reasons, that directory should only be accessible by root.

But there’s a catch here: openssl can be compiled with the nss library, which has its own built-in list of trusted CAs, usually through /usr/lib/libnssckbi.so, which, again, carries the list of trusted CAs maintained by Mozilla (nss being Mozilla’s Network Security Services).

The Chrome browser also uses nss, so you might ask where trust exclusions are stored when they are added. They live in $HOME/.pki/nssdb, or globally in /etc/pki/nssdb, as an sqlite3 db.

Firefox also uses an sqlite3 database to store its exclusions. However, it’s not in the .pki directory but right within its profile directory: $HOME/.mozilla/firefox/<profilename>.default/cert_override.txt. Add to this that it has two (or maybe more) sqlite3 dbs in there, which are basically copies of the nss trusted certs found globally on the system.

Now, what about programming languages that want to access the internet in a secure manner through TLS PKI?

Most of them rely on the trusted stores mentioned previously, namely nss or /etc/ssl. However, some don’t.

I’m aware of one well known example: the Java language. It stores its trust store in $JAVA_HOME/jre/lib/security/cacerts, which is a Java keystore. The default password to this keystore is “changeit”. Java has a concept of security providers, and they are listed in order of preference in the java.security file. Hopefully you can find one of the providers relying on nss.cfg, and so we have less redundancy within our system.

Let’s also put a hint here about certificate revocation. Sometimes, in specific cases, you can’t always rely on your OS packages to update your trusted CAs and you’ll need a daemon to check CRLs and OCSPs for all the trusted certs you got.

One example is: dirmngr(1)

Now there are two other common places that I’ll tackle too.

Gnupg trust store and ssh trust store. Those are in $HOME/.gnupg and $HOME/.ssh respectively.

Those directories both contain trusted certificates and your private/public key pairs.

Let’s mention that almost everything in the crypto world uses a format called ASN.1, with or without DER encoding. GPG, X.509, and SSH all have it this way, with some different formatting in places.

You can have a look at those here:

And here’s a useful link:

https://wiki.gentoo.org/wiki/Certificates


Outro

So nixers, what do you have to say about trust stores on Unix-like OSes? Anything to add to what I’ve mentioned? There’s a lot I’ve deliberately left out. Maybe talk about interfacing with keystores on a hardware security module through PKCS#11, like a YubiKey, that could be used for OTP. Maybe we can talk about all the utilities that can be used to manipulate, create, and display in a human readable format the certificates, public/private pairs, and more (openssl, keytool, certutil, etc.). We can also talk about building your own PKI. We can talk about what specific languages do to handle cryptographic keystores, and what you like about it. Or maybe simply share a link you’ve found useful. Or maybe we can talk about package management and how maintainers should sign their packages. Or just express your opinion about anything.

We could go into secret management, PAM, crypto protocols and different libraries, and MAC (Mandatory access control, think SELinux and others) as a whole but that would be too big for a single thread. If you want to do that we can open a new one. Let’s attack trust and key stores in this thread.

What’s your take on trust and key stores?

Attributions:

  • Unknown. The author of the picture is not given, but it is possible that it is Josef Pokorný, the author of the text of the encyclopedia entry. [Public domain]

Gonçalo Valério (dethos)

Setting up a Content-Security-Policy January 26, 2020 09:08 PM

A couple of weeks ago, I gave a small talk on the Madeira Tech Meetup about a set of HTTP headers that could help website owners protect their assets and their users. The slides are available here, just in case you want to take a look.

The content of the talk is basically a small review of what exists, what each header tries to achieve and how you could use it.

After the talk I remembered that I hadn’t reviewed the headers of this blog for quite some time. So a quick visit to Mozilla Observatory, a tool that gives you a quick look at some of the security configurations of your website, gave me an idea of what I needed to improve. This was the result:

The Content-Security-Policy header was missing

So what is a Content Security Policy? On the MDN documentation we can find the following description:

The HTTP Content-Security-Policy response header allows web site administrators to control resources the user agent is allowed to load for a given page.

Mozilla Developer Network

Summing up, in this header we describe with a certain level of detail the sources from where each type of content can be fetched in order to be allowed and included on a given page/app. The main goal of this type of policy is to mitigate Cross-Site Scripting attacks.

In order to start building a CSP for this blog a good approach, in my humble opinion, is to start with the more basic and restrictive policy and then proceed evaluating the need for exceptions and only add them when strictly necessary. So here is my first attempt:

default-src 'self'; object-src 'none'; report-uri https://ovalerio.report-uri.com/r/d/csp/reportOnly

Let’s interpret what it says:

  • default-src: This is the default value for all non-mentioned directives. self means “only things that come from this domain”.
  • object-src: No <object>, <embed> or <applet> here.
  • report-uri: All policy violations should be reported by the browser to this URL.

The idea was that all styles, scripts and images should be served by this domain; anything external should be blocked. This will also block inline scripts, styles and data images, which are considered unsafe. If for some reason I need to allow these on the blog I could use unsafe-inline, unsafe-eval and data: in the directive’s definition, but in my opinion they should be avoided.

Now, a good way to find out how this policy will affect the website, and to understand how it needs to be tuned (or the website changed), is to activate it using the “report only” mode:

Content-Security-Policy-Report-Only: <policy>

This mode will generate some reports when you (and other users) navigate through the website; they will be printed on the browser’s console and sent to the defined report-uri, but the resources will be loaded anyway.

Here are some results:

Example of the CSP violations on the browser console

As an example below is a raw report from one of those violations:

{
    "csp-report": {
        "blocked-uri": "inline",
        "document-uri": "https://blog.ovalerio.net/",
        "original-policy": "default-src 'self'; object-src 'none'; report-uri https://ovalerio.report-uri.com/r/d/csp/reportOnly",
        "violated-directive": "default-src"
    }
}

After a while I found that:

  • The theme used on this blog used some data: fonts
  • Several inline scripts were being loaded
  • Many inline styles were also being used
  • I have some demos that load content from asciinema.org
  • I often share some videos from Youtube, so I need to allow iframes from that domain
  • Some older posts also embedded content from other websites (such as SoundCloud)

So for the blog to work fine with the CSP being enforced, I either had to include some exceptions or fix errors. After evaluating the attack surface and the work required to make the changes I ended up with the following policy:

Content-Security-Policy-Report-Only: default-src 'self'; script-src 'self' https://asciinema.org 'sha256-A+5+D7+YGeNGrYcTyNB4LNGYdWr35XshEdH/tqROujM=' 'sha256-2N2eS+4Cy0nFISF8T0QGez36fUJfaY+o6QBWxTUYiHc=' 'sha256-AJyUt7CSSRW+BeuiusXDXezlE1Wv2tkQgT5pCnpoL+w=' 'sha256-n3qH1zzzTNXXbWAKXOMmrBzjKgIQZ7G7UFh/pIixNEQ='; style-src 'self' 'sha256-MyyabzyHEWp8TS5S1nthEJ4uLnqD1s3X+OTsB8jcaas=' 'sha256-OyKg6OHgnmapAcgq002yGA58wB21FOR7EcTwPWSs54E='; font-src 'self' data:; img-src 'self' https://secure.gravatar.com; frame-src 'self' https://www.youtube.com https://asciinema.org; object-src 'none'; report-uri https://ovalerio.report-uri.com/r/d/csp/reportOnly

A lot more complex than I initially expected it to be, but it’s one of the drawbacks of using a “pre-built” theme on a platform that I didn’t develop. I was able (in the available time) to fix some stuff but fixing everything would take a lot more work.

All those sha-256 hashes were added to only allow certain inline scripts and styles without allowing everything using unsafe-inline.

Perhaps in the future I will be able to change to a saner theme/platform, but for the time being this Content-Security-Policy will do the job.

I started enforcing it (by changing Content-Security-Policy-Report-Only to Content-Security-Policy) just before publishing this blog post, so if anything is broken please let me know.

I hope this post has been helpful to you and if you didn’t yet implement this header you should give it a try, it might take some effort (depending on the use case) but in the long run I believe it is totally worth it.

January 25, 2020

Carlos Fenollosa (carlesfe)

January 24, 2020

Gustaf Erikson (gerikson)

Pages From The Fire (kghose)

static linking, duplicate symbol error and inlineing January 24, 2020 03:40 PM

If you define a function in a header file (i.e. have its implementation in the header) and statically link your program, you need to declare it inline if the header is used in multiple files. This is different from issues you get when a header file is included multiple times in a compilation unit. Consider… Read More static linking, duplicate symbol error and inlineing

January 22, 2020

Átila on Code (atilaneves)

The power of reflection January 22, 2020 10:01 AM

When I was at CppCon 2016 I overheard someone ask “Everyone keeps talking about  reflection, but why do we actually need it?”. A few years before that, I also would have had difficulty understanding why it would be useful. After years of writing D, it’s hard to imagine life without it. Serialisation is an obvious […]

January 20, 2020

Gustaf Erikson (gerikson)

The Information: A History, A Theory, A Flood by James Gleick January 20, 2020 01:58 PM

Gleick is at his usual lucid self. Not as thick (or as deep) as Chaos, but a good read nonetheless.

With the Old Breed: At Peleliu and Okinawa by Eugene B. Sledge January 20, 2020 01:55 PM

Continuing my deep dive into the rot and shit of the Pacific theatre. Sledge has a different background than Leckie (who was a sportswriter as a civilian) and has less facility with words. I believe Leckie spent a lot of time drinking beers with other vets, polishing his anecdotes, while Sledge pushed his memories back - he alludes to frequent nightmares after his experiences.

January 19, 2020

Andrew Gallant (burntsushi)

Posts January 19, 2020 10:15 PM

Ponylang (SeanTAllen)

Last Week in Pony - January 19, 2020 January 19, 2020 03:50 PM

Sean T. Allen’s recent PWL talk on Deny Capabilities for Safe, Fast Actors is available. Microsoft’s Project Verona is now open source.

Eric Faehnrich (faehnrich)

When a Byte is Not 8 Bits January 19, 2020 05:00 AM

I’ve been getting into my backlog of C books and resources when I came across my copy of Portable C by Henry Rabinowitz1 that I obtained after reading the post C Portability Lessons from Weird Machines.

The blog post lists old weird machines with addressable units that might not be your typical 8-bit bytes. One might think that’s well and good, but not a concern since typical modern architectures have 8-bit bytes. That’s not entirely the case.

I work on products that have a Texas Instruments C2000 microcontroller. This is a modern microcontroller in use now. However, it has 16-bit bytes instead of 8.

I understand that the C2000 isn’t a part you see every day, but the fact remains that I have to support code for this.

If you want to play with this part, you can get their LaunchPad kit that has it.

So the addressable units and char are 16 bits. The standard says sizeof(char)==1, so any size is a multiple of 16 bits.

The standard also says int is at least 16 bits, but could for instance be 32. The C2000 just happens to have int be the minimum 16 bits. Interestingly, this means sizeof(int)==1 when we’re used to int being larger than char.
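
A quick way to see these properties on whatever toolchain you are using is to print CHAR_BIT and the two sizeofs; the snippet below is a generic C sketch (not C2000-specific code), and per the above the expectation on the C2000 is 16, 1 and 1, versus the 8, 1 and 4 (or similar) of a typical desktop compiler.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* char and int are both 16 bits on the C2000 (per the text above),
       so all three values come out as 16, 1 and 1 there; a typical
       desktop build prints 8, 1 and 4 instead. */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(char) = %u\n", (unsigned)sizeof(char));
    printf("sizeof(int)  = %u\n", (unsigned)sizeof(int));
    return 0;
}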

A multi-byte word like a 32-bit unsigned long is then made up of two 16-bit bytes. The C2000 is little-endian, so if we had an unsigned long with the value 0x01020304 at address 0x00000426, it would look like this in memory:

0x00000426 0x0304
0x00000427 0x0102

An example of portability becoming a concern is when we have to take something off of the network. We can’t reuse code that converts network order to host order when that code expects the host to have 8-bit bytes. We had to write our own just for this.

Endianness is a worry when you have multi-byte words. But also what about 8-bit byte arrays coming in from the network? Do you store each 8-bit network byte in its own 16-bit byte? Or do you pack two 8-bit network bytes into one 16-bit host byte?

Similarly, when we’re sending something out, does the host put just 8-bits in the 16-bit register that holds values going out onto the wire? And is that upper or lower 8 bits? Or pack two 8-bit bytes again?

It’s certainly awkward talking about 8-bit bytes inside our 16-bit bytes, so we just call them octets.
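
For illustration, here is a minimal C sketch of what a hand-rolled conversion can look like under the first of those conventions, one octet per 16-bit char; that choice is an assumption made for the example, not necessarily how our production code handles it.

#include <stdint.h>

/* Hypothetical helper for a 16-bit-char target: rebuild a 32-bit
   big-endian (network order) value from four received octets, where
   the driver has placed one octet in each 16-bit char and only the
   low 8 bits of each element are significant. On an 8-bit-byte host
   this is roughly what memcpy plus ntohl would give you. */
static uint32_t be32_from_octets(const unsigned char *buf)
{
    return ((uint32_t)(buf[0] & 0xFFu) << 24) |
           ((uint32_t)(buf[1] & 0xFFu) << 16) |
           ((uint32_t)(buf[2] & 0xFFu) <<  8) |
            (uint32_t)(buf[3] & 0xFFu);
}

The masking means the function behaves the same way on an ordinary 8-bit-byte host, which makes it easier to test off-target.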

I look forward to learning anything from those portability books that I can apply to the C2000.

  1. That blog post must have driven demand up for that book. When I first ordered it on Amazon for a reasonable price, the seller then canceled my order but I didn’t think much of it. When I searched for another copy, I saw that same seller had the book again, but this time for over $300! It’s a niche area of programming but demand shouldn’t be that crazy.

January 16, 2020

Derek Jones (derek-jones)

for-loop usage at different nesting levels January 16, 2020 04:59 PM

When reading code, starting at the first line of a function/method, the probability of the next statement read being a for-loop is around 1.5% (at least in C, I don’t have decent data on other languages). Let’s say you have been reading the code a line at a time, and you are now reading lines nested within various if/while/for statements, you are at nesting depth d. What is the probability of the statement on the next line being a for-loop?

Does the probability of encountering a for-loop remain unchanged with nesting depth (i.e., developer habits are not affected by nesting depth), or does it decrease (aren’t developers supposed to be using functions/methods rather than nesting; I have never heard anybody suggest that it increases)?

If you think the for-loop use probability is not affected by nesting depth, you are going to argue for the plot on the left (below, showing the number of for-loops appearing in C source at various nesting depths), with the regression model fitting really well after 3 levels of nesting. If you think the probability decreases with nesting depth, you are likely to argue for the plot on the right, with the model fitting really well down to around 10 levels of nesting (code+data).

Number of C for-loops whose enclosed compound-statement contains basic blocks nested to a given depth.

Both plots use the same data, but different scales are used for the x-axis.

If the probability of use is independent of nesting depth, an exponential equation should fit the data (i.e., the left plot); decreasing probability is supported by a power-law (i.e., the right plot; plus other forms of equation, but let’s keep things simple).
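
To spell out the two candidate forms: a per-depth probability that stays constant predicts counts that fall off roughly exponentially, N(d) \approx N_0 e^{-\lambda d}, while a probability that declines with depth is consistent with a power law, N(d) \approx N_0 d^{-\alpha} (with d the nesting depth and N_0, \lambda, \alpha fitted constants).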

The two cases are very wrong over different ranges of the data. What is your explanation for reality failing to follow your beliefs in for-loop occurrence probability?

Is the mismatch between belief and reality caused by the small size of the data set (a few million lines were measured, which was once considered to be a lot), or perhaps your beliefs are based on other languages which will behave as claimed (appropriate measurements on other languages most welcome).

The nesting depth dependent use probability plot shows a sudden change in the rate of decrease in for-loop probability; perhaps this is caused by the maximum number of characters that can appear on a typical editor line (within a window). The left plot (below) shows the number of lines (of C source) containing a given number of characters; the right plot counts tokens per line and the length effect is much less pronounced (perhaps developers use shorter identifiers in nested code). Note: different scales used for the x-axis (code+data).

Number of lines containing a given number of C tokens.

I don’t have any believable ideas for why the exponential fit only works if the first few nesting depths are ignored. What could be so special about early nesting depths?

What about fitting the data with other equations?

A bi-exponential springs to mind, with one exponential driven by application requirements and the other by algorithm selection; but reality is not on-board with this idea.

Ideas, suggestions, and data for other languages, most welcome.

January 15, 2020

Gustaf Erikson (gerikson)

Dunkirk: Fight to the Last Man by Simon Sebag-Montefiore January 15, 2020 03:20 PM

An accessible read on the fall of France and the evacuation from Dunkirk. This is the first book by Sebag-Montefiore I’ve read and I’m not that impressed.

I did like the attempt to give other viewpoints than the British, though.

Dunkirk-in-memory is weird. I’m sure the recent movie (the reason I wanted to read this book) got a lot of lift from Brexit, and that the Leavers imagine they’re doing something similar. Of course Dunkirk was a crushing defeat, but in that curious British (English?) way, it’s almost more famous than some victories (cf. Scott vs. Amundsen). Perhaps it’s an memory of Thermopylae, as echoed by Hjalmar Gullberg’s poem about the death of Karin Boye:

Not with the victory wreath has Nike,
to the play of flutes and the strokes of harps,
crowned the Persian king, scourge of the earth.
Forgotten, his sarcophagus crumbles away.
The chorus of homage shall forever tell
of Leonidas’ defeat.

By far the most chilling parts of the book are the discussions in the War Cabinet on whether Great Britain should seek an armistice with Nazi Germany. Churchill, whatever his faults and motivations, deserves credit for not giving in. Leavers see themselves as heirs to Churchill, but they’re actually followers of Lord Halifax.

January 14, 2020

Mark Fischer (flyingfisch)

MVC website redirecting to logout page immediately after logging in January 14, 2020 06:16 PM

Over the past couple of days I have been converting the authentication and authorization method on an existing MVC website to use Auth0, an OpenID provider. During the process of converting the website’s login and logout routines I ran into an issue where, no matter what, the website would redirect to the logout page immediately after hitting the login route. After much trial and error I finally pinpointed the problem.

In my project’s web.config I had the following code:

<authentication mode="Forms">
	<forms loginUrl="~/Auth/Logout" timeout="2880" />
</authentication>

I changed this to mode="None" and the login page now works flawlessly.

<authentication mode="None"></authentication>

January 13, 2020

Andrew Owen (yumaikas)

A Small Command-line Productivity Tip January 13, 2020 09:00 PM

I really like using the command line for automating tasks. There are some things that a GUI is handier for, but having scripts handy is a very nice place to be.

One trick I’ve started using is writing small wrapper scripts (using batch files at work) for tools like RipGrep or NUnit3-console. This makes it far easier to edit those commands, and to save back examples of doing different things, especially for commands with more involved parameter lists. It makes it possible to add small things, like output delimiters between search results, or to log actions to a file, or the like. It also makes it easy to reduce the total amount of typing I have to do in a given command prompt.

An example, using RipGrep, in a file called search.bat on my PATH.

REM banner is a program that spits out "-", as wide as the terminal is
REM it takes a color as its only argument.


banner blue
REM %~1 here is a batch-ism for getting the _first_ parameter to a batch script
rg -iw --pcre2 --glob "!*.xml" %~1


REM Other useful argument combos
REM -tcs -tsql -Txml

The nice thing about this approach is that it lets me pull up an editor when I want to edit a given command, and it makes using multiple arguments to --glob much easier. Were I to start using find on Unix a lot, it’d probably get a similar treatment. It’s definitely a nice thing to have on hand for more involved command-line tools.

Published Jan 12, 2020

Warming up to Unit Testing January 13, 2020 09:00 PM

One of the things that has been a consistent part of my software career is that I often have to approach a given idea or practice at least 3 times before I start to get comfortable with it. It’s happened with managing websites, making video games (my first Ludum Dare was well beyond the 3rd time I’d tried to make a video game), programming languages (both Nim and Factor were languages I approached once, and then again with more experience under my belt), and software development techniques. I got a lot more comfortable with Git after I’d had a chance to use Fossil for a while.

All this to say, that I rarely pick up something completely the first time I go at it, so for any younger colleagues that might have been impressed with my mastery of regex or shell scripting, that mastery was not achieved overnight.

Recently, I’ve started to have another go-round at Unit Testing, and this is feeling like the time around that will have it stick in my habits as well as version control does. And, in the middle of all of this, the reasons why it seems to be sticking seem to be a convergence of factors, not just a single thing.

For one, some sort of approach to automated testing is being required at work. Were this the only factor, I’d probably pick up a little bit of unit testing practice for work, but it would certainly not be a major factor outside of work. The other thing is that I picked up several books at the end of the year that all ended up talking about unit testing and picking the right abstraction.

The first, and possibly most influential was Mastering Software Technique, and the associated articles, by Noah Gibbs. It doesn’t have anything to do with unit testing specifically, but Gibbs consistently recommended 99 Bottles of OOP (Sandi Metz and Katrina Owen), which does have a decent amount in it about unit testing. I also picked up the book Working Effectively with Legacy Code, by Michael Feathers, mostly because I was looking for books that addressed relatively timeless topics, rather than mostly books that address version X of a given framework.

So, I ended up with a lot of programming books about unit testing. I also found in going through 99 Bottles of OOP, that the unit testing harness available for Ruby is relatively responsive, especially compared to the NUnit Visual Studio Test Explorer. Eventually, some bugs, either in my code, or the Test Explorer, led me to trying the nunit command-line runner. The difference was impressive. It makes me think that the nunit command-line tools get more attention than the test runner, because getting the command-line tools working consistently was a lot easier than keeping the test-runner working.

The last thing that seems to be cementing automated testing in my mind as a worthwhile investment was an article I ran across about Moon Pig. Often, when people tell you to adopt a practice, they don’t do a great job communicating the concrete reasons they adopted it. For me, the story of what was mocked for testability reasons in Moon Pig, and how (mocking out the storage and time aspects of things), felt like a great starting point for how to mock things for tests, and it included good, relatively concrete reasons for mocking things out the way they did.

So, now that I have a test-runner that works at a reasonable speed and reliability, and I’m taking in the wisdom of Michael Feathers about Legacy Code, and I have a good story to refer to when I’m reasoning about how to mock things out, I think unit testing as a practice will be more sticky this time than it was before.

PostScript:

I should also give a shout-out to Kartik Agaram in this. Mu, and how low level he’s taking a test-driven approach to software has definitely informed how I write system-type software, as opposed to business software. PISC’s unit tests weren’t a direct result of hearing about Mu, but Mu’s artificial file system and such definitely got me thinking in that direction.

Published Jan 12, 2020

Ponylang (SeanTAllen)

Last Week in Pony - January 12, 2020 January 13, 2020 01:12 AM

We’ve had a boatload of exciting new RFC proposals in the past few weeks! Check them out at https://github.com/ponylang/rfcs/pulls.

January 12, 2020

Derek Jones (derek-jones)

The dark-age of software engineering research: some evidence January 12, 2020 05:28 PM

Looking back, the 1970s appear to be a golden age of software engineering research, with the following decades being the dark ages (i.e., vanity research promoted by ego and bluster), from which we are slowly emerging (a rough timeline).

Lots of evidence-based software engineering research was done in the 1970s, relative to the number of papers published, and I have previously written about the quantity of research done at Rome and the rise of ego and bluster after its fall (Air Force officers studying for a Master’s degree publish as much software engineering data as software engineering academics combined during the 1970s and the next two decades).

What is the evidence for a software engineering research dark ages, starting in the 1980s?

One indicator is the extent to which ancient books are still venerated, and the wisdom of the ancients is still regularly cited.

I claim that my evidence-based software engineering book contains all the useful publicly available software engineering data. The plot below shows the number of papers cited (green) and data available (red), per year; with fitted exponential regression models, and a piecewise regression fit to the data (blue) (code+data).

Count of papers cited and data available, per year.

The citations+data include works that are not written by people involved in software engineering research, e.g., psychology, economics and ecology. For the time being I’m assuming that these non-software-engineering researchers contribute a fixed percentage per year (the BibTeX file is available if anybody wants to do the break-down).

The two straight line fits are roughly parallel, and show an exponential growth over the years.

The piecewise regression (blue, loess was used) shows that the rate of growth in research data leveled-off in the late 1970s and only started to pick up again in the 1990s.

The dip in counts during the last few years is likely to be the result of me not having yet located all the recent empirical research.

January 11, 2020

Gustaf Erikson (gerikson)

Helmet for My Pillow: From Parris Island to the Pacific by Robert Leckie January 11, 2020 02:36 PM

I enjoyed the TV miniseries The Pacific, and this is one of the inspirations for it. Leckie is a good if journeymanlike writer, and the story flows chronologically with no significant pauses. Flashes of class differences, frank discussion of petty criminality and sexual promiscuity, and actual sympathy for the hated enemy enliven the text.

Gergely Nagy (algernon)

2020 January 11, 2020 02:00 PM

I set out to write a year-end retrospective, even ended up with almost a thousand words written, but I got distracted, and never finished. Having sat down to do just that now, I re-read it all, and decided it's best to throw it in the bin. In many ways, 2019 was a terrible year. In some other ways, it was incredible. When I first sat down, I started listing everything that happened, good and bad - but in truth, such a detailed log is not something the great public should read. Yes, yes, that "great" public is likely a handful of people who stumble upon this blog. Doesn't matter. Those who need to know, do. Those who don't, should not.

So instead of looking back, I'll be looking forward.

I will be looking forward to 2020, because we're in our own apartment now, and while not everything's a rosy dream come true (oh, the stories I could tell!), it's a colossal leap forward. It enabled - and continues to enable - us to do so many things we weren't able to do before. We've made friends, plans, and had a lot of fun. I also have a work room, with a lock, making it a million times easier to work from home, and shut out any distractions when immersing myself in keyboard-related work.

As it happens, I've been working for Keyboardio full-time for a while, and it looks like that this might be sustainable for longer than originally planned. Not complaining, mind you, this is a situation I very much enjoy. Doubly so, because I'm also doing similar work for Dygma on the side, and getting paid to do what I love doing is amazing, and I'm incredibly privileged to be in this situation. Every day I wonder when I'll wake up to find out this was just a dream.

Of course, sometimes I have to work on Chrysalis or Bazecor, and then I start to question my life choices, because JavaScript and front-end development in general is something I very much do not enjoy. But alas, that's also work that needs to be done. I guess it's the price to pay for the privilege of being able to work from home, on free software, which I use every single time I interact with the computer, in an area I find great pleasure diving into.

On another note, in 2020, I plan to continue adapting my work environment to be more ergonomic, more comfortable, and more practical. I already have a terrific keyboard, an amazing trackball, and an adjustable standing desk. The next thing is adjusting my monitors: they're currently on the desk - I'd love to mount them on the wall instead, on adjustable arms. That'd free up a lot of space on the desk, and would make it easier to have them at the right height, distance, and angle. It would also allow me to reposition more easily: whether I face the window, or away, for example. I'd also love to upgrade at least one of my monitors to something that's easier on my eyes. At a previous job, I loved the 30" 4k monitors, text was crisp, and the high resolution was so much easier on my eyes. I've yet to find one which I like, and can afford. There's a good chance I will need to upgrade my GPU to be able to drive a 4k monitor, too. Plenty of things to figure out here, but this is a goal I'm pursuing this year - because the rest of my work environment is already top notch, and every single upgrade so far made me more productive. It pays off in the long run.

Another thing I want to do in 2020 is write more. Last year, I wasn't exactly prolific, even though there were a lot of things I could've written about - but when it would've made sense to write, I didn't have the resources to do so, and doing it later wasn't a viable strategy. That's just guilt piling up, to the point where I give up and just don't write. So the goal this year is to build writing into my routine, so it does not get postponed. I do enjoy writing! But there was always something more important to do. Writing felt like an administrative burden - it shouldn't. I'm not sure how to tackle this yet.

With working on keyboard-related things full-time (and part-time too), with plans to improve my working conditions, and plans to write more, you can expect more fun stories on these pages. The next one, as a matter of fact, will be the story of two very confused bytes.

Pete Corey (petecorey)

Timing Streams in Node.js January 11, 2020 12:00 AM

On a current client project, I was tasked with optimizing a very large, very slow, very CPU-bound stream-based pipeline. Before I even started to think about optimizing this pipeline, I needed an objective way to measure the execution time of each step of the pipeline.

Imagine the pipeline in question looks something like this:


pipeline(
    httpStream,
    decodeStream,
    parseStream,
    batchStream,
    processStream
);

We’re reading in a stream of JSON-encoded events (httpStream), making sure they’re appropriately decoded (decodeStream), JSON parsing each incoming event (parseStream), batching events together (batchStream), and finally processing each batch of events (processStream).

Ideally I’d like to measure any or all of these individual steps.

However, many of these stream implementations are out of our hands. We can’t easily reach in and add timing code. Thankfully, we can easily write a function that decorates a provided stream with a simple runtime calculation.

Let’s call our decorator function time:


const time = (stream, name) => {
    return stream;
};

Our time function accepts and returns the stream we’ll be decorating, along with a name that describes the provided stream. It should be noted that it’s assumed that stream implements the Readable interface.

What we’re trying to accomplish here is relatively simple. We want to measure the amount of time that elapses between data emission events on our stream. We can use console.time/console.timeEnd and an event listener to make short work of this task:


const time = (stream, name) => {
    let timing = false;
    stream.on('data', () => {
        if (timing) {
            console.timeEnd(name);
        }
        console.time(name);
        timing = true;
    });
    return stream;
};

Every time we receive a 'data' event on our stream, we log the duration since the last received 'data' event, and start a new timer. We’re using a timing flag to ensure that console.timeEnd isn’t called the first time we receive a 'data' event.

Notice that we’re also using the provided name as the label in our console.time/console.timeEnd calls. This keeps us from getting confused when we start measuring multiple stages of our pipeline.

This solution mostly works. Unfortunately, a data event isn’t fired when the stream starts processing its first chunk of data. This means that we’re missing a measurement for this first chunk of execution time. Thankfully, we can capture that missing metric by also listening for a 'resume' event, which is called when the stream starts processing its first chunk of data:


const time = (stream, name) => {
    stream.on('resume', () => {
        console.time(name);
    });
    stream.on('data', () => {
        console.timeEnd(name);
        console.time(name);
    });
    return stream;
};

Notice that we’re no longer concerned about wrapping our console.timeEnd call in a guard in our 'data' event listener. We know that the 'resume' event handler will always call console.time before we reach our 'data' event handler, so we have no need for the timing guard anymore.

We can use our time function by decorating any or all of the stages of our pipeline:


await pipeline(
    httpStream,
    decodeStream,
    parseStream,
    time(batchStream, 'batch'),
    time(processStream, 'process')
);

Now that our runtime durations are finding their way to the logs, we can either use them as-is, or take things a step further and aggregate them for more in-depth data analysis:

...
batch: 258.213ms
process: 512.493ms
batch: 239.112ms
process: 475.293ms
...

As a warning to the reader, I’ll be the first to admit that I’m no stream expert. That said, this utility function proved invaluable to me, so I thought I’d record what I learned and pass it along for posterity.

Stream on.

January 10, 2020

Jeremy Morgan (JeremyMorgan)

The Developer Tool You Can't Live Without January 10, 2020 10:48 PM

I'm introducing a text / code generation tool that you will fall in love with. If you're a developer or someone who works with text or tabulated data you need this tool.

It's called Nimble Text and it's awesome. Here's how the developer of Nimble Text describes it:

You can be more awesome at your job by keeping this tool always within reach.

NimbleText is a text manipulation and code generation tool available online or as a free download. It magnifies your ability to perform incredible feats of text and data wrangling.

So it's a bold claim to say this will make you better at your job. Sounds crazy right?

I have been using this for years (since 2011-2012) and I can tell it's certainly made me more effective.

Download it and follow along.

How Nimble Text Works

Nimble Text

If you look at the screen that comes up when you first run it, you can get a really good idea of how it works.

  • You paste in some data, usually in columns and rows (comma or tab separated, etc.)
  • You put in your pattern ($0, $1, etc. represent the columns)
  • For each row of data it will substitute the values and display the results

In the sample above, you can see rows of data that appear to be last name, first name, company name.

So let’s look at the top row. In our substitution pattern we’re creating an email address: the pattern shows $1 (the 2nd column; numbering starts at 0), which we know is a first name. Then we have a period, then $0, which we know is the last name, then @, then $2 and .com, which we assume will make Initech.com.

One of the coolest parts of this is that the pattern doesn't need to be line by line and you don't need to be a Regex expert to do this.

Here's another example of how you can quickly add quotes around CSV values. This is using data from Mockaroo.

So we take this CSV file and dump it in the input:

Nimble Text

Then we use this simple pattern, which puts quotes around all the values:

Nimble Text

We press generate, and get this:

Nimble Text

It’s that easy! But this isn’t really impressive, because you probably aren’t doing a ton of CSV format conversions on a daily basis. But there’s a lot of potential here.

How I Use This as a Developer

So there are tons of things you can do with this that are explained on their website. You can do cool things like removing leading and trailing spaces or converting spaces to tabs. I love things like converting to camel case, and I’ve done weird stuff with Base64 encoding.

I won't repeat what's already been written there. I'll tell you how I've been using it all these years.

Let's take our sample data set:

Nimble Text

And we'll see what we can do with it.

Create JSON

Let's say I want to make JSON out of this. I would put in a pattern like this:

{ "id": $0, "first_name": "$1", "last_name": "$2", "email": "$3", "gender": "$4", "ip_address": "$5" }

Nimbletext then prints this out:

Nimble Text

and it will repeat for every row of data. Very cool, and easy.

Make some objects

So as a C# Developer sometimes I'd generate fake data and then use it for Unit testing. With our sample data, I would create a class like this:

```csharp
public class Person {
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public string Gender { get; set; }
    public string IPAddress { get; set; }
}
```

Then, let's say I want to create a collection of these objects. I then enter a pattern like this into Nimble Text:

new Person { Id = $0, FirstName = "$1", LastName = "$2", Email = "$3", Gender = "$4", IPAddress = "$5" },

Then, I click "calculate" and get this:

Nimble Text

Which generates a nice set of mocked objects for testing:

```csharp
List<Person> people = new List<Person> {
    new Person { Id = 1, FirstName = "Derick", LastName = "Giorgietto", Email = "dgiorgietto0@freewebs.com", Gender = "Male", IPAddress = "193.214.16.47" },
    new Person { Id = 2, FirstName = "Jorey", LastName = "Bertomieu", Email = "jbertomieu1@pcworld.com", Gender = "Female", IPAddress = "228.52.120.198" },
    new Person { Id = 3, FirstName = "Jordana", LastName = "Ofield", Email = "jofield2@mashable.com", Gender = "Female", IPAddress = "242.56.206.162" },
    new Person { Id = 4, FirstName = "Zelda", LastName = "Pett", Email = "zpett3@google.nl", Gender = "Female", IPAddress = "53.184.3.220" },
    new Person { Id = 5, FirstName = "Malia", LastName = "McCuffie", Email = "mmccuffie4@noaa.gov", Gender = "Female", IPAddress = "100.137.97.15" },
    new Person { Id = 6, FirstName = "Juliet", LastName = "Sivior", Email = "jsivior5@scientificamerican.com", Gender = "Female", IPAddress = "77.243.6.34" },
    new Person { Id = 7, FirstName = "Trista", LastName = "Filde", Email = "tfilde6@narod.ru", Gender = "Female", IPAddress = "24.158.23.9" },
    new Person { Id = 8, FirstName = "Bartlet", LastName = "Pankhurst.", Email = "bpankhurst7@cmu.edu", Gender = "Male", IPAddress = "61.253.135.113" },
    new Person { Id = 9, FirstName = "Giorgi", LastName = "Verbeke", Email = "gverbeke8@utexas.edu", Gender = "Male", IPAddress = "2.43.176.188" },
    new Person { Id = 10, FirstName = "Issy", LastName = "Ramplee", Email = "iramplee9@com.com", Gender = "Female", IPAddress = "53.253.248.96" }
};
```

I have done this countless times over the years. Once you get it into your regular workflow, mocking up data takes seconds.

SQL Statements

You can even make SQL statements like this:

Pattern:
$ONCEINSERT INTO Person (id, first_name, last_name, email, gender, ip_address) VALUES
$EACH($0, "$1", "$2", "$3", "$4", "$5"),

The $ONCE variable prints the first statement, then $EACH loops through after that. So you get this in return:

```sql
INSERT INTO Person (id, first_name, last_name, email, gender, ip_address) VALUES
(1, "Derick", "Giorgietto", "dgiorgietto0@freewebs.com", "Male", "193.214.16.47"),
(2, "Jorey", "Bertomieu", "jbertomieu1@pcworld.com", "Female", "228.52.120.198"),
(3, "Jordana", "Ofield", "jofield2@mashable.com", "Female", "242.56.206.162"),
(4, "Zelda", "Pett", "zpett3@google.nl", "Female", "53.184.3.220"),
(5, "Malia", "McCuffie", "mmccuffie4@noaa.gov", "Female", "100.137.97.15"),
(6, "Juliet", "Sivior", "jsivior5@scientificamerican.com", "Female", "77.243.6.34"),
(7, "Trista", "Filde", "tfilde6@narod.ru", "Female", "24.158.23.9"),
(8, "Bartlet", "Pankhurst.", "bpankhurst7@cmu.edu", "Male", "61.253.135.113"),
(9, "Giorgi", "Verbeke", "gverbeke8@utexas.edu", "Male", "2.43.176.188"),
(10, "Issy", "Ramplee", "iramplee9@com.com", "Female", "53.253.248.96")
```

Easy as pie! Anything you can do with JavaScript you can do with Nimble Text.

HTML Tables

So this is odd but something I've had to do in the past, and Nimble Text works great for it. Here's how you create an HTML table for our sample data:

```
$ONCE<table>
$ONCE <tr><th>ID</th><th>First Name</th><th>Last Name</th><th>Email</th><th>Gender</th><th>IP Address</th></tr>
$EACH <tr><td>$0</td><td>$1</td><td>$2</td><td>$3</td><td>$4</td><td>$5</td></tr>
$ONCE</table>
```

Click generate and there it is:

[NimbleText screenshot]

A usable HTML table!!

Conclusion

This tool will help you become a better, more effective developer. You can use it for all kinds of code generation. I've done some crazy things with the Keywords and Functions in this program. You can write code to generate code.

I've seen the most benefit with time savings. Things like mocking data or processing CSVs are boring and tedious. When you're doing boring tedious things you make mistakes. So once you work this tool into your workflow you'll work faster with fewer mistakes.

Once it becomes a part of your routine you won't want to work without it.

Download it, try it out, and let me know what you think!

José Padilla (jpadilla)

Podcast: DjangoChat Ep. 45 January 10, 2020 05:14 PM

Back in November I was invited to DjangoChat to talk about my contributions to the Django ecosystem. We talked about how I got started contributing to Django REST Framework and other open source projects, my work on Blimp and FilePreviews, authentication, and my current role at Auth0.

Recording this was a lot of fun. Thanks to William Vincent and Carlton Gibson.

January 09, 2020

Patrick Louis (venam)

Will Lebanon Collapse or Will it Survive January 09, 2020 10:00 PM

Cedrus libani

“Collapse”: the word on everyone’s lips in Lebanon. What does it mean, will Lebanon fall or survive, and what does the future have in store? “We can predict everything, except the future”, I hear someone say, but can we at least sketch out some possibilities?

Primo, we have to define what is a societal collapse and how it’s related to an economic collapse.
The definitions are broad, a societal collapse can be about a simple change in leadership or governance, a whole change in cultural dynamics like merging with another society and forming an inter-regional structure, the disappearance of traditions and ways of living (aka anomie, you learned a new word today), a population scattering and leaving a geographical area, or the annihilation of a population. Even though we imagine a collapse as being sudden, it can still happen very slowly.

Some scholars, enamored with population studies and social Darwinism, posit that societal collapses are a normal response to a population crisis. Their case is that in any society resources will eventually get depleted, be it due to overpopulation or other reasons. When a society reaches such a state, according to their studies on mammals, the population’s response will be to switch from a model of cooperation and parental behavior to one of competition, dominance, and violence. Altogether, this leads to a societal collapse, which rebalances the population and stabilises it again, the cycle oscillating back and forth. Game theorists would also endorse such ideas.
No wonder there’s so much aversion to social Darwinism and cold-hearted economists.

Therefore, an economic collapse is often correlated with a societal collapse; it can happen before or after it.

Secondo, what can be the reasons for collapse.
According to Jared Diamond in his popular book “Collapse: How Societies Choose to Fail or Succeed”, there are five key reasons:

  • Environmental degradation (depletion of resources, including overpopulation)
  • Changes in the climate
  • Hostile neighbors
  • Weakened trading partners
  • Absence of cultural resource and attitude to tackle these issues

I can already sense the smirk on your face but before diving into those, let’s mention another position by Joseph Tainter.
Tainter puts forward four axioms that he says are a must for understanding collapses.

  • Human societies are problem-solving organizations;
  • Sociopolitical systems require energy for their maintenance;
  • Increased complexity carries with it increased costs per capita; and
  • Investment in sociopolitical complexity as a problem-solving response reaches a point of declining marginal returns.

The smirk didn’t disappear, am I right?

Let’s see, do we have environmental degradation in Lebanon? Maybe we do:

Do we have a change in climate? Maybe we do:

I don’t think I have to even quote articles about our hostile neighbors.

And what about our trading partners, did we lose any?

Most importantly, what about our cultural attitude, what Jared Diamond calls axiological flexibility: the ability of a society to change its cultural values.
Indeed, our sectarian and political divisions implant in us an unprecedented cultural stiffness. The list of things we are banning keeps getting longer.

So that was that, and I feel like I’ve skipped a lot of things..
Regarding Tainter’s view, what can we say, has Lebanese society become too complex for its own good?

How much energy in our system are we using just to maintain our sociopolitical complexity? I’d say quite a lot. And isn’t this a reiteration of our initial definition of collapse: resources getting scarce? Things are getting circular.
While it’s easy to scapegoat a single factor, such as a certain political group, it’s the culture as a whole that is to blame.

Still, despite all this pessimism, the Lebanese strive for better lives. We can’t stop, and we can’t give up hope because of some environmental-determinism explanation; that would be inconsiderate of the humanity in people. A simple explanation can’t possibly account for events in the Middle East, right? What about Black Swan events, those improbable, major game-changing events that are inappropriately rationalized after the fact, as the Lebanese author Nassim Taleb puts it?

Can we somehow grab some hope from somewhere, anything? What can we do to boost the economy and the society?

In my opinion, Lebanon, as a small country, has to take advantage of the intangible economy to thrive. So what is needed to create a boom in the intangible economy?

In the book “Capitalism Without Capital: The Rise of the Intangible Economy”, three societal criteria are a must for a small country to take advantage of the intangible economy.

  • Societal and external trust
  • Small power-distance
  • Societal openness to experience

Apart from this the country can create hubs by having:

  • Low renting price
  • High synergy places, places where it’s pleasant to meet others
  • Unambiguous laws that attract international companies

Oh well… You got that smirk back on your face I presume!

Although it may sound gloomy, I’ll end this article on a note from the sociologist Charles Fritz, who surprisingly asks, in his 1961 paper: “Why do large-scale disasters produce such mentally healthy conditions?” Bizarrely, a catastrophe such as a societal and economic collapse doesn’t necessarily result in social breakdown and trauma; it may instead lead to greater cooperation between groups.

Maybe this is what we are witnessing with this Lebanese revolution.











Attributions:

  • ALBA-BALAMAND [CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)]

Nikola Plejić (nikola)

Book Review: "How to Read a Book" January 09, 2020 01:11 PM

I don't really review the books I read, but I decided to write a short blurb about the oft-recommended "How to Read a Book: The Classic Guide to Intelligent Reading" by Mortimer J. Adler and Charles Van Doren. I originally posted it on Goodreads, but I'm publishing it here, too, for good measure:

There's a lot to not like about this book: the slightly hermetic style, the occasional sexist slur, the subtly condescending tone, its exclusive--and, admittedly, somewhat apologetic--orientation to the "Western" literary canon, and the fact that the "recommended reading list" includes a single non-male author.

Keeping in mind that it was written in the 1940s, and despite these non-negligible shortcomings, I still find the book thoroughly insightful and valuable for what it is: a manual for analytical and comparative reading of "difficult" books, for whatever definition of "difficult" the reader might choose. It's a deeply practical book, sometimes to a fault, and many of its takeaways might seem obvious. Yet, when outlined in a systematic and formal way, with plenty of examples and illustrations, I believe they give a good framework for approaching demanding literature.

Most importantly, the book forces you to think critically about the act of reading, and this might be its greatest contribution of all: it has certainly made me think about the way I approach books, and it has given me a few new tools to do so.

Gustaf Erikson (gerikson)

The Stars My Destination by Alfred Bester January 09, 2020 10:08 AM

For some reason I’ve not read this classic from 1956 before. I’m glad I did.

Although this follows the basic Count of Monte Cristo formula, it has enough SF concepts for many novels. The central conceit of personal teleportation implies disease spread, new forms of criminality, new forms of urban development, threat of inter-system war - all summarized in breezy paragraphs.

Bester has also thought about the implications for those who because of brain damage or injury cannot “jaunt” - rehabilitation, or degrading slave wage labor at workplaces where jaunting is impractical.

The view of women is from its time, but could be plausibly explained by a neo-Victorian reaction in the novel. The female characters are thinly drawn, but not devoid of agency.

Unrelenting Technology (myfreeweb)

Firefox content process sandboxing with Capsicum: it’s alive! January 09, 2020 01:37 AM

Work in progress, but I have WebGL and audio working :)

January 08, 2020

Patrick Louis (venam)

Professional Software Engineering Topics And Practices January 08, 2020 10:00 PM

Roman blacksmith

As the field of SE/CS gets more press, graduates are flooding the market. Yet the curriculum offered by many universities still seems barren when it comes to professionalism, forcing newcomers to learn via unpolished street credibility. Not only does this lead to mysticism about which skills are required, it also leads to a lack of discipline, duty, and craftsmanship.

We would be tempted to assume that those graduates would be instantly digitally/web literate and professional in their field; however, with the ever-increasing volume of knowledge stacking up, we’re at a loss. What we’re left with as a solution is the slow and painful process of passing the flame via mentorship and deliberate practice. And to be fair, universities may not be the place for that type of learning.
With this in mind, let’s review some of the topics and practices that are important to know. Let’s write down what seems dull but that no one dares to express.





How to Ask Questions

It is easy to underestimate the value of asking questions properly. While in a class setting we’re eager to say “No such thing as a stupid question”, it’s not quite the same in the software world (Though most people asserting they’ll be asking a stupid question are the ones asking legit ones). Questioning turns into an introspective activity that is honed over the years.

Before asking a question, it is expected that a decent amount of research be done such that it can be formulated without ambiguity or misinterpretation and with confidence, basically knowing exactly what you want to ask. That means searching the web, checking online manuals, reading FAQ (Frequently Asked Questions), diagnosing and experimenting, and even reading the source code if possible. A lot of common questions have been answered at least once. With all this, the question should be precise and concise.
Unsurprisingly, this is enough to answer the vast majority of questions. For the rest, which can’t be answered alone, we have to try our luck and ask the appropriate people. Finding them is a quest of its own; unless they are sitting close by, it could be a forum, Stack Overflow, an IRC channel, a mailing list, etc. When the medium of discussion is found, the question should be formulated in a way that respects it: easy to reply to, with an accurate title if requested, free of grammatical errors, not overly verbose, and, again, precise and informative.
After posting the question, there may or may not be replies. In a lot of cases there aren’t, and we should go back to researching. This is why many foresee the complexity of the question they are about to ask, and the lack of experts on the topic, and prefer not to ask at all, sticking to research and sometimes later posting their findings in blog posts and articles. If there are replies but the question was misinterpreted, the wording might have been inadequate and the question can be reformulated. And if someone replies with RTFM (Read the Fucking Manual) or anything rude, don’t worry, this is the internet; don’t take it personally.

Searching Online

A lot of questions get asked not because they weren’t researched enough but because the searching methodology itself wasn’t grasped. Even though most people use search engines everyday, not everyone knows how to use them when it comes to advanced subjects. Indeed, like any tool, there’s a skill set to learn.

The first thing to realize is that search engines are not omniscient: they cannot read minds, and they will always give the easiest and most popular results for the query they are given. In short, it is a constant battle. Full sentences are not interpreted unless they are part of a preset; while queries like “What is the weather today” may work on some search engines such as Google, that is the exception rather than the norm. For anything else, the search engine usually splits the terms on spaces, searches for each term individually, and returns links that match as many of them as possible, prioritized by popularity, sometimes respecting the order of the words and sometimes not. So how do we cope with this?
To let search engines know what we want, we have to speak their language. For example, putting words inside quotes tells the search engine that we absolutely want them in the results, prepending a word with a minus sign tells it to omit results that contain it, and using a keyword such as “site:” followed by a domain restricts results to that domain. These are ways to calibrate the search criteria.
In theory, with this method we can find what we want. However, what if we don’t know what we want, what if we’re unaware of the exact wording to use, what if what we want doesn’t match what exists on the web, what if what we want is buried under thousands of popular clickbait articles, or what if what we’re searching for is based on something we forgot and are trying to remember? Then we have to enter an iterative process with the search engine: we ask it something, check the results, change the query to get different results based on what we got, and repeat until we find what we’re looking for.
This iterative search, surprisingly, is something not a lot of people learn, or are comfortable with, but it is mandatory in the tech world. Additionally, it is this sort of procedure that leads to the discovery of interesting topics. Nonetheless, it requires patience and a hunger for knowledge, the so-called “learning how to learn”.

Roles in a Team

A lot of early entrants to the field dream of becoming the media-fetishised lone-wolf superstar developer; fortunately, that couldn’t be further from the truth. In a professional context, getting entranced by a condescending feeling about one’s own skills is counter-productive. On the contrary, teamwork, understanding, and humbleness are what drive productivity forward.
Any project bigger than a certain size has dynamics that are better managed as a group endeavour. In well-functioning teams, members are expected to handle their responsibilities with professionalism and to respect others’ professionalism. Furthermore, each task should be properly delegated to the suitable person or team while simultaneously navigating the boundaries between roles. In contemporary workplaces, everyone is a leader in their own role: they should be dependable and are expected to move forward without being commanded and micromanaged at every step.
Although freelancing might be appealing, it requires the maturity to take on all the usual roles found in a team. Let’s take a look at the roles that can exist.

  • Stakeholders: Generically meaning anyone that has a stake in the project
  • Engineering manager: Responsible for delegating tasks and managing people
  • Product manager: Knows everything about the project, the what of the project
  • Software, project lead: Implements the project with the help of a team, knows how to do it
  • Software architect: Designs, documents, analyses requirements, and helps the team stay on track to deliver a solution that stands the test of time
  • Software engineer: Generic term for someone that works with software
  • IT, networking infrastructure: Is responsible for physical infrastructures, usually related to network, and their configuration
  • Operation team: A role that is slowly being deprecated; they are tasked with going on premises, or working remotely, to install, package, configure, monitor, and support a product and system once it is live.
  • Database administration: The keeper of the data
  • Site reliability Engineer: Sort of an upgrade on the operation team with the addition that they focus on scalable and reliable systems
  • QA (quality assurance): Ensures the software is up to quality standards before going live by making it pass tests, sometimes in a pipeline. This can also include following live metrics
  • Graphic UI/UX designer: In charge of graphics and images, how people will use the interface, and what it will look like. They iterate back and forth with behaviour testing and can also be interested in live metrics on this
  • Frontend engineer: Engineers that will implement the front-facing, graphical part, of the application
  • Backend engineer: Engineers that will implement the business rules of the application which the front-facing part relies on
  • DevOps: A mix between a software engineer and an operation engineer, usually in an environment where continuous integration is in place
  • Senior engineer: Someone that has worked long enough on projects in the company or with a certain technology to know it from head to toe
  • Consultant: An expert hired by the company to give inputs and guidance in a project
  • Fullstack engineer: An engineer burdened with doing frontend, backend, operation, and database management

Reading and Writing Documentation

Contrary to popular belief, programming isn’t about mindlessly click-clacking on the keyboard from 9-to-5 and then clocking out; nor are most jobs these days, for that matter. A big chunk of time is allocated to reading and understanding requirements, researching, and then planning how to tackle problems. Consequently, getting comfortable with technical documents is a must.
It’s not uncommon to witness the shocked look on new developers’ faces as they open big documentation pages for the first time. The sight of such gargantuan documents, to a generation that hates reading and has a decreasing attention span, appears like a monster impossible to tame. However, that is only because they approach it with the wrong mentality: they are too accustomed to having to memorize, and to being interrogated on, anything they’ve set their eyes upon. Instead, documentation should be treated as a treasure hunt: finding, extracting, and translating value into workable solutions to problems.
Similar to how we read research papers, documentation should be read from the generic to the specific, focusing on units of understanding. It could be read from “cover to cover”, but it usually isn’t. Let’s also mention that there are different types of documentation, each with a different aim: tutorials, how-tos, explanations, and reference guides, though the last of these is what we generally call documentation.

We recognize the high value of documentation when working with third-party systems and libraries, and we should carry this value over into our own practices. Newcomers, who should already be familiar with inline comments within code, should be introduced to writing documentation using a format such as Markdown, and to using diagramming tools to illustrate things.
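To make that concrete, here is a minimal, hypothetical sketch of reference-style documentation at the code level (Javadoc in Java); the Retry class, its method, and its parameters are all invented for illustration:

```java
import java.util.function.Supplier;

public class Retry {
    /**
     * Runs the given action up to {@code attempts} times, returning the first
     * successful result.
     *
     * <p>This is the contract a future maintainer needs: what the method does,
     * not how it is implemented.</p>
     *
     * @param action   the operation to run; it should be safe to run more than once
     * @param attempts how many times to try before giving up (must be at least 1)
     * @return the value produced by the first successful attempt
     * @throws IllegalArgumentException if {@code attempts} is less than 1
     * @throws RuntimeException the last failure, if every attempt fails
     */
    public static <T> T retry(Supplier<T> action, int attempts) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the most recent failure for the caller
            }
        }
        throw last;
    }
}
```

The same habit scales up: a short Markdown page per module answering "what is this and how do I use it" is the treasure map the next reader will hunt through.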

Debugging

Junior engineers and graduates have surely heard about debugging, but how many of them have actually used a debugger or thought about debugging software at scale? Debugging is an arduous process that requires extreme patience and calmness, two zen-like characteristics that are only acquired through strenuous hours spent doing it.
Debugging can be done via simple printing, via logging (distributed or not), via tracing, or via debugging tools that let us step through the code. It can be enabled by configuration, by debug releases, by compiler flags, or in the test pipeline. Debugging sessions will take a sizable amount of time if they are not facilitated in some way.

Logging

As with debugging, logging is something we appreciate once a project reaches a certain scale. And, as with the rest of this article, it’s not often on junior developers’ radar.

Logging is an art, and everyone is a critic of how it’s done. It can have levels ranging from info to error; it can have a specific format; it can be stored in different places (a file or a database); it can have a rotation scheme; it can be built into the software or delegated to another component (such as the operating system); and it can be distributed or centralized.
The crucial part to consider is context: logging is all about keeping context alive and being able to trace it when needed.
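As a small illustration of levels and context, here is a hedged sketch using the JDK's built-in java.util.logging; the PaymentService class, the request and customer identifiers, and the message format are invented, and a real project would likely pick a dedicated logging library:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PaymentService {
    private static final Logger LOG = Logger.getLogger(PaymentService.class.getName());

    public void charge(String requestId, String customerId, long amountCents) {
        // Context (request and customer IDs) is carried in every message so a
        // single request can be traced across log files later.
        LOG.info(() -> String.format("[%s] charging customer=%s amount=%d",
                requestId, customerId, amountCents));
        try {
            // ... call the payment gateway here ...
        } catch (RuntimeException e) {
            // Error level plus the exception keeps the stack trace and the context together.
            LOG.log(Level.SEVERE,
                    String.format("[%s] charge failed customer=%s", requestId, customerId), e);
            throw e;
        }
    }
}
```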

Working From Requirements

Unlike universities and other academic institutions, workplaces rarely give instructions on how to implement solutions. They more likely share abstract requirements and user stories. The burden is on the developer to massage their way around those requirements and create a roadmap and an estimation.

What is needed is the expertise to compress the requirements into a design that can be openly discussed and defended with coworkers during meetings. Although an enterprise architect may limit the choice of technologies, even to ones you dislike, there is still flexibility in how to use them.
Depending on the situation and what is agreed upon with management, a method of implementation can be chosen based on: experience-based analogy, back-of-the-envelope calculation, thought experiment, simulation, prototype, experiment, or instrumentation. The implementation should then be weighed according to its strengths and weaknesses, which should be measurable. Indeed, there is always a trade-off between the so-called “-ilities”: availability, interoperability, modifiability, performance, security, testability, usability, etc.
In spite of all this, we should avoid analysis paralysis and work on the project while still avoiding technical debt.

Programming Paradigm

Programmers should comprehend the differences between the three main programming paradigms: procedural programming, object-oriented programming, and functional programming. Additionally, knowledge of different languages, with their advantages and disadvantages, is a plus: from interpreted, to just-in-time compiled, to ahead-of-time compiled languages.
There are also higher-level paradigms that can be interesting additions to a developer’s toolbox: AOP (Aspect Oriented Programming), dynamic programming, event-driven programming, natural language programming, parallel programming, meta-programming, microsystems, etc. There are so many of them.
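To make the contrast between the three main paradigms tangible, here is a small sketch of the same toy problem (summing a list) written in each style; Java is used only because it can express all three, and the class and method names are invented:

```java
import java.util.List;

public class ParadigmsDemo {
    // Procedural: a sequence of steps mutating local state.
    static int sumProcedural(List<Integer> xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    // Object-oriented: state and behaviour bundled behind an object.
    static class Accumulator {
        private int total = 0;
        void add(int x) { total += x; }
        int total() { return total; }
    }

    // Functional: no visible mutation, the result is built by composing operations.
    static int sumFunctional(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        Accumulator acc = new Accumulator();
        xs.forEach(acc::add);
        System.out.println(sumProcedural(xs) + " " + acc.total() + " " + sumFunctional(xs));
    }
}
```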

Design Patterns and Modularity

The previous section leads into this one: actualizing higher paradigms into common patterns. These design patterns have been collected and grouped over the years because they emerged in a similar fashion from people facing similar problems.
Learning to take advantage of them, anticipating change and minimizing complexity, is a must for professionals. Developers should, at least once in a while, dive into design patterns, like the GoF catalogue, so as not to feel out of touch.
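As one example of the kind of pattern meant here, below is a minimal sketch of the GoF Strategy pattern; the pricing domain, class names, and numbers are invented for illustration:

```java
import java.util.List;

public class StrategyDemo {
    // Strategy (GoF): the varying behaviour is pulled out behind an interface,
    // so new pricing rules can be added without touching the checkout code.
    interface PricingStrategy {
        long price(long baseCents);
    }

    static class RegularPricing implements PricingStrategy {
        public long price(long baseCents) { return baseCents; }
    }

    static class SalePricing implements PricingStrategy {
        public long price(long baseCents) { return baseCents * 80 / 100; } // 20% off
    }

    static long checkout(List<Long> items, PricingStrategy strategy) {
        return items.stream().mapToLong(strategy::price).sum();
    }

    public static void main(String[] args) {
        List<Long> cart = List.of(1000L, 2500L);
        System.out.println(checkout(cart, new RegularPricing())); // 3500
        System.out.println(checkout(cart, new SalePricing()));    // 2800
    }
}
```

The point of the pattern is that checkout() never changes when a new pricing rule is needed; only a new strategy class does, which is what "anticipating change" looks like in code.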

Best Practices and Reviews

Each programming language, each programming environment, each programming culture, and each company has its own set of standards and agreed best practices.
For craftsmen, these standards are golden rules; they are what creates cohesion between people, from the syntax formatting style used within projects to their file structure.
Who chooses those rules depends on where they are applied: it could be a consortium, the enterprise architect, a senior engineer, or a project lead.
In companies, checking for compliance with the best practices normally happens in a code review session, where a peer makes sure the code to be committed is up to the standards.
Here’s a list of common things that can be checked:

  • The code is correct, follows the requirements, and achieves the result the stakeholders desire
  • The code is easy to read and easy to understand
  • The code follows the agreed syntax style
  • The code has unit tests
  • The code uses strong typing
  • The code avoids unnecessary repetition (DRY)
  • The code follows defensive programming practices
  • The design choice is good
  • The design follows SOLID principles
  • Documentation is present where it is needed

Unit Tests

Another way to enforce standards is to enforce them programmatically through unit tests. Unit tests are composed of a battery of checks for verifiability (comparing inputs and outputs against an oracle), for respect of the architectural requirements, for interoperability, for speed, and for all the other “-ilities”.
When code is testable, it is also manageable.

Other types of tests include static tests, end-to-end tests, black-box and white-box tests, integration tests, regression tests, acceptance tests, etc.
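A minimal sketch of what such a test looks like, assuming JUnit 5 on the classpath; the PriceFormatter class and its contract are invented for illustration:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class PriceFormatterTest {
    // The "oracle": for a known input we assert the exact expected output.
    @Test
    void formatsCentsAsDecimal() {
        assertEquals("12.34", PriceFormatter.format(1234));
    }

    // Edge cases are part of the contract too.
    @Test
    void rejectsNegativeAmounts() {
        assertThrows(IllegalArgumentException.class, () -> PriceFormatter.format(-1));
    }
}

// Minimal class under test, included so the sketch is self-contained.
class PriceFormatter {
    static String format(long cents) {
        if (cents < 0) throw new IllegalArgumentException("negative amount");
        return String.format("%d.%02d", cents / 100, cents % 100);
    }
}
```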

Storage Technology

Business is nothing without its data and, by extension, its data storage. Therefore, making the right decision on which technology to use is crucial, as data migrations are expensive and risky. To make a decision we need a basic understanding of the general characteristics of the different storage options and a way to map them to our requirements.
Are we storing temporary information or long-term data? Does caching suffice? Are we going for Atomicity, Consistency, Isolation, and Durability (ACID) or Basic Availability, Soft state, and Eventual consistency (BASE)? Are we going to use an Object-Relational Mapper (ORM) to abstract access to the storage, or are we going to write the queries ourselves? Are we going to have redundancy for the storage, and how is the syncing going to be done? Are we choosing a distributed data store? Are we going for SQL, NoSQL, or something else?
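For the ORM-versus-hand-written-queries question, here is a hedged sketch of the hand-written side using plain JDBC; the connection URL, credentials, and person table are invented, and an ORM would generate roughly this plumbing for you:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PersonQuery {
    // Hand-written SQL via JDBC: full control over the query, no ORM layer.
    public static String emailFor(int id) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/demo", "demo", "demo"); // hypothetical database
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT email FROM person WHERE id = ?")) {
            stmt.setInt(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("email") : null;
            }
        }
    }
}
```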

Configuration Mechanisms

It’s rare to encounter software that isn’t flexible and doesn’t allow its users to configure it. A certain ease with textual configuration formats is required of professional software engineers. At a glance, it should be obvious whether we are dealing with YAML, XML, JSON, or a custom Domain Specific Language (DSL). Everyone develops their own preferences.
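As a minimal illustration of reading textual configuration, here is a sketch using Java's built-in key=value Properties format (just one of the many formats mentioned above); the file name and keys are invented:

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class AppConfig {
    public static void main(String[] args) throws IOException {
        // app.properties is a hypothetical key=value file, e.g.:
        //   listen.port=8080
        //   storage.url=jdbc:postgresql://localhost/demo
        Properties props = new Properties();
        try (FileReader reader = new FileReader("app.properties")) {
            props.load(reader);
        }
        // Defaults keep the program usable even when a key is missing.
        int port = Integer.parseInt(props.getProperty("listen.port", "8080"));
        System.out.println("Listening on port " + port);
    }
}
```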

Package Managers

Since the 60s, people have been sharing code amongst themselves for convenience. Initially it was done manually, by FTP and email, but nowadays we share libraries through the package manager of the language we are using, or the operating system’s global package manager.
Seeking out dependencies, managing them, and choosing them isn’t always straightforward. Does it do what we want, does it come with documentation, is it open source, is it maintained, and will it be maintained in the future? Those are questions that should be asked before committing to a third-party library.

Version Control System

How can a team work on the same codebase without conflicts, and how can we know who did what and when? This is what a version control system resolves. Well, that’s the theory. In reality it isn’t sufficient: we also need to follow a development methodology to avoid messiness.
Git and SVN are the most widely used version control systems; if someone isn’t acquainted with them, they soon will be, willingly or not. And if someone hasn’t yet encountered the meaning of “merging”, they soon will.
To make this less painful, shared code should be compartmentalized into separate tasks and committed to the version control system with a meaningful, traceable history.

Methodologies

A version control system is for sharing code; the rest (the conversation, assignment of tasks, bug reports, and the workflow) is handled by collaboration and workflow tools such as GitHub, GitLab, Bitbucket, FogBugz, Jira, Trello, etc. They all have different uses but share the collaboration, communication, and transparency components. So what’s a development methodology?
Software development methodologies are management techniques to split work into distinct phases, it’s also called a software development lifecycle (SDLC) as it tracks the software life/features/releases.

It’s trendy to talk about methodologies because they are central to the proper rhythm of building software; they’re part of the daily routine of engineers. Agile methodologies, such as Scrum, XP, and Lean, are gaining traction with their fast pace and are superseding the bad reputation of waterfall methodologies. As much as it’s tempting to give advice to newcomers on this topic, nothing beats hands-on experience and making one’s own judgement about particular methodologies. Watching videos and following the arguments should give an idea of the situation.

Continuous Development, Pipelines, Deployment, Fast Release Cycles

At some point in time the software needs to go live; it needs to be shipped to the customer. To accomplish this we could delegate it to the QA team, then to the operations team if everything is in order. That’s a manual pipeline to reach production.
What we call continuous development is when we automate as much as possible of the deployment phase: packaging automatically after code checkout, testing automatically, and releasing online automatically. Jenkins, Spinnaker, Travis CI, and GitLab CI are popular continuous integration tools.

Maintenance

Last but not least comes the maintenance of software. Maintenance can be split into two parts: one is about monitoring the health of a running system, the other about upgrading legacy systems. They don’t have the same meaning, but both are regarded as maintenance.

Monitoring is where the importance of logging, instrumentation, and reporting comes into play. Some requirements and bugs can only be discovered in production, so we need a way to record what happens in our application and to keep monitoring it for anomalies. A reliable system will send alarms to a response team in case of failure.

As for upgrading and working with legacy code, it isn’t instinctive at all. Things that may feel simple on the surface may turn out to be complex, and we’ll keep wondering where the complexity emanates from. To keep our sanity we have to remember the Chesterton’s Fence principle: maybe there’s a reason for the complexity.
This takes us back to our first section about asking questions. We may need to ask seniors for clarification to be able to understand the whole system, which may not look impressive from the outside but somehow required the complexity it currently has… And perhaps our questions will remain unanswered.

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

Conclusion

That’s it: so much content written while still not diving deeply into any single topic.

I’m aware that I’ve presented an idyllic scenario that might not be applicable to your current job. However, that doesn’t take anything away from the weight of these practices, or from the ability of newcomers to learn them and deliberately practice them in their own time.

My hope is that having this out in the wild will act as a mini-guide to new software engineers that want to get started in their professional lives but that don’t feel like university and online courses are enough.

Have a fantastic career!












Attributions:

  • Unknown. The author of the image is not given in the book, although it might be the author of the text entry Josef Pokorný. [Public domain]

January 05, 2020

Derek Jones (derek-jones)

Performance variation in 2,386 ‘identical’ processors January 05, 2020 10:41 PM

Every microprocessor is different: random variations in the manufacturing process result in transistors, and the connections between them, being fabricated with more/less atoms. An atom here and there makes very little difference when components are built from millions, or even thousands, of atoms. The width of the connections between transistors in modern devices might only be a dozen or so atoms, and an atom here and there can have a noticeable impact.

How does an atom here and there affect performance? Don’t all processors, of the same product, clocked at the same frequency deliver the same performance?

Yes they do: an atom here or there does not cause a processor to execute more/less instructions at a given frequency. But an atom here and there changes the thermal characteristics of processors, i.e., causes them to heat up faster/slower. High performance processors will reduce their operating frequency, or voltage, to prevent self-destruction (by overheating).

Processors operating within the same maximum power budget (say 65 Watts) may execute more/less instructions per second because they have slowed themselves down.

Some years ago I spotted a great example of ‘identical’ processor performance variation, and the author of the example, Barry Rountree, kindly sent me the data. In the weeks before Christmas I finally got around to including the data in my evidence-based software engineering book. Unfortunately I could not figure out what was what in the data (relearning an important lesson: make sure to understand the data as soon as it arrives), thankfully Barry came to the rescue and spent some time doing software archeology to figure out the data.

The original plots showed frequency/time data of 2,386 Intel Sandy Bridge XEON processors (in a high performance computer at the Lawrence Livermore National Laboratory) executing the EP benchmark (the data also includes measurements from the MG benchmark, part of the NAS Parallel benchmark) at various maximum power limits (see plot at end of post, which is normalised based on performance at 115 Watts). The plot below shows frequency/time for a maximum power of 65 Watts, along with violin plots showing the spread of processors running at a given frequency and taking a given number of seconds (my code, code+data on Barry’s github repo):

Frequency vs Time at 65 Watts

The expected frequency/time behavior is for processors to lie along a straight line running from top left to bottom right, which is roughly what happens here. I imagine (waving my software arms about) the variation in behavior comes from interactions with the other hardware devices each processor is connected to (e.g., memory, which presumably have their own temperature characteristics). Memory performance can have a big impact on benchmark performance. Some of the other maximum power limits, and benchmark, measurements have very different characteristics (see below).

More details and analysis in the paper: An empirical survey of performance and energy efficiency variation on Intel processors.

Intel’s Sandy Bridge is now around seven years old, and the number of atoms used to fabricate transistors and their connectors has shrunk and shrunk. An atom here and there is likely to produce even more variation in the performance of today’s processors.

A previous post discussed the impact of a variety of random variations on program performance.

Update start
A number of people have pointed out that I have not said anything about the impact of differences in heat dissipation (e.g., faster/slower warmer/cooler air-flow past processors).

There is some data from studies where multiple processors have been plugged, one at a time, into the same motherboard (i.e., low budget PhD research). The variation appears to be about the same as that seen here, but the sample sizes are more than two orders of magnitude smaller.

There has been some work looking at the impact of processor location (e.g., top/bottom of cabinet). No location effect was found, but this might be due to location effects not being consistent enough to show up in the stats.
Update end

Below is a png version of the original plot I saw:

Frequency vs Time at all power levels

Bogdan Popa (bogdan)

Running Racket on iOS January 05, 2020 06:00 PM

/u/myfreeweb pointed out to me in a lobste.rs thread yesterday that Racket compiles just fine on aarch64 and that led me down a rabbit hole trying to get Racket running inside an iOS application. I finally succeeded so I figured I'd write down my findings in hopes of helping future Racketeers (myself included) going down this path!

Compile Racket for macOS

A recent-enough version of Racket is required in order to compile Racket for iOS.

Ponylang (SeanTAllen)

Last Week in Pony - January 5, 2020 January 05, 2020 05:59 PM

Ponyup now supports the macOS builds of ponyc! We highly recommend that you update ponyup to the latest version, even if you are a Linux user.

Dan Luu (dl)

Algorithms interviews: theory vs. practice January 05, 2020 12:00 AM

When I ask people at trendy big tech companies why algorithms quizzes are mandatory, the most common answer I get is something like "we have so much scale, we can't afford to have someone accidentally write an O(n^2) algorithm and bring the site down"1. One thing I find funny about this is, even though a decent fraction of the value I've provided for companies has been solving phone-screen level algorithms problems on the job, I can't pass algorithms interviews! When I say that, people often think I mean that I fail half my interviews or something. It's more than half.

When I wrote a draft blog post of my interview experiences, draft readers panned it as too boring and repetitive because I'd failed too many interviews. I should summarize my failures as a table because no one's going to want to read a 10k word blog post that's just a series of failures, they said (which is good advice; I'm working on a version with a table). I’ve done maybe 40-ish "real" software interviews and passed maybe one or two of them (arguably zero)2.

Let's look at a few examples to make it clear what I mean by "phone-screen level algorithms problem", above.

At one big company I worked for, a team wrote a core library that implemented a resizable array for its own purposes. On each resize that overflowed the array's backing store, the implementation added a constant number of elements and then copied the old array to the newly allocated, slightly larger, array. This is a classic example of how not to implement a resizable array since it results in linear time resizing instead of amortized constant time resizing. It's such a classic example that it's often used as the canonical example when demonstrating amortized analysis.
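For readers who want the difference spelled out, here is a hedged sketch of the two growth policies; this is illustrative code, not the actual library described above:

```java
// Two ways to grow a resizable array when its backing store overflows.
public class GrowthDemo {
    // Anti-pattern from the post: grow by a constant, so appending n elements
    // triggers O(n) resizes that each copy O(n) elements: O(n^2) copying overall.
    static int[] growByConstant(int[] backing) {
        int[] bigger = new int[backing.length + 16];
        System.arraycopy(backing, 0, bigger, 0, backing.length);
        return bigger;
    }

    // Textbook fix: grow geometrically (e.g. double), so total copying over n
    // appends stays O(n), i.e. amortized constant time per append.
    static int[] growByDoubling(int[] backing) {
        int[] bigger = new int[Math.max(1, backing.length * 2)];
        System.arraycopy(backing, 0, bigger, 0, backing.length);
        return bigger;
    }
}
```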

For people who aren't used to big tech company phone screens, typical phone screens that I've received are one of:

  • an "easy" coding/algorithms question, maybe with a "very easy" warm-up question in front.
  • a series of "very easy" coding/algorithms questions,
  • a bunch of trivia (rare for generalist roles, but not uncommon for low-level or performance-related roles)

This array implementation problem is considered to be so easy that it falls into the "very easy" category and is either a warm-up for the "real" phone screen question or is bundled up with a bunch of similarly easy questions. And yet, this resizable array was responsible for roughly 1% of all GC pressure across all JVM code at the company (it was the second largest source of allocations across all code) as well as a significant fraction of CPU. Luckily, the resizable array implementation wasn't used as a generic resizable array and it was only instantiated by a semi-special-purpose wrapper, which is what allowed this to "only" be responsible for 1% of all GC pressure at the company. If asked as an interview question, it's overwhelmingly likely that most members of the team would've implemented this correctly in an interview. My fixing this made my employer more money annually than I've made in my life.

That was the second largest source of allocations; the largest was converting a pair of long values to byte arrays in the same core library. It appears that this was done because someone wrote or copy-pasted a hash function that took a byte array as input, then modified it to take two inputs by taking two byte arrays and operating on them in sequence, which left the hash function interface as (byte[], byte[]). In order to call this function on two longs, they used a handy long to byte[] conversion function in a widely used utility library. That function, in addition to allocating a byte[] and stuffing a long into it, also reverses the endianness of the long (the function appears to have been intended to convert long values to network byte order).

Unfortunately, switching to a more appropriate hash function would've been a major change, so my fix for this was to change the hash function interface to take a pair of longs instead of a pair of byte arrays and have the hash function do the endianness reversal instead of doing it as a separate step (since the hash function was already shuffling bytes around, this didn't create additional work). Removing these unnecessary allocations made my employer more money annually than I've made in my life.
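A sketch of the shape of that fix follows; the names and the mixing step are placeholders, not the real hash function:

```java
public class HashFix {
    // Before: callers had to allocate two byte[]s (and reverse endianness) per call.
    static long hashOf(byte[] a, byte[] b) { /* ... mix the bytes ... */ return 0; }

    // After: take the longs directly and fold the byte-order handling into the
    // hash itself, so no allocation happens per call.
    static long hashOf(long a, long b) {
        long x = Long.reverseBytes(a);  // reproduce the byte order the old path produced
        long y = Long.reverseBytes(b);
        // ... mix x and y exactly as the byte[] version mixed its input bytes ...
        return x * 31 + y;              // placeholder mixing, not the real function
    }
}
```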

Finding a constant factor speedup isn't technically an algorithms question, but it's also something you see in algorithms interviews. As a follow-up to an algorithms question, I commonly get asked "can you make this faster?" The answer to these often involves doing a simple optimization that will result in a constant factor improvement.

A concrete example that I've been asked twice in interviews is: you're storing IDs as ints, but you already have some context in the question that lets you know that the IDs are densely packed, so you can store them as a bitfield instead. The difference between the bitfield interview question and the real-world superfluous array is that the real-world existing solution is so far afield from the expected answer that you probably wouldn’t be asked to find a constant factor speedup. More likely, you would've failed the interview at that point.
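A minimal sketch of the bitfield answer using java.util.BitSet; the ID range and counts here are invented:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DenseIds {
    public static void main(String[] args) {
        // IDs are known to be small and densely packed, so one bit per possible
        // ID replaces a boxed Integer (plus hash-table overhead) per stored ID.
        Set<Integer> asSet = new HashSet<>();   // roughly tens of bytes per entry
        BitSet asBits = new BitSet(1_000_000);  // one bit per possible ID

        for (int id = 0; id < 500_000; id++) {
            asSet.add(id);
            asBits.set(id);
        }
        System.out.println(asSet.contains(123) + " " + asBits.get(123));
    }
}
```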

To pick an example from another company, the configuration for BitFunnel, a search index used in Bing, is another example of an interview-level algorithms question3.

The full context necessary to describe the solution is a bit much for this blog post, but basically, there's a set of bloom filters that needs to be configured. One way to do this (which I'm told was being done) is to write a black-box optimization function that uses gradient descent to try to find an optimal solution. I'm told this always resulted in some strange properties and the output configuration always resulted in non-idealities which were worked around by making the backing bloom filters less dense, i.e. throwing more resources (and therefore money) at the problem.

To create a more optimized solution, you can observe that the fundamental operation in BitFunnel is equivalent to multiplying probabilities together, so, for any particular configuration, you can just multiply some probabilities together to determine how a configuration will perform. Since the configuration space isn't all that large, you can then put this inside a few for loops and iterate over the space of possible configurations and then pick out the best set of configurations. This isn't quite right because multiplying probabilities assumes a kind of independence that doesn't hold in reality, but that seems to work ok for the same reason that naive Bayesian spam filtering worked pretty well when it was introduced even though it incorrectly assumes the probability of any two words appearing in an email are independent. And if you want the full solution, you can work out the non-independent details, although that's probably beyond the scope of an interview.
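To show the shape of the approach (and only the shape: the parameters, probabilities, and cost model below are invented, not BitFunnel's actual ones), a brute-force sweep over a small configuration space looks like this:

```java
// Score each candidate configuration by multiplying per-component probabilities
// (the independence assumption described above) and keep the best one.
public class ConfigSearch {
    public static void main(String[] args) {
        double bestScore = Double.MAX_VALUE;
        int bestRows = -1, bestBits = -1;

        for (int rows = 1; rows <= 10; rows++) {                       // hypothetical parameter 1
            for (int bitsPerRow = 1; bitsPerRow <= 8; bitsPerRow++) {  // hypothetical parameter 2
                // Hypothetical per-row false-positive probability for this setting.
                double perRow = Math.pow(0.5, bitsPerRow);
                // Independence assumption: the overall probability is the product.
                double overall = Math.pow(perRow, rows);
                // Hypothetical cost model trading accuracy against memory.
                double score = overall * 1e6 + rows * bitsPerRow;
                if (score < bestScore) {
                    bestScore = score;
                    bestRows = rows;
                    bestBits = bitsPerRow;
                }
            }
        }
        System.out.printf("best: rows=%d bitsPerRow=%d score=%.3f%n", bestRows, bestBits, bestScore);
    }
}
```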

Those are just three examples that came to mind; I run into this kind of thing all the time and could come up with tens of examples off the top of my head, perhaps more than a hundred if I sat down and tried to list every example I've worked on, certainly more than a hundred if I list examples I know of that someone else (or no one) has worked on. Both the examples in this post as well as the ones I haven't included have these properties:

  • The example could be phrased as an interview question
  • If phrased as an interview question, you'd expect most (and probably all) people on the relevant team to get the right answer in the timeframe of an interview
  • The cost savings from fixing the example is worth more annually than my lifetime earnings to date
  • The example persisted for long enough that it's reasonable to assume that it wouldn't have been discovered otherwise

At the start of this post, we noted that people at big tech companies commonly claim that they have to do algorithms interviews since it's so costly to have inefficiencies at scale. My experience is that these examples are legion at every company I've worked for that does algorithms interviews. Trying to get people to solve algorithms problems on the job by asking algorithms questions in interviews doesn't work.

One reason is that even though big companies try to make sure that the people they hire can solve algorithms puzzles they also incentivize many or most developers to avoid deploying that kind of reasoning to make money.

Of the three solutions for the examples above, two are in production and one isn't. That's about my normal hit rate if I go to a random team with a diff and don't persistently follow up (as opposed to a team that I have reason to believe will be receptive, or a team that's asked for help, or if I keep pestering a team until the fix gets taken).

If you're very cynical, you could argue that it's surprising the success rate is that high. If I go to a random team, it's overwhelmingly likely that efficiency is in neither the team's objectives nor their org's objectives. The company is likely to have spent a decent amount of effort incentivizing teams to hit their objectives -- what's the point of having objectives otherwise? Accepting my diff will require them to test, integrate, and deploy the change and will create risk (because all deployments have non-zero risk). Basically, I'm asking teams to do some work and take on some risk to do something that's worthless to them. Despite incentives, people will usually take the diff, but they're not very likely to spend a lot of their own spare time trying to find efficiency improvements (and their normal work time will be spent on things that are aligned with the team's objectives)4.

Hypothetically, let's say a company didn't try to ensure that its developers could pass algorithms quizzes but did incentivize developers to use relatively efficient algorithms. I don't think any of the three examples above could have survived, undiscovered, for years nor could they have remained unfixed. Some hypothetical developer working at a company where people profile their code would likely have looked at the hottest items in the profile for the most computationally intensive library at the company. The "trick" for both isn't any kind of algorithms wizardry, it's just looking at all, which is something incentives can fix. The third example is less inevitable since there isn't a standard tool that will tell you to look at the problem. It would also be easy to try to spin the result as some kind of wizardry -- that example formed the core part of a paper that won "best paper award" at the top conference in its field (IR), but the reality is that the "trick" was applying high school math, which means the real trick was having enough time to look at places where high school math might be applicable to find one.

I actually worked at a company that used the strategy of "don't ask algorithms questions in interviews, but do incentivize things that are globally good for the company". During my time there, I only found one single fix that nearly meets the criteria for the examples above (if the company had more scale, it would've met all of the criteria, but due to the company's size, increases in efficiency were worth much less than at big companies -- much more than I was making at the time, but the annual return was still less than my total lifetime earnings to date).

I think the main reason that I only found one near-example is that enough people viewed making the company better as their job, so straightforward high-value fixes tended not to exist because systems were usually designed such that they didn't really have easy-to-spot improvements in the first place. In the rare instances where that wasn't the case, there were enough people who were trying to do the right thing for the company (instead of being forced into obeying local incentives that are quite different from what's globally beneficial to the company) that someone else was probably going to fix the issue before I ever ran into it.

The algorithms/coding part of that company's interview (initial screen plus onsite combined) was easier than the phone screen at major tech companies and we basically didn't do a system design interview.

For a while, we tried an algorithmic onsite interview question that was on the hard side but in the normal range of what you might see in a BigCo phone screen (but still easier than you'd expect to see at an onsite interview). We stopped asking the question because every new grad we interviewed failed the question (we didn't give experienced candidates that kind of question). We simply weren't prestigious enough to get candidates who can easily answer those questions, so it was impossible to hire using the same trendy hiring filters that everybody else had. In contemporary discussions on interviews, what we did is often called "lowering the bar", but it's unclear to me why we should care how high of a bar someone can jump over when little (and in some cases none) of the job they're being hired to do involves jumping over bars. And, in the cases where you do want them to jump over bars, they're maybe 2" high and can easily be walked over.

When measured on actual productivity, that was the most productive company I've worked for. I believe the reasons for that are cultural and too complex to fully explore in this post, but I think it helped that we didn't filter out perfectly good candidates with algorithms quizzes and assumed people could pick that stuff up on the job if we had a culture of people generally doing the right thing instead of focusing on local objectives.

If other companies want people to solve interview-level algorithms problems on the job perhaps they could try incentivizing people to solve algorithms problems (when relevant). That could be done in addition to or even instead of filtering for people who can whiteboard algorithms problems.

Appendix: how did we get here?

Way back in the day, interviews often involved "trivia" questions. Modern versions of these might look like the following:

  • What's MSI? MESI? MOESI? MESIF? What's the advantage of MESIF over MOESI?
  • What happens when you throw in a destructor? What if it's C++11? What if a sub-object's destructor that's being called by a top-level destructor throws, which other sub-object destructors will execute? What if you throw during stack unwinding? Under what circumstances would that not cause std::terminate to get called?

I heard about this practice back when I was in school and even saw it with some "old school" companies. This was back when Microsoft was the biggest game in town and people who wanted to copy a successful company were likely to copy Microsoft. The most widely read programming blogger around (Joel Spolsky) was telling people they needed to adopt software practice X because Microsoft was doing it and they couldn't compete without adopting the same practices. For example, in one of the most influential programming blog posts of the era, Joel Spolsky advocates for what he called the Joel test in part by saying that you have to do these things to keep up with companies like Microsoft:

A score of 12 is perfect, 11 is tolerable, but 10 or lower and you’ve got serious problems. The truth is that most software organizations are running with a score of 2 or 3, and they need serious help, because companies like Microsoft run at 12 full-time.

At the time, popular lore was that Microsoft asked people questions like the following (and I was actually asked one of these brainteasers during my interview with Microsoft around 2001, along with precisely zero algorithms or coding questions):

  • how would you escape from a blender if you were half an inch tall?
  • why are manhole covers round?
  • a windowless room has 3 lights, each of which is controlled by a switch outside of the room. You are outside the room. You can only enter the room once. How can you determine which switch controls which lightbulb?

Since I was interviewing during the era when this change was happening, I got asked plenty of trivia questions as well as plenty of brainteasers (including all of the above brainteasers). Some other questions that aren't technically brainteasers but were popular at the time were Fermi problems. Another trend at the time was behavioral interviews, and a number of companies I interviewed with had 100% behavioral interviews with zero technical interviews.

Anyway, back then, people needed a rationalization for copying Microsoft-style interviews. When I asked people why they thought brainteasers or Fermi questions were good, the convenient rationalization people told me was usually that they tell you if a candidate can really think, unlike those silly trivia questions, which only tell you if people have memorized some trivia. What we really need to hire are candidates who can really think!

Looking back, people now realize that this wasn't effective and cargo culting Microsoft's every decision won't make you as successful as Microsoft because Microsoft's success came down to a few key things plus network effects, so copying how they interview can't possibly turn you into Microsoft. Instead, it's going to turn you into a company that interviews like Microsoft but isn't in a position to take advantage of the network effects that Microsoft was able to take advantage of.

For interviewees, the process with brainteasers was basically as it is now with algorithms questions, except that you'd review How Would You Move Mount Fuji before interviews instead of Cracking the Coding Interview to pick up a bunch of brainteaser knowledge that you'll never use on the job instead of algorithms knowledge you'll never use on the job.

Back then, interviewers would learn about questions specifically from interview prep books like "How Would You Move Mount Fuji?" and then ask them to candidates who learned the answers from books like "How Would You Move Mount Fuji?". When I talk to people who are ten years younger than me, they think this is ridiculous -- those questions obviously have nothing to do with the job, and being able to answer them well is much more strongly correlated with having done some interview prep than with being competent at the job. Hillel Wayne has discussed how people come up with interview questions today (and I've also seen it firsthand at a few different companies) and, outside of groups that are testing for knowledge that's considered specialized, it doesn't seem all that different today.

At this point, we've gone through a few decades of programming interview fads, each one of which looks ridiculous in retrospect. Either we've finally found the real secret to interviewing effectively and have reasoned our way past whatever roadblocks were causing everybody in the past to use obviously bogus fad interview techniques, or we're in the middle of another fad, one which will seem equally ridiculous to people looking back a decade or two from now.

Without knowing anything about the effectiveness of interviews, at a meta level, since the way people get interview techniques is the same (crib the high-level technique from the most prestigious company around), I think it would be pretty surprising if this wasn't a fad. I would be less surprised to discover that current techniques were not a fad if people were doing or referring to empirical research or had independently discovered what works.

Inspired by a comment by Wesley Aptekar-Cassels, the last time I was looking for work, I asked some people how they checked the effectiveness of their interview process and how they tried to reduce bias in their process. The answers I got (grouped together when similar, in decreasing order of frequency) were:

  • Huh? We don't do that and/or why would we do that?
  • We don't really know if our process is effective
  • I/we just know that it works
  • I/we aren't biased
  • I/we would notice bias in the process if it existed
  • Someone looked into it and/or did a study, but no one who tells me this can ever tell me anything concrete about how it was looked into or what the study's methodology was

Appendix: training

As with most real world problems, when trying to figure out why seven, eight, or even nine figure per year interview-level algorithms bugs are lying around waiting to be fixed, there isn't a single "root cause" you can point to. Instead, there's a kind of hedgehog defense of misaligned incentives. Another part of this is that training is woefully underappreciated.

We've discussed that, at all but one company I've worked for, there are incentive systems in place that cause developers to feel like they shouldn't spend time looking at efficiency gains even when a simple calculation shows that there are tens or hundreds of millions of dollars in waste that could easily be fixed. And then because this isn't incentivized, developers tend to not have experience doing this kind of thing, making it unfamiliar, which makes it feel harder than it is. So even when a day of work could return $1m/yr in savings or profit (quite common at large companies, in my experience), people don't realize that it's only a day of work and could be done with only a small compromise to velocity. One way to solve this latter problem is with training, but that's even harder to get credit for than efficiency gains that aren't in your objectives!

Just for example, I once wrote a moderate length tutorial (4500 words, shorter than this post by word count, though probably longer if you add images) on how to find various inefficiencies (how to use an allocation or CPU time profiler, how to do service-specific GC tuning for the GCs we use, how to use some tooling I built that will automatically find inefficiencies in your JVM or container configs, etc., basically things that are simple and often high impact that it's easy to write a runbook for; if you're at Twitter, you can read this at http://go/easy-perf). I've had a couple people who would've previously come to me for help with an issue tell me that they were able to debug and fix an issue on their own and, secondhand, I heard that a couple other people who I don't know were able to go off and increase the efficiency of their service. I'd be surprised if I’ve heard about even 10% of cases where this tutorial helped someone, so I'd guess that this has helped tens of engineers, and possibly quite a few more.

If I'd spent a week doing "real" work instead of writing a tutorial, I'd have something concrete, with quantifiable value, that I could easily put into a promo packet or performance review. Instead, I have this nebulous thing that, at best, counts as a bit of "extra credit". I'm not complaining about this in particular -- this is exactly the outcome I expected. But, on average, companies get what they incentivize. If they expect training to come from developers (as opposed to hiring people to produce training materials, which tends to be very poorly funded compared to engineering) but don't value it as much as they value dev work, then there's going to be a shortage of training.

I believe you can also see training under-incentivized in public educational materials due to the relative difficulty of monetizing education and training. If you want to monetize explaining things, there are a few techniques that seem to work very well. If it's something that's directly obviously valuable, selling a video course that's priced "very high" (hundreds or thousands of dollars for a short course) seems to work. Doing corporate training, where companies fly you in to talk to a room of 30 people and you charge $3k per head also works pretty well.

If you want to reach (and potentially help) a lot of people, putting text on the internet and giving it away works pretty well, but monetization for that works poorly. For technical topics, I'm not sure the non-ad-blocking audience is really large enough to monetize via ads (as opposed to a pay wall).

Just for example, Julia Evans can support herself from her zine income, which she's said has brought in roughly $100k/yr for the past two years. Someone who does very well in corporate training can pull that in with a one or two day training course and, from what I've heard of corporate speaking rates, some highly paid tech speakers can pull that in with two engagements. Those are significantly above average rates, especially for speaking engagements, but since we're comparing to Julia Evans, I don't think it's unfair to use an above average rate.

Appendix: misaligned incentive hedgehog defense, part 3

Of the three examples above, I found one on a team where it was clearly worth zero to me to do anything that was actually valuable to the company and the other two on a team where it was valuable to me to do things that were good for the company, regardless of what they were. In my experience, that's very unusual for a team at a big company, but even on that team, incentive alignment was still quite poor. At one point, after getting a promotion and a raise, I computed the ratio of the amount of money my changes made the company vs. my raise and found that my raise was 0.03% of the money that I made the company, only counting easily quantifiable and totally indisputable impact to the bottom line. The vast majority of my work was related to tooling that had a difficult to quantify value that I suspect was actually larger than the value of the quantifiable impact, so I probably received well under 0.01% of the marginal value I was providing. And that's really an overestimate of how incentivized I was to do the work -- at the margin, I strongly suspect that anything I did was worth zero to me. After the first $10m/yr or maybe $20m/yr, there's basically no difference in terms of performance reviews, promotions, raises, etc. Because there was no upside to doing work and there's some downside (could get into a political fight, could bring the site down, etc.), the marginal return to me of doing more than "enough" work was probably negative.

Some companies will give very large out-of-band bonuses to people regularly, but that work wasn't for a company that does a lot of that, so there's nothing the company could do to indicate that it valued additional work once someone did "enough" work to get the best possible rating on a performance review. From a mechanism design point of view, the company was basically asking employees to stop working once they did "enough" work for the year.

So even on this team, which was relatively well aligned with the company's success compared to most teams, the company's compensation system imposed a low ceiling on how well the team could be aligned.

This also happened in another way. As is common at a lot of companies, managers were given a team-wide budget for raises that was mainly a function of headcount, that was then doled out to team members in a zero-sum way. Unfortunately for each team member (at least in terms of compensation), the team pretty much only had productive engineers, meaning that no one was going to do particularly well in the zero-sum raise game. The team had very low turnover because people like working with good co-workers, but the company was applying one of the biggest levers it has, compensation, to try to get people to leave the team and join less effective teams.

Because this is such a common setup, I've heard of managers at multiple companies who try to retain people who are harmless but ineffective to try to work around this problem. If you were to ask someone, abstractly, if the company wants to hire and retain people who are ineffective, I suspect they'd tell you no. But insofar as a company can be said to want anything, it wants what it incentivizes.

Thanks to Leah Hanson, Heath Borders, Lifan Zeng, Justin Findlay, Kevin Burke, @chordowl, Peter Alexander, Niels Olson, Kris Shamloo, and Solomon Boulos for comments/corrections/discussion


  1. For one thing, most companies that copy the Google interview don't have that much scale. But even for companies that do, most people don't have jobs where they're designing high-scale algorithms (maybe they did at Google circa 2003, but from what I've seen at three different big tech companies, most people's jobs are pretty light on algorithms work). [return]
  2. Real is in quotes because I've passed a number of interviews for reasons outside of the interview process. Maybe I had a very strong internal recommendation that could override my interview performance, maybe someone read my blog and assumed that I can do reasonable work based on my writing, maybe someone got a backchannel reference from a former co-worker of mine, or maybe someone read some of my open source code and judged me on that instead of a whiteboard coding question (and as far as I know, that last one has only happened once or twice). I'll usually ask why I got a job offer in cases where I pretty clearly failed the technical interview, so I have a collection of these reasons from folks.

    The reason it's arguably zero is that the only software interview where I inarguably got a "real" interview and was coming in cold was at Google, but that only happened because the interviewers I was assigned interviewed me for the wrong ladder -- I was interviewing for a hardware position, but I was being interviewed by software folks, so I got what was basically a standard software interview except that one interviewer asked me some questions about state machines and cache coherence (or something like that). After they realized that they'd interviewed me for the wrong ladder, I had a follow-up phone interview from a hardware engineer to make sure I wasn't totally faking having worked at a hardware startup from 2005 to 2013. It's possible that I failed the software part of the interview and was basically hired on the strength of the follow-up phone screen.

    Note that this refers only to software -- I'm actually pretty good at hardware interviews. At this point, I'm pretty out of practice at hardware and would probably need a fair amount of time to ramp up on an actual hardware job, but the interviews are a piece of cake for me. One person who knows me pretty well thinks this is because I "talk like a hardware engineer" and both say things that make hardware folks think I'm legit as well as say things that sound incredibly stupid to most programmers in a way that's more about shibboleths than actual knowledge or skills.

    [return]
  3. This one is a bit harder than you'd expect to get in a phone screen, but it wouldn't be out of line in an onsite interview (although a friend of mine once got a Google Code Jam World Finals question in a phone interview with Google, so you might get something this hard or harder, depending on who you draw as an interviewer).

    BTW, if you're wondering what my friend did when they got that question, it turns out they actually knew the answer because they'd seen and attempted the problem during Google Code Jam. They didn't get the right answer at the time, but they figured it out later just for fun. However, my friend didn't think it was reasonable to give that as a phone screen question and asked the interviewer for another question. The interviewer refused, so my friend failed the phone screen. At the time, I doubt there were more than a few hundred people in the world who would've gotten the right answer to the question in a phone screen and almost all of them probably would've realized that it was an absurd phone screen question. After failing the interview, my friend ended up looking for work for almost six months before passing an interview for a startup where he ended up building a number of core systems (in terms of both business impact and engineering difficulty). My friend is still there after the mid 10-figure IPO -- the company understands how hard it would be to replace this person and treats them very well. None of the other companies that interviewed this person even wanted to hire them at all and they actually had a hard time getting a job.

    [return]
  4. Outside of egregious architectural issues that will simply cause a service to fall over, the most common way I see teams fix efficiency issues is to ask for more capacity. Some companies try to counterbalance this in some way (e.g., I've heard that at FB, a lot of the teams that work on efficiency improvements report into the capacity org, which gives them the ability to block capacity requests if they observe that a team has extreme inefficiencies that they refuse to fix), but I haven't personally worked in an environment where there's an effective system fix to this. Google had a system that was intended to address this problem that, among other things, involved making headcount fungible with compute resources, but I've heard that was rolled back in favor of a more traditional system for reasons. [return]

January 04, 2020

Unrelenting Technology (myfreeweb)

Current Windows 10 has a feature called “ January 04, 2020 03:38 PM

Current Windows 10 has a feature called “Windows Sandbox” which spawns a tiny Hyper-V VM with allegedly a very smart slim disk image thing that shares the OS files with the host, and smart memory management, and so on.. and virtualized GPU support, like virgl in the free world.

So can it run Crysis, or at least Quake? Is it what we need for isolating old games? Well.. it has the ability to load up the host GPU with work and to show the results, but it’s absolutely unsuitable for gaming in its current state. Seems like it uses regular RDP for the window, and there aren’t any special optimizations that make 3D fast. The frame pacing is awful, framerate is weirdly limited, etc.

Also, this is not obviously found on google right now: if you have a compressed disk, you need to decompress C:\ProgramData\Microsoft\Windows\Containers for it to work.

Was wondering for a month why Firefox on my laptop would forget my GitHub... January 04, 2020 02:27 PM

Was wondering for a month why Firefox on my laptop would forget my GitHub session (and some other sessions) after restarting. Turns out “Delete cookies and site data when Firefox is closed” got enabled somehow. Facepalm.

Bogdan Popa (bogdan)

Native Applications with Racket January 04, 2020 11:00 AM

A couple of days ago, I released a native macOS application called Remember. It is a small, keyboard-driven application for stashing away notes/reminders for later. One of the cool things about it from a programming nerd perspective is that, while it is a completely native Cocoa application whose frontend is built with Swift, the core business logic is all in Racket! Why not use racket/gui? I started out with a proof of concept that used Racket for the GUI, but I realized that I'd have to write a bunch of Objective-C FFI code to get the UI to look the way I wanted (a carbon copy of Spotlight) and it seemed like it would be a pain to try and integrate DDHotKey and to add support for launching at login into a package that is easy to distribute.

January 03, 2020

Jeff Carpenter (jeffcarp)

CV January 03, 2020 02:14 PM

This is my CV. I hope it provides a realistic perspective on my experience and skillset. For inquiries please see how to contact me here. This is currently a work in progress. Work Experience: Waymo, Mountain View, CA, November 2019 to present. I currently work on ML infrastructure at Waymo. Google: I was on the Chrome Infrastructure (Chrome Ops) team for 3 years. Below are my public contributions. Project: Monorail. Monorail is the codename for the Chromium project’s bug tracker, living at bugs.

Nikola Plejić (nikola)

2019: A Year In Review January 03, 2020 09:20 AM

I never wrote one of these, but I've come to realize that I find reading other people's posts enjoyable and vaguely inspirational.

Without further ado, some things that stood out for me in 2019...

"Academics"

In 2018, I started pursuing a BSc in Physics at The Open University. I transferred some credits from my previous unfinished studies, and ended up with four courses to go before earning the degree.

I have finished—with distinction—two of them in the year 2018/2019: "The Quantum World" (an undergrad quantum physics course) and "Electromagnetism" (an undergrad course in classical electrodynamics).

The entire pursuit was (and continues to be) fairly challenging. Distance learning is an interesting feat that seems to be working well. I believe it eliminates an entire class of fallacies I'm fairly prone to making, primarily the one of confusing "presence" with actual work.

This was also my most prominent "personal project" of the year, and it seems this is very likely to be the case for 2020 as well.

Meetups, Conferences & Talks

A couple of colleagues and I have started the Zagreb Rust meetup. It's been reasonably successful, attracting a nice audience of 10-20 people at each of its eight iterations.

Talks:

Travel

London, Sicily (Castelbuono, Cefalù, Palermo), Dublin.

Books

A couple of books I've read that have been particularly impactful:

Music

Last.fm is still keeping track of the vast majority of the music I listen to. I have yet to listen to a lot of albums that were very prominent in 2019, but a few that stood out (with Bandcamp links where available):

I have also, repeatedly, returned to 2018's "Safe in the Hands of Love" by Yves Tumor and "On Dark Horses" by Emma Ruth Rundle.

I traditionally go to many concerts, and the following made a special impact:

  • Vijay Iyer & Wadada Leo Smith at Muzička akademija, Zagreb
  • Godspeed You! Black Emperor at Tvornica Kulture, Zagreb
  • Homeboy Sandman & Edan at Vintage Industrial Bar, Zagreb
  • Peter Brötzmann & Heather Leigh at KSET, Zagreb
  • the entire lineup of Sarajevo Jazz Fest 2019, especially concerts by Miles Okazaki, The Ex and Joelle Leandre.

Personal (Tech) Projects

mjuziq3 is chugging along nicely with little to no intervention on my side. This year it hasn't seen much love, but its interface has been updated to Bootstrap 4, and some of the interface elements have received a facelift.

mc is a small project for exploring Mars Climate data that I started working on with a couple of people from the Atmospheric Physics group at The Open University. Unfortunately, it hasn't made much progress.

Other

  • enjoying my exploration of the tasty world of coffee
    (equipment: Wilfa Svart, Hario V60, Aeropress)
  • politically, this year has been depressing, both globally and locally
    The support for politics of solidarity has continued its decline, and while people luckily haven't given up, it's hard to be an optimist. I have no answers nor contributions here, but it has occupied a fair share of my mental bandwidth throughout the year.

Plans for 2020

The primary goal is to graduate and figure out if I want to pursue my academic "career" any further. Graduation assumes:

  • a passing grade in the Deterministic and stochastic dynamics course I'm currently attending;
  • a passing grade in the end-of-degree project course which involves an extensive literature review of a chosen field of Physics.

Project-wise:

  • encode some of my thoughts on and experiences with microservice-oriented web application architectures in Rust into a series of posts, code examples, and/or libraries;
  • bring mc into a usable (and useful) state;
  • containerize all of my personal infrastructure (I have a dream of being able to seamlessly move pieces of infra among a couple of servers as needed);
  • add a few more sources into mjuziq3 and rethink its interface.

Skill-wise:

  • learn some embedded programming & basic electronics skills in order to be able to follow things like these;
  • become comfortable with a development environment for one of the more prominent operating systems for the PinePhone;
  • understand the basics of quantum computing, both theoretically and practically.

Personally:

  • finally achieve my weight goal of dropping below 80 kg;
  • successfully spend the year living with my lovely partner.

January 02, 2020

Jan van den Berg (j11g)

The perfect notebook January 02, 2020 04:12 PM

I keep a daily journal. And journaling daily makes pocket planners usable as journal notebooks.

I tend to be particular about certain things. So when searching for a new notebook — one that I will carry around for a year — I decided the following things are important.

The cover looks nice. But is this the perfect notebook?

Must haves

  • A5 format. Everything else is too big or too small.
  • Hardcover. No flappy stuff.
  • Lined paper. Not too wide or too thick, but I do need to see where I’m writing. No dots or blank paper.
  • Rounded corners. I pull the notebook out of my bag a lot. I don’t like dented corners.
  • Elastic band. When I throw the notebook in my bag, it shouldn’t open and wrinkle the paper.
  • Bound bookmark. So I can start where I stopped last time.
  • Pen loop. So I don’t have to look for a pen. And this makes sure I always use the same pen.
  • Bound. No rings.
  • Soft paper. There is probably a technical term for this that I don’t know, but I know good paper when I see or feel it. So no hard print paper (paper that is made for printers and not for writing).
A5 format ✔ Hardcover ✔ Rounded corners ✔ Bookmark ✔ Bound ✔ Elastic band ❌

Nice to haves

These things are pluses:

  • Printed dates. So I don’t have to write them down every time.
  • Max. 100 pages. One week or multiple days per two pages is fine (7D/2P they seem to call that). One page per day would of course give more room: but this makes the journal at least 365/2 pages which is too big.
  • The above only works if there is also room for notes. I tend to take a few notes specific to a day (usually short) and the really important stuff I write down in one place (i.e. the blank left page, or in the notes section). Things I want to return to often.

Not important

  • Cover color/look/style. I don’t think I actually care too much about this.

No

  • Tear-off corners. Just don’t.

Did I find it?

Not really. It is surprisingly difficult to select a notebook that checks all the boxes. And, trust me, I spent way too much time looking for one — online and in bookshops. Moleskine has one that comes close, and so does Baron Fig (though neither has a pen loop). But there are many more — like this one (too big) — that come close. And I think this one probably checks most boxes.

But I had already settled on one: the new year was starting! It doesn’t check all the boxes (I ordered a pen loop from eBay). And I do think I will miss the elastic band. And to be honest (it’s day 2 of the year) I am not too thrilled about the paper as of yet. It’s a bit too hard, the pen doesn’t glide. But we’ll see and maybe I’ll switch pens.

Now, about the perfect pen….

The post The perfect notebook appeared first on Jan van den Berg.

Bogdan Popa (bogdan)

Announcing Remember for macOS January 02, 2020 04:00 PM

I've been using org-mode capture templates for years and I've always wished I had something like that for the whole system. I took advantage of the holiday break to build Remember, a little reminders application with Spotlight-like UX. It's available on Gumroad and you can pay what you want for it (including $0!) at the moment so I hope you'll give it a go! Although the app isn't Open Source, its source code is available on GitHub.

Unrelenting Technology (myfreeweb)

“Why do programs I compile become all-zero files after rebooting?” well, maybe that untested... January 02, 2020 11:51 AM

“Why do programs I compile become all-zero files after rebooting?”

well, maybe that untested filesystem-related kernel patch you applied has something to do with it :D

But seriously, if anyone wants to make a very cursed unix system: apply this diff (note: old version by now) to FreeBSD from around now (say the beginning of 2020 — happy new year!), build programs using clang/lld 9.x and reboot.

Marc Brooker (mjb)

Why do we need distributed systems? January 02, 2020 12:00 AM

Why do we need distributed systems?

Building distributed systems is hard. It's expensive. It's complex. But we do it anyway.

I grew up reading John Carmack's .plan file. His stories about the development of Doom, Quake and the rest were a formative experience for me, and a big reason I was interested in computers beyond just gaming1. I was a little bit disappointed to see this tweet:

This isn't an isolated opinion, but I don't think it's a particularly good one. To be fair, there are a lot of good reasons not to build distributed systems. Complexity is one: distributed systems are legitimately harder to build, and significantly harder to understand and operate. Efficiency is another. As McSherry et al point out in Scalability! But at what COST?, single-system designs can have great performance and efficiency. Modern computers are huge and fast.

I was not so much disappointed in John, as in our success at building distributed systems tools that make this untrue. Distributed computing could be much easier, and needs to be much easier. We need to get to a point, with services, tooling and technology, that monolithic systems aren't a good default. To understand why, let me answer the question in the post's title.

Distributed systems offer better availability

The availability of a monolithic system is limited to the availability of the piece of hardware it runs on. Modern hardware is pretty great, and combined with a good datacenter and good management practices, servers can be expected to fail with an annual failure rate (AFR) in the single-digit percentages. That's OK, but not great in two ways. First, if you run a lot of systems, fixing these servers stacks up to an awful lot of toil. The toil is unavoidable, because if we're building a monolithic system we need to store the system state on the one server, and so creating a new server takes work (and lost state, and understanding what the lost state means to your users). The second way they get you is with time-to-recovery (TTR): unless you're super disciplined in keeping and testing backups, your rebuild process and all the rest, it's been a couple years since you last made a new one of these things. It's going to take time.

Distributed systems incur cost and complexity because they continuously avoid getting into this state. Dedicated state stores, replication, consensus and all the rest add up to avoiding any one server being a single point of failure, but also hide the long TTR that comes with fixing systems. Modern ops practices, like infrastructure as code, immutable infrastructure, containers, and serverless reduce the TTR and toil even more.

Distributed systems can also be placed nearer the users that need them. It doesn't really matter if a system is available or not if clients can't get to it, and network partitions happen. Despite the restrictions of the CAP theorem and friends, this extra degree of flexibility allows distributed systems to do much better than monolithic systems.

Distributed systems offer better durability

Like availability, the durability of single storage devices is pretty great these days. The Backblaze folks release some pretty great stats that show that they see about 1.6% of their drives fail in any given year. This has been the case since at least the late 2000s. If you put your customer's data on a single disk, you're highly likely to still have it at the end of the year.

For this blog, "highly likely" is good enough. For almost all meaningful businesses, it simply isn't. Monolithic systems then have two choices. One is RAID. Keep the state on multiple disks, and replace them as they fail. RAID is a good thing, but only protects against a few drive failures. Not floods, fires, or explosions. Or correlated drive failure2. The other option is backups. Again, a good thing with a big downside. Backups require you to choose two things: how often you run them (and therefore how much data you lose when you need them), and how long they take to restore. For the stuff on my laptop, a daily backup and multi-hour restore is plenty. For business-critical data, not so much.

Distributed storage systems continuously make multiple copies of a piece of data, allowing a great deal of flexibility around cost, time-to-recovery, durability, and other factors. They can also be built to be extremely tolerant to correlated failures, and avoid correlation outright.
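
For a rough sense of scale, here's a back-of-the-envelope sketch in Python (not from the original post) comparing a single disk against three independently failing copies at Backblaze's roughly 1.6% annual failure rate. It assumes independent failures and no repair or re-replication during the year, which, as footnote 2 notes, is optimistic:

# Illustrative durability estimate only: assumes independent drive failures
# at a ~1.6% annual failure rate and no repair or re-replication during the
# year. Real failures are correlated (see footnote 2), so treat this as a
# rough upper bound on what naive replication buys you.
afr = 0.016

p_loss_single = afr             # one copy: data is lost if that one drive fails
p_loss_three_copies = afr ** 3  # three copies: all three must fail

print(f"single disk:  {p_loss_single:.3%} chance of loss per year")
print(f"three copies: {p_loss_three_copies:.7%} chance of loss per year")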

Distributed systems offer better scalability

As with availability and durability, distributing a system over many machines gives a lot of flexibility about how to scale it. Stateless systems are relatively easy to scale, and basic techniques like HTTP load balancers are great for an awful lot of use-cases. Stateful systems are harder to scale, both because you need to decide how to spread the state around, and because you need to figure out how to send users to the right place to get the state. These two problems are at the heart of a high percentage of the distributed systems literature, and more is published on them every single day.

The good news is that many good solutions to these problems are already available. They are available as services (as in the cloud), and available as software (open source and otherwise). You don't need to figure this out yourself, and shouldn't try (unless you are really sure you want to).

Distributed systems offer better efficiency

Workloads are very seldom constant. Computers like to do things on the hour, or every day, or every minute. Humans, thanks to our particular foibles like sleeping and hanging out with our kids, tend to want to do things during the day, or on the holidays, or during the work week. Other humans like to do things in the evening, or late at night. This all means that the load on most systems varies, both randomly and seasonally. If you're running each thing on its own box you can't take advantage of that3. Big distributed systems, like the cloud, can. They also give you tools (like automatic scaling) to take advantage of it economically.

When you count all the factors that go into their cost, most computers aren't that much more expensive to keep busy than they are to keep idle. That means it makes a lot of economic sense to keep computers as busy as possible. Monolithic systems find it hard to do that.

No magic

Unfortunately, none of this stuff comes for free. Actually building (and, critically, operating) distributed systems that do better than monolithic systems on all these properties is difficult. The reality is seldom as attractive as the theory would predict.

As an industry, we've made a fantastic amount of progress in making great distributed systems available over the last decade. But, as Carmack's tweet shows, we've still got a lot to do. Despite all the theoretical advantages it's still reasonable for technically savvy people to see monolithic systems as simpler and better. This is a big part of why I'm excited about serverless: it's the start of a big opportunity to make all the magic of distributed systems even more widely and simply available.

If we get this right, we can change the default. More availability, more durability, more efficiency, more scale, less toil. It's going to be an interesting decade.

Footnotes

  1. Along with hacking on gorillas.bas.
  2. Which is a real thing. In Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? Schroeder and Gibson report that Time between replacement, a proxy for time between failure, is not well modeled by an exponential distribution and exhibits significant levels of correlation, including autocorrelation and long-range dependence. This situation hasn't improved since 2007.
  3. I guess you can search for primes, or mine Ethereum, or something else. Unfortunately, these activities are seldom economically interesting.

January 01, 2020

Gustaf Erikson (gerikson)

December January 01, 2020 06:48 PM

Pete Corey (petecorey)

Random Seeds, Lodash, and ES6 Imports January 01, 2020 12:00 AM

David Bau’s seedrandom Javascript library is an excellent tool for introducing deterministic random values into your Javascript project. After setting a fixed seed, Math.random will produce a stream of random values. Those same random values will be produced again, in order, the next time you run your program. This is very important when creating generative art or procedurally generated game content.

However, there’s a small problem when trying to combine seedrandom with a library like Lodash. Ideally, we’d like Lodash to respect our random seed, so methods like shuffle would always produce a deterministic shuffling. Unfortunately, with a setup like the one described below, this won’t be the case:


import _ from "lodash";
import seedrandom from "seedrandom";

seedrandom("seed", { global: true });

_.shuffle([1, 2, 3]); // Ignores our random seed.

The seedrandom library wholesale replaces Math.random with a new pseudo-random number generator. Because we’re importing lodash before we initialize seedrandom, Lodash defines all of its functions, shuffle included, to use the original reference to Math.random. We need to initialize seedrandom before importing Lodash.

Unfortunately, this won’t work:


import seedrandom from "seedrandom";
seedrandom("seed", { global: true });

import _ from "lodash";

Node.js requires all import statements to be at the top of a module. We can’t initialize seedrandom before importing Lodash.

Thankfully, a simple solution exists. We’ll make a new module called seed.js that simply imports seedrandom and then initializes it with our seed:


import seedrandom from "seedrandom";

seedrandom("seed", { global: true });

Next we can import our local "./seed.js" module before importing Lodash:


import "./seed.js";
import _ from "lodash";

_.shuffle([1, 2, 3]); // Produces deterministic shufflings!

And with that small change, seedrandom, Lodash, and ES6-style imports all play nicely together. Our shuffle function will now produce deterministic shufflings based on the seed we pass into seedrandom!

December 31, 2019

Wesley Moore (wezm)

Software Contributions 2019 December 31, 2019 11:00 PM

Open-source software has a bit of a sustainability problem so I try to contribute back to the ecosystem where I can. I am very fortunate to be in a position where I have spare time and income that I'm able to funnel into this. At the end of 2017 I did a round-up of the software contributions I'd made that year. I thought it would be good to do another one now that 2019 has come to a close.

My motivation for doing so is twofold: to encourage others to do the same if they are able, and to highlight people and projects doing interesting and important work.

Financial Contributions

Monthly Donations

I make small (typically US$5–10) monthly donations to the following:

One Off Contributions

  • CopyQ — Clipboard manager for open-source desktops.
  • Movember — This one isn't software but my brother took his own life in April 2019. I supported some friends taking part in Movember, a cause that aims to improve men's health.

Open Source Contributions

In addition to financial contributions I also made code contributions, both to existing projects and by releasing my own work. Some of the highlights are:

Conclusion

2019 was a good year for contributions. This was partly due to me starting a new job at YesLogic 4 days a week. I dedicate the fifth day of the work week to personal projects and open-source. I was also fortunate to contribute to open-source projects through my work at YesLogic. We released the Allsorts font parsing and shaping engine and several crates relating to font handling and Unicode.

Onward to 2020!

December 30, 2019

Jeff Carpenter (jeffcarp)

2019 Year in Review December 30, 2019 09:37 PM

In 2019 I got married—twice! (to the same person!) I changed jobs. I ran my first marathon. 2019 was a huge year in my life. I grew in a lot of ways this year. Wedding planning was a huge undertaking and strengthened our communication and quick decision making skills. Additionally this year I started working with a therapist which has been great—I’m wondering why I never thought of doing that earlier.

Jan van den Berg (j11g)

A History Of Rock Music in Five Hundred Songs December 30, 2019 09:24 PM

This year I’ve listened to 519 podcasts and 36 of those were episodes of A History Of Rock Music in Five Hundred Songs.

But it’s safe to say that A History Of Rock Music in Five Hundred Songs is my favorite new podcast that I discovered this year and it deserves more attention and praise!

500 Songs

‘500 Songs’ is a mammoth project where Andrew Hickey sets out — over a period of ten years — to transcribe the history of rock music with one song per episode for a total of 500 songs. There are 50 episodes per year, one per week, and the first year has just wrapped.

Which 500 songs are on the list is not known, but it is not a popularity contest. Every song has to fit the narrative. It’s about the history and how everything is related and one thing is built on top of another. So he doesn’t just pick songs because they are popular. Which is not to say there aren’t many episodes about well-known songs.

I know nothing about Andrew Hickey, he is not a famous podcaster (yet!) or a celebrity turned podcaster. But I do know that there are very few people on the planet that know or care more about music than he does.

I’ve heard some people describe his monotone voice as off-putting (yes, I try to push this podcast a lot to people), but what I hear is someone who is ultimately knowledgeable and passionate about music history.

All episodes are also available as transcripts; I think he writes out each episode and reads it ‘on air’. And the first 50 episodes are bundled in a book. There will be more books, and I suspect that because of their completeness they can all be added to the definitive canon of rock ‘n roll music.

Information density

If you think you know or understand even a little bit about pop or rock music, think again, because you don’t. Or, at least, I’m speaking for myself here. I thought I knew a little bit, until I discovered this podcast.

The information density is absolutely mind boggling, and every episode I come away thinking: how does he know all this?! (Fortunately he also has a couple of delightful informative episodes answering that question). And it is just loaded with all kinds of fun and mind-blowing details:

Did you know this? Life imitating art

There are a few general observations that return again and again through most of the early 50’s rock music scene. Here are a few that stand out:

  • It seems like there is almost no original work in the early days. Every song is borrowed, stolen or riffed off something that was already well known and sometimes existed for decades. The concept of creating new or original music seemed to be novel. Just listen to the Hound Dog episode to get an idea of this. Everything is connected and grown out of something else (which is still the case nowadays, but more obvious and the norm back then). This goes for a lot of songs.
  • How things came to be is almost always chaotic and messy and more often than you would think the result of serendipity. So there are very few straight lines to be drawn. Not only did people borrow or steal songs, they also have different versions of “the truth”. This is also what makes the podcast a mammoth undertaking. Try figuring out how certain records came to be 60 years after the fact.
  • Some things stay hidden or unexplored but they just leave you thinking “excuse me?!”. Like this tidbit:
The Colonel.
  • Some of the songs we still listen to were generation- or genre-defining songs. But to the creators they were sometimes nothing more than a quick way to make a buck. Music was (and is?) ethereal, and the concept that songs would still be played years and years later was not something most artists thought about. So there are many stories of artists selling, for 100 dollars or so, the rights to records that sold millions.
  • The first 50 episodes focus mostly on the 1950s and there are some brilliant episodes. The Little Richard one is an absolute standout episode, so are the Elvis and Johnny Cash ones. Actually: from episode 33 to 39 is the best podcast streak I have ever heard. It’s a completely different time, so I am always amazed some of the folks discussed are still alive. E.g. Jerry Lee Lewis (The Killer) and Little Richard are still among us.

‘500 Songs’ is not only the story of rock music, it is also a story of America coming of age. The story of cities swallowing up rural areas, of changing landscapes, of changing lives. And of radio waves and records connecting a vast country and cementing something that we now know as ‘popular culture’.

It’s an absolutely wonderful and riveting story and you should give this podcast a listen!

The post A History Of Rock Music in Five Hundred Songs appeared first on Jan van den Berg.

Gustaf Erikson (gerikson)

Advent of Code 2019 wrapup December 30, 2019 07:40 AM

I enjoyed this year’s edition more than last year’s.

It was a combination of being slightly easier, and having a better attitude to it coming in. In 2018, I was aiming for as good a result as I had in 2016, which was 49 stars out of 50. This left very little margin for bailing out of tough problems, and led to increased frustration when I didn’t have the time required to solve a problem within 24 hours.

I was thinking of not participating, but as time drew near, I got inspired, and even solved a couple of left-over problems from last year in preparation.

This year I gave myself more leeway for simply bailing on problems that didn’t seem fun. This turned out to be day 21. I also gave up on day 6, which was probably a mistake, and part 2 of day 22.

I also felt that this year was easier than 2018. In part, this was because of nearly half of the problems being intcode problems, where we first had to write an interpreter for a specific machine code, and then apply that interpreter to various fun follow-up questions, like playing Breakout or Adventure.

Then I had a lot of support from the dedicated IRC channel from lobste.rs. I’d like to thank regulars jjuran, wink, gthm and tumdum for encouragement and friendly competition.

I still have a number of stars to get, but unlike last year, I’m looking forward to solving those problems.

December 29, 2019

Derek Jones (derek-jones)

Reliability chapter of ‘evidence-based software engineering’ updated December 29, 2019 09:10 PM

The Reliability chapter of my evidence-based software engineering book has been updated (draft pdf).

Unlike the earlier chapters, there were no major changes to the initial version from over 18-months ago; we just don’t know much about software reliability, and there is not much public data.

There are lots of papers published claiming to be about software reliability, but they are mostly smoke-and-mirror shows derived from work down one of several popular rabbit holes:

The growth in research on Fuzzing is the only good news (especially with the availability of practical introductory material).

There is one source of fault experience data that looks like it might be very useful, but it's hard to get hold of; NASA has kept detailed records about what happened during space missions. I have had several people promise to send me data, but none has arrived yet :-(.

Updating the reliability chapter did not take too much time, so I updated earlier chapters with data that has arrived since they were last released.

As always, if you know of any interesting software engineering data, please tell me.

Next, the Source code chapter.

Ponylang (SeanTAllen)

Last Week in Pony - December 29, 2019 December 29, 2019 03:40 PM

The Pony compiler now has support for LLVM 9! Nightly builds of ponyc, corral, and ponyup are also available for macOS now.

December 28, 2019

Jeff Carpenter (jeffcarp)

The Best Books I Read in 2019 December 28, 2019 03:11 AM

Here are my 4 favorite books from 2019. 1. A British couple buys a farmhouse in the South of France and spends the next 12 months exploring the countryside, meeting the locals, renovating the house, and of course, eating and drinking well in A Year In Provence by Peter Mayle. For a premise that could come off as a little posh, the detail in this story is so rich and the storytelling so genial I couldn’t put it down.

Jan van den Berg (j11g)

Murder on the Orient Express – Agatha Christie December 28, 2019 12:17 AM

It’s clever. It’s smart. It’s eloquent. It’s articulate. It’s masterfully written. It’s the archetype of the whodunit. It’s the absolute queen of adverbs.

It’s quintessential Agatha Christie. I enjoyed it thoroughly and can’t imagine someone who wouldn’t.

Murder on the Orient Express – Agatha Christie (1934) – 315 pages

The post Murder on the Orient Express – Agatha Christie appeared first on Jan van den Berg.

December 27, 2019

Jeremy Morgan (JeremyMorgan)

How Blazor is Going To Change Web Development December 27, 2019 06:16 PM

A couple of weeks ago I wrote an article about building and deploying a Blazor app without touching a Windows machine and realized maybe I should take a step back and explain what Blazor is and why anyone would use it. It's still fairly new to most in the front end development world, but it's awesome and you should check it out.

So what is it, exactly?

Blazor is a framework from Microsoft that you can use to develop interactive client-side Web UIs with C#.

In their own words:

Blazor lets you build interactive web UIs using C# instead of JavaScript. Blazor apps are composed of reusable web UI components implemented using C#, HTML, and CSS. Both client and server code is written in C#, allowing you to share code and libraries.

Pretty cool right? You can download it here and get started. 

The old way

Remember the old way of developing web applications? 

What is Blazor

For the longest time we built applications that ran solely on the server, using things like ASP.NET, PHP, etc and they generated an HTML file to be pushed to the browser.   

We've always had a bit of interactivity with JavaScript and AJAX, but for many years most of the business logic was handled on the server itself, which spit out HTML pages to interact with. The browser for many years was just a glorified document viewer. It worked, but we knew we could do better.

There are some downsides to this pattern that we're all aware of:

  • The server needs to be configured with software to run the web app. ASP.NET, PHP, etc. Backend processors or runtimes have to exist on the server. 
  • Most of the processing power is in the server. 
  • Page loads are annoying and slow. 

So we found a new answer to it. 

How we do it now

With the rise of the Single Page Applications we have a new pattern, with frameworks like Angular, React and Vue:

What is Blazor

Now we're building full applications in JavaScript that run on the browser. This splits the business logic, so that some runs on the browser, and some runs on the server. JavaScript applications run client-side and use messaging to communicate with the "server". You can easily replace "server" with a service or application in the cloud, but the model is still the same.

This is an excellent improvement on what we had before, which was essentially manipulating HTML and tossing it back and forth. Now we have real applications running in the browser, and page loads are mostly a thing of the past.

But Blazor improves on that pattern further. There are two main ways to develop with it.

Option 1: Web Assembly Method

When you choose to build a Blazor Web Assembly application it looks like this:

What is Blazor

Blazor uses Web Assembly, which ships in all major browsers now. Web Assembly is a binary instruction format that is executed in a virtual environment in the browser.

So what does that really mean?

Now the browser acts as a host for your application. Files built in a Blazor Web Assembly application are compiled and sent to the browser. The browser then runs your JavaScript, HTML and C# in an execution sandbox on the browser. It even runs a version of the .NET Runtime. This means you can execute calls to .NET from within the browser, and it's a fully-fledged application in the browser. It can even be run offline.

Why this is cool:

  • You can run it on any static file server (Nginx, IIS, Apache, S3, Heroku, etc)
  • It runs JS as bytecode, and runs C# at near-native speeds.
  • You can use C# to develop rich front-end applications.
  • Web Assembly ships with all major browsers
  • Reuse .NET components
  • Use Microsoft tooling and debugging

This is great for low-latency applications such as games and other things you need to run lightning fast in a browser. There's no need to communicate with a server if you don't need to, and you can download the application and run it offline in a browser.

Some downsides:

  • The .NET Framework and other runtime files need to be downloaded (one time)
  • You're restricted to the capabilities of the browser
  • All secrets (credentials, API keys, etc) downloaded locally
  • Not all .NET Framework components are compatible

So this may not be ideal for all applications. The good news is, there's another Blazor pattern we can use.

Option 2: Blazor Server

If you decide to build a Blazor Server application, it looks like this:

What is Blazor

This is closer to the model we're using today. You build an application and have a server that's powered by .NET Core, and you send HTML and JavaScript to the browser to act as a client. This is a great way to make screaming fast thin clients. 

Why this is cool:

  • You get the full power of the .NET Framework
  • Everything rests on the server, small downloads
  • Web Assembly is not required
  • Your secrets are safe

Some downsides:  

  • No offline applications
  • Requires a server running .NET Core or a service
  • Can be high latency with lots of network traffic

So how do I choose which one to use? 

If you want powerful client-side applications that can run offline and served from a static server, choose Blazor Web Assembly. If you want the full power of .NET and want to run a model with thin clients, choose Blazor Server.

Why is this such a big deal?

Blazor patterns open up big opportunities for development. Whether you want to build a powerful service with several thin clients, or some cool interactive game that runs in a browser, Blazor enables rich, interactive application potential.

Web Assembly is the way of the future. It enables near-native speeds in a browser, and uses a common interface. You will find Web Assembly on PCs, Phones, and tablets. If you have a bunch of C# developers on your team who don't do front end programming, they are now empowered to do it in the language they love.

It's pretty awesome, and I'm excited to see how Blazor progresses.

Where can I learn it?

If you live in the Portland, Oregon area I'll be hosting a presentation about getting started with Blazor. Attendance is free and Pizza is provided.

You can also learn more about it from Microsoft's Blazor Site.

I recently wrote a tutorial about setting up and deploying Blazor apps without touching a Windows Machine

If you want to dig deep and learn Blazor, Pluralsight has some modern courses that will get you running quickly:

So try it out! Let me know what you think of Blazor and share your experiences in the comments!

Eric Faehnrich (faehnrich)

Karnaugh Maps December 27, 2019 05:00 AM

Karnaugh maps are a tool for simplifying boolean expressions that can be used by programmers.

I learned about Karnaugh maps in a digital design class to simplify logic circuits. They’re a tool like state machines or logic tables, but I think they’re only taught if you’re more on the hardware side. However, I think they can be used when writing software too.

Say you have a complex condition to go into an if statement: (not A and B) or (A and B) or (A and not B).

You can perform some boolean algebra to simplify it.

A'B + AB + AB'
A'B + A(B + B')
A'B + A
A + A'B

But there’s a chance for a mistake, and you might not have it simplified as well as it could be.

That’s where Karnaugh maps come in. They’re a way to visually represent a boolean expression in a way that you can quickly see the grouping of the statements.

The map is first constructed on a grid with columns for possible inputs of some variables, and rows for the other variables.

With this simple two-input expression above, it would first be set up like this:

Empty two-variable k-map

Each of those squares is a possible input of all the variables. The top-left is 00, or A’B’. If that’s the input into our expression, the result is 0, so we put a 0 in the top-left.

Similarly, the top right is 10 or AB’ which gets a 1. Bottom-left is 01 or A’B which is 1, and bottom-right is 11 or AB which is also 1. Our map is then:

Filled two-variable k-map

To simplify, draw rectangles around the largest groups of 1s, where each group contains one, two, four, etc. 1s (powers of 2). Even if the rectangles overlap, draw the largest you can.

Circled two-variable k-map

The simplification is then the variables for each rectangle that are the same for that rectangle. For instance, the red rectangle in the above diagram covers two inputs where B is 1, the blue has inputs where A is 1.

This shows the simplification for the above boolean expression is A+B.

Our boolean algebra simplification wasn’t as simple as that. To see how we can go from A+A’B to A+B with boolean algebra:

A + A'B
A(1 + B) + A'B
A + AB + A'B
A + (A + A')B
A + B

To further simplify, we first had to expand, and had to know what to expand with. It’s kind of like a local minimum: looking around, this might seem like the lowest point, but to see if there’s an even lower point over the ridge you first have to climb up it. That’s where Karnaugh maps are handy, they let you see these simplifications easily.

K-maps can be used with even more variables. Consider the following truth table.

Three-variable truth table

You then create a map with two variables on one side. The trick though is to make it so adjacent squares only change one variable at a time. This is done with Gray codes.

Instead of counting up like in the truth table, you go through all the possible inputs but only changing by one each time.

So instead of:

00
01
10 <- this changed two
11

You would have:

00
01
11
10

Also note that it wraps from bottom to top, 10 is also one off from 00.
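
As an aside (not in the original post), the standard reflected Gray code can be generated with a couple of lines of Python, which is handy when labelling the rows and columns of larger maps:

# Reflected Gray code for n bits: consecutive values differ in exactly one
# bit, which is what makes adjacent K-map cells differ in only one variable.
def gray_codes(n):
    return [i ^ (i >> 1) for i in range(2 ** n)]

print([format(g, "02b") for g in gray_codes(2)])  # ['00', '01', '11', '10']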

Our two-variable K-map was trivially in Gray code. For three variables, you could have this K-map from the above truth table:

Three-variable k-map

We now draw rectangles around the largest power-of-two-sized groups of 1s we can.

Circled three-variable k-map

Note the blue rectangle is one rectangle, it just wraps around. Think of the map as a torus, it wraps top to bottom, left to right. Just like a Pac-Man level.

To simplify, we see which inputs don’t change in each rectangle. The blue rectangle has B’, green is A’, and red is C.

So the simplified expression is

A' + B' + C

Here’s another example:

Another three-variable k-map

This one simplifies to

AB + A'B'C

This was a small post on Karnaugh maps just to let you know about them and so you can do a Web search for further information. These can simplify expressions in your code, but that may obscure their meaning. I suggest comments with the code to give the original intent and how you arrived at the simplified expression. Even though the code may not have the original meaning, I feel it’s still worth simplifying the expressions because that can reduce the chance of mistakes compared to writing larger expressions.
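
In that spirit, one cheap safeguard (a small Python sketch, with made-up function names) is to brute-force check that the original and simplified expressions agree on every input; for a handful of variables there are only 2^n cases:

from itertools import product

def original(a, b):
    # The condition from the post: (not A and B) or (A and B) or (A and not B)
    return (not a and b) or (a and b) or (a and not b)

def simplified(a, b):
    # The Karnaugh-map result: A + B
    return a or b

# Exhaustively compare both forms over all four input combinations.
assert all(original(a, b) == simplified(a, b)
           for a, b in product([False, True], repeat=2))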

December 26, 2019

Gustaf Erikson (gerikson)

Advent of Code 2019 December 26, 2019 02:16 PM

This blog post is a work in progress

Project website: Advent of Code 2019.

Previous years: 2015, 2016, 2017, 2018.

I use Perl for all the solutions.

Most assume the input data is in a file called input.txt in the same directory as the file.

A note on scoring

Current score (2019-12-25): 40. I’m aiming for a final of 44+1.

I score my problems to mark where I’ve finished a solution myself or given up and looked for hints. A score of 2 means I solved both the daily problems myself, a score of 1 means I looked up a hint for one of the problems, and a zero score means I didn’t solve any of the problems myself.

My goals for this year (in descending order of priority):

  • get 38 stars or more (75%)
  • solve all problems within 24 hours of release

Link to Github repo.

TODO

  • complete day 18
  • complete day 20 part 2
  • complete day 24 part 2

Day 1 - Day 2 - Day 3 - Day 4 - Day 5 - Day 6 - Day 7 - Day 8 - Day 9 - Day 10 - Day 11 - Day 12 - Day 13 - Day 14 - Day 15 - Day 16 - Day 17 - Day 19 - Day 20 - Day 21 - Day 22 - Day 23 - Day 24 - Day 25

Day 1 - The Tyranny of the Rocket Equation

Day 1 - complete solution

A nice and simple problem to kick off this year.

Score: 2

Day 2 - 1202 Program Alarm

Day 2 - complete solution

An earlier appearance of register rodeo than expected! I think we’ll see more of this going forward.

[intcode part 1]

Score: 2

Day 3 - Crossed Wires

Day 3 - complete solution

This took longer than it had to. I messed up adding the paths, and only managed to get the correct answer to part 1 by chance. Once I plotted the example data I could correct the code, then add the logic for part 2.

I’m not entirely happy with the duplicated direction subroutines. Some people have used complex numbers to simplify this but that would require a separate Perl module to implement.

Score: 2.

Day 4 - Secure Container

Day 4 - complete solution

I blanked on this one and dreaded combinatorics. Turns out brute force is eminently doable. Credits to /u/andreyrmg in the daily solutions thread, and A_D and sim642 in the IRC channels, for help and inspiration.

I still think my solution is my own though (and pretty Perlish), so full score today.

Score: 2.

Day 5 - Sunny with a Chance of Asteroids

Day 5 - complete solution

After struggling with the convoluted problem description I was pleasantly surprised to find my code ran flawlessly first try. I still have some niggling issues with the test data, and need to clear that up before the inevitable next intcode problem.

[intcode part 2]

Score: 2.

Day 6 - Universal Orbit Map

Day 6 - complete solution

I bailed on this one and sought inspiration in the daily solutions subreddit. Credit in source!

Score: 0.

Day 7 - Amplification Circuit

Day 7 - part 1 Day 7 - part 2

A tough, but fun one. There were a lot of subtleties in the second part, and I got some pointers from the subreddit.

I got the chance to clean up my intcode implementation, and learned a new facet of Perl.

[intcode part 3]

Score: 2.

Day 8 - Space Image Format

Day 8 - complete solution

Defying expectations (and maybe fears), this Sunday problem was not that complicated.

Of course, it helps if you confirm that what you think is input actually is the same as the problem input. Not that I’d have anything other than theoretical knowledge of this situation…

Score: 2.

Day 9 - intcode test suite

Day 9 - complete solution

So the Intcode computer is done, and we’ve amassed a number of test cases to ensure it works. I’m kinda sorta happy with my code. It’s not the most elegantly put together but it works fine.

[intcode part 4]

Score: 2.

Day 10 - Monitoring Station

Day 10 - complete solution

This was a fun one, even though I got sidetracked by my incorrect assumptions and got lost in a hallway of indices, all alike.

Part 2 was found by inspecting the output, but hey, a star is a star.

Score: 2.

Day 11 - Space Police

Day 11 - complete solution

Ah, the return of Langton’s ant. Always nice to see an old friend.

Nothing too complex here, although I’m quite proud of the line noise for the dispatch table for movement:

my %directions = (
    '^'=>sub{!$_[0]?['<', 0,-1 ]:['>', 0, 1 ]},
    '<'=>sub{!$_[0]?['v', 1, 0 ]:['^',-1, 0 ]},
    'v'=>sub{!$_[0]?['>', 0, 1 ]:['<', 0,-1 ]},
    '>'=>sub{!$_[0]?['^',-1, 0 ]:['v', 1, 0 ]},
);

[intcode part 5]

Score: 2.

Day 12 - The N-Body Problem

Day 12 - complete solution

A fun little problem.

Score: 2.

Day 13 - Care Package

Day 13 - complete solution

I am in awe of what the creator of Advent of Code has wrought in the form of intcode.

[intcode part 6]

Score: 2.

Day 14 - Space Stoichiometry

Day 14 - complete solution

A hard problem that was satisfying to solve.

Score: 2.

Day 15 - Oxygen System

Day 15 - complete solution

Not my proudest moment. I’m happy my intcode implementation works well enough for this kind of application now, but my utter inability to code a BFS routine is humiliating. In the end I had to use a modified Dijkstra’s that I cribbed from last year’s day 22.

[intcode part 7]

Score: 2.

Day 16 - Flawed Frequency Transmission

Day 16 - part 1 Day 16 - part 2

I had a lot of trouble with part 2, mostly due to indexing errors.

Runtime is 2m16s for part 2, which is just barely acceptable.

Score: 2

Day 17 - Set and Forget

Day 17 - complete solution

This one was a slog!

I was very worried that my intcode interpreter was incorrect, but it was actually just me not being able to read the input specification correctly.

[intcode part 8]

Score: 2.

Day 19 - Tractor Beam

Day 19 - complete solution

This was supposed to be a breather…

From the beginning I realized that this problem is best expressed as x in terms of y, instead of the more usual y in terms of x, and I made a mental note not to mix them up.

Of course, many hours later I realized I had done just that.

[intcode part 9]

Score: 2.

Day 20 - Donut Maze

Day 20 - part 1

Part 1 yields easily to a modified BFS approach.

Part 2 is still TODO.

Score: 1.

Day 21 - Springdroid Adventure

Day 21 - complete solution

I felt zero interest in trying to puzzle this out so found some closed forms on the subreddit.

[intcode part 10]

Score: 0.

Day 22 - Slam Shuffle

Day 22 - part 1

Part one only for now; part 2 requires way too much weird (read: modular) math for me. Damnit, Cap’n, I’m continuous, not discrete!

Score: 1.

Day 23 - Category Six

Day 23 - complete solution

A remarkably straight-forward puzzle.

[intcode part 11]

Score: 2.

Day 24 - Planet of Discord

Day 24 - part 1

An interesting problem. Part 1 only for now.

Score: 1

Day 25 - Cryostasis

Day 25 - complete solution

A fitting end to a good edition of Advent of Code!

Score: 2.

eta (eta)

Somewhat contrived schema designing for a new chat system December 26, 2019 12:00 AM

[This is post 5 about designing a new chat system. Have a look at the first post in the series for more context!]

This post follows on from the previous one in the series1, wherein I had a shot at designing / specifying what state – persistent information, shared amongst all servers in a federated system – in group chats should look like. To summarize, we ended up with the group chat state containing three important things:

  • a set of roles, which are a way of grouping together capabilities available to users with said roles
    • Remember, capabilities are simple keywords like speak or change-topic that represent actions users can take
  • a list of memberships (users in the chat), together with the role for each chat member
  • non-user-related state, like the chatroom topic, which sort of follows a key-value store
    • We figured out that allowing arbitrary stuff to be stored in a room’s state was a bad idea, so this just contains…some random fields we’ll specify more formally later2.

In this post, we’ll look into how this state will be represented in server databases, and spec out a database schema for our server implementations to use3.

Unpacking our group chat state object

We could just store the group chat state as a big JSON blob in the database (indeed, if we were using something like MongoDB, that would be commonplace). However, this probably isn’t a good idea – for a number of reasons:

  • we’d have to retrieve the whole thing every time we wanted to access information about it, which is suboptimal performance-wise
  • things in the blob could quietly become inconsistent with the rest of the database if we didn’t check it all the time
  • the database wouldn’t be able to enforce any schemas; we’d have to do that in our application code
  • unless we (ab)use something like PostgreSQL’s native JSON support, our group chat state would be completely opaque from the database’s point of view – meaning it’d be hard to draw links between things in there (e.g. user IDs) and the rest of the database

These concerns are similar to those that third normal form (3NF), a way of structuring database schemas dating from 1971, hopes to address. Under 3NF, you store your data in a set of database tables representing various objects, with each table having a primary key (an identifier or set of identifiers uniquely identifying that object). 3NF then states that the other columns in each table must only tell you something about the primary key, and nothing else; they aren’t allowed to depend on anything other than the value of the primary key.

As a more concrete example, let’s say we have a User table representing a user of our chat system, where the primary key is a combination of a username and the user’s home server. If I wanted to add a column describing what channels they’re in, for example, that would be fine – but if I wanted to also add the most recent messages for each channel, say (let’s imagine you’re designing a UI like WhatsApp’s, with a homescreen that shows this information), that wouldn’t be valid under 3NF, because the most recent messages are a property of the channel, not the user. We could end up having two users in the same channel, and forget to keep this ‘recent messages sent’ property consistent, which would lead to confusion!

So, using 3NF seems like a pretty good idea – and that’s exactly what we’re going to do! Our group chat state object doesn’t fit 3NF as one large blob, so we’re going to need to decompose it into a set of database tables that store the same information.

Let’s do some schema designing!

groupchats: our starting point

Of course, we need some object to represent a group chat. Since group chats are going to be shared across different servers, we have to choose some identifier that’s always going to be unique – we can’t just give them textual names, otherwise there’d be a possibility of them clashing. Instead, we’ll use a universally unique identifier (UUID) – it does what it says on the tin!

Our set of group chat state (for now) is in the list below. I’ve also put 🚫 next to things we can’t put in the groupchats table directly due to 3NF, and explained why.

  • topic / subject
    • This is purely a property of the group chat itself, and it doesn’t depend on anything else.
  • list of users 🚫
    • This one technically could be a property of the group chat, but making it one isn’t a great idea.
    • Firstly, that’d mean we’d have to use an array, which is generally frowned upon; it makes it harder to do things like database table JOINs when the users are stuck in an array attribute.
    • Also, we probably want to associate some information with a user’s membership, like their role. Doing that in the groupchats table would be a big 3NF violation.
  • list of defined roles 🚫
    • Roles have capabilities associated with them, so they should be their own thing.
    • Said otherwise, our primary key is the channel’s UUID, not (channel UUID, role), so storing capabilities (which depend on those two things) would be a 3NF violation.
  • mapping of users to what roles they have 🚫
    • Similarly to the last item, this mapping introduces a 3NF violation.
    • We’ll probably end up doing this one in a separate object, as discussed above.
  • list of servers involved in this group chat, as well as whether they’re sponsoring or not 🚫
    • Ditto, really.
  • current state version
    • We need to keep track of what state version we’re on (remember, the state version is a monotonically incrementing integer), for the purposes of our consensus algorithm.

So, now that that’s all clear, we’re left with group chat UUID, subject, and current state version. Here’s the SQL DDL:

CREATE TABLE groupchats (
    uuid UUID PRIMARY KEY,
    state_ver INT NOT NULL,
    subject VARCHAR -- can be null, if a group chat is unnamed.
);

(We’ll include the DDL for each table in our schema.)

groupchat_roles and groupchat_role_capabilities: storing group chat role information

Before we can actually express user memberships, we need something to store group chat role information; what roles exist, and what capabilities are associated with them. Behold:

CREATE TABLE groupchat_roles (
    role_id SERIAL PRIMARY KEY,
    groupchat_uuid UUID NOT NULL REFERENCES groupchats,
    role_name VARCHAR NOT NULL,
    UNIQUE(groupchat_uuid, role_name)
);
CREATE TABLE groupchat_role_capabilities (
    role_id INT NOT NULL REFERENCES groupchat_roles,
    capability VARCHAR NOT NULL,
    UNIQUE(role_id, capability)
);

A row in the groupchat_roles table represents a role name in a group chat. Role names are unique per group chat, so the 2-tuple (groupchat_uuid, role_name) is unique; the only bit of information associated with a role is a list of capabilities, but we aren’t going to use arrays (q.v.), so the separate groupchat_role_capabilities table represents capabilities granted to users with a given role.

We’ve given roles an internal integer ID (role_id) just to make the primary key less annoying; the ‘real’ primary key should be (groupchat_uuid, role_name), but that’d be a real pain to refer to in the groupchat_role_capabilities table (we’d have to store both the UUID and the role name! So much wasted space!4), so we just use an integer instead.

groupchat_memberships: associating users with group chats

Now that we’ve got a group chat table, and a way of expressing user roles, we want a way to express the set of users that are in said group chats, along with the role they have. This is very simple – we have a (groupchat, user) primary key, and a role foreign key.

CREATE TABLE groupchat_memberships (
    groupchat_uuid UUID NOT NULL REFERENCES groupchats,
    user_id INT NOT NULL REFERENCES users,
    role_id INT NOT NULL REFERENCES groupchat_roles,
    PRIMARY KEY(groupchat_uuid, user_id)
);

I’m deliberately not going to mention what’s in the users table yet; we’re going to discuss that in another blog post.
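To see how these tables hang together, here’s a rough sketch of the kind of capability check a server might run against them. It’s my own illustration rather than anything from the post: it assumes a populated database reachable through Python’s sqlite3 module, passes UUIDs around as plain strings, and treats the * capability from the previous post as a catch-all.

import sqlite3

def user_has_capability(db: sqlite3.Connection, groupchat_uuid: str,
                        user_id: int, capability: str) -> bool:
    # True if the user's role in this group chat grants the capability
    # (or the wildcard '*' capability).
    row = db.execute(
        """
        SELECT 1
        FROM groupchat_memberships m
        JOIN groupchat_role_capabilities c ON c.role_id = m.role_id
        WHERE m.groupchat_uuid = ?
          AND m.user_id = ?
          AND c.capability IN (?, '*')
        LIMIT 1
        """,
        (groupchat_uuid, user_id, capability),
    ).fetchone()
    return row is not None

A sponsoring server could run something like this before applying a state change or relaying a message.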

groupchat_sponsoring_servers: associating sponsoring servers with group chats

We also need to associate servers with group chats somehow. This is partially done for us, in that users reside on servers, and so naturally the set of servers associated with a given group chat is just the set of servers on which its members reside – but that doesn’t take into account the fact that some of these servers might be sponsoring servers for the purposes of federation.

Enter the groupchat_sponsoring_servers table:

CREATE TABLE groupchat_sponsoring_servers (
    groupchat_uuid UUID NOT NULL REFERENCES groupchats,
    server_id INT NOT NULL REFERENCES servers,
    PRIMARY KEY(groupchat_uuid, server_id)
);

Again, there’s this mysterious servers table we haven’t got to.


Next steps

We’ve now established what principles we’re going to use to design our schema (3NF), and we’ve got 5 lovely tables that express all the group chat state stuff we’ve been jabbering on about for the last two blog posts in proper SQL DDL that we could actually use!

We now need to tackle the other two important parts of our schema; we’ve done group chats, but messages and users are yet to be specified. We’ll need to discuss a few important points about how we design our protocol to fit both of these, as there’s more to it than you might think! (For example, a user might seem simple – just someone on a server somewhere, right? – but what about them having a profile picture, or a bit of ‘status’ text describing what they’re up to, or things like that?)

All of that will come in the next blog post in the series, coming (hopefully quite) soon to a website near you!


Hey, those deadlines really are quite scary. Wouldn’t it be lovely if we had a roadmap, with nice intermediate dates on it, so we could actually plan stuff?


  1. Reading this post might be somewhat confusing, if you haven’t read that one! 

  2. This blog series is more “broad brushstrokes” than “exhaustive details” – because otherwise both you and I would get horrifically bored. Don’t worry, though – the exhaustive details will turn up somewhere and be featured in the final thing… 

  3. Of course, this schema isn’t part of the specification. Server implementors can do whatever they want; there does, however, have to be a reference implementation out there… 

  4. If we really wanted to save space, of course, we’d refer to the group chats table itself using an integer instead of a UUID (because a UUID is a blob of four integers, I think). I definitely draw the line at foreign composite primary keys, though… 

December 24, 2019

Andrew Owen (yumaikas)

Tabula Scripta: A spreadsheet for small data-driven apps December 24, 2019 08:53 PM

One of my hobbies of the last two years has been writing webapps. Last year, in December and part of January, I wrote an app called Gills, a journaling tool written in Go. One feature I added to it was the ability to define custom pages for Gills written in Lua. Since then, I’ve used that ability to shape Gills in a lot of different ways.

I’ve written pages that link to the cadre of webcomics I like, pages to keep a checklist of what I needed to use on caters when I was working at Qdoba, pages that gave me overviews of blog post drafts, pages that keep track of notes on a backlog, and so on.

Point is, over half of the appeal of Gills for me was having a CMS that I could script from my phone or a chromebook. The main problem with Gills-as-CMS, rather than Gills-as-Journal or Task-management-lite, is that, at its core, Gills has two fundamental types of data: Notes and Scripts. Adding any other kind of idea means adding a parsing layer on top of that.

The inciting itch for Gills was the revelation that journaling could help my mind focus. I’ve since found a lot of other things that also help my mind focus. The inciting itch for Tabula Scripta is trying to keep track of doctor’s appointments, believe it or not. Now, I have normal calendars, and other sorts of tools in between, so it’s not strictly necessary, but I’ve always wanted something that was like Excel but wasn’t an utter pain to use from my phone. A place to collect gas mileage records, track where I am in certain comics, or build other light-weight data-driven websites. Sometimes, I’d rather not have to set up a database to make something data-driven.

This is where the idea of spreadsheets comes in. They make a natural habitat for scratching out data sets, cleaning them, or doing ad-hoc data recording. Formulas even make it easy to do analysis over your data. Add in the idea of scriptable forms, and you can make capture simple and easy, and you can also make reports and charting possible, maybe even a daily summary of different things on your plate.

The thing is, there isn’t currently, to my knowledge, an actual Access-lite webapp. Not that I’ve done a lot of research into this type of software. I had done a lot of research into task-tracking and journaling software before Gills, and I’ve still gotten a lot of mileage out of making my own. I’ve always had a bent towards making software tools for myself; I don’t see a reason to stop now.

So, with all of that rambling out of the way, what is my plan for Tabula Scripta?

Zeroth, Tabula Scripta is going to be a WebApp, designed to be accessible from Mobile Devices as well as desktop platforms. That doesn’t mean that it’ll be 100% easy to use from mobile phones at first, but I’ll try to avoid being mobile-phone hostile.

First, I want to get a functional beta of a spreadsheet editor working. Right now, the idea is for the client-side to send updates from the user as they come in, and for the server to send back all the affected cells. This means that formulas will mostly be calculated server-side, which allows me to keep the JavaScript code focused on presentation, rather than on modeling the intricacies of evaluating the formulas.

Alongside this, I’m developing a hierarchy of folders, scripts, forms and sheets. I might add notes in there as well. The intent of all of this is to make it easy to build a project in Tabula Scripta without having to fuss too hard with the outside world. Webapps whose state can be contained inside a single SQLite database plus an executable are easy to operate once you’ve learned to use a reverse proxy.

After I get the notes and folders going, I plan on building out a scripting language, which I’m planning on calling TabulaScript. I anticipate it seeing significant evolution over the next 6 months, as I dogfood Tabula Sheets in different applications. For now, I’m planning on writing a stack-based language, for the sake of implementation simplicity, but long-term I think I’d like to aim for something a bit more conventional.

Why add my own scripting language on top of all of this? In part because I think I have a lot I learned from PISC that I’d like to try to apply, but also because having a custom scripting language will allow me to do some unique spreadsheet-focused things, like building in support for tracking spreadsheet dependencies throughout the language, so that I can detect data cycles even if you’re using heavily custom scripts. It remains to be seen how practical that ends up being, but it’s at least a thought.

I also plan on making it possible to extend Tabula Scripta with your own formulas, built on this scripting language, rather than relying on only the set that it ships with. Below are some sketches of how it will probably look at first. Long term, I may try to make the scripting language a bit more conventional.

formula: [sum range] [
     0 ->'total
     per-cell: 'range [ getNumberFromCell 'total add ->'total ]
     'total ->'result
]

Tabula Script is also going to be at the heart of forms; I think this is where it will shine:

# A proposed form for quickly recording appointments:


UI: [
    markdown: "# Appointments" # Gets converted to HTML
    style: [myForm.css] # drops in a style tag into the HTML
    raw: [<em>Tabula Scripta is just ideas</em>]
    form: [ # Designed to make it easy to build up input forms
         use-table-layout # an HTML table to layout the inputs for
                          # this form for pretty alignment.
         # If an input has two arguments,
         # the first is the label,
         # and the second is the variable name that the form will expose on submit
         DateTimePicker: ["Appointment Date" "ApptDate"]
         # If it only has one argument, that is both the label *and* the name exposed on submit
         Text: ["Location"]
         LongText: ["Notes"]
    ]
]


Submit: [
 # A stack-based pipeline for now
 "Appointments.sheet" get-sheet "B2" get-next-empty-row ->'anchor-point
 # Save-row takes a first cell, saves the first value in that cell,
 # and then keeps saving cells along the current row.
 # There will probably be save-col and save-grid for similar other applications.
 'anchor-point [ 'ApptDate 'Location 'Notes ] save-row
]


I’ve not thought through every little thing just yet, but I hope these examples give a decent first impression.

Ultimately, the first version of Tabula Scripta is aimed at making it faster for software developers to execute on small-to-medium ideas, rather than aiming it 100% accessible to everyone. Because there are a lot of small ideas that are worth being able to build, from making it easy to set up a College Club recruitment website, to making it easy to record workouts, to making it easy to keep track of web-comics. I don’t think these ideas each necessarily need a dedicated webapp for them, but combined together, I think something like Tabula Scripta could make it easier to build more of those small projects.

Long term, I’d like to look into adding things like cron jobs, making it easy to skin/style Tabula Scripta, some form of user-access control (but that is not a priority for version 1), and a way to have a notion of publicly visible things vs internal things that only the admin(s) can see.

But, once I get sheets, scriptable forms (which I intend to double as reports), and extensible formulas, I’ll see where the itches guide next.

Jan van den Berg (j11g)

The Subtle Art of Not Giving a F* – Mark Manson December 24, 2019 07:00 PM

When this book came out it was seemingly everywhere. Especially in airport bookshops (I don’t know if that’s a good thing or not though). Or maybe I am imagining things and the book just sticks out, more than others, because of the swear word in the title, which is …. quaint?

The Subtle Art of Not Giving a F* – Mark Manson (2016) – 224 pages

I happened to find* a Dutch copy and thought: well, why not? Seemed short enough.

And sure enough you can get through it in a few short hours. And I admit it sure did help that the book opens with an anecdote about the writer Bukowski. So mr. Manson definitely had my attention.

It’s clear Mark Manson has a background as a blogger. His writing is proof of that. In an uncomplicated way he explains complicated things. He knows when to stop or when to speed things up and he knows how to entice the reader by mixing personal anecdotes with more or less interesting analogies.

Overall though I have mixed feelings about this book. The F-word is used way too much for my taste. But OK, that’s the gimmick, I get it. I found it more problematic that (I felt) I was treated as an idiot. Especially, since I think I am very well aware of what was going on:

Mark Manson has written a book — based on well established stoic principles — for the masses that do not otherwise read books.

Stoicism

However, there is very little mention of stoicism and the stoic ideas (just one throwaway sentence). Which is strange. Because there are plenty of other ideas and philosophies he cites (and even cites Tim Ferris, so I definitely know he is familiar with Seneca). So this makes it almost seem like he is trying to hide it? Maybe ‘the airport reader’ doesn’t care where these ideas come from, but I do. And he surely must know his ideas are well established stoic principles (from Aurelius, Seneca and others).

Regardless of all this, this book has a lot of truths in it. Truths I was mostly already familiar with, but nonetheless truths. And I don’t mind admitting that I did enjoy reading this book. Mark Manson has carved out a niche for himself by packaging stoic principles in a modern, in-your-face type of language. If that’s your thing, this might be for you. If not, I can point out that there are also many other books with the same message.

*Side-note: one reason I picked up this book is because I found a bookmark in Bregman’s (highly recommended) book that I just finished. The bookmark had an ad for Mark Manson’s books. Which is a bit ironic, since Bregman dissects and invalidates the findings of the famous Stanley Milgram experiment, while Manson cites the experiment results as a source. Curious, right?

The post The Subtle Art of Not Giving a F* – Mark Manson appeared first on Jan van den Berg.

Humanity’s Last New Year’s Eve – Niccolò Ammaniti December 24, 2019 03:25 PM

In 1996 — when he was just starting out — Ammaniti published a collection of short stories titled Fango. This particular story (which was also made into a movie) is one of the stories from Fango published as a separate book in 2010 and it’s absolutely vintage Ammaniti.

Humanity’s Last New Year’s Eve – Niccolò Ammaniti (1996/2010) – 143 pages

Being one of his earliest stories, it’s coarse and crude and actually a bit too much for my taste. His later work is more delicate and smart. I think he tries just a little bit too hard to go for the shock effect. But nonetheless the unmistakable Ammaniti touch is all over the place. And it’s still a treat.

Recognizable and unforgettable characters that come to life with only a few sentences: this is something Ammaniti has patented and what most writers strive for. It’s not a superficial skill. Ammaniti embodies the notion that the better a writer you are, the more you are able to suffer.

The story itself is a clever, seemingly unrelated, chronological timeline of many different characters on the last evening of the millennium. And eventually they all intertwine in a big way. Because, of course they do.

The post Humanity’s Last New Year’s Eve – Niccolò Ammaniti appeared first on Jan van den Berg.

Eric Faehnrich (faehnrich)

Take a Floppy Disk Image December 24, 2019 05:00 AM

As a reverse of my Floppy Disk Resume post, I show how to take an image of a 3.5” high-density floppy disk.

Multiple times, I’ve been approached by people to investigate floppy disks they have. I’m making a quick post to share how to do that, and to document my steps as a reminder to myself.

I don’t know if this is the “right” way if you really want to get forensic with it, but my thoughts on how to do it are take an image of the entire disk to preserve it, then you can mount that copy or do whatever you want without further risk to the real disk itself.

First, I’d find the device. After I plug in my USB floppy drive, dmesg should say which device it is under /dev/. Then I copy the disk to an image file with dd. I use status=progress to know it’s still working, because it can take some time; just note that I don’t think that option works with all versions of the command.

sudo dd if=/dev/sdf of=~/Documents/floppydiskimage status=progress

This creates the image file floppydiskimage. I left off a file extension because it doesn’t really matter here; extensions only matter to some other programs, which get confused by them, so I just left it off.

You can then see the image file you have.

$ file ~/Documents/floppydiskimage
/home/eric/Documents/floppydiskimage: DOS/MBR boot sector, code offset 0x3c+2, OEM-ID "1^1^4IHC" cached by Windows 9M, root entries 224, sectors 2880 (volumes <=32 MB), sectors/FAT 9, sectors/track 18, reserved 0x1, dos < 4.0 BootSector (0x0), FAT (12 bit by descriptor+sectors), followed by FAT

Then you create a directory and mount the image to it. You can then explore the contents.

sudo mkdir /mnt/floppyimage
sudo mount ~/Documents/floppydiskimage /mnt/floppyimage
ls /mnt/floppyimage/

This is of course for disks that are still intact enough to have a file system. I’m pretty sure this method copies all the bytes regardless, so if the file system on the disk is bad you can look into other software tools to investigate. You’ll need to find different readers for different disks, like 5.25” floppies, but if they show up the same as this, this method should still work. If you really want to get into it even more, look into reading the “flux” of disks, the actual magnetic fields stored on disks.

December 23, 2019

Benjamin Pollack (gecko)

When class-based React beats Hooks December 23, 2019 04:07 PM

As much as I love exploring and using weird tech for personal projects, I’m actually very conservative when it comes to using new tech in production. Yet I was an immediate, strong proponent of React Hooks the second they came out. Before Hooks, React really had two fundamentally different ways to write components: class-based, with arbitrary amounts of state; or pure components, done as simple functions, with zero state. That could be fine, but the absolutely rigid split between the two was a problem: even an almost entirely pure component that had merely one little tiny bit of persistent state—you know, rare stuff like a checkbox—meant you had to use the heavyweight class-based component paradigm. So in most projects, after a while, pretty much everyone just defaulted to class-based components. Why go the lightweight route if you know you’ll have to rewrite it in the end, anyway?

Hooks promised a way out that was deeply enticing: functional components could now be the default, and state could be cleanly added to them as-needed, without rewriting them in a class-based style. From a purist perspective, this was awesome, because JavaScript profoundly does not really want to have classes; and from a maintenance perspective, this meant we could shift functional components—which are much easier to test and debug than components with complex state, and honestly quite common—back to the forefront, without having the threat of a full rewrite dangling over our heads.

I was able to convince my coworkers at Bakpax to adopt Hooks very quickly, and we used them successfully in the new, much richer content model that we launched a month ago. But from the get-go, one hook made me nervous: useReducer. It somehow felt incredibly heavyweight, like Redux was trying to creep into the app. It seemed to me like a tacit admission that Hooks couldn’t handle everything.

The thing is, useReducer is actually awesome: the reducer can easily be stored outside the component and even dependency-injected, giving you a great way to centralize all state transforms in a testable way, while the component itself stays pure. Complex state for complex components became simple, and actually fit into Hooks just fine. After some experimentation, small state in display components could be a useState or two, while complex state in state-only components could be useReducer, and everyone went home happy. I’d been entirely wrong to be afraid of it.

No, it was useEffect that should’ve frightened me.

A goto for React

If you walk into React Hooks with the expectation that Hooks must fully replace all use cases of class-based components, then you hit a problem. React’s class-based components can respond to life-cycle events—such as being mounted, being unmounted, and getting new props—that are necessary to implement certain behaviors, such as altering global values (e.g., history.pushState, or window.scrollTo), in a reasonable way. React Hooks, out-of-the-box, would seem to forbid that, specifically because they try to get very close to making state-based components look like pure components, where any effects would be entirely local.

For that reason, Hooks also provides an odd-one-out hook, called useEffect. useEffect gets around Hooks limitations by basically giving you a way to execute arbitrary code in your functional component whenever you want: every render, every so many milliseconds, on mounts, on prop updates, whatever. Congratulations: you’re back to full class-based power.

The problem is that, just seeing that a component has a useEffect1 gives you no idea what it’s trying to do. Is the effect going to be local, or global? Is it responding to a life-cycle event, such as a component mount or unmount, or is it “merely” escaping Hooks for a brief second to run a network request or the like? This information was a lot easier to quickly reason about in class-based components, even if only by inference: seeing componentWillReceiveProps and componentWillMount get overrides, but componentWillUnmount left alone, gives me a really good idea that the component is just memoizing something, rather than mutating global state.

That’s a lot trickier to quickly infer with useEffect: you really need to check everything listed in its dependency list, see what those values are doing, and track it up recursively, to come up with your own answer of what life-cycle events useEffect is actually handling. And this can be error-prone not only on the read, but also on the write: since you, not React, supply the dependency chain, it’s extremely easy to omit a variable that you actually want to depend on, or to list one you don’t care about. As a result, you get a component that either doesn’t fire enough, or fires way too often. And figuring out why can sometimes be an exercise in frustration: sure, you can put in a breakpoint, but even then, just trying to grok which dependency has actually changed from React’s perspective can be enormously error-prone in a language where both value identity and pointer identity apply in different contexts.

I suspect that the React team intended useEffect to only serve as the foundation for higher-level Hooks, with things like useMemo or useCallback serving as examples. And those higher-level Hooks will I think be fine, once there’s a standard collection of them, because I’ll know that I can just grep for, I dunno, useHistory to figure out why the pushState has gone wonky. But as things stand today, the anemic collection of useEffect-based hooks in React proper means that reaching for useEffect directly is all too common in real-world React projects I’ve seen—and when useEffect is used in the raw, in a component, in place of explicit life-cycle events? At the end of the day, it just doesn’t feel worth it.

The compromise (for now)

What we’ve ended up doing at Bakpax is pretty straightforward: Hooks are great. Use them when it makes sense. Even complex state can stay in Hooks via useReducer. But the second we genuinely need to start dealing with life-cycle events, we go back to a class-based component. That means, in general, anything that talks to the network, has timers, plays with React Portals, or alters global variables ends up being class-based, but it can in certain places even bring certain animation effects or the like back to the class-based model. We do still have plenty of hooks in new code, but this compromise has resulted in quite a few components either staying class-based, or even migrating to a class-based design, and I feel as if it’s improved readability.

I’m a bit torn on what I really want to see going forward. In theory, simply shipping a lot more example hooks based on useEffect, whether as an official third-party library list or as an official package from the React team, would probably allow us to avoid more of our class-based components. But I also wonder if the problem is really that Hooks simply should not be the only abstraction in React for state. It’s entirely possible that class-based components, with their explicit life-cycle, simply work better than useEffect for certain classes of problems, and that Hooks trying to cover both cases is a misstep.

At any rate, for the moment, class-based components are going to continue to have a place when I write React, and Bakpax allowing both to live side-by-side in our codebase seems like the best path forward for now.


  1. And its sibling, useLayoutEffect. ↩︎

December 22, 2019

Richard Kallos (rkallos)

A Bit of Math: Triple Elimination Tournament December 22, 2019 07:58 PM

A while ago, my father, who is a competitive backgammon player, had a puzzle for me. He wanted to know the fraction of players remaining after a certain number of rounds of play in a triple-elimination tournament. This blog post goes over my work to figure out the answer.

A triple-elimination tournament is more-or-less how it sounds. If you lose 3 times, you’re eliminated. Since the problem doesn’t need to deal with any sort of bracket structure, I’m leaving it out completely.

In any triple-elimination tournament, there are 3 groups of players; those with zero, one, and two losses. The number of players with zero losses is easiest to figure out. After each match, there is one winner and one loser. After the first round, half of the players won, and half of the players lost. After the second round, half of the winning players won again. The proportion of players with zero losses after x rounds of play is

zeroes(x) = 2^{-x}

The number of players with one loss is a little trickier. After the first round, half of the players lost the first game. After the second round, the fraction of players with one loss is the sum of those who lost their first game and won the second, and those who won their first game and lost the second. Each of those groups comprises one quarter of the total number of players, so the fraction of players with one loss at the end of the second round is still half. After the third round, half of the half of players with one loss won their third game, and half of the quarter of players with zero losses lost their third game, so the fraction of players with one loss after round three is (1/4) + (1/8) = 3/8. After the fourth round, half of the three eighths of players with one loss won their fourth game, and one eighth of the players with zero losses lost their fourth game.

If we expand a little further, a clear pattern emerges. 1/2, 2/4, 3/8, 4/16, 5/32. The proportion of players with one loss after x rounds of play is

ones(x) = \frac{x}{2^x}

Finally, there are the players with two losses. Seeing as it’s not possible to lose twice in zero or one rounds of play, I’m going to start counting at x = 2 rounds.

After two rounds, one in four players lost twice. After the third round, half of those players lost for the third time, resulting in their elimination, and half won, giving one in eight players. These players are joined by the quarter of players (half of those with one loss) who picked up their second loss, giving 3/8. After the fourth round, half of the 3/8 won, giving 3/16. These lucky few are joined by the 3/16 of players with one loss who lost their fourth match, giving 6/16. After the fifth round, half of the 6/16 won, giving 6/32, who are joined by the 4/32 of players with one loss who lost their fifth match, giving 10/32. If we look at the number of rounds, and the sequence of values in the numerator, we see [(2, 1), (3, 3), (4, 6), (5, 10)], which matches the following equation:

g(x) = \sum_{i = 1}^{x-1} i = \frac{x \times (x - 1)}{2}

That means the proportion of people with two losses is equal to

twos(x) = \frac{1}{2^x} \frac{x \times (x - 1)}{2}

Now, the proportion of players remaining in the tournament after x rounds (for x greater than or equal to 2) is equal to

f(x) &= zeroes(x) + ones(x) + twos(x) \\
     &= \frac{1}{2^x} + \frac{x}{2^x} + \frac{1}{2^x} \frac{x \times (x - 1)}{2} \\
     &= \frac{1}{2^x} \left( 1 + x + \frac{x \times (x - 1)}{2} \right) \\
     &= \frac{x^2 + x + 2}{2^{x+1}}
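As a quick check of that closed form (my addition, not from the original post), here is a short Python sketch that runs the same idealized model as a recurrence – every remaining player plays each round and wins with probability one half – and compares the total remaining proportion against the formula:

from fractions import Fraction

def proportions(rounds):
    # Exact proportions of players with zero, one, and two losses after each round.
    zero, one, two = Fraction(1), Fraction(0), Fraction(0)
    for _ in range(rounds):
        # Half of each group wins; losers move to the next group, and
        # players in the two-loss group who lose are eliminated.
        zero, one, two = zero / 2, zero / 2 + one / 2, one / 2 + two / 2
    return zero, one, two

def closed_form(x):
    return Fraction(x * x + x + 2, 2 ** (x + 1))

for x in range(2, 11):
    assert sum(proportions(x)) == closed_form(x)
print("closed form matches the recurrence for rounds 2 through 10")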

Ponylang (SeanTAllen)

Last Week in Pony - December 22, 2019 December 22, 2019 05:25 PM

We have quite a lot of exciting announcements this week. Releases for Ponyup and Corral are hot off the presses, we have a new committer, and more!

Jon Williams (wizardishungry)

Speed Up GoMock With Conditional Generation December 22, 2019 12:00 AM

Recently I’ve been frustrated by the slow reflection performance of Go’s mockgen when running go generate ./... on a large project. I’ve found it useful to use the Bourne shell built-in test command to conditionally invoke mockgen only if:

  • the destination is older than the source file
  • or the destination does not exist

go generate does not implement any kind of parallelism, so the slow performance of mockgen in source mode has become a bit of a drag; thus –

package ordering

import (
	"context"
)

//go:generate sh -c "test client_mock_test.go -nt $GOFILE && exit 0; mockgen -package $GOPACKAGE -destination client_mock_test.go github.com/whatever/project/ordering OrderClient"

type OrderClient interface {
	Create(ctx context.Context, o *OrderRequest) (*OrderResponse, error)
	Status(ctx context.Context, orderRefID string) (*OrderResponse, error)
	Cancel(ctx context.Context, orderRefID string) (*OrderResponse, error)
}

On my fairly large project, this reduces many generate runs from the order of 45 seconds to 2 or 3 seconds.

(The above code sample probably works in source mode, but has been contrived for simplicity.)

December 21, 2019

Jan van den Berg (j11g)

De Meeste Mensen Deugen – Rutger Bregman December 21, 2019 12:30 PM

I don’t know what the English title translation for Rutger Bregman’s latest book will be. But I do know two things. One: there will be one. And two: it will be a bestseller.

De Meeste Mensen Deugen – Rutger Bregman (2019) – 528 pages

The title will be something along the lines of: Most People Are Decent. Which is a terrible translation by me and I hope they come up with something better, but it is the general premise of the book.

Bregman hails from the school of Malcolm Gladwell (who he mentions many times). He is a great storyteller, very easy to read and he is able to create a riveting narrative — from different anecdotes and studies — around a compelling sociological thesis. Overall this book sends a message of hope, which is greatly needed. So I can definitely see this book becoming an international bestseller.

To my surprise I was already familiar with most of the ideas, because I am a loyal listener of Bregman’s podcast. His writing style is very similar to his speaking style (which is not always a good thing, but in this case it is). And having listened to him for more than 30 hours, I think I read this book in his voice.

Gripes

However, even though I can agree on many things (like ‘following the news is bad for you’), there are still a few gripes I have with the book. (Not including the paradoxical reality that I probably disagree with the general premise but wholeheartedly agree with the conclusion of the book.)

Dissecting studies

Bregman is not a scientist, he is an investigative historical journalist, and a really good one. He has a keen nose for pointing out flaws in scientific studies and plotting them against historical backgrounds. And the conclusions he draws from those are seemingly valid. And he makes a good case for most of them, but here is the thing:

Pointing out something is wrong doesn’t directly make the opposite true.

And even though the opposite might very well be true, that is not how science works.

Sometimes such a conclusion makes perfect sense (i.e. I will not argue the correctness of the Stanford Prison Experiment), but in other places I think Bregman lets the narrative prevail over the validity of the argument. Which — again — might still be true, but is not necessarily backed up by evidence (this mostly being the case with the Steven Pinker study, I think).

And sometimes the proof or argument is more anecdotal, and the sample sizes too small to take for granted. But I also think Bregman is well aware of this. Because this is exactly what he does himself — pointing out flaws. Also he is well aware that history is in the eye of the one who tells it, and that today’s earth-shattering scientific study can be tomorrow’s scrap paper. Just something to keep in mind.

Factual fallacies

There is one in particular I can’t not point out, because it is one of those persistent false facts that are constantly being regurgitated. And because in this case it is about my hometown, I feel I need to address this one.

In a throwaway sentence on page 432 Bregman argues that my hometown — Urk — consistently has the most PVV (far-right party) voters. Sure, it helps the narrative, but I would say this is false. Just look at the last 10 (!) elections. There is only one time Urk voted definitely higher (and one time marginally). In all other elections Urk voted structurally lower for the PVV than the general vote.


I would not call this consistently higher (sources: Google and Wikipedia)

This is not meant to point out that this book or the premise is wrong. This is just one small example of always keeping your eyes and ears open and always keep thinking for yourself.

Gladwell

I think I have read everything by Gladwell, except his latest. And I think Bregman is also a fan. And he will probably be called the Dutch Gladwell when this book becomes that international bestseller. An unimaginative title (though arguably better than ‘that Davos guy’), but more importantly maybe also a wrong one.

Because Gladwell is under a lot of fire lately, mostly because he tends to oversimplify in an effort to push his conclusions. And I think Bregman does steer clear of this. He is much more careful in drawing conclusions, and doesn’t shy away from casting doubts on his conclusions. Which makes the reader part of the process. But he does call for a grandiose idea (A New Realism) which is another thing where Gladwell usually misses the target. But in Bregman’s case this grandiose idea follows naturally and is commendable.

Overall

Having stated some gripes, know that I am not a miser (just a stickler for facts), and I can safely say this is a wonderful book!

Bregman is not an optimist, nor a pessimist but a possibilist (yes, I borrowed that from the book). And I like that a lot! And I don’t know if Bregman knows this, but his ten rules from the last chapter share a great resemblance to Covey’s seven principles. Which I also greatly endorse.

And while this is not a scientific book, it is a book about hope, ideas and presenting a different perspective. And like I have stated many times before on my blog: getting a different perspective is always a good thing. So I would definitely recommend reading this book.

Side note 1: the effect of a cover sticker (to me) has probably the opposite of the intended effect. Because the TV program (where the sticker is from) needs the writer as much as the writer needs the TV program. And when I read on page 28 that Bregman himself calls a different book ‘magistraal’: it makes it even more lazy or at least ironic. So to me such a sticker is a warning: always make up your own mind.

Side note 2: of all the books I read this year, this was probably my favorite physical copy. Though not a hardcover, it was just the right size, the cover design is gorgeous and the font and typesetting are absolutely perfect! Of course, it also helps that Bregman is a great writer, but the overall design also makes this book a pure delight to hold and read. I wish all my books were like this.

The post De Meeste Mensen Deugen – Rutger Bregman appeared first on Jan van den Berg.

December 18, 2019

eta (eta)

Designing group chat state for a new chat system December 18, 2019 12:00 AM

[This is post 4 about designing a new chat system. Have a look at the first post in the series for more context!]

This is potentially the point at which the previous few blog posts, which have talked in vague, vacuous terms about the possibility of designing some new utopian chat system, start to gradually become more about actually getting the work done – purely due to the fact that, as previously mentioned, I have a deadline! It’s been a while1 since I last said anything about the chat systems project – so, let’s get things rolling again with a practical post about our database schema, shall we?

The importance of a good schema

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

~ Linus Torvalds

It’s been said, as you can see in the quote above, that thinking hard about your data structures – the way you choose to store whatever data it is that you’re processing – is vitally important to the success of whatever you’re building, so it’s worth having good ones, or else you’ll end up with hacky code that doesn’t quite work right2. If you have the right data structures, the code usually sort of flows in to fill the gaps and make everything work right – whereas if you do the code first, it doesn’t usually work as nicely.

Now, data structures come in all shapes and sizes – there are trees, hash tables, association lists, binary heaps, and all sorts of other fun stuff that you’d probably find in some CS textbook somewhere. We could use some of these for our new chat system – and, in fact, we probably will use the odd hash table or two. However, given I want to keep things nice and boring, sticking to proven, reliable technologies for this project3, we’re probably just going to store everything in a set of relational database tables (i.e. rows and columns!).

And a schema is essentially a description of what tables you’re going to have and what their columns mean. Which is what I’m going to write now, so, without further ado…

What do we actually want?

It’s a good idea to start with a discussion of what data we have, and what we’re trying to get out of that data. We’ve said that we want our new system to support our own funky blend of federation, so that’s going to need to be accounted for. Naturally, a chat service will have a bunch of users, each with their own usernames, passwords, emails, and other information that needs to be stored about them. We’ll probably have some messages as well, given users tend to make a lot of those when you give them the chance to.

As we’ve also discussed before, the primary function of our chat service is to convey messages from their source to their recipient, and do so in a reliable manner. That implies further that, in addition to the messages themselves, we’d also benefit from storing information about message delivery – did the messages get through, in the end, or do we need to try again sometime?

Since we’re supporting group chats, those also need to have some information stored about them. Our federation protocol4 requires us to store multiple different ‘versions’ of group chat state (remember, ‘state’ refers to things like the list of members of the chat, who has admin rights, and what the current topic is) – because it’s based on this whole funky consensus protocol stuff, we’ll need to keep track of what servers have proposed changes, and which changes we decided to actually accept.

A model for group chat state

The consensus algorithms we looked into previously allow us to get a bunch of trusted servers (‘sponsoring servers’) to agree on something, where ‘something’ is just some arbitrary value – usually with some kind of monotonically incrementing identifier or version number. It thus follows that we need some model for what that ‘something’ will look like; how will servers communicate information about a room’s state to one another?

Users, administrators, and moderation

Actually, we haven’t even specified what we want this room state to look like yet – there are still some unresolved questions around things as simple as how administrator rights / operator powers should work in group chats. Different platforms do this in different ways, after all:

  • IRC has a system of ‘channel modes’, where users can be given flags such as +o (operator; can kick, ban, etc.) and +v (voice; can speak in muted channels).
    • Some servers then extend this, adding +h (‘half-operator’), +a (‘admin’), +q (‘owner’), and all sorts of other random gunk that confuses people and makes it hard to determine who can do what.
    • Of course, server administrators can override all of this and just do what they want, on most implementations.
  • Matrix has ‘power levels’ - each user has an integer between 0 and 100 assigned to them, which determines what they’re able to do. A set of rules (stored in the room state, alongside the power levels) specify what power levels map to what – for example, only people with power level >= 50 can set the topic, etc.
    • You’re not allowed to give people power levels higher than what you already have, and you can’t take away power from people who have more or the same amount of power as you. This is kinda required to make the whole thing work.
    • Because Matrix is decentralised, it’s possible to get yourself into a state where everyone’s lost control of a chatroom and you can’t get it back. Of course, though, this is quite easy to avoid, by making the software stop you from shooting yourself in the foot.
  • WhatsApp has ‘group admins’ and…that’s pretty much it5. You either have admin rights, in which case you can do everything, or you don’t and you can’t really do much.
    • This is very simple for users to understand.
    • However, it makes WhatsApp completely impractical for all sorts of use cases; anyone who’s found themselves kicked out of a WhatsApp chat after someone went crazy and banned everyone after pleading for admin ‘in order to help keep things civil’ probably knows what I’m talking about.
  • Discord has ‘roles’ - there’s a bit in server settings where you can create sets of permissions, called ‘roles’, that you grant to different users in order to empower them to do specific things6.
    • This is partially why Discord can play host to massive chatrooms for very popular games like PUBG and Fortnite: the system is very flexible, and suited to large-scale moderation.
    • However, it’s also perhaps quite confusing for some people, especially if you’re just using it for smaller-scale stuff like a private group chat.
    • Like Matrix, the roles are arranged in a kind of hierarchy; roles at the top of the hierarchy can make changes to roles below them, but not the other way round.

So, hmm, it might seem like there’s a lot to choose from here – but, in fact, it’s a bit simpler than you’d think. It’s immediately apparent that the more flexible Matrix/Discord systems can be used to build the simpler ones, like IRC’s and WhatsApp’s: if all you want is ‘group admin or not group admin’, you can make two Discord roles (one with all permissions, one with none), or two power levels (100 and 0, say, with appropriate rules for each), and you’ve essentially recreated the simple system inside your complex one. (And you can do funky things with your user interface to hide away the added power, for those who don’t really need it.)

Taking some inspiration from this idea, and the age-old concept of an access-control list, here’s a proposed model. We’ll specify a set of capabilities that describe what actions can be taken – for example, speak, change-topic, mute-user, and so on – and then a set of roles, like the Discord roles, that are sets of capabilities. Each user gets a role, which in turn describes what they’re able to do. (If we want to make things work nicely for ex-IRC people, the roles can optionally come with small letters, like +o and +v.) Unlike Discord and Matrix, there won’t be any hierarchy to the roles. Roles can only be modified by someone holding a capability called change-roles, and that’ll be the end of it. The sponsoring servers in our federation model will do this role check every time they receive a message or a request to change the state in some way, and refuse to apply the change if it’s not permitted.

The list of capabilities will eventually be written in some spec document somewhere, and thereby standardised across different server implementations. Essentially, they’ll work like IRCv3 capabilities, where vendors can make their own capabilities up if they want to (prefixing them with a valid domain name for whatever it is that they built).

To make things easier, the special capability * allows a user to make any change.

In JSON-esque syntax7, this would look a bit like:

{
    "roles": {
        "normals": ["speak", "change-topic"],
        "voiced": ["speak", "speak-in-muted-channel", "change-topic", "theta.eu.org/special-extension-power"],
        "admins": ["*"]
    },
    "memberships": {
        "server.org/user1": "normals",
        "server.org/adminuser1": "admins"
    }
}
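
As a rough illustration of that role check (a minimal sketch only; the function name, the Python representation, and the defaults are my own assumptions, not part of any spec), a sponsoring server might evaluate permissions against a state blob like the one above roughly as follows:

def has_capability(state, user, capability):
    """Return True if `user` holds `capability` via their assigned role."""
    role_name = state["memberships"].get(user)
    if role_name is None:
        return False  # not a member of the group chat at all
    capabilities = state["roles"].get(role_name, [])
    # The special capability "*" grants every capability.
    return "*" in capabilities or capability in capabilities

state = {
    "roles": {
        "normals": ["speak", "change-topic"],
        "admins": ["*"],
    },
    "memberships": {
        "server.org/user1": "normals",
        "server.org/adminuser1": "admins",
    },
}

assert has_capability(state, "server.org/user1", "speak")
assert not has_capability(state, "server.org/user1", "mute-user")
assert has_capability(state, "server.org/adminuser1", "change-roles")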

Of course, there’s also state that doesn’t directly relate to users – the current group chat subject, whether or not guest users are allowed in, etc. Some state may have more complicated structure – for example, the Matrix people have a server ACL state event that lets you ban specific servers from taking part in rooms, which is pretty much its own little object with embedded arrays and booleans – which means we can’t just model this state as a simple stringly-typed key-value store, i.e.

{
    "state": {
        "subject": "example",
        "other_state_key": "other_value",
        // etc
    }
}

The question is, though, how extensible we want things to be: can users (or servers, for that matter) store arbitrary objects in our group chat state, or do they all have to follow some predetermined schema? Matrix takes the approach of letting users chuck in whatever they like (assuming they have the necessary power level) – essentially treating each group chat as a mini database – while pretty much every other platform restricts room state to a set of predetermined things.

Capability negotiation

I’m not entirely sure allowing arbitrary extensibility, in the way that Matrix does, is such a good idea – for two reasons8:

  1. Allowing people to just store arbitrary data in your group chat seems a bit too much like an abuse vector. It’s a chat system, not a distributed database; allowing people to just chuck whatever stuff they want in there is a bit much!
    • How would you display these arbitrary state events, given you don’t know anything about them?
    • You’d need some limit on size, to prevent people just using you as free storage?
    • In many jurisdictions, you’re responsible for content that gets put on your server. What if someone uploads stuff you’re not comfortable hosting, and stashes it away in some event your client doesn’t implement (and therefore doesn’t show)?
  2. Usually, things in the group chat state are there for a reason, and it doesn’t make sense for servers to just ignore them.
    • For example, consider when the Matrix folks rolled out the server ACL feature: servers not on the latest version of their software would just completely ignore it, which is pretty bad (because then it had no effect, as malicious actors could get into the room via the unpatched servers).
      • They ‘solved’ this by polling all the servers in a room for their current version number, and checking to see which ones still needed updating (which let them badger the server owners until it got updated).

Instead, it’s probably better to actually have some system – like the previously-mentioned IRCv3 capability negotiation9 – where servers can say “yes, I support features X, Y, and Z”, thus enabling those features to be used only if all of the sponsoring servers in a group chat actually support them. This solves the issue of extensibility quite nicely: non-user related state is governed by a strict schema, with optional extensions for servers that have managed to negotiate that.
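
As a very rough sketch of what that negotiation boils down to (the data shapes and names here are assumptions for illustration, nothing is specified yet), the set of optional features usable in a group chat is just the intersection of what every sponsoring server advertises:

def negotiated_features(advertised):
    """`advertised` maps each sponsoring server to the set of feature
    identifiers it claims to support; a feature is only usable if every
    sponsoring server supports it."""
    feature_sets = list(advertised.values())
    if not feature_sets:
        return set()
    usable = set(feature_sets[0])
    for features in feature_sets[1:]:
        usable &= features
    return usable

advertised = {
    "a.example": {"server-acl", "guest-access", "theta.eu.org/special-extension"},
    "b.example": {"server-acl", "guest-access"},
    "c.example": {"server-acl"},
}

print(negotiated_features(advertised))  # {'server-acl'}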

Group Chat state summary

So, to sum up: we’ve managed to get a better idea of what the blob of group chat state, shared across all servers participating in a group chat and agreed upon via the federation consensus protocol, should look like: it’ll contain a capability-based system for regulating which users are allowed to do what, and it’ll contain a set of other pieces of strictly-typed non-user-related state, like the group chat subject! This might all seem a bit abstract for now, but it’ll hopefully become clearer once actual code starts getting written. On that note…

A note about timings

We managed to roughly sketch out group chat state over the course of around ~2,500 words (!), but there’s still all the other stuff to spec out: users, messages, and reliable delivery mechanisms. In addition, there are also a number of less conceptual things to sketch out, like how we’re going to ensure server-to-server transport happens securely (SSL? Encryption?10) and things like that.

And this all has to be done by March, at the absolute latest. Yay!

On a more hopeful note, we do actually have some code for this whole project – currently, it does the registration flow part of a standard IRC server (basically, the plan is to piggyback off of IRC for the client<->server stuff, to get things going), and has a very untested implementation of the Paxos consensus protocol. We’re using Common Lisp as our main programming language, which is fun11!

Anyway, the rest of the spec stuff will follow in the next blogpost (hopefully quite quickly after this one…!) – if anyone actually has the tenacity to read my ~2,500 words about the annals of random chat protocol stuff, that is! We will also actually show how all of this translates into a database schema, which was sort of the point. Oops.

To be continued…


  1. Err, two months. I was very busy applying to universities, okay! 

  2. A large part of my criticism of Matrix centered around the fact that they did something funky – and, in my view, perhaps unnecessary – with theirs. 

  3. Using some fancy NoSQL database has undone many a project in the past; SQL databases have had a lot of engineering and research effort put into them over the decades to make them work reliably and quickly, and that shouldn’t be given up lightly! 

  4. Read the ‘funky blend of federation’ blog post linked earlier, if you need a refresher! 

  5. I think they recently added some sort of limited permissions system for “who can change the topic: admins or all group members?” and things like that, but this is the gist of it. 

  6. They also serve a vanity / novelty purpose; look at me, I’ve got a colourful name! 

  7. Even though we might not be actually using JSON at the end of the day, it’s a pretty well-understood way to describe things. 

  8. This IRCv3 issue is about a very similar problem (allowing arbitrary client-only message tags on IRC), and is also worth a read! 

  9. These capabilities are, unfortunately, nothing to do with the user/administrator capabilities discussed earlier… 

  10. The Matrix people went and used Perspectives instead of regular SSL PKI for their stuff, and they eventually had to move away from it. It’s worth learning from that mistake! 

  11. It was going to be Rust, but then I started doing Lisp stuff and figured this would be more ‘interesting’… 

December 16, 2019

Andreas Zwinkau (qznc)

No real numbers December 16, 2019 12:00 AM

Interactive article about the pitfalls of integer and floating point arithmetic.

Read full article!

December 15, 2019

Derek Jones (derek-jones)

The Renzo Pomodoro dataset December 15, 2019 08:08 PM

Estimating how long it will take to complete a task is hard work, and the most common motivation for this work comes from external factors, e.g., the boss, or a potential client asks for an estimate to do a job.

People also make estimates for their own use, e.g., when planning work for the day. Various processes and techniques have been created to help structure the estimation process; for developers there is the Personal Software Process, and specifically for time estimation (but not developer specific), there is the Pomodoro Technique.

I met Renzo Borgatti at the first talk I gave on the SiP dataset (Renzo is the organizer of the Papers We Love meetup). After the talk, Renzo told me about his use of the Pomodoro Technique, and how he had 10 years’ worth of task estimates; wow, I was very interested. What happened next, and a work-in-progress analysis (plus data and R scripts) of the data can be found in the Renzo Pomodoro dataset repo.

The analysis progressed in fits and starts; like me, Renzo is working on a book, and is very busy. The work-in-progress pdf is reasonably consistent.

I had never seen a dataset of estimates made for personal use, and had not read about the analysis of such data. When estimates are made for consumption by others, the motives involved in making the estimate can have a big impact on the values chosen, e.g., underestimating to win a bid, or overestimating to impress the boss by completing a task under budget. Is a personal estimate motive free? The following plot led me to ask Renzo if he was superstitious (in not liking odd numbers).

Number of tasks having a given number of estimate and actual Pomodoro values.

The plot shows the number of tasks for which there are a given number of estimates and actuals (measured in Pomodoros, i.e., units of 25 minutes). Most tasks are estimated to require one Pomodoro, and actually require this amount of effort.

Renzo educated me about the details of the Pomodoro technique, e.g., there is a 15-30 minute break after every four Pomodoros. Did this mean that estimates of three Pomodoros were less common because the need for a break was causing Renzo to subconsciously select an estimate of two or four Pomodoros? I am not brave enough to venture an opinion about what is going on in Renzo’s head.

Each estimated task has an associated tag name (sometimes two), which classifies the work involved, e.g., @planning. In the task information these tags have the form @word; I refer to them as at-words. The following plot is very interesting; it shows the date of use of each at-word, over time (ordered by first use of the at-word).

at-words usage, by date.

The first and third black lines are fitted regression models of the form 1-e^{-K*days}, where K is a constant and days is the number of days since the start of the fitted interval. The second (middle) black line is a fitted straight line.
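
For readers who want to play with this kind of model themselves, here is a minimal sketch (in Python rather than the R used for the actual analysis) of fitting the same functional form, an amplitude times 1-e^{-K*days}; the data is generated inside the code purely for illustration and is not Renzo's:

import numpy as np
from scipy.optimize import curve_fit

def model(days, A, K):
    # Saturating growth: approaches A as days increases, rate controlled by K.
    return A * (1 - np.exp(-K * days))

# Synthetic data generated from the model plus noise, for illustration only.
rng = np.random.default_rng(0)
days = np.linspace(0, 1000, 50)
observed = model(days, 40.0, 0.005) + rng.normal(0, 1.0, days.shape)

popt, _ = curve_fit(model, days, observed, p0=(30.0, 0.01))
A_hat, K_hat = popt
print(f"fitted A = {A_hat:.1f}, K = {K_hat:.4f}")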

The slowdown in the growth of new at-words suggests (at least to me) a period of time working in the same application domain (which involves a fixed number of distinct activities that are ‘discovered’ by Renzo over time). More discussion with Renzo is needed to see if we can tie this down to what he was working on at the time.

I have looked for various other patterns and associations, involving at-words, but have not found any (but I did learn some new sequence analysis techniques, and associated R packages).

The data is now out there. What patterns and associations can you find?

Renzo tells me that there is a community of people using the Pomodoro technique. I’m hoping that other users of this technique, involved in software development, have recorded their tasks over a long period (I don’t think I could keep it up for longer than a week).

Perhaps there are PSP followers out there with data…

I offer to do a free analysis of software engineering data, provided I can make data public (in anonymized form). Do get in touch.

Ponylang (SeanTAllen)

Last Week in Pony - December 15, 2019 December 15, 2019 03:31 PM

Another week in the world of Pony has passed. What’s going on? Well quite a bit, a lot of which will be surfacing over the next couple months so keep your eyes peeled on Last Week in Pony. We have a lot of exciting stuff coming.

In the meantime, let’s get to that news…

Unrelenting Technology (myfreeweb)

AWS CloudFormation looks rather disappointing: the import functionality is a joke?? you have... December 15, 2019 12:45 AM

AWS CloudFormation looks rather disappointing:

  • the import functionality is a joke?? you have to make the template yourself, for some reason there’s no “make template from this real thing” button??
  • even that import thing cannot import an ACM certificate at all, literally says that’s unsupported.
  • the GUI designer thing does not know anything about CloudFront!

What.

New image upload/optimization for sweetroll2 December 15, 2019 12:28 AM

Website update: imgroll image optimization has been deployed. Now I can finally properly share pics! :D

Meme: I CAN HAS IMAGE PROCESSING?

How it works: the micropub media endpoint in sweetroll2 uploads to S3 (with a callback URL in the metadata), returns an S3 URL. The imgroll Lambda notices the upload, extracts metadata, does processing, uploads resized versions to S3, POSTs to the callback a rich object with metadata and links to the sizes. But from there, there are three ways of getting the object into the post instead of the URL:

  • if everything goes right, it’s processed quickly: the callback is forwarded to the post editor via Server-Sent Events and the URL gets replaced with the object right in the browser;
  • if the post is saved with the S3 URL before the processing is done: the callback handler modifies all posts with that URL in any field;
  • same but after the processing is done: the micropub endpoint replaces all URLs for which these callbacks have happened.

Also, the images are served from CloudFront now, on a CNAME subdomain (with a certificate issued by AWS ACM). Which has required switching DNS providers: the 1984 FreeDNS was being buggy and wouldn’t apply my changes. Now I’m on desec.io which is currently API-only and has no web UI, but that’s actually cool because I now have all the DNS records in a script that deploys them using curl.

December 14, 2019

Jeremy Morgan (JeremyMorgan)

Build and Deploy a Blazor App Without Touching a Windows Machine December 14, 2019 04:35 PM

Do you want to try out Blazor, but you're not a Windows person? Strictly a Linux developer? Well, you're in luck. One of the goals of .NET Core is to be cross platform, so today we'll see just how "cross platform" it really is with Blazor, Microsoft's hot new front end development project.

Follow along with me while we develop a Blazor app and deploy it without ever using a Windows machine. Here's what we're going to do:

  • Set up our (Linux) developer machine
  • Build a small Blazor app
  • Deploy it to a Linux VM

So let's get started.

1. Setup Your Desktop

First we have to set up a developer environment. To do this, I will start with a fresh Ubuntu desktop install; it's never had anything done to it, so we can include all the steps you need to get started.

Blazor in Linux

Install Git

The first thing we want to do is install Git. You probably already have it, but it's one of the steps needed. Open up a terminal and type:

sudo apt install git

Once we have Git installed, we need to get some sort of IDE. I recommend Visual Studio Code, and that's what I'll be using in this tutorial.

Install Visual Studio Code

First, we need to install some dependencies:

sudo apt update
sudo apt install software-properties-common apt-transport-https wget

Then we'll import the Microsoft GPG Key:

wget -q https://packages.microsoft.com/keys/microsoft.asc -O- | sudo apt-key add -

Next, we'll enable the VSCode repository:

sudo add-apt-repository "deb [arch=amd64] https://packages.microsoft.com/repos/vscode stable main"

and then install it:

sudo apt update
sudo apt install code

And now you have Visual Studio Code installed.

Before you send me hate mail, yes I know you can go into the software manager and install VS Code. I am showing how to do it manually for a reason, always know what's going on with your Linux system and install things intentionally so you have full control. If you want to click the button in the software manager, that's cool too.

Blazor in Linux

Install .NET Core

To make Blazor work you must install .NET Core on your local machine. There are a few ways to do this. We're going to install the .NET Core SDK and Runtime, straight from the directions at Microsoft. Depending on the version of Linux you're using it may be different. See the instructions for your distro.

First you'll need to register the Microsoft key and feed:

wget -q https://packages.microsoft.com/config/ubuntu/19.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

Then we'll install the .NET Core SDK

sudo apt-get update
sudo apt-get install dotnet-sdk-3.1

You can verify your install is correct by typing in:

dotnet --version

It should look like this (though the version may be newer):

Blazor in Linux

You've got the SDK Installed! Now we're ready to create a Blazor project on our local machine.

2. Create A Blazor App (WebAssembly)

So you've probably seen a few tutorials about creating Blazor apps, and most of them use Visual Studio in a Windows environment. It's super easy: just the push of a button. However, on Linux or Mac you don't have a suitable version of Visual Studio, but you do have the dotnet CLI, and it's nearly as easy to create a Blazor app there.

Important Note

There are two ways we can run a Blazor application. From Microsoft:

Blazor is a web framework designed to run client-side in the browser on a WebAssembly-based .NET runtime (Blazor WebAssembly) or server-side in ASP.NET Core (Blazor Server)

We will run it client side first, which means:

  • The Blazor app, its dependencies, and the .NET runtime are downloaded to the browser.
  • The app is executed directly on the browser UI thread.

We will deploy it as a standalone deployment.

Standalone deployment - A standalone deployment serves the Blazor WebAssembly app as a set of static files that are requested directly by clients. Any static file server is able to serve the Blazor app.

There are some downsides to this method and we'll discuss those, but for now we want to build this so it can be hosted on any static server.

Blazor server is included in .NET Core 3.0, but WebAssembly is still in preview. So we need to install a template for it.

You can grab the template with the following command:

dotnet new -i Microsoft.AspNetCore.Blazor.Templates::3.1.0-preview4.19579.2

Next we'll create a new Blazor WebAssembly App

dotnet new blazorwasm -o BlazorDemo

Go into the directory and run it:

cd BlazorDemo
dotnet run

Your terminal window should look like this:

Blazor in Linux

Now you can open it up in a web browser and view the page:

Blazor in Linux

Open it up in VS Code and make your own modifications and play around with it.

So now we'll take this basic Blazor app and deploy it in different places.

3. Deploy it to a Linux Server (WebAssembly)

Let's see what it will take to push this to a regular Linux hosting server. For this I will use a Digital Ocean $5 special server. I'm creating it from scratch, again to show all the steps needed to get it up and running. We'll run CentOS 7 on it.

Blazor in Linux

Set up our Server

To set this up, I'm just going to update it:

sudo yum update

Then I'll install Nginx on it to serve up our static files.

sudo yum install epel-release
sudo yum install nginx

If you already have a web server set up that serves static files, you don't have to follow these steps.

I'll then start up our Nginx server:

sudo systemctl start nginx

And it's ready to go.

Blazor in Linux

Build a Deployment (WebAssembly)

Now we want to deploy our application.

Now open a command prompt to the home folder of your application and run the following:

dotnet publish -c Release -r linux-x64

And we'll go into our publish folder and look for the dist folder:

cd bin/Release/netstandard2.1/linux-x64/publish/BlazorDemo/dist

Here I can see a listing of files.

Blazor in Linux

I will copy these over to my new Linux server. I'm using SCP, but you can use whatever method you feel works:

scp -r * web@sillyblazordemo.jeremymorgan.com:/usr/share/nginx/html

And now I load it up in my web browser:

Blazor in Linux

Well, that's pretty cool! So this .NET Core application has been turned into static files I can host anywhere. I can put this on IIS, or S3, or wherever and it will work great. You can even host it on GitHub Pages!

This is great because C# and Razor files are compiled into .NET assemblies, and Blazor WebAssembly bootstraps the .NET runtime and loads the assemblies all right there in the browser.

But it requires modern browsers and has a huge payload to download to the browser to do that.

To truly leverage the power of Blazor we should set up a Blazor Server package. If you really want to know the differences, you can learn more here.

4. Create a Blazor App (Blazor Server)

Now we will create a Blazor Server application.

dotnet new blazorserver -o BlazorServerDemo

This creates another Blazor application, and we type in

dotnet run

and it spins up our local application:

Blazor in Linux

and it looks pretty familiar. Only now I don't have the rabbit head.

Blazor in Linux

So let's publish it. We will publish this as a self-contained application, so we can run it on our Nginx server without installing the .NET Core runtime.

dotnet publish -c Release --self-contained -r linux-x64

Then we'll go into our publish directory:

cd bin/Release/netcoreapp3.1/linux-x64/publish/

And we'll copy those over to an app directory created on the host (yours may vary)

scp -r * web@sillyblazordemo.jeremymorgan.com:/home/web/apps/BlazorServerDemo

5. Set up the Server for .NET Core

To run .NET Core applications (even self-contained ones) there are some dependencies. We will install the following for CentOS; if you're using a different OS, you can check what dependencies you need here.

Here's the command to install the needed dependencies with Yum:

sudo yum install lttng-ust libcurl openssl-libs krb5-libs libicu zlib

Next, there's an SELinux setting you need to change that might hang you up:

setsebool -P httpd_can_network_connect 1

Now we can just run the executable:

./BlazorServerDemo --urls http://0.0.0.0:5000

And we have a server up and ready at port 5000:

Blazor in Linux

And we can load it up in our Web Browser!

Blazor in Linux

We're now up and running, but we don't want to just run it listening on a port like this, so let's use Nginx as a reverse proxy. Shut down the process.

Then let's run this in the background, by adding the ampersand at the end:

./BlazorServerDemo --urls http://0.0.0.0:5000 &

Now if you type in "jobs" you should see it running.

Blazor in Linux

Now, create the following two folders:

sudo mkdir /etc/nginx/sites-available
sudo mkdir /etc/nginx/sites-enabled

And then edit your default file:

vi /etc/nginx/sites-available/default

And add in the following into your server directive:

location / {
    proxy_pass         http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header   Upgrade $http_upgrade;
    proxy_set_header   Connection keep-alive;
    proxy_set_header   Host $host;
    proxy_cache_bypass $http_upgrade;
    proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header   X-Forwarded-Proto $scheme;
}

Now restart nginx:

sudo systemctl reload nginx

Now you see your new page up and running!

Blazor in Linux

Now you have a full-fledged Blazor server on a Linux VM instance.

Conclusion

So in this tutorial we built a Blazor application (both WebAssembly and Blazor Server) on an Ubuntu machine and pushed it up to a CentOS machine. We didn't need any Windows machines to do it.

My intent was to show how easy it is to develop Blazor and .NET Core applications if you're a Linux developer. I started out as a Linux developer, fell in love with C#/.NET, and now I can do both things together and I love it. You will too.

.NET Core is amazing, and I think Blazor will be too. I'm excited to develop more Blazor applications and push it to its limits.

If you want to learn more about Blazor, Pluralsight has just released some really cool courses on it.

So try it out! Let me know what you think of Blazor and share your experiences in the comments!

December 13, 2019

Gustaf Erikson (gerikson)

Maktspel och mord: Politik i medeltidens Frankrike 1380-1408 av Michael Nordberg [SvSe] December 13, 2019 08:13 PM

A deep dive into France's domestic politics around 1407. The author doesn't have much regard for Barbara Tuchman's A Distant Mirror, but I think both works have their merits.

Nordberg makes an attempt to rehabilitate Louis of Orléans from the Burgundian smear pamphlets, but he is not as eloquent as Mantel is about Thomas Cromwell.

Ponylang (SeanTAllen)

0.33.1 released December 13, 2019 02:12 PM

Pony version 0.33.1 is now available. The release features no breaking changes for users’ Pony code. We recommend updating at your leisure.

Jeff Carpenter (jeffcarp)

CIM Race Report December 13, 2019 04:32 AM

I did it!! I ran my first marathon! It was painful… but worth it! Despite still being pretty sore I can tell you I would definitely do it again. Solid Type II fun. My final time was 3:59:44, sneaking in just under 4 hours. You can view the full splits here and my run on Strava here.

How the race went

Despite consciously trying to keep my pace under control in the first half, I went out too fast.

December 12, 2019

Anish Athalye (anishathalye)

Experiments in Constraint-based Graphic Design December 12, 2019 05:00 AM

Standard GUI-based graphic design tools only support a limited “snap to guides” style of positioning, have a basic object grouping system, and implement primitive functionality for aligning or distributing objects. They don’t have a way of remembering constraints and relationships between objects, and they don’t have ways of defining and reusing abstractions. I’ve been dissatisfied with existing tools for design, in particular for creating figures and diagrams, so I’ve been working on a new system called Basalt that matches the way I think: in terms of relationships and abstractions.

Basalt is implemented as a domain-specific language (DSL), and it’s quite different from GUI-based design tools like Illustrator and Keynote. It’s also pretty different from libraries/languages like D3.js, TikZ, and diagrams. At its core, Basalt is based on constraints: the designer specifies figures in terms of relationships, which compile down to constraints that are solved automatically using an SMT solver to produce the final output. This allows the designer to specify drawings in terms of relationships like “these objects are distributed horizontally, with a 1:2:3 ratio of space between them.” Constraints are also a key aspect of how Basalt supports abstraction, because constraints compose nicely.

I’ve been experimenting with this concept, off and on, for the last couple years. Basalt is far from complete, but the exploration has yielded some interesting results already. The prototype is usable enough that I made all the figures in my latest research paper and presentation with it.

If you want to read about the core ideas behind Basalt, take a look at the philosophy section. If you want to hear about my experience using Basalt to design real figures, skip ahead to the case studies section. If you want to see how gradient descent can be used to solve figures, go to the gradient descent section.

Philosophy

Basalt’s programming model is as follows. Designers write programs that produce figures described in terms of relationships. These relationships are compiled down to constraints, which are then solved automatically.

Basalt is a DSL embedded in a general-purpose programming language, so it inherits support for functional abstraction, classes, and so on. Its constraint-based approach is key to supporting abstractions that compose nicely.

Specifying drawings in terms of relationships

Basalt’s programming model allows drawings to be specified in terms of relationships between objects. The standard GUI-based tools for graphic design don’t have support for this. They have a limited “snap to guides” style of positioning, plus a basic object grouping system and basic functionality for aligning or distributing objects. They don’t encode figures in terms of primitive objects and constraints; instead, actions like aligning objects imperatively update positions, which is what the underlying representation stores. CAD tools have support for constraints, but they aren’t meant for graphic design.

Consider a figure with the following description. “There is a light grey canvas. In the center is a blue circle with a diameter that is half the canvas width. Circumscribed in the circle is a red square.” The picture looks like this:

Circumscribed square

Without using constraints, code to generate such a figure in Basalt looks like this:


width = 300
height = 300

bg = Rectangle(0, 0, width, height,
    style=Style({Property.fill: RGB(0xe0e0e0)}))
circ = Circle(x=150, y=150, radius=75,
    style=Style({Property.stroke: RGB(0x0000ff), Property.fill_opacity: 0}))
rect = Rectangle(x=96.97, y=96.97, width=106.06, height=106.06,
    style=Style({Property.stroke: RGB(0xff0000), Property.fill_opacity: 0}))

g = Group([bg, circ, rect])

c = Canvas(g, width=width, height=height)

The designer has to figure out where everything goes, manually computing the anchor points and size. Instead, with constraints, the designer can write down the relationship between the shapes, and the tool will solve the figure:


width = 300
height = 300

bg = Rectangle(0, 0, width, height, style=Style({Property.fill: RGB(0xe0e0e0)}))
circ = Circle(style=Style({Property.stroke: RGB(0x0000ff), Property.fill_opacity: 0}))
rect = Rectangle(style=Style({Property.stroke: RGB(0xff0000), Property.fill_opacity: 0}))

# The Group constructor takes two arguments: the first is a list of objects,
# and the second lists additional constraints, equations that relate the
# objects to one another

g = Group([bg, circ, rect], [
    # circle is centered
    circ.bounds.center == bg.bounds.center,

    # circle diameter is 1/2 of canvas width
    2*circ.radius == width/2,

    # rectangle is centered on circle
    rect.bounds.center == circ.bounds.center,

    # rectangle is a square
    rect.width == rect.height,

    # rectangle is circumscribed
    rect.width == circ.radius*2**0.5
])

c = Canvas(g, width=width, height=height)

Every primitive shape knows how to calculate its own bounds based on its internal attributes: for example, a circle’s bounds run from (x - radius, y - radius) to (x + radius, y + radius), and a group’s bounds are defined in terms of min/max of the individual elements’ bounds. Attributes, as well as bounds, are allowed to be symbolic expressions: e.g. in the code above, the circle is not given a concrete center or radius, so these attributes are each automatically initialized to a fresh Variable(), and then the bounds depend on these variables. They are only assigned a concrete value when the constraint solver runs.

Basalt’s constraint-based approach allows the designer to think in terms of relationships and express those in the specification of the drawing itself, rather than having them be implicit and be manually solved by the designer.

The figure above could feasibly be drawn in Illustrator, though it might require taking out a pencil and paper to calculate positions of objects. Manually solving implicit constraints that are in the designer’s mind but not encoded in the tool, which is what designers currently do with existing programs, is painful and not scalable.

What if the figure design were changed slightly, for example the circle was to be inscribed in the rectangle? With Illustrator, it requires recomputing all the positions by hand; with Basalt, the change is one line of code:

-  # rectangle is circumscribed
-  rect.width == circ.radius*2**0.5
+  # circle is inscribed
+  rect.width == circ.radius*2

Side-by-side comparison of circumscribed and inscribed square

For a simple figure, making such a change by hand might be feasible, but what if the figure had hundreds of objects? With Illustrator, it would get out of hand to manually position all the objects precisely where they need to go based on the constraints in the designer’s mind. In Basalt, the constraint solver does the difficult job of determining exact positions for objects, and it scales well (for example, this figure has hundreds of objects).

Supporting abstraction

Constraints lead to a natural way of supporting abstraction, another key aspect necessary for designing sophisticated figures. Sub-components can be specified in terms of their parts and internal constraints. When these components are instantiated for use in a top-level figure (or a higher-level component), the constraints can simply be merged together.

Suppose we wanted to have an abstraction for a circumscribed square and then have four of them, arranged in a 2x2 grid, that fill the canvas. First, we can define the abstraction, a group consisting of a circle and rectangle, with some internal constraints:

def circumscribed_square():
    circ = Circle(style=Style({
        Property.stroke: RGB(0x0000ff),
        Property.fill_opacity: 0
    }))
    rect = Rectangle(style=Style({
        Property.stroke: RGB(0xff0000),
        Property.fill_opacity: 0
    }))
    return Group([circ, rect], [
        # rectangle is centered on circle
        rect.bounds.center == circ.bounds.center,
        # rectangle is a square
        rect.width == rect.height,
        # rectangle is circumscribed
        rect.width == circ.radius*2**0.5
    ])

Then, we can instantiate it multiple times and add the constraint that they are arranged in a 2x2 grid and are all the same size:

c1 = circumscribed_square()
c2 = circumscribed_square()
c3 = circumscribed_square()
c4 = circumscribed_square()

g = Group([c1, c2, c3, c4], [
    # arranged like
    # c1 c2
    # c3 c4
    c1.bounds.right_edge == c2.bounds.left_edge,
    c3.bounds.right_edge == c4.bounds.left_edge,
    c1.bounds.bottom_edge == c3.bounds.top_edge,
    # and all the same size
    c1.bounds.width == c2.bounds.width,
    c2.bounds.width == c3.bounds.width,
    c3.bounds.width == c4.bounds.width,
])

Finally, we can express the constraint that the figure fills the canvas:

width = 300
height = 300

bg = Rectangle(0, 0, width, height, style=Style({Property.fill: RGB(0xe0e0e0)}))

top = Group([bg, g], [
    # figure is centered
    g.bounds.center == bg.bounds.center,
    # and fills the canvas
    g.bounds.height == bg.bounds.height
])

c = Canvas(top, width=width, height=height)

When rendered, this code produces the following figure:

Four circumscribed squares

Constraints as an assembly language

Constraints aren’t necessarily what end-users should write directly, but constraints make for a nice assembly language: it’s easy to express higher-level geometric concepts in terms of Basalt’s constraints. For example, expressing that a set of objects are top-aligned is as simple as this:

def aligned_top(elements):
    top = Variable() # some unknown value
    return Set([i.bounds.top == top for i in elements])

Other concepts, like objects being horizontally or vertically distributed, being the same width or height, or being inset inside another object, can also be compiled down to constraints. Basalt provides some higher-level geometric primitives, and users can define their own.
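
For example, a horizontal-distribution helper could plausibly be written in the same style as aligned_top above; this is only a sketch of how such a primitive might look (the names and bounds attributes are assumed from the examples in this post, not taken from Basalt's source):

def distributed_horizontally(elements, spacing=None):
    # With no explicit spacing, share one unknown gap between all adjacent
    # pairs and let the solver pick a value consistent with everything else.
    gap = Variable() if spacing is None else spacing
    return Set([
        right.bounds.left == left.bounds.right + gap
        for left, right in zip(elements, elements[1:])
    ])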

Case studies

Adversarial turtle figure

Here is a figure I made for a paper (Figure 5), recreated in Basalt:

Adversarial turtle figure

The figure shows a number of images, along with a classification of those images given by the border color, laid out in a grid. The Basalt code for this figure defines two abstractions that are composed to make the figure:

def bordered_image(source: str, border_color, border_radius):
    image = Image(source)
    border = Rectangle(style=Style({Property.fill: border_color}))
    return Group([border, image], [
        inset(image, border, border_radius, border_radius)
    ])


def grid(elements: List[List[Element]], spacing):
    rows = [Group(i) for i in elements]
    return Group(rows, [
        # all elements are the same size
        same_size([i for row in rows for i in row.elements]),
        # elements in rows are distributed horizontally
        Set([distributed_horizontally(row, spacing) for row in rows]),
        # and aligned vertically
        Set([aligned_middle(row) for row in rows]),
        # rows are distributed vertically
        Set([distributed_vertically(rows, spacing)]),
        # and aligned horizontally
        aligned_center(rows)
    ])

This figure is simple enough that it would be possible to draw it in a traditional tool like Illustrator, but there are benefits to drawing it programmatically beyond avoiding the annoyance of manually laying out the figure in a GUI-based tool. The paper has 5 figures in this style, with varying images and sizes: using code allowed us to parameterize over these and do the work of designing the figure just once. Furthermore, the figures were meant to go in a paper, a fairly restricted format: for example, the width of our figures was fixed. We needed to decide on figure parameters, such as grid dimensions, border size, and image size, that would make the figures readable, and having the code generate the figures made it easy to explore this parameter space.

The original figures in the paper were actually made without Basalt, because Basalt didn’t exist at the time. Instead, I wrote Python code that directly painted pixels of an output image. The Basalt code is much nicer.

Robust ML logo

My brother Ashay and I designed the logo for Robust ML. Starting with the general ideas of shields and neural networks, Ashay drew a couple sketches on paper:

Robust ML logo sketches

Next, I sketched the logo in Illustrator. Even after we had the basic idea for the logo, it took a ton of iteration to figure out the details, including choosing the number of layers in the neural network, the number of nodes in each layer, and the spacing between the nodes in a given layer. Certain changes were easy, like tweaking colors using Illustrator’s recolor artwork feature. Other changes were extremely painful; for example, adding a node required a lot of manual labor, because it required moving nodes and lines as well as creating and positioning a new node and many new lines. Cumulatively, over dozens of iterations of the logo, I spent a couple hours just moving shapes around in Illustrator.

Since then, I have re-made the logo with Basalt, where exploring a parameter space is much easier with the help of the live preview tool.

Here is what the final output looks like:

Robust ML logo

Notary architecture figure

This is a figure made in TikZ, from a draft of a recent paper:

Notary architecture figure in TikZ

Here’s the code for the TikZ figure above. It uses lots of hard-coded positions, with commands like \draw [boxed] (-1,-2.5) rectangle (16.5,4). It was difficult to get the figure to look particularly pretty using TikZ. For the published version of the paper (Figure 1), I designed the figure using Basalt, also changing the styling a bit in the process:

Notary architecture figure in Basalt

The figure defines and makes use of a number of new abstractions built on top of Basalt’s primitives, including:

  • Component — a box, with text centered inside
  • Peripheral — a box with rounded corners, with text centered inside
  • Wiring — a bunch of line segments, with optional arrowheads, connecting two points, with optional label text

Describing these abstractions once and then using them many times made for a pleasant figure design experience. For example, Wiring was defined once and instantiated 11 times in the figure above. And of course, the entire figure was specified relationally. Concepts like the “Kernel domain” text being centered within its area were easy to express. The code makes use of many geometric primitives, expressing concepts such as a component’s inputs/outputs being evenly spaced along the edge.

Notary SoC illustration

Here’s another figure from the same paper (Figure 2):

Notary SoC illustration

The figures in the paper share abstractions for a consistent look; this figure makes use of the Component and Wiring abstractions from the previous figure. Constraint-based design was especially helpful for designing certain aspects of this figure, such as the spacing for the arrows at the bottom. For example, having the arrows spaced evenly along the entire width of the boxes looks pretty terrible; this was easy to test, only requiring a small change to the I/O Arrows abstraction:

-  distributed_horizontally(arrows.elements, periph_io_spacing),
-  arrows.bounds.center.x == periph.bounds.center.x
+  distributed_horizontally(arrows.elements),
+  arrows.bounds.left == periph.bounds.left,
+  arrows.bounds.right == periph.bounds.right

The constraint-based design, along with Basalt’s live preview tool, made it easy to select other parameters such that they looked good for the final figure.

Implementing this figure in code also made it easy to do things like matching the font size between the figure and the text in the paper. When designing figures in external tools, the graphics are often scaled afterwards to fit in the paper, but this messes with the effective font size. With Basalt, it was easy to design the overall figure, then size it appropriately so it could be included in the paper without rescaling, and finally tweak figure parameters so it could use the same font size as the paper.

Notary architecture animation

I needed to make a simplified version of the Notary architecture figure for a presentation. Furthermore, I needed to animate the figure over multiple slides to highlight and explain different aspects.

I started out implementing the presentation and the figure in Keynote, but designing the figure in Keynote was a bit of a pain due to having overlapping objects, as well as having details varying between phases of the build. I tried doing everything on one slide, but I ran into limitations with Keynote’s animation system (for example, it’s not possible to make an object appear, then disappear, then appear again). I tried duplicating some objects to work around this issue, but that quickly got out of hand, even with Keynote’s object list. I tried splitting the build over multiple slides, but then making global edits became annoying.

Finally, I switched to using Basalt for the figure (and switched the presentation to Beamer). It was straightforward to design a figure with details that varied depending on the build phase:

Aspects that changed based on the build phase took advantage of a BUILD variable, such as reset_color = red if BUILD == 3 else black. This BUILD variable was set via an environment variable, BUILD = os.environ['BUILD'], so different versions of the figure could be rendered by setting e.g. BUILD=2 and rendering the figure.

Notary noninterference figure build

The same presentation has another figure that is built up over a series of slides:

It uses the same BUILD variable approach as the previous figure to encode aspects that changed based on the build phase, such as the delimiters along the bottom of the figure:

labels = []
if BUILD >= 1: labels.append('Agent A runs')
if BUILD >= 2: labels.append('Deterministic start')
if BUILD >= 3: labels.append('Agent B runs')
if BUILD == 4:
    labels = ['Deterministic start']
delim = LabeledDelimiters(labels)

To render different phases of the build, the figure is rendered with different settings of the variable, e.g. BUILD=3.

This figure was designed to match the same visual language set up in the previous figure. This was easy to achieve by sharing the abstraction with the previous figure (instantiated here with slightly different parameters such as a lower wire count).

Again, Basalt’s live preview tool was helpful in choosing figure parameters, such that the figure looked good and the font size matched other figures in the presentation.

DFSCQ log figure

I wanted to try to recreate a complex figure from a paper (Figure 9) that I did not write. The original was made in Inkscape by one of the authors of the paper. The goal of the Basalt version was not to create a pixel-perfect recreation but to use Basalt’s approach of building up abstractions to see how it scales to complex figures. Here is a side-by-side comparison showing that Basalt is capable of replicating the figure:

The Inkscape figure is a large collection of manually positioned objects, presumably made with a bunch of copy-pasting. The Basalt figure, on the other hand, defines and uses a number of abstractions, all implemented as classes that build on top of Basalt’s primitives:

  • Layer — one of the stacked horizontal boxes with various contents, e.g. “LogAPI”
  • Blocks — a sequence of colored rectangles, e.g. the blocks to the right of the “activeTxn” label
  • Bracket — a “[” or “]” shape
  • BlockList — a list of blocks, consisting of an open bracket, a sequence of blocks separated by commas, and a close bracket, e.g. the contents to the right of the “committedTxns” label (uses Blocks and Bracket)
  • LabeledDelimiters — labels corresponding to parts of something, e.g. below the “disk log” (this abstraction is also used in the Notary noninterference figure above)
  • DiskLog — contents of a disk log, e.g. to the right of the “disk log” label (uses Blocks)
  • Explode — the pair of dotted lines showing one part of the figure exploded out, connecting subparts of certain stacked layers, e.g. in between the “DiskLog” and “Applier” layers
  • ArrowFan — arrows fanning out, e.g. in the “Applier” layer

This figure took some effort to implement. Basalt’s live preview tool was immensely helpful while working on this figure: seeing updates to the figure immediately after changing code was essential for building up this complex figure. The sliders were also somewhat helpful in getting the layout to match the original. Playing around with the parameters and seeing the figure redraw itself while maintaining its general structure is pretty neat:

Discussion

Solving constraints

Basalt allows designers to specify diagrams in terms of relationships and constraints, which boil down to equations over real variables. Basalt doesn’t place many restrictions over these equations, which gives great flexibility to the user, but it makes the underlying implementation more challenging because it has to solve these equations automatically. Basalt equations can have things like conjunctions and disjunctions, min/max terms, and products/quotients of variables, so in the general case it’s not possible to transform equations into some nice form like a linear program that’s computationally efficient to solve. Basalt equations fit into the logic of quantifier-free non-linear real arithmetic (QFNRA), so at least satisfiability is decidable, but solving isn’t necessarily going to be fast.

Currently, Basalt supports a couple different strategies for solving equations.

Z3

The current default uses the Z3 theorem prover, a powerful SMT solver that has support for QFNRA and much more. Encoding constraints from Basalt into a Z3 query is straightforward, and in practice, Z3 works quite well.
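
To give a flavour of what such a query can look like (this is a hand-written sketch using Z3's Python bindings, not code generated by Basalt; the variable names are chosen for illustration), the circumscribed-square relationships from earlier reduce to a handful of equations over reals:

from z3 import Reals, Solver, sat

# cx, cy, r: circle center and radius; rx, ry, rw, rh: rectangle position and size.
cx, cy, r, rx, ry, rw, rh = Reals('cx cy r rx ry rw rh')

s = Solver()
s.add(cx == 150, cy == 150)     # circle centered on a 300x300 canvas
s.add(2 * r == 300 / 2)         # circle diameter is half the canvas width
s.add(rx + rw / 2 == cx,        # rectangle centered on the circle
      ry + rh / 2 == cy)
s.add(rw == rh)                 # rectangle is a square
s.add(rw * rw == 2 * r * r)     # circumscribed: square's diagonal equals the diameter

if s.check() == sat:
    print(s.model())            # concrete positions and sizes for every shape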

Mixed integer linear programming

Basalt supports compiling a subset of constraints, ones that use linear real arithmetic along with a handful of operations like min/max, into a mixed integer linear program (MILP), which is then solved using a MILP solver, currently CBC through Google OR-Tools. Basalt’s MILP backend is strictly less general than the Z3 backend, and it seems to be slower than Z3 even on the queries it supports, but it was fun to implement.

Gradient descent

By far the least useful but most fun equation solver is one based on gradient descent. There’s a straightforward translation from Basalt constraints to a differentiable loss function, and this translation is sound. This means that the loss function has a global minimum of 0 if and only if the original constraints are satisfiable, and when the loss function does have a global minimum of 0, every global minimum corresponds in a straightforward way to a satisfying assignment for the original variables.

Now of course, this says nothing about the characteristics of this loss function and whether it’s amenable to optimization via gradient descent. But I went ahead and implemented it anyways, using TensorFlow to handle the task of computing derivatives.
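
As a toy illustration of the translation (not the actual implementation, which uses TensorFlow and Basalt's own constraint types), an equality constraint becomes a squared-residual penalty, the penalties sum into a loss, and gradient descent drives the loss toward zero; here the gradient is written out by hand for two made-up constraints, x + y == 10 and x - y == 4:

def loss_and_grad(x, y):
    # Loss is zero exactly when both constraints hold.
    r1 = x + y - 10.0
    r2 = x - y - 4.0
    loss = r1 * r1 + r2 * r2
    return loss, (2 * r1 + 2 * r2, 2 * r1 - 2 * r2)

x, y, lr = 0.0, 0.0, 0.1
for _ in range(200):
    loss, (gx, gy) = loss_and_grad(x, y)
    x -= lr * gx
    y -= lr * gy

print(round(x, 3), round(y, 3))  # converges to 7.0, 3.0 (loss ~ 0)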

Unlike the other approaches, gradient descent computes the solution iteratively, so it’s possible to animate the solving of a figure, and the results are quite amusing:

Uniqueness of solutions

Sometimes, users produce figures where the set of constraints has multiple solutions; in other words, the figure is not fully constrained. In my experience, this is almost always indicative of a bug in the figure’s code, because it’s unlikely that the user has two different-looking figures in mind where either figure is acceptable.

Basalt can determine when this situation occurs, because it’s straightforward to encode as an SMT query. If the original set of constraints is C, and it has a solution x_0, then we can ask the SMT solver whether C together with the additional constraint x != x_0 has a solution. If the new formula is unsatisfiable, then the original solution is unique, and the figure is fully constrained. In the case that the new formula is satisfiable, the SMT solver returns a concrete example of another valid solution. Comparing the two solutions can help a user figure out why a figure is under-constrained.
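
With Z3's Python API, that check is only a few extra lines on top of the solving query; a minimal sketch, using a deliberately under-constrained pair of variables as the example:

from z3 import Reals, Solver, Or, sat

x, y = Reals('x y')

s = Solver()
s.add(x + y == 10)                  # under-constrained: infinitely many solutions
assert s.check() == sat
m = s.model()

# Ask whether a *different* solution exists.
s.add(Or(x != m[x], y != m[y]))
if s.check() == sat:
    print("under-constrained; another solution:", s.model())
else:
    print("fully constrained; the solution is unique")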

Next steps

There is still much work to be done on Basalt.

Right now, it’s an embedded DSL in Python, and the language is a bit clunky. Racket, a programming language for building programming languages, may be a better platform. Tej Chajed and I are currently working on building a more sophisticated version of Basalt as a Racket DSL.

Another thing I’m interested in exploring is having a visual editor for Basalt figures, a kind of hybrid between Basalt’s approach and existing GUI-based tools, because it’s often convenient to be able to drag objects around. This might involve using an approach similar to what’s used in CAD tools or perhaps something more exotic like the technique used in g9.js to have a two-way binding between the code and its visual representation.

There has been a long line of research in the area of constraint-based design, starting with Ivan Sutherland’s Sketchpad in 1963. As far as I know, most of the tools that use these ideas are GUI-based CAD programs, not practical graphic design tools.

Conclusion

Basalt is an approach to designing figures based on constraints. Basalt’s model allows a designer to think in terms of relationships between objects, and it makes it easy to build and reuse abstractions.

Basalt is still a work in progress. If you’re interested in hearing about updates, click here.

Thanks to Tej Chajed for many insightful discussions about Basalt, and thanks to Ashay Athalye, Kevin Kwok, and Curtis Northcutt for feedback on this post.

December 11, 2019

Jan van den Berg (j11g)

Me and You – Niccolò Ammaniti December 11, 2019 07:56 PM

It’s probably fair to say Niccolò Ammaniti is one of my favorite writers at the moment. This being his third book I read since last year.

Me and You – Niccolò Ammaniti (2010) – 126 pages

He has a gut stomping way of describing the human condition in a funny, recognisable and smart way. His dialogue, characters and plot ooze effortlessly from the pages. And especially his metaphors are one of a kind.

This book specifically touches on a more serious subject, but is still a treat to read.

If you have a couple of hours to spare. Read some Ammaniti.

The post Me and You – Niccolò Ammaniti appeared first on Jan van den Berg.

The Black Swan – Nassim Nicholas Taleb December 11, 2019 07:52 PM

When The Black Swan came out in 2007 it caused quite a stir. And understandably so. Taleb has a distinctive and fresh way of looking at the world through the lens of an empirical skeptic.

The Black Swan – Nassim Nicholas Taleb (2007) – 444 pages

He also likes to write, think out loud, and argue why he is right (and others are wrong). He greatly admires Kahneman, Poincaré and Mandelbrot. But he also dislikes a lot of things: the traditional — and wrong — application of the Bell Curve, the Nobel prize, suits, non-erudite people and a lot of people who work in the field of economics and statistical analysis.

This book came out just before the financial crisis. And people thought the crisis was proof of the Black Swan theory. But alas, that is exactly the wrong conclusion to make, and this is usually made by people who have not fully grasped the Black Swan concept (Taleb argues this crisis was anything but a Black Swan, because he actually DID see it coming). Either way, it did help popularize the book and therefore Taleb (who is less interested in making friends than being right).

Taleb is obviously a very bright and gifted man. He is able to distill his original scientific ideas into more or less popular prose. In my 2010 edition of the book there is a post-essay ‘On Robustness and Fragility’, which was written after the initial success of The Black Swan. But Taleb uses this essay mainly to double down on why he was right. Which is a bit tiresome.

But nonetheless I enjoyed reading this book: it is written by someone who clearly enjoys writing and explaining things. But mainly because The Black Swan offers you a different view of the world which is always a good thing!

The post The Black Swan – Nassim Nicholas Taleb appeared first on Jan van den Berg.

Derek Jones (derek-jones)

Calculating statement execution likelihood December 11, 2019 03:59 PM

In the following code, how often will the variable b be incremented, compared to a?

If we assume that the variables x and y have values drawn from the same distribution, then the condition (x < y) will be true 50% of the time (ignoring the situation where both values are equal), i.e., b will be incremented half as often as a.

a++;
if (x < y)
   {
   b++;
   if (x < z)
      {
      c++;
      }
   }

If the value of z is drawn from the same distribution as x and y, how often will c be incremented compared to a?

The test (x < y) reduces the possible values that x can take, which means that in the comparison (x < z), the value of x is no longer drawn from the same distribution as z.

Since we are assuming that z and y are drawn from the same distribution, there is a 50% chance that (z < y).

If we assume that (z < y), then the values of x and z are drawn from the same distribution, and in this case there is a 50% chance that (x < z) is true.

Combining these two cases, we deduce that, given the statement a++; is executed, there is a 25% probability that the statement c++; is executed.

If the condition (x z) is replaced by (x > z), the expected probability remains unchanged.

If the values of x, y, and z are not drawn from the same distribution, things get complicated.

Let's assume that the probabilities of particular values of x and y occurring are alpha e^{-sx} and beta e^{-ty}, respectively. The constants alpha and beta are needed to ensure that each probability distribution integrates to one; the exponents s and t control the distribution of values. What is the probability that (x < y) is true?

Probability theory tells us that P(A < B) = int_{-infty}^{+infty} f_B(x) F_A(x) dx, where: f_B is the probability density function for B (in this case: beta e^{-tx}), and F_A is the cumulative distribution function for A (in this case: alpha(1-e^{-sx})).

Doing the maths gives the probability of (x < y) being true as: {alpha beta s}/{s+t}.

The (x < z) case can be similarly derived, and combining everything is just a matter of getting the algebra right; it is left as an exercise to the reader :-)
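
For readers who would rather check their algebra numerically, here is a minimal Monte Carlo sketch (again my own, in Python, not from the post); it assumes x and y follow properly normalised exponential distributions with rate parameters s and t, and prints an estimate of P(x < y) for comparison against whatever closed form the reader derives:

import random

def estimate_p_x_less_than_y(s, t, trials=1_000_000):
    # x is exponentially distributed with rate s, y with rate t
    hits = sum(random.expovariate(s) < random.expovariate(t) for _ in range(trials))
    return hits / trials

print(estimate_p_x_less_than_y(s=2.0, t=1.0))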

December 08, 2019

Ponylang (SeanTAllen)

Last Week in Pony - December 8, 2019 December 08, 2019 09:47 PM

After some scheduling conflicts with the weekly Sync meeting, we decided to reschedule for Tuesdays at 2pm EST starting this week.

Carlos Fenollosa (carlesfe)

Andreas Zwinkau (qznc)

How to do hermetic builds December 08, 2019 12:00 AM

Bazel does surprisingly little and it could be easily ported

Read full article!

December 07, 2019

Carlos Fenollosa (carlesfe)

KONPEITO, Gemini and Gopher December 07, 2019 10:40 PM

KONPEITO is a quarterly Lo-fi hip hop & chill bootleg mixtape, distributed exclusively through the Gemini protocol. Each tape is a half-hour mix, clean on side A and repeated on side B with an added ambient background noise layer for atmosphere. Tapes are generally released in the first week of each meteorological season.

Okay, so there's a lot to unpack here.

  • KONPEITO is a very nice chill mixtape with a couple mp3 files that I found thanks to Tomasino on Mastodon
  • These files are distributed over the Gemini protocol, via this link
  • Gemini is a new internet protocol in between Gopher and HTTP
  • There is one Gemini client available, AV-98
  • The specs of the Gemini protocol can be accessed via this Gopher link
  • Gopher is a protocol that ruled over the internet once but got replaced by HTTP, what we know as "the Web" nowadays
  • You can reach Gopher links with lynx or a web proxy, but there are no modern graphical clients
  • Gopher is making a niche comeback among a few enthusiasts and you should definitely check it out if only for its nostalgic and historical value

Now that's one hell of a rabbit hole. If you reach the end you'll find a very cool mp3 mixtape.

Tags: internet, retro


Jan van den Berg (j11g)

Churchill – Sebastian Haffner December 07, 2019 05:02 PM

Writing a Churchill biography is not an easy assignment, even though it would be difficult to butcher the job. Churchill led an unprecedentedly rich and varied life, and just writing down the bare facts would already be enough for a great story. But it would also be a massive undertaking.

Churchill – Sebastian Haffner (1967/2002) – 206 pages

Haffner took a different route. He chose the high-level helicopter approach. And he managed to produce an impressive sketch and sagacious analysis of England’s most famous political figure, by focusing on key phases of his life. Haffner keeps the required biographer’s distance and writes with ultimate authority. His sentences are carved in stone and are a delight to read. And in some places he is as tough in his verdict as the man himself was.

The post Churchill – Sebastian Haffner appeared first on Jan van den Berg.

December 05, 2019

Frederik Braun (freddyb)

Help Test Firefox's built-in HTML Sanitizer to protect against UXSS bugs December 05, 2019 11:00 PM

This article first appeared on the Mozilla Security blog

I recently gave a talk at OWASP Global AppSec in Amsterdam and summarized the presentation in a blog post about how to achieve "critical"-rated code execution vulnerabilities in Firefox with user-interface XSS. The end of that blog post encourages the reader to participate in the bug bounty program, but did not come with proper instructions. This blog post will describe the mitigations Firefox has in place to protect against XSS bugs and how to test them. Our about: pages are privileged pages that control the browser (e.g., about:preferences, which contains Firefox settings). A successful XSS exploit has to bypass the Content Security Policy (CSP), which we have recently added, but also our built-in XSS sanitizer, to gain arbitrary code execution. A bypass of the sanitizer without a CSP bypass is in itself a severe-enough security bug and warrants a bounty, subject to the discretion of the Bounty Committee. See the bounty pages for more information, including how to submit findings.

How the Sanitizer works

The Sanitizer runs in the so-called "fragment parsing" step of innerHTML. In more detail, whenever someone uses innerHTML (or similar functionality that parses a string from JavaScript into HTML) the browser builds a DOM tree data structure. Before the newly parsed structure is appended to the existing DOM element, our sanitizer intervenes. This step ensures that our sanitizer cannot mismatch the result the actual parser would have created - because it is indeed the actual parser. The line of code that triggers the sanitizer is in nsContentUtils::ParseFragmentHTML and nsContentUtils::ParseFragmentXML. The aforementioned link points to a specific source code revision, to make hotlinking easier. Please click the file name at the top of the page to get to the newest revision of the source code. The sanitizer is implemented as an allow-list of elements, attributes and attribute values in nsTreeSanitizer.cpp. Please consult the allow-list before testing. Finding a Sanitizer bypass is a hunt for Mutated XSS (mXSS) bugs in Firefox -- unless you find an element in our allow-list that has recently become capable of running script.

How and where to test

A browser is a complicated application which consists of millions of lines of code. If you want to find new security issues, you should test the latest development version. We often rewrite lots of code that isn't related to the issue you are testing but might still have a side effect. To make sure your bug is actually going to affect end users, test Firefox Nightly. Otherwise, the issues you find in Beta or Release might have already been fixed in Nightly.

Sanitizer runs in all privileged pages

Some of Firefox's internal pages have more privileges than regular web pages. For example, about:config allows the user to modify advanced browser settings and hence relies on those expanded privileges. Just open a new tab and navigate to about:config. Because it has access to privileged APIs, it cannot use innerHTML (and related functionality like outerHTML and so on) without going through the sanitizer.

Using Developer Tools to emulate a vulnerability

From about:config, open the Developer Tools console (go to Tools in the menu bar, select Web Developer, then Web Console (Ctrl+Shift+K)). To emulate an XSS vulnerability, type this into the console:

document.body.innerHTML = '<img src=x onerror=alert(1)>'

Observe how Firefox sanitizes the HTML markup by looking at the error in the console: “Removed unsafe attribute. Element: img. Attribute: onerror.” You may now go and try other variants of XSS against this sanitizer. Again, try to find an mXSS bug, or identify an allowed combination of element and attribute which executes script.

Finding an actual XSS vulnerability

Right, so for now we have emulated the Cross-Site Scripting (XSS) vulnerability by typing in innerHTML ourselves in the Web Console. That's pretty much cheating. But as I said above: What we want to find are sanitizer bypasses. This is a call to test our mitigations. But if you still want to find real XSS bugs in Firefox, I recommend you run some sort of smart static analysis on the Firefox JavaScript code. And by smart, I probably do not mean eslint-plugin-no-unsanitized.

Summary

This blog post described the mitigations Firefox has in place to protect against XSS bugs. These bugs can lead to remote code execution outside of the sandbox. We encourage the wider community to double check our work and look for omissions. This should be particularly interesting for people with a web security background, who want to learn more about browser security. Finding severe security bugs is very rewarding and we're looking forward to getting some feedback. If you find something, please consult the Bug Bounty pages on how to report it.

December 02, 2019

Derek Jones (derek-jones)

Christmas books for 2019 December 02, 2019 09:08 PM

The following are the really, and somewhat, interesting books I read this year. I am including the somewhat interesting books to bulk up the numbers; there are probably more books out there that I would find interesting. I just did not read many books this year, what with Amazon recommends being so user unfriendly, and having my nose to the grindstone finishing a book.

First the really interesting.

I have already written about Good Enough: The Tolerance for Mediocrity in Nature and Society by Daniel Milo.

I have also written about The European Guilds: An economic analysis by Sheilagh Ogilvie. Around half-way through I grew weary, and worried readers of my own book might feel the same. Ogilvie nails false beliefs to the floor and machine-guns them. An admirable trait in someone seeking to dispel the false beliefs in current circulation. Some variety in the nailing and machine-gunning would have improved readability.

Moving on to first half really interesting, second half only somewhat.

“In search of stupidity: Over 20 years of high-tech marketing disasters” by Merrill R. Chapman, second edition. This edition is from 2006, and a third edition is promised, like now. The first half is full of great stories about the successes and failures of computer companies in the 1980s and 1990s, by somebody who was intimately involved with them in a sales and marketing capacity. The author does not appear to be so intimately involved, starting around 2000, and the material flags. Worth buying for the first half.

Now the somewhat interesting.

“Can medicine be cured? The corruption of a profession” by Seamus O’Mahony. All those nonsense theories and practices you see going on in software engineering are also happening in medicine. Medicine had a golden age, when progress was made on finding cures for the major diseases, and now it’s mostly smoke and mirrors as people try to maintain the illusion of progress.

“Who we are and how we got here” by David Reich (a genetics professor who is a big name in the field), is the story of the various migrations and interbreeding of ‘human-like’ and human peoples over the last 50,000 years (with some references going as far back as 300,000 years). The author tries to tell two stories, the story of human migrations and the story of the discoveries made by his and other people’s labs. The mixture of stories did not work for me; the story of human migrations/interbreeding was very interesting, but I was not at all interested in when and who discovered what. The last few chapters went off at a tangent, trying to have a politically correct discussion about identity and race issues. The politically correct class are going to hate this book’s findings.

“The Digital Party: Political organization and online democracy” by Paolo Gerbaudo. The internet has enabled some populist political parties to attract hundreds of thousands of members. Are these parties living up to their promises to be truly democratic and representative of members’ wishes? No, and Gerbaudo does a good job of explaining why (people can easily join up online, and then find more interesting things to do than read about political issues; only a few hard core members get out from behind the screen and become activists).

Suggestions for books that you think I might find interesting are welcome.

December 01, 2019

Carlos Fenollosa (carlesfe)

November 30, 2019

Unrelenting Technology (myfreeweb)

I was wondering why I can’t watch Twitch streams in Firefox… turns out November 30, 2019 11:46 PM

I was wondering why I can’t watch Twitch streams in Firefox… turns out they serve a broken player if your User-Agent does not contain Linux/Windows/macOS. Fail.

Jan van den Berg (j11g)

Capitalism without brakes – Maarten van Rossem November 30, 2019 12:21 PM

In his highly distinctive ‘tone of voice’, Maarten van Rossem provides the most succinct available lecture on the root causes which led to the 2008 financial crisis.

Capitalism without brakes (Kapitalisme zonder remmen) – Maarten van Rossem (2011) – 120 pages

From the change in Keynes’ thinking (after the 1920s) to the Hayek and Friedman ideology (embodied by the neoliberal policies of Reagan and Thatcher), Van Rossem explains how culture and ideology shifted and, combined with technology and humanity’s never-ending greed, provided the perfect ingredients for what happened in 2008. And for what will probably happen again, because humans never seem to learn.

Van Rossem doesn’t wait for the reader: he uses very direct, compelling argumentation, but provides few footnotes or sources. So it’s a matter of believing what the messenger says, as opposed to the messenger providing evidence for his claims. But when you do, this book is the tightest high-level historical overview of the 2008 financial crisis you can find.

Side note: I found it remarkable that van Rossem (as a historian) shares similar ideas with Nassim Nicholas Taleb (who tends to dislike what historians do). E.g., they both subscribe to the idea that people generally misinterpret the Gaussian distribution (the Bell curve), they both share an admiration for Kahneman, and they both seem to dislike the Nobel prize.

The post Capitalism without brakes – Maarten van Rossem appeared first on Jan van den Berg.

November 29, 2019

Jan van den Berg (j11g)

Dream Dare Do – Ben Tiggelaar November 29, 2019 07:34 PM

Dream Dare Do (Dromen Durven Doen) is one of the all-time bestselling Dutch self-management books. Tiggelaar is a popular figure and he has a charming, personal and pragmatic writing style.

Dream Dare Do / Dromen Durven Doen – Ben Tiggelaar (2010) – 152 pages

There are few new concepts in the book (at least for me). Practices like visualisation, goalsetting, checking goals, taking responsibility and being grateful. These are all familiar concepts, shared by many other similar well-known management theories.

And with that, Tiggelaar shows a direct lineage with the likes of Covey, Kahneman and even Aurelius. But you wouldn’t know this if you’re not familiar with these theories. And that is precisely what makes this a good book. Tiggelaar has condensed this knowledge into an approachable, coherent, concise, practical and actionable book that can be read in a few short hours (or one sitting in my case). And if that’s what you’re looking for, go give it a read.

The post Dream Dare Do – Ben Tiggelaar appeared first on Jan van den Berg.

November 25, 2019

Pete Corey (petecorey)

Count the Divisible Numbers November 25, 2019 12:00 AM

Let’s try our hand at using a property test driven approach to solving a Codewars code kata. The kata we’ll be solving today is “Count the Divisible Numbers”. We’ll be solving this kata using Javascript, and using fast-check alongside Jest as our property-based testing framework.

The kata’s prompt is as follows:

Complete the [divisibleCount] function that takes 3 numbers x, y and k (where x ≤ y), and returns the number of integers within the range [x..y] (both ends included) that are divisible by k.

Writing Our Property Test

We could try to translate this prompt directly into a property test by generating three integers, x, y, and k, and verifying that the result of divisibleCount(x, y, k) matches our expected result, but we’d have to duplicate our implementation of divisibleCount to come up with that “expected result.” Who’s to say our test’s implementation wouldn’t be flawed?

We need a more obviously correct way of generating test cases.

Instead of generating three integers, x, y, and k, we’ll generate our starting point, x, the number we’re testing for divisibility, k, and the number of divisible numbers we expect in our range, n:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      // TODO ...
    })
  );
});

Armed with x, k, and n, we can compute the end of our range, y:


let y = x + n * k;

Next, we’ll pass x, our newly computed y, and k into divisibleCount and assert that the result matches our expected value of n:


return n === divisibleCount(x, y, k);

Our final property test looks like this:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Beautiful.

Our First Solution

Coming up with a solution to this problem is fairly straightforward:


const _ = require("lodash"); // lodash provides the chain/range/map/reject helpers used below

const divisibleCount = (x, y, k) => {
  return _.chain(y - x)
    .range()
    .map(n => x + n)
    .reject(n => n % k)
    .size()
    .value();
};

We generate an array of integers from x to y, reject those that aren’t divisible by k, and return the size of the resulting array.

Unfortunately, this simple solution doesn’t work as expected. Our property test reports a failing counterexample of [0, 0, 1] values for x, k, and n:


$ jest
 FAIL  ./index.test.js
  ✕ it works (10ms)
  
  ● it works
  
    Property failed after 1 tests
    { seed: 1427202042, path: "0:0:0:1:0:0:0", endOnFailure: true }
    Counterexample: [0,0,1]
    Shrunk 6 time(s)
    Got error: Property failed by returning false

Looking at our solution, this makes sense. In JavaScript, the result of n % 0 is NaN. Unfortunately, the kata doesn’t specify what the behavior of our solution should be when k equals 0, so we’re left to figure that out ourselves.

Let’s just set up a precondition in our test that k should never equal 0:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      fc.pre(k !== 0);
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Great!

Unfortunately, there’s another problem. Without putting an upper bound on the size of n * k, our solution will generate potentially massive arrays. This will quickly eat through the memory allocated to our process and result in a crash.

Let’s add some upper and lower bounds to our generated k and n values:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(-100, 100), fc.nat(100), (x, k, n) => {
      fc.pre(k !== 0);
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Perfect. Our starting integer, x, can be any positive or negative integer, but our generated values of k are clamped between -100 and 100, and n ranges from 0 to 100. These values should be large enough to thoroughly test our solution, and small enough to prevent memory issues from arising.

Our Second Solution

In hindsight, our solution seems to be making inefficient use of both time and memory. If we consider the fact that our property test is computing y in terms of x, n, and k, it stands to reason that we should be able to compute n, our desired result, in terms of x, y, and k. If we can manage this, our solution will run in both constant time and constant space.

Let’s use some algebra and work our way backwards from calculating y to calculating n. If y = x + n * k, that means that y - x = n * k. Dividing by k gives us our equation for computing n: n = (y - x) / k.

Let’s replace our original divisibleCount solution with this equation:


const divisibleCount = (x, y, k) => (y - x) / k;

And rerun our test suite:


$ jest
 PASS  ./index.test.js
  ✓ it works (8ms)
  
  Test Suites: 1 passed, 1 total
  Tests:       1 passed, 1 total

Wonderful!

November 24, 2019

Carlos Fenollosa (carlesfe)

Phil Hagelberg (technomancy)

in which we get socially rendered November 24, 2019 02:42 PM

I joined the fediverse in early 2017. If you haven't heard of it, it's a distributed network providing social media type features without any one centralized authority. Users are in control of their data, and anyone can run their own servers with their own rules, including the ability to block all traffic from other servers if they tolerate abusive behavior, etc.

my profile

It took me a while to get to the point where I was comfortable on the Fediverse. I created an account on the oddly-named icosahedron.website in April, but it didn't stick immediately. It didn't feel like there was much going on because I hadn't found that many users to follow. After a few months of poking my head around, clicking around a bit, and then forgetting about it for another few weeks, I finally got enough momentum for it to be a compelling place for me, and by November I stopped using my Twitter account altogether. I had felt since the 2016 US election that Twitter had spiraled into a worse and worse condition; the site felt engineered to drive more and more "engagement" at the expense of human misery. So making a clean break dramatically improved my mental well-being.

Even tho it makes a few things more complicated (like finding new users to follow[1]), I deeply appreciate the emphasis on user empowerment that's inherent in the design of the fediverse. One of the cornerstones of this empowerment is the ability to run your own fediverse server, or instance. The most common fediverse server software is Mastodon, which could be considered the flagship of the fediverse. While it's very slick and full-featured, a big downside of Mastodon is that it's difficult to run your own server. Administering it requires running a Ruby on Rails application with Node.js, Postgres, Redis, Nginx, ElasticSearch, and more. For servers which serve a medium-to-large community, this overhead can be justifiable, but it requires a lot of mental energy to get started. There are a lot of places where things could go wrong.

The Pleroma project aims to reduce this by creating a dramatically simpler fediverse server. Running a Pleroma server requires just an Elixir application, a Postgres database, and Nginx to handle TLS. Since Elixir is a lot more efficient than Ruby, it's even possible to run it on a low-powered machine like a Raspberry Pi[2]. I set up my own Pleroma server a few weeks ago at hi.technomancy.us. It's running on the Pi in the photo.

a raspberry pi and hard drive

One downside of Pleroma being simpler is that it's really just an API server. All your interaction in the browser goes thru a separate Javascript application called pleroma-fe, and mobile clients like Tusky just hit the JSON API. The API-first design makes sense when you're using the application to browse, post, search, etc, but a big downside is that when you want to share a post with someone else, they have to load all of pleroma-fe just to see it. If you share it with someone who has scripting turned off, then they'll just see a blank white page, which is very unfriendly[3].

I wanted to start using Pleroma, but I wasn't comfortable with this unfriendly behavior. I wanted it so that if I sent a link to a post to a friend, the server would send them the HTML of the post![4] So I took a course of action I never could have taken with a centralized, commercial social network: I fixed it myself. I found that there had been an attempt to start this 8 months ago which had more or less been forgotten, so I used that as my starting point.

Pleroma is written in Elixir, which I had never used before, but I had learned Erlang a few years ago, and many of the core concepts are the same. Since I based my work on the old initial sketch, I was able to make quick progress and add several features, like threading, media, content warnings, and more. I got some really helpful review about how to improve it and test it, and it got merged a couple weeks ago. So now you can see it in action. I'm thankful to the Pleroma developers for their helpful and welcoming attitude.

pleroma screenshot

One of the reasons this is important to me is that I normally use a laptop that's a bit old. But I think it's important for software developers to keep some empathy for users who don't have the latest and greatest hardware. On my laptop, using the pleroma-fe Javascript application to view a post takes eight seconds[5] if you haven't already loaded pleroma-fe (which is the main use case for when you're sharing a link with a friend). If you have it loaded already, it's still 2-3 seconds to load in pleroma-fe. When you have the server generate the HTML, it takes between 200 and 500 milliseconds. But 500ms is nearly a worst-case scenario since it's running on a tiny Raspberry Pi server; on a high-end server it would likely be several times faster.

Running your own fediverse server is still much harder than it should be. I've glossed over the annoyances of Dynamic DNS, port forwarding, and TLS certificates. There's still a lot of opportunity for this to become better. I have a vision of a system where you could sign up for a fediverse server and it would pre-generate an SD card image with Pleroma, Postgres, and Nginx preinstalled and configured with the domain name of your choice, but right now shortcomings in typical consumer-grade routers and consumer ISPs make this impractical. But it's come a long way, and I think it's only going to get better going forward.

If you're interested in running your own fediverse server, you might find runyourown.social helpful, tho it focuses on Mastodon instead of Pleroma. If you're not interested in running your own server, check out instances.social for a listing of servers with open registration. There's never been a better time to ditch corporate social media and join the fediverse!


[1] When people get started on the Fediverse, the first question is just "which server should I choose?" As someone who's been around a while, it's tempting for me to say "it doesn't matter as long as you pick a place with a code of conduct that disallows abusive behavior; all the servers talk to each other, so you can follow any user from any server that hasn't de-federated yours." The problem is this isn't quite true due to the bootstrapping problem; when you're trying to find interesting people to follow, you'll have an easier time if you land on a server where people have interests that overlap with yours.

In a distributed system, one server can't know about every single user in the entire network; it's just too big. So server A only knows about users on server B if someone from server A has already made a connection with a user on server B. Once you choose a server, your view of the network will be determined by the sum total of those followed by your server-mates.

[2] Just don't make the same mistake I did and try to run Postgres on an SD card! I tried this initially, and after a few days I started seeing unexplained segmentation fault loops from Postgres. Apparently this is common behavior when a disk failure corrupts the DB's files. Moving everything over to an external USB drive made the problem go away, but it was certainly a surprise. Everything else can run on the SD card but the database.

[3] Note that this problem also occurs with Twitter. Mastodon is slightly better, but it still refuses to show you images or content-warnings without scripting.

[4] You used to be able to take this very basic behavior for granted, but since the arrival of the "single-page app", it has become some kind of ancient forgotten wisdom.

[5] Eight seconds sounds like a very slow application (and it is!) but it's hardly the worst offender for single-page applications. Trello takes 10 seconds, Jira takes 16 seconds, and Slack takes 18 seconds.

November 23, 2019

Jeremy Morgan (JeremyMorgan)

10 Places to Learn Golang November 23, 2019 04:19 AM

Though it's now ten years old, Golang (Google's Go language) is one of the fastest growing languages out there right now. Do you want to learn it? Here are 10 great places to start.

1. Basics of GoLang for Beginners


Click Here

This is a great place to get started from zero. If you're committed to downloading and installing, it shows you how, and you dig into some basic stuff. It's a good start.



2. Go.Dev


Click Here

This is a fairly new source so I haven't been able to dig into all the features of it, but it has tutorials for installing, doing a hello world, etc. It seems like the perfect jumping off point.



3. Tour of Golang


Click Here

This is a good place to get started as easily as possible. It doesn't have you building real applications, but you also don't have to install or set up anything.



4. Go by Example


Click Here

This one digs a little deeper and tackles things like pointers and concurrency which are tough at first.



5. Golang Bootcamp


Click Here

This is a book, available online or in downloadable form, and it's well organized and clear for the basics.



6. Introducing Go


Click Here

This is a great book to go from total beginner to digging into deeper topics. It's well worth the price and is a great way to get started writing real applications.



7. Justforfunc YouTube Channel


Click Here

The justforfunc youtube channel is great for digging into Golang. Francesc Campoy is entertaining and knowledgeable. Check it out.



8. Introduction to Programming in Go


Click Here

This is a little older but still a great book for getting the basics down. It's fun and well organized.



9. Golang Tutorial series


Click Here

This is a great series of tutorials for learning Go that's easy to follow and fun.



10. Gophercises


Click Here

So now you know some Go and want to play? This is a great place to polish your skills. This site has some cool coding exercises to try out.



GO Learn some Golang!!

Golang is awesome. I love working with it, and you probably will too. The resources above are great places to get your feet wet and really start developing some cool stuff. What do you think of the list? Should I add anything? Yell at me on Twitter with suggestions!

Oh yeah, and I'm going to be doing more streaming stuff on Twitch so check it out!

November 22, 2019

Grzegorz Antoniak (dark_grimoire)

C++: Shooting yourself in the foot #5 November 22, 2019 06:00 AM

Consider this code:

#include <iostream>
using namespace std;

struct Base {
};

template <typename Type>
struct Derived : Base {
    Type field;
};

You can see two classes here. The class Base contains nothing; it's a placeholder. Then there's also the class Derived, which contains one field. The type of this field is specified in the …

November 21, 2019

Jeff Carpenter (jeffcarp)

Marathon Training Update November 21, 2019 04:47 PM

I’m signed up for the California International Marathon (a.k.a. CIM) in Sacramento, which is in less than 3 weeks! This will be my first marathon (the SF Marathon this year was supposed to be, but I gave myself a toe injury by overtraining). My goal is simply to finish. I think having any sort of time-based goal would risk pushing me past the point of injury during the race. Am I ready?

November 20, 2019

Derek Jones (derek-jones)

A study, a replication, and a rebuttal; SE research is starting to become serious November 20, 2019 02:41 PM

tldr; A paper makes various claims based on suspect data. A replication finds serious problems with the data extraction and analysis. A rebuttal paper spins the replication issues as being nothing serious, and actually validating the original results, i.e., the rebuttal is all smoke and mirrors.

When I first saw the paper: A Large-Scale Study of Programming Languages and Code Quality in Github, the pdf almost got deleted as soon as I started scanning the paper; it uses number of reported defects as a proxy for code quality. The number of reported defects in a program depends on the number of people using the program; more users will generate more defect reports. Unfortunately, data on the number of people using a program is extremely hard to come by (I only know of one study that tried to estimate number of users); studies of Java have also found that around 40% of reported faults are requests for enhancement. Most fault report data is useless for the model building purposes to which it is put.

Two things caught my eye, and I did not delete the pdf. The authors have done good work in the past, and they were using a zero-truncated negative binomial distribution; I thought I was the only person using zero-truncated negative binomial distributions to analyze software engineering data. My data analysis alter-ego was intrigued.

Spending a bit more time on the paper confirmed my original view: its conclusions were not believable. The authors had done a lot of work (this was no paper written over a long weekend), but lots of silly mistakes had been made.

Lots of nonsense software engineering papers get published; nothing to write home about. Everybody writes a nonsense paper at some point in their career; hopefully it gets caught by reviewers and is not published (the statistical analysis in this paper was probably above the level familiar to most software engineering reviewers). So, move along.

At the start of this year, the paper: On the Impact of Programming Languages on Code Quality: A Reproduction Study appeared, published in TOPLAS (the first was in CACM, both journals of the ACM).

This replication paper gave a detailed analysis of the mistakes in data extraction, and the sloppy data analysis performed in the original work. Large chunks of the first study were cut to pieces (finding many more issues than I did, but not pointing out the missing usage data). Reading this paper now, in more detail, I found it a careful, well argued, solid piece of work.

This publication is an interesting event. Replications are rare in software engineering, and this is the first time I have seen a take-down (of an empirical paper) like this published in a major journal. Ok, there have been previous published disagreements, but this is machine learning nonsense.

The Papers We Love meetup group ran a mini-workshop over the summer, and Jan Vitek gave a talk on the replication work (unfortunately a problem with the AV system means the videos are not available on the Papers We Love YouTube channel). I asked Jan why they had gone to so much trouble writing up a replication, when they had plenty of other nonsense papers to choose from. His reasoning was that the conclusions from the original work were starting to be widely cited, i.e., new, incorrect, community-wide beliefs were being created. The finding from the original paper, that has been catching on, is that programs written in some languages are more/less likely to contain defects than programs written in other languages. What I think is actually being measured is number of users of the programs written in particular languages (a factor not present in the data).

Yesterday, the paper Rebuttal to Berger et al., TOPLAS 2019 appeared, along with a Medium post by two of the original authors.

The sequence: publication, replication, rebuttal is how science is supposed to work. Scientists disagree about published work and it all gets thrashed out in a series of published papers. I’m pleased to see this is starting to happen in software engineering, it shows that researchers care and are willing to spend time analyzing each others work (rather than publishing another paper on the latest trendy topic).

From time to time I had considered writing a post about the first two articles, but an independent analysis of the data meant some serious thinking, and I was not that keen (since I did not think the data went anywhere interesting).

In the academic world, reputation and citations are the currency. When one set of academics publishes a list of mistakes, errors, oversights, blunders, etc in the published work of another set of academics, both reputation and citations are on the line.

I have not read many academic rebuttals, but one recurring pattern has been a pointed literary style. The style of this Rebuttal paper is somewhat breezy and cheerful (the odd pointed phrase pops out every now and again), attempting to wave off what the authors call general agreement with some minor differences. I have had some trouble understanding how the rebuttal points discussed are related to the problems highlighted in the replication paper. The tone of the medium post is that there is nothing to see here, let’s all move on and be friends.

An academic’s work is judged by the number of citations it has received. Citations are used to help decide whether someone should be promoted, or awarded a grant. As I write this post, Google Scholar listed 234 citations to the original paper (which is a lot, most papers have one or none). The abstract of the Rebuttal paper ends with “…and our paper is eminently citable.”

The claimed “Point-by-Point Rebuttal” takes the form of nine alleged claims made by the replication authors. In four cases the Claim paragraph ends with: “Hence the results may be wrong!”, in two cases with: “Hence, FSE14 and CACM17 can’t be right.” (these are references to the original conference and journal papers, respectively), and once with: “Thus, other problems may exist!”

The rebuttal points have a tenuous connection to the major issues raised by the replication paper, and many of them are trivial issues (compared to the real issues raised).

Summary bullet points (six of them) at the start of the Rebuttal discuss issues not covered by the rebuttal points. My favourite is the objection bullet point claiming a preference, in the replication, for the use of the Bonferroni correction rather than FDR (False Discovery Rate). The original analysis failed to use either technique, when it should have used one or the other, a serious oversight; the replication is careful and does the analysis using both.

I would be very surprised if the Rebuttal paper, in its current form, gets published in any serious journal; it’s currently on a preprint server. It is not a serious piece of work.

Somebody who has only read the Rebuttal paper would take away a strong impression that the criticisms in the replication paper were trivial, and that the replication was not a serious piece of work.

What happens next? Will the ACM appoint a committee of the great and the good to decide whether the CACM article should be retracted? We are not talking about fraud or deception, but a bunch of silly mistakes that invalidate the claimed findings. Researchers are supposed to care about the integrity of published work, but will anybody be willing to invest the effort needed to get this paper retracted? The authors will not want to give up those 234, and counting, citations.

Update

The replication authors have been quick off the mark and posted a rebuttal of the Rebuttal.

The rebuttal of the Rebuttal has been written in the style that rebuttals are supposed to be written in, i.e., a point by point analysis of the issues raised.

Now what? I have no idea.

November 19, 2019

Gustaf Erikson (gerikson)

For the Soul of France: Culture Wars in the Age of Dreyfus by Frederick Brown November 19, 2019 06:13 PM

This is an excellent and entertaining view of the war between Republicans and their opponents in the years between 1870 and World War 1. It reminds the reader of the virulent anti-Semitism of French discourse at the time.

As an example, Lt. Col. Henry was instrumental in framing Alfred Dreyfus. He literally forged evidence to “prove” Dreyfus’ guilt. When he was arrested and committed suicide in prison, he was hailed as a hero. A subscription was started to finance a lawsuit brought by his wife against Joseph Reinach for libel. A journalist collected the testimonials in a book, and the statements from that book, excerpted in a footnote, are among the most chilling in the entire book:

“From an antisemitic merchant in Boulogne-sur-Mer who hopes that the Hebes are blown away, above all Joseph Reinach, that unspeakable son-in-law and nephew of the Panama swindler one of whose victims I am.” “From a cook who would rejoice in roasting Yids in her oven.” “Long live Christ! Long live France! Long live the Army! A curate from a little very antisemitic village.” “One franc to pay for the cord that hangs Reinach.” “Joan of Arc, help us banish the new English.” “Two francs to buy a round of drinks for the troopers who will shoot Dreyfus, Reinach, and all the traitors.” A resident of Baccarat wanted “all the kikes” in the region—men, women, and children—thrown into the immense ovens of the famous crystal factory. Another contributor longed, prophetically, for the day that a “liberating boot” would appear over the horizon.

In these days when the ideas of sang et terre are making a resurgence, it’s instructive to look back on a time when the Right expressed itself in its true voice.

Jan van den Berg (j11g)

The Unicorn Project – Gene Kim November 19, 2019 04:06 PM

When I read The Phoenix Project last year, I was smitten. I loved the combination of using fiction to describe how to apply management and DevOps theory to true-to-life situations. So when the publisher asked if I wanted to review the follow-up, I didn’t hesitate. And I can safely say The Unicorn Project is just as much fun as its predecessor.

The Unicorn Project – Gene Kim (2019) – 406 pages

This fiction book takes place in the same universe as The Phoenix Project. Actually: the same company and even the same timeline. However, the protagonist this time is female — Maxine Chambers — which is a welcome change.

The Unicorn Project builds a fictionalised business war story around DevOps theory.

A large incumbent auto parts manufacturer struggles to keep up with changing markets. And the hero of the story is given a thankless project and is told to keep her head down. But she doesn’t! She assembles a team and, guided by the mysterious Erik Reid and his Five Ideals (and later the Three Horizons), she sets out to change the course of her project — and subsequently the company!

The reason why I love it so much is because, even though it is fiction, the described situations are just too real. I know them all too well. And The Unicorn Project provides insight into how to deal with these situations. It does so by applying the Five Ideals.

Five Ideals

The main plot is built around seeing the application of these ideals unfold and their beneficial consequences.

Gene Kim has defined the following Five Ideals:

  • The First Ideal is Locality and Simplicity
  • The Second Ideal is Focus, Flow, and Joy
  • The Third Ideal is Improvement of Daily Work
  • The Fourth Ideal is Psychological Safety
  • The Fifth Ideal is Customer Focus

I will not go into detail about the definition and application of the Five Ideals — you should read the book! — but you can take an educated guess from their description what they mean.

And here is the author himself, giving some background information:

I’ve identified values and principles I call the Five Ideals to frame today’s most important IT challenges impacting engineering and business. … My main objective is to confirm the importance of the DevOps movement as a better way of working, and delivering better value, sooner, safer, and happier. I do this by addressing what I call the invisible structures, the architecture, needed to enable developers’ productivity and to scale DevOps across large organizations.

Gene Kim

Plot and references

The Unicorn Project is at the intersection of these things, which makes it pretty unique.

Saying there is a happy ending is probably not a spoiler. The plot is the vehicle for carrying and embodying the DevOps concepts. My only critique is that the book is so chock-full of management and pop cultural references that their application to the story sometimes feels contrived. But not to a fault.

(And apart from that, I kept expecting the CISO to show up and complicate things. But he didn’t?)

I really love the References in the back of the book. I’m a sucker for references and further reading. And I now have a large YouTube playlist with talks about concepts in the book.

Functional programming

Of all the references and concepts mentioned, one in particular popped out: the author’s clear love for functional programming, and Clojure in particular. So I did some digging and sure enough, Gene Kim loves Clojure!

Here he is explaining his love for Clojure:

Bonus: the references from chapter 7 point to a talk by Rich Hickey (creator of Clojure), which is just a phenomenal talk.

Reading The Unicorn Project, will leave you smarter and energized to take on challenges you or your company might face. Go read it!

The post The Unicorn Project – Gene Kim appeared first on Jan van den Berg.