Planet Crustaceans

This is a Planet instance for community feeds. To add/update an entry or otherwise improve things, fork this repo.

April 18, 2019

Derek Jones (derek-jones)

OSI licenses: number and survival April 18, 2019 12:23 AM

There is a lot of source code available which is said to be open source. One definition of open source is software that has an associated open source license. Along with promoting open source, the Open Source Initiative (OSI) has a rigorous review process for open source licenses (so they say, I have no expertise in this area), and have become the major licensing brand in this area.

Analyzing the use of licenses in source files and packages has become a niche research topic. The majority of source files don’t contain any license information, and, depending on language, many packages don’t include a license either (see Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses). There is some evolution in license usage, i.e., changes of license terms.

I knew that a fair few open source licenses had been created, but how many, and how long have they been in use?

I don’t know of any other work in this area, and the fastest way to get lots of information on open source licenses was to scrape the brand leader’s licensing page, using the Wayback Machine to obtain historical data. Starting in mid-2007, the OSI licensing page kept to a fixed format, making automatic extraction possible (via an awk script); there were few pages archived for 2000, 2001, and 2002, and no pages available for 2003, 2004, or 2005 (if you have any OSI license lists for these years, please send me a copy).

What do I now know?

Over the years OSI has listed 110 different open source licenses, and currently lists 81. The actual number of license names listed, since 2000, is 205; the ‘extra’ licenses are the result of naming differences, such as the use of dashes, inclusion of a bracketed acronym (or not), license vs License, etc.
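To illustrate the kind of name normalization involved (the license names below are invented for illustration, not taken from the OSI list), a few lines of Python suffice:

```python
import re

def canonical(name):
    """Collapse the naming differences mentioned above: dashes,
    a trailing bracketed acronym, and license-vs-License casing."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name)  # drop a trailing "(XYZ)"
    name = name.replace("-", " ")
    return name.lower().strip()

# Hypothetical variants of two licenses, for illustration only.
names = [
    "Frobnicate License (FL)",
    "Frobnicate license",
    "Frobnicate-License",
    "Widget Public License",
    "Widget Public license (WPL)",
]

raw = len(set(names))                          # 5 distinct raw names
deduped = len({canonical(n) for n in names})   # 2 actual licenses
```

The gap between the raw and deduplicated counts is exactly the ‘extra’ licenses effect described above.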

Below is the Kaplan-Meier survival curve (with 95% confidence intervals) of licenses listed on the OSI licensing page (code+data):

Survival curve of OSI licenses.
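For readers curious how such a curve is computed: the Kaplan-Meier estimator multiplies, at each time a license is removed from the list, the fraction of the still-listed licenses that survive that removal; licenses still listed today are censored and only shrink the at-risk set. A minimal sketch, using made-up durations rather than the actual OSI data:

```python
def kaplan_meier(durations, observed):
    """Return (time, survival probability) pairs.
    durations: years each license spent on the list.
    observed: True if removal was seen, False if still listed (censored)."""
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    surv = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = censored = 0
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:  # the curve only steps down at observed removals
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# Made-up example: 5 licenses; 3 removed after 2, 2, and 5 years; 2 still listed.
curve = kaplan_meier([2, 2, 5, 7, 9], [True, True, True, False, False])
```

Confidence intervals (the 95% bands in the plot) need the Greenwood variance formula on top of this, which a statistics package such as R's survival library provides.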

How many license proposals have been submitted for review, but not been approved by OSI?

Patrick Masson, from the OSI, kindly replied to my query on number of license submissions. OSI doesn’t maintain a count, and what counts as a submission might be difficult to determine (OSI recently changed the review process to give a definitive rejection; they have also started providing a monthly review status). If any reader is keen, there is an archive of mailing list discussions on license submissions; trawling these would make a good thesis project :-)

April 14, 2019

Derek Jones (derek-jones)

The Algorithmic Accountability Act of 2019 April 14, 2019 08:00 PM

The Algorithmic Accountability Act of 2019 has been introduced to the US Congress for consideration.

The Act applies to “person, partnership, or corporation” with “greater than $50,000,000 … annual gross receipts”, or “possesses or controls personal information on more than— 1,000,000 consumers; or 1,000,000 consumer devices;”.

What does this Act have to say?

(1) AUTOMATED DECISION SYSTEM.—The term ‘‘automated decision system’’ means a computational process, including one derived from machine learning, statistics, or other data processing or artificial intelligence techniques, that makes a decision or facilitates human decision making, that impacts consumers.

That is all encompassing.

The following is what the Act is really all about, i.e., impact assessment.

(2) AUTOMATED DECISION SYSTEM IMPACT ASSESSMENT.—The term ‘‘automated decision system impact assessment’’ means a study evaluating an automated decision system and the automated decision system’s development process, including the design and training data of the automated decision system, for impacts on accuracy, fairness, bias, discrimination, privacy, and security that includes, at a minimum—

I think there is a typo in the following: “training, data” -> “training data”

(A) a detailed description of the automated decision system, its design, its training, data, and its purpose;

How many words are needed for a “detailed description of the automated decision system”? I’m guessing the wording has to be something a consumer might be expected to understand. It would take a book to describe most systems, but I suspect that a page or two is what the Act’s proposers have in mind.

(B) an assessment of the relative benefits and costs of the automated decision system in light of its purpose, taking into account relevant factors, including—

Whose “benefits and costs”? Is the Act requiring that companies do a cost benefit analysis of their own projects? What are the benefits to the customer, compared to a company not using such a computerized approach? The main one I can think of is that the customer gets offered a service that would probably be too expensive to offer if the analysis was done manually.

The potential costs to the customer are listed next:

(i) data minimization practices;

(ii) the duration for which personal information and the results of the automated decision system are stored;

(iii) what information about the automated decision system is available to consumers;

This Act seems to be more about issues around data retention, privacy, and customers having the right to find out what data companies have about them.

(iv) the extent to which consumers have access to the results of the automated decision system and may correct or object to its results; and

(v) the recipients of the results of the automated decision system;

What might the results be? Yes/No on a loan/job application decision, or product recommendations, to name a few.

Some more potential costs to the customer:

(C) an assessment of the risks posed by the automated decision system to the privacy or security of personal information of consumers and the risks that the automated decision system may result in or contribute to inaccurate, unfair, biased, or discriminatory decisions impacting consumers; and

What is an “unfair” or “biased” decision? Machine learning finds patterns in data; when is a pattern in data considered to be unfair or biased?

In the UK, the Sex Discrimination Act has resulted in car insurance companies not being able to offer women cheaper insurance than men (because women have less costly accidents). So the application form does not contain a gender question. But the applicant's first name often provides a big clue as to their gender. So a similar Act in the UK would require that computer-based insurance quote generation systems did not make use of information on the applicant's first name. There is other, less reliable, information that could be used to estimate gender, e.g., height, playing sport, etc.

Lots of very hard questions to be answered here.

Ponylang (SeanTAllen)

Last Week in Pony - April 14, 2019 April 14, 2019 01:58 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

April 13, 2019

Carlos Fenollosa (carlesfe)

I miss Facebook, and I'm not ashamed to admit it April 13, 2019 05:25 PM

I'm 35. Before Facebook, I had to use different tools depending on whom I wanted to chat with.

I'm not talking about the early era of the Internet, but rather the period after everybody started getting online. Chat was just getting popular, but it was quite limited.

We used ICQ/MSN Messenger to chat with real life friends. IRC was used mostly for "internet friends", as we called them back then. Finally, we had Usenet and forums for open discussion with everybody else.

If you wanted to post pictures, Flickr was the go-to website. We didn't share many videos, and there was no really good tool to do so, so we didn't care much.

There was Myspace, and Fotolog, very preliminary social networks that had their chance but simply didn't "get it."

Then Facebook appeared. And it was a big deal.

Add me on Facebook

Whenever you met somebody IRL you would add them to Facebook almost immediately, and keep connected through it.

Suddenly, everybody you knew and everybody you wanted to know was on Facebook, and you could reach all of them, or any of them, quickly and easily.

At that time, privacy was not such a big concern. We kinda trusted the network, and furthermore, our parents and potential employers weren't there.

On Facebook, we were raw.

At some point it all went south. The generational change, privacy breaches, mobile-first apps and the mass adoption of image and video moved everybody to alternative platforms. Whatsapp, mainly for private communications, and Instagram as our facade.

I wrote about Facebook's demise so I will not go through the reasons here. Suffice to say, we all know what happened.

The Wall was replaced by an algorithm that sank original content below a flood of ads, fake news, and externally shared content "you might like". We stopped seeing original content. Then people stopped sharing personal stuff, as nobody interacted with it.

In the end, we just got fed up with the changes, and maybe some people just wanted something shiny and new, or something easier to use.

Facebook was a product of its era, technologically and socially. But, as a service, it was peak human connection. Damn you Zuck, you connected mankind with a superb tool, then let it slip through your fingers. What a tragic outcome.

Current social networks, not the same thing

I, too, moved to Instagram when friends stopped being active on Facebook and encouraged me to create an account there.

Then I realized how fake it is. Sorry for the cliché, but we all know it's true.

I gave it an honest try. I really wanted to like it. But I just couldn't. At least, not as an alternative to Facebook. Stories were a step forward, but I felt —maybe rightfully— that I was being gamed to increase my engagement, not to get access to my friends' content.

Instagram is a very different beast. There is no spontaneity; all posts are carefully selected images, masterfully filtered and edited, showcasing only the most successful of your daily highlights.

I admit it's very useful to connect with strangers, but the downside is that you can't connect with friends the same way you did on Facebook.

Of course, I'm not shooting the messenger, but let me apportion a bit of blame. A service that is a picture-first sharing site and demotes text and comments to an afterthought makes itself really difficult to consider as an honest two-way communication tool.

Instagram is designed to be used as it is actually used: as a posturing tool.

On Facebook you could share a moment with friends. With Instagram, however, moments are projected at you.

I miss Facebook

I miss knowing how my online friends are really doing these days. Being able to go through their life, their personal updates, the ups and the downs.

I miss spontaneous updates at 3 am, last-minute party invites, making good friends with people who I just met once in person and now live thousands of kilometers away.

I miss going through profiles of people to learn what kind of music and movies they liked, and feeling this serendipitous connection based on shared interests with someone I did not know that well in real life.

I miss the opportunity of sharing a lighthearted comment with hundreds of people that understand me and will interpret it in the most candid way, instead of the nitpicking and criticism of Twitter.

I miss the ability to tell something to my friends without the need of sharing a picture, the first-class citizen treatment of text.

I miss the degree of casual social interaction that Facebook encouraged, where it was fine to engage with people only sporadically. In fact, getting a comment or a Like from a random acquaintance could make your day.

I miss when things online were more real, more open.

I miss peak Facebook; not just the tool, but the community it created.

Facebook was the right tool at the right time

Somebody might argue that the people I am no longer in touch with were clearly not such big friends. After all, I still talk to my real-life friends and share funny pics via Whatsapp.

Well, those critics are right; those people were not important enough in my life to warrant regular contact. But they still held a place in it, and I would have loved to still talk to them. And the only socially acceptable way to keep in touch with those acquaintances was through occasional contact via Facebook. I've heard the condescending "pick up the phone and call them"; we all know that's not how it works.

In the end, nobody is in a position to judge how people enjoy their online tools. If users prefer expressing themselves with pictures rather than text, so be it. There is nothing wrong with fishing for Likes.

So please don't misinterpret me, nobody is really at fault. There was no evil plan to move people from one network to another. No one forced friends to stop posting thoughts and post only pics. Instagram just facilitated a new communication channel that people happened to like more than the previous one.

When Facebook Inc. started sensing its own downfall, it was happy to let its homonymous service be cannibalized by Instagram. It's how business works. The time of Facebook had passed.

I'm sorry I can't provide any interesting conclusion to this article. There was no real intent besides feeling nostalgic for a tool and community that probably won't come back, and hopefully connecting with random strangers that might share the same sentiment.

Maybe, as we all get older, we just want to enjoy what's nice of life, make everybody else a little bit jealous, and avoid pointless online discussions. We'd rather shut up, be more careful, and restrict our online interactions to non-rebuttable pictures of our life.

We all, however, lost precious connections on the way.

Tags: life, internet, facebook, web


April 12, 2019

Bogdan Popa (bogdan)

The Problem with SSH Agent Forwarding April 12, 2019 11:00 AM

After hacking the website today, the attacker opened a series of GitHub issues mentioning the flaws he discovered. In one of those issues, he mentions that “complete compromise could have been avoided if developers were prohibited from using [SSH agent forwarding].”

April 10, 2019

Átila on Code (atilaneves)

Boden, Flutter, React Native? April 10, 2019 01:47 PM

A while back I had an idea for a mobile app that I still might write. The main problems were finding the time and analysis paralysis: what do I write the project in? I’ve done some Android development before using Qt, and it was a hassle. I’ve been told that JNI was made painful on […]

April 08, 2019

Tobias Pfeiffer (PragTob)

Revisiting “Tail Call Optimization in Elixir & Erlang” with benchee 1.0 April 08, 2019 03:00 PM

All the way back in June 2016 I wrote a well received blog post about tail call optimization in Elixir and Erlang. It was probably the first time I really showed off my benchmarking library benchee, it was just a couple of days after the 0.2.0 release of benchee after all. Tools should get better […]

Gustaf Erikson (gerikson)

The Gun by C. J. Chivers April 08, 2019 07:29 AM

A technical, social and political history of the AK-47 assault rifle and derivatives.

Chivers does a good job tying the design of the gun into Soviet defense policy, and compares the development of the weapon favorably with the US introduction of the M16.

The author explores the issues with the massive proliferation of these assault rifles worldwide, but he seems to have a blind spot for the similar proliferation of semi-automatic weapons with large magazine sizes in the US. He has faith that the situations that led to the widespread use of the AK-47 will never occur in the USA.

Gergely Nagy (algernon)

On Git workflows April 08, 2019 06:45 AM

To make things clear, I'll start this post with a strongly held opinion: E-mail based git workflows are almost always stupid, because in the vast majority of cases, there exists a more reliable, more convenient, easier to work with workflow, which usually requires less setup and even less sacrificial lambs. I've tried to explain this a few times on various media, but none of those provide the tools for me to properly do so, hence this blog post was born, so I can point people to it instead of trying to explain it - briefly - over and over again. I wrote about mailing lists vs GitHub before, but that was more of an anti-GitHub rebuttal than a case against an e-mail workflow.

I originally wanted to write a long explanation comparing various workflows: A Forge's web UI vs Forge with loose IDE integration vs Forge with tight IDE integration vs E-mail-based variations. However, during this process I realised I don't need to go that far, I can just highlight the shortcomings of e-mail with a few examples, and then show a glimpse into the power a Forge can give us.

One of the reasons I most often hear in support of an e-mail-based workflow is that git ships with built-in tools for collaborating over e-mail. It does not. It ships with tools to send e-mail, and tools to process e-mail sent by itself. There's no built-in tool to bridge the two; it is entirely up to you to do so. Collaboration is not about sending patches into the void. Collaboration includes receiving feedback, incorporating it, change, and iteration. Git core does not provide tools for those, only some low-level bits upon which you can build your own workflow.

Git is also incredibly opinionated about how you should work with e-mail: one mail per commit, nicely threaded, patches inline. But that's not the only way to have an e-mail based workflow: PostgreSQL for example uses attachments and multiple commits attached to the same e-mail. This isn't supported by core git tools, even though it solves a whole lot of problems with the one inline patch per commit method - more about that a few paragraphs below.

So what's the problem with e-mail? First of all, in this day and age, delivery is not reliable. This might come as a surprise to proponents of the method, but despite the SMTP protocol being resilient, it is not reliable. It will keep retrying if it gets a temporary failure, yes. But that's about the only thing it guarantees, that it keeps trying. Once we add spam filters, greylisting, and a whole lot of other checks one needs in 2019 to not drown in junk, there's a reasonable chance that something will, at some point, go horribly wrong. Let me describe a few examples from personal experience!

At one time, I sent a 10-commit patch series to a mailing list. I wasn't subscribed at the time, and the mailing list software silently dropped every mail: on the SMTP level, it appeared accepted, but it never made it to the list. I had no insight into why, and had to contact a list admin to figure it out. Was it a badly configured server? Perhaps, or perhaps not. Silently dropping junk makes sense if you don't want to let the sender know that you know they're sending junk. Sometimes there are false positives, which sucks, but the administrators made this trade-off; who am I to disagree? Subscribing and resending worked flawlessly, but this introduced days of delay and plenty of extra work for both me and the list admins. Not a great experience. I could have read more about the contribution process and subscribed in advance, but as this was a one-off contribution, subscribing to the list (a relatively high-volume one) felt like inviting a whole lot of noise for no good reason. Having to subscribe to a list to meaningfully contribute is also a big barrier: not everyone is versed in efficiently handling higher volumes of e-mail (nor do people need to be).

Another time, back when greylisting was new, I had some of my patches delayed for hours. This isn't a particularly big deal, as I'm in no rush. It becomes a big deal when patches start arriving out of order, sometimes with hours between them because I didn't involve enough sacrificial lambs to please the SMTP gods. When the first feedback you get is "where's the first patch?", even though you sent it, that's not a great experience. I've even had a case where a part of the commit thread was rejected by the list, another part went through. What do you do in this case? You can't just resend the rejected parts unchanged. If you change them to please the list software, that pretty much invalidates the parts that did get through - and nothing guarantees that they'll all get through this time, either.

In all of these cases, I had no control. I didn't set the mailing lists up, I didn't configure their SMTP servers. I did everything by the book, and yet...

From another point of view, as a reviewer, receiving dozens of mails in a thread for review isn't as easy to work with as one would like. For example, if I want to send feedback on two different - but related - commits, then I have to either send two mails, as replies to the relevant commits, or merge the two patches into one email for the purpose of replying. In the second case, it's far too easy to lose track of what's where and why.

With these in mind, I'm sorry to say, but e-mail is not reliable. E-mail delivery is not reliable: it is resilient, but not reliable (see above). The contents of an e-mail are fragile: change the subject, and git am becomes unhappy. You want to avoid bad MUAs screwing up patches? Attach them! Except the default git tooling can't deal with that. There are so many things that can go wrong, it's not even funny. Many of those things you won't know about until hours, or days, later. That's not a reliable way to work.

Sending patches as attachments, in a single mail, solves most of these problems: if it gets rejected, it all gets rejected. If it gets delayed, the whole thing gets delayed. Patches never arrive out of order or with delays between them. Reviewing multiple commits becomes easier too, because all of them are available at hand, without having to build tooling to make them available. But patches as attachments aren't supported by core git tools. Even in this case, there's plenty more you can't easily do, because there's something that patches lack: plenty of meta-information.

You can't easily see more context than what the patch file provides. You can, if you apply the patchset and look at the tree, but that's not something the default tools provide out of the box. It's not hard to do, nor hard to automate, but it doesn't come out of the box. To navigate the source code at any given point in its history, you have to apply the patches too. There are plenty of other cases where one wants more information than what is available in a patch.

But I said in the opening paragraph that:

there exists a more reliable, more convenient, easier to work with workflow, which usually requires less setup and even less sacrificial lambs

So what is this magical, more reliable, more convenient, easier to work with workflow? Forges. Forges like GitHub, GitLab, Gitea, and so on. You may have been led to believe that you need a browser for these, that a workflow involving a forge cannot be done without switching to a browser at some point. This is true in one respect: you will usually need a browser to register. From that point onwards, however, you do not, because all of these forges provide powerful APIs. Powerful APIs that are much easier to build good tooling upon than e-mail. Why? Because these APIs are purpose-built; their reason for existence is to allow tooling to be built upon them. That's their job. When you have purpose-built tools, those will be easier to work with than something as generic and lax as e-mail. As a consequence, forges do most of the integration required for our workflow; we only have to build one bridge, between the API and our IDE of choice.
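To give a feel for how little code such tooling needs, here is a hypothetical Python sketch that renders GitHub-style issue JSON as one-line summaries, the kind of thing an IDE integration shows in a buffer. The payload here is canned rather than fetched over the network, but its field names (number, state, title, user.login) follow the GitHub REST API:

```python
import json

def summarize_issues(payload):
    """Turn a GitHub-style issue-list JSON payload into one-line summaries."""
    return ["#{number} [{state}] {title} ({login})".format(
                number=i["number"], state=i["state"],
                title=i["title"], login=i["user"]["login"])
            for i in json.loads(payload)]

# A canned two-issue payload in the shape the API returns; a real tool
# would GET https://api.github.com/repos/<owner>/<repo>/issues instead.
payload = json.dumps([
    {"number": 7, "state": "open", "title": "Fix the frobnicator",
     "user": {"login": "alice"}},
    {"number": 9, "state": "closed", "title": "Docs typo",
     "user": {"login": "bob"}},
])

lines = summarize_issues(payload)
```

Compare that with the parsing, threading, and delivery worries of doing the same over e-mail.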

As an example, let's look at magit/forge, an Emacs package that integrates forge support into Magit (the best git UI out there, ever, by far)!

Forge overview

We see pull requests, issues, recent commits, and the current branch in one place. Want to look at an issue? Navigate there, press enter, and voila:

Viewing an issue with Forge

Easy access to the whole history of the issue. You can easily quote, reply, tag, whatever you wish. From the comfort of your IDE.

Pull-requests? Same thing, navigate there, press enter:

Viewing a pull request with Forge

You have easy access to all discussions, all the commits, all the trees, from the comfort of your IDE. You do not need to switch to another application, with different key bindings, slightly different UX. You do not need to switch to a browser, either. You can do everything from the comfort of your Integrated Development Environment. And this, dear reader, is awesome.

The real power of the forges is not that they provide a superior user experience out of the box - they kinda do anyway, since you only have to register and you're good to go. No need to care about SMTP, formatting patches, switching between applications and all that nonsense. The web UI is quite usable for a newcomer. For a power-user - no; using a browser for development would be silly (alas, poor people stuck using Atom, VS Code or Chromebooks). Thankfully, we do not have to, because all of the forges provide APIs, and many IDEs also provide various levels of integration.

But what these Forges provide are not just easy access to issues, pull-requests and commits at one's fingertips. They provide so much more! You see, with a tightly integrated solution, if you want to expand the context of a patch, you can: it's already right there, a single shortcut away. You can easily link to parts of the patchset, or the code, and since they'll be links, everyone reading it will have an easy, straightforward way to navigate there. You can reference issues, pull-requests from commit messages, other issues or pull-requests - and they'll be easy to navigate to, out of the box. A forge binds the building blocks together, to give us an integrated solution out of the box.

Forges make the boring, tedious things invisible. They're not exclusive owners of the code either: you can always drop down to the CLI and use low-level git commands if need be. This is what computers are meant to do: help us be more efficient, make our jobs more convenient, our lives easier. Thankfully, we have Forges like GitLab, Gitea and others, that are open source. We aren't even forced to trust our code, meta-data and workflows to proprietary systems.

However, forges aren't always a good fit. There are communities that wouldn't work well with a Forge. That's ok too. But in the vast majority of cases, a forge will make the life of contributors, maintainers and users easier. So unless you're the Linux kernel, don't try to emulate them.

Pete Corey (petecorey)

FizzBuzz is Just a Three Against Five Polyrhythm April 08, 2019 12:00 AM

Congratulations, you’re now the drummer in my band. Unfortunately, we don’t have any drums, so you’ll have to make do by snapping your fingers. Your first task, as my newly appointed drummer, is to play a steady beat with your left hand while I lay down some tasty licks on lead guitar.

Great! Now let’s add some spice to this dish. In the span of time it takes you to snap three times with your left hand, I want you to lay down five evenly spaced snaps with your right. You probably already know this as the drummer in our band, but this is called a polyrhythm.

Sound easy? Cool, give it a try!

Hmm, I guess being a drummer is harder than I thought. Let’s take a different approach. This time, just start counting up from one. Every time you land on a number divisible by three, snap your left hand. Every time you land on a number divisible by five, snap your right hand. If you land on a number divisible by both three and five, snap both hands.


You’ll notice that fifteen is the first number we hit that requires we snap both hands. After that, the snapping pattern repeats. Congratulations, you don’t even have that much to memorize!

Here, maybe it’ll help if I draw things out for you. Every character represents a tick of our count. "ı" represents a snap of our left hand, ":" represents a snap of our right hand, and "i" represents a snap of both hands simultaneously.

But man, I don’t want to have to manually draw out a new chart for you every time I come up with a sick new beat. Let’s write some code that does it for us!

_.chain(_.range(1, 15 + 1))
    .map(i => {
        if (i % 3 === 0 && i % 5 === 0) {
            return "i";
        } else if (i % 3 === 0) {
            return "ı";
        } else if (i % 5 === 0) {
            return ":";
        } else {
            return ".";
        }
    })
    .join("")
    .value();
Here’s the printout for the “three against five” polyrhythm I need you to play:

..ı.:ı..ı:.ı..i

But wait, this looks familiar. It’s FizzBuzz! Instead of printing "Fizz" for our multiples of three, we’re printing "ı", and instead of printing "Buzz" for our multiples of five, we’re printing ":".

FizzBuzz is just a three against five polyrhythm.
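To make the equivalence concrete, here is the same idea restated in Python (a translation for illustration, not the post's own lodash code): relabel the chart's symbols and you get classic FizzBuzz.

```python
def beat(i, three="ı", five=":", both="i", rest="."):
    """One tick of the three-against-five chart. With the default symbols
    this draws the snapping chart; with Fizz/Buzz labels it's FizzBuzz."""
    if i % 3 == 0 and i % 5 == 0:
        return both
    if i % 3 == 0:
        return three
    if i % 5 == 0:
        return five
    return rest

# Same loop, two labelings.
chart = "".join(beat(i) for i in range(1, 16))
fizzbuzz = [beat(i, "Fizz", "Buzz", "FizzBuzz", str(i)) for i in range(1, 16)]
```

The only difference between the drummer's chart and the interview question is the set of labels.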

We could even generalize our code to produce charts for any kind of polyrhythm:

const polyrhythm = (pulse, counterpulse) =>
    _.chain(_.range(1, pulse * counterpulse + 1))
        .map(i => {
            if (i % pulse === 0 && i % counterpulse === 0) {
                return "i";
            } else if (i % pulse === 0) {
                return "ı";
            } else if (i % counterpulse === 0) {
                return ":";
            } else {
                return ".";
            }
        })
        .join("")
        .value();

And while we’re at it, we could drop this into a React project and create a little tool that does all the hard work for us:

Anyways, we should get back to work. We have a Junior Developer interview lined up for this afternoon. Maybe we should have them play us a polyrhythm to gauge their programming abilities?

April 07, 2019

Bogdan Popa (bogdan)

Continuations for Web Development April 07, 2019 02:00 PM

One of the distinguishing features of Racket’s built-in web-server is that it supports the use of continuations in a web context. This is a feature I’ve only ever seen in Smalltalk’s Seaside before, though Racket’s version is more powerful.

April 05, 2019

Simon Zelazny (pzel)

Uses for traits vs type aliases in Ponylang April 05, 2019 10:00 PM

I realized today that while both traits and type aliases can be used to represent a union of types in Pony, each of these solutions has some characteristics which make sense in different circumstances.

Traits: Keeping an interface open

Let's say you have the trait UUIDable.

trait UUIDable
  fun uuid(): UUID

and you have a method that accepts object implementing said trait.

  fun register_uuid(thing: UUIDable): RegistrationResult =>

Declaring the function parameter type like this means that any thing that has been declared to implement the trait UUIDable will be a valid argument. Inside the method body, we can only call .uuid() on the thing, because that's the only method specified by the trait.

We can take an instance of the class class User is UUIDable, and pass it to register_uuid. When we continue development and add class Invoice is UUIDable, no change in any code is required for register_uuid to also accept this new class. In fact, we are free to add as many UUIDable classes to our codebase, and they'll all work without any changes to register_uuid.

This approach is great when we just want to define a typed contract for our methods. However, it does not work when we want to explicitly break encapsulation and – for example – match on the type of the thing.

fun register_uuid(thing: UUIDable): RegistrationResult =>
  match thing
  | let u: User => _register_user(u)
  | let i: Invoice => _register_invoice(i)
  end

The compiler will complain about this method definition, because it can't know that User and Invoice are the only existing types that satisfy UUIDable. For the compiler, this now means that any UUIDable thing that is not a User or an Invoice will fall through the match, and so the resulting output type must also include None, which represents the 'missed' case in our match statement.

We know that the above match is indeed exhaustive. Users and Invoices will be the only types that satisfy UUIDable. How can we let the compiler know?

Type aliases: explicit and complete enumerations

If we want to break encapsulation, and are interested in an exhaustive and explicit union type, then a type alias gives the compiler enough info to determine that the match statement is in fact exhaustive:

type UUIDable is (User | Invoice)

fun register_uuid(thing: UUIDable): RegistrationResult =>
  match thing
  | let u: User => _register_user(u)
  | let i: Invoice => _register_invoice(i)
  end

Different situations will call for different approaches. The type alias approach means that anytime you add a new UUIDable, you'll have to redefine this alias, and have to go through all your match statements and add a new case. The silver lining is that the compiler will tell you which places you need to modify.

Also, note that you can still call thing.uuid() and have it type-check, as the compiler can determine that all classes belonging to (User | Invoice) actually provide this method.

Encapsulation vs. exhaustiveness

Using traits (or interfaces for even more 'looseness') means that, in the large, your code will have to conform to the OOP practices of loose coupling, information hiding, and encapsulation.

Using union types defined as type aliases means that encapsulation is no longer possible, but the compiler will guide you in making sure that matches are exhaustive when you take apart function arguments. This results in the code looking more 'functional' in the large.

You can play around with this code in the Pony playground.

Siddhant Goel (siddhantgoel)

The "Hacker News" Effect April 05, 2019 10:00 PM

I submitted Developer to Manager to Hacker News on 25th March. The common/accepted knowledge is that there's a ton of variation in what makes a post hit the front page. Having absolutely no knowledge about those variations, I submitted the post regardless, not thinking much about it. What happened next was something I wasn't expecting at all.

The post ended up trending on the front page for one full day, gathering slightly more than 500 upvotes in the process, resulting in a cool 50,000 page views over the next 2 days, and moving the site's Alexa rank up by about 2 million places. It was quite an experience to see the number of concurrent visitors to the site jump beyond 300.

The site is completely static and hosted on Netlify, so I wasn't too worried about all that traffic taking things down, but that's beside the point.

Fathom Analytics

I think the main takeaway for me, personally, was that it made me realize that I was not the only one wishing for something like this to exist. I started working on this project because I really felt the need for a resource like this, which could help me with my own transition to a slightly more managerial role. Reading the comments on Hacker News, and the tons of encouraging emails that people sent me, made it clear that there are plenty of other developers who are transitioning to management and looking for resources to assist with the transition. Solving personal problems is always good, but it's even better when others validate it.

So everyone who upvoted the HN post, left an encouraging comment, sent me an email, volunteered for an interview for the site, or helped in any other way - a huge thank you! I'll be using all that motivation to make Developer to Manager the site you open when you want to know how to become a good engineering manager.

April 03, 2019

Gokberk Yaltirakli (gkbrk)

Plaintext budgeting April 03, 2019 08:26 PM

For the past ~6 months, I’ve been using an Android application to keep track of my daily spending. To my annoyance, I found out that the app doesn’t have an export functionality. I didn’t want to invest more time in a platform that I couldn’t get my data out of, so I started looking for another solution.

I’ve looked into budgeting systems before, and I’ve seen both command-line (ledger) and GUI systems (GNUCash). Now, both of these are great pieces of software, and I can appreciate how double-entry bookkeeping is useful for accounting purposes. But while they are powerful, they’re not as simple as they could be.

I decided to go with CSV files. CSV is one of the most universal file formats, it’s simple and obvious. I can process it with pretty much every programming language and import it to pretty much every spreadsheet software. Or… I could use a shell script to run calculations with SQLite.

If I ever want to migrate to another system; it will probably be possible to convert this file with a shell script, or even a sed command.

I create monthly CSV files in order to keep everything nice and tidy, but the script adapts to everything from a single CSV file to one file for each day/hour/minute.
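Because the format is plain CSV, the same numbers are easy to compute in any language. Here is a small Python sketch (a hypothetical helper, with the column order the files use: date, amount, currency, category, description):

```python
# Hypothetical sketch: per-category totals computed in plain Python
# instead of SQLite. Rows follow the files' column order:
# date, amount, currency, category, description.
import csv
import io
from collections import defaultdict

def totals_by_category(csv_text):
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0] == "Date":  # skip blank lines and headers
            continue
        date, amount, currency, category, description = row
        totals[category] += float(amount)
    return dict(totals)

sample = (
    "2019-04-03,2.75,EUR,Transport,Bus to work\n"
    "2019-04-03,5.10,EUR,Food,Lunch\n"
    "2019-04-04,3.20,EUR,Transport,Bus home\n"
)
print(totals_by_category(sample))
```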

Here’s what an example file looks like:

2019-04-03,2.75,EUR,Transport,Bus to work

And here’s the script:



#!/bin/sh
# Default to a 7-day window; pass a number of days as the first argument.
days=${1:-7}

cat *.csv | sed '/^Date/d' > combined.csv.temp

output=$(sqlite3 <<EOF
create table Transactions(Date, Amount, Currency, Category, Description);
.mode csv
.import combined.csv.temp Transactions
.mode list

select 'Amount spent today:',
coalesce(sum(Amount), 0) from Transactions where Date = '$(date +%Y-%m-%d)';

select '';
select 'Last $days days average:',
sum(Amount)/$days, Currency from Transactions where Date > '$(date --date="-$days days" +%Y-%m-%d)'
group by Currency;

select '';
select 'Last $days days by category';
select '=======================';

select Category, sum(Amount) from Transactions
where Date > '$(date --date="-$days days" +%Y-%m-%d)'
group by Category order by sum(Amount) desc;
EOF
)

rm combined.csv.temp

echo "$output" | sed 's/|/ /g'

This is the output of the command:

[leo@leo-arch budget]$ ./
Amount spent today: 8.46

Last 7 days average: 15.35 EUR

Last 7 days by category
Groceries 41.09
Transport 35.06
Food 31.35
[leo@leo-arch budget]$ ./ 5
Amount spent today: 8.46

Last 5 days average: 11.54 EUR

Last 5 days by category
Groceries 29.74
Transport 17.06
Food 10.9
[leo@leo-arch budget]$

Andrew Montalenti (amontalenti)

Shipping the Second System April 03, 2019 12:54 PM

In 2015-2016, the team embarked upon the task of re-envisioning its entire backend technology stack. The goal was to build upon the learnings of more than 2 years delivering real-time web content analytics, and use that knowledge to create the foundation for a scalable stream processing system that had built-in support for fault tolerance, data consistency, and query flexibility. Today in 2019, we’ve been running this new system successfully in production for over 2 years. Here’s what we learned about designing, building, shipping, and scaling the mythical “second system”.

The Second System Effect

But why re-design our existing system? This question lingered in our minds a few years back. After all, the first system was successful. And I had the lessons of Frederick Brooks accessible and nearby when I embarked on this project. He wrote in The Mythical Man-Month:

Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs.

When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.”

Were we suffering from engineering hubris to redesign a working system? Perhaps. But we may have been suffering from something else altogether healthy — the paranoia of a high-growth software startup.

I discuss’s log-oriented architecture at Facebook’s HQ for PyData Silicon Valley, with’s VP of Engineering, Keith Bourgoin.

Our product had only just been commercialized. We were a team small enough to be nimble, but large enough to be dangerous. Yes, there were only a handful of engineers. But we were operating at the scale of billions of analytics events per day, on-track to serve hundreds of enterprise customers who required low-latency analytics over terabytes of production data. We knew that scale was not just a “temporary problem”. It was going to be the problem. It was going to be relentless.

Innovation Leaps

Innovation doesn’t come incrementally; it comes in waves. And in this industry, yesterday’s “feat of engineering brilliance” is today’s “commodity functionality”.

This is the environment I found myself in during 2014-2015: We had a successful system with a lot of demands from customers, but no way to satisfy those demands, because the system had been designed with a heap of trade-offs that reflected “life on the edge” in 2012-2014, working with large-scale analytics data.

Back then, we were eager adopters of tools like Hadoop Streaming, Apache Pig, ZeroMQ, and Redis, to handle both low-latency and large-batch analytics. We had already migrated away from these tools, refreshing our stack as the open source community shipped new usable tools.

Meanwhile, expectations had changed. Real-time was no longer innovative; it was expected. A cornucopia of nuanced metrics were no longer nice-to-haves; they were necessities for our customers.

It was no longer enough for our data to be “directionally correct” (our main goal during the “initial product/market fit” phase), but instead it needed to be enterprise-ready: verifiable, auditable, resilient, always-on. The web was changing, too. Assumptions we made about the kind of content our system was measuring shifted before our very eyes. Edge cases were becoming common cases. The exception was becoming the rule. Our data processing scale was already on track to 10x. We could see it going 100x. (It actually went 1000x.)

Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.

–Stewart Brand

So yes, we could have attempted an incremental improvement of the status quo. But doing so would have only resulted in an incremental step, not a giant leap. And incremental steps wouldn’t cut it.

Joel Spolsky famously said that rewriting a working system is something “you should never do”.

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong.

The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming: It’s harder to read code than to write it.

But what about Apple II vs Macintosh? Did Steve Jobs and his pirate team make a mistake in deciding to rethink everything from scratch — from the hardware to the operating system to the programming model — for their new product? Or was rethinking everything the price of admission for a high-growth tech company to continue to innovate in its space?

Big Rewrites are New Products

Perhaps we really have a problem of term definition.

When we think “rewrite”, we think “refactor”, as in change the codebase for a “zen garden” requirement, like code cleanliness or performance.

In software, we should admit that “big rewrites” aren’t about refactoring existing products — they are about building and shipping brand new products upon existing knowledge.

Perhaps what your existing product tells you is a core set of use cases which the new product must satisfy. But if your “rewrite” doesn’t support a whole set of new use cases, then it is probably doomed to be a waste of time.

If the Macintosh had shipped and was nothing more than a prettier, better-engineered Apple II, it would have been a failure. But the Macintosh represented a leap from command prompts to graphical user interfaces, and from keyboard-oriented control to the mouse. These two major innovations (among others) propelled not just Apple’s products into the mainstream, but also generally fueled the personal computing revolution!

So, forget rewrites. Think, new product launches. We re-use code and know-how, yes, but also organizational experience.

We eliminate sunk cost thinking and charge ahead. We fly the pirate flag.

But, we still keep our wits about us.

If you are writing a new product as a “rewrite”, then you should expect it to require as much attention to detail as shipping the original product took, with the added downsides of legacy expectations (how the product used to work) and inflated future expectations (how big an impact the new product should have).

Toward the Second System

So, how did we march toward this new launch? How did we do a big rewrite in such a way that we lived to tell the tale?

System One Problems

In the case of, one thing we did right was to focus on the actual, not imagined or theoretical, problems. We identified a few major problems with the first version of our system, which was in production 2012-2015.

  • Data consistency problems due to pre-aggregation. Issues with data consistency and reliability were our number one support burden, and investigating them was our number one engineering distraction.
  • Did not track analytics at the visitor and URL level. As our customers moved their web content from the “standard article” model to a number of new content forms, such as interactive features, quizzes, embedded audio/video, cards/streams, landing pages, mobile apps, and the like, the unit of tracking we used in our product/market fit MVP (the “article” or “post”) started to feel increasingly limiting.
  • Only supported a single metric, the page view, or click. Our customers wanted to understand alternative metrics, segments, and source attributions — and we only supported this single metric throughout our databases and user interface. This metric was essentially a core assumption built throughout the system. That we only supported this one metric at first might seem surprising — after all, page views are a basic metric! But, that was precisely the point. During our initial product/market fit stage, we were trying to drain as much risk as possible from the problem. We were focusing on the core value, and market differentiation. We weren’t trying to prove the value of new metrics. We were trying to prove the value of a new category of analytics, which we called “content analytics” — the merger of content understanding (what the content is about) and content engagement (how many people visited the content and where they came from). From a technical standpoint, our MVP addressed both of these issues while only supporting a single metric, along with various filters and groupings thereof. Note: in retrospect, this was a brilliant bit of scope reduction in our early days. This MVP got us to our first million in annual revenue, and let us see all sorts of real-world data from real customers. The initial revenue gave us enough runway to survive to a Series A financing round, as well.

There were other purely technical problems with the existing system, too:

  • Some components were single points of failure, such as our Redis real-time store.
  • Our system did not work easily with multi-data-center setups, thus making failover and high availability harder to provide.
  • Data was stored and queried in two places for real-time and historical queries (Redis vs MongoDB), making the client code that accessed the data complex.
  • Rebuilding the data from raw logs was not possible for real-time data and was very complex for historical data.
  • To support our popular API product, we had a completely separate codebase, which had to access and merge data from four different databases (Redis, MongoDB, Postgres, and Apache Solr).

System One Feats of Engineering

However, System One also did many things very well.

  • Real-time and historical queries were satisfied with very good latencies; often milliseconds for real-time and below 1-2 seconds for historical data.
  • Data collection was solid, tracking billions of page views per month and archiving them reliably in Amazon S3.
  • API latencies were serving a billion API requests per month and were very stable.
  • Despite not supporting multiple data centers, we did have a working high availability and failover story that had worked for us so far.

System One had also undergone its share of refactorings. It started its life as simple cronjobs, evolved into a Python and Celery/ZeroMQ system, and eventually into using Storm and Kafka. It had layered on “experimental support” for a couple of new metrics, namely visitors (via a clever implementation of HyperLogLog) and shares (via a clever share crawling subsystem). Both were proving to be popular data sources, though these metrics were only supported in a limited way, due to their experimental implementation. Throughout, System One’s data was being used to power rich dashboards with thousands of users per customer; CSV exports that drove decision-making; and APIs that powered sites the world over.

System Two Requirements

Based on all of this, we laid our requirements for System Two, both customer oriented and technical.

Customer requirements:

  • URL is basic unit of tracking; every URL is tracked.
  • Supports multiple sources, segments, and metrics.
  • Still supports page views and referrers.
  • Adds visitor-oriented understanding and automatic social share counting — unique visitor counts and social interaction counts across URLs, posts, and content categories. The need for this was proven by our experiments in System One.
  • Real-time queries for live updates.
  • 5-minute buckets for past 24 hours and week-ago benchmarking.
  • 1-day buckets for past 2 years.
  • Low latency queries, especially for real-time data.
  • Verifiably correct data.

Technical requirements:

  • Real-time processing can handle current firehose with ease — this started at 2k pixels per second, but 10x’ed to 20k pixels per second in 2016.
  • Batch processing can do rebuilds of customer data and daily periods.
  • The batch and real-time layers are simplified, with shared code among them in pure Python.
  • Databases are linearly and horizontally scalable with more hardware.
  • Data is persistent and highly available once stored.
  • Query of real-time and historical data uses a unified time series engine.
  • A common query language and client library is used across our dashboard and our API.
  • Room in model for adding fundamentally new metrics (e.g. engaged time, video starts) without rearchitecting the entire system.
  • Room in the model for adding new dimensions (e.g. URL-grouping campaign parameters, new data channels) without re-architecting the system or re-indexing production data.
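The shared-code requirement above can be illustrated with a minimal sketch (hypothetical names, not’s actual code): a pure aggregation function that both the real-time topology and the batch rebuild job import, which is what makes a batch rebuild verifiable against the live numbers.

```python
# Hypothetical sketch of the "shared pure-Python code" requirement:
# one pure aggregation function, imported by both the real-time
# (streaming) path and the batch (rebuild) path.
from collections import Counter

def aggregate_events(events):
    """Pure function: fold raw pixel events into per-(url, metric) counts."""
    counts = Counter()
    for event in events:
        counts[(event["url"], event["metric"])] += 1
    return counts

# Real-time path: fold small micro-batches into a running total.
def realtime_update(running, micro_batch):
    running.update(aggregate_events(micro_batch))
    return running

# Batch path: rebuild a whole day from the raw log in one pass.
def batch_rebuild(day_log):
    return aggregate_events(day_log)

events = [
    {"url": "/a", "metric": "page_view"},
    {"url": "/a", "metric": "page_view"},
    {"url": "/b", "metric": "engaged_time"},
]
# Both paths agree, because they share the same pure function.
assert batch_rebuild(events) == realtime_update(Counter(), events)
```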

Soft requirements that acted as a guiding light:

  • Backend codebase should be much smaller.
  • There should be fewer production data storage engines — ideally, one or two.
  • System should be easier to reason about; the distributed cluster topology should fit on a single slide.
  • Our frontend team should feel much more user interface innovation is possible atop the new query language for real-time and historical data.

And with that, we charged ahead!

The First Prototype

First prototypes are where engineering hubris is at its height. This is because we can invent various scenarios that allow us to verify our own dreams. It is hard to be the cold scientist who proves the experiment a failure in the name of truth. Instead, we are emotionally tied up in our inventions, and want them to succeed.

Put another way:

But sometimes, merely by believing we shall succeed, we can fashion a bespoke instrument of innovation that causes us to succeed.

In the First Prototype of this project, I wanted to prove two things:

  1. That we could have a simpler codebase.
  2. That the state-of-the-art in open source technology had moved far enough along to benefit us toward our concrete System Two goals.

To start things off, I created a little project for Cassandra and Elasticsearch experiments, which we code-named “casterisk”. I went as deep as I could to teach myself these two technologies. As part of the prototyping process, we also shared what we learned in technical deep-dive posts on Cassandra, Lucene, and Elasticsearch.

The First Prototype had data shaped like our data, but it wasn’t quite our data. It generated random but reasonable-looking customer traffic in a stream, and using the new tools available to me, I managed to restructure the data in myriad ways. Technically speaking, I now had Cassandra CQL tables representing log-structured data that could be scaled horizontally, and Elasticsearch indices representing aggregate records that could be queried across time and also scaled horizontally. The prototype was starting to look like a plausible system.

Some early time series data flowing through our “casterisk” prototype.

But then, it took about 3 months — from May to August — for the prototype to go from “R&D” to “pre-prod” stage. It wasn’t until August of 2015 that we published a post detailing all the new metrics supported in our new “beta” backend system. Why so long, given the early advancements?

Recruiting a Team

You would think as CTO of a startup that I don’t need to recruit a team for an innovative new project. I certainly thought that. But I was wrong.

You see, the status quo is a powerful drug. Upon the first reports of my experiments, my team met me with suspicion and doubt. This was not due to any fault of their own. Smart engineers should be skeptical of prototypes and proposed re-architectures. Only when prototypes survive the harshest and most skeptical scrutiny can they blossom into production systems.

Building Our Own Bike

Steve Jobs once famously said that computers are like a “bicycle for the mind”, because it’s a technology that lets us “move faster” than any other species’ naturally-endowed speed.

Well, the creaky bike that seemed to be slowing us down in our usage of Apache Storm was an open source module called petrel. Now long unmaintained, at the time, it was the only way to run Python code on Apache Storm, which was how we reliably ran massively parallel real-time streaming jobs, overcoming Python’s global interpreter lock (GIL) and handling multi-node scale-out.

So, we built our own bike: streamparse. I discussed streamparse a bit at PyCon 2015, but the upshot is that it lets us run massively parallel stream processing jobs in pure Python atop Apache Storm. And it let us prototype those distributed stream processing clusters very, very quickly.

But though bikes let you move fast, if you build your own bike, you have to factor in how long it takes to build it — that is, while you’re standing still. And that’s exactly what we did for a few months.

This may have been a bit of scope creep. After all, we didn’t need streamparse to test out our new casterisk system. But it sure made testing them a whole lot easier. It let us run local tests of the clusters and it let us deploy multiple topologies in parallel that tweaked different settings. But it meant a new investment was required that was not the same as the core problem at hand.

Upgrading The Gears… While We Rode

The other bit of hubris that slowed us down: the alluring draw of “upgrades”.

Elasticsearch had just added its aggregation framework, which was exactly what we needed to do time series analysis against its records. It had also just added a new aggregate, cardinality, that we thought could satisfy some important use cases for us. Cassandra had a somewhat-buggy counter implementation in 2.0, but a complete re-implementation was around the corner in 2.1. We thought upgrading to it would save us, but then, we discovered counters were a bad idea altogether. Likewise, Storm had a stable release that we were already running in 0.8, but 0.9.2 was around the corner and was going to be the new stable. We upgraded to it, but then, discovered bugs in its network layer that stopped things from working. Our DevOps team reasonably pushed for a new “stable” Ubuntu version. We adopted it, thinking it’d be safe and stable. Turned out, we hit kernel/driver incompatibility problems with Xen, which were only triggered due to the scale of our bandwidth requirements.

So, all in all, we did several “upgrades to stable” that were actually bleeding edge upgrades in disguise. All while we were testing the systems in question with our own new Second System. The upgrades felt like “adopting a stable version”, but they were simply too new. If you upgrade the gears while riding the bike, you should expect to fall. This was one of the core lessons learned.

Taking a Couple of Fun Detours

The project seemed like it had already been distracted a bit by streamparse development and new database versions, but now a couple of fun detours also emerged from the noise. These were “good” detours. For example: we built an “engaged time” experiment that showcased our ability to track and measure a brand new metric, which had a very different shape from page views and visitors. We proved we could measure the metric effectively with our new system, and report on it in the context of our other metrics. It turns out, this metric was a driving force for adoption of our product in later months and years.

Our “referrer” experiment showed that we’d be able to satisfy several important queries in the eventually-delivered full system. Namely, we could breakout every traffic source category, domain, and URL in full detail, both in real-time and over long historical periods. This made our traffic source analytics more powerful than anything else on the market.

Our visits and sessions experiment showed our ability to do distinct counts just as well as (albeit more slowly than) integer counts. Our “new vs returning” visit classifier had not just one, but two, rewrites atop different data structures, before eventually succumbing to a third rewrite that removed some functionality altogether. The funny thing is, the first two attempts were eventually thrown away as we replaced them with a much simpler solution (called client-side sessionization, where we establish sessions in the JavaScript tracker rather than on the server). But it was still a “good” detour, because it resulted in us shipping total visitors, new visitors, and returning visitors as real-time and historical metrics in our core dashboard — something that our competitors have still failed to deliver, years later.

These detours all had the feeling of engineers innovating at their best, but also meant multi-month delays in the delivery of an end-to-end working system.

Arriving at the Destination

Despite all this, in early August, we called a huddle to say we were finally going to be done with our biking tour. It was a new bike, it was fast, its gears were upgraded, and it was running as smoothly as it was going to. This led to the October, November, December stretch, which was among our team’s most productive of the Second System project. “ Preview” was built, tested, and delivered, as the data proved its value and flexibility.

Our old and new dashboard experience running side-by-side, powered by the different backends!

We ran the new system side-by-side with the old system. This was hard to do, but an absolute hard requirement for reducing risk. That’s another lesson learned: something we definitely did right, and that any future rewrites should consider to be a hard requirement, as well.

The new system was iteratively refined, while also making the backend more stable and performant. We updated our blog post on Mage: The Magical Time Series Backend Behind Analytics to reflect the shipped reality of the working system. I presented our usage of Elasticsearch as a large-scale production time series engine at Elastic{ON}, where I got to know some members of the Elastic team who had worked on the aggregation engine that we managed to abstract over in Mage. We cut our customers over to the new system, just as our old system was hitting its hard scaling limits. It felt great.

Several new features were launched atop it in the following years, including a new version of our API, a native iOS app, a new “homepage overlay” tool, new full-screen dashboards, new filters and new reports. We shipped campaign tracking, channel tracking, non-post tracking, video tracking — all of which would have been impossible in the old system.

We’ve continued to ship feature after feature atop “casterisk” and “mage” since then. We expanded the scope of toward tracking more than just articles, including videos, landing pages, and other content types. We now support advanced custom segmentation of audience by subscription status, loyalty level, geographic region, and so on. We support custom events that are delivered through to a hosted data pipeline, which customers can use for raw data analysis and auditing. In other words, atop this rewritten backend, our product just kept getting better and better.

Meanwhile, we have kept up with 100% annual growth in monthly data capture rate, along with a 1000x growth in historical data volume. All thanks to our team’s engineering ingenuity, thanks to our willingness to pop open the hood and modify the engine, and thanks to the magic of linear hardware scaling.

A view of how Analytics looks today, powered by the “mythical Second System” and informed by thousands of successful site integrations, tens of thousands of users, and hundreds of enterprise customers.

Brooks was right

Brooks was right to say, in his typically gendered way, that “the second system is the most dangerous one a man ever designs”.

Building and shipping that Second System in the context of a startup evolving its “MVP” into “production system” has its own challenges. But though backend rewrites are hard and painful, sometimes the price of progress is to rethink everything.

Having Brooks in mind while you do so ensures that when you redesign your bike, you truly end up with a lighter, faster, better bike — and not an unstable unicycle that can never ride, or an overweight airplane that can never fly.

Doing this wasn’t easy. But, watching this production system grow over the last few years to support over 300 enterprise customers in their daily work, to serve as a base for cutting-edge natural language processing technology, to answer the toughest questions of content attribution — has been the most satisfying professional experiment of my life. There’s still so much more to do.

So, now that we’ve nailed value and scale, what’s next? My bet’s on scaling this value to the entire web. Care to join us?

Indrek Lasn (indreklasn)

These tips will boost your React code’s performance April 03, 2019 11:05 AM

React is popular because React applications scale well and are fun to work with. Once your app scales, you might consider optimizing it. Going from a 2500ms wait time to 1500ms can have a huge impact on your UX and conversion rates.

Note: This article was originally published here — read the original too!

So without further ado, here are some performance tips I use with React.


If you have a stateless component and you know it won’t need to re-render, wrap the entire stateless component inside a React.memo function. Notice the profile.displayName, which helps with debugging. More info about component.displayName

React.memo is the function-component equivalent of the class-based React.PureComponent.


PureComponent compares props and state in the shouldComponentUpdate lifecycle method. This means it won't re-render if the state and props are the same. Let's refactor the previous stateless component to a class-based component.

If we know for sure the props and state won’t change, instead of using Component we could use the PureComponent.
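The shallow comparison PureComponent performs can be sketched in a few lines of Python (a simplification of my own; React's shallowEqual compares each value with Object.is):

```python
# A sketch of the shallow comparison behind PureComponent's
# shouldComponentUpdate: the key sets must match, and each value must
# be the same one level deep (approximating JavaScript's Object.is
# with identity-or-equality here).
def shallow_equal(a, b):
    return a.keys() == b.keys() and all(a[k] is b[k] or a[k] == b[k] for k in a)

def should_component_update(old_props, new_props):
    # PureComponent only re-renders when props (or state) changed
    return not shallow_equal(old_props, new_props)
```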

componentDidCatch(error, info) {} Lifecycle method

Components may have side-effects which can crash the app in production. If you have more than 1000 components, it can be hard to keep track of everything.

There are so many moving parts in a modern web app that it's hard to wrap one's head around them all, let alone handle every error. Luckily, React introduced a new lifecycle method for handling errors.

The componentDidCatch() method works like the JavaScript catch {} block, but for components. Only class components can be error boundaries.
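Outside React, the control flow of an error boundary is just a catch wrapped around rendering a subtree; a minimal sketch in Python (hypothetical helper names, not React's API):

```python
# Sketch of the error-boundary idea: try to render the child subtree,
# and fall back to an error view if anything below throws.
def render_with_boundary(render_child, render_fallback):
    try:
        return render_child()
    except Exception as error:  # plays the role of componentDidCatch(error, info)
        return render_fallback(error)

def broken_widget():
    raise ValueError("side effect crashed")

html = render_with_boundary(
    broken_widget,
    lambda error: f"<p>Something went wrong: {error}</p>",
)
```

In React itself, the boundary is a class component that implements componentDidCatch and renders fallback UI from its state.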

React.lazy: Code-Splitting with Suspense

Dan explains:

“We’ve built a generic way for components to suspend rendering while they load async data, which we call suspense. You can pause any state update until the data is ready, and you can add async loading to any component deep in the tree without plumbing all the props and state through your app and hoisting the logic. On a fast network, updates appear very fluid and instantaneous without a jarring cascade of spinners that appear and disappear. On a slow network, you can intentionally design which loading states the user should see and how granular or coarse they should be, instead of showing spinners based on how the code is written. The app stays responsive throughout.”

Read more about Beyond React 16 here.

React.Fragments to Avoid Additional HTML Element Wrappers

If you've used React, you probably know the following error message:

“Parse Error: Adjacent JSX elements must be wrapped in an enclosing tag”.

A React component can only return a single root element. This is by design.

The following code will crash your app. The antidote to this is to wrap everything in a single element.

The only problem with the following code is that we get an extra wrapper element for every component. The more markup we have to render, the slower our app.

Fragments to the rescue!

Voilà! No extra mark-up is necessary.

Bonus: Here’s the shorthand for Fragments.

Thanks for reading! Check out my Twitter for more. Here are some cool articles you might enjoy:


Marc Brooker (mjb)

Learning to build distributed systems April 03, 2019 12:00 AM

Learning to build distributed systems

A long email reply

A common question I get at work is "how do I learn to build big distributed systems?". I've written replies to that many times. Here's my latest attempt.

Learning how to design and build big distributed systems is hard. I don't mean that the theory is harder than any other field in computer science. I also don't mean that information is hard to come by. There's a wealth of information online, many distributed systems papers are very accessible, and you can't visit a computer science school without tripping over a distributed systems course. What I mean is that learning the practice of building and running big distributed systems requires big systems. Big systems are expensive, and expensive means that the stakes are high. In industry, millions of customers depend on the biggest systems. In research and academia, the risks of failure are different, but no less immediate. Still, despite the challenges, doing and making mistakes is the most effective way to learn.

Learn through the work of others

This is the most obvious answer, but still one worth paying attention to. If you're academically minded, reading lists and lists of best papers can give you a place to start to find interesting and relevant reading material. If you need a gentler introduction, blogs like Adrian Colyer's Morning Paper summarize and explain papers, and can also be a great way to discover important papers. There are a lot of distributed systems books I love, but I haven't found an accessible introduction I particularly like yet.

If you prefer to start with practice, many of the biggest distributed systems shops on the planet publish papers, blogs, and talks describing their work. Even Amazon, which has a reputation for being a bit secretive with our technology, has published papers like the classic Dynamo paper, recent papers on the Aurora database, and many more. Talks can be a valuable resource too. Here's Jaso Sorenson describing the design of DynamoDB, me and Holly Mesrobian describing a bit of how Lambda works, and Colm MacCarthaigh talking about some principles for building control planes. There's enough material out there to keep you busy forever. The hard part is knowing when to stop.

Sometimes (as I've written about before) it can be hard to close the gap between theory papers and practice papers. I don't have a good answer to that problem.

Get hands-on

Learning the theory is great, but I find that building systems is the best way to cement knowledge. Implement Paxos, or Raft, or Viewstamped Replication, or whatever you find interesting. Then test it. Fault injection is a great approach for that. Make notes of the mistakes you make (and you will make mistakes, for sure). Docker, EC2 and Fargate make it easier than ever to build test clusters, locally or in the cloud. I like Go as a language for building implementations of things. It's well-suited to writing network services. It compiles fast, and makes executables that are easy to move around.
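Fault injection can start very small: wrap an operation so it fails some fraction of the time, then check that the caller's recovery logic copes. A toy sketch of that (my own, and no substitute for real fault injection against a running cluster):

```python
import random

# Toy fault injection: make an operation fail some fraction of the
# time, then verify the caller's retry logic copes with it.
def inject_faults(op, failure_rate, rng):
    def flaky(*args):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return op(*args)
    return flaky

def retry(op, attempts):
    for i in range(attempts):
        try:
            return op()
        except ConnectionError:
            if i == attempts - 1:
                raise  # recovery logic exhausted: surface the fault

rng = random.Random(42)  # seeded, so a test run is reproducible
flaky_fetch = inject_faults(lambda: "value", failure_rate=0.5, rng=rng)
result = retry(flaky_fetch, attempts=10)
```

Note the mistakes your retry logic makes under injected faults; that is where the learning happens.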

Go broad

Learning things outside the distributed systems silo is important, too. I learned control theory as an undergrad, and while I've forgotten most of the math I find the way of thinking very useful. Statistics is useful, too. ML. Human factors. Formal methods. Sociology. Whatever. I don't think there's shame in being narrow and deep, but being broader can make it much easier to find creative solutions to problems.

Become an owner

If you're lucky enough to be able to, find yourself a position on a team, at a company, or in a lab that owns something big. I think the Amazon pattern of having the same team build and operate systems is ideal for learning. If you can, carry a pager. Be accountable to your team and your customers that the stuff you build works. Reality cannot be fooled.

Over the years at AWS we've developed some great mechanisms for being accountable. The wheel is one great example, and the COE process (similar to what the rest of the industry calls blameless postmortems) is another. Dan Luu's list of postmortems has a lot of lessons from around the industry. I've always enjoyed these processes, because they expose the weaknesses of systems, and provide a path to fixing them. Sometimes it can feel unforgiving, but the blameless part works well. Some COEs contain as many great distributed systems lessons as the best research papers.

Research has different mechanisms. The goal (over a longer time horizon) is the same: good ideas and systems survive, and bad ideas and systems fall away. People build on the good ones with more good ideas, and the whole field moves forward. Being an owner is important.

Another tool I like for learning is the what-if COE or premortem. These are COEs for outages that haven't happened yet, but could happen. When building a new system, think about writing your first COE before it happens. What are the weaknesses in your system? How will it break? When replacing an older system with a new one, look at some of the older one's COEs. How would your new system perform in the same circumstances?

It takes time

This all takes time, both in the sense that you need to allocate hours of the day to it, and in the sense that you're not going to learn everything overnight. I've been doing this stuff for 15 years in one way or another, and still feel like I'm scratching the surface. Don't feel bad about others knowing things you don't. It's an opportunity, not a threat.

Pepijn de Vos (pepijndevos)

The only good open source software is for software developers April 03, 2019 12:00 AM

The rest is all inferior clones of commercial software.

When I think of really high-quality open source software, 90% of it is compilers, databases, and libraries. Tools for software developers, by software developers. There are exceptions (Firefox comes to mind), but as they say, the exception proves the rule.

Outside commercial projects that happen to be open source (Android comes to mind), open source software is largely driven by a “scratch your own itch” mentality. However, this poses a problem when software developers don’t have the itch, and people with the itch are not software developers.

I have recently begun to see the world from the perspective of academia and electrical engineering, and it came as a bit of a shock to me how many of the tools in common use are bloated commercial Windows GUI software, compared to the nimble open source command-line tools I was used to.

Many of them cost hundreds if not thousands of Euros, take up gigabytes of RAM and storage, are a pain to use, and are still the best or only option available. I can only imagine the horrors of working in a non-tech industry.

I don’t think there is an easy solution. If I’m solving a problem for someone else, I probably want to get paid. So it seems the only plausible model is commercial software that happens to be open source.

The other option is either teaching people with an itch to code, or making people who code have the itch. Broaden your interests, y’all!!! </rant>

April 02, 2019

Caius Durling (caius)

Cheaper Oil for Mini One R50 April 02, 2019 12:00 PM

My Mini One 2003 R50 1.6 litre petrol engine takes specific BMW 5w30 Longlife-04 oil. (I believe the R52 and R53 models take the same oil too.) The oil is 5w30 fully synthetic made to BMW's exacting standards.

The cheapest I've found to buy currently is a GM (Vauxhall/Opel) manufactured one, made to BMW's specifications. Searching for something like "dexos 2 5w30 gm" on ebay finds them at about £20 for 5 litres with free delivery in UK. (Comparatively, an equivalent from Castrol is about £50 at time of writing.)

Sounds like a small saving, but if your Mini is anything like mine it needs topping up once a month or so, and I do a full oil/filter change every 5k miles as an attempt at longevity. Soon adds up.

Andreas Zwinkau (qznc)

C++ State Machines April 02, 2019 12:00 AM

Avoid input parameters so state machine transitions are decoupled from transition effects.

Read full article!

April 01, 2019

Derek Jones (derek-jones)

MI5 agent caught selling Huawei exploits on Russian hacker forums April 01, 2019 01:11 AM

An MI5 agent has been caught selling exploits in Huawei products, on an underground Russian hacker forum (a paper analyzing the operation of these forums; perhaps the researchers were hired as advisors). How did this news become public? A reporter heard Mr Wang Kit, a senior Huawei manager, complaining about not receiving a percentage of the exploit sale, to add to his quarterly sales report. A fair point, given that Huawei are funding a UK centre to search for vulnerabilities.

The ostensible purpose of the Huawei cyber security evaluation centre (funded by Huawei, but run by GCHQ, the UK’s signals intelligence agency) is to allay UK fears that Huawei have added back-doors to their products, that enable the Chinese government to listen in on customer communications.

If this cyber centre finds a vulnerability in a Huawei product, they may or may not tell Huawei about it. Obviously, if it’s an exploitable vulnerability, and they think that Huawei don’t know about it, they could pass the exploit along to the relevant UK government department.

If the centre decides to tell Huawei about the vulnerability, there are two good reasons to first try selling it, to shady characters of interest to the security services:

  • having an exploit to sell gives the person selling it credibility (of the shady technical kind), in ecosystems the security services are trying to penetrate,
  • it increases Huawei’s perception of the quality of the centre’s work; by increasing the number of exploits found by the centre, before they appear in the wild (the centre has to be careful not to sell too many exploits; assuming they manage to find more than a few). Being seen in the wild adds credibility to claims the centre makes about the importance of an exploit it discovered.

How might the centre go about calculating whether to hang onto an exploit, for UK government use, or to reveal it?

The centre’s staff could be organized as two independent groups; an exploit found by both groups is more likely to be found by other hackers than an exploit found by just one group.
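The two-independent-groups idea is essentially capture-recapture estimation: the overlap between what the groups find also yields an estimate of the total number of exploits. A sketch using the Lincoln-Petersen estimator (my gloss on the suggestion, not from the article):

```python
# Lincoln-Petersen capture-recapture estimate: if group A finds n_a
# exploits, group B independently finds n_b, and m of those are found
# by both, the total population is estimated as n_a * n_b / m.
def lincoln_petersen(n_a, n_b, overlap):
    if overlap == 0:
        raise ValueError("no overlap: cannot estimate the total")
    return n_a * n_b / overlap

# e.g. groups find 30 and 24 exploits, 12 in common: roughly 60 in total
estimated_total = lincoln_petersen(30, 24, 12)
```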

Perhaps GCHQ knows of other groups looking for Huawei exploits (e.g., the NSA in the US). Sharing information about exploits found, provides the information needed to more accurately estimate the likelihood of others discovering known exploits.

How might Huawei estimate the number of exploits MI5 are ‘selling’, before officially reporting them? Huawei probably have enough information to make a good estimate of the total number of exploits likely to exist in their products, but they also need to know the likelihood of discovering an exploit, per man-hour of effort. If Huawei have an internal team searching for exploits, they might have the data needed to estimate exploit discovery rate.

Another approach would be for Huawei to add a few exploits to the code, and then wait to see if they are used by GCHQ. In fact, if GCHQ accuse Huawei of adding a back-door to enable the Chinese government to spy on people, Huawei could claim that the code was added to check whether GCHQ was faithfully reporting all the exploits it found, and not keeping some for its own use.

March 31, 2019

Ponylang (SeanTAllen)

Last Week in Pony - March 31, 2019 March 31, 2019 01:40 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Luke Picciau (user545)

Evaluating 8 Months of Building a Rails Vue SPA March 31, 2019 04:38 AM

Over the last ~8 months I have been building a website called PikaTrack using Rails for an api backend and VueJS frontend. I wanted to write a blog post detailing how it went, what the positives and negatives are, and whether I would do it again. The website is an open source service for fitness tracking using gps logs captured while running or cycling. I’ll start with my background.

March 30, 2019

Simon Zelazny (pzel)

Uncrashable languages aren't March 30, 2019 11:00 PM

A trivial observation, with some examples

Making buggy situations unrepresentable

Programs have bugs. Both creators and end-users of software dislike bugs. Businesses paying for software development dislike bugs. It's no wonder that, as the role of software in the world expands, we've become very interested in minimizing occurrences of bugs.

One way of reducing bugs is via process: making sure that critical code is tested to the greatest practical extent. Another way is via construction: making sure that buggy code is not representable. This could be achieved by making such code inexpressible in the syntax of a language, or having it fail the compiler's type check.

There are two new programming languages that take a principled stance on the side of non-representability, by preventing code from crashing wantonly: Elm and Pony.

Elm does this by eliminating exceptions from language semantics and forcing the programmer to handle all branches of sum types (i.e., your code has to cover all representable states that it might encounter).

Pony does this by allowing anonymous exceptions (the error operator), but forcing the programmer to deal with them at some point. All functions – apart from those which are explicitly marked as capable of throwing errors – MUST be total and always return a value.

A small aside about division by zero

Elm used to crash when you tried to divide by zero. Now (I tried version 0.19), it returns 0 for integer division and Infinity for floating-point division. The division functions are therefore total.

> 5.0 / 0.0
Infinity : Float
> 5 // 0
0 : Int
> remainderBy 5 0
0 : Int
> modBy 5 0
0 : Int

Pony also returns zero when zero is the divisor.

actor Main                                        //  This code prints:
  new create(env: Env) =>                         //  0
    env.out.print((U32(1) / U32(0)).string())

However, Pony also provides partial arithmetic operators (/? for division, +? for addition, below), for when you explicitly need integer over/underflows and division-by-zero to be illegal:

actor Main                                         //  This code prints:
  new create(env: Env) =>                          //  div by zero
    try U32(1) /? U32(0)                           //  overflow
    else env.out.print("div by zero")              //  0
    end                                            //  0
    try U8(255) +? U8(1)
    else env.out.print("overflow")
    end
    env.out.print((U32(1) / U32(0)).string())
    env.out.print((U8(255) + U8(1)).string())

While returning '0' for division-by-zero is a controversial topic (yet silent integer overflows somehow don't generate the same debate), I think it's reasonable to view this compromise as the necessary cost of eliminating crashes in our code. More interesting is this: we have just made a tradeoff between eliminating crashes and wrong results. Having a total division function eliminates crashes at the cost of allowing wrong results to propagate. Let's dig into this a bit more.

Bugs and the bottom type

Taxonomy is supposed to be the lowest form of science, but let's indulge and distinguish two main types of program misbehavior:

1) A program (or function) produces output which does not match the programmer's intent, design, or specification;

2) A program (or function) fails to produce output (e.g. freezes or crashes)

I hope you'll agree that eliminating 'bugs' caused by the first type of error is not an easy feat, and probably not within the scope of a language runtime or compiler. Carefully designing your data structures to make illegal states unrepresentable may go a long way towards eliminating failures of this kind, as will a good testing regimen. Let's not delve deeper into this category and focus on the second one: functions that crash and never return.

The Wikipedia article on the Bottom Type makes for an interesting read. It's nice to conceive of ⊥ as a hole in the program, where execution stops and meaning collapses. Since the bottom type is a subtype of every type, theoretically any function can return the 'bottom value' — although returning the 'bottom value' actually means never returning at all.

My claim is that while some languages, like Haskell or Rust, might explicitly embrace the existence of ⊥, languages that prevent programmers from 'invoking' the bottom type will always contain inconsistencies (I'm leaving dependently-typed languages out of this). Below are two examples.

Broken promises and infinite loops

Elm's promise is that an application written in Elm will never crash, unless there is a bug in the Elm runtime. There are articles out there that enumerate the various broken edge-cases (regexp, arrays, json decoding), but these cases can arguably be construed as bugs in the runtime or mistakes in library API design. That is, these bugs do not mean that Elm's promise is for naught.

However, if you think about it, an infinite loop is a manifestation of the bottom type just as much as an outright crash, and such a loop is possible in all Turing-complete languages.

Here's a legal Elm app that freezes:

import Browser
import Html exposing (Html, button, div, text)
import Html.Events exposing (onClick)

main =
  Browser.sandbox { init = 0, update = update, view = view }

type Msg = Increment | Decrement

add : Int -> Int
add n =
  add (n+1)

update msg model =
  case msg of
    Increment ->
      add model

    Decrement ->
      model - 1

view model =
  div []
    [ button [ onClick Decrement ] [ text "-" ]
    , div [] [ text (String.fromInt model) ]
    , button [ onClick Increment ] [ text "+" ]
    ]

What will happen when you click the + button in the browser? This is what:

Screenshot of browser message saying: 'A web page is slowing down your browser.'

The loop is hidden in the add function, which never actually returns an Int. Its true return type, in this program, is precisely ⊥. Without explicitly crashing-and-stopping, we've achieved the logical (and type-systematic) equivalent: a freeze.

Galloping forever and ever

The Pony language is susceptible to a similar trick, but we'll have to be a bit more crafty. First of all, Pony does indeed allow the programmer to 'invoke' the Bottom Type, by simply using the keyword error anywhere in a function body. Using this keyword (or calling a partial function) means that we, the programmer, now have a choice to make:

1) Use try/else to handle the possibility of error, and return a sensible default value

2) Mark this function as partial (?), and force callers to deal with the possibility of the Bottom Type rearing its head.

However, we can craft a function that spins endlessly, never exiting, and thus 'returning' the Bottom Type, without the compiler complaining, and without having to annotate it as partial.

Interestingly enough, naïve approaches are optimized away by the compiler, producing surprising result values instead of spinning forever:

actor Main
  new create(env: Env) =>
    let x = spin(false)

  fun spin(n: Bool): Bool =>
    spin(not n)

Before you run this program, think about what, if anything, it should output. Then, run it and see. Seems like magic to me, but I'm guessing this is LLVM detecting the oscillation and producing a 'sensible' value.

We can outsmart the optimizer by farming out the loop to another object:

actor Main
  new create(env: Env) =>
    let t: TarpitTrap = TarpitTrap
    let result = t.spin(true)

class TarpitTrap
  fun spin(n: Bool): Bool =>
    if n then spin(not n)
    else spin(n)

Now, this program properly freezes forever, as intended. Of course this is just a contrived demonstration, but one can imagine an analogous situation happening at run-time, for example when parsing tricky (or malicious) input data.

The snake in the garden

While I enjoy working both in Elm and Pony, I'm not a particular fan of these languages' hard-line stance on making sure programs never crash. As long as infinite loops are expressible in the language, the Bottom Type cannot be excised.

Even without concerns somewhat external to our programming language runtime, such as memory constraints, FFIs, syscalls, or the proverbial admin pulling the plug on our machine (did this really used to happen?), the humble infinite loop ensures that non-termination can never be purged from our (non-dependently-typed) program.

Instead of focusing on preventing crashes in the small, I think we, as programmers, should embrace failure and look at how to deal with error from a higher-level perspective, looking at processes, machines, and entire systems. Erlang and OTP got this right so many years ago. Ensuring the proper operation of a system despite failure is a much more practical goal than vainly trying to expel the infinitely-looping snake from our software garden.
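The Erlang/OTP approach can be caricatured in a few lines: let the worker crash, and have a supervisor restart it, giving up only after repeated failures. A sketch of the shape of the idea (plain Python, nothing like real OTP supervision trees):

```python
# A caricature of a supervisor: run the worker, and if it crashes,
# restart it up to a limit instead of trying to prevent every crash.
def supervise(make_worker, max_restarts):
    restarts = 0
    while True:
        try:
            return make_worker()()
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # escalate: let a higher-level supervisor decide

attempts = []

def worker():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("worker crashed")  # first two runs fail
    return "done"

result = supervise(lambda: worker, max_restarts=5)
```

The point is where the error handling lives: not inside every expression, but at the level of whole processes and systems.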

Derek Jones (derek-jones)

The 2019 Huawei cyber security evaluation report March 30, 2019 02:23 PM

The UK’s Huawei cyber security evaluation centre oversight board has released its 2019 annual report.

The header and footer of every page contains the text “SECRET”“OFFICIAL”, which I assume is its UK government security classification. It lends an air of mystique to what is otherwise a meandering management report.

Needless to say, the report contains the usual puffery, e.g., “HCSEC continues to have world-class security researchers…”. World class at what? I hear they have some really good mathematicians, but have serious problems attracting good software engineers (such people can be paid a lot more, and get to do more interesting work, in industry; the industry demand for mathematicians, outside of finance, is weak).

The most interesting sentence appears on page 11: “The general requirement is that all staff must have Developed Vetting (DV) security clearance, …”. Developed Vetting is the most detailed and comprehensive form of security clearance in UK government (to quote Wikipedia).

Why do the centre’s staff have to have this level of security clearance?

The Huawei source code is not that secret (it can probably be found online, lurking in the dark corners of various security bulletin boards).

Is the real purpose of this cyber security evaluation centre, to find vulnerabilities in the source code of Huawei products, that GCHQ can then use to spy on people?

Or perhaps, this centre is used for training purposes, with staff moving on to work within GCHQ, after they have learned their trade on Huawei products?

The high level of security clearance applied to the centre’s work is the perfect smoke-screen.

The report claims to have found “Several hundred vulnerabilities and issues…”; a meaningless statement, e.g., this could mean one minor vulnerability and several hundred spelling mistakes. There is no comparison of the number of vulnerabilities found per effort invested, no comparison with previous years, no classification of the seriousness of the problems found, no mention of Huawei’s response (i.e., did Huawei agree that there was a problem).

How many vulnerabilities did the centre find that were reported by other people, e.g., the National Vulnerability Database? This information would give some indication of how good a job the centre was doing. Did this evaluation centre find the Huawei vulnerability recently disclosed by Microsoft? If not, why not? And if they did, why isn’t it in the 2019 report?

What about comparing the number of vulnerabilities found in Huawei products against the number found in vendors from the US, e.g., CISCO? Obviously back-doors placed in US products, at the behest of the NSA, need not be counted.

There is some technical material, starting on page 15. The configuration and component lifecycle management issues raised sound like good points, from a cyber security perspective. From a commercial perspective, Huawei want to respond quickly to customer demand and a dynamic market; corners on good practices are likely to be cut every now and again. I don’t understand why the use of an unnamed real-time operating system was flagged: did some techie gripe slip through management review? What is a C preprocessor macro definition doing on page 29? This smacks of an attempt to gain some hacker street-cred.

Reading between the lines, I get the feeling that Huawei has been ignoring the centre’s recommendations for changes to their software development practices. If I were on the receiving end, I would probably ignore them too. People employed to do security evaluation are hired for their ability to find problems, not for their ability to make things that work; also, I imagine many are recent graduates, with little or no practical experience, who are just repeating what they remember from their course work.

Huawei should leverage its funding of a GCHQ spy training centre to get some positive publicity from the UK government. Huawei wants people to feel confident that they are not being spied on, when they use Huawei products. If the government refuses to play ball, Huawei should shift its funding to a non-government, open evaluation centre. Employees would not need any security clearance and would be free to give their opinions about the presence of vulnerabilities and ‘spying code’ in the source code of Huawei products.

March 29, 2019

Wesley Moore (wezm)

Cross Compiling Rust for FreeBSD With Docker March 29, 2019 11:02 PM

For a little side project I'm working on I want to be able to produce pre-compiled binaries for a variety of platforms, including FreeBSD. With a bit of trial and error I have been able to successfully build working FreeBSD binaries from a Docker container, without using (slow) emulation/virtual machines. This post describes how it works and how to add it to your own Rust project.

Update 27 March 2019: Stephan Jaekel pointed out on Twitter that cross supports a variety of OSes including FreeBSD, NetBSD, Solaris, and more. I have used cross for embedded projects but didn't think to use it for non-embedded ones. Nonetheless the process described in this post was still educational for me but I would recommend using cross instead.

I started with Sandvine's freebsd-cross-build repo, which builds a Docker image with a cross-compiler that targets FreeBSD. I made a few updates and improvements to it:

  • Update from FreeBSD 9 to 12.
  • Base on newer debian9-slim image instead of ubuntu 16.04.
  • Use a multi-stage Docker build.
  • Do all fetching of tarballs inside the container to remove the need to run a script on the host.
  • Use the FreeBSD base tarball as the source of headers and libraries instead of ISO.
  • Revise the fix-links script to automatically discover symlinks that need fixing.

Once I was able to successfully build the cross-compilation toolchain I built a second Docker image based on the first that installs Rust, and the x86_64-unknown-freebsd target. It also sets up a non-privileged user account for building a Rust project bind mounted into it.

Check out the repo at:

Building the Images

I haven't pushed the image to a container registry as I want to do further testing and need to work out how to version them sensibly. For now you'll need to build them yourself as follows:

  1. git clone && cd freebsd-cross-build
  2. docker build -t freebsd-cross .
  3. docker build -f Dockerfile.rust -t freebsd-cross-rust .

Using the Images to Build a FreeBSD Binary

To use the freebsd-cross-rust image in a Rust project here's what you need to do (or at least this is how I'm doing it):

In your project add a .cargo/config file for the x86_64-unknown-freebsd target. This tells cargo what tool to use as the linker.

[target.x86_64-unknown-freebsd]
linker = "x86_64-pc-freebsd12-gcc"

I use Docker volumes to cache the output of previous builds and the cargo registry. This prevents cargo from re-downloading the cargo index and dependent crates on each build and saves build artifacts across builds, speeding up compile times.

A challenge this introduces is how to get the resulting binary out of the volume. For this I use a separate docker invocation that copies the binary out of the volume into a bind mounted host directory.

Originally I tried mounting the whole target directory into the container but this resulted in spurious compilation failures during linking and lots of files owned by root (I'm aware of user namespaces but haven't set it up yet).

I wrote a shell script to automate this process:


set -e

mkdir -p target/x86_64-unknown-freebsd

# NOTE: Assumes the following volumes have been created:
# - lobsters-freebsd-target
# - lobsters-freebsd-cargo-registry

# Build
sudo docker run --rm -it \
  -v "$(pwd)":/home/rust/code:ro \
  -v lobsters-freebsd-target:/home/rust/code/target \
  -v lobsters-freebsd-cargo-registry:/home/rust/.cargo/registry \
  freebsd-cross-rust build --release --target x86_64-unknown-freebsd

# Copy binary out of volume into target/x86_64-unknown-freebsd
sudo docker run --rm -it \
  -v "$(pwd)"/target/x86_64-unknown-freebsd:/home/rust/output \
  -v lobsters-freebsd-target:/home/rust/code/target \
  --entrypoint cp \
  freebsd-cross-rust \
  /home/rust/code/target/x86_64-unknown-freebsd/release/lobsters /home/rust/output

This is what the script does:

  1. Ensures that the destination directory for the binary exists. Without this, docker will create it but it'll be owned by root and the container won't be able to write to it.
  2. Runs cargo build --release --target x86_64-unknown-freebsd (the leading cargo is implied by the ENTRYPOINT of the image).
    1. The first volume (-v) argument bind mounts the source code into the container, read-only.
    2. The second -v maps the named volume lobsters-freebsd-target into the container. This caches the build artifacts.
    3. The last -v maps the named volume lobsters-freebsd-cargo-registry into the container. This caches the cargo index and downloaded crates.
  3. Copies the built binary out of the lobsters-freebsd-target volume into the local filesystem at target/x86_64-unknown-freebsd.
    1. The first -v bind mounts the local target/x86_64-unknown-freebsd directory into the container at /home/rust/output.
    2. The second -v mounts the lobsters-freebsd-target named volume into the container at /home/rust/code/target.
    3. The docker run invocation overrides the default ENTRYPOINT with cp and supplies the source and destination to it, copying from the volume into the bind mounted host directory.

After running the script there is a FreeBSD binary in target/x86_64-unknown-freebsd. Copying it to a FreeBSD machine for testing shows that it does in fact work as expected!

One last note: this all works because I don't depend on any C libraries in my project. If I did, it would be necessary to cross-compile them so that the linker could link them when needed.

Once again, the code is at:

Previous Post: My First 3 Weeks of Professional Rust

Gokberk Yaltirakli (gkbrk)

Phone Location Logger March 29, 2019 10:22 AM

If you are using Google Play Services on your Android phone, Google receives and keeps track of your location history. This includes your GPS coordinates and timestamps. Because of the privacy implications, I have revoked pretty much all permissions from Google Play Services and disabled my Location History on my Google settings (as if they would respect that).

But while it might be creepy if a random company has this data, it would be useful if I still had it. After all, who doesn’t want to know the location of a park that they stumbled upon randomly on a vacation 3 years ago?

I remember seeing some location trackers while browsing through F-Droid. I found various applications there, and picked one that was recently updated. The app was a Nextcloud companion app, with support for custom servers. Since I didn’t want a heavy Nextcloud install just to keep track of my location, I decided to go with the custom server approach.

In the end, I decided that the easiest path is to make a small CGI script in Python that appends JSON encoded lines to a text file. Because of this accessible data format, I can process this file in pretty much every programming language, import it to whatever database I want and query it in whatever way I see fit.

The app I went with is called PhoneTrack. You can find the APK and source code links on F-Droid. It replaces placeholder parameters in the URL, so a URL that logs every parameter looks like this: ?acc=%ACC&alt=%ALT&batt=%BATT&dir=%DIR&lat=%LAT&lon=%LON&sat=%SAT&spd=%SPD&timestamp=%TIMESTAMP

Here’s the script in all its glory.

import cgi
import json

PATH = '/home/databases/location.txt'

print('Content-Type: text/plain\n')
form = cgi.FieldStorage()

# Check authentication token
if form.getvalue('token') != 'SECRET_VALUE':
    raise Exception('Nope')

obj = {
    'accuracy':   form.getvalue('acc'),
    'altitude':   form.getvalue('alt'),
    'battery':    form.getvalue('batt'),
    'bearing':    form.getvalue('dir'),
    'latitude':   form.getvalue('lat'),
    'longitude':  form.getvalue('lon'),
    'satellites': form.getvalue('sat'),
    'speed':      form.getvalue('spd'),
    'timestamp':  form.getvalue('timestamp'),
}

with open(PATH, 'a+') as log:
    line = json.dumps(obj)
    log.write(line + '\n')
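
Since the log is just JSON-encoded lines, querying it later is trivial. A small sketch (function names mine; field names match the script above, and PhoneTrack submits every value as a string):

```python
import json

def read_locations(path):
    """Parse the JSON-lines location log into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def low_battery_points(points, threshold=20.0):
    """Example query: points logged while the battery was below threshold percent."""
    return [p for p in points
            if p.get('battery') is not None and float(p['battery']) < threshold]
```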

March 28, 2019

Derek Jones (derek-jones)

Using Black-Scholes in software engineering gives a rough lower bound March 28, 2019 04:22 PM

In the financial world, a call option is a contract that gives the buyer the option (but not the obligation) to purchase an asset, at an agreed price, on an agreed date (from the other party to the contract).

If I think that the price of jelly beans is going to increase, and you disagree, then I might pay you a small amount of money for the right to buy a jar of jelly beans from you, in a month’s time, at today’s price. A month from now, if the price of jelly beans has gone down, I buy a jar from whoever at the lower price, but if the price has gone up, you have to sell me a jar at the previously agreed price.

I’m in the money if the price of jelly beans goes up, you are in the money if the price goes down (I paid you a premium for the right to purchase at what is known as the strike price).

Do you see any parallels with software development here?

Let’s say I have to rush to complete the implementation of some functionality by the end of the week. I might decide to forego complete testing, or following company coding practices, just to get the code out. At a later date I can decide to pay the time needed to correct my short-cuts; it is possible that the functionality is not used, so the rework is not needed.

This sounds like a call option (you might have thought of technical debt, which is, technically, the incorrect common usage term). I am both the buyer and seller of the contract. As the seller of the call option I received the premium of saved time, and the buyer pays a premium via the potential for things going wrong. Sometime later the seller might pay the price of sorting out the code.

A put option involves the right to sell (rather than buy).

In the financial world, speculators are interested in the optimal pricing of options, i.e., what should the premium, strike price and expiry date be for an asset having a given price volatility?

The Black-Scholes equation answers this question (and won its creators a Nobel prize).
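For reference, the closed-form Black-Scholes price for a European call is short enough to sketch in a few lines of Python (function and variable names are mine):

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, t):
    """Black-Scholes price of a European call.

    spot: current asset price, strike: agreed purchase price,
    rate: risk-free interest rate, vol: volatility, t: years to expiry.
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)

# A call on a $100 asset, struck at $100, 5% rate, 20% volatility, one year out:
print(round(black_scholes_call(100, 100, 0.05, 0.2, 1.0), 2))  # 10.45
```

Note the output is never below max(spot - strike*exp(-rate*t), 0), which is why it can serve as a rough lower bound on value.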

Over the years, various people have noticed similarities between financial options thinking, and various software development activities. In fact people have noticed these similarities in a wide range of engineering activities, not just computing.

The term real options is used for options thinking outside of the financial world. The difference in terminology is important, because financial and engineering assets can have very different characteristics, e.g., financial assets are traded, while many engineering assets are sunk costs (such as drilling a hole in the ground).

I have been regularly encountering uses of the Black-Scholes equation, in my trawl through papers on the economics of software engineering (in some cases a whole PhD thesis). In most cases, the authors have clearly failed to appreciate that certain preconditions need to be met, before the Black-Scholes equation can be applied.

I now treat use of the Black-Scholes equation, in a software engineering paper, as reasonable cause for instant deletion of the pdf.

If you meet somebody talking about the use of Black-Scholes in software engineering, what questions should you ask them to find out whether they are just spouting techno-babble?

  • American options are a better fit for software engineering problems; why are you using Black-Scholes? An American option allows the option to be exercised at any time up to the expiry date, while a European option can only be exercised on the expiry date. The Black-Scholes equation is a solution for European options (no closed-form solution for American options is known). A sensible answer is that use of Black-Scholes provides a rough estimate of the lower bound of the asset value. If they don’t know the difference between American/European options, well…
  • Partially written source code is not a tradable asset; why are you using Black-Scholes? An assumption made in the derivation of the Black-Scholes equation is that the underlying assets are freely tradable, i.e., people can buy/sell them at will. Creating source code is a sunk cost, who would want to buy code that is not working? A sensible answer may be that use of Black-Scholes provides a rough estimate of the lower bound of the asset value (you can debate this point). If they don’t know about the tradable asset requirement, well…
  • How did you estimate the risk adjusted discount rate? Options involve balancing risks and getting values out of the Black-Scholes equation requires plugging in values for risk. Possible answers might include the terms replicating portfolio and marketed asset disclaimer (MAD). If they don’t know about risk adjusted discount rates, well…

If you want to learn more about real options: “Investment under uncertainty” by Dixit and Pindyck, is a great read if you understand differential equations, while “Real options” by Copeland and Antikarov contains plenty of hand holding (and you don’t need to know about differential equations).

Andreas Zwinkau (qznc)

TipiWiki (2003) March 28, 2019 12:00 AM

More than 15 years ago I published a little wiki software.

Read full article!

March 25, 2019

Wesley Moore (wezm)

My First 3 Weeks of Professional Rust March 25, 2019 06:00 AM

For the last 15 years as a professional programmer I have worked mostly with dynamic languages. First Perl, then Python, and for the last 10 years or so, Ruby. I've also been writing Rust on the side for personal projects for nearly four years. Recently I started a new job and for the first time I'm writing Rust professionally. Rust represents quite a shift in language features, development process and tooling. I thought it would be interesting to reflect on that experience so far.

Note that some of my observations are not unique to Rust and would be equally present in other languages like Haskell, Kotlin, or OCaml.


In my first week I hit up pretty hard against my knowledge of lifetimes in Rust. I was reasonably confident with them conceptually and their simple application but our code has some interesting type driven zero-copy parsing code that tested my knowledge. When encountering some compiler errors I was fortunate to have experienced colleagues to ask for help. It's been nice to extend my knowledge and learn as I go.

Interestingly I had mostly been building things without advanced lifetime knowledge up until this point. I think that sometimes the community puts too much emphasis on some of Rust's more advanced features when citing its learning curve. If you read the book you can get a very long way. Although that will depend on the types of applications or data structures you're trying to build.


In my second week I implemented a change to make a certain pattern more ergonomic. It was refreshing to be able to build the initial functionality and then make a project-wide change, confident that given it compiled after the change I probably hadn't broken anything. I don't think I would have had the confidence to make such a change as early on in the Ruby projects I've worked on previously.


I cringe whenever I see proponents of statically typed languages say things like, "if it compiles, it works", with misguided certainty. The compiler and language do eliminate whole classes of bugs that you'd need to test for in a dynamic language but that doesn't mean tests aren't needed.

Rust has great built in support for testing and I've enjoyed being able to write tests focussed solely on the behaviour and logic of my code, compared to Ruby where I have to write tests ensuring there are no syntax errors, nil is handled safely, and arguments are correct, in addition to the behaviour and logic.

Editor and Tooling

Neovim is my primary text editor. I've been using vim or a derivative since the early 2000s. I have the RLS set up and working in my Neovim environment but less than a week in I started using IntelliJ IDEA with the Rust and Vim emulation plugins for work. A week after that I started trialling CLion as I wanted a debugger.

JetBrains CLion IDE

The impetus for the switch was that I was working with a colleague on a change that had a fairly wide impact on the code. We were practicing compiler driven development and were doing a repeated cycle of fix an error, compile, jump to next top most error. Vim's quickfix list + :make is designed to make this cycle easier too but I didn't have that set up at the time. I was doing a lot of manual jumping between files, whereas in IntelliJ I could just click the paths in the error messages.

It's perhaps the combination of working on a foreign codebase and also trying to maximise efficiency when working with others that pushed me to seek out better tooling for work use. There is ongoing work to improve the RLS, so I may still come back to Neovim, and I continue to use it for personal projects.

Other CLion features that I'm enjoying:

  • Reliable autocomplete
  • Reliable jump to definition, jump to impl block, find usages
  • Refactoring tooling (rename across project, extract method, extract variable)
  • Built in debugger

VS Code offers some of these features too. However, since they are built on the RLS they suffer many of the same issues I had in Neovim. Additionally I think the Vim emulation plugin for IntelliJ is more complete, or at least more predictable for a long time vim user. This is despite the VS Code plugin actually using Neovim under the covers.


In Ruby with a gem like pry-byebug it's trivial to put a binding.pry in some code to be dropped into a debugger + REPL at that point in the code. This is harder with Rust. println! or dbg! based debugging can get you a surprisingly long way and has served me well for most of my personal projects.

When building some parsing code I quickly felt the need to use a real debugger in order to step through and examine execution of a failing test. It's possible to do this on the command line with the rust-gdb or rust-lldb wrappers that come with Rust. However, I find them fiddly to use and verbose to operate.

CLion makes it simple to add and remove break points by clicking in the gutter, run a single test under the debugger, visually step through the code, see all local variables, step up and down the call stack, etc. These are possible with the command line tools (which CLion is using behind the scenes), but it's nice to have them built in and available with a single click of the mouse.


So far I am enjoying my new role. There have been some great learning opportunities and surprising tooling changes. I'm also keen to keep an eye on the frequency of bugs encountered in production, their type (such as panic or incorrect logic), their source, and ease of resolution. I look forward to writing more about our work in the future.

Discuss on Lobsters

Previous Post: A Coding Retreat and Getting Embedded Rust Running on a SensorTag
Next Post: Cross Compiling Rust for FreeBSD With Docker

Pete Corey (petecorey)

Bending Jest to Our Will: Restoring Node's Require Behavior March 25, 2019 12:00 AM

Jest does some interesting things to Node’s default require behavior. In an attempt to encourage test independence and concurrent test execution, Jest resets the module cache after every test.

You may remember one of my previous articles about “bending Jest to our will” and caching instances of modules across multiple tests. While that solution works for single modules on a case-by-case basis, sometimes that’s not quite enough. Sometimes we just want to completely restore Node’s original require behavior across the board.

After sleuthing through support tickets, blog posts, and “official statements” from Jest core developers, this seems to be entirely unsupported and largely impossible.

However, with some highly motivated hacking I’ve managed to find a way.

Our Goal

If you’re unfamiliar with how require works under the hood, here’s a quick rundown. The first time a module is required, its contents are executed and the resulting exported data is cached. Any subsequent require calls of the same module return a reference to that cached data.

That’s all there is to it.

Jest overrides this behavior and maintains its own “module registry” which is blown away after every test. If one test requires a module, the module’s contents are executed and cached. If that same test requires the same module, the cached result will be returned, as we’d expect. However, other tests don’t have access to our first test’s module registry. If another test tries to require that same module, it’ll have to execute the module’s contents and store the result in its own private module registry.

Our goal is to find a way to reverse Jest’s monkey-patching of Node’s default require behavior and restore its original behavior.

This change, or reversal of a change, will have some unavoidable consequences. Our Jest test suite won’t be able to support concurrent test processes. This means that all our tests will have to run “in band” (--runInBand). More interestingly, Jest’s “watch mode” will no longer work, as it uses multiple processes to run tests and maintain a responsive command line interface.

Accepting these limitations and acknowledging that this is likely a very bad idea, let’s press on.

Dependency Hacking

After several long code reading and debugging sessions, I realized that the heart of the problem resides in Jest’s jest-runtime module. Specifically, the requireModuleOrMock function, which is responsible for Jest’s out-of-the-box require behavior. Jest internally calls this method whenever a module is required by a test or by any code under test.

Short circuiting this method with a quick and dirty require causes the require statements throughout our test suites and our code under test to behave exactly as we’d expect:

  requireModuleOrMock(from: Path, moduleName: string) {
+   return require(this._resolveModule(from, moduleName));
    try {
      if (this._shouldMock(from, moduleName)) {
        return this.requireMock(from, moduleName);
      } else {
        return this.requireModule(from, moduleName);
      }
    } catch (e) {
      if (e.code === 'MODULE_NOT_FOUND') {
        const appendedMessage = findSiblingsWithFileExtension(
          /* … */
        );

        if (appendedMessage) {
          e.message += appendedMessage;
        }
      }
      throw e;
    }
  }
Whenever Jest reaches for a module, we relieve it of the decision to use a cached module from its internally maintained moduleRegistry, and instead have it always return the result of requiring the module through Node’s standard mechanisms.

Patching Jest

Our fix works, but in an ideal world we wouldn’t have to fork jest-runtime just to make our change. Thankfully, the requireModuleOrMock function isn’t hidden within a closure or made inaccessible through other means. This means we’re free to monkey-patch it ourselves!

Let’s start by creating a test/globalSetup.js file in our project to hold our patch. Once created, we’ll add the following lines:

const jestRuntime = require('jest-runtime');

jestRuntime.prototype.requireModuleOrMock = function(from, moduleName) {
    return require(this._resolveModule(from, moduleName));
};

We’ll tell our Jest setup to use this config file by listing it in our jest.config.js file:

module.exports = {
    globalSetup: './test/globalSetup.js',
};

And that’s all there is to it! Jest will now execute our globalSetup.js file once, before all of our test suites, and restore the original behavior of require.

Being the future-minded developers that we are, it’s probably wise to document this small and easily overlooked bit of black magic:

/*
 * This requireModuleOrMock override is _very experimental_. It affects
 * how Jest works at a very low level and most likely breaks Jest-style
 * module mocks.
 *
 * The upside is that it lets us evaluate heavy modules once, rather
 * than once per test.
 */
jestRuntime.prototype.requireModuleOrMock = function(from, moduleName) {
    return require(this._resolveModule(from, moduleName));
};

If you find yourself with no other choice but to perform this incantation on your test suite, I wish you luck. You’re most likely going to need it.

March 24, 2019

Pages From The Fire (kghose)

Another short story, where unit tests save my butt March 24, 2019 10:04 PM

I like to refactor. A lot. As I work on a problem I understand it better and I want to reflect this in the code. I was nearing the end of a pretty serious refactor. The tests had been failing for the last 60 commits or so. I wasn’t worried, I was expecting this. But …

Ponylang (SeanTAllen)

Last Week in Pony - March 24, 2019 March 24, 2019 01:59 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Wesley Moore (wezm)

A Coding Retreat and Getting Embedded Rust Running on a SensorTag March 24, 2019 02:01 AM

This past long weekend some friends and I went on a coding retreat inspired by John Carmack doing similar in 2018. During the weekend I worked on adding support for the Texas Instruments SensorTag to the embedded Rust ecosystem. This post is a summary of the weekend and what I was able to achieve code-wise.

Back in March 2018 John Carmack posted about a week long coding retreat he went on to work on neural networks and OpenBSD. After reading the post I quoted it to some friends and commented:

I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work.

In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere

I kind of love the idea of a week long code retreat in a cabin somewhere.

One of my friends also liked the idea and actually made it happen! There was an initial attempt in June 2018 but life got in the way so it was postponed. At the start of the year he picked it up again and organised it for the Labour day long weekend, which just passed.

We rented an Airbnb in the Dandenong Ranges, 45 minutes from Melbourne. Six people attended, two of which were from interstate. The setting was cozy, quiet and picturesque. Our days involved coding and collaborating, shared meals, and a walk or two around the surrounds.

The view from our accommodation one morning.

I got inspired to set up some self-hosted home sensors and automation. I did some research and picked up two Texas Instruments SensorTags and a debugger add-on. The SensorTag uses a CC2650 microcontroller with an ARM Cortex-M3 core and has support for a number of low power wireless standards, such as Bluetooth, ZigBee, and 6LoWPAN. The CC2650 also has a low power 16-bit sensor controller that can be used to help achieve years-long battery life from a single CR2032 button cell. In addition to the microcontroller, the SensorTag adds a bunch of sensors, including: temperature, humidity, barometer, accelerometer, gyroscope, and light.

Two SensorTags, one with its rubberised case removed and debugger board attached.

My project for the weekend was to try to get some Rust code running on the SensorTag. Rust has good support out of the box for targeting ARM Cortex microcontrollers but there were no crates to make interacting with this particular chip or board easy, so I set about building some.

The first step was generating a basic crate to allow interacting with the chip without needing to wrap everything in an unsafe block and poke at random memory addresses. Fortunately svd2rust can automate this by converting System View Description XML files (SVD) into a Rust crate. Unfortunately TI don't publish SVD files for their devices. As luck would have it though, M-Labs have found that TI do publish XML descriptions in a format of their own called DSLite. They have written a tool, dslite2svd, that converts this to SVD, so you can then use svd2rust. It took a while to get dslite2svd working. I had to tweak it to handle differences in the files I was processing, but eventually I was able to generate a crate that compiled.

Now that I had an API for the chip I turned to working out how to program and debug the SensorTag with a very basic Rust program. I used the excellent embedded Rust Discovery guide as a basis for the configuration, tools, and process for getting code onto the SensorTag. Since this was a different chip from a different manufacturer it took a long time to work out which tools worked, how to configure them, what format binaries they wanted, create a linker script, etc. A lot of trial and error was performed, along with lots of searching online with less than perfect internet. However, by Sunday I could program the device, debug code, and verify that my very basic program, shown below, was running.

fn main() -> ! {
    let _y;
    let x = 42;
    _y = x;

    // infinite loop; just so we don't leave this stack frame
    loop {}
}

The combination that worked for programming was:

  • cargo build --target thumbv7m-none-eabi
  • Convert ELF to BIN using cargo objcopy, which is part of cargo-binutils: cargo objcopy --bin sensortag --target thumbv7m-none-eabi -- -O binary sensortag.bin
  • Program with UniFlash:
    • Choose CC2650F128 and XDS1100 on the first screen
    • Do a full erase the first time to reset CCFG, etc
    • Load image (select the .bin file produced above)

For debugging:

  • Build OpenOCD from git to get support for the chip and debugger (I used the existing AUR package)
  • Run OpenOCD: openocd -f jtag/openocd.cfg
  • Use GDB to debug: arm-none-eabi-gdb -x jtag/gdbinit -q target/thumbv7m-none-eabi/debug/sensortag
  • The usual mon reset halt in GDB upsets the debugger connection. I found that soft_reset_halt was able to reset the target (although it complains about being deprecated).

Note: Files in the jtag path above are in my sensortag repo. Trying to program through openocd failed with an error that the vEraseFlash command failed. I'd be curious to know if anyone has got this working as I'd very much like to ditch the huge 526.5 MiB UniFlash desktop-web-app dependency in my workflow.

Now that I could get code to run on the SensorTag I set about trying to use the generated chip support crate to flash one of the on board LEDs. I didn't succeed in getting this working by the time the retreat came to an end, but after I arrived home I was able to find the source of the hard faults I was encountering and get the LED blinking! The key was that I needed to power up the peripheral power domain and enable the GPIO clocks to be able to enable an output GPIO.

It works!

Below is the code that flashes the LED. It should be noted this code is operating with very little abstraction and is using register and field names that match the data sheet. Future work to implement the embedded-hal traits for this controller would make it less verbose and less cryptic.


#![no_std]
#![no_main]

#[allow(unused_extern_crates)] // NOTE(allow) bug rust-lang/rust#53964
extern crate panic_halt; // panic handler

// SensorTag is using RGZ package. VQFN (RGZ) | 48 pins, 7×7 QFN

use cc2650_hal as hal;
use cc2650f128;
use cortex_m_rt::entry;

use hal::{ddi, delay::Delay, prelude::*};

pub fn init() -> (Delay, cc2650f128::Peripherals) {
    let core_peripherals = cortex_m::Peripherals::take().unwrap();
    let device_peripherals = cc2650f128::Peripherals::take().unwrap();

    let clocks = ddi::CFGR {
        sysclk: Some(24_000_000),
    }
    .freeze();

    let delay = Delay::new(core_peripherals.SYST, clocks);

    // LEDs are connected to DIO10 and DIO15
    // Configure GPIO pins for output, maximum strength
    // (register paths below follow the CC2650 data sheet names)
    device_peripherals.IOC.iocfg10
        .modify(|_r, w| w.port_id().gpio().ie().clear_bit().iostr().max());
    device_peripherals.IOC.iocfg15
        .modify(|_r, w| w.port_id().gpio().ie().clear_bit().iostr().max());

    // Enable the PERIPH power domain and wait for it to be powered up
    device_peripherals.PRCM.pdctl0.modify(|_r, w| w.periph_on().set_bit());
    loop {
        if device_peripherals.PRCM.pdstat0.read().periph_on().bit_is_set() {
            break;
        }
    }

    // Enable the GPIO clock
    device_peripherals.PRCM.gpioclkgr.write(|w| w.clk_en().set_bit());

    // Load settings into CLKCTRL and wait for LOAD_DONE
    device_peripherals.PRCM.clkloadctl.modify(|_r, w| w.load().set_bit());
    loop {
        if device_peripherals.PRCM.clkloadctl.read().load_done().bit_is_set() {
            break;
        }
    }

    // Enable outputs
    device_peripherals.GPIO.doe31_0
        .modify(|_r, w| w.dio10().set_bit().dio15().set_bit());

    (delay, device_peripherals)
}

#[entry]
fn entry() -> ! {
    let (mut delay, periphs) = init();
    let half_period = 500_u16;

    loop {
        // Turn LED on and wait half a second
        periphs.GPIO.dout11_8.modify(|_r, w| w.dio10().set_bit());
        delay.delay_ms(half_period);

        // Turn LED off and wait half a second
        periphs.GPIO.dout11_8.modify(|_r, w| w.dio10().clear_bit());
        delay.delay_ms(half_period);
    }
}
The rest of the code is up on Sourcehut. It's all in a pretty rough state at the moment. I plan to tidy it up over the coming weeks and eventually publish the crates. If you're curious to see it now though, the repos are:

  • cc2650f128 Documentation -- chip support crate generated by dslite2svd and svd2rust.
  • cc26x0-hal (see wip branch, currently very rough).
  • sensortag -- LED flashing code. I hope to turn this into a board support crate eventually.

Overall the coding retreat was a great success and we hope to do another one next year.

Previous Post: Rebuilding My Personal Infrastructure With Alpine Linux and Docker
Next Post: My First 3 Weeks of Professional Rust

March 22, 2019

Ponylang (SeanTAllen)

0.28.0 Released March 22, 2019 08:06 PM

Pony 0.28.0 is a high-priority release. We advise updating as soon as possible.

In addition to a high-priority bug fix, there are “breaking changes” if you build Pony from source. We’ve also dropped support for some Debian and Ubuntu versions. Read on for further details.

March 21, 2019

Derek Jones (derek-jones)

Describing software engineering in terms of a traditional science March 21, 2019 04:33 PM

If you were asked to describe the ‘building stuff’ side of software engineering, by comparing it with one of the traditional sciences, which science would you choose?

I think a lot of people would want to compare it with Physics. Yes, physics envy is not restricted to the softer sciences of humanities and liberal arts. Unlike physics, software engineering is not governed by a handful of simple ‘laws’, it’s a messy collection of stuff.

I used to think that biology had all the necessary important characteristics needed to explain software engineering: evolution (of code and products), species (e.g., of editors), lifespan, and creatures are built from a small set of components (i.e., DNA or language constructs).

Now I’m beginning to think that chemistry has aspects that are a better fit for some important characteristics of software engineering. Chemists can combine atoms of their choosing to create whatever molecule takes their fancy (subject to bonding constraints, a kind of syntax and semantics for chemistry), and the continuing existence of a molecule does not depend on anything outside of itself; biological creatures need to be able to extract some form of nutrient from the environment in which they live (which is also a requirement of commercial software products, but not non-commercial ones). Individuals can create molecules, but creating new creatures (apart from human babies) is still a ways off.

In chemistry and software engineering, it’s all about emergent behaviors (in biology, behavior is just too complicated to reliably say much about). In theory the properties of a molecule can be calculated from the known behavior of its constituent components (e.g., the electrons, protons and neutrons), but the equations are so complicated it’s impractical to do so (apart from the most simple of molecules; new properties of water, two atoms of hydrogen and one of oxygen, are still being discovered); the properties of programs could be deduced from the behavior of their statements, but in practice it’s impractical.

What about the creative aspects of software engineering you ask? Again, chemistry is a much better fit than biology.

What about the craft aspect of software engineering? Again chemistry, or rather, alchemy.

Is there any characteristic that physics shares with software engineering? One that stands out is the ego of some of those involved. Describing, or creating, the universe nourishes large egos.

Stig Brautaset (stig)

Bose QuietComfort 35 Review March 21, 2019 02:39 PM

I review the noise-cancelling headphones I've been using for about 3 years.

March 19, 2019

Simon Zelazny (pzel)

How to grab all hosts but the first, in Ansible March 19, 2019 11:00 PM

Today I was trying to figure out how to run a particular ansible play on one host out of a group, and another play on all the other hosts.

The answer was found in a mailing list post from 2014, but in case that service goes down, here's my note-to-self on how to do it.

Let's say you have a group of hosts called stateful_cluster_hosts in your inventory. You'd like to upload one start script from files/ to the first host, and a different one to all the others.

The play for the leader host would look like this:

- hosts: stateful_cluster_hosts[0]
  tasks:
    - name: "Upload leader start script"
      copy:
        src: files/
        dest: /srv/   # hypothetical destination; the original omitted it
        mode: "u=rwx,g=rx,o=rx"

The play for the follower hosts would look like this:

- hosts: stateful_cluster_hosts:!stateful_cluster_hosts[0]
  tasks:
    - name: "Upload follower start script"
      copy:
        src: files/
        dest: /srv/   # hypothetical destination; the original omitted it
        mode: "u=rwx,g=rx,o=rx"

Where the syntax list:!list[idx] means take list, but filter out list[idx].
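Ansible resolves host patterns with set operations: `:` unions groups and `:!` subtracts hosts. A rough Python sketch of that resolution logic (a hypothetical helper for illustration, not Ansible's actual implementation):

```python
def resolve_pattern(inventory, pattern):
    """Resolve a simplified Ansible host pattern against named groups.

    Supports 'group', 'group[idx]', and ':!' exclusion, preserving
    the order in which hosts are first selected.
    """
    selected = []
    excluded = set()
    for part in pattern.split(":"):
        negate = part.startswith("!")
        if negate:
            part = part[1:]
        if "[" in part:
            # 'group[idx]' selects a single host by index
            name, idx = part[:-1].split("[")
            hosts = [inventory[name][int(idx)]]
        else:
            hosts = inventory[part]
        if negate:
            excluded.update(hosts)
        else:
            selected.extend(h for h in hosts if h not in selected)
    return [h for h in selected if h not in excluded]
```

With an inventory of three hosts, `resolve_pattern(inv, "stateful_cluster_hosts:!stateful_cluster_hosts[0]")` yields everything but the first host, matching the follower play above.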

Richard Kallos (rkallos)

Inveniam viam - Building Bridges March 19, 2019 03:00 AM

If you know where you are and where you want to be, you can start to plot a course between the two points. Following on the ideas presented in the previous two posts, I describe a practice I learned from reading Robert Fritz’s book Your Life as Art.

I spent a few years of my life voraciously consuming self-help books. I believe the journey started with Eckhart Tolle’s The Power of Now, and it more-or-less ended with Meditations, Stoicism and the Art of Happiness, and Your Life as Art. I’ll probably wind up writing about my path to Stoicism some other time. This post is about what I learned from Robert Fritz.

Your Life as Art is filled with insight about navigating the complex murky space of life while juggling the often competing aspects of spontaneity and rigid structure. My collection of notes about Your Life as Art is nearly a thousand lines long, and there’s definitely far too much good stuff to fit into a single blog post. At this point, I’ll focus on what Robert Fritz calls structural tension, and his technique for plotting a course with the help of a chart.

Hopefully the previous two posts convinced you of the importance of objectively “seeing” where you are and where you want to be. These two activities form the foundation of what Fritz calls structural tension, a force that stems from the contrast between reality and an ideal state, and seeks to relieve that tension by moving you from your present state to your ideal state.

Writing is a handy exercise for generating this force. A structural tension chart (or ST chart) has your desired ideal state at the top of a page, your current state at the bottom, and a series of steps bridging the gap between the two. First you write the ideal section, then the real section, and finally add the steps in the middle. It’s very important to be as objective and detailed as possible about your ideal and current states. Here’s an example:

--- Ideal ---
I meditate every day for at least 15 minutes. My mind is calm
and focused as I go about my daily activities. I feel comfortable
sitting for the duration of my practice, no matter how long it is.

- Try keeping a meditation journal
- Experiment with active forms of meditation
- Experiment with seating positions
- Give myself time in the morning to meditate
- Wake up at the same time every day

--- Real ----
I meditate approximately once per week. I have difficulty
finding a regular time during the day to devote to meditation,
making it difficult to create a habit that sticks. I find that
I become uncomfortable sitting with my legs crossed for more
than 5 minutes. I do not often remember how good I feel after
meditating, which results in difficulty deciding to sit.

If you’re interested in reading more, I highly recommend Your Life as Art. Robert Fritz’s books are filled with great ideas. While this is basically a slightly more detailed to-do list, I find the process to be very grounding.

In conclusion, once you know where you are and where you want to be, try writing a structural tension chart in order to set a course.

March 18, 2019

Gergely Nagy (algernon)

Solarium March 18, 2019 11:45 AM

I wanted to build a keyboard for a long time, to prepare myself for building two for our Twins when they're old enough, but always struggled with figuring out what I want to build. I mean, I have the perfect keyboard for pretty much all occasions: my daily driver is the Keyboardio Model01, which I use for everything but the few cases highlighted next. For Steno, I use a Splitography. When I need to be extra quiet, I use an Atreus with Silent Reds. For gaming, I have a Shortcut prototype, and use the Atreus too, depending on the game. I don't travel much nowadays, so I have no immediate need for a portable board, but the Atreus would fit that purpose too.

As it turns out there is one scenario I do not have covered: if I have to type on my phone, I do not have a bluetooth keyboard to do it with, and have to rely on the virtual keyboard. This is far from ideal. Why do I need to type on the phone? Because sometimes I'm in a call at night, and need to be quiet, so I go to another room - but I only have a phone with me there. I could use a laptop, but since I need the phone anyway, carrying a phone and a laptop feels wrong, when I could carry a phone and a keyboard instead.

So I'm going to build myself a bluetooth keyboard. But before I do that, I'll build something simpler. Simpler, but still different enough from my current keyboards that I can justify the effort going into the build process. It will not be wireless at first, because during my research, I found that complicates matters too much, at least for a first build.

A while ago, I had another attempt at coming up with a keyboard, which had bluetooth, was split, and had a few other twists. We spent a whole afternoon brainstorming on the name with the twins and my wife. I'll use that name for another project, but I needed another one for the current one: I started down the same path we used back then, and found a good one.

You see, this keyboard is going to feature a rotary encoder, with a big scrubber knob on top of it, as a kind of huge dial. The knob will be in the middle, surrounded by low-profile Kailh Choc keys.


Solarium: balcony, dial, terrace, sundial, sunny spot

The low-profile keys with a mix of black and white keycaps do look like a terrace; the scrubber knob, a dial. So the name fits like a glove.

Now, I know very little about designing and building keyboards, so this first attempt will likely end up being a colossal failure. But one has to start somewhere, and this feels like a good start: simple enough to be possible, different enough to be interesting and worthwhile.

It will be powered by the same ATMega32U4 as many other keyboards, but unlike most, it will have Kailh Choc switches for a very low profile. It will also feature a rotary encoder, which I plan to use for various mouse-related tasks, such as scrolling. Or volume setting. Or brightness adjustment. Stuff like that.

This means I'll have to add rotary encoder support to Kaleidoscope, but that shouldn't be too big of an issue.

The layout


(Original KLE)

The idea is that the wheel will act as a mouse scroll wheel by default. Pressing the left Fn key turns it into volume control; pressing the right Fn key turns it into brightness control. I haven't found other uses for it yet, but I'm sure I will once I have the physical thing under my fingers. The wheel is to be operated by the hand opposite the one holding Fn, or either hand when no Fn is held. Usually that'll be the right hand, because Shift will be on the left thumb cluster, and I need that for horizontal scrolling.

While writing this, I got another idea for the wheel: I can make it switch windows or desktops. It can act as a more convenient Alt+Tab, too!


The most interesting component is likely the knob. I've been eyeing the Scrubber Knob from Adafruit. I still need to find a suitable encoder; the one on Adafruit is out of stock. One of the main reasons I like this knob is that it's low profile.

The rest is pretty usual stuff:

  • Kailh Choc switches. Not sure whether I want reds or browns. I usually enjoy tactile switches, but one of the goals of this keyboard is to be quiet, and reds might be a better fit there.
  • Kailh Choc caps: I'll get a mix of black and white caps, for that terrace / balcony feeling.
  • ATMega32U4

Apart from this, I'll need a PCB, and perhaps a switch- and/or bottom plate, I suppose. Like I said, I know next to nothing about building keyboards. I originally wanted to hand-wire it, but Jesse Vincent told me I really don't want to do that, and I'll trust him on that.

Future plans

In the future, I plan to make a Bluetooth keyboard, and a split one (perhaps both at the same time, as originally planned). I might experiment with adding LEDs to the current one too as a next iteration. I also want to build a board with hotswap switches, though I will likely end up with Kailh Box Royals (still need my samples to arrive first, mind you). We'll see once I've built the first one; I'm sure there will be lessons learned.

#DeleteFacebook March 18, 2019 08:00 AM

Your Account Is Scheduled for Permanent Deletion

On March 15, the anniversary of the 1848-49 Hungarian Revolution and war for independence, I deleted my facebook account. Or at least started the process. This is something I've been planning to do for a while, and the special day felt like the perfect opportunity. It wasn't easy, though not because I used facebook much: I did not. I hadn't looked at my timeline in months, had a total of 9 posts over the years (most of them private), didn't "like" stuff, and hadn't interacted with the site in any meaningful way.

I did use Messenger, mostly to communicate with friends and family; convincing at least some of them to find alternative ways to contact me wasn't without issues. But by March 15, I had gotten the most important people onto another communication platform (XMPP), and I was able to hit the delete switch.

I have long despised facebook, for a whole lot of reasons, but most recently, they started to expose my phone number, which I only gave them for 2FA purposes. They exposed it in a way that I couldn't hide it from friends, either. That's a problem because I don't want every person who I "friended" on there to know my phone number. It's a privilege to know it, and facebook abusing its knowledge of it was over the line. But this isn't the worst yet.

You see, facebook is so helpful that it lets people link their contacts with their facebook friends. A lot of other apps are after one's contact list, and now my phone number got into a bunch more of those. This usually isn't a big deal, people will not notice. But programs will. Programs that hunt for phone numbers to sell.

And this is exactly what happened: my phone number got sold. How do I know? I got a call from an insurance company. One I never had any prior contact with, nor did anyone in my family. I was asked if I have two minutes, and I frankly told them that yes, I do, and I'd like to use that two minutes to inquire where they got my phone number from, as per the GDPR, because as a data subject, I have the right to know what data has been collected about me, how such data was processed. I twisted the right a bit, and said I have the right to know how I got into their database - I'm not sure I have this right. In any case, poor caller wasn't prepared for this, took a bit more than two minutes to convince him that he's better off complying with my request, otherwise they'll have a formal GDPR data request and a complaint against him, personally filed within hours.

A few hours later, I got a call back: they got my phone number from facebook. I thanked them for the information, and asked them to delete all data they have about me, and never contact me again. Yes, there's a conflict between those two requests, we'll see how they handle it, let it be their problem figuring out how to resolve it. Anyway, there's only a few possibilities how they could've gotten my number through facebook:

  • If I friended them, they'd have access. They wouldn't have my consent to use it for this kind of stuff, but they'd have the number. This isn't the case. I'm pretty sure I can't friend corporations on facebook (yet?) to begin with.
  • Some of my friends had their contacts synced with facebook (I know of at least two who did this, one by mistake, one because it was made far too easy), and had their contacts uploaded to the insurance company via their app, or some similarly shady process. This still doesn't mean I consented to being contacted.
  • Facebook sold my number to them. Likewise, this doesn't imply consent, either.

They weren't able to tell me more than that they got my number from facebook. I have a feeling that this is a lie anyway - they just don't know where they bought it from, and facebook probably sounded like a reasonable source. On the other hand, facebook selling one's personal data, despite the GDPR is something I'm more than willing to believe, considering their past actions. Even if facebook is not the one who sold the number, the fact that an insurance company deemed it acceptable to lie and blame them paints an even worse picture.

In either case, facebook is a sickness I wanted to remove from my life, and this whole deal was the final straw. I initiated the account deletion. They probably won't delete anything, just disable it, and continue selling what they already have about me. But at least I make it harder for them to obtain more info about me. I started to migrate my family to better services: we use an XMPP server I host, with end-to-end encryption, because no one should have to trust me, nor the VPS provider the server runs on.

It's a painful break up, because there are a bunch of people who I talked with on Messenger from time to time, who will not move away from facebook anytime soon. There are parts of my family (my brother & sister) who will not install another chat app just to chat with me - we'll fall back to phone calls, email and SMS. Nevertheless, this had to be done. I'm lucky that I could, because I wasn't using facebook for anything important to begin with. Many people can't escape its clutches.

I hope there will come a day when all of my family is off of it. With a bit of luck, we can raise our kids without facebook in their lives.

Jan van den Berg (j11g)

The Effective Executive – Peter Drucker March 18, 2019 06:51 AM

Pick up any good management book and chances are that Peter Drucker will be mentioned. He is the godfather of management theory. I encountered Drucker many times before in other books and quotes, but I had never read anything directly by him. I have now, and I can only wish I had done so sooner.

The Effective Executive – Peter Drucker (1967) – 210 pages

The sublime classic The Effective Executive from 1967 was a good place to start. After only finishing the first chapter at the kitchen table, I already told my wife: this is one of the best management books I have ever read.

Drucker is an absolute authority who unambiguously tells you exactly what’s important and what’s not. His voice and style cut like a knife, and his directness hits you like a ton of bricks. He explains and summarizes like no one else, without becoming repetitive. Every other sentence could be a quote. And after reading him, every other management book makes a bit more sense, because now I can tell where they stem from.

Drucker demonstrates visionary insight, by correctly predicting the rise of knowledge workers and their specific needs (and the role of computers). In a rapidly changing society all knowledge workers are executives. And he/she needs to be effective. But, mind you, executive effectiveness “can be learned, but can’t be taught.”

Executive effectiveness

Even though executive effectiveness is an individual aspiration, Drucker is crystal clear on the bigger picture:

Only executive effectiveness can enable this society to harmonize its two needs: the needs of organization to obtain from the individual the contribution it needs, and the need of the individual to have organization serve as his tool for the accomplishment of his purposes. Effectiveness must be learned… Executive effectiveness is our one best hope to make modern society productive economically and viable socially.

So this book makes sense on different levels and is timeless. Even if some references, in hindsight, are dated (especially the McNamara references, knowing what we now know about the Vietnam war). I think Drucker himself did not anticipate the influence of his writing, as the next quote demonstrates. But this is also precisely what I admire about it.

There is little danger that anyone will compare this essay on training oneself to be an effective executive with, say, Kierkegaard’s great self-development tract, Training in Christianity. There are surely higher goals for a man’s life than to become an effective executive. But only because the goal is so modest can we hope at all to achieve it; that is, to have the large number of effective executives modern society and its organizations need.

The post The Effective Executive – Peter Drucker appeared first on Jan van den Berg.

Pete Corey (petecorey)

A Better Mandelbrot Iterator in J March 18, 2019 12:00 AM

Nearly a year ago I wrote about using the J programming language to write a Mandelbrot fractal renderer. I proudly exclaimed that J could be used to “write out expressions like we’d write English sentences,” and immediately proceeded to write a nonsensical, overcomplicated solution.

My final solution bit off more than it needed to chew. The next verb we wrote both calculated the next value of iterating on the Mandelbrot formula and also managed appending that value to a list of previously calculated values.

I nonchalantly explained:

This expression is saying that next “is” (=:) the “first element of the array” ({.) “plus” (+) the “square of the last element of the array” (*:@:{:). That last verb combines the “square” (*:) and “last” ({:) verbs together with the “at” (@:) adverb.

Flows off the tongue, right?

My time spent using J to solve last year’s Advent of Code challenges has shown me that a much simpler solution exists, and it can flow out of you in a fluent way if you just stop fighting the language and relax a little.

Let’s refresh ourselves on Mandelbrot fractals before we dive in. The heart of the Mandelbrot fractal is this iterative equation:

The Mandelbrot set equation: z(n+1) = z(n)^2 + c

In English, the next value of z is some constant, c, plus the square of our previous value of z. To render a picture of the Mandelbrot fractal, we map some section of the complex plane onto the screen, so that every pixel maps to some value of c. We iterate on this equation until we decide that the values being calculated either remain small, or diverge to infinity. Every value of c that doesn’t diverge is part of the Mandelbrot set.

But let’s back up. We just said that “the next value of z is some constant, c, plus the square of our previous value of z”.

We can write that in J:

   (+*:)
And we can plug in example values for c (0.2j0.2) and z (0):

   0.2j0.2 (+*:) 0

Our next value of z is c (0.2j0.2) plus (+) the square (*:) of our previous value of z (0). Easy!

My previous solution built up an array of our iterated values of z by manually pulling c and previously iterated values off of the array and pushing new values onto the end. Is there a better way?

Absolutely. If I had read the documentation on the “power” conjunction (^:), I would have noticed that “boxing” (<) the number of times we want to apply our verb will return an array filled with the results of every intermediate application.

Put simply, we can repeatedly apply our iterator like so:

   0.2j0.2 (+*:)^:(<5) 0
0 0.2j0.2 0.2j0.28 0.1616j0.312 0.128771j0.300838

Lastly, it’s conceivable that we might want to switch the order of our inputs. Currently, our value for c is on the left and our initial value of z is on the right. If we’re applying this verb to an array of c values, we’d probably want c to be the right-hand argument and our initial z value to be a bound left-hand argument.

That’s a simple fix thanks to the “passive” adverb (~):

   0 (+*:)^:(<5)~ 0.2j0.2
0 0.2j0.2 0.2j0.28 0.1616j0.312 0.128771j0.300838
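For readers without a J interpreter handy, the same iteration is easy to sketch in Python, which has complex numbers built in (the function name is mine, not from the post):

```python
def iterate_mandelbrot(c, z0=0, n=5):
    """Return the first n iterates of z(next) = c + z^2, starting from z0."""
    zs = [z0]
    for _ in range(n - 1):
        zs.append(c + zs[-1] ** 2)
    return zs

# Mirrors the J session above: 0.2j0.2 (+*:)^:(<5) 0
print(iterate_mandelbrot(0.2 + 0.2j))
```

The list it produces matches the J output above, with J's 0.2j0.28 notation corresponding to Python's (0.2+0.28j).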

We can even plot our iterations to make sure that everything looks as we’d expect.

Our plotted iteration for a C value of 0.2 + 0.2i.

I’m not going to lie and claim that J is an elegantly ergonomic language. In truth, it’s a weird one. But as I use J more and more, I’m finding that it has a certain charm. I’ll often be implementing some tedious solution for a problem in Javascript or Elixir and find myself fantasizing about how easily I could write an equivalent solution in J.

That said, I definitely haven’t found a shortcut for learning the language. Tricks like “reading and writing J like English” only really work at a hand-wavingly superficial level. I’ve found that learning J really just takes time, and as I spend more time with the language, I can feel myself “settling into it” and its unique ways of looking at computation.

If you’re interested in learning J, check out my previous articles on the subject and be sure to visit the JSoftware home page for books, guides, and documentation.

Richard Kallos (rkallos)

Esse quam videri - Seeing what's in front of you March 18, 2019 12:00 AM

As humans, we are easily fooled. Our five senses are the primary way we get an idea of what’s happening around us, and it’s been shown time and time again that our senses are unreliable. In this post, I try to explain the importance of ‘seeing’ what’s in front of you, and how to practice it.

See this video for a classic example of our incredible ability to miss important details.

Rembrandt used to train his students by making them copy his self-portraits. This exercise forced them to see their subject as objectively as possible, which was essential to make an accurate reproduction. Only after mastering their portraiture skills did Rembrandt’s students go on to develop their own artistic styles.

It is important to periodically evaluate your position and course in life. It’s something you do whether you’re aware of it or not. When you plan something, you’re setting a course. When you reflect on past events, you’re estimating your position. To avoid overloading words like sight, vision, and planning, let’s refer to this act as life portraiture.

Life portraiture can be compared to navigating on land, air, or sea, except that the many facets of our lives result in a space of many more dimensions. We can consider our position and course on axes like physical health, emotional health, career, finance, and social life. If we want finer detail, we can split any of those axes into more dimensions.

Objective life portraiture is not easy. We are all vulnerable to cognitive biases. Following the above analogy with navigation, our inaccuracy at objectively evaluating our lives is akin to inaccurately navigating a ship or airplane. If you’re not well-practiced at seeing, your only tool for navigation might be dead reckoning. If you practice drawing self-portraits of your life, you might suddenly find yourself in possession of a sextant and an almanac, so you can navigate using the stars. The ideal in this case would be to have something like GPS, which might look like Quantified Self with an incredible amount of detail.

It’s worth mentioning that our ability to navigate varies across different dimensions. This is an idea that doesn’t really carry over to navigating Earth, but it’s important to recognize. For example, if you’re thorough with your personal finances, you could have tools akin to GPS for navigating that part of your life. At the same time, if you don’t check in with your emotions, or do anything to improve your emotional health, you might be lost in those spaces.

There are ways to improve our navigating abilities depending on the spaces we’re looking at. To improve navigating your personal finances, you can regularly consult your banking statements, make budgets, and explore different methods of investing. To improve navigating your physical health, you can perform one of many different fitness tests, or consult a personal trainer. To improve navigating your emotional health, you could try journaling, or maybe begin seeing a therapist. Any and all of these could help you locate yourself in the vast space where your life could be.

In order to get where you want to go, you need to know where you are, and what direction you’re moving in.

March 17, 2019

Ponylang (SeanTAllen)

Last Week in Pony - March 17, 2019 March 17, 2019 02:35 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Caius Durling (caius)

Download All Your Gists March 17, 2019 02:32 PM

Over time I've managed to build up quite the collection of Gists over at Github; including secret ones, there are about 1200 currently. Some of these have useful code in them, some are just garbage output. I'd quite like a local copy either way, so I can easily search1 across them.

  1. Install the gist command from Github

    brew install gist
  2. Login to your Github Account through the gist tool (it'll prompt for your login credentials, then generate you an API Token to allow it future access.)

    gist --login
  3. Create a folder, go inside it and download all your gists!

    mkdir gist_archive
    cd gist_archive
    for repo in $(gist -l | awk '{ print $1 }'); do git clone $repo 2> /dev/null; done
  4. Now you have a snapshot of all your gists. To update them in future, you can run the above for any new gists, and update all the existing ones with:

    cd gist_archive
    for i in */; do (cd $i && git pull --rebase); done

Now go forth and search out your favourite snippet you saved years ago and forgot about!

  1. ack, ag, grep, ripgrep, etc. Pick your flavour. [return]

Marc Brooker (mjb)

Control Planes vs Data Planes March 17, 2019 12:00 AM

Control Planes vs Data Planes

Are there multiple things here?

If you want to build a successful distributed system, one of the most important things to get right is the block diagram: what are the components, what does each of them own, and how do they communicate with other components. It's such a basic design step that many of us don't think about how important it is, and how difficult and expensive it can be to make changes to the overall architecture once the system is in production. Getting the block diagram right helps with the design of database schemas and APIs, helps reason through the availability and cost of running the system, and even helps form the right org chart to build the design.

One very common pattern when doing these design exercises is to separate components into a control plane and a data plane, recognizing the differences in requirements between these two roles.

No true monoliths

The microservices and SOA design approaches tend to push towards more blocks, with each block performing a smaller number of functions. The monolith approach is the other end of the spectrum, where the diagram consists of a single block. Arguments about these two approaches can be endless, but ultimately not important. It's worth noting, though, that there are almost no true monoliths. Some kinds of concerns are almost always separated out. Here's a partial list:

  1. Storage. Most modern applications separate business logic from storage and caching, and talk through APIs to their storage.
  2. Load Balancing. Distributed applications need some way for clients to distribute their load across multiple instances.
  3. Failure tolerance. Highly available systems need to be able to handle the failure of hardware and software without affecting users.
  4. Scaling. Systems which need to handle variable load may add and remove resources over time.
  5. Deployments. Any system needs to change over time.

Even in the most monolithic application, these are separate components of the system, and need to be built into the design. What's notable here is that these concerns can be broken into two clean categories: data plane and control plane. Along with the monolithic application itself, storage and load balancing are data plane concerns: they are required to be up for any request to succeed, and scale O(N) with the number of requests the system handles. On the other hand, failure tolerance, scaling and deployments are control plane concerns: they scale differently (either with a small multiple of N, with the rate of change of N, or with the rate of change of the software) and can break for some period of time before customers notice.

Two roles: control plane and data plane

Every distributed system has components that fall roughly into these two roles: data plane components that sit on the request path, and control plane components which help that data plane do its work. Sometimes, the control plane components aren't components at all, and rather people and processes, but the pattern is the same. With this pattern worked out, the block diagram of the system starts to look something like this:

Data plane and control plane separated into two blocks

My colleague Colm MacCárthaigh likes to think of control planes from a control theory approach, separating the system (the data plane) from the controller (the control plane). That's a very informative approach, and you can hear him talk about it here:

I tend to take a different approach, looking at the scaling and operational properties of systems. As in the example above, data plane components are the ones that scale with every request1, and need to be up for every request. Control plane components don't need to be up for every request, and instead only need to be up when there is work to do. Similarly, they scale in different ways. Some control plane components, such as those that monitor fleets of hosts, scale with O(N/M), where N is the number of requests and M is the requests per host. Other control plane components, such as those that handle scaling the fleet up and down, scale with O(dN/dt). Finally, control plane components that perform work like deployments scale with code change velocity.
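Those scaling relationships can be made concrete with a toy model (all function names and constants are mine, purely for illustration):

```python
# Toy model of how data-plane and control-plane load scale differently.
# N = requests/sec arriving; M = requests/sec one host can serve.

def data_plane_hosts(n, m):
    """Data plane scales O(N): one host for every M requests (ceiling)."""
    return max(1, -(-n // m))  # ceiling division without math.ceil

def monitoring_load(n, m, checks_per_host=1):
    """Fleet monitoring scales O(N/M): work per host, not per request."""
    return data_plane_hosts(n, m) * checks_per_host

def scaling_actions(n_before, n_after, m):
    """Auto-scaling scales O(dN/dt): work proportional to the change
    in fleet size, not the absolute traffic level."""
    return abs(data_plane_hosts(n_after, m) - data_plane_hosts(n_before, m))
```

Doubling steady-state traffic doubles the data-plane fleet, but if traffic is steady the scaling controller does no work at all, which is exactly why the two planes deserve different designs.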

Finding the right separation between control and data planes is, in my experience, one of the most important things in a distributed systems design.

Another view: compartmentalizing complexity

In their classic paper on Chain Replication, van Renesse and Schneider write about how chain replicated systems handle server failure:

In response to detecting the failure of a server that is part of a chain (and, by the fail-stop assumption, all such failures are detected), the chain is reconfigured to eliminate the failed server. For this purpose, we employ a service, called the master

Fair enough. Chain replication can't handle these kinds of failures without adding significant complexity to the protocol. So what do we expect of the master?

In what follows, we assume the master is a single process that never fails.

Oh. Never fails, huh? They then go on to say that they approach this by replicating the master on multiple hosts using Paxos. If they have a Paxos implementation available, why not just use that and not bother with this Chain Replication thing at all? The paper doesn't say2, but I have my own opinion: it's interesting to separate them because Chain Replication offers a different set of performance, throughput, and code complexity trade-offs than Paxos3. It is possible to build a single code base (and protocol) which handles both concerns, but at the cost of coupling these two different concerns. Instead, by making the master a separate component, the chain replicated data plane implementation can focus on the things it needs to do (scale, performance, optimizing for every byte). The control plane, which only needs to handle the occasional failure, can focus on what it needs to do (extreme availability, locality, etc). Each of these different requirements adds complexity, and separating them out allows a system to compartmentalize its complexity, and reduce coupling by offering clear APIs and contracts between components.

Breaking down the binary

Say you build an awesome data plane based on chain replication, and an awesome control plane (master) for that data plane. At first, because of its lower scale, you can operate the control plane manually. Over time, as your system becomes successful, you'll start to have too many instances of the control plane to manage by hand, so you build a control plane for that control plane to automate the management. This is the first way the control/data binary breaks down: at some point control planes need their own control planes. Your controller is somebody else's system under control.

One other way the binary breaks down is with specialization. The master in the chain replicated system handles fault tolerance, but may not handle scaling, or sharding of chains, or interacting with customers to provision chains. In real systems there are frequently multiple control planes which control different aspects of the behavior of a system. Each of these control planes has its own differing requirements, requiring different tools and different expertise. Control planes are not homogeneous.

These two problems highlight that the idea of control planes and data planes may be too reductive to be a core design principle. Instead, it's a useful tool for helping identify opportunities to reduce and compartmentalize complexity by introducing good APIs and contracts, to ensure components have a clear set of responsibilities and ownership, and to use the right tools for solving different kinds of problems. Separating the control and data planes should be a heuristic tool for good system design, not a goal of system design.


  1. Or potentially with every request. Things like caches complicate this a bit.
  2. It does compare Chain Replication to other solutions, but doesn't specifically talk about the benefits of separation. Murat Demirbas pointed out that Chain Replication's ability to serve linearizable reads from the tail is important. He also pointed me at the Object Storage on CRAQ paper, which talks about how to serve reads from intermediate nodes. Thanks, Murat!
  3. For one definition of Paxos. Lamport's Vertical Paxos paper sees chain replication as a flavor of Paxos, and more recent work by Heidi Howard et al on Flexible Paxos makes the line even less clear.

March 16, 2019

Richard Kallos (rkallos)

Memento Mori - Seeing the End March 16, 2019 09:30 PM

Memento Mori (translated as “remember death”) is a powerful idea and practice. In this post, I make the case that it’s important to think not just about your death, but to clearly define what it means to be finished in whatever you set out to do.

We are mites on a marble floating in the endless void. Our lifespans are blinks in cosmic history. Furthermore, for many of us, our contributions are likely to be forgotten soon after we rejoin the earth, if not sooner.

This is great news.

Whenever I’m feeling nervous or embarrassed, I start to feel better when I realize that nobody in front of me is going to be alive 100 years from now, and I doubt that they’ll be telling their grandchildren about that time when Richard made a fool of himself, because I try to make a fool of myself often enough that it’s usually not worth telling people about.

Knowing that we are finite is also pretty motivating. I feel less resistance to starting new things. It doesn’t have to be perfect, in fact, it’s probably going to be average. However, it’s my journey, so it’s special to me, and I probably (hopefully?) learned and improved on the way.

It’s important to think about the ends of things, even when we don’t necessarily want things to end. Endings are as much a part of life as beginnings are; to think otherwise is delusion. Endings tend to have a reputation for being sad, but they don’t always have to be.

For example, some developers of open source software get stuck working on their projects for far longer than they expected. It’s unfortunate that creating something that people enjoy can turn into a source of grief and resentment.

Specifying an end of any endeavor is an important task. If no ‘end state’ is declared, it’s possible that a project will continue to take up time and effort, perpetually staying on the back-burner of things you have going on, draining you of resources until you are no longer able to start anything new.

Spending time thinking about what your finished project will look like sets a target for you to achieve, which is a point I’ll elaborate on very soon. This exercise, along with evaluating where you are currently on your path toward achieving your goal/finishing your project, are immensely useful for getting your brain to focus on the intermediate tasks that need to be finished in order to get to that idealized ‘end state’.

All in all, while it’s sometimes nice to simply wander, it’s important to acknowledge that you are always going somewhere, even when you think you’re standing still. You should be the one who decides where you go, not someone else.

March 14, 2019

Derek Jones (derek-jones)

Altruistic innovation and the study of software economics March 14, 2019 02:11 PM

Recently, I have been reading rather a lot of papers that are ostensibly about the economics of markets where applications, licensed under an open source license, are readily available. I say ostensibly, because the authors have some very odd ideas about the activities of those involved in the production of open source.

Perhaps I am overly cynical, but I don’t think altruism is the primary motivation for developers writing open source. Yes, there is an altruistic component, but I would list enjoyment as the primary driver; developers enjoy solving problems that involve the production of software. On the commercial side, companies are involved with open source because of naked self-interest, e.g., commoditizing software that complements their products.

It may surprise you to learn that academic papers, written by economists, tend to be knee-deep in differential equations. As a physics/electronics undergraduate I got to spend lots of time studying various differential equations (each relating to some aspect of the workings of the Universe). Since graduating, I have rarely encountered them; that is, until I started reading economics papers (or at least trying to).

Using differential equations to model problems in economics sounds like a good idea, after all they have been used to do a really good job of modeling how the universe works. But the universe is governed by a few simple principles (or at least the bit we have access to is), and there is lots of experimental data about its behavior. Economic issues don’t appear to be governed by a few simple principles, and there is relatively little experimental data available.

Writing down a differential equation is easy; figuring out an analytic solution can be extremely difficult. The Navier-Stokes equations were written down 200 years ago, and we are still awaiting a general solution (solutions for a variety of special cases are known).

To keep their differential equations solvable, economists make lots of simplifying assumptions. Having obtained a solution to their equations, there is little or no evidence to compare it against. I cannot speak for economics in general, but those working on the economics of software are completely disconnected from reality.

What factors, other than altruism, do academic economists think are of major importance in open source? No, not constantly reinventing the wheel-barrow, but constantly innovating. Of course, everybody likes to think they are doing something new, but in practice it has probably been done before. Innovation is part of the business zeitgeist and academic economists are claiming to see it everywhere (and it does exist in their differential equations).

The economics of Linux vs. Microsoft Windows is a common comparison, i.e., open vs. closed source; I have not seen any mention of other open source operating systems. How might an economic analysis of different open source operating systems be framed? How about: “An economic analysis of the relative enjoyment derived from writing an operating system, Linux vs BSD”? Or the joy of writing an editor, which must be lots of fun, given how many text editors are available.

I have added the topics altruism and innovation to my list of indicators of poor quality, used to judge whether it’s worth spending more than 10 seconds reading a paper.

March 13, 2019

Oleg Kovalov (olegkovalov)

Indeed, I should add it. Haven’t used it for a long time. March 13, 2019 05:18 PM


Wesley Moore (wezm)

My Rust Powered e-Paper Badge March 13, 2019 09:39 AM

This week I attended linux.conf.au (for the first time) in Christchurch, New Zealand. It's a week-long conference covering Linux, open source software and hardware, privacy, security and much more. The theme this year was IoT. In line with the theme I built a digital conference badge to take to the conference. It used a tri-colour e-Paper display and was powered by a Rust program I built running on Raspbian Linux. This post describes how it was built, how it works, and how it fared at the conference. The source code is on GitHub.

The badge in its final state after the conference.


After booking my tickets in October I decided I wanted to build a digital conference badge. I'm not entirely sure what prompted me to do this but it was a combination of seeing projects like the BADGEr in the past, the theme of linux.conf.au 2019 being IoT, and an excuse to write more Rust. Since it was ostensibly a Linux conference it also seemed appropriate for it to run Linux.

Over the next few weeks I collected the parts and adaptors to build the badge. The main components were:

The Raspberry Pi Zero W is a single-core 1 GHz ARM SoC with 512 MB RAM, Wi-Fi, Bluetooth, microSD card slot, and mini HDMI. The Inky pHAT is a 212x104 pixel tri-colour (red, black, white) e-Paper display. It takes about 15 seconds to refresh the display but it draws very little power between updates and the image persists even when power is removed.

Support Crates

The first part of the project involved building a Rust driver for the controller in the e-Paper display. That involved determining what controller the display used, as Pimoroni did not document it. Searching online for some of the comments in the Python driver suggested the display was possibly a HINK-E0213A07 from Holitech Co. Further searching based on the datasheet for that display suggested that the controller was a Solomon Systech SSD1675. Cross referencing the display datasheet, SSD1675 datasheet, and the Python source of Pimoroni's Inky pHAT driver suggested I was on the right track.

I set about building the Rust driver for the SSD1675 using the embedded HAL traits. These traits allow embedded Rust drivers to be built against a de facto standard set of traits that allow the driver to be used in any environment that implements the traits. For example I make use of traits for SPI devices, and GPIO pins, which are implemented for Linux, as well as say, the STM32F30x family of microcontrollers. This allows the driver to be written once and used on many devices.
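The "write the driver once, use it anywhere" idea can be sketched with a simplified stand-in trait. This is not the real embedded-hal API (the actual `OutputPin` and SPI traits differ in signatures and error handling); it only illustrates the pattern:

```rust
// Simplified stand-in for an embedded-HAL style pin trait.
// The real embedded-hal traits are fallible and more detailed.
trait OutputPin {
    fn set_high(&mut self);
    fn set_low(&mut self);
}

// A display driver written only against the trait...
struct Driver<P: OutputPin> {
    reset: P,
}

impl<P: OutputPin> Driver<P> {
    fn hardware_reset(&mut self) {
        self.reset.set_low(); // pulse the reset line
        self.reset.set_high();
    }
}

// ...works with any platform that implements it, e.g. a fake pin for tests:
struct FakePin {
    state: bool,
}

impl OutputPin for FakePin {
    fn set_high(&mut self) { self.state = true; }
    fn set_low(&mut self) { self.state = false; }
}

fn main() {
    let mut drv = Driver { reset: FakePin { state: false } };
    drv.hardware_reset();
    println!("reset line high: {}", drv.reset.state); // true
}
```

Swap `FakePin` for a Linux GPIO or STM32 pin type that implements the same trait and the driver code is unchanged.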

The result was the ssd1675 crate. It's a so-called no-std crate, meaning it does not use the Rust standard library, instead sticking only to the core library. This allows the crate to be used on devices and microcontrollers without features like file systems or heap allocators. The crate also makes use of the embedded-graphics crate, which makes it easy to draw text and basic shapes on the display in a memory-efficient manner.

While testing the ssd1675 crate I also built another crate, profont, which provides 7 sizes of the ProFont font for embedded graphics. The profont crate was published 24 Nov 2018, and ssd1675 was published a month later on 26 Dec 2018.

The Badge Itself

Now that I had all the prerequisites in place I could start working on the badge proper. I had a few goals for the badge and its implementation:

  • I wanted it to have some interactive component.
  • I wanted there to be some sort of Internet aspect to tie in with the IoT theme of the conference.
  • I wanted the badge to be entirely powered by a single, efficient Rust binary, that did not shell out to other commands or anything like that.
  • Ideally it would be relatively power efficient.

An early revision of the badge from 6 Jan 2019 showing my name, website, badge IP, and kernel info.

I settled on having the badge program serve up a web page with some information about the project, myself, and some live stats of the Raspberry Pi (OS, kernel, uptime, free RAM). The plain text version of the page looked like this:

Hi I'm Wes!

Welcome to my conference badge. It's powered by Linux and
Rust running on a Raspberry Pi Zero W with a tri-colour Inky
pHAT ePaper display. The source code is on GitHub:

Say Hello

12 people have said hi.

Say hello in person and on the badge. To increment the hello
counter on the badge:

    curl -X POST

About Me

I'm a software developer from Melbourne, Australia. I
currently work at GreenSync building systems to help make
better use of renewable energy.

Find me on the Internet at:


Host Information

   (_\)(/_)   OS:        Raspbian GNU/Linux
   (_(__)_)   KERNEL:    Linux 4.14.79+
  (_(_)(_)_)  UPTIME:    3m
   (_(__)_)   MEMORY:    430.3 MB free of 454.5 MB

              |    Powered by Rust!    |
                  \) /  o o  \ (/
                    '_   -   _'
                    / '-----' \

The interactive part came in the form of a virtual "hello" counter. Each HTTP POST to the /hi endpoint incremented the count, which was shown on the badge. The badge displayed the URL of the page. The URL was just the badge's IP address on the conference Wi-Fi. To provide a little protection against abuse I added code that only allowed a given IP to increment the count once per hour.
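A minimal sketch of that once-per-hour rule, using only the standard library (hypothetical code, not the badge's actual implementation):

```rust
// Hypothetical sketch of the badge's hello counter with a once-per-hour
// rate limit per IP. Std-only; the real badge wires this into hyper.
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct HelloCounter {
    count: u64,
    last_hello: HashMap<IpAddr, Instant>,
}

impl HelloCounter {
    fn new() -> Self {
        HelloCounter { count: 0, last_hello: HashMap::new() }
    }

    /// Increment the counter unless this IP said hello within the last hour.
    /// Returns whether the hello was counted.
    fn say_hello(&mut self, ip: IpAddr, now: Instant) -> bool {
        match self.last_hello.get(&ip) {
            Some(&prev) if now.duration_since(prev) < Duration::from_secs(3600) => false,
            _ => {
                self.last_hello.insert(ip, now);
                self.count += 1;
                true
            }
        }
    }
}

fn main() {
    let mut c = HelloCounter::new();
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    let t0 = Instant::now();
    println!("{}", c.say_hello(ip, t0)); // true: first hello counts
    println!("{}", c.say_hello(ip, t0)); // false: same hour, ignored
    println!("count = {}", c.count); // count = 1
}
```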

When building the badge software these are some of the details and things I strived for:

  • Handle Wi-Fi going away
  • Handle IP address changing
  • Prevent duplicate submissions
  • Pluralisation of text on the badge and on the web page
  • Automatically shift the text as the count requires more digits
  • Serve plain text and HTML pages:
    • If the web page is requested with an Accept header that doesn't include text/html (e.g. curl) then the response is plain text and the method to "say hello" is a curl command.
    • If the user agent indicates it accepts HTML then the page is HTML and contains a form with a button to "say hello".
  • Avoid aborting on errors:
    • I kind of ran out of time to handle all errors well, but most are handled gracefully and won't abort the program. In some cases a default is used in the face of an error. In other cases I just resorted to logging a message and carrying on.
  • Keep memory usage low:
    • The web server efficiently discards any large POST requests sent to it, to avoid exhausting RAM.
    • Typical RAM stats showed the Rust program using about 3 MB of RAM.
  • Be relatively power efficient:
    • Use Rust instead of a scripting language
    • Only update the display when something it's showing changes
    • Only check for changes every 15 seconds (the rest of the time that thread just sleeps)
    • Put the display into deep sleep after updating
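The Accept-header negotiation in the list above might look something like this hypothetical helper. Note that curl's default Accept of */* deliberately does not count as HTML, matching the behaviour described:

```rust
// Hypothetical sketch of the plain-text vs HTML decision; the real badge
// code extracts the header from a hyper request instead of an Option<&str>.
fn wants_html(accept_header: Option<&str>) -> bool {
    match accept_header {
        // An Accept header can list several media types separated by commas,
        // each optionally followed by ";q=..." parameters.
        Some(value) => value
            .split(',')
            .any(|part| part.split(';').next().unwrap_or("").trim() == "text/html"),
        None => false,
    }
}

fn main() {
    // A browser advertises text/html explicitly:
    println!("{}", wants_html(Some("text/html,application/xhtml+xml;q=0.9"))); // true
    // curl sends "*/*" by default, so it gets the plain text page:
    println!("{}", wants_html(Some("*/*"))); // false
    println!("{}", wants_html(None)); // false
}
```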

I used hyper for the HTTP server built into the binary. To get a feel for the limits of the device I did some rudimentary HTTP benchmarking with wrk and concluded that 300 requests per second was probably going to be fine. ;-)

Running 10s test @
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   316.58ms   54.41ms   1.28s    92.04%
    Req/Sec    79.43     43.24   212.00     67.74%
  3099 requests in 10.04s, 3.77MB read
Requests/sec:    308.61
Transfer/sec:    384.56KB


When I started the project I imagined it would hang around my neck like a conference lanyard. By the time departure day arrived I still hadn't worked out how this would work in practice (power delivery being a major concern). In the end I settled on attaching it to the strap on my backpack. My bag has lots of webbing so there were plenty of loops to hold it in place. I was also able to use the Velcro covered holes intended for water tubes to get the cable neatly into the bag.

At the Conference

I had everything pretty much working for the start of the conference, although I did make some improvements and added a systemd unit to automatically start and restart the Rust binary. At this point there were still two unknowns: battery life and how the Raspberry Pi would handle coming in and out of Wi-Fi range. The Wi-Fi turned out fine: it automatically reconnected whenever it came back into range.

Badge displaying a count of zero. Ready for day 1


Day 1 was a success! I had several people talk to me about the badge and increment the counter. Battery life was good too. After 12 hours of uptime the battery was still showing it was half full. Later in the week I left the badge running overnight and hit 24 hours uptime. The battery level indicator was on the last light so I suspect there wasn't much juice left.

Me with badge display showing a hello count of 1. Me after receiving my first hello on the badge

On day 2 I had several people suggest that I needed a QR code for the URL. Turns out entering an IP address on a phone keyboard is tedious. So that evening I added a QR code to the display. It's dynamically generated and contains the same URL that is shown on the display. There were several good crates to choose from. Ultimately I picked one that didn't have any image dependencies, which allowed me to convert the data into embedded-graphics pixels. The change was a success: most people scanned the QR code from this point on.
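Converting QR "modules" (the black/white squares) into display pixels without an image library can be sketched like this. The function and scale factor are hypothetical illustrations, not the badge's actual code or any crate's API:

```rust
// Hypothetical sketch: expand a QR module matrix into black pixel
// coordinates, the kind of data you could feed to embedded-graphics.
fn modules_to_pixels(modules: &[Vec<bool>], scale: usize) -> Vec<(usize, usize)> {
    let mut black = Vec::new();
    for (row, line) in modules.iter().enumerate() {
        for (col, &dark) in line.iter().enumerate() {
            if dark {
                // Each dark module becomes a scale x scale block of pixels.
                for dy in 0..scale {
                    for dx in 0..scale {
                        black.push((col * scale + dx, row * scale + dy));
                    }
                }
            }
        }
    }
    black
}

fn main() {
    let modules = vec![
        vec![true, false],
        vec![false, true],
    ];
    let px = modules_to_pixels(&modules, 2);
    println!("{} black pixels", px.len()); // 2 dark modules * 4 px each = 8
}
```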

Badge display now including QR code. Badge display showing the newly added QR code

On day 2 I also ran into E. Dunham, and rambled briefly about my badge project and that it was built with Rust. To my absolute delight the project was featured in their talk the next day. The project was mentioned and linked on a slide and I was asked to raise my hand in case anyone wanted to chat afterwards.

Photo of E. Dunham's slide with a link to my git repo.

At the end of the talk the audience was encouraged to tell the rest of the room about a Rust project they were working on. Each person that did so got a little plush Ferris. I spoke about Read Rust.

Photo of a small orange plush crab. Plush Ferris


By the end of the conference the badge showed a count of 12. It had worked flawlessly over the five days.

Small projects with a fairly hard deadline are a good way to ensure they're seen through to completion. They're also a great motivator to publish some open source code.

I think I greatly overestimated the number of people that would interact with the badge. Of those that did, I think most tapped the button to increase the counter and didn't read much else on the page. For example no one commented on the system stats at the bottom. I had imagined the badge as a sort of digital business card but this did not really eventuate in practice.

Attaching the Pi and display to my bag worked out pretty well. I did have to be careful when putting my bag on as it was easy to catch on my clothes. Also one day it started raining on the walk back to the accommodation. I had not factored that in at all and given it wasn't super easy to take on and off I ended up shielding it with my hand all the way back.

Would I Do It Again?

Maybe. If I were to do it again I might do something less interactive and perhaps more informational but updated more regularly. I might try to tie the project into a talk submission too. For example, I could have submitted a talk about using the embedded Rust ecosystem on a Raspberry Pi and made reference to the badge in the talk or used it for examples. I think this would give more info about the project to a bunch of people at once and also potentially teach them something at the same time.

All in all it was a fun project and excellent conference. If you're interested, the Rust source for the badge is on GitHub.

Next Post: Rebuilding My Personal Infrastructure With Alpine Linux and Docker

Rebuilding My Personal Infrastructure With Alpine Linux and Docker March 13, 2019 09:37 AM

For more than a decade I have run one or more servers to host a number of personal websites and web applications. Recently I decided it was time to rebuild the servers to address some issues and make improvements. The last time I did this was in 2016 when I switched the servers from Ubuntu to FreeBSD. The outgoing servers were managed with Ansible. After being a Docker skeptic for a long time I have finally come around to it recently and decided to rebuild on Docker. This post aims to describe some of the choices made, and why I made them.

Before we start I'd like to take a moment to acknowledge this infrastructure is built to my values in a way that works for me. You might make different choices and that's ok. I hope you find this post interesting but not prescriptive.

Before the rebuild this is what my infrastructure looked like:

You'll note 3 servers, across 2 countries, and 2 hosting providers. Also the Rust Melbourne server was not managed by Ansible like the other two were.

I had a number of goals in mind with the rebuild:

  • Move everything to Australia (where I live)
  • Consolidate onto one server
  • https enable all websites

I set up my original infrastructure in the US because it was cheaper at the time and most traffic to the websites I host comes from the US. The Wizards Mattermost instance was added later. It's for a group of friends that are all in Australia. Being in the US made it quite slow at times, especially when sharing and viewing images.

Another drawback to administering servers in the US from AU was that it makes the Ansible cycle time of "make a change, run it, fix it, repeat", excruciatingly slow. It had been on my to do list for a long time to move Wizards to Australia but I kept putting it off because I didn't want to deal with Ansible.

While having a single server that does everything wouldn't be the recommended architecture for business systems, for personal hosting where the small chance of downtime isn't going to result in loss of income the simplicity won out, at least for now.

This is what I ended up building. Each box is a Docker container running on the host machine:

Graph of services

I haven't always been in favour of Docker but I think enough time has passed to show that it's probably here to stay. There are some really nice benefits to Docker managed services too. Such as, building locally and then shipping the image to production, and isolation from the host system (in the sense you can just nuke the container and rebuild it if needed).

Picking a Host OS

Moving to Docker unfortunately ruled out FreeBSD as the host system. There is a very old Docker port for FreeBSD but my previous attempts at using it showed that it was not in a good enough state to use for hosting. That meant I needed to find a suitable Linux distro to act as the Docker host.

Coming from FreeBSD I'm a fan of the stable base + up-to-date packages model. For me this ruled out Debian (stable) based systems, which I find often have out-of-date or missing packages -- especially in the latter stages of the release cycle. I did some research to see if there were any distros that used a BSD style model. Most I found were either abandoned or one person operations.

I then recalled that as part of his Sourcehut work, Drew DeVault was migrating things to Alpine Linux. I had played with Alpine in the past (before it became famous in the Docker world), and I consider Drew's use some evidence in its favour.

Alpine describes itself as follows:

Alpine Linux is an independent, non-commercial, general purpose Linux distribution designed for power users who appreciate security, simplicity and resource efficiency.

Now that's a value statement I can get behind! Other things I like about Alpine Linux:

  • It's small, only including the bare essentials:
    • It avoids bloat by using musl-libc (which is MIT licensed) and busybox userland.
    • It has a 37 MB installation ISO intended for virtualised server installations.
  • It was likely to be (and ended up being) the base of my Docker images.
  • It enables a number of security features by default.
  • Releases are made every ~6 months and are supported for 2 years.

Each release also has binary packages available in a stable channel that receives bug fixes and security updates for the lifetime of the release as well as a rolling edge channel that's always up-to-date.

Note that Alpine Linux doesn't use systemd, it uses OpenRC. This didn't factor into my decision at all. systemd has worked well for me on my Arch Linux systems. It may not be perfect but it does do a lot of things well. Benno Rice did a great talk at linux.conf.au 2019, titled The Tragedy of systemd, that makes for interesting viewing on this topic.

Building Images

So with the host OS selected I set about building Docker images for each of the services I needed to run. There are a lot of pre-built Docker images for software like nginx, and PostgreSQL available on Docker Hub. Often they also have an alpine variant that builds the image from an Alpine base image. I decided early on that these weren't really for me:

  • A lot of them build the package from source instead of just installing the Alpine package.
  • The Docker build was more complicated than I needed as it was trying to be a generic image that anyone could pull and use.
  • I wasn't a huge fan of pulling random Docker images from the Internet, even if they were official images.

In the end I only need to trust one image from Docker Hub: the 5 MB Alpine image. All of my images are built on top of this one image.

Update 2 Mar 2019: I am no longer depending on any Docker Hub images. After the Alpine Linux 3.9.1 release I noticed the official Docker images had not been updated so I built my own. Turns out it's quite simple. Download the miniroot tarball from the Alpine website and then add it to a Docker image:

FROM scratch

# Illustrative values; 3.9.1 is the release mentioned above.
ARG ALPINE_VERSION=3.9.1
ARG ALPINE_ARCH=x86_64

ADD alpine-minirootfs-${ALPINE_VERSION}-${ALPINE_ARCH}.tar.gz /
CMD ["/bin/sh"]

An aspect of Docker that I don't really like is that inside the container you are root by default. When building my images I made a point of making the entrypoint processes run as a non-privileged user, or configuring the service to drop down to a regular user after starting.

Most services were fairly easy to Dockerise. For example here is my nginx Dockerfile:

FROM alpine:3.9

RUN apk update && apk add --no-cache nginx

COPY nginx.conf /etc/nginx/nginx.conf

RUN mkdir -p /usr/share/www/ /run/nginx/ && \
  rm /etc/nginx/conf.d/default.conf



ENTRYPOINT ["/usr/sbin/nginx", "-g", "daemon off;"]

I did not strive to make the images especially generic. They just need to work for me. However I did make a point not to bake any credentials into the images and instead used environment variables for things like that.

Let's Encrypt

I've been avoiding Let's Encrypt up until now. Partly because the short expiry of the certificates seems easy to mishandle. Partly because of certbot, the recommended client. By default certbot is interactive, prompting for answers when you run it the first time; it wants to be installed alongside the webserver so it can manipulate the configuration; it's over 30,000 lines of Python (excluding tests and dependencies); the documentation suggests running magical certbot-auto scripts to install it... Too big and too magical for my liking.

Despite my reservations I wanted to enable https on all my sites and I wanted to avoid paying for certificates. This meant I had to make Let's Encrypt work for me. I did some research and finally settled on acme.sh. It's written in POSIX shell and uses curl and openssl to do its bidding.

To avoid the need for acme.sh to manipulate the webserver config I opted to use the DNS validation method (certbot can do this too). This requires a DNS provider with an API so the client can dynamically manipulate the records. I looked through the large list of supported providers and settled on LuaDNS.

LuaDNS has a nice git based workflow where you define the DNS zones with small Lua scripts and the records are published when you push to the repo. They also have the requisite API for acme.sh. You can see my DNS repo at:

Getting the acme.sh + hitch combo to play nice proved to be a bit of a challenge. acme.sh needs to periodically renew certificates from Let's Encrypt; these then need to be formatted for hitch, and hitch told about them. In the end I built the hitch image off my acme.sh image. This goes against the Docker ethos of one service per container, but acme.sh doesn't run a daemon, it's periodically invoked by cron, so this seemed reasonable.

Docker and cron is also a challenge. I ended up solving that with a simple solution: use the host cron to docker exec in the hitch container. Perhaps not "pure" Docker but a lot simpler than some of the options I saw.


Hosting

I've been a happy DigitalOcean customer for 5 years but they don't have a data centre in Australia. Vultr, which has a similar offering -- low cost, high performance servers and a well-designed admin interface -- does have a Sydney data centre. Other obvious options include AWS and GCP. I wanted to avoid these where possible as their server offerings are more expensive, and their platforms have a tendency to lock you in with platform specific features. Also in the case of Google, they are a massive surveillance capitalist that I don't trust at all. So Vultr was my host of choice for the new server.

Having said that, the thing with building your own images is that you need to make them available to the Docker host somehow. For this I used an Amazon Elastic Container Registry. It's much cheaper than Docker Hub for private images and is just a standard container registry so I'm not locked in.


Once all the services were Dockerised, there needed to be a way to run the containers, and make them aware of each other. A popular option for this is Kubernetes and for a larger, multi-server deployment it might be the right choice. For my single server operation I opted for Docker Compose, which is, "a tool for defining and running multi-container Docker applications". With Compose you specify all the services in a YAML file and it takes care of running them all together.

My Docker Compose file looks like this:

version: '3'

services:
  hitch:
    command: ["--config", "/etc/hitch/hitch.conf", "-b", "[varnish]:6086"]
    volumes:
      - ./hitch/hitch.conf:/etc/hitch/hitch.conf:ro
      - ./private/hitch/dhparams.pem:/etc/hitch/dhparams.pem:ro
      - certs:/etc/hitch/cert.d:rw
      - acme:/etc/
    ports:
      - "443:443"
    env_file:
      - private/hitch/development.env
    depends_on:
      - varnish
    restart: unless-stopped

  varnish:
    command: ["-F", "-a", ":80", "-a", ":6086,PROXY", "-p", "feature=+http2", "-f", "/etc/varnish/default.vcl", "-s", "malloc,256M"]
    volumes:
      - ./varnish/default.vcl:/etc/varnish/default.vcl:ro
    ports:
      - "80:80"
    depends_on:
      - nginx
      - pkb
      - binary_trance
      - wizards
      - rust_melbourne
    restart: unless-stopped

  nginx:
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./volumes/www:/usr/share/www:ro
    restart: unless-stopped

  pkb:
    volumes:
      - pages:/home/pkb/pages:ro
    env_file:
      - private/pkb/development.env
    depends_on:
      - syncthing
    restart: unless-stopped

  binary_trance:
    env_file:
      - private/binary_trance/development.env
    depends_on:
      - db
    restart: unless-stopped

  wizards:
    volumes:
      - ./private/wizards/config:/mattermost/config:rw
      - ./volumes/wizards/data:/mattermost/data:rw
      - ./volumes/wizards/logs:/mattermost/logs:rw
      - ./volumes/wizards/plugins:/mattermost/plugins:rw
      - ./volumes/wizards/client-plugins:/mattermost/client/plugins:rw
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - db
    restart: unless-stopped

  rust_melbourne:
    volumes:
      - ./private/rust_melbourne/config:/mattermost/config:rw
      - ./volumes/rust_melbourne/data:/mattermost/data:rw
      - ./volumes/rust_melbourne/logs:/mattermost/logs:rw
      - ./volumes/rust_melbourne/plugins:/mattermost/plugins:rw
      - ./volumes/rust_melbourne/client-plugins:/mattermost/client/plugins:rw
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - db
    restart: unless-stopped

  db:
    volumes:
      - postgresql:/var/lib/postgresql/data
    env_file:
      - private/postgresql/development.env
    restart: unless-stopped

  syncthing:
    volumes:
      - syncthing:/var/lib/syncthing:rw
      - pages:/var/lib/syncthing/Sync:rw
    ports:
      - "22000:22000"
      - "21027:21027/udp"
    restart: unless-stopped

volumes:
  certs:
  acme:
  pages:
  postgresql:
  syncthing:

Bringing all the services up is one command:

docker-compose -f docker-compose.yml -f production.yml up -d

The best bit is I can develop and test it all in isolation locally. Then when it's working, push to ECR and then run docker-compose on the server to bring in the changes. This is a huge improvement over my previous Ansible workflow and should make adding or removing new services in the future fairly painless.
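That push/deploy cycle is a handful of commands; roughly this (a sketch — the registry login step depends on your ECR setup and is omitted here):

```
# Locally: build the images and push them to the private registry.
docker-compose build
docker-compose push

# On the server: fetch the new images and recreate any changed containers.
docker-compose -f docker-compose.yml -f production.yml pull
docker-compose -f docker-compose.yml -f production.yml up -d
```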

Closing Thoughts

The new server has been running issue free so far. All sites are now redirecting to their https variants with Strict-Transport-Security headers set and get an A grade on the SSL Labs test. The Wizards Mattermost is much faster now that it's in Australia too.

There is one drawback to this move though: my sites are now slower for a lot of visitors. https adds some initial negotiation overhead and if you're reading this from outside Australia there's probably a bunch more latency than before.

I did some testing with WebPageTest to get a feel for the impact of this. My sites are already quite compact. Firefox tells me this page and all its resources are 171KB, 54KB transferred, so there's not a lot of slimming to be done there. One thing I did notice was that the TLS negotiation was happening for each of the parallel connections the browser opened to load the site.

Some research suggested HTTP/2 might help as it multiplexes requests on a single connection and only performs the TLS negotiation once. So I decided to live on the edge a little and enable Varnish's experimental HTTP/2 support. Retrieving the site over HTTP/2 did in fact reduce the TLS negotiations to one.

Thanks for reading, I hope the bits didn't take too long to get from Australia to wherever you are. Happy computing!

Previous Post: My Rust Powered e-Paper Badge
Next Post: A Coding Retreat and Getting Embedded Rust Running on a SensorTag

Oleg Kovalov (olegkovalov)

What I don’t like in your repo March 13, 2019 06:35 AM

What I Don’t Like In Your Repo

Hi everyone, I’m Oleg and I’m yelling at (probably your) repo.

This is a copy of my dialogue with a friend about how to make a good and helpful repo for any community of any size and any programming language.

Let’s start.

README says nothing

But it’s a crucial part of any repo!

It’s the first interaction with your potential user, and the first impression you make on them.

After the name (and maybe a logo) it’s a good place to put a few badges like:

  • recent version
  • CI status
  • link to the docs
  • code quality
  • code coverage
  • even the number of users in a chat
  • or just browse the full list of available badges

Personal fail. Not so long ago I did a simple, hacky, and a bit funny project in Go called sabotage. I put up a quote from a song, added a picture, but… didn’t provide any info about what it does.

It takes like 10 minutes to write a simple intro explaining what I’m sharing and what it can do.

There is no reason why you or I should skip it.

Custom license or no license at all

First and most important: DO. NOT. (RE)INVENT. LICENSE. PLEASE.

When you’re going to create a shiny new license, or make an existing one much better, please ask yourself 17 times: what is the point of doing so?

Companies of any size are very conservative about licenses, ’cause the wrong one might destroy their business. So if you’re targeting a big audience, a custom license is a dumb way to reach it.

There are a lot of guides on how to select a license, and leaving a project unlicensed, or using an unpopular or joke license (like WTFPL), will just be a bad sign for a user.

Feel free to choose one of the most popular:

  • MIT — when you want to give it for free
  • BSD3 — when you want a bit more rights for you
  • Apache 2.0 — when it’s a commercial product
  • GPLv3 — which is also a good option

(that might be an opinionated list, but whatever)

No Dockerfile

It’s already 2019 and the containers have won this world.

It’s much simpler for anyone to run a docker pull foo/bar command than to download all the dependencies, configure paths, discover that some things are incompatible, or worry about totally destroying their system.

Is there a guarantee that there is no rm -rf in an unverified project? 😈

Adding a simple Dockerfile with everything needed can be done in 30 mins. But it will give your users a safe and fast way to start using, validating, or helping to improve your work. A win-win situation.
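For a typical Go project, that Dockerfile can be a few lines of multi-stage build. A minimal sketch (the binary name and layout are assumptions, not from any particular project):

```dockerfile
# Build stage: compile a static binary.
FROM golang:1.12 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Run stage: ship only the binary, nothing else.
FROM scratch
COPY --from=build /bin/app /bin/app
ENTRYPOINT ["/bin/app"]
```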

Changes without pull requests

That might look weird, but give me a second.

When a project is small and has zero or few users, that might be okay. It’s easy to follow what happened in the last few days: fixes, new features, etc. But when the scale gets bigger, oh… it becomes a nightmare.

When you push a few commits straight to master, you probably did it on your own computer; no one saw what happened, and there wasn’t any feedback. You may break API backward compatibility, forget to add or remove something, or even do useless work (oh, the nasty one).

When you open a pull request, some random guru-senior-architect might occasionally check your code and suggest a few changes. Sounds unlikely, but any additional pair of eyes might uncover bugs or architectural mistakes.

Do not hide your work, isn’t this a reason for open sourcing it?

Bloated dependencies

Maybe it’s just me but I’m very conservative with dependencies.

When I see dozens of deps in the lock file, the first question which comes to my mind is: so, am I ready to fix any failures inside any of them?

Yeah, it works today, maybe it worked 1 week/month/year before, but can you guarantee what will happen tomorrow? I cannot.

No styling or formatting

Different files (sometimes even functions) are written in different styles.

This causes trouble for contributors, ‘cause one prefers spaces and another prefers tabs. And that’s just the simplest example.

So what will be the result:

  • one file in one style and another in a completely different one
  • one with { at the end of a line and another with { on a new line
  • one function in a functional style and, right below it, one in pure procedural style

Which of them is right? I dunno. It’s acceptable if it works, but it horribly distracts readers for no reason.

Simple rule for this: use formatters and linters: eslint, gofmt, rustfmt… oh, tons of them! Feel free to configure them as you like, but keep in mind that the most popular configurations tend to feel the most natural.

No automatic builds

How can you verify that a user can build your code?

The answer is quite simple: a build system. TravisCI, GitlabCI, CircleCI, and that’s only a few of them.

Treat a build system as a silent companion that checks your every commit and automatically runs formatters/linters to ensure new code is of good quality. Sounds amazing, doesn’t it?

And adding the simple YAML file that describes how the build should be done takes minutes, as always.
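For example, a minimal .travis.yml for a Go project might look something like this (a sketch, not a drop-in config):

```yaml
language: go
go:
  - 1.12.x
script:
  - test -z "$(gofmt -l .)"   # fail the build on unformatted code
  - go vet ./...
  - go test ./...
```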

No releases or Git tags

Master branch might be broken.

That happens. It’s unpleasant, but it happens.

Some recent changes get merged and somehow break master. How much time will it take you to fix? A few minutes? An hour? A day? Until you’re back from vacation? Who knows ‾\_(ツ)_/‾

But when there is a Git tag pointing to a moment when the project was correct and buildable, oh, that’s a good thing to have, and it makes life much better for your users.

Adding a release on GitHub (likewise on GitLab or anywhere else) literally takes seconds; there’s no reason to omit this step.
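From the command line it is two commands (the version number here is just an example):

```
git tag -a v1.0.0 -m "Last known-good build"
git push origin v1.0.0   # tags are not pushed by default
```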

No tests

Well, it might be okay.

Of course, having proper tests is a nice thing, but you’re probably doing this project after work, in your free time or on a weekend (I’m guilty; I do this so often instead of resting).

So don’t be too strict with yourself; feel free to share your work and your knowledge. Tests can be added later; time with family and friends is more important, as is mental and physical health.


There are a lot of other things that would make your repo even better; maybe you’ll mention them in the comments?

What I don’t like in your repo was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

March 12, 2019

Kevin Burke (kb)

Phone Number for SFMTA Temporary Sign Office March 12, 2019 04:10 AM

The phone number for the SFMTA Temporary Sign Office is very difficult to find. The SFMTA Temporary Sign web page directs you to 311. 311 does not know the right procedures for the Temporary Sign Office.

The email address on the website is also slow to respond to requests. The Temporary Sign department address listed on the website, at 1508 Bancroft Avenue, is not open to the public; it's just a locked door.

To contact the Temporary Sign Office, call 415-550-2716. This is the direct line to the department. I reached someone in under a minute.

If your event is more than 90 days in the future, don't expect an update. They don't start processing signage applications until 90 days before the event.

Here's a photo of my large son outside of the SFMTA Temporary Sign Office, where I did not find anyone to speak with, but I found the phone number that got me the right phone number to get someone to give me an update on my application.

Using an AWS Aurora Postgres Database as a Source for Database Migration Service March 12, 2019 03:53 AM

Say you have an Aurora RDS PostgreSQL database that you want to use as the source database for AWS Database Migration Service (DMS).

The documentation is unclear on this point, so here you go: you can't use an Aurora RDS PostgreSQL database as the source database, because Aurora doesn't support replication slots, which are how Amazon DMS migrates data from one database to another.

Better luck with other migration tools!

Andreas Zwinkau (qznc)

The New Economics March 12, 2019 12:00 AM

A book review about systems thinking, statistics, learning, and psychology.

Read full article!

March 11, 2019

Maxwell Bernstein (tekknolagi)

Understanding the 100 prisoners problem March 11, 2019 04:04 PM

I visited my friends Chris and Yuki in Seattle. After lunch, Chris threw us a brainteaser: the 100 prisoners problem. For those not familiar, Minute Physics has a great YouTube video about it.

For those who would prefer not to watch a video, a snippet from the Wikipedia page is attached here:

In this problem, 100 numbered prisoners must find their own numbers in one of 100 drawers in order to survive. The rules state that each prisoner may open only 50 drawers and cannot communicate with other prisoners. At first glance, the situation appears hopeless, but a clever strategy offers the prisoners a realistic chance of survival.

and for some reason that snippet sounds like the voice-over to a movie trailer.

Since we did not have a good intuitive grasp of the solution and reasoning, we decided to simulate the experiment and run some numbers. When in doubt, implement it yourself, right?

The Minute Physics video has 100 boxes, but we should generalize to n. Since the boxes in the room are shuffled at the beginning of each experiment, we start by shuffling a list of the numbers 0 to n-1:

import random

def sample(n=100, limit=50):
    boxes = list(range(n))
    random.shuffle(boxes)
    return sum(try_find_self(boxes, person, limit) for person in range(n))

Then, for each person (which is the same as “for each box” in this case), attempt to find their hidden box using the method described in the video. Since try_find_self yields a success (True) or a failure (False), summing should give the number of people who found their boxes.

def try_find_self(boxes, start, limit):
    next_box = boxes[start]
    num_opened = 1
    while next_box != start and num_opened < limit:
        next_box = boxes[next_box]
        num_opened += 1
    return next_box == start

The try_find_self function implements the strategy described in the video: start at the box indexed by your number (not necessarily containing your number) and follow that linked list of boxes until you either hit the limit or find your number. If the next box at the end is yours, you have found your box!
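To sanity-check the strategy end to end, here is a self-contained sketch (restating the two functions above, with the shuffle explicit) that estimates the whole group's win probability for n=100, limit=50:

```python
import random

def try_find_self(boxes, start, limit):
    # Follow the cycle that starts at your own box index.
    next_box = boxes[start]
    num_opened = 1
    while next_box != start and num_opened < limit:
        next_box = boxes[next_box]
        num_opened += 1
    return next_box == start

def group_wins(n=100, limit=50):
    # The group wins only if *every* person finds their own number.
    boxes = list(range(n))
    random.shuffle(boxes)
    return all(try_find_self(boxes, person, limit) for person in range(n))

trials = 2000
wins = sum(group_wins() for _ in range(trials))
print(wins / trials)  # roughly 0.31 for n=100, limit=50
```

The estimate lands near the well-known ~31% success rate of the cycle-following strategy, far above the astronomically small chance of everyone guessing randomly.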

Now, this isn’t very interesting on its own. We can run an experiment, sure, but we still have to analyze the results of the data over multiple samples and varying parameters.

In order to do that, we made some visualizations. We start off by importing all of the usual suspects:

import random
import simulate

import matplotlib.pyplot as plt
import numpy as np

Then, in order to get reproducible results, seed the random number generator. This was essential for improving our implementations of both the visualizations and the simulations while verifying that the end results did not change.
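A minimal sketch of that seeding (the exact seed value is an assumption):

```python
import random
import numpy as np

# Fix both RNGs so every run draws the same shuffles and samples
# (the seed value itself is arbitrary).
random.seed(2019)
np.random.seed(2019)
```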


In order to get a feel for the effect of different parameters on the probability of a group of people winning, we varied the number of boxes and the maximum number of tries. It’s a good thing we tried this, since our intuition about how the results scale with the ratio was very wrong.

num_samples = 1000
max_tries_options = np.arange(5, 50, 10)
num_box_options = np.arange(10, 100, 10)

Since our sampler only takes one parameter pair at once, we have to vectorize our function. Note that we specify otypes, because otherwise vectorize has to run the sample function with the first input multiple times in order to determine the type of the output. This is a known issue and was very annoying to debug, given the randomness.

vsample = np.vectorize(simulate.sample, otypes=[int])

Now we take samples at all combinations of the parameter, num_samples number of times. This returns a large NumPy array with dimensions like results[sample_num][max_tries][num_boxes]. For each sample, all of the combinations of parameters are tried and returned in a 2D grid.

params = np.meshgrid(num_box_options, max_tries_options)
results = np.array([vsample(*params) for _ in range(num_samples)])

This produces some nice data, like this:

[[[10  2  0 ...  1  7  6]
  [10  4 30 ... 32  7  1]
  [10 20 30 ...  1 11 35]
  [10 20 30 ... 70 41 40]
  [10 20 30 ...  3 29 30]]


 [[ 4 13 18 ...  3  2  3]
  [10  0 30 ... 31 11 47]
  [10 20 30 ... 43  0 34]
  [10 20 30 ... 29 80 45]
  [10 20 30 ... 70 80 90]]]

While it’s all nice and good to know how many people in each sample found their boxes, we want to visualize the probability of a group winning. Remember that a group winning is defined by all of the n people finding their number in a box. To calculate that probability, we binarize the results and get the mean success rate across all the samples.

results_bin = np.sum(results == num_box_options, axis=0) / num_samples

This turns the results from above into an array like this:

[[0.337 0.012 0.    0.    0.    0.    0.    0.    0.   ]
 [1.    0.699 0.338 0.127 0.029 0.007 0.003 0.    0.   ]
 [1.    1.    0.836 0.545 0.304 0.181 0.072 0.038 0.012]
 [1.    1.    1.    0.871 0.662 0.462 0.316 0.197 0.093]
 [1.    1.    1.    1.    0.907 0.694 0.54  0.429 0.313]]

which has dimensions results_bin[max_tries][num_boxes].

If you are unfamiliar with the term binarize, I was too until last night. It means reduce to a success/failure value.

There are three interesting regions of this data, identifiable even before plotting:

  1. The bottom left field of 1s, which comes from allowing many tries compared to the number of boxes in the room.
  2. The top right field of 0s, which comes from allowing few tries compared to the number of boxes in the room. They shouldn’t really be exactly zero, but winning is so rare there that we would need many more samples to observe one.
  3. The middle “normal” numbers.

Let’s chart the data and see what this looks like in beautiful shades of purple:

ax = plt.axes()
contour = ax.contourf(*params, results_bin)
ax.set_xlabel('num boxes')
ax.set_ylabel('max tries allowed')
ax.set_title('probability of group win')

Note that this graph was generated with 1000 samples, and intervals of 1 for max_tries_options and num_box_options, which is different than the above code snippets. It took a while to generate the data.

On the x-axis we have the total number of both people and boxes and on the y-axis we have the maximum number of tries that each person is given to find their box. This confirms Minute Physics’ conclusion about the probability of everyone winning using the strategy. It also provides a handy way of testing your own strategy against the proposed one and seeing how often you lead your group to success! Feel free to send any interesting ones in.

If Chris, Yuki, and I have time, we’ll update this post with a more efficient simulation so it doesn’t take so dang long to generate the data. We also have another visualization lying around that contains the different probability distributions for all the configuration settings, but haven’t written about it… yet.

There’s some sample code in the repo — check it out and let us know what you think. We found that re-writing the simulation as a Python C-extension improved speeds 20x, so there’s also a small C++ program in there.

March 10, 2019

Ponylang (SeanTAllen)

Last Week in Pony - March 10, 2019 March 10, 2019 03:19 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

March 09, 2019

Gustaf Erikson (gerikson)

February March 09, 2019 08:30 PM

Andreas Zwinkau (qznc)

Deriving Story Points March 09, 2019 12:00 AM

Story points are a useful technique for improving predictions, but they have limits because they lack a statistical foundation.

Read full article!

March 08, 2019

Jan van den Berg (j11g)

Getting Things Done – David Allen March 08, 2019 06:38 PM

For some reason I had never read the David Allen classic Getting Things Done. But I found out that 18 years after its release it’s still a good introduction to time and action management.

Getting Things Done – David Allen (2001) – 220 pages

David Allen tries to make the natural, systematic. He does so by introducing a 5 step workflow: capture, clarify, organize, reflect, and engage. Allen does a great job of explaining these steps with real world examples and sprinkles his text with inspiring and relevant quotes. His system is very much based in the physical world — notes, folders, file cabinets etc. — which can feel a bit outdated, but does make sense (as he explains).

GTD in less than 200 words

GTD is a way of thinking about organizing. And it has elements you can also find in other organisation methods. But GTD really focuses around three main concepts.

1. Put everything on a list

Yes, everything. The idea is to clear your head, and use your brain to think about things, not to think of things.

2. Define the next ACTION

This is really the hardcore key concept of GTD. Define the next step. Think about results and decide the next action. And it is very important that the next step is an action. If your car needs a check-up, your list entry is not “Car check-up”; your next action and list entry is “Call the garage to make an appointment”. But you may discover that you need the phone number first. So your next action becomes: look up the garage’s phone number. Get it?

3. Update actions

When you’ve written down everything you need or want to do in your system (1), and decided on the next action (2), your system will only work if you regularly revise your system. You do so by updating or working on your actions.


I can see how the GTD method can work, when you stick to it. And even if I don’t think I will apply GTD fully, I certainly take away some key concepts. And I like how the system tries to empower our natural abilities, and to let your brain do what your brain is good at. That is: not keeping track of things, but creating new things.

My only reservation is that the people who could really benefit from such a system are usually already in over their heads, so they would need a coach (or other outside help) to successfully implement GTD.

I enjoyed reading GTD and would argue it’s worth reading at least once. Just reading it seems to activate a mental process of wanting to organise and declutter. How else can you explain that I just ordered a label printer and 60 feet of bookshelves?

The post Getting Things Done – David Allen appeared first on Jan van den Berg.

Nikola Plejić (nikola)

Developing Web Services in Rust: my talk at the Zagreb Rust Meetup March 08, 2019 04:30 PM

I've given a short overview of Rust's support for writing web services at the Zagreb Rust meetup. It was all in Croatian, but there's some code with the occasional comment in English which may or may not be useful.

There's also an org-mode file in the repository, containing the vast majority of the stuff I've covered in the talk. Incidentally, GitLab does a far better job at rendering org-mode files than GitHub does, so you can enjoy a beautiful render without notable loss of content.

March 06, 2019

Jan van den Berg (j11g)

Leonardo da Vinci – Walter Isaacson March 06, 2019 07:26 PM

My favorite biographer, Walter Isaacson, did it again. He created a gorgeously illustrated book about the quintessential renaissance man, Leonardo da Vinci. The book is based on the mind blowing — in number and content — 7200 pages of notes Leonardo left behind (which probably only accounts for one quarter, the rest is lost). As far as I am concerned this biography is the definitive introduction to this left-handed, mirror writing, ever procrastinating, sculpting, painting, stargazing, riddle creating, bird watching, theatre producing, water engineering, corpse dissecting, observing and ever curious dandy polymath.

Leonardo da Vinci – Walter Isaacson (2017) – 601 pages

“Leonardo’s notebooks are nothing less than an astonishing windfall that provides the documentary record of applied creativity.”

Walter Isaacson

I don’t want to go into too much detail about Leonardo da Vinci; just read the book! But needless to say he was one of a kind: his mind worked differently from other people’s, and he made widely varying discoveries. I always thought he must have been a reclusive person, because he was so far ahead of his time — sometimes centuries — that he could not have enjoyed present company. But this couldn’t be further from the truth.

Leonardo was very much a people person. And this is one of the key arguments Isaacson makes about Leonardo’s art and skill: not only was he a keenly curious (the most curious) observer and tinkerer, but he also sought out collaborators to bounce ideas off. Isaacson makes a strong case that Leonardo became, and remained, a genius because of the combination of these things.

A different print than my copy, but still gorgeous. Also, my copy has an autograph 😉

As I’ve come to expect of biographies by Isaacson, his own personal passion and admiration for the subject shine through, which is why I always enjoy his writing. Of course, some things that happened 500 years ago are up for debate, but Isaacson brings enough knowledge and backstory to his findings to reach mostly natural conclusions. This book does an especially good job of going through da Vinci’s life chronologically while still managing to show the cross-sections and connections between art and science (and everything else) throughout Leonardo’s life. And with Leonardo everything was interconnected and related, so this is quite an accomplishment!

All of Leonardo’s skills and knowledge, of course, came together in the painting he worked on for 16 years. The Mona Lisa. The book beautifully works towards that conclusion. And by reading this book you come away with a deeper understanding and appreciation of what exactly it is you’re looking at.

The post Leonardo da Vinci – Walter Isaacson appeared first on Jan van den Berg.

Derek Jones (derek-jones)

Regression line fitted to noisy data? Ask to see confidence intervals March 06, 2019 06:13 PM

A little knowledge can be a dangerous thing. For instance, knowing how to fit a regression line to a set of points, but not knowing how to figure out whether the fitted line makes any sense. Fitting a regression line is trivial with most modern data analysis packages; it’s difficult to find data that any of them fail to fit to a straight line (even randomly selected points usually contain enough bias in one direction to enable the fitting algorithm to converge).

Two techniques for checking the goodness-of-fit, of a regression line, are plotting confidence intervals and listing the p-value. The confidence interval approach is a great way to visualize the goodness-of-fit, with the added advantage of not needing any technical knowledge. The p-value approach is great for blinding people with science, and a necessary technicality when dealing with multidimensional data (unless you happen to have a Tardis).
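To make the confidence-interval technique concrete, here is a small sketch on synthetic data (not the Nationwide numbers) that fits a line and reports a 95% confidence interval on the slope:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 0.1 * x + rng.normal(0, 2, size=x.size)  # weak trend buried in noise

fit = stats.linregress(x, y)
# 95% confidence interval on the slope: estimate +/- t * standard error
t = stats.t.ppf(0.975, df=x.size - 2)
lo, hi = fit.slope - t * fit.stderr, fit.slope + t * fit.stderr
print(f"slope = {fit.slope:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# If the interval straddles zero, the data cannot even tell you the
# direction of the trend, which is exactly the situation discussed below.
```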

In 2016, the Nationwide Mutual Insurance Company won the IEEE Computer Society/Software Engineering Institute Watts S. Humphrey Software Process Achievement (SPA) Award, and there is a technical report, which reads like an infomercial, on the benefits Nationwide achieved from using SEI’s software improvement process. Thanks to Edward Weller for the link.

Figure 6 of the infomercial technical report caught my eye. The fitted regression line shows delivered productivity going up over time, but the data looks very noisy. How good a fit is that regression line?

Thanks to WebPlotDigitizer, I quickly extracted the data (I’m a regular user, and WebPlotDigitizer just keeps getting better).

Below is the data plotted to look like Figure 6, with the fitted regression line in pink (code+data). The original did not include tick marks on the axis. For the x-axis I assumed each point was at a fixed 2-month interval (matching the axis labels), and for the y-axis I picked the point just below the zero to measure length (so my measurements may be off by a constant multiplier close to one; multiplying values by a constant will not have any influence on calculating goodness-of-fit).

Nationwide: delivery productivity over time; extracted data and fitted regression line.

The p-value for the fitted line is 0.15; gee-whiz, you say. Plotting with confidence intervals (in red; the usual 95%) makes the situation clear:

Nationwide: delivery productivity over time; extracted data and fitted regression line with 95% confidence intervals.

Ok, so the fitted model is fairly meaningless from a technical perspective; the line might actually go down, rather than up (there is too much noise in the data to tell). Think of the actual line likely appearing somewhere in the curved red tube.

Do Nationwide, IEEE or SEI care? The IEEE need a company to award the prize to, SEI want to promote their services, and Nationwide want to convince the rest of the world that their IT services are getting better.

Is there a company out there who feels hard done-by, because they did not receive the award? Perhaps there is, but are their numbers any better than Nationwide’s?

How much influence did the numbers in Figure 6 have on the award decision? Perhaps not a lot; the other plots look like they would tell a similar tale of wide confidence intervals on any fitted lines (readers might like to try their hand at drawing confidence intervals for Figure 9). Perhaps Nationwide was the only company considered.

Who are the losers here? Other companies who decide to spend lots of money adopting the SEI software process? If evidence was available, perhaps something concrete could be figured out.

March 04, 2019

Derek Jones (derek-jones)

Polished human cognitive characteristics chapter March 04, 2019 01:32 AM

It has been just over two years since I released the first draft of the Human cognitive characteristics chapter of my evidence-based software engineering book. As new material was discovered, it got added where it seemed to belong (at the time); no effort was invested in maintaining any degree of coherence.

The plan was to find enough material to paint a coherent picture of the impact of human cognitive characteristics on software engineering. In practice, finishing the book in a reasonable time-frame requires that I stop looking for new material (assuming it exists) and go with what is currently available. There are a few datasets that have been promised, and having these would help fill some holes in the later sections.

The material has been reorganized into what is essentially a pass over what I think are the major issues, discussed via studies for which I have data (the rule of requiring data for a topic to be discussed, gets bent out of shape the most in this chapter), presented in almost a bullet point-like style. At least there are plenty of figures for people to look at, and they are in color.

I think the material will convince readers that human cognition is a crucial topic in software development; download the draft pdf.

Model building by cognitive psychologists is starting to become popular, with probabilistic languages, such as JAGS and Stan, becoming widely used. I was hoping to build models like this for software engineering tasks, but it would have taken too much time, and will have to wait until the book is done.

As always, if you know of any interesting software engineering data, please let me know.

Next, the cognitive capitalism chapter.

Pete Corey (petecorey)

Secure Meteor is Live March 04, 2019 12:00 AM

The big day is finally here. Secure Meteor is live and available for purchase!

Secure Meteor is the culmination of all of my work as a Meteor security professional. Between the years of 2014 and 2017 I completely immersed myself in the Meteor ecosystem and became increasingly focused on the unique security characteristics of Meteor applications. I wrote and spoke about Meteor security, built security-focused tools and packages for the Meteor ecosystem, and worked hands-on with talented teams to better secure their Meteor applications. Secure Meteor is the embodiment of everything I learned about Meteor security during that time.

It’s my goal that reading Secure Meteor will teach you the ins and outs of the various attack vectors present in your Meteor application, and will also teach you how to see your application through the eyes of a potential attacker.

Check out the Secure Meteor page for more details, sample chapters, and to snag your copy today!

On a personal note, it’s been over a year since I first announced and started working on Secure Meteor. There were many times over the past year when I never thought I’d finish. Writing a book has always been a personal goal of mine, and I couldn’t be more happy to have persevered and seen this project through to completion.

I deeply believe that Secure Meteor is a valuable addition to the Meteor community, and I’m happy to be giving back to a community that has given so much to me over the years.

Thanks for all of your support.

March 03, 2019

Gokberk Yaltirakli (gkbrk)

Writing a Simple IPFS Crawler March 03, 2019 04:38 PM

IPFS is a peer-to-peer protocol that allows you to access and publish content in a decentralized fashion. It uses hashes to refer to files. Short of someone posting hashes on a website, discoverability of content is pretty low. In this article, we’re going to write a very simple crawler for IPFS.

It’s challenging to build a traditional search engine on IPFS because content rarely links to other content. But there is another way to discover content than blindly following links like a traditional crawler.

Enter DHT

In IPFS, the content for a given hash is found using a Distributed Hash Table, which means our IPFS daemon receives requests about the location of IPFS objects. When all the peers do this, a key-value store is distributed among them; hence the name Distributed Hash Table. Even though we won’t see all the queries, we will still see a fraction of them. We can use these to discover when people put files on IPFS and announce them on the DHT.
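
As a rough sketch of how a DHT decides which peer is responsible for which key, here is a minimal, illustrative Rust example of Kademlia-style XOR placement (the routing scheme IPFS's DHT is based on). The u64 IDs and hard-coded peer list are simplified stand-ins for real 256-bit peer and content hashes:

```rust
// Illustrative only: peers and keys are u64 IDs instead of 256-bit hashes.
// In Kademlia (the DHT family IPFS uses), a key is stored on the peers
// whose IDs are closest to the key under the XOR metric.

fn xor_distance(a: u64, b: u64) -> u64 {
    a ^ b
}

// Return the peer with the smallest XOR distance to `key`.
fn responsible_peer(peers: &[u64], key: u64) -> u64 {
    *peers
        .iter()
        .min_by_key(|&&p| xor_distance(p, key))
        .expect("need at least one peer")
}

fn main() {
    let peers = [0x1000, 0x8000, 0xF000];
    // 0x8123 is closest (by XOR) to peer 0x8000, so that peer ends up
    // holding the provider record announced via handleAddProvider.
    println!("{:#x}", responsible_peer(&peers, 0x8123)); // 0x8000
}
```

This is why our daemon only sees a fraction of announcements: each announcement is routed to the handful of peers closest to the key, and we are only ever in that set for some keys.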

Fortunately, IPFS lets us see those DHT queries from the log API. For our crawler, we will use the Rust programming language and the ipfsapi crate for communicating with IPFS. You can add ipfsapi = "0.2" to your Cargo.toml file to get the dependency.

Using IPFS from Rust

Let’s test if our IPFS daemon and the IPFS crate are working by trying to fetch and print a file.

let api = IpfsApi::new("", 5001);

let bytes = api.cat("QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u")?;
let data = String::from_utf8(bytes.collect())?;

println!("{}", data);

This code should grab the contents of the hash, and if everything is working print “Hello World”.

Getting the logs

Now that we can download files from IPFS, it’s time to get the logged events from the daemon. To do this, we can use the log_tail method to get an iterator of all the events. Let’s print everything we get from the logs to the console.

for line in api.log_tail()? {
    println!("{}", line);
}
This gets us all the logs, but we are only interested in DHT events, so let’s filter a little. A DHT announcement looks like this in the JSON logs.

{
  "duration": 235926,
  "event": "handleAddProvider",
  "key": "QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u",
  "peer": "QmeqzaUKvym9p8nGXYipk6JpafqqQAnw1ZQ4xBoXWcCrLb",
  "session": "ffffffff-ffff-ffff-ffff-ffffffffffff",
  "system": "dht",
  "time": "2018-03-12T00:32:51.007121297Z"
}

We are interested in all the log entries with the event handleAddProvider; the hash of the IPFS object is in the key field. We can filter the iterator like this.

let logs = api.log_tail()?
    .filter(|x| x["event"].as_str() == Some("handleAddProvider"))
    .filter(|x| x["key"].is_string());

for log in logs {
    let hash = log["key"].as_str().unwrap().to_string();
    println!("{}", hash);
}

Grabbing the valid images

As a final step, we’re going to save all the valid image files that we come across, using the image crate. For each object we find, we’ll try parsing it as an image file; if that succeeds, we likely have a valid image that we can save.

Let’s write a function that loads an image from IPFS, parses it with the image crate and saves it to the images/ folder.

fn check_image(hash: &str) -> Result<(), Error> {
    let api = IpfsApi::new("", 5001);

    let data: Vec<u8> = api.cat(hash)?.collect();
    let img = image::load_from_memory(data.as_slice())?;

    println!("[!!!] Found image on hash {}", hash);

    let path = format!("images/{}.jpg", hash);
    let mut file = File::create(path)?;
    img.save(&mut file, image::JPEG)?;

    Ok(())
}


Then we connect this to our main loop. We check each image in a separate thread, because IPFS can take a long time to resolve a hash or time out.

for log in logs {
    let hash = log["key"].as_str().unwrap().to_string();
    println!("{}", hash);

    thread::spawn(move || check_image(&hash));
}

Possible improvements / future work

  • File size limits: check the size of objects before downloading them.
  • More file types: save more file types, determining the type with a utility like file.
  • Parsing HTML: when an object is valid HTML, parse it and index the text in order to provide search.
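
The "more file types" idea above could be sketched with a small magic-byte sniffer, similar in spirit to the file utility; this helper is hypothetical and not part of the original crawler:

```rust
// Hypothetical magic-byte sniffer: look at the first bytes of an IPFS
// object to guess its type, instead of only trying to parse it as an image.
fn sniff_type(data: &[u8]) -> &'static str {
    match data {
        [0xFF, 0xD8, 0xFF, ..] => "jpeg",
        [0x89, b'P', b'N', b'G', ..] => "png",
        [b'G', b'I', b'F', b'8', ..] => "gif",
        [b'%', b'P', b'D', b'F', ..] => "pdf",
        _ => "unknown",
    }
}

fn main() {
    // A JPEG file starts with the bytes FF D8 FF.
    println!("{}", sniff_type(&[0xFF, 0xD8, 0xFF, 0xE0])); // jpeg
}
```

Checking a few header bytes is much cheaper than fully decoding an object, so it also helps with the file-size concern: unknown blobs can be skipped before downloading them in full.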

Ponylang (SeanTAllen)

Last Week in Pony - March 3, 2019 March 03, 2019 03:58 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Gustaf Erikson (gerikson)

January March 03, 2019 01:22 PM

The only image worth posting this month is in this post.

Jan 2018 | Jan 2017 | Jan 2016 | Jan 2015 | Jan 2014 | Jan 2013 | Jan 2012 | Jan 2011 | Jan 2010 | Jan 2009

November March 03, 2019 10:59 AM

March 02, 2019

Grzegorz Antoniak (dark_grimoire)

Is it worth using make? March 02, 2019 06:00 AM

You may think you've created a nice and tidy Makefile, which tracks dependencies and works across many different operating systems, but ask yourself these questions (and answer honestly):

Which compiler does your Makefile support?

Is it GCC? Or Clang? Some of their options are pretty similar, so it may not …

March 01, 2019

Ponylang (SeanTAllen)

0.27.0 Released March 01, 2019 05:00 AM

Pony 0.27.0 is a big release for us. LLVM 7, 6, and 5 are all now fully supported by Pony. This is the last release that will support LLVM 3.9.1.

Additionally, there are a number of important fixes and a couple of breaking changes in the release. Keep reading for more information.

February 28, 2019

Patrick Louis (venam)

Time On The Internet February 28, 2019 10:00 PM

Trapped Within

Time can be measured in all sorts of ways, some more accurate than others, but the perception of its flow varies widely with subjective experience. That’s the distinction between physical and psychological time.
Psychological time both influences and is influenced by our cognitive systems: it shapes how we act and respond to the information and events around us, and they in turn shape it.

Boredom is one fascinating stimulus, or lack thereof. According to research, it makes us feel as if time is passing slowly: minutes can feel like hours, and things that happened yesterday feel like they happened a week ago. Boredom digs us deep into the ground, stifling our performance; it is associated with depression, lack of satisfaction, lack of alertness, and other nasty effects. Yet a new study has found that when experiencing boredom we are prone to be more altruistic, empathic, and perhaps more creative. The reason could be that boredom incites daydreaming, or the hope of escaping that mental state.
Routine resembles boredom but has a time-lapse effect: weeks go by so quickly we may not notice them. Unsurprisingly, emotions are also related to how time feels. For instance, when facing a threat, or in a state of fear, time slows down.

Tying these together, a theory proposes that time perception is tightly linked to how we pay attention. If we had money for the hours in a day, say $24, we would spend it with our attention. The parts of the day we allocate extra money to are the ones that feel slower.
Another way to think about it is to imagine it as a beat. The faster the beat, the slower it feels.


Thus a lack of stimuli leads to time warps, while over-stimulation, or boredom accompanied by inaction, makes us feel trapped in a time cage.

Time can pass quickly on the internet: news gets old fast, and boredom or routine follows. What are the emotional states affecting this awkward time dimension?

It begins with the war for attention and the information overload we are presented with everyday.

“There are things you don’t know about that you absolutely need to know about, and please remember this other thing too at the same time”, is what is being shouted at us. It’s easy to fall prey and cave in to the pressure to follow along, the new internet version of keeping up with the Kardashians.
Which news is worthy and which is not? What are the things that captivate us and make us spend our precious attention coins, consequently dilating our time on them? There are many actors with different incentives who would benefit greatly from our attention. They may bore us in a time capsule, making us feel scammed and robbed; they may fall into a routine, or play with our emotions.

The upfront players in the online space are advertisers, they keep the machine running. Marketers know their craft of branding and exposure, spamming us with their names. Their goal is to remind us that they exist, and this is achieved through repetition.
Unfortunately, everything else has started to interact with us in the same fashion. In marketing, popularity is key, while with the rest it’s supposed to be about value. Nonetheless, trendism is in vogue: “the belief that an already-trending topic deserves to be promoted”. Repetition is even praised as a strategy to brand oneself over and over again, because of the short shelf life of media, so that followers spend enough time to form an impression. Everything is a product.
Keep in mind that information is defined as anything that is surprising and new, something of value. Things that tell us what we already know are worthless. Yet repetition has become the norm: so much brain grinding and mashing of the same topics over and over again. An eternity of time spent on the same topics.
It’s even more upsetting that less than 60 percent of web traffic is human; the rest is fake. This means a lot of the refurbished grinding we get is based on metrics that don’t exist, using algorithms that know us only on a superficial level. An algorithm can hardly learn novelty, at least not yet.
Hence the unoriginality and predictability that are spreading fast are nothing extraordinary.

Not limited to algorithms, that predictability permeates everything.
The web has dwindled in its attempts at being daring, original, or standing out. People love novelty and seek it, yet it seems to fade away remarkably fast. As soon as something new comes out, it’s instantly part of the copy-paste culture, part of the memetic. Additionally, we’ve all become trained critics, picking over every little thing. It’s risky to bet on being audacious and easier to bet on the tried-and-tested. We’ve turned into Edmund Burke in the Burke vs Paine debate, or Hem in the book “Who Moved My Cheese”.

This clearly explains why nostalgia is such a business. It’s better to fall back on the “good ol’ days” with rosy retrospection than to enter the unknown. The internet is too serious to allow things to go haywire: stay in the tunnel and don’t divert.

We now talk about digital presence and digital identity, and it matters. It blurs the lines between our professional, personal, and online lives, mixing them all in a bowl that is neither neat nor tidy. The internet is important business!
In this era of hyper-usefulness and over-rationalization, everything has to serve an obligation, has to be straightforward, has to intermix work and social appearances. You have to watch your step, as there may be unforeseen consequences you will be judged on. Nothing disappears in the online world. As a consequence, we suppress our speech; we’re held in the cage of time even more; we can’t escape boredom because it’s too risky and too real.

This opens the door for tribality, propaganda, and political affiliations. A great amount of discussions are diverted into political messages using the time jail to their advantage. Black PR, fake news, state trolls, all use repetition and emotional manipulation to spread ideas as viruses on the internet. The internet is a commodity.

We are watched, remembered, and measured, deliberately and not so deliberately. Privacy is a beloved issue we like to see brought up in the news. We’re on edge; we can’t make mistakes. “L’enfer, c’est les autres.” (Hell is other people.)

“We have arrived at a version where everything seems to be just another version of LinkedIn. Every online space is supposed to get you a job or a partner or a stronger personal brand so you can accomplish the big, public-record goals of life. The public marketplace is everywhere. It’s an interactive and immersive CV, an archive. It all counts, and it all matters.” (The decline of Snapchat and the secret joy of internet ghost towns)

LinkedIn displays the epitome of this with its SSI, the Social Selling Index, a metric to calculate how successful you are. This can’t be more explicit. “By checking out your SSI, you’ll see how you stack up against your industry peers and your network on LinkedIn.”
“There are things you don’t know about that you absolutely need to know about, and please remember this other thing too at the same time”, “furthermore, you are behind and need to be productive”.

As in real life, comparing yourself with others drags along the phantom of unhappiness. It sings along to the tune of the thriving productivity craze that has emerged: histrionic productivity, seemingly something wanted or needed in the age of multitasking and constant interruptions.
Time runs fast when you see that everyone is already leading the race and you’re far behind. Days are not enough to complete the never-ending TODO lists. We’re getting old so fast; how could we possibly keep up before being discarded as an unwanted product! Guilty of not displaying our labor; a worthlessness hole. How to be worthy in a place where unworthiness is the norm.
Boredom may lead to creativity…

Then marketers know exactly our wish to escape and offer us a pre-packaged solution to our external validation needs. Using our time, we buy molds and frameworks advertised as metamorphosis.

How to genuinely break out of the cocoon.
How to not go the effortless apathetic advertised way.
How to take back control of time.

To the statistic that 40% of online traffic is fake, we have to add that actual contribution and participation comes from a mere 1 to 3% of users. The rest are lurkers.
As we’ve said boredom accompanied with inaction makes us feel trapped in a time cage.

So what should we do to govern our own content instead of passively consuming? Unfortunately, we start with a bad hand. It’s frightening to be a voice on the fringe, because the house plays against you.
Maybe if we moved to a different host, or became hosts ourselves, it would change the rules. What we do to maintain our homes is certainly worthy of our time. When you’re building your own living space, you shouldn’t worry about what happens in the neighbours’ houses.
Indeed, there are many barriers stopping us, making us think we’re not good enough, making us believe we aren’t original or interesting. Hence we keep quiet, so as not to bother the owners and roommates.

We then build a preference for ephemeral media, places where our voices dissipate quickly, where we’re anonymous but can nevertheless talk without holding back.
Still we are in the dungeon where no one can see, living an illusion in a parallel time that doesn’t affect the current one.

You have to take the effort and step out.
Break the algorithms, use them to your advantage.

Then space-time curves on itself.

time curvature

Time is relative, on the internet.
A watched pot never boils!

And this concludes the small amount of time I’ll get from you on the internet.

Because even this post is a repetition.

Derek Jones (derek-jones)

Modular vs. monolithic programs: a big performance difference February 28, 2019 03:15 PM

For a long time now I have been telling people that no experiment has found a situation where the treatment (e.g., use of a technique or tool) produces a performance difference that is larger than the performance difference between the subjects.

The usual results are that differences between people are the source of the largest performance difference; successive runs are the next largest (i.e., people get better with practice); and the smallest performance difference occurs between using/not using the technique or tool.

This is rather disheartening news.

While rummaging through a pile of books I had not looked at in many years, I (re)discovered the paper “An empirical study of the effects of modularity on program modifiability” by Korson and Vaishnavi, in “Empirical Studies of Programmers” (the first one in the series). It’s based on Korson’s 1988 PhD thesis, with the same title.

There were four experiments, involving seven people from industry and nine students, each involving modifying a 900(ish)-line program in some way. There were two versions of each program, they differed in that one was written in a modular form, while the other was monolithic. Subjects were permuted between various combinations of program version/problem, but all problems were solved in the same order.

The performance data (time to complete the task) was published in the paper, so I fitted various regressions models to it (code+data). There is enough information in the data to separate out the effects of modular/monolithic, kind of problem and subject differences. Because all subjects solved problems in the same order, it is not possible to extract the impact of learning on performance.

The modular/monolithic performance difference was around twice as large as the difference between subjects (removing two very poorly performing subjects reduces the difference to 1.5). I’m going to have to change my slides.

Would the performance difference have been so large if all the subjects had been experienced developers? There is not a lot of well written modular code out there, and so experienced developers get lots of practice with spaghetti code. But, even if the performance difference is of the same order as the difference between developers, that is still a very worthwhile difference.

Now there are lots of ways to write a program in modular form, and we don’t know what kind of job Korson did in creating, or locating, his modular programs.

There are also lots of ways of writing a monolithic program, some of them might be easy to modify, others a tangled mess. Were these programs intentionally written as spaghetti code, or was some effort put into making them easy to modify?

The good news from the Korson study is that there appears to be a technique that delivers larger performance improvements than the difference between people (replication needed). We can quibble over how modular a modular program needs to be, and how spaghetti-like a monolithic program has to be.

February 25, 2019

Pete Corey (petecorey)

Secure Meteor Releasing Next Week! February 25, 2019 12:00 AM

You may have noticed that I haven’t been doing much publicly in 2019. I haven’t released any new articles, I haven’t sent out any newsletters, and I’ve been relatively quiet on Twitter.

Underneath the surface of this calm lake of inactivity, my little duck feet have been churning. I’ve been pursuing one of my main goals for the new year and working diligently to finish my first book, Secure Meteor. I’m excited to announce that I’ll be releasing Secure Meteor early next week!

Between the years of 2014 and 2017, I lived and breathed Meteor security. I spent those years writing and speaking about Meteor security, developing and deploying secure Meteor applications, working with amazing teams to better secure their Meteor applications, and building security-focused packages and tools for the Meteor ecosystem.

While Meteor doesn’t play as central of a role in my day-to-day development work today, it would be a shame to throw away all of the knowledge and expertise I built up around the ins-and-outs of securing Meteor applications.

Secure Meteor is an effort to capture and distill everything I’ve learned about Meteor security from my years of real-world Meteor security experience.

I’m happy to announce that over a year after originally announcing Secure Meteor, it’s ready to be released! The final product is one hundred ten pages of what I consider to be vitally important information on securing your Meteor application. If you are actively developing Meteor applications, or own a Meteor application living in production, Secure Meteor can help bring you understanding and peace of mind in the unforgiving world of software security.

Be sure to check out a few of the sample chapters to whet your whistle for next week’s release.

If you’re interested in the book, sign up for the Secure Meteor newsletter. Subscribers will be the first to know when Secure Meteor launches, and I might even offer them an initial discount for all of their support.

I couldn’t be more excited for next week. See you then!

February 24, 2019

Derek Jones (derek-jones)

Evidence-based election campaigning February 24, 2019 09:30 PM

I was at a hackathon on evidence-based election campaigning yesterday, organized by Campaign Lab.

My previous experience with politically oriented hackathons was a Lib Dem hackathon; the event was only advertised to party members, and I got to attend because a fellow hackathon-goer (who is a member) invited me along. We spent the afternoon trying to figure out how to extract information on who turned up to vote from photocopies of the lists of people eligible to vote, marked up by the people who hand out ballot papers.

I have also been to a few hackathons where the task was to gather and analyze information about forthcoming, or recent, elections. There did not seem to be a lot of information publicly available, and I had assumed that the organization, and spending power, of the UK’s two main parties (i.e., Conservative and Labour) meant that they did have data.

No, the main UK political parties don’t have lots of data, in fact they don’t have very much at all, and make hardly any use of what they do have.

I had a really interesting chat with Campaign Lab’s Morgan McSweeney, about political campaigning, and how it has not been evidence-based. There were lots of similarities with evidence-based software engineering, e.g., a few events (such as the Nixon vs. Kennedy and Bill Clinton elections) created campaigning templates that everybody else now follows. James Moulding drew diagrams showing Labour organization and Conservative non-organization (which looked like a Dalek) and Hannah O’Rourke spoiled us with various kinds of biscuits.

An essential component of evidence-based campaigning is detailed knowledge of the outcome, such as: how many votes did each candidate get? Based on past hackathon experience, I thought this data was only available for recent elections, but Morgan showed me that Wikipedia had constituency level results going back many years. Here was a hackathon task; collect together constituency level results, going back decades, in one file.

Following the Wikipedia citations led me to Richard Kimber’s website, which had detailed results at the constituency level going back to 1945. The catch was that there was a separate file for each constituency; I emailed Richard, asking for a file containing everything (Richard promptly replied, the only files were the ones on the website).


The following plot was created using some of the data made available during a hackathon at the Office of National Statistics (sometime in 2015). We (Pavel+others and me) did not make much use of this plot at the time, but it always struck me as interesting. I showed it to the people at this hackathon, who sounded interested. The plot shows the life-expectancy for people living in a constituency where the Conservative (blue)/Labour (red) candidate won the 2015 general election by a given percentage margin, over the second-placed candidate.

Life-expectancy for people living in a constituency where the Conservative/Labour candidate won by a given percentage margin.

Rather than scrape the election data (added to my TODO list), I decided to recreate the plot and tidy up the associated analysis code; it’s now available via the CampaignLab GitHub repo.

My interpretation of the difference in life-expectancy is that the Labour strongholds are in regions where there is (or once was) lots of heavy industry and mining; the kind of jobs where people don’t live long after retiring.

Artemis (Artemix)

2019 and this blog February 24, 2019 06:12 PM

As you can notice, articles are quite slow to come. The reason is mainly due to me being a slow writer, and not having a lot of topics to write on.

Another problem is school, as there's definitely more work than the previous year, giving me less time to maintain this blog, together with some other projects.

# Website changes and this blog

A project I'm currently walking towards is to rework most of my websites.

I have a lot of very small static websites for lots of different things, and it doesn't make a lot of sense in my opinion to spread them so much, so I'll work on joining together a lot of them.

Part of this work has already been done, where several small documentation websites were finally brought together at

I have a few websites left:

In the Portfolios section, there's some heavy duplication, so I intend to remove all of them, and merge them into a single simple page, which will be simpler to find (and link back to me), and much cleaner to maintain.

I also intend to merge my devlog inside this blog, and to, once again (I know), re-work this blog.

This includes a re-design, a change in the categorization system, and the removal of the comment space (since the service I used is pretty much closing its doors).

# A new hope

The first goal is to merge my Devlog into my blog, which should be followed by the integration of my folio (a.k.a. CV) into the blog website, then followed by the integration of a small "About" page.

To organise all written articles, there'll be some categorization.

The design will probably change to become simpler; I'll probably use a third-party lightweight CSS library, like Skeleton or Milligram, for the base CSS.

I'll probably redevelop the static site generator, not in Node.js but in Go, to be simpler and easier to maintain, because it's a real hell as of now, especially around file manipulation.

# Articles access

The direct article links won't change, but the RSS feeds will be structured differently: the global one will continue to work, but a new feed per category will be created.

# Overall

Overall, a lot of planned change that will take some time, especially around the static site generation (with a much more complex layout), but I personally think it's really worth the amount of work.

For now, I'll work on that, so I'll see you later!

Ponylang (SeanTAllen)

Last Week in Pony - February 24, 2019 February 24, 2019 03:33 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

February 23, 2019

Jeff Carpenter (jeffcarp)

Book Review: The Shame of the Nation February 23, 2019 02:42 PM

The Shame of the Nation: The Restoration of Apartheid Schooling in America Author: Jonathan Kozol Published: 2005 Rating: ⭐⭐⭐⭐⭐ This is an upsetting book. It describes the dream of integrated schooling enabled by Brown v. Board of Education in 1954 and how, through racist policy making at the federal, state, and local levels, this dream has been slowly dismantled resulting in an American school system that is as segregated today as it was during the civil rights movement.

February 22, 2019

Noon van der Silk (silky)

2018s Crazy Ideas February 22, 2019 12:00 AM

Posted on February 22, 2019 by Noon van der Silk

I can’t believe I forgot to do this at the start of the year!

Let’s look back over 2018’s ideas:

The Ideas

  1. using ai to reduce medical dosages: i.e. imagine you need such-and-such amount of radiation in order for certain whatever to be seen in some scan can you lower the dosage and then use some ai technique to enhance the quality of the image?
  2. something about shape-based reasoning: i.e. the fact that some grammar-parsing problem seems like a good fit for tensor networks because they both have the “shape” of a tree; another example would be thinking of solving a problem of identifying different types of plants as fitting the “shape” of a classifier; but these things could potentially have other shapes? and therefore fit into other kinds of problems?
  3. a q&a bot that can answer tests on the citizenship test: Q: What is our home girt by? A: ..
  4. buddhist neural network: it features significant self-attention and self-awareness it performs classification, but predicts the same class for every input as it has a nondual mind its loss function features a single term for each of the 6 paramitas: - Generosity: to cultivate the attitude of generosity. - Discipline: refraining from harm. - Patience: the ability not to be perturbed by anything. - Diligence: to find joy in what is virtuous, positive or wholesome. - Meditative concentration: not to be distracted. - Wisdom: the perfect discrimination of phenomena, all knowable things.
  5. music-to-image: say you’re listening to greek music you want to generate an image of a singer singing on some clifftop on greece under an olive tree near a vineyard surely a simple matter of deep learning
  6. code story by word frequency: take all the words in a code repo, order them by frequency, then match that up to some standard book, then remap the code according to the frequency
  7. generalisation of Deutsch-Jozsa’s problem: here - generalise it so that we have multiple f’s that have different promises; i.e. i’m constant “50% of the time”. what now?
  8. analogy-lang: reading by @maetl i had an idea for a programming language consider two types, “person” and “frog”, let’s say there are at least two problems; one is the one in the article - it’s hard to come up with a complete list of methods/properties that should exactly define one of these things. in surfaces and essences they argue that in reality no categorisation has perfectly defined boundaries; so how to define the types? another problem is what happens if i want to build a thing that is both “person-like” and “frog-like”? you can’t. especially not if the differences are far away (i.e. is frog hopping like “walking”? should it be the implementation of a “walk” method? probably not; but a “move” method? probably? but what about less-clear things? and doesn’t it depend on what your merging with) in this way it seems like it’s impossible to come up with strict types for anything so here’s an idea: analogy-lang: let’s you define types in a much more relaxed way; “this thing is like that thing in these ways, but different in these ways”
  9. multi-layered network: train some network f to produce y_1 from x_1 f(x_1) = y_1 then, wrap that in a new network, g, that produces y_1 and y_2 from x_1 and x_2 g(x_1, x_2) = ( f(x_1,), g’(x_1, x_2) ) and so on. interesting
  10. psychedelics for ai: ideas: - locate some kind of “default-mode network” in your model and inhibit it - after training, allow many more connections - have two modes of network; one is this “high-entropy” learning one, which prunes down to a more efficient one that can’t learn but can decide quickly
  11. ultimate notification page: it’s just a page with a bunch of iframes to all your different websites where you get the little notification things and then it just tiles them; showing in-which places you have notifications.
  12. use deep learning to replace a movie character with yourself and your own acting: 1. semantic segmentation to remove an actor 2. film yourself saying their lines 3. plug yourself back into the missing spot 4. dress yourself in the appropriate clothes 5. adjust the lighting 6. ??? 7. profit!
  13. parameterised papers: “specifically, this model first generates an aligned position p_t for each target word at time t” show me this sentence with t = whatever and p = some dimension
  14. on a slide-deck, have a little small game that plays out in a tiny section of each slide
  15. use autocomplete and random phrases to guess things about people: “i love you …” find out who they love “give me …” find out what they want and so on
  16. visualise the difference between streaming and shuffling sampling algorithms: streaming -> for low numbers, misses items shuffling -> for low numbers, bad distribution across indices that are shuffled. @dseuss @martiningram
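  A minimal sketch of what such a visualisation could count (reservoir sampling stands in here for the streaming algorithm, and shuffle-then-take for the other; the index histogram is the thing you’d plot — both are my assumptions, not part of the original idea):

```python
import random
from collections import Counter

def reservoir_sample(stream, k, rng):
    """Streaming: keep a k-item reservoir, replacing items with decreasing probability."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive upper bound
            if j < k:
                sample[j] = item
    return sample

def shuffle_sample(items, k, rng):
    """Shuffling: materialise everything, shuffle, take the first k."""
    items = list(items)
    rng.shuffle(items)
    return items[:k]

def index_histogram(sampler, n, k, trials, seed=0):
    """How often each index 0..n-1 gets selected -- the distribution to visualise."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(sampler(range(n), k, rng))
    return counts

# An unbiased sampler selects each index ~ trials * k / n times (600 here).
streaming = index_histogram(reservoir_sample, n=10, k=3, trials=2000)
shuffled = index_histogram(shuffle_sample, n=10, k=3, trials=2000)
```

  Plotting the two histograms side by side for small trial counts would show the biases the idea describes.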
  17. symbolic tensorflow: so i can do convs and explain them really easily
  18. the “lentictocular”: uses lenticular technology on a sphere with AI so that it watches your gaze and moves itself accordingly so that it always displays the appropriate time
  19. ai email judgement front: intercepts all your emails. for every email, it decides the optimal time at which someone will respond, and sends it at that time so that they respond
  20. “growth maps” for determining affected areas of projects w.r.t. a pattern language
  21. friend tee: lights up when other friends are nearby
  22. lenticular business cards: this is already done by many people
  23. innovative holiday inventor: thinks up cool holidays
  24. buddhist twitter: there’s only one account, and no password
  25. programming ombudsman: @sordina @kirillrdy
  26. the computational complexity of religion: given various religious abilities, what computational problems can you solve? what are the implications on computational complexity by buddhism? and so on.
  27. spacetime calendar: we have calendars for dates but they don’t often contain space constraints so why not a space-time calendar, defined in some kind of light cone?
  28. app to check consistency of items before you leave: it’s an app you configure it to be aware of your keys, laptop, wallet, glasses then, as you leave your house, it can inform you of the status of those items: - “hey, your computer is at home” - “all g, your glasses are at work” etc @gacafe
  29. gradient tug-of-war demonstration: given a function f(x,y) = x + y then if we have competing loss functions then it would be nice to visualise the gradient flow as a tug of war
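  A minimal sketch of the tug itself (the two competing losses here, pulling f toward +1 and -1, are made-up stand-ins; the trace is what you’d animate):

```python
def grad(loss, x, y, eps=1e-6):
    """Central-difference gradient of a scalar loss at (x, y)."""
    gx = (loss(x + eps, y) - loss(x - eps, y)) / (2 * eps)
    gy = (loss(x, y + eps) - loss(x, y - eps)) / (2 * eps)
    return gx, gy

f = lambda x, y: x + y

# Two competing objectives tugging on the same parameters:
loss_a = lambda x, y: (f(x, y) - 1.0) ** 2   # wants f(x, y) == +1
loss_b = lambda x, y: (f(x, y) + 1.0) ** 2   # wants f(x, y) == -1

x, y, lr = 0.5, 0.5, 0.1
trace = []
for step in range(50):
    gax, gay = grad(loss_a, x, y)
    gbx, gby = grad(loss_b, x, y)
    # the "tug": each loss pulls the shared parameters its own way
    x -= lr * (gax + gbx)
    y -= lr * (gay + gby)
    trace.append((x, y))
```

  With these two losses the tug is symmetric, so the parameters settle where f(x, y) = 0, exactly between the two targets; plotting the trace shows the rope being pulled to the middle.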
  30. a website that is entirely defined in the scroll bars/url link bar, whatever: you can move pages by moving your mouse to different parts of the scroll bars and so on in that fashion
  31. quantum calendar: it’s a calendar where on any given day in the future, items can be scheduled at the same time. but up to some limit (say 1 week) the items get collapsed and locked in @silky @dseuss
  32. streaming terminal graph receiving updates over mqtt: then, can use it to plot tensorboard logs to the terminal instead of tensorboard using blessed + blessed-contrib seems to be the easiest way - just need to put in the mqtt part and update the data that way.
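  The terminal-graph half can be sketched in a few lines of Python instead of blessed-contrib (the MQTT wiring is the assumed part: e.g. a paho-mqtt on_message callback parsing the payload as a float and calling update):

```python
from collections import deque

BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a sequence of numbers as a one-line unicode sparkline."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

class TerminalGraph:
    """Keep the last N points and redraw one line per update.

    Feed it from your MQTT client's message callback -- that part is
    assumed and not shown here.
    """
    def __init__(self, max_points=60):
        self.points = deque(maxlen=max_points)

    def update(self, value):
        self.points.append(value)
        return sparkline(self.points)

graph = TerminalGraph()
for loss in [0.9, 0.7, 0.5, 0.4, 0.2, 0.1]:
    line = graph.update(loss)  # print(line, end="\r") in a real loop
```

  Printing the returned line with a carriage return gives a live-updating plot of, say, a training loss published over MQTT.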
  33. ai escape room: this is an idea of dave’s. build an escape room controlled entirely by ai. the only way out is by interacting with the machine. it can control everything: heating, doors, whatever
  34. programmable themepark: here’s a ride; how you interact with it is defined via your own programs you play minigolf, but instead of a club you use programming @sordina
  35. graph+weights to neuronal net rendering:
  36. Arbitrary-Task Programming: given that programming is just arranging symbols, and we can use deep learning to interpret the real world into symbols, then it’s possible to do programming by performing arbitrary tasks i.e. any job can be programming, if we can build the deep-learning system that converts actions in that job into symbols in a programming language
  37. Sitcom-Lang: it’s a pre-processor, or something, for an arbitrary language. whenever a symbol is defined, that symbol is imbued with a soul and a “nature”. it starts to have wants and needs; and those must be satisfied in order for it to stay in its present form (i.e. as a “for loop”), otherwise it might change (i.e. to be a “while” loop, or maybe even a string instead). all the symbols will interact with each other, and in that way a program will be made @sordina
  38. brain2object2teeshirt: this - - but once it’s decided on the object, it gets rendered on your LED tee-shirt @geekscape @sordina
  39. pix2pix sourcecode2cat: generate pictures of source code convert to pictures of cat instant machine for generating cat pictures from code what cat does your code look like? @sordina
  40. physical xmonad: use the uarm to be a “physical” xmonad you want to write on some piece of paper? no worries, the uarm will re-arrange your physical desk so that everything is conveniently arranged to do that @sordina
  41. collaborative painting in the style of christopher alexander: it has 3 parts done by 3 artists i draw the left part; you draw the middle, and it has to interact coherently with whatever i’ve drawn; then another person draws the right side, again, it must interact that’s it.
  42. lego laptops: laptops that plug together in a lego-like way
  43. business version of 30 kids vs 2 professional soccer players: 30 grads vs 2 ceos 30 ceos vs 2 grads etc.
  44. shops in parks: would make the parks safer/nicer, because people would be in them more could limit the type of shops, and their size, but would be a nice way to build a bit more of a community feeling in them
  45. an icon next to your public email address that indicates how many unread emails you have: then people can gauge what will happen if they email you
  46. ml for configuring linux: “what file should i look at to change default font sizes?” “how can i set up my new gpu? what settings should i set?”
  47. water-t-shirt: the essence of a water-bed, in t-shirt form!
  48. deep antiques roadshow: the idea explains itself
  49. a being that is by-default inherently abstract, instead of inherently practical like us: for them, being practical would be really hard by default they live at the other end of the abstraction spectrum
  50. business card bowl: throw all the business cards into a bowl; each day, call them; if they don’t want to do business, throw the card out
  51. “live blog”: whenever someone visits your blog, instead of reading articles, they get to open a chat window with you in your terminal then you tell them what you’ve been up to; and they can ask you questions
  52. software art: take all the source code; stack them up as if the line count is one slice, that’s the structure
  53. t-shirt whiteboard: in essence, you can draw on the t-shirt, and the writing just washes out the next time. then you can design whatever you want. would this just work?
  54. physics simulation + diagrams: would be great to define things such as “two pieces of rope inter-twined”, and then “drop” them, but then let that resulting expression become a haskell diagrams Diagram, so that you can then do diagram stuff with it
  55. git version flattener: clone a git repo at every revision, into some folder.
  56. see-through jacket that is also warm: optionally also magnetic @sordina
  57. magnetic glass: @sordina
  58. heated keyboard: keeps your fingers warm
  59. run an experiment where monkeys/dogs/whatever are encouraged to learn some kind of programming to solve a task: i.e. a monkey gets 1 food package per day, but if it learns to program, using the tools provided to it (something like a giant physical version of scratch), then it gets 3 food packages. in some sense people have tried this, with them solving problems, but has anyone tried it where the tool they use to solve the problem is general, and can be applied to other areas of their life?
  60. tree to code: physical trees 1. order trees by the number of leaves 2. order code by the number of statements train a deeplearning network to map between these things then, trees can write computer programs @sordina
  61. ethical algorithms testing ground: related to the last two #409 #408 basically, people can sign up to be ethical tester algorithms can join to provide games for people to play how would it work?
  62. ethical testers: beta testers game testers ethics testers
  63. simulation for ethical machine learning problems: consider the situation: “how do i know if this algorithm X is unethical?” well, instead of waiting for the salespeople to tell you, you could just have it run in a simulated environment and see if it’s unethical by the way that it acts.
  64. minecraft file browser: walk around your filesystem in 3d
  65. ocr clipboard copy and paste: select an image region, send it to some text api thing, get the text back in the clipboard
  66. low-powered de-colourisation network: learns to convert colour -> black and white if it doesn’t do a good job, it’ll look awesome
  67. physical quine: a robot that can type on the computer and write code that writes the program that writes itself
  68. deep learning “do” networks: can you include the “do calculus” into neural networks somehow? to make it do some causal things?
  69. plot websites on cesium map, browse the internet that way.: web-world
  70. animated colour schemes for vim: the colour scheme rotates as you code
  71. a tale of three dickens: the movie: it’s an auto-generated movie from locally-coherent slices instead of the book, we make a movie, where all the scenes in the movie are interspersed based on “local coherence” i.e. from two movies select two people having a conversation with someone named bill or, flick between scenes at the beach @sordina
  72. revolutionary walls: the floor is fixed; but the walls are a tube you pull on some part of the wall to rotate it @tall-josh
  73. activation function composer: or more generally, a function composer 1. what does the graph of relu look like? 2. what about the graph of relu . tanh? and so on, indefinitely and arbitrarily. some features: - what points should be pushed through? maybe could add certain kinds of initialisations and ranges - add things like drop-out and whatnot.
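  The core of such a composer is tiny; a minimal sketch (the sampling range and step are arbitrary choices, and a real version would feed xs/ys to a plotting widget):

```python
import math

def compose(*fns):
    """Right-to-left function composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        for fn in reversed(fns):
            x = fn(x)
        return x
    return composed

relu = lambda x: max(0.0, x)
act = compose(relu, math.tanh)   # relu . tanh

# sample the curve over a range, ready for plotting:
xs = [i / 10 for i in range(-30, 31)]
ys = [act(x) for x in xs]
```

  Any chain of activations composes the same way (e.g. compose(math.tanh, relu, math.sin)), which is the “indefinitely and arbitrarily” part of the idea.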
  74. record videos of people doing interviews but have their voice replaced by obama and their image replaced by obama
  75. hair cut & deep learning deal with the hair-dresser across the road: sit down for a hair cut, get an hour of deep learning consulting as well
  76. live action star wars playing out across many websites in the background of cesium js windows: on my website, a death-star is driving around on its way somewhere. eventually it reaches your website, and destroys its logo, or something
  77. deep learning tcp or udp: find something in between
  78. meta-search in google: “i want to see all the alternatives to cloud-ranger” it’s impossible to do this search.
  79. umbrella-scarf / fresh-scarf: it’s a scarf, but it also has a hood that you can pull up, maybe even a clear hood, that lets you see out the front of it, but keeps you under cover. could also keep smoke out of your face
  80. meme-net: watch video, extract meme i.e. and the rollsafe guy
  81. ultimate computer setup person: someone who just has the world’s best computer setup. everything works. no data is duplicated. whole operating system exists in 1.5 gig; they’ve got 510 gig free. no conda/ruby/stack issues
  82. codebase -> readme: looks at an entire codebase; learns to predict the readme
  83. divangulation theorem for websites: @sordina surfaces can be triangulated websites can be divangulated what are the associated theorems?
  84. tasting plates for saas, *aas: instead of just saying “sign up now for 6 months free”; just auto-sign people up for x free things, then let them use it up. easy way to get a billion more dollars for your saas business. @sordina
  85. self-skating skateboard: it drives down to the skate park; skates around on the pipes; does flips, 180s, grinding, whatever. @sordina @tall-josh
  86. different password entry forms: 1. any password you type logs you in, but they all take you to a different computer. only your password takes you to yours. “honeypassword” 2. your password consists of the actual letters, but also the timing between the letters @sordina 3. any key you press is irrelevant; all that matters is the spacing; everything is done via morse-code (@geekscape)
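  Variant 2 can be sketched directly; a minimal check of characters plus inter-key gaps (the (char, gap) representation and the tolerance value are my assumptions):

```python
def timing_match(expected, attempt, tolerance=0.25):
    """Check characters AND inter-key gaps (seconds), within a tolerance.

    expected / attempt are lists of (char, gap_since_previous_key) pairs;
    the first gap is conventionally 0.0.
    """
    if len(expected) != len(attempt):
        return False
    for (ec, eg), (ac, ag) in zip(expected, attempt):
        if ec != ac:
            return False          # wrong key
        if abs(eg - ag) > tolerance:
            return False          # right key, wrong rhythm
    return True

stored = [("h", 0.0), ("i", 0.30), ("!", 0.90)]
assert timing_match(stored, [("h", 0.0), ("i", 0.35), ("!", 0.85)])      # close enough
assert not timing_match(stored, [("h", 0.0), ("i", 0.30), ("!", 0.20)])  # too fast
```

  A real version would learn the tolerance per user from a few enrolment typings rather than hard-coding it.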
  87. congratulations!!!
  88. meeting chaos monkey: every time a meeting is scheduled, a random attendee is replaced by some other random staff member
  89. consulto the consulting teddybear: @sordina “that sounds good in theory” “have you tried kan-ban’ing that” “moving forward that sounds good, but right now i think we should be pragmatic”
  90. small magnets in fabric that can attach to other magnets so-as to create customisable clothing: just put a diff design on by switching out the magnets i just need some small magnets. jaycar sells them
  91. “collaboration card”: some way of listing and engaging with people in various projects you’re interested in
  92. nlp self-defending thesis
  93. rent factor charged in the city based on how innovative your store is: hairdresser: f = 0.85 funky clothing store: f = 0.6 some weird shop that only sells whatever: 0.2 cafe: 1 or some kind of scheme like so
  94. e-fabric: like e-ink, but for fabric
  95. clothes that change colour with respect to the magnetic fields that are around it
  96. grand designs: of computer programs: follow the development of some kind of app, over a few years. hahahaha would be terrible.
  97. giant magnet that aligns all the spins of the atoms of objects (people?!) so that they can pass through each other with different polarisations
  98. dance-curve net
  99. shoes that look like little cars: volvo shoes, monster-truck shoes, lamborghini shoes, f1-car shoes, etc.
  100. augmentation reality glasses that convert what people are saying into words that float in front of you that you can read: so you can “hear” what people say to you when you’re wearing headphones
  101. “html/css layout searcher”, like visual image search, but for how to lay things out with flex/css/react/whatever: input: some scribble about how you want your content laid out in boxes: output: the css/html that achieves this. there’s some networks that do this already, where they convert the drawings to code. but maybe that can be augmented by thinking of it like a search across already-existing content?
  102. “relax ai” or “mindful story ai”: it makes up nice stories, like “you are walking on the beach, you see a small turtle; you follow the turtle for a swim in the water …” could also use cool accents of people, and make sure the story is consistent with another NLP after the first generative run
  103. comedy audience that instead of laughing they just say the things people say when they think something is funny: instead of “ahahaha” audience (in unison): “that’s funny” audience (in unison): “good one” audience (in unison): “great joke”
  104. Adversarial NLP: a sentence so similar to another sentence as to be humanly indistinguishable, but which makes the AI think it’s something wildly different
  105. Inflatable Whiteboard Room: it’s a large room, inflatable like a balloon or whatever, but you can walk into it and use the internals of it as a whiteboard useful for offices
  106. Collaborative Password Manager: say i want to make a password for a system you will control, but we both need access to. maybe my program can generate part of it, and your program generates part; then we combine them both on our independent computers, without the entire password leaving either of them. could build this on top of the public keys on github somehow; so i just pick the github user i’m going to share a password with. could clearly do this immediately by encrypting it with their public key, or something. but maybe something richer can be done
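  The simplest form of “combine two halves locally” can be sketched with a hash: each side generates a random share, the shares are exchanged, and both machines derive the same password without the password itself ever travelling as one piece. This is a naive sketch (SHA-256 as the combiner is my choice, not a vetted key-derivation scheme):

```python
import hashlib
import secrets

def make_share():
    """Each party generates a random share on their own machine."""
    return secrets.token_bytes(32)

def derive_password(share_a, share_b, length=20):
    """Both parties run this locally on the two shares and get the
    same password; only the shares, never the password, are exchanged."""
    digest = hashlib.sha256(share_a + share_b).hexdigest()
    return digest[:length]

a = make_share()
b = make_share()
assert derive_password(a, b) == derive_password(a, b)  # deterministic on both ends
```

  A real version would use a proper KDF and authenticate the share exchange (e.g. with those github public keys), which is where the “something richer” would come in.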
  107. Video Issues:
  108. Faux Project Management Generator Thing: it’s an RNN that generates hundreds of tickets in trello or jira or whatever; with arbitrary due dates makes you feel stressed @sordina could be used for project-management training scenarios
  109. quantum cppn
  110. AIWS: ai for aws. you: “hey, i need a computer with whatever to be up, and to have some database, blah blah” aiws: “no worries, that’s set up for you!” alt. “talky-form for AWS” @sordina
  111. deep haircut mirror: a mirror in front of hair-dressers that lets you look at potential haircuts on your own head
  112. train a network to learn when to laugh in response to jokes: deep-audience
  113. dance led prompt device: it’s a little led board that sits at the front of a dance thing, like a teleprompter, but for dance it puts out the next dance moves a dance-prompter move-prompter
  114. Easter Egg Evangelist for Enterprises (E^4): A floating employee who embeds on teams to consult on how to best add easter-eggs to the features they build.
  115. self-driving food truck: @martiningram
  116. Stabbucks: Starbucks for knives. * Order venti, grande, etc knives
  117. BrainRank: A leaderboard of brain-shaped logos.
  118. DeliveryNet: Reads prose with impeccable timing.
  119. Rant Meetup: Rant about stuff that sucks. * No solutions allowed * Surely james has something to say
  120. submit an AI-entry to every large festival in melbourne in a single year:
  121. Stochasm: Metal band that plays random notes. * Easy to swap out band members!
  122. Seinfreinds: Have the cast of one sitcom act out an episode of another and see if anyone notices.
  123. hire a comedian to come along to your meetings: they can provide background entertainment me: “hey nice to meet you, this is my associate jerry seinfeld, let’s get started” jerry: “what’s the deal with peanuts?” …
  124. stacked double-coffee-cup holder: it’s just a handle, that holds on to two cups, one above the other useful for carrying multiple cups
  125. Auto-generating face detection camouflage: Aka, auto-generating styles from
  126. use the technology of marina (ShapeTex) to make little movable people in jackets:
  127. “studio gan”: it just makes up every single thing, much like #341 , but in more depth and for everything could use for #343 for example.
  128. the journey of your parcel: imagine you’re waiting for a parcel from auspost. you put on your VR headset and you get a real-time view into its life; maybe it’s sitting on a boat, on its way here, or it’s in an airplane, or it’s driving, etc. you’d get a full HD video-style image of the thing moving, that would be completely imagined by a gan or something.
  129. menu democracy: buy a coffee, earn 1 voting right to change the menu in some way buy more coffees, proceed in this fashion other food yields you more votes
  130. dynamic videos built on the fly to answer standard google queries: i.e “use python requests to do post request” a video could be made on the fly using the celeb-generating stuff of stack-gan, then the voice-simulation stuff of lyrebird or whoever, then the lip-moving stuff, the text-to-speech of wavenet or whatever, and some other random backing scene gans and music production networks it would get the content by reading the first answer it finds on google, in some summarised way. @sordina
  131. fully-automated fashion design: 1. Fashion-MNIST CPPN - At random, pick a random item of clothing, figure out what it is, and generate a large version. 2. Pick a random (creative-commons) photo from Flickr, train a style transfer network on it. 3. Apply the style transfer to a bunch of different clothing items? To make a theme? 4. Pick a name from an RNN? 5. Upload to PAOM? Run-time should be several hours for one collection? Not so bad.
  132. remote-controlled magnet: a perfectly spherical magnet that can be rolled around by remote.
  133. use cppn to generate a 3d landscape by determining the height by the colour
  134. lunch formation yaml specification: example:
      lunch:
        - sandwich:
            - bread
            - butter
            - lettuce
            - cucumber
            - butter
            - bread
      region: cbd
      elements are ordered by height on the plate. @sordina
  135. a tale of 3 dickens: combine: 1. a christmas carol 2. a tale of two cities 3. great expectations in order page-by-page. @sordina
  136. instead of colouring in the retro-haskell tee with colours, print the source code for the program itself in the previous colour: easy!
  137. reverse twospace - use offices for other purposes out of hours: silverpond -> t-shirt business on the weekends
  138. dance karaoke: like karaoke, but instead of singing you need to dance uses some pose-recognition thing
  139. build a markov chain thing and then run all the words through some “smoothing” operation by way of a word embedding: i.e. somehow pick a few lines within embedding space, and move all of the words closer to those lines maybe something would happen
  140. naive nn visualisation: just reshape all the weights to be in the shape of an image, normalise the values, and output it.
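  This one is small enough to sketch fully; a minimal version in plain Python (the chunking width and the drop-the-remainder behaviour are my choices):

```python
def weights_to_image(weight_matrices, width):
    """Flatten all weight matrices into one list, normalise to 0..255
    greyscale, and chunk into rows of the requested width
    (dropping any remainder that doesn't fill a row)."""
    flat = [w for matrix in weight_matrices for row in matrix for w in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    pixels = [int(255 * (w - lo) / span) for w in flat]
    n_rows = len(pixels) // width
    return [pixels[r * width:(r + 1) * width] for r in range(n_rows)]

# two tiny "layers" of weights:
layers = [
    [[-1.0, 0.0], [0.5, 1.0]],
    [[0.25, -0.5], [0.75, -0.25]],
]
image = weights_to_image(layers, width=4)
```

  Writing the rows out as a PGM file or handing them to an image library gives the actual picture; per-layer normalisation would be the obvious next refinement.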
  141. map sentences to “the gist”; just a few words: “an embedding that compresses a piece of text to its core concepts” “like if i can compress an image” “then i should be able to compress a book” “and if i can do that that means that i can also write a compressed book” “and have the neural network write my book” @gacafe
  142. version number which, in ascii, eventually approaches the source code itself
  143. personal world map: it’s one of those scaled world maps, where the scaling is determined by say your gps coordinates over a given year, so that it only enlarges the places you go. @mobeets
  144. water doughnut
  145. Deep-Can-I-Do-Deep-Learning-Here?: it’s a network for which you input a situation and it tells you if you can use deep learning to help.
  146. haskell type journey challenge: get from package x to package y using only the following types once ….
  147. make the 3d wall art that we saw at the house of sonya
  148. novels in binder-form so that you can take out small sections of the pages and read them
  149. multi-agent learning where the agents also watch each other locally and learn from each other
  150. ml for learning the life/centers function from christopher alexander: two pictures which one has more life? alt. something about centers?
  151. bureaucratic-net: instead of a network that is really good at explaining its decisions, this network is really bad at it. nothing it says makes sense, or alternatively it’s really long-winded in its responses. or maybe it’s always right, but it never has any idea why.
  152. artistic-arxiv: instead of papers, each day take a random few images from every paper and show that. maybe it’d be cool.
  153. DeepWiki: on normal wikipedia, humans edit pages about concepts in the form of words on deepwiki AIs edit concepts in the form of embeddings by way of adjusting the vectors (or something) they’d need to think about how to manage edits and revisions and so on. but that’s the general idea. @sordina
  154. secret walls: wear the streets (or: graffiti on a wall of clothes; and wear them)
  155. a network that is given the punchline and has to work out the setup: @icp-jesus
  156. reverse website or inverse website: normally, you visit a site and see the website, and you can view source to see the source. what about if you could visit a site and see the source, then view the source to see the site?
  157. endless pasta hat: has a self-pesto’ing tube that pushes out a long piece of spaghetti that you can munch on.
  158. in the gan setting, the discriminator isn’t needed when generating, maybe there’s another setting where the discriminator is still useful at the generative stage?
  159. a jacket that makes amazon’s automated shopping thing think you’re a packet of chips: or something similar
  160. CompromiseApp: two people need to agree on something they both have the app person 1 rates the estimated compromise, on a scale, of person 2 person 2 likewise both people record their own true compromise values then, over time, there’s a bunch of things that can be done, such as comparing predicted compromises, total compromises made, etc.
  161. DerivativeNet: it watches all seinfeld episodes and sees if it can generate curb your enthusiasm episodes. it reads all smbc comics and sees if it can generate xkcd ones. etc.
  162. Cap-Gun mechanical keyboard: You pull back a bunch of hammers then as you type it fires the caps.

February 21, 2019

Patrick Louis (venam)

February 2019 Projects February 21, 2019 10:00 PM

The new year has begun… A while ago!

My last post was almost 9 months ago; more than half a year has passed. A lot has happened, but I still feel like time has passed quickly.

Psychology, Philosophy & Books

Language: brains

Les fleurs du mal

The majority of my reading has been through the newsletter, however I still had the time to go through the following:

  • The man who mistook his wife for a hat - Oliver Sacks
  • Les fleurs du mal - Baudelaire
  • Tribal leadership - Dave Logan
  • Managing business through human psychology - Ashish Bhagoria
  • The new one minute manager - Ken Blanchard
  • Authentic leadership - Harvard Business Review
  • The one thing - Gary Keller

As you might have noticed, a lot of them are related to methodologies, approaches to interacting with others, and new ways of thinking. I find it fascinating to gather novel ways of seeing the world, all the mental models, all the ways of making decisions. This was the mindset for most of the past months, along with re-energizing, invigorating, and bringing back my artistic sense.
A long, long time ago I used to be in love with poetry, and thus I’m slowly reincorporating this old habit into my daily life. Too much rationality is a disease of this day and age, one which I’m trying not to fall into. I’m also working on my personal “immunity to change” regarding over-planning, so all of this helps.

As far as podcasts go, here’s the new list apart from the usual ones I was already following:

  • The Knowledge Project with Shane Parrish [All]
  • Planet Money [All]
  • The Food Chain [All]
  • Science Friday [All]
  • Hi-Phi Nation [All]
  • The History of GNOME [All]
  • The End Of The World with Josh Clark [All]
  • The Darknet Diaries [Most]
  • LeVar Burton Reads [Most]
  • Team Human [Started]

I’ve gathered more than one thousand hours these past months in AntennaPod, the application I’m using to listen to podcasts (something like 3h a day, every day). So I thought of moving away from the podcast world for a while, at least for the next 2 or 3 months, to learn something else on the road. I’ve chosen to dedicate this time to practicing singing. We’ll see what comes out of it; so far it’s going great, but there needs to be a day or two a week for resting.

Learning & Growth


Face to face

I go through phases of learning and then creating. These months it’s been about learning. The emphasis was on work life, management, leadership, and android.

I have in mind some ideas for applications I want to build and I’m slowly gathering info, and in the meantime having fun, learning the various aspects of the android ecosystem.

On the other side, I’m working on my “immunity to change”, something I’ve learned from one of Robert Kegan’s books. This relates to how we unknowingly create toxic loops within our lives that stop us from making the actual changes we would like to make, because those loops inherently define us. For me it’s about an obsession with time, delegation, and loosening the grip over control of time.
Thus the switch to reading and learning from those leadership, management, and emotional intelligence books, instead of digging deeper into projects one after the other like time is running out fast. I would’ve dismissed such content before, and the same goes for the reemergence of artistic hobbies in my day to day.

Upgrading the Old

Language: C++

Manga downloader GUI

On that same note, I’ve upgraded an old project of mine: my manga downloader now supports webtoons, a popular free Korean manhwa website.

Get it here.

It has been fun re-reading old code, checking out what kind of mindset I had while writing it. I can’t attest that it’s great code, nor that I would write it the same way today, but hey, it’s still standing!

Newsletter and community

Language: Unix

nixers sticker

Countless videos, talks, research papers, and articles about Unix were consumed with much pleasure!
The newsletter has had its two-year anniversary (issue 104); it has been a blast, with a drawing from rocx, a ricing tip section from xero, and the start of a partnership with vermaden.

Every week we’re learning more, and topics start to link together in a spider web of ideas. I like to reference previous sections of the newsletter when things are related, and long-time readers might enjoy this too (check the archive).

To the joy of the now ~400 readers, vermaden has now partnered to give a nice touch to the newsletter, he’s more into BSD news than I am which keeps the balance.

I’ve dropped the “Others” part of the newsletter because of criticism that it was too offtopic and had a hint of political ideas to it. Let’s cut it short and simply say that I’d rather leave that section out than play the so-trendy game of internet over-interpretative argumentations and red herrings.

As for the podcast, I couldn’t put it on my top list, but I still felt in my guts that I wanted to write or prepare something similar. So I’ve begun a series named “Keeping time and date”, which is similar to having a podcast/research on the topic. I’m hoping to do more of those in the future. By the time this post is online, the series should almost be over.

On the nixers community side of things, the forums don’t get much appreciation but the irc conversation is still going on. I might organize events in the near future but I’m hoping a community push will do some good. My guess is that the forums aren’t as active because it’s seen as too high a pedestal to contribute to, nobody thinks their ideas are worth sharing there and if they find it worth sharing they’d rather do that on their personal platform, which I totally understand.

2bwm and Window management

Language: C


2bwm now supports a fancy new EWMH hint for fullscreen messages. This is a change I wanted to make for a while but hadn’t put on my list.

Fortunately, frstrikerman had the courtesy of starting a pull request with enough energy to make it happen. I’ve guided him a bit and together we’ve implemented it.
I’m looking forward to writing more about changes I want to make, how to make them, but leave the door open for the changes to be done by contributors. This creates a learning opportunity for anyone interested.

Three new articles have seen the light of day on my blog, all of them, unexpectedly, in close tie with the topic of window management. I’ve also added some preloading/caching behavior on the index page of the blog.

All of the articles were popular, although the window automation one drew quite a crowd (25k+) in the first two days, coming from tech news aggregation websites. Clearly someone had posted it and it attracted readers, all for the best.

Ascii Art & Art

Language: ASCII


Many packs have been posted by impure during the past period, impure 69, impure 70, and impure 71.
In 69 all I could manage to pull off was an abstract piece, and in 70 and 71 I had a Cretaceous dinosaur series.

In between, there was Evoke 2018, where I placed fifth. The idea of that compo was to restrict the size of the piece to 80 columns by 25 lines.

trapped within

I’ve also indirectly participated in Demosplash 2018 and placed tenth. I hadn’t really done anything in particular for it; I simply compiled together all the dinosaurs I had until that point.


A similarity you might have noticed is that I’ve toned down the coloring. I’m trying to extract as much as possible from the art without having to think about anything other than message and form.

To put it bluntly I’ve turned off color mode:

syntax off

Right now I’m letting myself flow through a novel totem piece, a similar but more pronounced style to the one I had for Evoke 2017, that I call “dogmaIamgod” and which I should publish in the next two or three weeks.

Likewise, I’ve done some drawings and paintings too. I bought canvases for the first time and started experimenting, though I haven’t put them on the blog yet. Here’s a peek:


Life and other hobbies

Language: life

Fungi board game

The quest for wild mushrooms is still ongoing. My SO and I have been gathering and studying mushrooms almost every day. /r/mycology has become one of my favorite places to browse on lazy mornings.

In autumn we went on many small hikes and some bigger ones too, without much luck but with much fun. All we could find were Amanita citrina and beautiful though non-edible elfin saddles. There were other uninteresting species too.

elfin saddle

However, we’re not giving up; we’re taking this hobby to the next step. We lately went shopping for some hiking equipment: brand new fancy boots, trekking sticks, a handmade straw basket, etc. And we’re planning on going on more hikes in spring. Spring is not a high season for edibles in general, but it is prime time for morels, so we’re hoping to find some.

Overall this hobby has brought us closer together and brought excitement to our relationship. We even got a card game named Fungi for Valentine’s Day. Moreover, we’ve searched for local places that serve wild mushrooms and went to all the ones we could find. They’re usually pricey but worth the culinary experience.

Other than that, my SO and I have gotten into retro gaming, which I’ve written about before. These days it has become trendy again and many console manufacturers are relaunching their old brands. I guess nostalgia is a market that is well tapped into.

I’m actively looking for mushroom books to order from local libraries, but most of them are not available, so I end up reading the PDF versions I can find online. I’ve also been binge-watching Carluccio’s mushroom recipes. Such a master.

This got me back into honing my cooking knowledge and art. I got tired of cooking a whole week’s worth of food on Sunday, so I’m trying to juggle three days a week of home-cooked meals with restaurants or other options for the rest of the week.
All of which led me to two ideas.

First, the idea for a specific cooking diary app.
Second, the creation of a Zomato account. This goes with the same mindset as when I created the Google Maps account. I want to contribute to the local community by sharing the places I like the most. My mantra is that there will be no bad reviews, only constructive criticism, if any.

Lastly, on the topic of food, fermentation has caught my attention and I’ve now got mason jars filled with awesome vegetables. I’m currently on my third batch and exploring different formulas.


When appetite is good, life’s good.

And life is!
Every year I normally fix a certain theme to rotate around and focus on. This time I’ve chosen to awaken my artistic side, spend more time in nature, organize more activities with my friends, write and contribute more to communities I’m part of through what I know best.

I did try to brush up my Spanish, but it wasn’t really part of anything and I didn’t follow up on it.

And to finish off, I’ve begun a daily diary: a quick summary of what I’ve done during the day, what’s on my mind, what I feel, and what I want to do next. It’s a complement to what I was already doing with short, medium, and long term goals, associating them with my global path and intentions in life. In general this is revitalizing to do at the end of the day; it makes me more aware of my actions, lets me appreciate the good parts, and lets me reflect on what could be done in the future.


Which all leads to what’s in store for tomorrow.
More of the same but upgraded.

I really want to work on the idea I got for the application. Contribute more to community projects, write more articles about what I know, share ideas. Obviously the newsletter will continue, with more mini-series.
2bwm needs a big shake-up to add the EWMH support we still want.
Maybe I’ll get back into WeChall, though it’s not part of the priority of the year.
And definitely push other hobbies too!

I’m going to travel in June with a friend to New York and then Miami, that’ll shift my perspective for a while, maybe bring some new insights.

This is it!

As usual… If you want something done, no one’s gonna do it for you, use your own hands.
And let’s go for a beer together sometime, or just chill.

Joe Nelson (begriffs)

Browsing a remote git repository February 21, 2019 12:00 AM

Git has no built-in way to browse and display files from a remote repo without cloning. Its closest concept is git ls-remote, but this shows only the hashes for references like HEAD or master, and not the files inside.

I wrote a server to expose a git repo in a new way. Watch the short demo:


How to try it yourself

You can get the code at begriffs/gitftp. It’s currently a proof of concept. Once I’ve added some more features I’ll run a public server to host the project code using the project itself.

The server is written in C and requires only libgit2. It’s small and portable.

Why do it this way?

The standard solution is to use a web interface like GitHub, GitLab, cgit, stagit, klaus, GitWeb, etc. However, these interfaces are fairly rigid and don’t connect well with external tools. While some of these sites also provide RESTful APIs, the clients available to consume those APIs are limited. Also, desktop clients for these proprietary services are often big Electron apps.

By serving a repo behind an FTP interface, we get these benefits:

  • Web browser supported but not required
  • Minimized network traffic, sending just the file data itself
  • Supported on all platforms, with dozens of clients already written
  • Support for both the command line and GUI

GitFTP reads from a git repo’s internal database and exposes the trees and blobs as a filesystem. It reads from the master branch, so each new connection sees the newest code. Any single connection sees the files in a consistent state, unchanging even if new commits happen during the duration of the connection. The FTP welcome message identifies the SHA being served.
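The per-connection snapshot is the interesting design decision here. Below is a toy sketch of the idea in Python; GitFTP itself is written in C against libgit2, and the in-memory "repo" and all names below are purely illustrative, not the project's actual API:

```python
# Toy illustration of GitFTP's per-connection snapshot: each connection
# resolves master to a commit SHA once at connect time, then serves that
# tree even if master moves during the session. (Illustrative mock only;
# the real server reads trees and blobs via libgit2.)

class FakeRepo:
    def __init__(self):
        self.refs = {"master": "sha1"}
        self.trees = {"sha1": {"README": "v1"}}

    def commit(self, sha, files):
        # A new commit moves master and adds a new tree.
        self.refs["master"] = sha
        self.trees[sha] = files

class Connection:
    def __init__(self, repo):
        # Pin the SHA at connect time; the FTP welcome message shows it.
        self.repo = repo
        self.sha = repo.refs["master"]

    def welcome(self):
        return f"Serving commit {self.sha}"

    def read(self, path):
        return self.repo.trees[self.sha][path]

repo = FakeRepo()
conn = Connection(repo)                 # this connection sees sha1
repo.commit("sha2", {"README": "v2"})   # a new commit lands mid-session
print(conn.read("README"))              # still "v1": a consistent snapshot
print(Connection(repo).read("README"))  # a new connection sees "v2"
```

The design choice mirrors how git itself works: trees and blobs are immutable, so pinning a commit SHA is enough to get a stable view.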


This is a proof of concept. I’m putting it out there to gauge general interest. If we want to continue working on it, there are plenty of features to add, like providing a way to browse code at different commits, supporting SFTP so files cannot be changed by a man in the middle, etc. See the project issues for more.

February 20, 2019

Indrek Lasn (indreklasn)

How to create a blazing fast modern blog with Nuxt and Prismic February 20, 2019 04:25 PM

Let’s build a modern blog with Vue, Nuxt and Prismic.

I chose Vue + Nuxt since they’re fun to work with. It’s easy to start with, offers plenty of essential features out of the box, and provides good performance.

If you’re new to Vue, I encourage you to take a look at this article for understanding the basics.

Nuxt is a Vue framework for server-side rendering. It is a tool in the Vue ecosystem that you can use to build server-rendered apps from scratch without being bothered by the underlying complexities of rendering a JavaScript app on a server.

Why Nuxt?

Nuxt.js is an implementation of what we call a Universal Application.

The approach became famous with React, but is currently getting more and more popular with other client-side libraries such as Angular, Vue.js, etc.

A Universal Application is a kind of application that renders your component on the server side.

Nuxt.js offers a simple way to first retrieve your asynchronous data from any data source, then render it and send it to the browser as HTML.

In terms of SEO, the Google bot crawler will get the rendered content and index it properly. In addition to that, the fact that your content can be pre-rendered and ready to be served increases the speed of your website and in that way, it also improves your SEO.

The Nuxt ecosystem is a never ending stream of handy tools and packages.


Fast rendering ensured by virtual DOM and minimal load time

Vue.js is only ~30 KB gzipped with the core module, the router and Vuex.

A minimal footprint offers a short load time, meaning higher speed for users and a better ranking on the speed criterion for the Google crawler.

Virtual DOM!

Vue.js also took inspiration from React by implementing Virtual DOM under the hood since version 2.0. Virtual DOM is basically a way to generate a version of the DOM in memory each time you change a state, compare it to the actual DOM, and update only the part that needs to be updated instead of re-rendering everything.
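To make the diffing idea concrete, here is a minimal sketch in Python (Vue's real algorithm is in JavaScript and far more involved, with keyed children, component boundaries, and many optimizations; this only shows the "emit patches for what changed" core):

```python
# Minimal sketch of virtual-DOM diffing: walk two node trees and emit
# patch operations only for the parts that changed, instead of
# re-rendering everything.

def diff(old, new, path="root"):
    patches = []
    if old is None:
        patches.append(("create", path, new))
    elif new is None:
        patches.append(("remove", path))
    elif old["tag"] != new["tag"]:
        patches.append(("replace", path, new))
    else:
        if old.get("text") != new.get("text"):
            patches.append(("set_text", path, new.get("text")))
        old_kids = old.get("children", [])
        new_kids = new.get("children", [])
        for i in range(max(len(old_kids), len(new_kids))):
            o = old_kids[i] if i < len(old_kids) else None
            n = new_kids[i] if i < len(new_kids) else None
            patches.extend(diff(o, n, f"{path}/{i}"))
    return patches

old = {"tag": "div", "children": [
    {"tag": "h1", "text": "Hello"},
    {"tag": "p", "text": "old body"},
]}
new = {"tag": "div", "children": [
    {"tag": "h1", "text": "Hello"},
    {"tag": "p", "text": "new body"},
]}
print(diff(old, new))  # only the <p> text changed: one set_text patch
```

Applying the resulting patch list to the real DOM is what keeps updates cheap: only one text node is touched here, not the whole tree.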


Vue.js offers some really good overall performance as you can see on the following benchmarks:

Duration in milliseconds ± standard deviation (Slowdown = Duration)

Memory allocation in MB

(Source: third-party benchmarks by Stefan Krause)

What is Prismic and why should I care?

Prismic is a headless CMS. This means the content backend runs in the cloud, while you keep full control of the templates and front-end on your own server. This presents a few advantages, such as being able to use an API to feed your content into external apps.

Imagine that you built a blog, not for yourself, but for someone else who is not a developer, so they can edit their content. You want full control over the layout (built with Vue), but you don’t want to go through the tedious process of deploying every time a new blog post is created.

This is where including a headless content management system (CMS) into your app is useful — so you don’t have to deal with that.

What’s the difference between a headless CMS and vanilla CMS?

A traditional CMS like WordPress provides the editing tools for managing content, but it also assumes full control of the front-end of your website: the way the content is displayed is largely defined in the CMS.

Headless content management system, or headless CMS, is a back-end only content management system (CMS) built from the ground up as a content repository that makes content accessible via a RESTful API for display on any device.

If you want to know more, Prismic wrote a clear article about headless CMSes.
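The "headless" part is easiest to see in code. Here is a small sketch (in Python, for illustration only; the payload shape is hypothetical, not Prismic's actual API response): the content is just structured data, and any "head" renders it however it likes.

```python
# A headless CMS serves content as structured data behind an API; any
# "head" (website, mobile app, feed) renders it. The payload below is a
# made-up stand-in for what a content API might return.

content = {
    "blog_post_title": "Hello world",
    "blog_content": [{"type": "paragraph", "text": "First post."}],
}

def render_html(doc):
    # One "head": a website.
    body = "".join(f"<p>{b['text']}</p>" for b in doc["blog_content"])
    return f"<h1>{doc['blog_post_title']}</h1>{body}"

def render_plain(doc):
    # A second "head": say, a plain-text feed.
    body = "\n".join(b["text"] for b in doc["blog_content"])
    return f"{doc['blog_post_title']}\n\n{body}"

print(render_html(content))
print(render_plain(content))
```

The editor changes `content` through the CMS UI; neither renderer has to be redeployed.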

I chose Prismic as my headless CMS — it’s super simple to set up and has great features out of the box.

Why I chose Prismic

  • Easy to set up: it took me only a couple of hours to set up the environment and push to production.
  • Live Preview mode — allows editors to preview their content on their website and apps whether it’s in a draft or scheduled to be published later. This allows marketing teams for example to have a full preview of their website for a specific date and time. This can be extremely useful to manage upcoming blog releases, and preview edits.
  • Slices — Slices are reusable components. Enabling Slices in your template will allow writers to choose between adding a text section, an image, or a quote in the piece of content they are creating. This gives writers the freedom to compose a blog post by alternating and ordering as many of these choices/content blocks as they want/need.
  • Simple and comprehensive documentation.
  • Strong community; e.g., Google, New Relic, and eBay use Prismic
  • Friendly free tier

Setting up Prismic is very simple, let’s get started!

Head over to the Prismic website and create a new user.

After creating a new user on Prismic, we should see something like this:

Building our custom type

Custom Types are models of content that we set up for our marketing or writing team. The marketing team will fill them with content (text, images, etc.), and we’ll be able to retrieve this content through Prismic’s API.

There are two kinds of Custom Types: the Single Type and the Repeatable type.

The Single Type is used for pages of which there is only one instance (a home page, a pricing page, an about us page).

Repeatable Custom Types will be templates used in more than one document (i.e., having many blog post pages, product pages, landing pages for your website).

We want a blog post. In fact we want many blog posts, so it should be a repeatable type.

choosing the type

Creating a repeatable type blog post.

We should be on the content builder now. Prismic gives us a lot of options to choose from. If you look on the right, you should see a sidebar with lots of options, from images and titles to content relationships and SEO options.

Let’s build a reusable blog post with the prismic builder. Our blog will include a title and the body.

Start with adding the following fields:

  • UID field
  • Title field
  • Rich text field

Each time you add a field, you can define formatting options for it. The UID field is a unique identifier that can be used specifically to create SEO- and user-friendly website URLs.

Creating our blog post title

Don’t forget to save our progress!

Make sure you have the following fields for the blog post:

  • uid
  • blog_post_title
  • blog_content

So far we have the layout for our reusable blog post.

Custom types menu

Time to create a blog post! Head over to the content tab on the left.

Content tab

This will take us to the blog layout we built earlier. Insert the desired text for the uid, blog_post_title, and blog_content blocks.

Building our page with Prismic layout builder

Great! We have our blog post set up now. Look at the top right; we should see a “save” button. Clicking it saves our progress. After saving, we can publish our content. Publishing the content makes it available via the API for our front-end to consume.

Starting a new Nuxt project

Open your terminal and run this command. Make sure you have npx installed (shipped by default with npm 5.2.0+).

$ npx create-nuxt-app vue-nuxt-prismic-blog

The Nuxt installer conveniently asks us our preferences and creates the project.

We should end up with a project structure like below:

Nuxt project structure

Great! Let’s build our blog now. We need to fetch the blog content from Prismic. Luckily, Prismic gives us plenty of handy tools.

The prismic-javascript package includes many utilities, including fetching from our API. The prismic-dom package gives us helper functions to render markup.


Let’s quickly create the prismic.config.js file in our root directory. This is where we’ll place our Prismic related configuration.

Note: Make sure you use the API endpoint associated with your blog.

Open the pages/index.vue file and import the Prismic library with our config.

Great! Next, we have to call the API somewhere, let’s use the asyncData life-cycle hook for this.

First, we initialize our API with the endpoint. Then, we query the API to return our blog post. We can specify the language and document type.

The Prismic API is promise-based, which means we can call the API and chain promises. Hurray for promises. We can also use the async/await syntax to resolve promises. Check out this post I wrote about async/await.
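The overall flow (initialize the API handle, await a query filtered by document type and language, return the data for the page) can be sketched like this. This is a Python/asyncio sketch for illustration only; the real code uses prismic-javascript inside Nuxt's asyncData hook, and every function name below is a stub, not Prismic's actual SDK.

```python
import asyncio

# Stubbed sketch of the fetch flow: get an API handle, query documents
# by type and language, hand the first result to the page. All names
# and the endpoint are illustrative placeholders.

async def get_api(endpoint):
    return {"endpoint": endpoint}   # stand-in for the API handle

async def query(api, doc_type, lang):
    # Stand-in for querying documents of a given type in a given language.
    return [{"type": doc_type, "lang": lang, "title": "Hello"}]

async def async_data():
    api = await get_api("https://your-repo.prismic.io/api/v2")
    docs = await query(api, "blog_post", "en-us")
    return {"post": docs[0]}        # becomes the component's data

data = asyncio.run(async_data())
print(data["post"]["title"])        # prints Hello
```

The same shape could be written by chaining promises instead of awaiting; async/await just makes the two sequential steps read top to bottom.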

Prismic response

All we need to do is render the markup now!

There you go. We successfully fetched our blog post from the Prismic API.

Applying the styles: grab a copy and place it in the style section of the Vue component.

If we open our app, this is what we should see.

End result

Voilà! We have a modern server-side rendered blog built with Nuxt and Prismic.

We barely scratched the surface — we can do a lot more with Nuxt and Prismic. My favorite Prismic features are Slices and Live Preview. I encourage you to check them out!

Slices will allow you to create “dynamic” pages with richer content, and Live preview will allow you to instantly preview your edits in your webpage.


For example, in this project we worked on only one post, but if we had created lots of posts in Prismic, one really cool thing about Nuxt.js is that it automatically creates routes for you.

Behind the scenes, it still uses Vue Router for this, but you don’t need to create a route config manually anymore. Instead, you create your routing using a folder structure — inside the pages folder. But you can read all about that in the official docs on routing in Nuxt.js.

Thanks for reading! If you found this useful, please give the article some claps so more people see it! ❤

In case you got lost, here’s the repository for our blog:


If you have any questions regarding this article, or anything general — I’m active on Twitter and always happy to read comments and reply to tweets.

Here are some of my previous articles:

How to create a blazing fast modern blog with Nuxt and Prismic was originally published on Medium, where people are continuing the conversation by highlighting and responding to this story.

February 19, 2019

Dan Luu (dl)

Randomized trial on gender in Overwatch February 19, 2019 12:00 AM

A recurring discussion in Overwatch (as well as other online games) is whether or not women are treated differently from men. If you do a quick search, you can find hundreds of discussions about this, some of which have well over a thousand comments. These discussions tend to go the same way and involve the same debate every time, with the same points being made on both sides. Just for example, three threads on reddit that spun out of a single post have a total of 10.4k comments. On one side, you have people saying "sure, women get trash talked, but I'm a dude and I get trash talked, everyone gets trash talked there's no difference", "I've never seen this, it can't be real", etc., and on the other side you have people saying things like "when I play with my boyfriend, I get accused of being carried by him all the time but the reverse never happens", "people regularly tell me I should play mercy[, a character that's a female healer]", and so on and so forth. In less time than has been spent on a single large discussion, we could just run the experiment, so here it is.

This is the result of playing 339 games in the two main game modes, quick play (QP) and competitive (comp), where roughly half the games were played with a masculine name (where the username was a generic term for a man) and half were played with a feminine name (where the username was a woman's name). I recorded all of the comments made in each of the games and then classified the comments by type. Classes of comments were "sexual/gendered comments", "being told how to play", "insults", and "compliments".

I decided whether or not to include each game in the experiment before the character selection screen loaded. In games that were included, I used the same character selection algorithm, I wouldn't mute anyone for spamming chat or being a jerk, I didn't speak on voice chat (although I had it enabled), I never sent friend requests, and I played outside of a group in order to get matched with 5 random players. When playing normally, I might choose a character I don't know how to use well, and I'll mute people who pollute chat with bad comments. There are a lot of games that weren't included in the experiment because I wasn't in the mood to listen to someone rage at their team for fifteen minutes, and the procedure I used involved pre-committing to not muting people who do that.

Sexual or sexually charged comments

I thought I'd see more sexual comments when using the feminine name as opposed to the masculine name, but that turned out to not be the case. There was some mention of sex, genitals, etc., in both cases and the rate wasn't obviously different and was actually higher in the masculine condition.

Zero games featured comments directed specifically at me in the masculine condition, and two (out of 184) games in the feminine condition featured comments that were directed at me. Most comments were either directed at other players or just general comments to team or game chat.

Examples of typical undirected comments that would occur in either condition include "my girlfriend keeps sexting me how do I get her to stop?", "going in balls deep", "what a surprise. *strokes dick* [during the post-game highlight]", and "support your local boobies".

The two games that featured sexual comments directed at me had the following comments:

  • "please mam can i have some coochie", "yes mam please" [from two different people], ":boicootie:"
  • "my dicc hard" [believed to be directed at me from context]

During games not included in the experiment (I generally didn't pay attention to which username I was on when not in the experiment), I also got comments like "send nudes". Anecdotally, there appears to be a difference in the rate of these kinds of comments directed at the player, but the rate observed in the experiment is so low that uncertainty intervals around any estimates of the true rate will be similar in both conditions unless we use a strong prior.

The fact that this difference couldn't be observed in 339 games was surprising to me, although it's not inconsistent with McDaniel's thesis, a survey of women who play video games. 339 games probably sounds like a small number to serious gamers, but the only other randomized experiment I know of on this topic (besides this experiment) is Kasumovic et al., which notes that "[w]e stopped at 163 [games] as this is a substantial time effort".

All of the analysis uses the number of games in which a type of comment occurred, and not tone, to avoid having to code comments as having a certain tone and possibly injecting bias into the process. Sentiment analysis models, even state-of-the-art ones, often return nonsensical results, so this basically has to be done by hand, at least today. With much more data, some kind of sentiment analysis, done with liberal spot checking and re-training of the model, could work, but the total number of comments is so small in this case that it would amount to coding each comment by hand.

Coding comments manually in an unbiased fashion can also be done with a level of blinding, but doing that would probably require getting more people involved (since I see and hear comments while I'm playing) and relying on unpaid or poorly paid labor.

Being told how to play

The most striking, easy to quantify, difference was the rate at which I played games in which people told me how I should play. Since it's unclear how much confidence we should have in the difference if we just look at the raw rates, we'll use a simple statistical model to get the uncertainty interval around the estimates. Since I'm not sure what my belief about this should be, this uses an uninformative prior, so the estimate is close to the actual rate. Anyway, here are the uncertainty intervals a simple model puts on the percent of games where at least one person told me I was playing wrong, that I should change how I'm playing, or that I switch characters:


Cond Est P25 P75
F comp 19 13 25
M comp 6 2 10
F QP 4 3 6
M QP 1 0 2

The experimental conditions in this table are masculine vs. feminine name (M/F) and competitive mode vs quick play (comp/QP). The numbers are percents. Est is the estimate, P25 is the 25%-ile estimate, and P75 is the 75%-ile estimate. Competitive mode and using a feminine name are both correlated with being told how to play. See this post by Andrew Gelman for why you might want to look at the 50% interval instead of the 95% interval.
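The calculation behind a table like this is simple enough to sketch. With a uniform (uninformative) prior, the posterior for a binomial rate after k "told how to play" games out of n is Beta(k+1, n-k+1), and the table's Est/P25/P75 are quantiles of that posterior. The counts below are made up for illustration; the post doesn't give exact per-condition game counts.

```python
# Posterior quantiles for a binomial rate under a uniform prior:
# posterior is Beta(k+1, n-k+1). Quantiles computed by grid
# integration so nothing outside the stdlib is needed.

def beta_quantiles(k, n, qs, steps=20001):
    xs = [i / (steps - 1) for i in range(steps)]
    pdf = [x**k * (1 - x) ** (n - k) for x in xs]  # unnormalized Beta pdf
    total = sum(pdf)
    cdf, acc = [], 0.0
    for p in pdf:
        acc += p / total
        cdf.append(acc)
    return [next(x for x, c in zip(xs, cdf) if c >= q) for q in qs]

# Hypothetical counts: say 17 of 90 comp games in the feminine
# condition included "how to play" comments.
p25, p50, p75 = beta_quantiles(17, 90, [0.25, 0.50, 0.75])
print(f"P25={p25:.2f} Est={p50:.2f} P75={p75:.2f}")
```

With real per-condition counts plugged in, this reproduces the kind of 50% intervals shown in the table.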

For people not familiar with overwatch, in competitive mode, you're explicitly told what your ELO-like rating is and you get a badge that reflects your rating. In quick play, you have a rating that's tracked, but it's never directly surfaced to the user and you don't get a badge.

It's generally believed that people are more on edge during competitive play and are more likely to lash out (and, for example, tell you how you should play). The data is consistent with this common belief.

Per above, I didn't want to code tone of messages to avoid bias, so this table only indicates the rate at which people told me I was playing incorrectly or asked that I switch to a different character. The qualitative difference in experience is understated by this table. For example, the one time someone asked me to switch characters in the masculine condition, the request was a polite, one-sentence request ("hey, we're dying too quickly, could we switch [from the standard one primary healer / one off healer setup] to double primary healer or switch our tank to [a tank that can block more damage]?"). When using the feminine name, a typical case would involve 1-4 people calling me human garbage for most of the game and consoling themselves with the idea that the entire reason our team is losing is that I won't change characters.

The simple model we're using indicates that there's probably a difference between both competitive and QP and playing with a masculine vs. a feminine name. However, most published results are pretty bogus, so let's look at reasons this result might be bogus and then you can decide for yourself.

Threats to validity

The biggest issue is that this wasn't a pre-registered trial. I'm obviously not going to go and officially register a trial like this, but I also didn't informally "register" this by having this comparison in mind when I started the experiment. A problem with non-pre-registered trials is that there are a lot of degrees of freedom, both in terms of what we could look at, and in terms of the methodology we used to look at things, so it's unclear if the result is "real" or an artifact of fishing for something that looks interesting. A standard example of this is that, if you look for 100 possible effects, you're likely to find 1 that appears to be statistically significant with p = 0.01.
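The 100-effects intuition is easy to verify by simulation: under the null hypothesis, p-values are uniformly distributed, so screening many effects at p = 0.01 produces false positives at about a 1% rate. A short sketch (pure illustration, not part of the post's analysis):

```python
import random

# Under the null, p-values are uniform on [0, 1], so checking many
# effects at alpha = 0.01 "discovers" about 1% of them by chance.

random.seed(0)
trials, effects, alpha = 1000, 100, 0.01
hits = 0
for _ in range(trials):
    pvals = [random.random() for _ in range(effects)]  # null p-values
    hits += sum(p < alpha for p in pvals)

rate = hits / (trials * effects)
print(f"false positive rate ~ {rate:.3f}")  # close to alpha
```

So a study that looks at 100 possible effects and reports the one with p = 0.01 has shown roughly what this simulation shows: nothing.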

There are standard techniques to correct for this problem (e.g., Bonferroni correction), but I don't find these convincing because they usually don't capture all of the degrees of freedom that go into a statistical model. An example is that it's common to take a variable and discretize it into a few buckets. There are many ways to do this and you generally won't see papers talk about the impact of this or correct for this in any way, although changing how these buckets are arranged can drastically change the results of a study. Another common knob people can use to manipulate results is curve fitting to an inappropriate curve (often a 2nd or 3rd degree polynomial when a scatterplot shows that's clearly incorrect). Another way to handle this would be to use a more complex model, but I wanted to keep this as simple as possible.

If I wanted to really be convinced on this, I'd want to, at a minimum, re-run this experiment with this exact comparison in mind. As a result, this experiment would need to be replicated to provide more than a preliminary result that is, at best, weak evidence.

One other large class of problem with randomized controlled trials (RCTs) is that, despite randomization, the two arms of the experiment might be different in some way that wasn't randomized. Since Overwatch doesn't allow you to keep changing your name, this experiment was done with two different accounts and these accounts had different ratings in competitive mode. On average, the masculine account had a higher rating due to starting with a higher rating, which meant that I was playing against stronger players and having worse games on the masculine account. In the long run, this will even out, but since most games in this experiment were in QP, this didn't have time to even out in comp. As a result, I had a higher win rate as well as just generally much better games with the feminine account in comp.

With no other information, we might expect that people who are playing worse get told how to play more frequently and people who are playing better should get told how to play less frequently, which would mean that the table above understates the actual difference.

However Kasumovic et al., in a gender-based randomized trial of Halo 3, found that players who were playing poorly were more negative towards women, especially women who were playing well (there's enough statistical manipulation of the data that a statement this concise can only be roughly correct, see study for details). If that result holds, it's possible that I would've gotten fewer people telling me that I'm human garbage and need to switch characters if I was average instead of dominating most of my games in the feminine condition.

If that result generalizes to OW, that would explain something which I thought was odd, which was that a lot of demands to switch and general vitriol came during my best performances with the feminine account. A typical example of this would be a game where we have a 2-2-2 team composition (2 players playing each of the three roles in the game) where my counterpart in the same role ran into the enemy team and died at the beginning of the fight in almost every engagement. I happened to be having a good day and dominated the other team (37-2 in a ten minute comp game, while focusing on protecting our team's healers) while only dying twice, once on purpose as a sacrifice and a second time after a stupid blunder. Immediately after I died, someone asked me to switch roles so they could take over for me, but at no point did anyone ask the other player in my role to switch despite their total uselessness all game (for OW players, this was a Rein who immediately charged into the middle of the enemy team at every opportunity, from a range where our team could not possibly support them; this was Hanamura 2CP, where it's very easy for Rein to set up situations where their team cannot help them). This kind of performance was typical of games where my team jumped on me for playing incorrectly. This isn't to say I didn't have bad games; I had plenty of bad games, but a disproportionate number of the most toxic experiences came when I was having a great game.

I tracked how well I did in games, but this sample doesn't have enough ranty games to do a meaningful statistical analysis of my performance vs. probability of getting thrown under the bus.

Games at different ratings are probably also generally different environments and get different comments, but it's not clear if there are more negative comments at 2000 than 2500 or vice versa. There are a lot of online debates about this; for any rating level other than the very lowest or the very highest ratings, you can find a lot of people who say that the rating band they're in has the highest volume of toxic comments.

Other differences

Here are some things that happened while playing with the feminine name that didn't happen with the masculine name during this experiment or in any game outside of this experiment:

  • unsolicited "friend" requests from people I had no textual or verbal interaction with (happened 7 times total, didn't track which cases were in the experiment and which weren't)
  • someone on the other team deciding that my team wasn't doing a good enough job of protecting me while I was playing healer, berating my team, and then throwing the game so that we won (happened once during the experiment)
  • someone on my team flirting with me and then flipping out when I don't respond, who then spends the rest of the game calling me autistic or toxic (this happened once during the experiment, and once while playing in a game not included in the experiment)

The rate of all these was low enough that I'd have to play many more games to observe something without a huge uncertainty interval.

I didn't accept any friend requests from people I had no interaction with. Anecdotally, some people report that others will send sexual comments or berate them after an unsolicited friend request. It's possible that the effect shown in the table would be larger if I had accepted these friend requests; it couldn't be smaller.

I didn't attempt to classify comments as flirty or not because, unlike the kinds of comments I did classify, this is often somewhat subtle, and you could make a good case that any particular comment is or isn't flirting. Without responding (which I didn't do), many of these kinds of comments are ambiguous.

Another difference was in the tone of the compliments. The rate of games where I was complimented wasn't too different, but compliments under the masculine condition tended to be short and factual (e.g., someone from the other team saying "no answer for [name of character I was playing]" after a dominant game) and compliments under the feminine condition tended to be more effusive and multiple people would sometimes chime in about how great I was.

Non differences

The rate of compliments and the rate of insults in games that didn't include explanations of how I was playing wrong or how I needed to switch characters were similar in both conditions.

Other factors

Some other factors that would be interesting to look at would be time of day, server, playing solo or in a group, specific character choice, being more or less communicative, etc., but it would take a lot more data to get good estimates when adding more variables. Blizzard should have the data necessary to do analyses like this in aggregate, but they're notoriously private with their data, so someone at Blizzard would have to do the work and then publish it publicly, and they're not really in the habit of doing that kind of thing. If you work at Blizzard and are interested in letting a third party do some analysis on an anonymized data set, let me know and I'd be happy to dig in.

Experimental minutiae

Under both conditions, I avoided ever using voice chat and would call things out in text chat when time permitted. Also under both conditions, I mostly filled in with whatever character class the team needed most, although I'd sometimes pick DPS (in general, DPS is heavily oversubscribed, so if you don't explicitly pick DPS, you'll rarely end up playing it).

For quickplay, backfill games weren't counted (backfill games are games where you join after the game started to fill in for a player who left; comp doesn't allow backfills). 6% of QP games were backfills.

These games are from before the "endorsements" patch; most games were played around May 2018. All games were played in "solo q" (with 5 random teammates). To avoid correlations between games depending on how long playing sessions were, I quit between games and waited long enough that I was unlikely to end up in a game with some or many of the same players as before.

The model used probability of a comment happening in a game to avoid the problem that Kasumovic et al. ran into, where a person who's ranting can skew the total number of comments. Kasumovic et al. addressed this by removing outliers, but I really don't like manually reaching in and removing data to adjust results. This could also be addressed by using a more sophisticated model, but a more sophisticated model means more knobs which means more ways for bias to sneak in. Using the number of players who made comments instead would be one way to mitigate this problem, but I think this still isn't ideal because these aren't independent -- when one player starts being negative, this greatly increases the odds that another player in that game will be negative, but just using the number of players makes four games with one negative person the same as one game with four negative people. This can also be accounted for with a slightly more sophisticated model, but that also involves adding more knobs to the model.
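For concreteness, here's a quick look at the raw per-game rates implied by the counts that appear in the code appendix at the end of the post (this sketch is my own illustration, not part of the original analysis):

```python
# Raw per-game probability of getting a "you're playing wrong /
# please switch" comment, from the counts in the post's data.
counts = {
    ("comp", "female"): (7, 35),   # 7 such games out of 35 comp games
    ("comp", "male"): (1, 23),
    ("qp", "female"): (6, 149),
    ("qp", "male"): (2, 132),
}
rates = {k: x / n for k, (x, n) in counts.items()}
# e.g. comp/female comes out to 0.20 per game vs. comp/male at ~0.04
```

These raw rates are what the binomial model above is smoothing and putting uncertainty intervals around.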

Comments on other comments

If I step back from the gender experiment, the most baffling thing to me about Overwatch is the Herculean level of self-delusion adult Overwatch players have (based on the sounds of people's voice, I think that most people who are absolutely livid about the performance of a teammate are adults).

Just for example, if you look at games where someone leaves in the early phase of the game, causing it to get canceled, IME, this is almost always because one player was AFK and eventually timed out. I've seen this happen maybe 40 times, and the vast majority of the time, 1-3 players will rage about how they were crushing the other team and are being unfairly denied a (probable) victory. Out of nearly 40 games of people complaining about this, I have not seen a single case where someone noticed that the game was 6v5 the entire time because someone was AFK.

If this were a field sport, I'd expect that players would notice that the other team is literally missing a player for the entire game by second grade. The median person at my rank (amazingly, above median for all players) who's raging at other players has roughly the same level of game sense as a first grader. And yet, they'll provide detailed (wrong, but detailed) commentary on other people's shortcomings, suggest that someone is "throwing [the game]" if they don't attempt to use a sophisticated strategy that pros have to practice to get right, etc.

Of course, by definition, people can't be aware that they're not aware that the other team is missing a player. However, someone with awareness that poor must get surprised by events in the game all the time (my awareness is substantially better than median for my rank because my technical skill is worse, and I still get blindsided multiple times per game; each time, it's a reminder that I'm not properly tracking what's happening in the game and can't even keep up with what my character should be doing).

It requires a staggering level of willful blindness to be that unaware, get surprised by events all the time, and still think that your understanding of the game is deep enough that you can, while playing your own character and tracking everything you're supposed to track, also track everything someone on your team is supposed to track and understand what's going on deeply enough to provide advice.

This stupefying level of confidence combined with willful blindness also showed up in a survey of people's perception of their own skill level vs. their actual rating, which found that 1% of people thought they were overrated, 32% thought they were rated accurately, and the other 67% thought they were underrated. If you're paying attention, you get feedback about how good you are all the time (for example, the game helpfully shows you exactly how you died every time you die), but the majority of players manage to ignore this feedback.

There's nothing wrong with not paying attention to how good you are and playing "just for fun", but these guys who rage at their teammates for bringing them down aren't playing for fun, they're serious. I don't understand how people who are so serious that they'd get angry at strangers for their perceived lack of skill in a video game would choose to ignore all of this information that they could use to get better and choose to yell at teammates (increasing the probability of a loss) instead of fixing the gross errors in their own gameplay (decreasing the probability of a loss).

Appendix: comments / advice to overwatch players

A common complaint, perhaps the most common complaint by people below 2000 SR (roughly 30%-ile) or perhaps 1500 SR (roughly 10%-ile) is that they're in "ELO hell" and are kept down because their teammates are too bad. Based on my experience, I find this to be extremely unlikely.

People often split skill up into "mechanics" and "gamesense". My mechanics are pretty much as bad as it's possible to get. The last game I played seriously was a 90s video game that's basically online asteroids, and the last game before that I put any time into was the original SNES super mario kart. As you'd expect from someone who hasn't put significant time into a post-90s video game or any kind of FPS game, my aim and dodging are both atrocious. On top of that, I'm an old dude with slow reflexes, and I was able to get to 2500 SR (roughly 60%-ile among players who play "competitive", likely higher among all players) by avoiding a few basic fallacies and blunders despite having approximately zero mechanical skill. If you're also an old dude with basically no FPS experience, you can do the same thing; if you have good reflexes or enough FPS experience to actually aim or dodge, you basically can't be worse mechanically than I am and you can do much better by avoiding a few basic mistakes.

The most common fallacy I see repeated is that you have to play DPS to move out of bronze or gold. The evidence people give for this is that, when a GM streamer plays flex, tank, or healer, they sometimes lose in bronze. I guess the idea is that, because the only way to ensure a 99.9% win rate in bronze is to be a GM level DPS player and play DPS, the best way to maintain a 55% or a 60% win rate is to play DPS, but this doesn't follow.

Healers and tanks are both very powerful in low ranks. Because low ranks feature both poor coordination and relatively poor aim (players with good coordination or aim tend to move up quickly), time-to-kill is very slow compared to higher ranks. As a result, an off healer can tilt the result of a 1v1 (and sometimes even a 2v1) matchup and a primary healer can often determine the result of a 2v1 matchup. Because coordination is poor, most matchups end up being 2v1 or 1v1. The flip side of the lack of coordination is that you'll almost never get help from teammates. It's common to see an enemy player walk into the middle of my team, attack someone, and then walk out while literally no one else notices. If the person being attacked is you, the other healer typically won't notice and will continue healing someone at full health, and none of the classic "peel" characters will help or even notice what's happening. That means it's on you to pay attention to your surroundings and watch flank routes to avoid getting murdered.

If you can avoid getting murdered constantly and actually try to heal (as opposed to many healers at low ranks, who will try to kill people or stick to a single character and continue healing them all the time even if they're at full health), you'll outheal a primary healer half the time when playing an off healer and, as a primary healer, you'll usually be able to get 10k-12k healing per 10 min compared to 6k to 8k for most people in Silver (sometimes less if they're playing DPS Moira). That's like having an extra half a healer on your team, which basically makes the game 6.5 v 6 instead of 6v6. You can still lose a 6.5v6 game, and you'll lose plenty of games, but if you're consistently healing 50% more than a normal healer at your rank, you'll tend to move up even if you get a lot of major things wrong (heal order, healing when that only feeds the other team, etc.).

A corollary to having to watch out for yourself 95% of the time when playing a healer is that, as a character who can peel, you can actually watch out for your teammates and put your team at a significant advantage in 95% of games. As Zarya or Hog, if you just boringly play towards the front of your team, you can basically always save at least one teammate from death in a team fight, and you can often do this 2 or 3 times. Meanwhile, your counterpart on the other team is walking around looking for 1v1 matchups. If they find a good one, they'll probably kill someone, and if they don't (if they run into someone with a mobility skill or a counter like brig or reaper), they won't. Even in the case where they kill someone and you don't do a lot, you still provide as much value as them and, on average, you'll provide more value. A similar thing is true of many DPS characters, although it depends on the character (e.g., McCree is effective as a peeler, at least at the low ranks that I've played in). If you play a non-sniper DPS that isn't suited for peeling, you can find a DPS on your team who's looking for 1v1 fights and turn those fights into 2v1 fights (at low ranks, there's no shortage of these folks on both teams, so there are plenty of 1v1 fights you can control by making them 2v1).

All of these things I've mentioned amount to actually trying to help your team instead of going for flashy PotG setups or trying to dominate the entire team by yourself. If you say this in the abstract, it seems obvious, but most people think they're better than their rating. It doesn't help that OW is designed to make people think they're doing well when they're not and the best way to get "medals" or "play of the game" is to play in a way that severely reduces your odds of actually winning each game.

Outside of obvious gameplay mistakes, the other big thing that loses games is when someone tilts and either starts playing terribly or flips out and says something that enrages someone else on the team, who then starts playing terribly. I don't think you can do much about this directly, but you can make sure you never do it yourself: then only 5/6 of your team will do this at some base rate, whereas 6/6 of the other team will. Like all of the above, this won't cause you to win all of your games, but everything you do that increases your win rate makes a difference.

Poker players have the right attitude when they talk about leaks. The goal isn't to win every hand, it's to increase your EV by avoiding bad blunders (at high levels, it's about more than avoiding bad blunders, but we're talking about getting out of below median ranks, not becoming GM here). You're going to have terrible games where you get 5 people instalocking DPS. Your odds of winning a game are low, say 10%. If you get mad and pick DPS and reduce your odds even further (say this is to 2%), all that does is create a leak in your win rate during games when your teammates are being silly.

If you gain/lose 25 rating per game for a win or a loss, your average rating change from a game is 25 * (W_rate - L_rate) = 25 * (2 * W_rate - 1). Let's say 1/40 games are these silly games where your team decides to go all DPS. The per-game SR difference of trying to win these vs. soft throwing is maybe something like 1/40 * 25 * (2 * 0.08) = 0.1. That doesn't sound like much and these numbers are just guesses, but everyone outside of very high-level games is full of leaks like these, and they add up. And if you look at a 60% win rate, which is pretty good considering that your influence is limited because you're only one person on a 6 person team, that only translates to an average of 5 SR per game, so it doesn't actually take that many small leaks to really move your average SR gain or loss.
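The leak arithmetic above can be checked in a few lines (my own sketch; the win probabilities 0.10 vs. 0.02 are the post's guesses for "trying" vs. "soft throwing" an all-DPS game):

```python
# Average SR change per game at a given win probability, with 25 SR at stake.
def sr_per_game(p_win, stake=25):
    return stake * (2 * p_win - 1)

# 1/40 of games are "silly" all-DPS games; per-game cost of soft throwing them:
leak = (1 / 40) * (sr_per_game(0.10) - sr_per_game(0.02))
# leak works out to 0.1 SR per game on average, matching the text
```

A handful of leaks this size adds up quickly against the ~5 SR per game that a good 60% win rate earns.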

Appendix: general comments on online gaming, 20 years ago vs. today

Since I'm unlikely to write another blog post on gaming any time soon, here are some other random thoughts that won't fit with any other post. My last serious experience with online games was with a game from the 90s. Even though I'd heard that things were a lot worse, I was still surprised by it. IRL, the only time I encounter the same level and rate of pointless nastiness in a recreational activity is down at the bridge club (casual bridge games tend to be very nice). When I say pointless nastiness, I mean things like getting angry and then making nasty comments to a teammate mid-game. Even if your "criticism" is correct (and, if you review OW games or bridge hands, you'll see that these kinds of angry comments are almost never correct), this has virtually no chance of getting your partner to change their behavior and it has a pretty good chance of tilting them and making them play worse. If you're trying to win, there's no reason to do this and good reason to avoid this.

If you look at the online commentary for this, it's common to see people blaming kids, but this doesn't match my experience at all. For one thing, when I was playing video games in the 90s, a huge fraction of the online gaming population was made up of kids, and online game communities were nicer than they are today. Saying that "kids nowadays" are worse than kids used to be is a pastime that goes back thousands of years, but it's generally not true and there doesn't seem to be any reason to think that it's true here.

Additionally, this simply doesn't match what I saw. If I just look at comments over audio chat, there were a couple of times when some kids were nasty, but almost all of the comments are from people who sound like adults. Moreover, if I look at when I played games that were bad, a disproportionately large number of those games were late (after 2am eastern time, on the central/east server), where the relative population of adults is larger.

And if we look at bridge, the median age of an ACBL member is in the 70s, with an increase in age of a whopping 0.4 years per year.

Sure, maybe people tend to get more mature as they age, but in any particular activity, that effect seems to be dominated by other factors. I don't have enough data at hand to make a good guess as to what happened, but I'm entertained by the idea that this might have something to do with it:

I’ve said this before, but one of the single biggest culture shocks I’ve ever received was when I was talking to someone about five years younger than I was, and she said “Wait, you play video games? I’m surprised. You seem like way too much of a nerd to play video games. Isn’t that like a fratboy jock thing?”

Appendix: FAQ

Here are some responses to the most common online comments.

Plat? You suck at Overwatch

Yep. But I sucked roughly equally on both accounts (actually somewhat more on the masculine account because it was rated higher and I was playing a bit out of my depth). Also, that's not a question.

This is just a blog post, it's not an academic study, the results are crap.

There's nothing magic about academic papers. I have my name on a few publications, including one that won best paper award at the top conference in its field. My median blog post is more rigorous than my median paper or, for that matter, the median paper that I read.

When I write a paper, I have to deal with co-authors who push for putting in false or misleading material that makes the paper look good, and my ability to push back against this has been fairly limited. On my blog, I don't have to deal with that and I can write up results that are accurate (to the best of my ability) even if it makes the result look less interesting or less likely to win an award.

Gamers have always been toxic, that's just nostalgia talking.

If I pull game logs for subspace, this seems to be false. YMMV depending on what games you played, I suppose. FWIW, airmash seems to be the modern version of subspace, and (until the game died) it was much more toxic than subspace even if you just compare on a per-game basis, despite having much smaller games (25 people for a good sized game in airmash, vs. 95 for subspace).

This is totally invalid because you didn't talk on voice chat.

At the ranks I played, not talking on voice was the norm. It would be nice to have talking or not talking on voice chat be an independent variable, but that would require playing even more games to get data for another set of conditions, and if I wasn't going to do that, choosing the condition that's most common doesn't make the entire experiment invalid, IMO.

Some people report that, post "endorsements" patch, talking on voice chat is much more common. I tested this out by playing 20 (non-comp) games just after the "Paris" patch. Three had comments on voice chat. One was someone playing random music clips, one had someone screaming at someone else for playing incorrectly, and one had useful callouts on voice chat. It's possible I'd see something different with more games or in comp, but I don't think it's obvious that voice chat is common for most people after the "endorsements" patch.

Appendix: code and data

If you want to play with this data and model yourself, experiment with different priors, run a posterior predictive check, etc., here's a snippet of R code that embeds the data:


library(tidyverse)
library(brms)

d <- tribble(
  ~game_type, ~gender, ~xplain, ~games,
  "comp", "female", 7, 35,
  "comp", "male", 1, 23,
  "qp", "female", 6, 149,
  "qp", "male", 2, 132
)
d <- d %>% mutate(female = ifelse(gender == "female", 1, 0), comp = ifelse(game_type == "comp", 1, 0))

result <-
  brm(data = d, family = binomial,
      xplain | trials(games) ~ female + comp,
      prior = c(set_prior("normal(0,10)", class = "b")),
      iter = 25000, warmup = 500, cores = 4, chains = 4)

The model here is simple enough that I wouldn't expect the version of software used to significantly affect results, but in case you're curious, this was done with brms 2.7.0, rstan 2.18.2, on R 3.5.1.

Thanks to Leah Hanson, Sean Talts and Sean's math/stats reading group, Annie Cherkaev, Robert Schuessler, Wesley Aptekar-Cassels, Julia Evans, Paul Gowder, Jonathan Dahan, Bradley Boccuzzi, Akiva Leffert, and one or more anonymous commenters for comments/corrections/discussion.

February 17, 2019

Simon Zelazny (pzel)

Figuring out a gen_tcp:recv limitation February 17, 2019 11:00 PM

In which a surprisingly pernicious framed payload leads to OTP spelunking.

The setup: sending a string over TCP

Let's say you want to send the ASCII string Fiat lux! to an Erlang process listening on the other side of a TCP connection. Not a big deal, right?

Our sending application is written in Python. Here's what it might look like:

#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("", 7777))
data_to_send = b"Fiat Lux!"
sock.sendall(data_to_send)

... and here's the receiving Erlang application:

#!/usr/bin/env escript
main(_) ->
  {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
  {ok, Sock} = gen_tcp:accept(L),
  {ok, String} = gen_tcp:recv(Sock, 0),
  io:format("Got string: ~ts~n", [String]).

If we start the Erlang receiver (in shell 1), then run the Python sender (in shell 2), we should see the receiver emit the following:

$ ./receive.escript
Got string: Fiat Lux!

As you can see, we optimistically sent all our data over TCP from the Python app, and received all that data, intact, on the other side. What's important here is that our Erlang socket is in passive mode, which means that incoming TCP data needs to be recv'd off of the socket. The second argument in gen_tcp:recv(Sock, 0) means that we want to read however many bytes are available to be read from the OS's network stack. In this case all our data was kindly provided to us in one nice chunk.

Success! Our real production application will be dealing with much bigger pieces of data, so it behooves us to test with a larger payload. Let's try a thousand characters.

More data

We update the sender and receiver as follows:

#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("", 7777))
data_to_send = b'a' * 1000
sock.sendall(data_to_send)
#!/usr/bin/env escript
main(_) ->
  {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
  {ok, Sock} = gen_tcp:accept(L),
  {ok, String} = gen_tcp:recv(Sock, 0),
  io:format("Got string of length: ~p~n", [byte_size(String)]).

When we run our experiment, we see that our Erlang process does indeed get all 1000 bytes. Let's add one more zero to the payload.

#!/usr/bin/env python3
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("", 7777))
data_to_send = b'a' * 10000
sock.sendall(data_to_send)

And we hit our first snag!

Got string of length: 1460

Aha! Our gen_tcp:recv(Sock, 0) call asked the OS to give us whatever bytes it had ready in the TCP buffer, and so that's what we received. TCP is a streaming protocol, and there is no guarantee that a given sequence of bytes received on the socket will correspond to a logical message in our application layer. The low-effort way of handling this issue is by prefixing every logical message on the TCP socket with a known-width integer, representing the length of the message in bytes. "Low-effort" sounds like the kind of thing you put in place when the deadline was yesterday. Onward!

Let's take our initial string as an example. Instead of sending the following sequence of 9 bytes on the wire:

Ascii:     F    i   a    t   ␣    l    u    x   !

Binary:   70  105  97  116  32  108  117  120  33

We'd first prefix it with a 32-bit integer representing its size in bytes, then append the payload, giving 13 bytes in total.

Ascii:     ␀   ␀  ␀   ␉  F    i   a    t   ␣    l    u    x   !

Binary:    0   0   0   9  70  105  97  116  32  108  117  120  33

Now, the first 4 bytes that reach our receiver can be interpreted as the length of the next logical message. We can use this number to tell gen_tcp:recv how many bytes we want to read from the socket.

To encode an integer into 32 bits, we'll use Python's struct module. struct.pack(">I", 9) will do exactly what we want: encode a 32-bit unsigned Integer (9, in this case) in Big-endian (or network) order.
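As a quick sanity check (not in the original post), the 13-byte frame from the table above can be built directly:

```python
import struct

payload = b"Fiat lux!"
header = struct.pack(">I", len(payload))  # 4-byte big-endian length prefix
frame = header + payload
# header is b"\x00\x00\x00\x09" and the full frame is 13 bytes,
# matching the byte table above
```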

#!/usr/bin/env python3
import socket
import struct
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("", 7777))
data_to_send = b'a' * 10000
header = struct.pack(">I", len(data_to_send))
sock.sendall(header + data_to_send)

On the decoding side, we'll break up the receiving into two parts:

1) Read 4 bytes from the socket, interpret these as Header, a 32-bit unsigned int.

2) Read Header bytes off the socket. The receiving Erlang process will 'block' until that much data is read (or until the other side disconnects). The received bytes constitute a logical message.

#!/usr/bin/env escript
main(_) ->
  {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
  {ok, Sock} = gen_tcp:accept(L),
  {ok, <<Header:32>>} = gen_tcp:recv(Sock, 4),
  io:format("Got header: ~p~n", [Header]),
  {ok, String} = gen_tcp:recv(Sock, Header),
  io:format("Got string of length: ~p~n", [byte_size(String)]).

When we run our scripts, we'll see the Erlang receiver print the following:

Got header: 10000
Got string of length: 10000

Success! But apparently, our application needs to handle messages much bigger than 10 kilobytes. Let's see how far we can take this approach.

Yet more data

Can we do a megabyte? Ten? A hundred? Let's find out, using the following loop for the sender:

#!/usr/bin/env python3
import socket
import struct
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("", 7777))
for l in [1000, 1000*1000, 10*1000*1000, 100*1000*1000]:
    data_to_send = b'a' * l
    header = struct.pack(">I", len(data_to_send))
    sock.sendall(header + data_to_send)

...and a recursive receive function for the receiver:

#!/usr/bin/env escript
recv(Sock) ->
  {ok, <<Header:32>>} = gen_tcp:recv(Sock, 4),
  io:format("Got header: ~p~n", [Header]),
  {ok, String} = gen_tcp:recv(Sock, Header),
  io:format("Got string of length: ~p~n", [byte_size(String)]),
  recv(Sock).

main(_) ->
  {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}]),
  {ok, Sock} = gen_tcp:accept(L),
  recv(Sock).

Running this will lead to our Erlang process crashing with an interesting message:

Got header: 1000
Got string of length: 1000
Got header: 1000000
Got string of length: 1000000
Got header: 10000000
Got string of length: 10000000
Got header: 100000000
escript: exception error: no match of right hand side value {error,enomem}

enomem looks like a strange kind of error indeed. It happens when we get the 100-megabyte header and attempt to read that data off the socket. Let's go spelunking to find out where this error is coming from.

Spelunking for {error, enomem}

First, let's take a look at what gen_tcp:recv does with its arguments. It seems that it checks inet_db to find our socket, and calls recv on that socket.

OK, let's check out inet_db. Looks like it retrieves module information stored via erlang:set_port_data, in the call above.

Grepping for calls to inet_db:register_socket reveals that multiple modules register themselves this way. Among these, we find one of particular interest.

169:        inet_db:register_socket(S, ?MODULE),
177:        inet_db:register_socket(S, ?MODULE),

Let's see how inet_tcp.erl implements recv. Hmm, just a pass-through to prim_inet. Let's look there.

It seems here that our Erlang call chain bottoms out in a call to ctl_cmd, which is itself a wrapper around erlang:port_control, sending control data over into C-land. We'll need to look at our TCP port driver to figure out what comes next.

    case ctl_cmd(S, ?TCP_REQ_RECV, [enc_time(Time), ?int32(Length)])

A slight hitch is finding the source code for this driver. Perhaps the macro ?TCP_REQ_RECV can help us find what we're after?

$  rg 'TCP_REQ_RECV'
100:-define(TCP_REQ_RECV,           42).

584:    case ctl_cmd(S, ?TCP_REQ_RECV, [enc_time(Time), ?int32(Length)]) of

735:#define TCP_REQ_RECV           42
10081:    case TCP_REQ_RECV: {
10112:  if (enq_async(INETP(desc), tbuf, TCP_REQ_RECV) < 0)

A-ha! inet_drv.c, here we come!

Indeed, this C function here, responsible for the actual call to sock_select, will proactively reject recv calls where the requested payload size n is bigger than TCP_MAX_PACKET_SIZE:

    return ctl_error(ENOMEM, rbuf, rsize);

and TCP_MAX_PACKET_SIZE itself is defined in the same source file as:

#define TCP_MAX_PACKET_SIZE 0x4000000 /* 64 M */

thereby explaining our weird ENOMEM error.
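The numbers line up (this quick check is mine, not the post's): 0x4000000 bytes is 64 MiB, and the fourth message's header asked for 100,000,000 bytes.

```python
TCP_MAX_PACKET_SIZE = 0x4000000  # from inet_drv.c
assert TCP_MAX_PACKET_SIZE == 64 * 1024 * 1024  # 64 MiB
# our fourth message's length prefix requested 100,000,000 bytes:
assert 100_000_000 > TCP_MAX_PACKET_SIZE  # hence the rejected recv
```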

Now, how to solve this conundrum? A possible approach would be to maintain some state in our receiver, optimistically read as much data as possible, and then try to reconstruct the logical messages, perhaps using something like erlang:decode_packet to take care of the book-keeping for us.
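To make that bookkeeping concrete, here's a sketch of the stateful approach in Python (my own illustration, not the author's code; the post goes on to show that OTP offers a cleaner answer): buffer whatever arrives and peel off complete length-prefixed frames.

```python
import struct

def feed(buf, chunk):
    """Append chunk to buf; return (complete_frames, leftover_buf)."""
    buf += chunk
    frames = []
    while len(buf) >= 4:
        # first 4 bytes are a big-endian unsigned length prefix
        (length,) = struct.unpack(">I", buf[:4])
        if len(buf) < 4 + length:
            break  # frame incomplete; wait for more data
        frames.append(buf[4 : 4 + length])
        buf = buf[4 + length :]
    return frames, buf
```

Each call returns any frames completed by the new chunk, plus the leftover bytes to carry into the next call.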

Taking a step back — and finding a clean solution

Before we jump to writing more code, let's consider our position. We're trying to read a framed message off of a TCP stream. It's been done thousands of times before. Surely the sagely developers whose combined experience is encoded in OTP have thought of an elegant solution to this problem?

It turns out that if you read the very long man entry for inet:setopts, you'll eventually come across this revealing paragraph:

{packet, PacketType}(TCP/IP sockets)

Defines the type of packets to use for a socket. Possible values:

raw | 0

No packaging is done.

1 | 2 | 4

Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The header length can be one, two, or four bytes, and containing an unsigned integer in big-endian byte order. Each send operation generates the header, and the header is stripped off on each receive operation.

The 4-byte header is limited to 2Gb.

Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. Yes indeed they do! Let's try it out!

#!/usr/bin/env escript
recv(Sock) ->
  {ok, String} = gen_tcp:recv(Sock,0),
  io:format("Got string of length: ~p~n", [byte_size(String)]),

main(_) ->
  {ok, L} = gen_tcp:listen(7777, [binary, {active, false}, {reuseaddr, true}, {packet, 4}]),
  {ok, Sock} = gen_tcp:accept(L),

And the output is:

Got string of length: 1000
Got string of length: 1000000
Got string of length: 10000000
Got string of length: 100000000
escript: exception error: no match of right hand side value {error,closed}

Problem solved! (The last error is from a recv call on the socket after it has been closed from the Python side). Turns out that our TCP framing pattern is in fact so common, it's been subsumed by OTP as a mere option for gen_tcp sockets!

If you'd like to know why setting this option lets us sidestep the TCP_MAX_PACKET_SIZE check, I encourage you to take a dive into the OTP codebase and find out. It's surprisingly easy to navigate, and full of great code.

And if you ever find yourself fighting a networking problem with brute force in Erlang, please stop and ask yourself: "Perhaps this was solved long ago, and the solution lives in OTP?" Chances are, the answer is yes!

Ponylang (SeanTAllen)

Last Week in Pony - February 17, 2019 February 17, 2019 02:19 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Luke Picciau (user545)

Gnubee Install Guide February 17, 2019 01:51 AM

The GnuBee is an open source, crowdfunded NAS system. It requires a fair bit of configuration to use, and I found the docs on the website fairly inadequate for setting up a working system, so after a lot of messing around I have come up with a guide to get to a fully working Debian system quickly. What you need: the GnuBee, an SD card, a 19V power supply, and a USB UART cable!

Kevin Burke (kb)

Going Solo, Successfully February 17, 2019 12:48 AM

Three years ago I quit my job and started consulting full time. It's worked out really well. I get to spend more time doing things I like to do and I've been able to deliver great products for clients. I wanted to go over some tips for starting a successful consulting business.

  • Charge more - Everyone says it and it's true. I started out charging a monthly rate that was close to my full time salary / 12. This is not a good idea because you have overhead that your employer is no longer covering - health care probably the biggest one, you don't have paid vacations, there may be unpaid downtime between contracts and also companies might not pay you. You need to be charging a lot more to just break even.

    I dread "what's your rate" conversations every time and they haven't gotten easier. Before I quote my rate I reread the details of the High Tech Employee Antitrust case to pump myself up - it reminds me that I'm negotiating with a company that couldn't care less really and I am the only one who's going to stand up for myself. If you think you don't need the extra money - get it anyway, and then donate more to charities at the end of the year/buy CD's/put it in the stock market/give it to the government. Amazon just made $11 billion and paid $0 in taxes; you are going to spend an additional dollar better than Amazon's executives will.

    If you are not sure how much to charge, quote each new client more than the last. Your quote is often a signal of your quality so it's not even really the case that demand slopes downwards as you charge more.

    If you are working with a client and they are very happy with your work and want to extend your contract consider asking for a higher rate. "Now that you know how good I am," etc.

  • Get the money - Signed contracts, work performed, don't matter until the money hits your bank account. I learned this the hard way. If a company is going under your invoices are worthless. You can hold onto the company IP but that's probably also worthless. You can sue but at the end you will win a judgment that you can collect money from a company that doesn't have any to pay you.

    Try to get as much of the contract as you can paid up front - I generally ask for half or more up front. If a company offers Net 30 ask if you can shorten it to Net 5 or 10 or submit your invoices in advance. Submit invoices on time - it's a very costly mistake and you won't learn its importance until it's too late.

    Try as hard as you can to figure out the financial health of the company - if you can do your homework in the press or ask questions to your primary point of contact, like how much cash they are burning, how many months of runway do you have. If a company is not forthcoming with this information it's a red flag that they may not be able to pay you.

    If you see any red flags - the company wants to cut the contract short, people start leaving, company suddenly cuts back on perks - tell your contact that you need to be paid upfront or you are not going to work anymore. If they push back on this they may not have the cash to pay you at all. It's a crappy situation but better to cut your losses than to work for someone that can't actually pay.

  • Don't charge by the hour - I have never actually done this so I can't speak to how bad it is but don't do this. You don't want a client to cut you loose at 3pm and suddenly you lose three hours you were counting on. Charge per week.

  • Get a lawyer - Get a lawyer to review every contract you sign. Read through them, flag concerning things to the lawyer. They will suggest language. Send the language to the company. You are not being difficult when you do this, the company does this all the time. Lawyers are expensive, expect to pay north of $400 per hour and contract review can take 30-60 minutes. This money is worth it.

    A good clause to try to push for is limitation of liability. You don't want to be in a situation where $2 million of damages occurred or a high value client left the company because of an error you pushed and the company is suddenly coming after everything you own. Similarly the company may want to protect against you trying to sue them for a high amount of damages to your reputation, future business etc. Limiting the total liability to the size of the contract, or a multiple of the size of the contract - on both sides - can be a good move.

  • Register as a Company - Consult with the lawyer you hired on what kind of company you want to be. Generally the more "company-like" you are the harder it is for companies to try to take your personal assets. I don't have employees or shareholders so I have a single member LLC that is disregarded for tax purposes — read this description from the IRS. Sometimes businesses are confused what this means when I tell them or try to sign up for things. Still, it is a good fit for me. It may not be for you - I am not a lawyer, you should talk with one, learn the tradeoffs and see what makes sense for your business.

  • Make Sure Contracts Are Signed With the Company - The contracts you sign should be between the client you are working with and your company NOT between the client and you personally. Discuss this with your lawyer.

  • Get an accountant - As a small business a lot of stuff is tax deductible - a home office, client travel, for example, even if it's just across town - and you want to make sure you are getting ~35% off on everything that you can. An accountant will help you with this.

  • Market yourself - Not necessarily ads or sponsorships, but: everyone you've worked with full time should know they can hire you now. If they don't then reach out to people and let them know. Put up a website that engineers can send to their boss. My website isn't fancy but it is effective. Order business cards - VistaPrint is garbage, order from a better printer. If you have a website or open source projects, put a note at the bottom advertising that you're available for hire, like the one at the bottom of this post.

  • Set up separate accounts for everything - Open separate accounts for your business. Get a business credit card or just a separate cash back card on your personal account. I don't have a checking account registered for the business but I opened a second checking account that I call my "business account". Clients pay into that account and I pay the business credit card out of that account. I even have a separate Clipper card that I use for business travel.

    There are two reasons for this. It makes tax accounting a lot easier. I know that every transaction on the business Clipper card is work travel and can be expensed; I don't have to try to remember what I was doing when I paid $2 to SamTrans at 5:34pm on July 27.

    Second, if you don't keep good records for the business - if you "commingle" funds between your personal life and the business - it makes it much easier for clients to go after your personal assets, what's called "piercing the veil." Separate accounts (and discipline about transfers!) make it much easier to argue that your business income and spending and personal income and spending are separate even if you don't necessarily have the legal structures to back them up.

    I also set up a new Github account for every company I work with. This avoids any issues with emails going to the wrong place, or the need to grant/revoke permissions to any 3rd party tools a company uses. I use a small wrapper script to swap SSH settings between my Github accounts:

    $ cat $(which notion-github)
    #!/usr/bin/env bash
    ${GOPATH}/bin/swish --identity-file ${HOME}/.ssh/github_notion_ed25519 --user kevinburkenotion
  • Balancing multiple clients: If you can do this or do things like charge retainers, great. I find it really hard to switch contexts so I work with one client at a time and treat it as a full time job. Do what works for you.

  • Give back to the tools that make you successful - I give a percentage of my earnings every year to support software tools that help me do my job - iTerm2, Vim, Go, Postgres, Node.js, Python, nginx, various other open source projects. You should consider doing this too. (If you are an open source maintainer reading this - tell me how I can pay you!!)

February 16, 2019

David Wilson (dw)

Threadless mode in Mitogen 0.3 February 16, 2019 10:00 PM

Mitogen has been explicitly multi-threaded since the design was first conceived. This choice is hard to regret, as it aligns well with the needs of operating systems like Windows, makes background tasks like proxying possible, and allows painless integration with existing programs where the user doesn't have to care how communication is implemented. Easy blocking APIs simply work as documented from any context, and magical timeouts, file transfers and routing happen in the background without effort.

The story has for the most part played out well, but as work on the Ansible extension revealed, this thread-centric worldview is more than somewhat idealized, and scenarios exist where background threads are not only problematic, but a serious hazard that works against us.

For that reason a new operating mode will hopefully soon be included, one where relatively minor structural restrictions are traded for no background thread at all. This article documents the reasoning behind threadless mode, and a strange set of circumstances that allow such a major feature to be supported with the same blocking API as exists today, and surprisingly minimal disruption to existing code.


Above is a rough view of Mitogen's process model, revealing a desirable symmetry as it currently exists. In the master program and replicated children, the user's code maintains full control of the main thread, with library communication requirements handled by a background thread using an identical implementation in every process.

Keeping the user in control of the main thread is important, as it possesses certain magical privileges. In Python it is the only thread from which signal handlers can be installed or executed, and on Linux some niche system interfaces require its participation.

When a remote call method is invoked, an outgoing message is constructed and enqueued with the Broker thread, and a callback handler is installed so that any return-value response message is posted to another queue created especially to receive it. Meanwhile the thread that made the call sleeps, waiting for a message on the call's dedicated reply queue.


Those queues aren't simply Queue.Queue, but a custom reimplementation added early during Ansible extension development, as deficiencies in Python 2.x threading began to manifest. Python 2 forces a choice between up to 50 ms of latency added to each Queue.get(), or waits that execute with UNIX signals masked, preventing CTRL+C from interrupting the program. Given these options a reimplementation made plentiful sense.

The custom queue is called Latch, a name chosen simply because it was short and vaguely fitting. To say its existence is a great discomfort would be an understatement: reimplementing synchronization was never desired, even if just by leveraging OS facilities. True to tribal wisdom, the folly of Latch has been a vast time sink, costing many days hunting races and subtle misbehaviours, yet without it, good performance and usability is not possible on Python 2, and so it remains.
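To make the design concrete, here is a toy latch built on the UNIX self-pipe trick, in the same spirit as (but far simpler than) Mitogen's Latch: waiters sleep in select() on a pipe rather than a condition variable, so gets add no polling latency and remain interruptible. All names here are mine, not Mitogen's.

```python
import collections
import os
import select
import threading

class ToyLatch:
    """Minimal latch: block in select() on a pipe instead of a condition
    variable, pairing one wake-up byte with each queued item."""
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = collections.deque()
        self._rfd, self._wfd = os.pipe()

    def put(self, obj):
        with self._lock:
            self._queue.append(obj)
        os.write(self._wfd, b"\x00")  # one wake-up byte per item

    def get(self):
        while True:
            select.select([self._rfd], [], [])  # sleep until a byte arrives
            with self._lock:
                if self._queue:
                    os.read(self._rfd, 1)  # consume the byte for this item
                    return self._queue.popleft()
                # another waiter won the race; go back to sleep
```

Even this toy version hints at why the real thing was such a time sink: the race between multiple waiters on the same pipe is easy to get subtly wrong.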

Due to this, when any thread blocks waiting for a result from a remote process, it always does so within Latch, a detail that will soon become important.

The Broker

Threading requirements are mostly due to Broker, a thread that has often changed role over time. Today its main function is to run an I/O multiplexer, like Twisted or asyncio. Except for some local file IO in master processes, broker thread code is asynchronous, regardless of whether it is communicating with a remote machine via an SSH subprocess or a local thread via a Latch.

When a user's thread is blocked on a reply queue, that thread isn't really blocked on a remote process - it is waiting for the broker thread to receive and decode any reply, then post it to the queue (or Latch) the thread is sleeping on.


Having a dedicated IO thread in a multi-threaded environment simplifies reasoning about communication, as events like unexpected disconnection always occur in a consistent location far from user code. But as is evident, it means every IO requires interaction of two threads in the local process, and when that communication is with a remote Mitogen process, a further two in the remote process.

It may come as no surprise that poor interaction with the OS scheduler often manifests: load balancing pushes related communicating threads out across distinct cores, where their execution schedule bears no resemblance to the inherent lock-step communication pattern caused by the request-reply structure of RPCs, and by the Global Interpreter Lock shared between threads of the same process. The range of undesirable effects defies simple description; it is sufficient to say that poor behaviour here can be disastrous.

To cope with this, the Ansible extension introduced CPU pinning. This feature locks related threads to one core, so that as a user thread enters a wait on the broker after sending it a message, the broker has much higher chance of being scheduled expediently, and for its use of shared resources (like the GIL) to be uncontended and exist in the cache of the CPU it runs on.
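On Linux, the pinning itself is a one-liner. A sketch of the idea (core 0 is an arbitrary choice here; which core each worker gets is the Ansible extension's decision, not shown):

```python
import os

# Pin this process (and all of its threads) to a single CPU, so that
# communicating threads share cache state and are scheduled promptly.
# Linux-only: sched_setaffinity is not available on macOS or Windows.
os.sched_setaffinity(0, {0})  # pid 0 means "the calling process"
print(os.sched_getaffinity(0))
```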

Runs of tests/bench/ with and without pinning.

Pinned?   Round-trip delay                Average
No        960 usec, 782 usec, 803 usec    848 usec ± 111 usec
Yes       198 usec, 197 usec, 197 usec    197 usec ± 1 usec

It is hard to overstate the value of pinning, as revealed by the more than 4x reduction in round-trip delay visible in this stress test, but enabling it is a double-edged sword, as the scheduler loses the freedom to migrate processes to balance load, and no general pinning strategy is possible that does not approach the complexity of an entirely new scheduler. As a simple example, if two uncooperative processes (such as Ansible and, say, a database server) were to pin their busiest workers to the same CPU, both will suffer disastrous contention for resources that a scheduler could alleviate if it were permitted.

While performance loss due to scheduling could be considered a scheduler bug, it could be argued that expecting consistently low latency lock-step communication between arbitrary threads is unreasonable, and so it is desirable that threading rather than scheduling be considered at fault, especially as one and not the other is within our control.

The desire is not to remove threading entirely, but instead provide an option to disable it where it makes sense. For example in Ansible, it is possible to almost halve the running threads if worker processes were switched to a threadless implementation, since there is no benefit in the otherwise single-threaded WorkerProcess from having a distinct broker thread.

UNIX fork()

In its UNIX manifestation, fork() is a defective abstraction surviving through symbolism and dogma, conceived at a time long predating the 1984 actualization of the problem it failed to solve. It has remained obsolete ever since. A full description of this exceeds any one paragraph, and an article in drafting since October already in excess of 8,000 words has not yet succeeded in fully capturing it.

For our purposes it is sufficient to know that, as when mixed with most UNIX facilities, mixing fork() with threads is extremely unsafe, but many UNIX programs presently rely on it, such as in Ansible's forking of per-task worker processes. For that reason in the Ansible extension, Mitogen cannot be permanently active in the top-level process, but only after fork within a "connection multiplexer" subprocess, and within the per-task workers.

In upcoming work, there is a renewed desire for a broker to be active in the top-level process, but this is extremely difficult while remaining compatible with Ansible's existing forking model. A threadless mode would be immediately helpful there.

Python 2.4

Another manifestation of fork() trouble comes in Python 2.4, where the youthful implementation makes no attempt to repair its threading state after fork, leading to incurable deadlocks across the board. For this reason when running on Python 2.4, the Ansible extension disables its internal use of fork for isolation of certain tasks, but it is not enough, as deadlocks while starting subprocesses are also possible.

A common idea would be to forget about Python 2.4 as it is too old, much as it is tempting to imagine HTTP 0.9 does not exist, but as in that case, Python is treated not just as a language runtime, but as an established network protocol that must be implemented in order to communicate with infrastructure that will continue to exist long into the future.

Implementation Approach

Recall it is not possible for a user thread to block without waiting on a Latch. With threadless mode, we can instead reinterpret the presence of a waiting Latch as the user's indication some network IO is pending, and since the user cannot become unblocked until that IO is complete, and has given up forward execution in favour of waiting, Latch.get() becomes the only location where the IO loop must run, and only until the Latch that caused it to run has some result posted to it by the previous iteration.

def main(router):
    host1 = router.ssh(hostname='a.b.c')
    host2 = router.ssh(hostname='c.b.a')

    call1 = host1.call_async(os.system, 'hostname')
    call2 = host2.call_async(os.system, 'hostname')

    print call1.get().unpickle()
    print call2.get().unpickle()

In the example, after the (presently blocking) connection procedure completes, neither call_async() wakes any broker thread, as none exists. Instead they enqueue messages for the broker to run, but the broker implementation does not start execution until call1.get(), where get() is internally synchronized using Latch.

The broker loop ceases after a result becomes available for the Latch that is executing it, only to be restarted again for call2.get(), where it again runs until its result is available. In this way asynchronous execution progresses opportunistically, and only when the calling thread indicated it cannot progress until a result is available.
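A sketch of that control flow (all names here are mine, and the real implementation differs): get() itself drives the I/O loop until its own result has been posted.

```python
class Broker:
    """Toy single-threaded broker: each run_once() call performs one
    iteration of the I/O loop, simulated here as a list of pending events."""
    def __init__(self):
        self.pending = []  # callables standing in for decoded replies

    def run_once(self):
        if self.pending:
            self.pending.pop(0)()

class ThreadlessLatch:
    def __init__(self, broker):
        self.broker = broker
        self._result = None
        self._ready = False

    def put(self, obj):
        # Called by broker code when a reply for this latch is decoded.
        self._result, self._ready = obj, True

    def get(self):
        # No sleeping: the waiting thread runs the I/O loop itself,
        # stopping as soon as its own result arrives.
        while not self._ready:
            self.broker.run_once()
        return self._result
```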

Owing to the inconvenient existence of Latch, an initial prototype was functional with only a 30 line change. In this way, an ugly and undesirable custom synchronization primitive has accidentally become the centrepiece of an important new feature.

Size Benefit

The intention is that threadless mode will become the new default in a future version. As it has much lower synchronization requirements, it becomes possible to move large pieces of code out of the bootstrap, including any relating to implementing the UNIX self-pipe trick, as required by Latch, and to wake the broker thread from user threads.

Instead this code can be moved to a new mitogen.threads module, where it can progressively upgrade an existing threadless mitogen.core, much like mitogen.parent already progressively upgrades it with an industrial-strength Poller as required.

Any code that can be removed from the bootstrap has an immediate benefit on cold start performance with large numbers of targets, as the bottleneck during cold start is often a restriction on bandwidth.

Performance Benefit

Threadless mode tallies in well with existing desires to lower latency and resource consumption, such as the plan to reduce context switches.

Runs of tests/bench/ with and without threadless

                          Threaded+Pinned     Threadless
Average Round-trip Time   201 usec            131 usec (-34.82%)
Elapsed Time              4.220 sec           3.243 sec (-23.15%)
Context Switches          304,330             40,037 (-86.84%)
Instructions              10,663,813,051      8,876,096,105 (-16.76%)
Branches                  2,146,781,967       1,784,930,498 (-15.85%)
Page Faults               6,412               17,529 (+173.37%)

Because no broker thread exists, no system calls are required to wake it when a message is enqueued, nor are any necessary to wake the user thread when a reply is received, nor any futex() calls due to one just-woke thread contending on a GIL that has not yet been released by a just-about-to-sleep peer. The effect across two communicating processes is a huge reduction in kernel/user mode switches, contributing to vastly reduced round-trip latency.

In the table an as-yet undiagnosed jump in page faults is visible. One possibility is that either the Python or C library allocator employs a different strategy in the absence of threads, the other is that a memory leak exists in the prototype.


Naturally this will place some restraints on execution. Transparent routing will no longer be quite so transparent, as it is not possible to execute a function call in a remote process that is also acting as a proxy to another process: proxying will not run while Dispatcher is busy executing the function call.

One simple solution is to start an additional child of the proxying process in which function calls will run, leaving its parent dedicated just to routing, i.e. exclusively dedicated to running what was previously the broker thread. It is expected this will require only a few lines of additional code to support in the Ansible extension.

For children of a threadless master, import statements will hang while the master is otherwise busy, but this is not much of a problem, since import statements usually happen once shortly after the first parent->child call, when the master will be waiting in a Latch.

For threadless children, no background thread exists to notice a parent has disconnected, and to ensure the process shuts down gracefully in case the main thread has hung. Some options are possible, including starting a subprocess for the task, or supporting SIGIO-based asynchronous IO, so the broker thread can run from the signal handler and notice the parent is gone.

Another restriction is that when threadless mode is enabled, Mitogen primitives cannot be used from multiple threads. After some consideration, while possible to support, it does not seem worth the complexity, and would prevent the aforementioned reduction of bootstrap code size.

Ongoing Work

Mitogen has quite an ugly concept of Services, added in a hurry during the initial Ansible extension development. Services represent a bundle of a callable method exposed to the network, a security policy determining who may call it, and an execution policy governing its concurrency requirements. Service execution always happens in a background thread pool, and is used to implement things like file transfer in the Ansible extension.

Despite heavy use, it has always been an ugly feature as it partially duplicates the normal parent->child function call mechanism. Looking at services from the perspective of threadless mode reveals some notion of a "threadless service", and how such a threadless service looks even more similar to a function call than previously.

It is possible that as part of the threadless work, the unification of function calls and services may finally happen, although no design for it is certain yet.


There are doubtlessly many edge cases left to discover, but threadless mode looks very doable, and promises to make Mitogen suitable in even more scenarios than before.

Until next time!

Just tuning in?

Jan van den Berg (j11g)

Ten years on Twitter 🔟❤️ February 16, 2019 07:44 AM

Today marks my ten year anniversary on Twitter! There are few other web services I have been using for ten years. Sure, I have been e-mailing and blogging for longer, but those are activities — like browsing — and not specifically tied to one service (e.g. Gmail is just one of many mail services). And after ten years, Twitter is still a valuable and fun addition to life online. But it takes a bit of work to keep it fun and interesting.


  • Twitter is your Bonsai tree: cut and trim.
  • Use the Search tab, it’s awesome!
  • Stay away from political discussions.
  • Be nice! No snark.
  • Bookmark all the things.

Twitter, the protocol

Twitter, of course, provides short synchronous one-to-all updates. In comparison, mail and blogging are asynchronous: their feedback loop is different and they're less instant. And WhatsApp or messaging are forms of one-to-many communication, and they're not public (so not one-to-all). So Twitter takes a unique place among these communication options.

Effectively the service Twitter provides is its own thing. Because Twitter is more a protocol, or an internet utility if you like. And more often than not, protocols or utilities tend to get used in ways they weren't supposed to. I've written about Twitter many times before. And I love blogging and RSS, but Twitter for me is still the place for near real-time updates. This post is part celebration of Twitter and part tips on how I, personally, use this protocol to keep it fun and interesting.


Twitter can be many things to many people. For some people it can be the number one place to get their news on politics. For others Twitter is all about comedy (Twitter comedy is certainly a fun place!) or sports (I do follow quite a bit of NBA news). And some people just jump in, enjoy the moment, not caring about what came before and logging off again. And that is fine, but that is just not how I roll. When I follow you, I care about what you have to say, so I make an effort to read it.

So I am careful about following people that tweet very often. When I check out a profile page, and see a user with 45,978 updates, that’s usually an indication that I will not follow that account. But, this is me. I see my Twitter timeline like a bonsai tree, cutting and trimming is an integral part of keeping things manageable. Because when you’re not careful, Twitter can become overwhelming. Certainly when you’re a strict chronological timeline user, like I am. But, sparingly following accounts can make you miss out on great stuff, right?

Search tab

My solution to this problem is the Search tab (available on the app and mobile). Because this tab is actually pretty good! Twitter knows my interests based on a cross-section of accounts I follow, and in this tab it makes a nice selection of tweets that I need to see. It is my second home, my second timeline. Usually I catch up on interesting things of otherwise loud Twitter accounts (i.e. lots of tech accounts that I don’t follow). So Twitter helps me to point out things that I still would like to see. I get the best of both worlds. It’s great!


There are few subjects as flammable as politics on Twitter. So I try to stay away from politics and try not to get involved in political discussions. That doesn’t mean I am not aware of things going on, or that I am not interested in politics. Quite the opposite! I just don’t think Twitter is the best place for political debate. The new 280 character limit was an improvement, but it’s still too short for real discussions or nuance (maybe this is true for the internet as a whole). Sure, certain threads can provide insight, and some people really know what they’re talking about. But I will think twice before personally entering a discussion. I do follow quite a bit of blogs/sites on politics and Twitter helps me to point to those things. These places usually help me more in understanding things that are otherwise hard to express in 280 characters.

Be positive

It is very easy to be dismissive or negative on Twitter. But very little good comes from that. So I always try to add something positive. I recently came across this tweet, and I think this sums it up rather well:


As stated, Twitter can be many things to many people. But from day one, for me it has always been a place to point and get pointed to interesting things. The best Twitter for me is Twitter as a jump-off zone. My love for Twitter comes from the experience of being pointed to great books, movies, blogs, (music) videos and podcasts. And I am a heavy user of the bookmark option. (I tend to like very little on Twitter, which is more of a thank you nowadays.) But I bookmark all the things. Usually I scan and read my timeline on mobile, bookmark the interesting things and come back to them later in the day on a PC.

What’s next?

I had been blogging for a few years when Twitter came along. So I have never been able to shake the feeling of seeing Twitter as a micro blog for everyone. (Which is just one of its uses.) I am also aware of concepts like, or Mastodon. Services that, at the very least, have been inspired by Twitter, and build further on the idea of a communication protocol. But the thing is, Twitter was first, and Twitter is where everybody is. It’s part of the plumbing of the internet now, I don’t see it going away soon and that is all right by me! Cheers!

The post Ten years on Twitter 🔟❤️ appeared first on Jan van den Berg.

February 15, 2019

Pierre Chapuis (catwell)

Goodbye Lima February 15, 2019 07:40 PM

You may have heard it already: five years after I joined Lima, the company is shutting down.

Obviously, things did not go like we hoped they would. Customers are disappointed, and wondering what will happen now. Let me try to answer some of the questions I read online the best I can.

Please note that this is my personal take on things, and does not represent the views of anyone else but me (i.e. not Lima, not other employees...).

What happened to the company exactly?

Lima as a company no longer exists. It ran out of money. Its employees (including me) have all been fired, and its assets will be sold to pay its debts.

Regarding why the company died, it is a long story and it is not my place to tell it all. What I can say is that it ran into unexpected funding problems in early 2017, shortly after we started shipping the Lima Ultra. During most of 2017, there was strong demand for the product but we could not fulfill it because we did not have enough cash to pay for production and shipping (Remember the never-ending waiting list?) At the end of the year, we had to fire a large part of the team and we switched our business model to sell our software to other companies. We made a deal where we worked for another startup. The deal was good enough to keep the company afloat and the product alive for a year, but it forced us to stop selling Lima devices. What happened recently is that this deal eventually fell through, leaving us with no viable options.

This past year was not the best time of my life, or for any of the other employees who stayed. Many of us could have left for much better jobs at any time, some did and I cannot blame them. All those who stayed on board all this time did so hoping for a better end for the company and its customers.

What will happen to the devices?

Once Lima's servers shut down, Lima will keep working on your local LAN with the devices you have already paired with it. However, a lot of things will stop working.

First, it won't be possible to add new devices to the system. That's because, when you log a new device into Lima, you do so with an email and password. To find out which Lima those credentials belong to, the system asks a server, and that server won't answer anymore.

Second, it won't be possible to reset your password, because email confirmation will be broken. If you have forgotten your password, change it now while the servers are still up.

Third, the sharing feature will be broken, because it relies on sending HTTP requests to relay servers which will go down as well.

Finally, it won't be possible to access Lima from outside your home. This is a little harder to explain than the rest. Basically, all communications between anything related to Lima (Lima devices, your devices, servers...) happen in a peer-to-peer VPN. To "locate" devices within the VPN (basically, to figure out how to talk to something), devices rely on a node called the ZVPN master. The IP address and public key of that node are hardcoded into every Lima client, and that node will go down as well. That node is not needed on local networks, because Lima devices and applications have a protocol to pair with other devices associated with the same account on a LAN without talking to any server.
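The LAN fallback mentioned above can be sketched roughly like this. To be clear, this is a hypothetical illustration, not Lima's actual protocol: the port number, message format, and function names are all invented. The idea is simply that devices announce themselves by broadcast on the local subnet and filter announcements by account, so no central server is involved.

```python
import json
import socket

DISCOVERY_PORT = 48573  # invented port number, not Lima's real one


def make_announcement(account_id: str, device_id: str, ip: str) -> bytes:
    """Serialize a LAN presence announcement for one device."""
    return json.dumps(
        {"account": account_id, "device": device_id, "ip": ip}
    ).encode()


def parse_announcement(payload: bytes, account_id: str):
    """Return (device_id, ip) if the announcement matches our own
    account, or None so announcements from other accounts are ignored."""
    msg = json.loads(payload.decode())
    if msg.get("account") != account_id:
        return None
    return msg["device"], msg["ip"]


def broadcast_presence(account_id: str, device_id: str, ip: str) -> None:
    """Announce our presence on the local subnet; peers paired to the
    same account pick it up without asking any central server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(make_announcement(account_id, device_id, ip),
                 ("255.255.255.255", DISCOVERY_PORT))
```

This is why the features that survive the shutdown are exactly the local ones: anything that only needs peers on the same LAN can be resolved this way, while locating a device across the internet required the ZVPN master.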

Is there a risk for my personal data?

At the moment, not that I know of. Your data was never stored on Lima's servers, and all data traffic going through relay servers is end-to-end encrypted, which means that even if an attacker took control of one, they couldn't decipher your data.

However in the long run there are two issues.

First, we won't be able to publish updates for the Lima firmware and applications anymore. If a security issue is found in one of the components they use, they may become vulnerable with no way to fix them.

Second, if someone was to acquire all the assets of Lima, including the domain and code signing certificate, they could theoretically do everything Lima was able to do, including publishing updates. That means they could publish malicious updates of the applications and firmware.

That second issue sounds scary, but I do not think there is any chance it will happen. Potential acquirers will probably be more interested in Lima's technological IP, there is very little chance that an acquirer will get all the assets necessary for such an attack, and even if they do, they probably won't have an interest in performing it. Even if it did happen, it would be easy to notice. Still, I have to mention it for transparency.

What I will personally do now, and what I advise users to do as well, is export all my data out of Lima, unplug the device and uninstall all the applications.

Note: If you have problems when trying to recover your data (due to e.g. a hardware issue with the USB drive), do not uninstall the applications. The data on your desktop might sometimes help recover some of the files.

If you have an issue with the Decrypt Tool, check here for potential answers.

What can users replace Lima with?

It depends on the users. I don't know anything that is exactly like Lima. There was Helixee, which I have never tried out, but I just found out they are shutting down as well. I also learned that a project I had never heard about before called Amber had a special offer for Lima customers.

For technical people, you can probably do most of what you were doing with Lima with a Synology NAS, or a setup based on some small computer and Open Source software such as Nextcloud or Cozy Cloud.

However, Lima was never designed for technical customers. It was built for, marketed to and mostly bought by non-technical people. For them, I don't have a good answer. I heard that WD My Cloud Home had become a lot better than it once was, but I have not tried it personally.

Can you open-source the code?

To the best of my knowledge, there is no way that can happen. This makes me extremely sad, especially since I know there are parts of the code I would love to reuse myself, and that could be useful to other projects.

The reason why we cannot open-source is that the code does not belong to us, the employees, or the CEO. Intellectual property is considered an asset of a bankrupt company, and as such will be sold to the highest bidder to pay the company's debts.

That being said, Lima has contributed some source code to a few Open Source projects already. Most importantly we fixed the issues in OSXFUSE that prevented it from being used for something like Lima, and those fixes are now in the main branch.

Completely independently from the company, the former CTO of Lima has also released a project which looks a lot like a second, fully decentralized iteration of the Lima network layer ZVPN (using a DHT instead of a master node, and WireGuard instead of TLS). Let me be clear: this project contains no code or IP from Lima, it is a clean room implementation.

Can you give us root access to the device?

For Lima Original, no, I think that would be impossible (or rather, I can't see a solution that doesn't involve soldering...). The device is not worth much today anyway, its specs are so low I don't think you could run any other private cloud software on it.

For Lima Ultra, a few of us ex-Lima employees (and the CEO) are trying to figure out a way to let users get root access. We can't promise anything, but we will keep you informed if we do.

EDIT (2019-02-18): We did it, check this out!

Why does it say something different in the Kickstarter FAQ?

Some people have mentioned that what was happening was not in line with what had been said in the Kickstarter FAQ.

This FAQ has been written in 2013, before I or any other Lima developer joined the company. At the time Lima was a very small project with two founders trying to raise $70,000 to make their dream happen. Instead they raised $1,229,074, hired 12 people (including me), and the rest is history.

I do not think we have communicated like that since, especially regarding decentralization. As far as I know, we have been transparent that our servers were needed for some major features of the product, as was obvious the few times they went down. You may ask why we didn't amend this page then, and the answer is (I think) that it is technically impossible to edit it after the campaign is over.

Regarding Open Source, I sincerely believe the CEO of Lima would have done it if it was possible, but with the success of the Kickstarter the company had to take VC funding very early on (see below), and from that moment on I do not think it was in his hands.

Where did all that Kickstarter money go?

OK, let's address this last. What Kickstarter money?

Yeah, the founders raised over a million dollars. But do you remember how much the backers paid for those devices? From $59 to $79 each. Well, as bad as the hardware was, it was planned for about 1,000 devices, not over 10,000. And it was pretty expensive.

I don't know the exact figures, but basically Lima did not make money on those devices, or at least no significant amount. Which is why it raised extra cash from VCs just afterwards: to pay the team that worked on the project, to fund the production of more devices to sell, etc.

If you still think something shady went on with that money, rest assured: when a company like Lima goes bankrupt, its books are closely investigated by the state, which is one of its main creditors. So if you are right, the people responsible will end up in jail. (Spoiler: I really don't think it will happen.)

What are you going to do next?

Yes, I have plans.

No, they are not in any way related to Lima.

I will tell you more next month, probably.

Gustaf Erikson (gerikson)

Fotografiska, 14 Feb 2019 February 15, 2019 06:13 PM

Jonas Bendiksen - The Last Testament

Magnum photographer photographs seven people around the world who claim they are Jesus Christ. Great reportage.

Anja Niemi - In Character

Self-portraits (sometimes doubled), with that “2-stops overexposed Portra” aesthetic that the kids like so much these days. It’s nothing we haven’t seen before.

Kirsty Mitchell - Wonderland

Exceedingly lush tableaux, backed by a tragic backstory (the memory of the creator’s deceased mother) and a hugely successful Kickstarter campaign. There’s no denying the craftsmanship nor the quality of the work, but somehow it feels a bit weird for an artist to so publicly involve crowdfunding in something so private. On the other hand, the work of Niemi (above) struck me as very cold and solitary, so what do I know about how artists get inspiration from others.

Pierre Chapuis (catwell)

Software Architecture Principles February 15, 2019 10:20 AM

This is just a short post to share what I now consider, after 10 years in the industry (and almost twice as many writing code), my core software architecture principles.

You may or may not agree with them all, but if you design software or systems, you should have a similar list in your head; it really helps a lot when making decisions.

Without further ado, the principles are:

  • Separation of Concerns often trumps not repeating yourself (DRY). In other words, avoiding duplication does not justify introducing coupling.

  • Gall's Law: "A complex system that works is invariably found to have evolved from a simple system that worked."

  • Conway's Law: "Organizations produce designs which are copies of their communication structures."

  • When writing code or designing, stop and think "consequences". What will be the impact of what you are doing on the rest of the systems? Could there be adverse side-effects?

  • Think about debuggability in production. There is nothing worse than having your software break and not being able to figure out why. Do not automate things you do not understand.

  • Write code that is easy to delete, not easy to extend.
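As a small illustration of the first principle (all names here are hypothetical, not from any real codebase): two modules that validate similar-looking data. Factoring the near-duplicate checks into one shared helper would save a few lines, but it would couple signup to billing — the next time one side's rules change, the shared helper either forks again or sprouts flags.

```python
# Two concerns that happen to look similar today but evolve independently.
# The duplicated well-formedness check is tolerated on purpose.

def validate_signup_email(email: str) -> bool:
    """Signup only cares that the address is well-formed."""
    return "@" in email and "." in email.split("@")[-1]


def validate_billing_email(email: str) -> bool:
    """Billing additionally rejects throwaway domains (invented list)."""
    if "@" not in email or "." not in email.split("@")[-1]:
        return False
    domain = email.split("@")[-1].lower()
    return domain not in {"example-throwaway.test"}
```

Each concern owns its own rule and can change without a ripple through the other — which is the trade the first bullet point argues for.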

Andreas Zwinkau (qznc)

The Spartan Web February 15, 2019 12:00 AM

Defining a label for websites I like to visit and would like to see more of.

Read full article!

February 14, 2019

Jan van den Berg (j11g)

Blue Bananas – Wouter de Vries jr. & Thiemo van Rossum February 14, 2019 06:36 PM

Blauwe Bananen (Blue Bananas) is a management book that was number one for 38 days. It is aimed at people who generally don’t read management books, so it sometimes tries to be unnecessarily funny, seemingly afraid to alienate the reader with otherwise dry concepts. Nonetheless, the message itself is pretty solid. The theme: how to become a blue banana. A blue banana is a business with a unique skill set or proposition.

Blue Bananas – Wouter de Vries jr. & Thiemo van Rossum (2012) – 94 pages

That the message carries merit is not a surprise. This book unabashedly builds on the famous organisational and business strategy theories laid out by Treacy & Wiersema and Hamel & Prahalad. The book introduces readers to a succinct and on-point summary of their concepts. It does so by guiding the reader through four steps: Pursuits, Promises, Perception, Proof (freely translated by me from the Dutch B letter words).

With these steps the book makes the theory practical and consequently is very direct. Which is a good thing. To further cement the theory it offers 29 exercises and practical thought experiments (Things like “write down what you think are unique talents of your organisation”). Overall it does a good job of landing one of the main messages: it does not matter what value you add, if your customer does not perceive it as such. Everything you do as an organisation should add value to your customers’ experience.

If you rarely read management books, Blue Bananas can be a good starting point, and it offers valid questions about how to add value to your organisation.

The post Blue Bananas – Wouter de Vries jr. & Thiemo van Rossum appeared first on Jan van den Berg.

February 13, 2019

Derek Jones (derek-jones)

Offer of free analysis of your software engineering data February 13, 2019 03:02 AM

Since the start of this year, I have been telling people that I am willing to analyze their software engineering data for free, provided they are willing to make the data public; I also offer to anonymize the data for them as part of the free service. Alternatively, you could read this book and do the analysis yourself.

What will you get out of me analyzing your data?

My aim is to find patterns of behavior that will be useful to you. What is useful to you? You have to be the judge of that. It is possible that I will not find anything useful, or perhaps any patterns at all; this does not happen very often. Over the last year I have found (what I think are useful) patterns in several hundred datasets, with one dataset that I am still scratching my head over.

Data analysis is a two-way conversation. I find some patterns, and we chat about them, hopefully you will say one of them is useful, or point me in a related direction, or even a completely new direction; the process is iterative.

The requirement that an anonymized form of the data be made public is likely to significantly reduce the offers I receive.

There is another requirement that I don’t say much about: the data has to be interesting.

What makes software engineering data interesting, or at least interesting to me?

There has to be lots of it. How much is lots?

Well, that depends on the kind of data. Many kinds of measurements of source code are generally available by the truck load. Measurements relating to human involvement in software development are harder to come by, but becoming more common.

If somebody has a few thousand measurements of some development related software activity, I am very interested. However, depending on the topic, I might even be interested in a couple of dozen measurements.

Some measurements are very rare, and I would settle for as few as two measurements. For instance, multiple implementations of the same set of requirements provides information on system development variability; I was interested in five measurements of the lines of source in five distinct Pascal compilers for the same machine.

Effort estimation data used to be rare; published papers sometimes used to include a table containing the estimate/actual data, which was once gold-dust. These days I would probably only be interested if there were a few hundred estimates, but it would depend on what was being estimated.

If you have some software engineering data that you think I might be interested in, please email me something about the data (and perhaps what you would like to know about it). I’m always open to a chat.

If we both agree that it’s worth looking at your data (I will ask you to confirm that you have the rights to make it public), then you send me the data and off we go.

February 10, 2019

David Wilson (dw)

Mitogen v0.2.4 released February 10, 2019 11:59 PM

Mitogen for Ansible v0.2.4 has been released. This version is noteworthy as it contains major refinements to the core library and Ansible extension to improve its behaviour during larger Ansible runs.

Work on scalability is far from complete, as it progresses towards inclusion of a patch held back since last summer to introduce per-CPU multiplexers. The current idea is to exhaust profiling gains from a single process before landing it, as all single-CPU gains continue to apply in that case, and there is much less risk of inefficiency being hidden in noise created by multiple multiplexer processes.

Please kick the tires, and as always, bug reports are welcome!


Ponylang (SeanTAllen)

Last Week in Pony - February 10, 2019 February 10, 2019 04:03 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

February 09, 2019

Unrelenting Technology (myfreeweb)

haha wow my SoundFixer addon is on the front page February 09, 2019 04:53 PM


Nikita Voloboev (nikivi)

What problem did you encounter? February 09, 2019 03:59 PM

What problem did you encounter? I updated the article to include `:simlayers {:o-mode {:key :o}}` for launcher key blocks as it is needed and was missing.

I am using Mojave, and you can look at my config on GitHub as a reference for a working config I currently use.