Planet Crustaceans

This is a Planet instance for lobste.rs community feeds. To add/update an entry or otherwise improve things, fork this repo.

December 07, 2019

Gustaf Erikson (gerikson)

Advent of Code 2019 December 07, 2019 07:19 PM

This blog post is a work in progress

Project website: Advent of Code 2019.

Previous years: 2015, 2016, 2017, 2018.

I use Perl for all the solutions.

Most assume the input data is in a file called input.txt in the same directory as the file.

A note on scoring

I score my problems to mark where I’ve finished a solution myself or given up and looked for hints. A score of 2 means I solved both the daily problems myself, a score of 1 means I looked up a hint for one of the problems, and a zero score means I didn’t solve any of the problems myself.

My goals for this year (in descending order of priority):

  • get 38 stars or more (75%)
  • solve all problems within 24 hours of release

Link to Github repo.

Day 1 - Day 2 - Day 3 - Day 4 - Day 5 - Day 6 - Day 7

Day 1 - The Tyranny of the Rocket Equation

Day 1 - complete solution

A nice and simple problem to kick off this year.

Score: 2

Day 2 - 1202 Program Alarm

Day 2 - complete solution

An earlier appearance of register rodeo than expected! I think we’ll see more of this going forward.

Score: 2

Day 3 - Crossed Wires

Day 3 - complete solution

This took longer than it had to. I messed up adding the paths, and only managed to get the correct answer to part 1 by chance. Once I plotted the example data I could correct the code, then add the logic for part 2.

I’m not entirely happy with the duplicated direction subroutines. Some people have used complex numbers to simplify this but that would require a separate Perl module to implement.

Score: 2.

Day 4 - Secure Container

Day 4 - complete solution

I blanked on this one and dreaded combinatorics. Turns out brute force is eminently doable. Credits to /u/andreyrmg in the daily solutions thread, and A_D and sim642 in the IRC channels for help and inspiration.

I still think my solution is my own though (and pretty Perlish), so full score today.

Score: 2.

Day 5 - Sunny with a Chance of Asteroids

Day 5 - complete solution

After struggling with the convoluted problem description I was pleasantly surprised to find my code ran flawlessly first try. I still have some niggling issues with the test data, and need to clear that up before the inevitable next intcode problem.

Score: 2.

Day 6 - Universal Orbit Map

Day 6 - complete solution

I bailed on this one and sought inspiration in the daily solutions subreddit. Credit in source!

Score: 0.

Day 7 - Amplification Circuit

Day 7 - part 1 Day 7 - part 2

A tough, but fun one. There were a lot of subtleties in the second part, and I got some pointers from the subreddit.

I got the chance to clean up my intcode implementation, and learned a new facet of Perl.

Score: 2.

Jan van den Berg (j11g)

Churchill – Sebastian Haffner December 07, 2019 05:02 PM

Writing a Churchill biography is not an easy assignment, even though it would be difficult to butcher the job. Churchill led an unprecedentedly rich and varied life, and just writing down the bare facts would already be enough for a great story. But it would also be a massive undertaking.

Churchill – Sebastian Haffner (1967/2002) – 206 pages

Haffner took a different route. He chose the high-level helicopter approach. And he managed to produce an impressive sketch and sagacious analysis of England’s most famous political figure, by focusing on key phases of his life. Haffner has the required biographer’s distance and writes with ultimate authority. His sentences are carved in stone and are a delight to read. And in some places he is as tough in his verdict as the man himself was.

The post Churchill – Sebastian Haffner appeared first on Jan van den Berg.

December 05, 2019

Frederik Braun (freddyb)

Help Test Firefox's built-in HTML Sanitizer to protect against UXSS bugs December 05, 2019 11:00 PM

This article first appeared on the Mozilla Security blog

I recently gave a talk at OWASP Global AppSec in Amsterdam and summarized the presentation in a blog post about how to achieve "critical"-rated code execution vulnerabilities in Firefox with user-interface XSS. The end of that blog post encourages the reader to participate in the bug bounty program, but did not come with proper instructions. This blog post will describe the mitigations Firefox has in place to protect against XSS bugs and how to test them. Our about: pages are privileged pages that control the browser (e.g., about:preferences, which contains Firefox settings). A successful XSS exploit has to bypass the Content Security Policy (CSP), which we have recently added, but also our built-in XSS sanitizer to gain arbitrary code execution. A bypass of the sanitizer without a CSP bypass is in itself a severe-enough security bug and warrants a bounty, subject to the discretion of the Bounty Committee. See the bounty pages for more information, including how to submit findings.

How the Sanitizer works

The Sanitizer runs in the so-called "fragment parsing" step of innerHTML. In more detail, whenever someone uses innerHTML (or similar functionality that parses a string from JavaScript into HTML) the browser builds a DOM tree data structure. Before the newly parsed structure is appended to the existing DOM element, our sanitizer intervenes. This step ensures that our sanitizer cannot mismatch the result the actual parser would have created - because it is indeed the actual parser. The line of code that triggers the sanitizer is in nsContentUtils::ParseFragmentHTML and nsContentUtils::ParseFragmentXML. The aforementioned link points to a specific source code revision, to make hotlinking easier. Please click the file name at the top of the page to get to the newest revision of the source code. The sanitizer is implemented as an allow-list of elements, attributes and attribute values in nsTreeSanitizer.cpp. Please consult the allow-list before testing. Finding a sanitizer bypass is a hunt for Mutated XSS (mXSS) bugs in Firefox, unless you find an element in our allow-list that has recently become capable of running script.

How and where to test

A browser is a complicated application which consists of millions of lines of code. If you want to find new security issues, you should test the latest development version. We oftentimes rewrite lots of code that isn't related to the issue you are testing but might still have a side-effect. To make sure your bug is actually going to affect end users, test Firefox Nightly. Otherwise, the issues you find in Beta or Release might have already been fixed in Nightly.

Sanitizer runs in all privileged pages

Some of Firefox's internal pages have more privileges than regular web pages. For example about:config allows the user to modify advanced browser settings and hence relies on those expanded privileges. Just open a new tab and navigate to about:config. Because it has access to privileged APIs it can not use innerHTML (and related functionality like outerHTML and so on) without going through the sanitizer.

Using Developer Tools to emulate a vulnerability

From about:config, open the Developer Tools console (go to Tools in the menu bar, select Web Developer, then Web Console (Ctrl+Shift+K)). To emulate an XSS vulnerability, type this into the console: document.body.innerHTML = '<img src=x onerror=alert(1)>' Observe how Firefox sanitizes the HTML markup by looking at the error in the console: “Removed unsafe attribute. Element: img. Attribute: onerror.” You may now go and try other variants of XSS against this sanitizer. Again, try finding an mXSS bug or identifying an allowed combination of element and attribute which executes script.
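For a concrete starting point, here are a couple of further probes you might paste into the Web Console on about:config. These particular payloads are illustrative examples of my own rather than known bypasses; each should be neutralized by the sanitizer and logged with a similar "Removed unsafe ..." message:

// Baseline from the post: the onerror handler should be stripped.
document.body.innerHTML = '<img src=x onerror=alert(1)>';

// Illustrative variants probing the allow-list; these are expected to be sanitized as well.
document.body.innerHTML = '<svg onload=alert(1)></svg>';
document.body.innerHTML = '<a href="javascript:alert(1)">click me</a>';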

Finding an actual XSS vulnerability

Right, so for now we have emulated the Cross-Site Scripting (XSS) vulnerability by typing in innerHTML ourselves in the Web Console. That's pretty much cheating. But as I said above: What we want to find are sanitizer bypasses. This is a call to test our mitigations. But if you still want to find real XSS bugs in Firefox, I recommend you run some sort of smart static analysis on the Firefox JavaScript code. And by smart, I probably do not mean eslint-plugin-no-unsanitized.

Summary

This blog post described the mitigations Firefox has in place to protect against XSS bugs. These bugs can lead to remote code execution outside of the sandbox. We encourage the wider community to double check our work and look for omissions. This should be particularly interesting for people with a web security background, who want to learn more about browser security. Finding severe security bugs is very rewarding and we're looking forward to getting some feedback. If you find something, please consult the Bug Bounty pages on how to report it.

Pages From The Fire (kghose)

Old school on the E-M10 December 05, 2019 04:11 AM

While trying to figure out if I should splurge on a new lens for my E-M10 I re-discovered the joys of my old Nikon lenses thanks to a $16 adapter and the magic of my E-M10. SLR lenses on mirrorless micro 4/3 cameras are an especially good marriage because of the small flange distance of …

December 02, 2019

Derek Jones (derek-jones)

Christmas books for 2019 December 02, 2019 09:08 PM

The following are the really, and somewhat, interesting books I read this year. I am including the somewhat interesting books to bulk up the numbers; there are probably more books out there that I would find interesting. I just did not read many books this year, what with Amazon recommends being so user unfriendly, and having my nose to the grindstone finishing a book.

First the really interesting.

I have already written about Good Enough: The Tolerance for Mediocrity in Nature and Society by Daniel Milo.

I have also written about The European Guilds: An economic analysis by Sheilagh Ogilvie. Around half-way through I grew weary, and worried readers of my own book might feel the same. Ogilvie nails false beliefs to the floor and machine-guns them. An admirable trait in someone seeking to dispel the false beliefs in current circulation. Some variety in the nailing and machine-gunning would have improved readability.

Moving on to first half really interesting, second half only somewhat.

“In search of stupidity: Over 20 years of high-tech marketing disasters” by Merrill R. Chapman, second edition. This edition is from 2006, and a third edition is promised, like now. The first half is full of great stories about the successes and failures of computer companies in the 1980s and 1990s, by somebody who was intimately involved with them in a sales and marketing capacity. The author does not appear to be so intimately involved, starting around 2000, and the material flags. Worth buying for the first half.

Now the somewhat interesting.

“Can medicine be cured? The corruption of a profession” by Seamus O’Mahony. All those nonsense theories and practices you see going on in software engineering, it’s also happening in medicine. Medicine had a golden age, when progress was made on finding cures for the major diseases, and now it’s mostly smoke and mirrors as people try to maintain the illusion of progress.

“Who we are and how we got here” by David Reich (a genetics professor who is a big name in the field), is the story of the various migrations and interbreeding of ‘human-like’ and human peoples over the last 50,000 years (with some references going as far back as 300,000 years). The author tries to tell two stories, the story of human migrations and the story of the discoveries made by his and other people’s labs. The mixture of stories did not work for me; the story of human migrations/interbreeding was very interesting, but I was not at all interested in when and who discovered what. The last few chapters went off at a tangent, trying to have a politically correct discussion about identity and race issues. The politically correct class are going to hate this book’s findings.

“The Digital Party: Political organization and online democracy” by Paolo Gerbaudo. The internet has enabled some populist political parties to attract hundreds of thousands of members. Are these parties living up to their promises to be truly democratic and representative of members’ wishes? No, and Gerbaudo does a good job of explaining why (people can easily join up online, and then find more interesting things to do than read about political issues; only a few hard core members get out from behind the screen and become activists).

Suggestions for books that you think I might find interesting are welcome.

December 01, 2019

Carlos Fenollosa (carlesfe)

November 30, 2019

Unrelenting Technology (myfreeweb)

I was wondering why I can’t watch Twitch streams in Firefox… turns out November 30, 2019 11:46 PM

I was wondering why I can’t watch Twitch streams in Firefox… turns out they serve a broken player if your User-Agent does not contain Linux/Windows/macOS. Fail.

Jan van den Berg (j11g)

Capitalism without brakes – Maarten van Rossem November 30, 2019 12:21 PM

In his highly distinctive ‘tone of voice’, Maarten van Rossem provides the most succinct available lecture on the root causes that led to the 2008 financial crisis.

Capitalism without brakes (Kapitalisme zonder remmen) – Maarten van Rossem (2011) – 120 pages

From the change in Keynes’ thinking (after the 1920s) to the Hayek and Friedman ideology — embodied by the neoliberal policies of Reagan and Thatcher. Van Rossem explains how culture and ideology shifted and, combined with technology and humanity’s never-ending greed, provided the perfect ingredients for what happened in 2008. And what will probably happen again, because humans never seem to learn.

Van Rossem doesn’t wait for the reader: he uses very direct, compelling argumentation, but provides few footnotes or sources. So it’s a matter of believing what the messenger says, as opposed to the messenger providing evidence for his claims. But when you do, this book is the tightest high-level historical overview of the 2008 financial crisis you can find.

Side note: I found it remarkable that van Rossem (as a historian) shares similar ideas with Nassim Nicholas Taleb (who tends to dislike what historians do). E.g. they both subscribe to the idea of people’s general misinterpretation of the Gaussian distribution (the Bell curve). And they both share their admiration for Kahneman and they both seem to dislike the Nobel prize.

The post Capitalism without brakes – Maarten van Rossem appeared first on Jan van den Berg.

November 29, 2019

Jan van den Berg (j11g)

Dream Dare Do – Ben Tiggelaar November 29, 2019 07:34 PM

Dream Dare Do (Dromen Durven Doen) is one of the all-time bestselling Dutch self-management books. Tiggelaar is a popular figure and he has a charming, personal and pragmatic writing style.

Dream dare do / Dromen durven doen – Ben Tiggelaar (2010) – 152 pages

There are few new concepts in the book (at least for me). Practices like visualisation, goalsetting, checking goals, taking responsibility and being grateful. These are all familiar concepts, shared by many other similar well-known management theories.

And with that, Tiggelaar shows a direct lineage with the likes of Covey, Kahneman and even Aurelius. But you wouldn’t know this if you’re not familiar with these theories. And that is precisely what makes this a good book. Tiggelaar has condensed this knowledge into an approachable, coherent, concise, practical and actionable book that can be read in a few short hours (or one sitting in my case). And if that’s what you’re looking for, go give it a read.

The post Dream Dare Do – Ben Tiggelaar appeared first on Jan van den Berg.

November 25, 2019

Pete Corey (petecorey)

Count the Divisible Numbers November 25, 2019 12:00 AM

Let’s try our hand at using a property test driven approach to solving a Codewars code kata. The kata we’ll be solving today is “Count the Divisible Numbers”. We’ll be solving this kata using Javascript, and using fast-check alongside Jest as our property-based testing framework.

The kata’s prompt is as follows:

Complete the [divisibleCount] function that takes 3 numbers x, y and k (where x ≤ y), and returns the number of integers within the range [x..y] (both ends included) that are divisible by k.
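As a worked example of the prompt (my own illustration, not part of the kata): the multiples of 2 in the range [6..11] are 6, 8, and 10, so divisibleCount(6, 11, 2) should return 3.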

Writing Our Property Test

We could try to translate this prompt directly into a property test by generating three integers, x, y, and k, and verifying that the result of divisibleCount(x, y, k) matches our expected result, but we’d have to duplicate our implementation of divisibleCount to come up with that “expected result.” Who’s to say our test’s implementation wouldn’t be flawed?

We need a more obviously correct way of generating test cases.

Instead of generating three integers, x, y, and k, we’ll generate our starting point, x, the number we’re testing for divisibility, k, and the number of divisible numbers we expect in our range, n:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      // TODO ...
    })
  );
});

Armed with x, k, and n, we can compute the end of our range, y:


let y = x + n * k;

Next, we’ll pass x, our newly computed y, and k into divisibleCount and assert that the result matches our expected value of n:


return n === divisibleCount(x, y, k);

Our final property test looks like this:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Beautiful.

Our First Solution

Coming up with a solution to this problem is fairly straightforward:


const _ = require("lodash"); // lodash provides the chain/range/map/reject/size helpers used below

const divisibleCount = (x, y, k) => {
  return _.chain(y - x)
    .range()
    .map(n => x + n)
    .reject(n => n % k)
    .size()
    .value();
};

We generate an array of integers from x to y, reject those that aren’t divisible by k, and return the size of the resulting array.

Unfortunately, this simple solution doesn’t work as expected. Our property test reports a failing counterexample of [0, 0, 1] values for x, k, and n:


$ jest
 FAIL  ./index.test.js
  ✕ it works (10ms)
  
  ● it works
  
    Property failed after 1 tests
    { seed: 1427202042, path: "0:0:0:1:0:0:0", endOnFailure: true }
    Counterexample: [0,0,1]
    Shrunk 6 time(s)
    Got error: Property failed by returning false

Looking at our solution, this makes sense. The result of n % 0 is NaN. Unfortunately, the kata doesn’t specify what the behavior of our solution should be when k equals 0, so we’re left to figure that out ourselves.

Let’s just set up a precondition in our test that k should never equal 0:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), fc.nat(), (x, k, n) => {
      fc.pre(k !== 0);
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Great!

Unfortunately, there’s another problem. Without putting an upper bound on the size of n * k, our solution will generate potentially massive arrays. This will quickly eat through the memory allocated to our process and result in a crash.

Let’s add some upper and lower bounds to our generated k and n values:


test("it works", () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(-100, 100), fc.nat(100), (x, k, n) => {
      fc.pre(k !== 0);
      let y = x + n * k;
      return n === divisibleCount(x, y, k);
    })
  );
});

Perfect. Our starting integer, x, can be any positive or negative integer, but our generated values of k are clamped between -100 and 100, and n ranges from 0 to 100. These values should be large enough to thoroughly test our solution, and small enough to prevent memory issues from arising.

Our Second Solution

In hindsight, our solution seems to be making inefficient use of both time and memory. If we consider the fact that our property test is computing y in terms of x, n, and k, it stands to reason that we should be able to compute n, our desired result, in terms of x, y, and k. If we can manage this, our solution will run in both constant time and constant space.

Let’s use some algebra and work our way backwards from calculating y to calculating n. If y = x + n * k, that means that y - x = n * k. Dividing by k gives us our equation for computing n: n = (y - x) / k.
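As a quick sanity check with concrete numbers (my own, not from the kata): if x = 3, k = 5, and n = 4, then y = 3 + 4 * 5 = 23, and (23 - 3) / 5 = 4, which recovers n.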

Let’s replace our original divisibleCount solution with this equation:


const divisibleCount = (x, y, k) => (y - x) / k;

And rerun our test suite:


$ jest
 PASS  ./index.test.js
  ✓ it works (8ms)
  
  Test Suites: 1 passed, 1 total
  Tests:       1 passed, 1 total

Wonderful!

November 24, 2019

Carlos Fenollosa (carlesfe)

Phil Hagelberg (technomancy)

in which we get socially rendered November 24, 2019 02:42 PM

I joined the fediverse in early 2017. If you haven't heard of it, it's a distributed network providing social media type features without any one centralized authority. Users are in control of their data, and anyone can run their own servers with their own rules, including the ability to block all traffic from other servers if they tolerate abusive behavior, etc.

my profile

It took me a while to get to the point where I was comfortable on the Fediverse. I created an account on the oddly-named icosahedron.website in April, but it didn't stick immediately. It didn't feel like there was much going on because I hadn't found that many users to follow. After a few months of poking my head around, clicking around a bit, and then forgetting about it for another few weeks, I finally got enough momentum for it to be a compelling place for me, and by November I stopped using my Twitter account altogether. I had felt since the 2016 US election that Twitter had spiraled into a worse and worse condition; the site felt engineered to drive more and more "engagement" at the expense of human misery. So making a clean break dramatically improved my mental well-being.

Even tho it makes a few things more complicated (like finding new users to follow[1]), I deeply appreciate the emphasis on user empowerment that's inherent in the design of the fediverse. One of the cornerstones of this empowerment is the ability to run your own fediverse server, or instance. The most common fediverse server software is Mastodon, which could be considered the flagship of the fediverse. While it's very slick and full-featured, a big downside of Mastodon is that it's difficult to run your own server. Administering it requires running a Ruby on Rails application with Node.js, Postgres, Redis, Nginx, ElasticSearch, and more. For servers which serve a medium-to-large community, this overhead can be justifiable, but it requires a lot of mental energy to get started. There are a lot of places where things could go wrong.

The Pleroma project aims to reduce this by creating a dramatically simpler fediverse server. Running a Pleroma server requires just an Elixir application, a Postgres database, and Nginx to handle TLS. Since Elixir is a lot more efficient than Ruby, it's even possible to run it on a low-powered machine like a Raspberry Pi[2]. I set up my own Pleroma server a few weeks ago at hi.technomancy.us. It's running on the Pi in the photo.

a raspberry pi and hard drive

One downside of Pleroma being simpler is that it's really just an API server. All your interaction in the browser goes thru a separate Javascript application called pleroma-fe, and mobile clients like Tusky just hit the JSON API. The API-first design makes sense when you're using the application to browse, post, search, etc, but a big downside is that when you want to share a post with someone else, they have to load all of pleroma-fe just to see it. If you share it with someone who has scripting turned off, then they'll just see a blank white page, which is very unfriendly[3].

I wanted to start using Pleroma, but I wasn't comfortable with this unfriendly behavior. I wanted it so that if I sent a link to a post to a friend, the server would send them the HTML of the post![4] So I took a course of action I never could have taken with a centralized, commercial social network: I fixed it myself. I found that there had been an attempt to start this 8 months ago which had more or less been forgotten, so I used that as my starting point.

Pleroma is written in Elixir, which I had never used before, but I had learned Erlang a few years ago, and many of the core concepts are the same. Since I based my work on the old initial sketch, I was able to make quick progress and add several features, like threading, media, content warnings, and more. I got some really helpful review about how to improve it and test it, and it got merged a couple weeks ago. So now you can see it in action. I'm thankful to the Pleroma developers for their helpful and welcoming attitude.

pleroma screenshot

One of the reasons this is important to me is that I normally use a laptop that's a bit old. But I think it's important for software developers to keep some empathy for users who don't have the latest and greatest hardware. On my laptop, using the pleroma-fe Javascript application to view a post takes eight seconds[5] if you haven't already loaded pleroma-fe (which is the main use case for when you're sharing a link with a friend). If you have it loaded already, it's still 2-3 seconds to load in pleroma-fe. When you have the server generate the HTML, it takes between 200 and 500 milliseconds. But 500ms is nearly a worst-case scenario since it's running on a tiny Raspberry Pi server; on a high-end server it would likely be several times faster.

Running your own fediverse server is still much harder than it should be. I've glossed over the annoyances of Dynamic DNS, port forwarding, and TLS certificates. There's still a lot of opportunity for this to become better. I have a vision of a system where you could sign up for a fediverse server and it would pre-generate an SD card image with Pleroma, Postgres, and Nginx preinstalled and configured with the domain name of your choice, but right now shortcomings in typical consumer-grade routers and consumer ISPs make this impractical. But it's come a long way, and I think it's only going to get better going forward.

If you're interested in running your own fediverse server, you might find runyourown.social helpful, tho it focuses on Mastodon instead of Pleroma. If you're not interested in running your own server, check out instances.social for a listing of servers with open registration. There's never been a better time to ditch corporate social media and join the fediverse!


[1] When people get started on the Fediverse, the first question is just "which server should I choose?" As someone who's been around a while, it's tempting for me to say "it doesn't matter as long as you pick a place with a code of conduct that disallows abusive behavior; all the servers talk to each other, so you can follow any user from any server that hasn't de-federated yours." The problem is this isn't quite true due to the bootstrapping problem; when you're trying to find interesting people to follow, you'll have an easier time if you land on a server where people have interests that overlap with yours.

In a distributed system, one server can't know about every single user in the entire network; it's just too big. So server A only knows about users on server B if someone from server A has already made a connection with a user on server B. Once you choose a server, your view of the network will be determined by the sum total of those followed by your server-mates.

[2] Just don't make the same mistake I did and try to run Postgres on an SD card! I tried this initially, and after a few days I started seeing unexplained segmentation fault loops from Postgres. Apparently this is common behavior when a disk failure corrupts the DB's files. Moving everything over to an external USB drive made the problem go away, but it was certainly a surprise. Everything else can run on the SD card but the database.

[3] Note that this problem also occurs with Twitter. Mastodon is slightly better, but it still refuses to show you images or content-warnings without scripting.

[4] You used to be able to take this very basic behavior for granted, but since the arrival of the "single-page app", it has become some kind of ancient forgotten wisdom.

[5] Eight seconds sounds like a very slow application (and it is!) but it's hardly the worst offender for single-page applications. Trello takes 10 seconds, Jira takes 16 seconds, and Slack takes 18 seconds.

November 23, 2019

Jeremy Morgan (JeremyMorgan)

10 Places to Learn Golang November 23, 2019 04:19 AM

Though it's now ten years old, Golang (Google's Go language) is one of the fastest growing languages out there right now. Do you want to learn it? Here are 10 great places to start.

1. Basics of GoLang for Beginners


Click Here

This is a great place to get started from zero. If you're committed to downloading and installing Go, it shows you how, and you dig into some basic stuff. It's a good start.



2. Go.Dev


Click Here

This is a fairly new source so I haven't been able to dig into all the features of it, but it has tutorials for installing, doing a hello world, etc. It seems like the perfect jumping off point.



3. Tour of Golang


Click Here

This is a good place to get started as easily as possible. It doesn't have you building real applications, but you also don't have to install or set up anything.



4. Go by Example


Click Here

This one digs a little deeper and tackles things like pointers and concurrency which are tough at first.



5. Golang Bootcamp


Click Here

This is a book in downloadable form or online, and it's well organized and clear for the basics.



6. Introducing Go


Click Here

This is a great book to go from total beginner to digging into deeper topics. It's well worth the price and is a great way to get started writing real applications.



7. Justforfunc YouTube Channel


Click Here

The justforfunc YouTube channel is great for digging into Golang. Francesc Campoy is entertaining and knowledgeable. Check it out.



8. Introduction to Programming in Go


Click Here

This is a little older but still a great book for getting the basics down. It's fun and well organized.



9. Golang Tutorial series


Click Here

This is a great series of tutorials for learning Go that's easy to follow and fun.



10. Gophercises


Click Here

So now you know some Go and want to play? This is a great place to polish your skills. This site has some cool coding exercises to try out.



GO Learn some Golang!!

Golang is awesome. I love working with it, and you probably will too. The resources above are great places to get your feet wet and really start developing some cool stuff. What do you think of the list? Should I add anything? Yell at me on Twitter with suggestions!

Oh yeah, and I'm going to be doing more streaming stuff on Twitch so check it out!

November 22, 2019

Grzegorz Antoniak (dark_grimoire)

C++: Shooting yourself in the foot #5 November 22, 2019 06:00 AM

Consider this code:

#include <iostream>
using namespace std;

struct Base {
    // Base contains nothing; it is just a placeholder.
};

template <typename Type>
struct Derived : Base {
    Type field; // the type of this field comes from the template parameter
};

You can see two classes here. The class Base contains nothing; it's a placeholder. Then, there's also the class Derived, which contains one field. The type of this field is specified in the …

November 21, 2019

Jeff Carpenter (jeffcarp)

Marathon Training Update November 21, 2019 04:47 PM

I’m signed up for the California International Marathon (a.k.a. CIM) in Sacramento, which is in less than 3 weeks! This will be my first marathon (the SF Marathon this year was supposed to be, but I gave myself a toe injury by overtraining). My goal is simply to finish. I think having any sort of time-based goal would risk pushing me past the point of injury during the race. Am I ready?

November 20, 2019

Derek Jones (derek-jones)

A study, a replication, and a rebuttal; SE research is starting to become serious November 20, 2019 02:41 PM

tldr; A paper makes various claims based on suspect data. A replication finds serious problems with the data extraction and analysis. A rebuttal paper spins the replication issues as being nothing serious, and actually validating the original results, i.e., the rebuttal is all smoke and mirrors.

When I first saw the paper: A Large-Scale Study of Programming Languages and Code Quality in Github, the pdf almost got deleted as soon as I started scanning the paper; it uses number of reported defects as a proxy for code quality. The number of reported defects in a program depends on the number of people using the program, more users will generate more defect reports. Unfortunately data on the number of people using a program is extremely hard to come by (I only know of one study that tried to estimate number of users); studies of Java have also found that around 40% of reported faults are requests for enhancement. Most fault report data is useless for the model building purposes to which it is put.

Two things caught my eye, and I did not delete the pdf. The authors have done good work in the past, and they were using a zero-truncated negative binomial distribution; I thought I was the only person using zero-truncated negative binomial distributions to analyze software engineering data. My data analysis alter-ego was intrigued.

Spending a bit more time on the paper confirmed my original view: its conclusions were not believable. The authors had done a lot of work, this was no paper written over a long weekend, but lots of silly mistakes had been made.

Lots of nonsense software engineering papers get published, nothing to write home about. Everybody gets to write a nonsense paper at some point in their career; hopefully they get caught by reviewers and are not published (the statistical analysis in this paper was probably above the level familiar to most software engineering reviewers). So, move along.

At the start of this year, the paper: On the Impact of Programming Languages on Code Quality: A Reproduction Study appeared, published in TOPLAS (the first was in CACM, both journals of the ACM).

This replication paper gave a detailed analysis of the mistakes in data extraction, and the sloppy data analysis performed in the original work. Large chunks of the first study were cut to pieces (finding many more issues than I did, but not pointing out the missing usage data). Reading this paper now, in more detail, I found it a careful, well argued, solid piece of work.

This publication is an interesting event. Replications are rare in software engineering, and this is the first time I have seen a take-down (of an empirical paper) like this published in a major journal. Ok, there have been previously published disagreements, but those were about machine learning nonsense.

The Papers We Love meetup group ran a mini-workshop over the summer, and Jan Vitek gave a talk on the replication work (unfortunately a problem with the AV system means the videos are not available on the Papers We Love YouTube channel). I asked Jan why they had gone to so much trouble writing up a replication, when they had plenty of other nonsense papers to choose from. His reasoning was that the conclusions from the original work were starting to be widely cited, i.e., new, incorrect, community-wide beliefs were being created. The finding from the original paper, that has been catching on, is that programs written in some languages are more/less likely to contain defects than programs written in other languages. What I think is actually being measured is number of users of the programs written in particular languages (a factor not present in the data).

Yesterday, the paper Rebuttal to Berger et al., TOPLAS 2019 appeared, along with a Medium post by two of the original authors.

The sequence: publication, replication, rebuttal is how science is supposed to work. Scientists disagree about published work and it all gets thrashed out in a series of published papers. I’m pleased to see this is starting to happen in software engineering, it shows that researchers care and are willing to spend time analyzing each others work (rather than publishing another paper on the latest trendy topic).

From time to time I had considered writing a post about the first two articles, but an independent analysis of the data meant some serious thinking, and I was not that keen (since I did not think the data went anywhere interesting).

In the academic world, reputation and citations are the currency. When one set of academics publishes a list of mistakes, errors, oversights, blunders, etc in the published work of another set of academics, both reputation and citations are on the line.

I have not read many academic rebuttals, but one recurring pattern has been a pointed literary style. The style of this Rebuttal paper is somewhat breezy and cheerful (the odd pointed phrase pops out every now and again), attempting to wave off what the authors call general agreement with some minor differences. I have had some trouble understanding how the rebuttal points discussed are related to the problems highlighted in the replication paper. The tone of the medium post is that there is nothing to see here, let’s all move on and be friends.

An academic’s work is judged by the number of citations it has received. Citations are used to help decide whether someone should be promoted, or awarded a grant. As I write this post, Google Scholar listed 234 citations to the original paper (which is a lot, most papers have one or none). The abstract of the Rebuttal paper ends with “…and our paper is eminently citable.”

The claimed “Point-by-Point Rebuttal” takes the form of nine alleged claims made by the replication authors. In four cases the Claim paragraph ends with: “Hence the results may be wrong!”, in two cases with: “Hence, FSE14 and CACM17 can’t be right.” (these are references to the original conference and journal papers, respectively), and once with: “Thus, other problems may exist!”

The rebuttal points have a tenuous connection to the major issues raised by the replication paper, and many of them are trivial issues (compared to the real issues raised).

Summary bullet points (six of them) at the start of the Rebuttal discuss issues not covered by the rebuttal points. My favourite is the objection bullet point claiming a preference, in the replication, for the use of the Bonferroni correction rather than FDR (False Discovery Rate). The original analysis failed to use either technique, when it should have used one or the other, a serious oversight; the replication is careful and does the analysis using both.

I would be very surprised if the Rebuttal paper, in its current form, gets published in any serious journal; it’s currently on a preprint server. It is not a serious piece of work.

Somebody who has only read the Rebuttal paper would take away a strong impression that the criticisms in the replication paper were trivial, and that the replication was not a serious piece of work.

What happens next? Will the ACM appoint a committee of the great and the good to decide whether the CACM article should be retracted? We are not talking about fraud or deception, but a bunch of silly mistakes that invalidate the claimed findings. Researchers are supposed to care about the integrity of published work, but will anybody be willing to invest the effort needed to get this paper retracted? The authors will not want to give up those 234, and counting, citations.

Update

The replication authors have been quick off the mark and posted a rebuttal of the Rebuttal.

The rebuttal of the Rebuttal has been written in the style that rebuttals are supposed to be written in, i.e., a point by point analysis of the issues raised.

Now what? I have no idea.

November 19, 2019

Gustaf Erikson (gerikson)

For the Soul of France: Culture Wars in the Age of Dreyfus by Frederick Brown November 19, 2019 06:13 PM

This is an excellent and entertaining view of the war between Republicans and their opponents in the years between 1870 and World War 1. It reminds the reader of the virulent anti-Semitism of French discourse at the time.

As an example, Lt. Col. Henry was instrumental in framing Alfred Dreyfus. He literally forged evidence to “prove” Dreyfus’ guilt. When he was arrested and committed suicide in prison, he was hailed as a hero. A subscription was started to finance a lawsuit brought by his wife against Joseph Reinach for libel. A journalist collected the testimonials in a book, and the statements from that book, excerpted in a footnote, are among the most chilling in the entire book:

“From an antisemitic merchant in Boulogne-sur-Mer who hopes that the Hebes are blown away, above all Joseph Reinach, that unspeakable son-in-law and nephew of the Panama swindler one of whose victims I am.” “From a cook who would rejoice in roasting Yids in her oven.” “Long live Christ! Love live France! Long live the Army! A curate from a little very antisemitic village.” “One franc to pay for the cord that hangs Reinach.” “Joan of Arc, help us banish the new English.” “Two francs to buy a round of drinks for the troopers who will shoot Dreyfus, Reinach, and all the traitors.” A resident of Baccarat wanted “all the kikes” in the region—men, women, and children—thrown into the immense ovens of the famous crystal factory. Another contributor longed, prophetically, for the day that a “liberating boot” would appear over the horizon.

In these days when the ideas of sang et terre are making a resurgence, it’s instructive to look back on a time when the Right expressed itself in its true voice.

Jan van den Berg (j11g)

The Unicorn Project – Gene Kim November 19, 2019 04:06 PM

When I read The Phoenix Project last year, I was smitten. I loved the combination of using fiction to describe how to apply management and DevOps theory to true-to-life situations. So when the publisher asked if I wanted to review the follow-up, I didn’t hesitate. And I can safely say The Unicorn Project is just as much fun as its predecessor.

The Unicorn Project – Gene Kim (2019) – 406 pages

This fiction book takes place in the same universe as The Phoenix Project. Actually: the same company and even the same timeline. However the protagonist this time is female — Maxine Chambers — which is a welcome change.

The Unicorn Project builds a fictionalised business war story around DevOps theory.

A large incumbent auto parts manufacturer struggles to keep up with changing markets. And the hero of the story is given a thankless project and is told to keep her head down. But she doesn’t! She assembles a team and guided by the mysterious Erik Reid and his Five Ideals (and later the Three Horizons) she sets out to change the course of her project — and subsequently the company!

The reason I love it so much is that, even though it is fiction, the described situations are just too real. I know them all too well. And The Unicorn Project provides insight into how to deal with these situations. It does so by applying the Five Ideals.

Five Ideals

The main plot is built around seeing the application of these ideals unfold and their beneficial consequences.

Gene Kim has defined the following Five Ideals:

  • The First Ideal is Locality and Simplicity
  • The Second Ideal is Focus, Flow, and Joy
  • The Third Ideal is Improvement of Daily Work
  • The Fourth Ideal is Psychological Safety
  • The Fifth Ideal is Customer Focus

I will not go into detail about the definition and application of the Five Ideals — you should read the book! — but you can take an educated guess from their description what they mean.

And here is the author himself, giving some background information:

I’ve identified values and principles I call the Five Ideals to frame today’s most important IT challenges impacting engineering and business. … My main objective is to confirm the importance of the DevOps movement as a better way of working, and delivering better value, sooner, safer, and happier. I do this by addressing what I call the invisible structures, the architecture, needed to enable developers’ productivity and to scale DevOps across large organizations.

Gene Kim

Plot and references

The Unicorn Project sits at the intersection of these things, which makes it pretty unique.

Saying there is a happy ending is probably not a spoiler. The plot is the vehicle for carrying and embodying the DevOps concepts. My only critique is that the book is so chock-full of management and pop culture references that their application to the story sometimes feels contrived. But not to a fault.

(And apart from that, I kept expecting the CISO to show up and complicate things. But he didn’t?)

I really love the References in the back of the book. I’m a sucker for references and further reading. And I now have a large YouTube playlist with talks about concepts in the book.

Functional programming

Of all the references and concepts mentioned, one in particular popped out: the author’s clear love for functional programming, and Clojure in particular. So I did some digging and sure enough, Gene Kim loves Clojure!

Here he is explaining his love for Clojure:

Bonus: the references from chapter 7 point to a talk by Rich Hickey (creator of Clojure), which is just a phenomenal talk.

Reading The Unicorn Project will leave you smarter and energized to take on challenges you or your company might face. Go read it!

The post The Unicorn Project – Gene Kim appeared first on Jan van den Berg.

November 17, 2019

Ponylang (SeanTAllen)

Last Week in Pony - November 17, 2019 November 17, 2019 08:52 PM

The Pony Playground is back (mostly) after being down for a few weeks. Groxio has started a new course on Pony with a free intro video available on YouTube.

Derek Jones (derek-jones)

Design considerations for Mars colony computer systems November 17, 2019 04:19 PM

A very interesting article discussing SpaceX’s dramatically lower launch costs has convinced me that, in a decade or two, it will become economically viable to send people to Mars. Whether lots of people will be willing to go is another matter, but let’s assume that a non-trivial number of people decide to spend many years living in a colony on Mars; what computing hardware and software should they take with them?

Reliability and repairability are crucial. Same-day delivery of replacement parts is not an option; the opportunity for Earth/Mars travel occurs every 2 years (when both planets are on the same side of the Sun), and the journey takes 4-10 months.

Given the much higher radiation levels on Mars (200 mSv/year; on Earth background radiation is around 3 mSv/year), modern microelectronics will experience frequent bit-flips and have a low survival rate. Miniaturization is great for packing billions of transistors into a device, but increases the likelihood that a high energy particle traveling through the device will create a permanent short-circuit; Moore’s law has a much shorter useful life on Mars, compared to Earth. Lesser high energy particles can flip the current value of one or more bits.

Reliability and repairability of electronics, compared to other compute and control options, dictate minimizing the use of electronics (pneumatics is a viable replacement for many tasks; think World War II submarines), and simple calculations can be made using a slide rule or mechanical calculator (both are reliable, and possible to repair with simple tools). Some of the issues that need to be addressed when electronic devices are a proposed solution include:

  • integrated circuits need to be fabricated with feature widths that are large enough such that devices are not unduly affected by background radiation,
  • devices need to be built from exchangeable components, so if one breaks the others can be used as spares. Building a device from discrete components is great for exchangeability, but is not practical for building complicated cpus; one solution is to use simple cpus, and integrated circuits come in various sizes.
  • use of devices that can be repaired or new ones manufactured on Mars. For instance, core memory might be locally repairable, and eventually locally produced.

There are lots of benefits from using the same cpu for everything, with ARM being the obvious choice. Some might suggest RISC-V, and perhaps this will be a better choice many years from now, when a Mars colony is being seriously planned.

Commercially available electronic storage devices have lifetimes measured in years, with a few passive media having lifetimes measured in decades (e.g., optical media); some early electronic storage devices had lifetimes likely to be measured in decades. Perhaps it is possible to produce hard discs with expected lifetimes measured in decades, research is needed (or computing on Mars will have to function without hard discs).

The media on which the source is held will degrade over time. Engraving important source code on the walls of colony housing is one long term storage technique; rather like the hieroglyphs on ancient Egyptian buildings.

What about displays? Have lots of small, same size, flat-screens, and fit them together for greater surface area. I don’t know much about displays, so won’t say more.

Computers built from discrete components consume lots of power (much lower power consumption is a benefit of fabricating smaller devices). No problem, they can double as heating systems. Switching power supplies can be very reliable.

Radio communications require electronics. The radios on the Voyager spacecraft have been operating for 42 years, which suggests to me that reliable communication equipment can be built (I know very little about radio electronics).

What about the software?

Repairability requires that software be open source, or some kind of Mars-use only source license.

The computer language of choice is obviously C, whose advantages include:

  • lots of existing, heavily used, operating systems are written in C (i.e., no need to write, and extensively test, a new one),
  • C compilers are much easier to implement than, say, C++ or Java compilers. If the C compiler gets lost, somebody could bootstrap another one (lots of individuals used to write and successfully sell C compilers),
  • computer storage will be at a premium on Mars-based computers, and C supports getting close to the hardware to maximise efficient use of resources.

The operating system of choice may not be Linux. With memory at a premium, operating systems requiring many megabytes are bad news. Computers with 64k of storage (yes, kilobytes) used to be used to do lots of useful work; see the source code of various 1980’s operating systems.

Applications can be written before departure. Maintainability and readability are marketing terms, i.e., we don’t really know how to do this stuff. Extensive testing is a good technique for gaining confidence that software behaves as expected, and the test suite can be shipped with the software.

Mark J. Nelson (mjn)

Images from Wikidata November 17, 2019 12:00 PM

Enter a search term in one of the following six languages, and we'll try to find an image for it using Wikidata, via two search methods. Search terms can be people, places, events, objects, plants, etc. E.g. try "granite" or "Obama" in English, "ordinateur" in French, "δάσος" in Greek, etc. Full explanation and code below.

Search for:

Direct Wikidata search:

Via Wikipedia sitelinks:

Open-domain image retrieval

What's the motivation here?

One motivation: A thing people sometimes want in generative projects is to be able to get an image for any arbitrary search term, allowing the generator to be more open-domain than if you took the approach of starting with a fixed sprite pack. Getting good results is pretty hard, but if you're okay with just "something" for any search term, there have been various solutions over the years. A popular one used to be grabbing the first result from the now-deprecated Google Image Search API; Flickr search is another option. This post shows how to use Wikidata queries for that purpose. I believe using Wikidata for this has a few advantages, such as multi-lingual search and labeling, an open API run by a nonprofit organization, and at least some degree of human curation.

Wikidata

A quick overview of Wikidata: It's a structured-data project under the same umbrella organization as Wikipedia, and contains millions of "entities" that each represent some person, thing, event, etc. Examples of entities are granite (Q41177), Elvis Presley (Q303), and Great Molasses Flood (Q1129089). Entities are canonically referred to by this Qxxx ID, because no specific language is considered primary (the names I gave in the previous sentence are just the English labels attached to them).

Many of the entities were initially imported from Wikipedia articles, and are linked back to the corresponding Wikipedia articles. To each entity is attached a label and a short description in one or more languages, plus a series of "claims", which constitute the structured data. Claims are statements of properties attributed to the entity: date of birth (P569) for a person, DOI (P356) for a scientific paper, etc. These come from a mixture of manual curation, scripts importing data from Wikipedia articles (e.g. from infoboxes), and scripts importing data from other open-data sources.
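To make that concrete: the Elvis Presley entity (Q303) mentioned above carries, among many other claims, a date of birth (P569) claim whose value is 8 January 1935.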

One of the many possible properties that a Wikidata entity can have is image (P18), which, if it exists, links to one or more freely licensed images on the Wikimedia Commons media repository that are claimed to be a good visual representation of the entity. That's what this post is about retrieving.

Wikibase API

The most direct way of getting things out of Wikidata is to use the Wikibase API. You can use it directly, but this example uses the Wikibase SDK, a small Javascript library that provides some convenience methods to build query URLs and simplify the returned results.

The demo above implements two ways of searching. We can search Wikidata directly in a specified language, which is perhaps the obvious thing to do. Alternately, we can do a "sitelink" search, which means searching for a Wikipedia article in the specified language, and then grabbing its linked Wikidata entity. These often produce the same results, but not always.

A brief sketch of the pros and cons of the two search methods: For languages with large Wikipedias, the sitelink method seems to often produce better results for ambiguous searches, because it makes use of manually created redirects to the most common meaning. For example, a direct Wikidata search in English for "bee" turns up the letter B as the first result, while a sitelink search via the English Wikipedia turns up the insect, which is probably what we wanted. On the other hand, for languages with smaller Wikipedias that have fewer manually created redirects, the direct Wikidata search may produce better results. For example, a direct Wikidata search in Greek for "Έλβις" turns up Elvis Presley, while a sitelink search only works if you use his full name. You can try both methods in a few languages at the top of this post.

Code

For the actual implementation of the demo at the top of this post, view-source in your browser. Below is a cut-down example that doesn't do DOM manipulation or retrieve entity labels/descriptions, showing how to retrieve an image URL. It's a pretty quick-and-dirty implementation with minimal error handling, but it hopefully illustrates how to get an image out of the API.

// WBK is the factory function exposed by the wikibase-sdk library mentioned above
const wdk = WBK({
    instance: "https://www.wikidata.org",
    sparqlEndpoint:"https://query.wikidata.org/sparql"
})

// Direct Wikidata search
function search(term, lang) {
    const searchUrl = wdk.searchEntities({
	search: term,
	language: lang,
	limit: 1
    })
    fetch(searchUrl).then(r => r.json())
        // grab ID of the first search result
	.then(r => r.search[0].id)
        // look up the claims for that ID
	.then(id => wdk.getEntities({ids:id, props:'claims'}))
	.then(entityUrl => fetch(entityUrl))
	.then(r => r.json())
	.then(wdk.simplify.entities)
        // grab the "P18" (image) claims, if they exist
	.then(r => r[Object.keys(r)[0]].claims["P18"])
        // get a URL for the first image if there is one (resized to width=300)
	.then(images => images ? wdk.getImageUrl(images[0], 300) : null)
        .then(imageUrl => alert(imageUrl)) // do something with the URL
}

// Search Wikidata via Wikipedia sitelinks
function searchSitelinks(term, lang) {
    const searchUrl = wdk.getEntitiesFromSitelinks(term, lang + 'wiki')
    fetch(searchUrl).then(r => r.json())
	.then(wdk.simplify.entities)
        // grab the "P18" (image) claims, if they exist
	.then(r => r[Object.keys(r)[0]].claims["P18"])
        // get a URL for the first image if there is one (resized to width=300)
	.then(images => images ? wdk.getImageUrl(images[0], 300) : null)
        .then(imageUrl => alert(imageUrl)) // do something with the URL
}

Carlos Fenollosa (carlesfe)

Google may terminate your account if you're not profitable November 17, 2019 10:05 AM

Youtube's new ToS, emphasis mine, via

YouTube may terminate your access, or your Google account's access to all or part of the Service if YouTube believes, in its sole discretion, that provision of the Service to you is no longer commercially viable.

Initially, this was interpreted as a way of kicking non-profitable channels out of the platform.

However, the implications are wider. Watching a Youtube video with adblock enabled may wipe your whole Google account.

It's not like they couldn't do this before, and good luck contacting Google's support channels, but the fact that they have made it explicit is a bit scary.

Personally, I've been slowly transitioning out of Google services for a while, but this is going to accelerate the process.

If you want to be safe, make sure that your gmail account is expendable before December 10th.

(Obligatory if you're not the customer, you're the product)

Tags: internet


November 16, 2019

Jan van den Berg (j11g)

Foster: how to build your own bookshelf management web application November 16, 2019 09:40 PM


foster
/ˈfɒstə/

verb

1. Encourage the development of (something, especially something desirable). “the teacher’s task is to foster learning”

TLDR: I made a personal bookshelf management web application and named it Foster and you can find it here. Here’s what I did (with gifs), so you might build your own.

Name

I named it Foster. Because of *this* blog post — it accompanies the application, so it’s self-referential. And also, because I am currently reading David Foster Wallace‘s magnum opus Infinite Jest. And the word ‘foster’ also makes a lot of sense otherwise, just read on 😉

Background

I like to read and I like to buy physical books — and keep them. Over the years I tracked both of these things in a spreadsheet. But this became unmanageable so I needed something else.

Something like Goodreads but self-hosted. So, preferably a web application where I could:

  • track my reading progress
  • keep track of my bookshelf

But I couldn’t find anything that fit, so I knew I probably had to roll my own. In simpler times MS Access could do this in a heartbeat. But it’s 2019 and I wanted a web application. However I am not a web developer and certainly not a frontend developer.

But when I came across https://books.hansdezwart.nl/ I knew this was what I was looking for! So I emailed Hans. He was very kind in explaining his application was self-coded and not open-source, but he did provide some pointers. Thanks Hans! So with those tips I built my own application (front and back) from scratch. And I decided to pass the knowledge on, with this blog.

The Foster frontend (I am still adding books)

This is what the Foster frontend looks like. Pretty self-explanatory: I can search *my* books, track and see reading progress, track collections, click through to book details and see the activity feed (more on that later). Oh, and it’s fast! ♥

Frontend

The five different parts in the frontend are: ‘Search’, ‘Statistics’, ‘Currently reading’, ‘Collections’ and ‘Activity feed’. They are presented as Bootstrap cards. The frontend is one index.php file with a layout of the cards. All cards (except ‘Search’) are dynamically filled with content: each card is a div whose content is generated by a JavaScript function (one per card), which in turn calls a PHP file. The PHP files just echo raw HTML.
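A minimal sketch of that card-filling mechanism, with hypothetical file and element names (not Foster's actual code): fetch the raw HTML that a small PHP endpoint echoes and drop it into the card's div.

// Illustrative only: names are made up for the example.
function fillCard(selector, endpoint) {
  fetch(endpoint)
    .then(response => response.text())
    .then(html => {
      document.querySelector(selector).innerHTML = html;
    });
}

fillCard(".card-currently-reading", "currently_reading.php");
fillCard(".card-activity-feed", "activity_feed.php");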

Other than the index.php file there is one search.php file to make up the frontend. This file takes care of presenting the book details, the search output, and the log and lists views (more on that below). So most of what can be done and seen in the frontend is handled by search.php.

The frontend is of course nothing unique. It’s just a representation of the data. The backend is a bit more interesting!

Database

So the frontend was the easy part. At least it was after I figured out the backend. I spent quite a bit of time thinking about the database design and what the backend would have to do. I thought the design for such a small application wouldn’t be too difficult. But I surprised myself with the number of changes I made to the design, to get it just right.

Rule of thumb: start with a good design, and everything else that comes after will be a lot easier.

Self-explanatory view of the database design

The multiple foreign-key relations between tables (on ids etc.) are not defined in the database. I chose to handle these in the code and the JOIN queries.

It’s not hard to understand the database design. The design could be a little tighter — two or three tables — but let me explain.

Log, actions and states

One of the main things I spent time thinking about was the set of actions and their respective states.

I figured you can do one of five things with a book (actions):

  • You want a book
  • You get/buy/own the book
  • You start reading it
  • You finish reading it
  • You purge/remove/sell/give away the book

Makes sense, right? You could even call it a ‘life cycle process’, with one input and one output.

But some books you already own without ever wanting them first. Or you can read the same book more than once. Or you can give a book to a friend, and buy it again for yourself. Or you can finish reading a book that you borrowed — from a friend or a library — so it is not on your shelf anymore. All of these things happen. So these ‘life cycle’ actions are not a fixed, chronological start-to-end tollgate process; it’s continuous and messy.

Book log

Every new action is added to the book log. In the frontend the last 25 entries in the book log are presented as the Activity feed. Every action has a timestamp for when it was logged and a date for the action itself, which are two different things. So when I now add a book to my shelf that I acquired 4 years ago, the book log timestamp is now, but the date for the action is 4 years ago.

The Activity feed
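To make the two dates concrete, a log entry might look something like this (field names are assumptions for illustration, not Foster's actual schema):

// One book log entry: the action happened years ago, but was logged today.
const logEntry = {
  book_id: 42,
  action: "owned",                 // e.g. wanted, owned, started, finished, purged
  action_date: "2015-06-01",       // when the action actually took place
  logged_at: "2019-11-16 21:40:00" // when the entry was added to the log
};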

With this log I can keep track of books even after I got rid of them (because selling/purging is just one action). So I don’t lose the log history of a book.

And I can also add books to my wanted list even if I have owned them before (maybe I gave them away etc.). And I can start/finish reading the same book more than once. It doesn’t matter, it is just a log entry.

But with all this log information I can generate four states:

  • Books I want
  • Books I own
  • Books I have read
  • Books I had

These states are generated by specific hardcoded queries, one per state. They are generated on the fly from what is in the log, where the most recent log records prevail in deciding the current status.

So Foster will track the complete history per book and at all times represent all books I want, own, have read or have owned, at that specific moment in time.
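The real queries are SQL, but the idea can be sketched in a few lines of JavaScript: walk the log in date order and let the latest action per book win (field names as in the illustrative entry above):

// Reduce the full log to the latest action per book; later entries prevail.
function latestActionPerBook(log) {
  const latest = {};
  log
    .slice()
    .sort((a, b) => a.action_date.localeCompare(b.action_date))
    .forEach(entry => { latest[entry.book_id] = entry.action; });
  return latest; // e.g. { 42: "finished", 43: "wanted" }
}

In this sketch, each of the four state collections is then just a filter over that result.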

Lists

I could have defined these actions as lists too, but lists are simpler things.

I tend to collect and read specific genres of books, e.g. music, management and computer history books. So I tend to organize books like that. These descriptions/genres are all of course just lists.

Some books can be three of these things at the same time: part biography, part computer history, part management. So one book can be a member of more than one list.

In the Foster backend I can add or delete books to and from as many lists as I like.

Easily adding/deleting books from a list with the same dropdown menu (click for a gif)

I can also easily create a new list. Say: a list of books that I want for my birthday, or books that are on loan, or books that are signed by the author etc. I just add one new list to my table, and the list will be available in the backend and presented in the frontend.

Collections

In the frontend, the action log states and the different lists are grouped together under the Collections card. As stated, the first four collections are populated from the log, and a book always has a last state. The others are just lists.

I can create or delete as many lists as I’d like, and it won’t affect the book log. So I can now organize my book collection far better than I could physically (a book can only have one spot on your physical shelf).

Adding books with the Bol.com API

This is where the magic happens. Bol.com — a large Dutch book retailer — has a very easy API you can use to query their book database. I use this to search and add books to my collection. So with one click I can get most book details: title, ISBN (=EAN), image, description etc. And I can pull them all into my own database. Including the image, which I copy and store locally. Like this:

Adding a book via bol.com API (click for a gif)

Of course I can also edit book details when necessary, or just enter a book by hand without the API. Sometimes Bol.com does not carry a book.

Backend

The bol.com search API is the start page of my backend. The other page is an overview of all my books. Clicking on the titles brings up an edit view of a book. But most importantly I can quickly add or delete books from lists here AND add actions (started reading, finished).

I have defined jQuery actions on the <select option> dropdown menus, which provide a popup — where I can fill in a date if necessary — and which trigger database inserts (there definitely might be some security concerns here: but the backend is not public).

Security

The frontend is open for everyone to see. I don’t mind sharing (my podcast list is also public), also because I always enjoy reading other people’s lists and recommendations. The backend is just one .htaccess password-protected directory. In my first database design I had a user table with accounts/passwords etc., but the .htaccess file seemed like the easiest/quickest solution for now.

Tech

I built Foster from scratch, no Symfony/Laravel or what have you. And I am a bit annoyed and surprised there is still no MS Access RAD equivalent for the web in 2019 (all in one tool: from DB design to logic to GUI design to runtime).

Django does most of the backend for you, I know, so I briefly looked at it. But for Foster I still ended up using PHP / MariaDB / Bootstrap 4 / JavaScript / jQuery. It’s a familiar and very portable stack that you can mostly just drop and run anywhere (and most answers are on StackOverflow 🤓). I’ve thought about using SQLite, but I am very familiar with MySQL/MariaDB so that made more sense. Also I learned more about Bootstrap than I actually cared about, but that’s alright. And I think I wrote my first serious piece of JavaScript code ever (for the dropdown select actions). So that was fun.

All in all: I spent a few days pondering the database design in the back of my mind. And 4 evenings programming front and backend. And now I am just polishing little things: which is a lot of fun.

Further development

Right now, I still have around 200 more books to catalogue correctly — that’s why some dates are set to 0000-00-00. But here are a few possible new features I am thinking about:

  • RSS feed for the activity log? Now that I am bulk adding books the activity feed is not so relevant, but when things settle down, who knows, people might be interested. After I wrote a draft of this blog I implemented this.
  • Twitter integration? Posting the log to a dedicated Twitter feed.
  • Adding books by scanning the barcode / ISBN with your phone camera? If I can just get the ISBN I can automate bol.com API to do the rest. Might speed things up a bit (and might be useful when you run a secondhand bookstore 😉). I created an iOS shortcut that does exactly this! It scans the book barcode/ISBN/EAN and opens the Foster edit.php URL with this ISBN number and from there I can add the book by clicking ‘Add’ (all book details are available and prefilled by the Bol.com API).
  • Storing/tracking more than books? CDs, DVDs, podcasts I listened to, movies I watched etc.
  • Multi-user? In the first database design there were multiple users that could access / add the books that were already in the database but still create their own log and lists. I think I could still add this to the current design.
  • As you can see in the database design, there is a remarks table. I haven’t used this table. A remark is a ‘blog’ (or a short self-written review) of a book, that can be presented with the book details. This is a one-to-many relationship, because you might want to make new remarks each time you reread a book. But, I currently blog about every book I read, so the remarks might be just an embedded blog link?

Just share the source already!

“Foster looks nice. Please share the code!” No, sorry, for several reasons.

  1. I made Foster specifically for me. So chances it will fit your needs are slim and you would probably still need to make changes. In this post I share my reasoning, but you should definitely try to build your own thing!
  2. When Foster was almost done, I learned about prepared statements (did I mention I am not a web developer?)… so I had to redo the frontend. But I haven’t redone the backend (yet): so it’s not safe from SQL injection and it suffers from some other pretty bad coding practices. Open sourcing it could of course generate code improvements, but it would first make my site vulnerable.
  3. But most importantly: Building a web application to scratch your own personal itch and learning new things can be one of the most fun and rewarding experiences you will have! And I hope this blog is useful to you, in achieving that goal.

The post Foster: how to build your own bookshelf management web application appeared first on Jan van den Berg.

Jeremy Morgan (JeremyMorgan)

A Visual Studio extension that makes you a better developer November 16, 2019 05:08 AM

It’s a bold claim to say an IDE extension can make you a better developer, but you should install this new Pluralsight extension and see for yourself.

What do you do when you run into a coding problem? When I’m working on something, I follow the same steps:

  • Try a bunch of stuff
  • Google it
  • Stack Overflow (Google usually sends me here)
  • Ask a peer/friend
  • Go to Pluralsight and search for the topic (especially if it’s something new to me)

In the last step, I search for a course in the library, then drill down to the thing I need, and see what I’m missing. This extension will do that for you automatically.

This Pluralsight extension suggests clips from courses on the things you’re working on right now. It suggests content based on your code to help strengthen your skills. Here’s an example.

I have a React application loaded up and when I load up app.js:

Visual Studio Code

It’s suggesting some JavaScript clips I can watch based on what I’m working on in app.js. It drills down even deeper though. When I open up serviceWorker.js I see this:

Visual Studio Code

Now the extension suggests clips like how to register a service worker as well as some great demos. If I don’t know how a Service Worker operates, there are some quick ways to ramp up here.

I can open each clip with a single click and watch as many as I’d like.

Today the extension supports JavaScript and related technologies, but there are plans to add support for other languages soon.

Here are some other cool features:

Dependencies Related View

Visual Studio Code

In this view, you see recommended content based on the JavaScript dependencies in your application. Not only that, you will see metrics for popularity, quality, and maintenance of those dependencies. This is a great way to see if the package is being actively maintained and if you should consider other alternatives.

Workspace Related Content

Visual Studio Code

In the workspace related view, it looks at all the major technologies used in your project and recommends content based on it.

Channels View

Visual Studio Code

If you log in with your Pluralsight account, the channels view can show you a listing of all your channels. I have created 18 channels on Pluralsight so this is a helpful way to browse them when needed.

Content Search

Visual Studio Code

With content search you can easily search Pluralsight’s content for related courses. This one is really handy for a quick search of something you may have temporarily forgotten or to brush up on a new framework, library, or coding practice.

Privacy

Pluralsight takes your privacy seriously. Your source code always stays on your machine and is not sent to Pluralsight. The extension looks for meaningful search terms in the active file, randomizes the order, removes any high entropy terms (potential passwords), eliminates any code comments, and then submits a request to the Pluralsight recommendations engine to find relevant clips. You can disable this if you’d like.

Why You Should Try This Extension

Habits in your workflow determine your success. By making learning part of your daily habits you’re giving yourself the advantage of constant improvement. This extension helps you learn in small doses when you’re stuck or when you’re curious and want to learn something new. That helps you become a better developer.

You can find the extension by searching for "Pluralsight" in the extensions view in VSCode or on the Visual Studio Marketplace.

If you have suggestions or feedback, share it with support@pluralsight.com.

November 15, 2019

Gustaf Erikson (gerikson)

Stålblankt och rostigt Rom [SvSe] November 15, 2019 03:17 PM

Rom: marmor och människor by Hans Furuhagen — The Storm Before the Storm - the Beginning of the End of the Roman Republic by Mike Duncan — Hos etruskerna by Alf Henrikson

Furuhagen's book is an easy-to-read chronicle of the Eternal City, with a lot of focus on the people who made sure the city looks the way it does today, as well as on the buildings and streets themselves.

Starting from the first days of Augustus, it comes in time to cover quite a few popes, but also artists. The Nordic elements are of course well covered, with the two very different women Birgitta and Christina as contrasts in time and space.


Duncan’s book has more details of the ground covered in his podcast, and I find the written word congenial in following along with the twists and turns of the violence of the late Republic.


Either you love Henrikson or you think he is mustier than the antiquity he writes about. I belong to the former group.

A sample (p. 34):

[The Etruscans] trained a licensed priesthood with special haruspices who had long schooling in, above all, the topography of the liver. The liver is after all a large organ and can show great individual variation even in healthy animals, county veterinarian Garmer informs us on special inquiry; it should thus have been possible for the believer to honestly read this and that from its appearance. […] On the liver, as in the sky, there is a favourable pars familiaris and an adverse pars hostilis. The boundary line between them is called the fissum, and any elevation on the liver's surface is called the caput iocinoris, should that interest any living person today.

Björn Berg's illustrations reinforce the cosy, musty atmosphere.

November 13, 2019

Pete Corey (petecorey)

Setting Apollo Context from HTTP Headers in a Meteor Application November 13, 2019 12:00 AM

I was recently tasked with modifying an Apollo-based GraphQL endpoint served by a Meteor application to set a field in the resolvers’ context based on an HTTP header pulled out of the incoming GraphQL request.

This should be an easy, well-documented ask.

Unfortunately, Meteor’s Apollo integration seems to exist in an awkward, undocumented state. The createApolloServer function exposed by the meteor/apollo package doesn’t seem to have any real documentation anywhere I could find, and the Github repository linked to by the package doesn’t seem to relate to the code in question.

How can I access the current HTTP request when building my resolvers’ context?

With no documentation to guide me, I dove into the package’s source code on my machine to find answers. The source for the package in question lives in ~/.meteor/packages/apollo/<version>/os/src/main-server.js on my machine. The createApolloServer function accepts a customOptions object as its first argument. I quickly learned after digging through the source that if customOptions is a function, createApolloServer will call that function with the current request (req) as its only argument, and use the function call’s result as the value of customOptions:


const customOptionsObject =
  typeof customOptions === 'function'
    ? customOptions(req)
    : customOptions;

This means we need to change our current call to createApolloServer from something like this:


import { createApolloServer } from "meteor/apollo";

createApolloServer({ schema }, { path });

To something like this:


createApolloServer(
  req => ({
    context: {
      someValue: req.headers['some-value']
    },
    schema
  }),
  { path }
);

This information is probably only pertinent to myself and an ever-shrinking number of people, but if you’re one of those people, I hope this helps.

November 11, 2019

Patrick Louis (venam)

No, Alfa isn't draining your data without your knowledge November 11, 2019 10:00 PM

Allegory of the Truth by Cesare d’Arpino in Iconologia

In Lebanon, conspiracy theories are such a common occurrence that the whole world but yourself is to blame for your ailments.
I usually dismiss them, but the one in this post got on my nerves; moreover, a quite simple experiment could finally shatter it and remove it as an option from all conversations.

The conspiracy goes as follows:

When I visit Lebanon my data is consumed way faster on a lebanese carrier. They deny it and say that I am delusional, I guess they enjoy being screwed with.


A MB isn’t a MB in Lebanon



It pisses me off that it seems like data runs out much quicker in Lebanon, I am confident that they do not calculate this properly



The data consumption shown by operators in Lebanon isn’t the real data you are consuming, they’re playing with numbers



They screw people with crazy data consumption calculation! I think it has something to do with how they handle uploads.



ALFA & TOUCH

Let’s first assess the plausibility of the scenario.
We know that Touch and Alfa, the two operators in Lebanon, benefit from a duopoly. From this we also know that they have full leverage on data prices. It is unlikely that, with such control, they would try to cheat on the usage calculation; why would they have gone through a recent drop in data prices then? Additionally, there's the risk that people would notice the discrepancy between the data consumed and what the carriers show on their websites; after all, we all have a feature in our phones that tracks this.
Then why do people still believe in this conspiracy? I'm not sure, but let's gather actual numbers that speak for themselves. I'll limit myself to Alfa as I don't have a subscription with Touch.

The questions we want to answer:

  • Are the numbers shown on the Alfa website the same as the actual data being consumed?

  • Are upload and download considered the same in the mobile data consumption calculation?

What we’ll need:

  • A machine that we have full control over with no data pollution
  • Scripts to download and upload specific amounts of data
  • A tool to monitor how much is actually used on the network as both download and upload
  • A machine that is on a different network to check the consumption on Alfa website

Let’s peek into the values we’re expecting, testing the monitoring tool. For this wireshark running on a Linux machine that has nothing else running will do the job.

We need a way to download and upload specific amounts of data. I've prepared three files of different sizes for this: 1KB, 10KB, and 100KB. Let's remind readers that 1KB is 1024 bytes, and thus a 10KB file is composed of 10240 bytes.
I've then added those files to my server for download and set up a PHP script for the upload test.

<?php

$size = $_FILES["fileToUpload"]["size"];
echo("up $size bytes");

What does a download request look like:

> curl 'https://venam.nixers.net/alfa_consumption/1kb' 

wireshark download

Trace of download

On wireshark we see:

TX: 2103, RX 5200 -> TOTAL 7300B

TX stands for transmit and RX for receive. Overall 7KB both ways, 2KB upload, 5KB download.

That’s 7KB for a download of only 1KB, is something wrong? Nope, nothing’s wrong, you have to account for the client/server TCP handshake, the HTTP headers, and the exchange of certificates for TLS. This δ becomes negligible with the size of the request the bigger it is, and the longer we keep it open, the less we consume on transaction initialization.

The average numbers expected for the different sizes:

> curl 'http://venam.nixers.net/alfa_consumption/10kb' 

TX:2278B RX:14KB -> TOTAL 16.3KB

> curl 'http://venam.nixers.net/alfa_consumption/100kb' 

TX:6286B RX:113KB -> TOTAL 119.2KB

And regarding upload we have:

wireshark upload

Trace of upload

curl -F "fileToUpload=@1kb" 'https://venam.nixers.net/alfa_consumption/upload.php'

TX: 3475, RX 4133 -> 7608B

curl -F "fileToUpload=@10kb" 'https://venam.nixers.net/alfa_consumption/upload.php'

TX: 13KB, RX 4427 -> 17.4KB

curl -F "fileToUpload=@100kb" 'https://venam.nixers.net/alfa_consumption/upload.php'

TX: 110KB, RX 6672 -> 116.5KB

Overall, it’s not a good idea to keep reconnecting when doing small requests. People should be aware that downloading something of 1KB doesn’t mean they’ll actually use 1KB on the network, regardless of the carrier.

In the past few years there have been campaigns to encrypt the web and make it secure: things like Let's Encrypt's free TLS certificates, and popular browsers flagging insecure websites. According to Google's transparency report, around 90% of the web now runs encrypted, compared to only 50% in 2014. This encryption comes at a price: the small overhead when initiating the connection to a website.

Now we’ve got our expectations straight, the next step is to inspect how we’re going to appreciate our consumption on Alfa’s website.

After logging in, on the dashboard page we can see a request akin to: https://www.alfa.com.lb/en/account/getconsumption?_=1573381232217

Disregard the Unix timestamp in milliseconds on the right, it’s actually useless.

The response to this request:

{
    "CurrentBalanceValue": "$ XX.XX",
    "ExtensionData": {},
    "MobileNumberValue": "71234567",
    "OnNetCommitmentAccountValue": null,
    "ResponseCode": 8090,
    "SecurityWatch": null,
    "ServiceInformationValue": [
        {
            "ExtensionData": {},
            "ServiceDetailsInformationValue": [
                {
                    "ConsumptionUnitValue": "MB",
                    "ConsumptionValue": "1880.17",
                    "DescriptionValue": "Mobile Internet",
                    "ExtensionData": {},
                    "ExtraConsumptionAmountUSDValue": "0",
                    "ExtraConsumptionUnitValue": "MB",
                    "ExtraConsumptionValue": "0",
                    "PackageUnitValue": "GB",
                    "PackageValue": "20",
                    "ValidityDateValue": "",
                    "ValidityValue": ""
                }
            ],
            "ServiceNameValue": "Shared Data Bundle"
        }
    ],
    "SubTypeValue": "Normal",
    "TypeValue": "Prepaid",
    "WafferValue": {
        "ExtensionData": {},
        "PendingAccountsValue": []
    }
}

Notice the ConsumptionValue, which is shown in MB; that's as precise as we can get. I'll assume every hundredth of an MB is 10.24KB.

Setting this aside we’re missing the last element of our experiment: isolating the sim card and using it only on the machine that has the monitoring tool, gaining full control of what happens on the network.

Huawei E5220s outside

I’ve opted to use a Huawei mobile wifi E5220s model, a pocket-sized MiFi, initially used for Sodetel. This setup would let me use the device as a hotspot for 3G+ connection from the monitoring machine.

Huawei E5220s inside
Sim card form factors
The device takes a 2FF sim card, so have a holder of this size at hand.

Unfortunately, the device came locked to the vendor, Sodetel, and I had to do some reverse engineering to figure out how to configure the APN settings for the Alfa card.
Long story short, parts of the documentation of the E5220s are available online and they hinted at what to do.

To add the new Alfa APN:

curl 'http://192.168.1.1/api/dialup/profiles' -H 'User-Agent: Mozilla/5.0
(X11; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0' -H 'Accept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H
'Authorization: Basic YWRtaW46YWRtaW4=' -H 'Connection: keep-alive' -d
'<request>
	<Delete>0</Delete>
	<SetDefault>1</SetDefault>
	<Modify>1</Modify>
	<Profile>
		<Index></Index>
		<IsValid>1</IsValid>
		<Name>alfa</Name>
		<ApnIsStatic>1</ApnIsStatic>
		<ApnName>alfa</ApnName>
		<DialupNum></DialupNum>
		<Username></Username>
		<Password></Password>
		<AuthMode>0</AuthMode>
		<IpIsStatic>0</IpIsStatic>
		<IpAddress/>
		<Ipv6Address/>
		<DnsIsStatic>0</DnsIsStatic>
		<PrimaryDns/>
		<SecondaryDns/>
		<PrimaryIpv6Dns/>
		<SecondaryIpv6Dns/>
		<ReadOnly>0</ReadOnly>
	</Profile>
</request>'

Remember to log in on the web interface first; even though we're sending basic authorization, it won't accept the request if we're not logged in.

A successful request returns:

<?xml version="1.0" encoding="UTF-8"?><response>OK</response>

We then have to set the Alfa profile as default and disable the PIN1, as it is not required on Alfa sim cards.

Disable PIN:

curl 'http://192.168.1.1/api/pin/operate' -H 'User-Agent: Mozilla/5.0
(X11; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0' -H 'Accept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H
'Authorization: Basic YWRtaW46YWRtaW4=' -H 'Connection: keep-alive' -d
'<request>
	<OperateType>2</OperateType>
	<CurrentPin>0000</CurrentPin>
</request>'

We’re all set, we can gather the data!

DOWNLOAD DATA:

24 requests of 1KB
Initial value:1835.06MB
Sent: 176KB total, RX 126KB, TX=50K
Expected value:
	If TX/RX considered: 1835.23MB
	If RX only considered: 1835.18MB
	If TX only considered: 1835.11MB
Consumption value on website: 1835.13

The reason I executed 24 requests instead of a single one is that the consumption on the website updates only when it reaches a threshold or a specific interval.
As you can see, the value on the website is lower than the combination of download and upload together, exactly because it updates by threshold. This means that the value shown will always be a bit less than the amount consumed, unless you happen to consume an exact threshold chunk.

For a visual explanation of what’s happening: chunk update

Here are more tests:

18 requests of 100KB
Initial value:1835.93MB
Sent: 2110KB total, RX 2013KB, TX=97KB
Expected value:
	If TX/RX considered: 1837.96MB
	If RX only considered: 1837.86MB
	If TX only considered: 1836.02MB
Consumption value on website: 1837.89
18 requests of 100KB
Initial value:1837.89MB
Sent: 2117KB total, RX 2015KB, TX=101KB
Expected value:
	If TX/RX considered: 1839.92MB
	If RX only considered: 1839.82MB
	If TX only considered: 1836.99MB
Consumption value on website: 1839.84

UPLOAD DATA:

18 requests of 100KB
Initial value:1839.84MB
Sent: 2101KB total, RX 120KB, TX=1981KB
Expected value:
	If TX/RX considered: 1841.86MB
	If RX only considered: 1839.96MB
	If TX only considered: 1841.48MB
Consumption value on website: 1841.79
18 requests of 100KB
Initial value:1841.79MB
Sent: 2101KB total, RX 120KB, TX=1981KB
Expected value:
	If TX/RX considered: 1843.81MB
	If RX only considered: 1841.91MB
	If TX only considered: 1843.69MB
Consumption value on website: 1843.75

So Alfa actually shows a bit less than the value we are using; that's the opposite of the conspiracy. What is happening, then? Do we have an explanation?

The truth is that we give mobile carriers more credit than they deserve. They aren't all-powerful, omniscient owners of the network who know it and built it from scratch. The all-powerful being is a recurrent theme in conspiracy theories because it makes people seem powerless in the face of it. On the contrary, most mobile operators are not made up of technical people; they are, as the term implies, operating the instruments they buy, caring only that they do the job. Business is business.

In the core network, one of these pieces of equipment is called a TDF/PCEF or PCRF (Traffic Detection Function, Policy and Charging Enforcement Function, and Policy and Charging Rules Function), along with an implementation of a CPS (Cost Per Sale) system, also known as a charging system.
The operator buys such hardware and doesn't know much else about what it does, be it an Ericsson charging system, an Oracle CPS system, another Oracle CPS, a Cisco CPS, or virtual ones (5G anyone) like Allot.

Reference Architecture
Figure 5.1-1: Overall PCC logical architecture (non-roaming) when SPR is used taken from ETSI TS 123 203 V15.5.0 (2019-10)

An example scenario where Usage Monitoring Control is useful is when the operator wants to allow a subscriber a certain high (e.g. unrestricted) bandwidth for a certain maximum volume (say 2 gigabytes) per month. If the subscriber uses more than that during the month, the bandwidth is limited to a smaller value (say 0.5 Mbit/s) for the remainder of the month. Another example is when the operator wants to set a usage cap on traffic for certain services, e.g. to allow a certain maximum volume per month for a TV or movie-on-demand service.

That paragraph is taken from this article, or here.

dodo

So what are the reasons for all the conspiracy theories:

  • Mysticism and lack of education on the topic
  • Having an agenda (Confirmation bias)
  • Being delusional

This doesn’t reduce the fact that people are genuinely feeling like they’re using data quicker than what they are “really” using. The truth is that we’ve moved in the recent years to a world where we’re always connected to the web, and the web has morphed to be media heavy. In the tech world this is not news anymore, we call it the web-obesity crisis, websites are bloated with content, entangled in a mess of dependencies.
As I’m posting this, Chrome is starting an endeavour to move the web to a faster and lighter one, see this article for more info along with web.dev/fast.

This topic has been talked about a lot in recent years: The Average Webpage Is Now the Size of the Original Doom, The average web page is 3MB. How much should we care?, web-obesity, more privacy and fast blogs, another blog about fast websites.
I’ve also been doing my part, making my blog lighter.

In sum, this is all about web-literacy.

Thanks for reading.











Attributions:

  • Cesare d’Arpino [CC0]
  • Pearson Scott Foresman [Public domain]

For more info on the core network there’s some good discussion in those links:

Jeremy Morgan (JeremyMorgan)

Setting up Golang on Manjaro Linux November 11, 2019 12:10 AM

Today we're going to set up a Golang development environment in Manjaro Linux. It's super easy.

I've been playing around with Manjaro a lot lately and it's a pretty cool distribution. It's based on Arch Linux, which I'm a huge fan of.

Step 1: Update the System

Golang on Manjaro

At your Manjaro desktop, open a command prompt. The first thing you'll want to do is update the system.

We do that by typing in

sudo pacman -Syu

This will make sure your system is up to date.

Step 2: Install Golang

Next, install Golang

Type in

sudo pacman -S go

you can verify it's installed by typing in:

go version

Golang on Manjaro

So now I'm going to create a projects folder (/home/jeremy/Projects) and create a hello world to test this out.

Create a file named hello.go and put in the hello world code:

package main

import "fmt"

func main() {
	fmt.Println("hello world")
}

Save the file, and then build it with the following command:

go build hello.go

This will build an executable you can run named "hello"

if you run it, it should output your message:

Golang on Manjaro

Step 3: Install Visual Studio Code

Now, the version of Visual Studio Code that I want is in the AUR. So we'll install git.

sudo pacman -S git

I like to create a sources folder for building AUR packages (/home/jeremy/Sources), but you can put it where you want. To build AUR packages you must clone or download them first.

I will go back to the AUR page, copy the git clone URL, and clone it:

git clone https://aur.archlinux.org/visual-studio-code-bin.git

Next, run makepkg -i

Golang on Manjaro

I'm getting this error, and you might too, because this is a brand new Manjaro install and I haven't yet installed the tools needed to make AUR packages.

To build AURs you'll have to install some tools:

Type in

sudo pacman -S base-devel

You can choose 3 for binutils, or press enter for all. I'm going to install all because I'll be building a lot of things on this machine.

Now run

makepkg -i

Golang on Manjaro

And we're back in business.

Start Coding!

We load up Visual Studio Code and now I'll go back to that projects folder.

It recommends the Go extension, so we'll go ahead and install it.

As you can see, it also recommends we install Go Outline.

We will just click on Install All, and it will pull in the dependencies we need.

Golang on Manjaro

Now this is all set up for us, it's super easy.

We can even run and debug the file.

Now I've put some simple code here to add a couple numbers.

I'll add in a break point and run it.

So it's that easy to set up a Go development environment in Manjaro. It's simple and quick, so have fun and get to coding!

Here's a video covering the same thing:

If you'd like to learn more about Golang, check out Pluralsight's courses on Golang today, there are tons of great courses to get you ramped up fast.

November 10, 2019

Jeff Carpenter (jeffcarp)

I'm Joining Waymo November 10, 2019 11:38 PM

Quick life update: I’ve left the Chrome team and joined Waymo (formerly the Google self-driving car project). It was a fantastic whirlwind 3 years working on infrastructure for Chromium and helping to–in a very small way–push the open web forward. On the team I launched wpt.fyi, a resource to help align the APIs of all browsers. I worked on syncing source code across repos. I launched a couple TensorFlow ML models.

Derek Jones (derek-jones)

Adjectives in source code analysis November 10, 2019 08:57 PM

The use of adjectives to analyse source code is something of a specialist topic. This post can only increase the number of people using adjectives for this purpose (because I don’t know anybody else who does 😉).

Until recently the only adjective-related property I used to help analyse source was relative order. When using multiple adjectives, people have a preferred order, e.g., in English size comes before color, as in “big red” (“red big” sounds wrong), and adjectives always appear before the noun they modify. Native speakers of different languages have various preferred orders. Source code may appear to only contain English words, but adjective order can provide a clue about the native language of the developer who wrote it, because native ordering leaks into English usage.

Searching for adjective patterns (or any other part-of-speech pattern) in identifiers used to be complicated (identifiers first had to be split into their subcomponents). Now, thanks to Vadim Markovtsev, 49 million token-split identifiers are available. Happy adjective pattern matching (Size Shape Age Color is a common order to start with; adjective pairs are found in around 0.1% of identifiers; some scripts).
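As a toy illustration of the kind of matching involved (a tiny hand-made word list standing in for proper part-of-speech data, and identifiers already token-split):

// Tiny adjective lists by category; real analyses use proper POS-tagged data.
const ADJ = {
  size:  new Set(["big", "small", "large", "tiny", "huge"]),
  color: new Set(["red", "green", "blue", "black", "white"]),
};

const category = word =>
  Object.keys(ADJ).find(cat => ADJ[cat].has(word)) || null;

// Keep identifiers containing at least two adjectives and report the category order.
function adjectiveOrders(identifiers) {
  return identifiers
    .map(tokens => tokens.map(category).filter(Boolean))
    .filter(cats => cats.length >= 2);
}

console.log(adjectiveOrders([
  ["big", "red", "button"],   // -> ["size", "color"] (typical English order)
  ["red", "big", "button"],   // -> ["color", "size"]
  ["get", "user", "name"],    // no adjectives, filtered out
]));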

Until recently, gradable adjectives were something that I had been vaguely aware of; these kinds of adjectives indicate a position on a scale, e.g., hot/warm/cold water. The study Grounding Gradable Adjectives through Crowdsourcing included some interesting data on the perceived change of an attribute produced by the presence of a gradable adjective. The following plot shows perceived change in quantity produced by some quantity adjectives (code+data):

Gradable adjective ranking.

How is information about gradable adjectives useful for analyzing source code?

One pattern that jumps out of the plot is that variability between people increases as the magnitude specified by the adjective increases (the x-axis shows standard deviations from the mean response). Perhaps the x-axis should use a log scale; there are lots of human-related response characteristics that are linear on a log scale (I’m using the same scale as the authors of the study; the authors were a lot more aggressive in removing outliers than I have been), e.g., response to loudness of sound and Hick’s law.

At the moment, it looks as if my estimate of the value of a “small x” is going to be relatively closer to another developer’s “small x” than our relative estimated values for a “huge x”.

Carlos Fenollosa (carlesfe)

Tobias Pfeiffer (PragTob)

Slides: Elixir & Phoenix – Fast, Concurrent and Explicit (Øredev) November 10, 2019 05:39 PM

I had the great pleasure to finally speak at Øredev! I wanted to speak there for so long, not only because it’s a great conference but also because it’s in the city of Malmö. A city that I quite like and a city I’m happy to have friends in 🙂 Anyhow, all went well although […]

Ponylang (SeanTAllen)

Last Week in Pony - November 10, 2019 November 10, 2019 04:09 PM

Last week has seen a ton of improvements to our CI and release automation thanks to Sean T. Allen. We have also been working toward making ponyup the default for installing and managing the Pony compiler and tools. Three new members have been inducted into the Pony core team.

November 07, 2019

Andreas Zwinkau (qznc)

Pondering a Monorepo Version Control System November 07, 2019 12:00 AM

How a VCS designed for monorepos would look like and why I don't build it.

Read full article!

November 06, 2019

Pete Corey (petecorey)

The Collatz Sequence in J November 06, 2019 12:00 AM

I’ve been playing around quite a bit with the Collatz Conjecture after watching the latest Coding Train video on the subject. In an effort to keep my J muscles from atrophying, I decided to use the language to implement a Collatz sequence generator.

At its heart, the Collatz sequence is a conditional expression. If the current number is even, the next number in the sequence is the current number divided by two. If the current number is odd, the next number in the sequence is one plus the current number multiplied by three:
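
   next(n) = n / 2      if n is even
   next(n) = 3n + 1     if n is odd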

We can write each of these branches with some straight-forward J code. Our even case is simply the divide (%) verb bonded (&) to 2:


   even =: %&2

And our odd case is a “monadic noun fork” of 1 plus (+) 3 bonded (&) to multiply (*):


   odd =: 1 + 3&*

We can tie together (`) those two verbs and use agenda (@.) to pick which one to call based on whether the argument is even:


   next =: even`odd@.pick

Our pick verb is testing for even numbers by checking if 1: equals (=) 2 bonded to the residue verb (|).


   pick =: 1:=2&|

We can test our next verb by running it against numbers with known next values. After 3, we’d expect 10, and after 10, we’d expect 5:


   next 3
10
   next 10
5

Fantastic!

J has an amazing verb, power (^:), that can be used to find the “limit” of a provided verb by continuously reapplying that verb to its result until a repeated result is encountered. If we pass power boxed infinity (<_) as an argument, it’ll build up a list of all the intermediate results.

This is exactly what we want. To construct our Collatz sequence, we’ll find the limit of next for a given input, like 12:


   next^:(<_) 12

But wait, there’s a problem! A loop exists in the Collatz sequences between 4, 2, and 1. When we call next on 1, we’ll receive 4. Calling next on 4 returns 2, and calling next on 2 returns 1. Our next verb never converges to a single value.

To get over this hurdle, we’ll write one last verb, collatz, that checks if the argument is 1 before applying next:


   collatz =: next`]@.(1&=)

Armed with this new, converging collatz verb, we can try finding our limit again:


   collatz^:(<_) 12
12 6 3 10 5 16 8 4 2 1

Success! We’ve successfully implemented a Collatz sequence generator using the J programming langauge. Just for fun, let’s plot the sequence starting with 1000:


   require 'plot'
   plot collatz^:(<_) 1000

November 04, 2019

Andrew Montalenti (amontalenti)

Work is a Queue of Queues November 04, 2019 09:31 PM

Do you ever get that feeling like no matter how hard you work, you just can’t keep up?

This isn’t a problem uniquely faced by modern knowledge workers. It’s also a characteristic of certain software systems. This state — of being perpetually behind on intended work-in-progress — can fall naturally out of the data structures used to design a software system. Perhaps by learning something about these data structures, we can learn something about the nature of work itself.

Let’s start with the basics. In computer science, one of the most essential data structures is the stack. Here’s the definition from Wikipedia:

… a stack is a data type that serves as a collection of elements, with two principal operations: (1) “push”, which adds an element to the collection; and (2) “pop”, which removes the most recently added element that was not yet removed. The order in which elements come off [is known as] LIFO, [or last in, first out]. Additionally, a “peek” operation may give access to the top without modifying the stack.

From here on out, we’ll use the computer science (mathematical) function call notation, f(), whenever we reference one of the operations supported by a given data structure. So, for example, to refer to the “push” operation described above, we’ll notate it as push().

I remember learning the definition of a stack in college and being a little surprised at “LIFO” behavior. That is, if you push() three items onto a stack — 1, 2, and 3 — when you pop() the stack, you’ll get the last item you added — in this case, 3. This means the last item, 3, is the first one pop()‘ed off the stack. Put another way, the first item you put on the stack, 1, only gets processed once you pop() all the other items — 3, 2 — off the stack, and then pop() once more to (finally) remove item 1.

Practically speaking, this seems like a “frenetic” or “unfair” way to do work — you’re basically saying that the last item always gets first service, and so, if items are push()’ed onto the stack faster than they are pop()’ed, some items will never be serviced (like poor item 1, above).

That’s when you tend to learn that stacks aren’t usually used to track “work-in-progress”. Instead, they are used to do structured processing , e.g. stack machines, or stack-based programming languages.

The alternative data structure to the stack which allows you to do work “fairly” is the queue, defined thusly:

… a queue is a collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position, known as “enqueue”, and removal of entities from the front terminal position, known as “dequeue”. This makes the queue a FIFO data structure, [or, first-in, first-out].
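To make the contrast concrete, here is a minimal JavaScript sketch, with plain arrays standing in for both structures:

const items = [1, 2, 3];

// Stack: push() adds to the top, pop() removes the most recent addition (LIFO).
const stack = [];
items.forEach(i => stack.push(i));
console.log(stack.pop(), stack.pop(), stack.pop()); // 3 2 1: poor item 1 is served last

// Queue: enqueue at the rear, dequeue from the front (FIFO).
const queue = [];
items.forEach(i => queue.push(i));                        // enqueue
console.log(queue.shift(), queue.shift(), queue.shift()); // 1 2 3: first in, first out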

In modern work, we tend to think about work in queues, because this seems like the “fair” data structure. For example, your list of bugs is typically managed like a queue: the bugs are serviced in such a way that the longer ago the bug was filed, the more likely it is to have been either completed or discarded, whereas more-recently-filed bugs can wait for triage.

Some teams even assign priorities, which makes it behave more like a priority queue.

Email is interesting, because most people wish email were a queue, and many people try to turn email into a queue using a number of tools and practices, but, in its standard implementation, email behaves much more like a stack. (Some people joke that “email is your to-do list, but made by other people.”)

But just thinking in terms of the general default workflow, we can recognize that the latest email you receive goes to the top of your inbox. And you tend to pop() and peek() items at the top of your inbox. That means some unlucky emails can end up at the bottom of the stack, and if your intake of new emails overtakes your ability to process the stack, some emails will be altogether neglected. Just like “poor item 1” in the stack we worked through above. This feels even worse when you reflect on how anyone else can push() items onto your stack, but only you can pop() items.

Slack (and other real-time communication tools) are likewise much closer to a stack. But with an even odder behavior. I might describe it as a “fixed-length stack with item drop-off”. That is, items are push()’ed onto your stack during the workday, and you have to decide whether to peek() or pop() the stack at any given moment in time. But, as more and more items are push()’ed onto the stack, older items start to automatically get expired or removed from the stack. The stack can only have 500-or-so items on it (this being the typical “scrollback” that people can tolerate before getting annoyed or bored) and old items are simply discarded.

Software project plans, meanwhile, are typically assembled as a kind of queue of queues. Let me explain what I mean.

First, some queues are organized: bugs, product ideas, customer requests, team technical debt payback projects, experiments. Then, a person or group — typically a product manager — tries to create a “queue of queues”, or a single unified queue — the project plan — made up of items from the many smaller queues.

You might decide that certain big bugs very much need to be fixed in the next iteration. Certain technical debt projects cannot wait, due to their risk. And, with the remaining estimated time and capacity available, you attempt to create an ordered list of items among product ideas and customer requests.

Once a product manager stares at this list, it has to be further ordered to remove bottlenecks and dependencies — some parts of the plan may need to be done before other parts. And then, the plan begins — and we track its progress by how close to empty the queue gets. We cut scope by dropping items out of the queue, and we prevent anxiety by splitting large queue items into many smaller queue items, representing their sub-parts.

It’s interesting to reflect on the fact that, in our work life, stacks and queues are almost like mortal enemies.

That is, if you’re working from the ordered queue of items — and you have settled on the queue and its order being roughly correct — then the worst thing you can do is peek() at items or pop() items off the various stacks in your work life. These items haven’t formally entered the work queue, and only have “priority” due to their recency, which is a very arbitrary thing.

The ideal situation would be that once you’ve decided to work on a given queue, as an individual or a team, all the stack-based workflows would shift from push()’ing onto your stack and gaining your attention, to merely enqueue()’ing onto your backlog queue, to get your ordered attention at a later date.

The “kanban” concept of flow reflects on this a lot with regard to teams. It tries to limit “work-in-progress” (WIP) to those items that a team has capacity to handle, and it suggests isolating a team from other sources of new work which might distract from pushing the WIP items to completion.

It rightly recognizes that in many workplaces, a team is desperately trying to work through its committed FIFO items, but being told to pay attention to a parallel LIFO workstream. And that this tends to lead to missed deadlines, death marches, as well as work dissatisfaction.

Effective executives have learned to ask systematically and without coyness: “What do I do that wastes your time without contributing to your effectiveness?” To ask this question, and to ask it without being afraid of the truth, is a mark of the effective executive.
-Peter Drucker

I’ve personally found myself in a situation where I have to apply this thinking even to the work of a single person: myself. This only happened to me in the last few years, where several new “item stacks” began to appear before me.

Yes, I have the common technology executive problem of a busy e-mail inbox and Slack @-mention list. Those are universal as a company’s size scales and as an executive has to apply the “Andy Grove algorithm” for keeping the wheels of the organization turning.

Andy Grove talks a lot about the process of management in his famous book, “High-Output Management”. Here, he uses the simple analogy of “how to prepare breakfast” to talk about the idea of identifying “limiting steps” in any parallel process, e.g. in this case, boiling eggs.

But, there’s more than just the overflowing inbox in email and the never-ending stream of Slack.

My calendar is also a stack: it’s block-scheduled with 1:1 meetings with staff, work anniversaries, performance reviews, prospective sales conversations, partner checkpoints, hiring calls, board meetings, investor update calls, and so on. Lots of push() and peek(), with a rate of pop() necessarily limited by the passage of time.

Especially in dealings with external parties, the effect of “calendar stack intake flow” is pernicious: it’s very hard to tell an important customer or partner, “Yes, I’d love to meet with you — but you’ll need to wait until my queues clear out in the next 2-3 months.” It’s even harder to tell this to an employee who has a sensitive work-related issue that can’t be handled elsewhere in the organization.

To tackle these problems, I’ve applied my hacker brain, perhaps unsurprisingly. (“When all you have is a hammer.”) I started with email, and asked myself, “How can I automate email and turn it into a proper FIFO work queue?” I did this with a couple of tricks.

I wrote a JavaScript “Google Apps Script” program that has a JSON configuration file that aggressively filters my inbox, using one of three label types: @@now, @later, and @never. This script uses the GMail API and runs against my inbox every 15 minutes and automatically cleans the whole thing, using a number of advanced GMail search queries and a nice system of nested labels. I borrowed the “now” vs “later” terminology from David Allen’s “GTD” framework. The @@now label is an indication that this is an email worth responding to immediately. I have very specific filters (rules) that identify these emails: for example, emails addressed from my co-founder or executive team, and addressed to me, alone. Or, an email from one of our critical alerting systems. I have 50+ such rules, and I change them over time. Here is a view of just ~15 lines of this 800-line script:
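A rough sketch of what a single rule in this style could look like, using the GmailApp service in Apps Script; the query, recipient, and thread limit here are assumptions rather than lines from the actual script:

// Illustrative single rule: file matching threads under @later and archive them.
function applyLaterRule() {
  const later = GmailApp.getUserLabelByName("@later");
  const threads = GmailApp.search('in:inbox to:team@example.com', 0, 50); // assumed query
  threads.forEach(thread => {
    thread.addLabel(later);
    thread.moveToArchive(); // keep the inbox itself clean
  });
}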

An example of a @later message would be a message addressed to a team’s email group, a product release email, a usage/cost report, or an email newsletter subscription. As for @never, that’s mostly spam, but might also include record-keeping emails like invoices and bills.

I gave the @@now label the “double-at-sign (@@)” prefix on purpose: it’s the only label with such a prefix in GMail. As a result, the hotkey x l @ @ for me serves as a shorthand to label a given email message as “now”. (Also: the @@ prefix makes sure it always appears at the top of my label list, making it easy to use conveniently in mobile contexts.) As a result, once my email inbox is cleaned out, I can scan it looking for items I didn’t identify as “now”, but which could benefit from a response soon. I then hit x l @ @ and hit ENTER to label that message as @@now. You can see a little glimpse into my @@now messages in GMail below, which includes a mixture of automatically-marked messages to process today, as well as manually triaged ones:

I use the @@now label search to look at my items. I also prefer to look at the items in chronological order (remember: I want a FIFO queue), starting from the bottom, rather than the GMail default of reverse chronological (the default of a LIFO stack).

Implementing this system had a profound effect on my relationship with email. Prior, email was something that gave me a lot of daily anxiety. I received hundreds of messages per day, and on some very bad days, it might even have been thousands. (Yes, alert fatigue is very real.) But, now my actual inbox usually gets a few hundred @later items, and only a handful of @now items, as well as a handful more of items that I need to manually filter/triage.

Once I had properly converted my email inbox from a stack to a queue, and quantified its effect in terms of tens of saved hours per month, I started to look for this pattern everywhere.

In Slack, I found a simple solution, in a somewhat underutilized feature that Slack has called “Starred Items”. You see, you can mark every message in Slack with a “Star”. Doing so has only one effect: it shows up in your own “Starred Items” list, which is revealed via a star icon in the top-right of the app. Here is how one of my starred items looks in the Slack app:

The reason this is nice is that the items are shown in the starred items list in reverse chronological message order. (It’d be great if you could reverse this order, but, alas.) What this lets me do, though, is to “sample the stream” (or, tame that fixed-length stack I mentioned earlier). As I come across items, especially from my @-mentions and DMs, I can star them. And now that they are all in one place, I can scan the starred item list and take action on each of them. Typically, I’ll convert the Slack message either into an email action or a task in one of my other “queues of record”, like Todoist.

What’s also nice is that if you unstar an item, it’s like removing it from your temporary Slack queue. Once you have taken action on all your starred items, you’re at “Slack inbox zero”. Yes, I recognize the irony of “turning Slack into an inbox” when its whole market positioning is “the end of email”. But remember, email was a stack, too, before I tamed it with automation.

So, where else do my work queues live? For me personally, there are four other places: (1) GitHub — which has issues, pull requests, and project boards, and this is also where all my team’s work lives. (2) Todoist — this is where my personal and professional to-do list lives, and it also serves as my GTD-style “thought inbox”. (3) Notion — which is where my team’s knowledge base and wiki lives, as well as our project plans in kanban board form. And (4) Trello — which is a historical/personal relic, where I maintain a “crapban board” that’s shared with my wife for personal errands, vacation plans, and life events. So those are the work queues that live beyond my life of email, calendar, and real-time chat. Here is the current view of my “crapban board” in Trello:

As might be exceedingly obvious at this point, if I’m to survive this onslaught of stacks and queues, I have to create a “master queue”. It’s not good enough to merely convert every stack to a queue. That’s a good start, but now I still have several queues to contend with. I need “the queue of queues” that manages all of my work, across all my queues.

Historically, what I used for that was a single Todoist view — called simply “Today” — since it was quite easy to add tasks there, re-order them, and link to other systems from there. Here is how that looks in Todoist:

Lately, I’ve been treating Todoist more like an intake/staging area, and been using a tool called Sunsama to plan out my work days. Basically, Sunsama is a personal kanban system that lets you visually lay out your work tasks alongside your calendar, treating your calendar as the main constraint. And it automatically links to tools I was already using — namely Google Calendar, GitHub, Trello, and Todoist — making it easier for me to get the global “queue of queues” view. Here is how that looks:

My daily habit has therefore become: (1) review my email inbox, and process items in the queue there, adding Todoist tasks as necessary (convert email stack into email message & task queue); (2) process Slack notifications and star relevant items (convert Slack stack into a Slack message & task queue); (3) organize and plan my day in Sunsama across my Google Calendar, Todoist, and even other systems like Trello and GitHub, treating working hours and my calendar as a constraint. Sometimes, I’ll plan two or three days ahead, but usually this is a daily ritual.

Once I finish my daily planning ritual, I “burn the bridges”. The Sunsama personal kanban view becomes the only place I look for “what I’m doing today”. And I let all the other work stacks in my life accumulate, without peek()ing or pop()ing. The only exception, of course, would be an emergency, outage, or something like that. I also use Toggl to track my actual time spent, as a way of keeping my time estimates honest. Here’s a view of my trailing 30-day time tracking:

Sunsama is a nicely-designed personal kanban tool, but I have seen people achieve the same organizing principle using their calendar itself, a simple written list of daily priorities, or an isolated digital task list that is habitually emptied every day. The key thing is that you actually perform a daily planning ritual with a realistic set of tasks, and that you actually “burn the bridges” and ignore the stacks! Sunsama has a UX around this that can help turn it into a pleasant daily habit.

Getting things done is about draining your committed work queue to zero. What I’ve learned is that too much of our modern work life is set up like a bunch of stacks, when we should really have a single queue.

This isn’t too big a deal when the rate of stack item push()’ing is low and manageable, but it can spiral out of control when stacks grow unbounded, leading to stack overflow.

The way I have managed my own personal work also has implications for how I manage teams. Almost by definition, any modern software team with more than a handful of people has this problem. To tame the work, we need to learn to enqueue(), re-order, commit, and dequeue() with focus and satisfaction, even as a team. We also need to let the rest of the world enqueue() into our backlog, rather than push()’ing onto one or more monitored stacks.

Mastery of your focus and attention is really a matter of committing you and your team to a single work queue, and ensuring that other work items “get in line”. If your team operates off a single queue, and each individual operates off their own single queue of personal tasks, then you truly do get “continuous flow”, and everything hums. Queues all the way down leads to continuous completion of works-in-progress, which is, after all, The Goal.

I’ve come to realize that this requires an active management process. The default behavior of our work tools has a “frecency” bias that behaves like an unruly stack — frequent item push(), and recent item peek() and pop(). It takes leadership to create the unified queue, and to create the workflows that de-prioritize frequent/recent requests into a backlog queue.

Hopefully this reflection will get you to start thinking about how to recognize when a given workflow is behaving too much like a stack (or list of stacks), and how you might be able to hack it to behave more like a queue (or queue of queues).

I’d love to hear your tips and tricks on @amontalenti on Twitter, or in the comments!


Tools mentioned in this post

  • Starred Items in Slack — turns a Slack stream of messages into a queue
  • GitHub — issues, pull requests, and project boards
  • Todoist — personal and professional to-do list, GTD-style “thought inbox”
  • Notion — team’s knowledge base and wiki, as well as project plans in kanban board form
  • Trello — personal “crapban board” for personal errands, vacation plans, and life events
  • Sunsama — visual personal kanban system representing my daily “queue of queues”
  • GMail Service + API in Google Apps Script — can help tame an unruly inbox… with JavaScript!

Acknowledgements

Thank you to Jeff Bordogna, Ashutosh Priyadarshy, and Aakash Shah for reviewing and providing feedback on earlier drafts of this essay.

Gustaf Erikson (gerikson)

November 03, 2019

Derek Jones (derek-jones)

Student projects for 2019/2020 November 03, 2019 10:20 PM

It’s that time of year when students are looking for an interesting idea for a project (it might be a bit late for this year’s students, but I have been mulling over these ideas for a while, and might forget them by next year). A few years ago I listed some suggestions for student projects; as far as I know none got used, so let’s try again…

Checking the correctness of the Python compilers/interpreters. Lots of work has been done checking C compilers (e.g., Csmith), but I cannot find any serious work that has done the same for Python. There are multiple Python implementations, so it would be possible to do differential testing; another possibility is to fuzz test one or more compilers/interpreters and see how many crashes occur (the likely number of remaining fault-producing crashes can be estimated from this data).

Talking to the Python people at the Open Source hackathon yesterday, testing of the compiler/interpreter was something they did not spend much time thinking about (yes, they run regression tests, but that seemed to be it).
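
A minimal sketch of the differential-testing idea, assuming two Python implementations (say, python3 and pypy3) are on the PATH; a real project would generate the test programs with a fuzzer rather than hard-code them:

```
# Minimal differential-testing sketch: run the same snippet under several
# Python implementations and flag any disagreement in output or exit status.
# Assumes the named interpreters are on PATH; a real project would generate
# the test programs (e.g. with a fuzzer) instead of using a fixed list.
import subprocess

IMPLEMENTATIONS = ["python3", "pypy3"]

PROGRAMS = [
    "print(round(2.675, 2))",
    "print(sorted({'b': 1, 'a': 2}))",
]

for src in PROGRAMS:
    results = {}
    for interp in IMPLEMENTATIONS:
        proc = subprocess.run([interp, "-c", src],
                              capture_output=True, text=True, timeout=10)
        results[interp] = (proc.returncode, proc.stdout, proc.stderr)
    if len(set(results.values())) > 1:
        print("DISAGREEMENT for:", src)
        for interp, (code, out, err) in results.items():
            print(f"  {interp}: exit={code} stdout={out!r} stderr={err!r}")
```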

Finding faults in published papers. There are tools that scan source code for use of suspect constructs, and there are various ways in which the contents of a published paper could be checked.

Possible checks include (apart from grammar checking):

Number extraction. Numbers are some of the most easily checked quantities, and anybody interested in fact checking needs a quick way of extracting numeric values from a document. Sometimes numeric values appear as numeric words, and dates can appear as a mixture of words and numbers. Extracting numeric values, and their possible types (e.g., date, time, miles, kilograms, lines of code), requires something way more sophisticated than pattern matching on sequences of digit characters.

spaCy is my tool of choice for this sort of text processing task.
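
As a short sketch of the kind of extraction meant here, using spaCy and assuming the small English model is installed; a real checker would also need unit normalisation and word-to-number conversion:

```
# Rough sketch of number/quantity extraction with spaCy. Assumes the small
# English model is installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The study analysed 17 projects totalling three million lines "
          "of code between March 2018 and 2019.")

# Named-entity pass: dates, cardinals, quantities, percentages, money...
for ent in doc.ents:
    if ent.label_ in {"DATE", "TIME", "CARDINAL", "QUANTITY", "PERCENT", "MONEY"}:
        print(ent.text, "->", ent.label_)

# Token-level pass: like_num also catches number words such as "three".
for tok in doc:
    if tok.like_num:
        print("numeric token:", tok.text)
```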

Ponylang (SeanTAllen)

Last Week in Pony - November 3, 2019 November 03, 2019 03:34 PM

Pony 0.33.0 is out! This release includes changes to the pony runtime options and a major step toward providing prebuilt binaries for Linux platforms. This is the first Ponyc release available as tar.gz packages for x86-based Linux distributions. The ponyup tool will soon support managing release and nightly installations of pony toolchains.

Carlos Fenollosa (carlesfe)

Bogdan Popa (bogdan)

Announcing setup-racket November 03, 2019 08:00 AM

GitHub Actions are going to become generally available next week, so I created an action for installing Racket. You can find it on the marketplace. Here’s what a minimal CI configuration for a Racket package might look like:

November 02, 2019

Jeremy Morgan (JeremyMorgan)

Getting Started with Haxe November 02, 2019 01:25 AM

The Haxe Foundation recently released Haxe 4.0 and I decided to check it out. Here's what's new in version 4.

Haxe runs on Windows, Mac, and Linux. You can download it here.

What is Haxe?

According to the website:

Haxe is an open-source high-level strictly-typed programming language with a fast optimizing cross-compiler.

So the high level strictly typed programming language makes sense, but a fast optimizing cross compiler? What's that about?

So the general idea here is "one language to rule them all". You write your application in Haxe, then it compiles to another language to target a platform. Basically it treats the output language (like JavaScript, C#, Python, etc) as bytecode for your application.

So I decided to try that out.

Get started with Haxe

I was delightfully surprised to see a bunch of extensions for Haxe in Visual Studio Code.

Hello World

So the hello world in Haxe is simple:

```
class HelloWorld {
    static public function main() {
        trace("Hello World");
    }
}
```

This looks pretty much like any other C-based language you've looked at. We can clearly see a class defined with a static function, and in it a call to "trace" (a weird name for an output function) with the text to be displayed. So first let's build a JavaScript file from it:

haxe -main HelloWorld -js HelloWorld.js

The file runs and it outputs this:

```
// Generated by Haxe 4.0.0+ef18b627e
(function ($global) { "use strict";
var HelloWorld = function() { };
HelloWorld.main = function() {
    console.log("HelloWorld.hx:3:","Hello World");
};
HelloWorld.main();
})({});
```

Let's see if it works. I included it in an HTML file and got the following:

Get started with Haxe

Cool it works!

So I decided to play around with it a little:

```
class HelloWorld {
    static public function main() {
        var today = Date.now();
        trace("Today is " + today);
    }
}
```

And then I generated another JS file:

Get started with Haxe

That's a little crazy, but then I load it up in a browser:

Get started with Haxe

And it works! It's written the date to the console. Pretty cool.

Let's try another language.

Hello World in Python

So enough playing around with JavaScript, let's generate our hello world in Python:

haxe -main HelloWorld -python HelloWorld.py

I run it and it generates the following code:

Get started with Haxe

Wow, hello world in only 164 lines of code! Great. So I run it:

Get started with Haxe

And it looks like it works, with the exact code I used for JavaScript. So far so good.

So let's run my date code again. I'm using the exact same Haxe file:

```
class HelloWorld {
    static public function main() {
        var today = Date.now();
        trace("Today is " + today);
    }
}
```

and it generates this:

Get started with Haxe

Wow. Even more insane code to generate... a date? Well ok. I run it.

Get started with Haxe

Cool! It actually worked with zero modification to the original haxe file.

Instead of the 486-line Python file Haxe generated, I could have simply written:

```
import datetime
print("Today is %s" % datetime.datetime.now())
```

But I digress. This is an edge case and not a thorough test, so I'll withhold judgment for now; I'm sure Haxe has to scaffold a lot of things and do some setup for when you truly use the language to develop a real application.

Let's try another language.

Hello World in C++

So with C++ and given what I've seen so far I have no idea what to expect.

With this one I have to install the hxcpp library, but it's easy enough:

haxelib install hxcpp

Then I just change my compile command a little:

haxe -main HelloWorld -cpp HelloWorld.cpp

and after running it, it appears to scaffold a bunch of stuff:

Get started with Haxe

and it generates a folder named HelloWorld.cpp:

Get started with Haxe

So I go in and run the exe.

Get started with Haxe

And it works!

again it generates a ton of code:

Get started with Haxe

but again it's probably just scaffolding for when people build actual applications with it.

The date display works the same:

Get started with Haxe

So this is pretty cool stuff! Imagine something that generates JavaScript, Python, or C++ from a single codefile.

Creating a Node webserver

So let's build a little Node webserver. I stole the idea from this tutorial.

First I have to install the Haxe nodejs library:

sudo haxelib install hxnodejs

Then I create a file named Main.hx

```
class Main {
  static function main() {
    // Configure our HTTP server to respond with Hello World to all requests.
    var server = js.node.Http.createServer(function(request, response) {
      response.writeHead(200, {"Content-Type": "text/plain"});
      response.end("Hello World\n");
    });

    // Listen on port 8000, IP defaults to 127.0.0.1
    server.listen(8000);

    // Put a console.log on the terminal
    trace("Server running at 127.0.0.1:8000");
  }
}
```

and then compile (transpile?) it:

haxe -lib hxnodejs -main Main -js main.js

And then run it with nodeJS:

Get started with Haxe

And it runs! Pretty easy. The JavaScript it generated looks like this:

```
// Generated by Haxe 4.0.0+ef18b627e
(function ($global) { "use strict";
var Main = function() { };
Main.main = function() {
    var server = js_node_Http.createServer(function(request,response) {
        response.writeHead(200,{ "Content-Type" : "text/plain"});
        response.end("Hello World\n");
    });
    server.listen(8000);
    console.log("Main.hx:13:","Server running at 127.0.0.1:8000");
};
var js_node_Http = require("http");
Main.main();
})({});
```

Which actually isn't too bad!!

Takeaways

This isn't a thorough review of Haxe, just playing around. While doing this I didn't:

  • build a real application
  • utilize any of Haxe's advanced language features
  • use the Hashlink Virtual Machine

You can see some of the new capabilities of Haxe in this video

What I liked:

  • It's easy to install - It's available on Windows, Mac, and Linux. I used Arch Linux for this article; at this time the repository still has 3.4.7-1, but I was able to grab the 4.0 binaries, point Haxe to the standard library and get going. It took minutes.
  • It worked as expected - I generated a lot of stuff just playing around with it, and I didn't find any "why does this happen" moments. Clearly some work has been put into making this solid, and it shows.
  • It's a really cool concept - I like the idea of using one solid language to generate many kinds of outputs. Some may avoid this pattern, but I think if it's done right it can reduce developer effort.

Why would anyone use this?

Get started with Haxe

So you might be asking yourself why anyone would use this? It's basically a fancy transpiler right?

Not exactly. While I didn't really build anything out I did research some of the language features and it appears to be a very mature language with a lot of features I like, such as:

  • Classes, interfaces and inheritance
  • Conditional compilation
  • Static extensions
  • Pattern matching
  • Anonymous structures

If I were building a real application with this, I'd be tempted to evaluate it based on what I've seen.

From my first impressions I think it would be a very good way to build solid JavaScript applications. Since JavaScript as a language is pretty lacking in many ways, writing it in something like Haxe and generating JavaScript might be a great way to build a larger application. It's the "JavaScript as bytecode" pattern I'm a fan of.

Remember languages are not meant for compilers, they're meant for you. Any language that you can get good at and feel productive in is worth a look, and maybe Haxe is the one for you.

Try it out and let me know what you think in the comments!

November 01, 2019

Ponylang (SeanTAllen)

0.33.0 Released November 01, 2019 05:56 PM

Pony version 0.33.0 is now available. The release includes a couple of small breaking changes in command line handling when starting up Pony applications.

October 31, 2019

Simon Zelazny (pzel)

SIMON'S FEED HAS MOVED, PLEASE UPDATE YOUR READER October 31, 2019 11:00 PM

Thank you for reading my blog! I'd be honored to have your continued readership.
Please re-subscribe at: https://pzel.name.
Once again, thank you!

Wesley Moore (wezm)

An Illustrated Guide to Some Useful Command Line Tools October 31, 2019 09:00 PM

Inspired by a similar post by Ben Boyter, this is a list of useful command line tools that I use. It's not a list of every tool I use. These are tools that are new or typically not part of a standard POSIX command line environment.

This post is a living document and will be updated over time. It should be obvious that I have a strong preference for fast tools without a large runtime dependency like Python or node.js. Most of these tools are portable to *BSD, Linux, macOS. Many also work on Windows. For OSes that ship up to date software many are available via the system package repository.

Last updated: 31 Oct 2019

About my CLI environment: I use the zsh shell, Pragmata Pro font, and base16 default dark color scheme. My prompt is generated by promptline.

Table of Contents

  • Alacritty — Terminal emulator
  • alt — Find alternate files
  • bat — cat with syntax highlighting
  • bb — System monitor
  • chars — Unicode character search
  • dot — Dot files manager
  • dust — Disk usage analyser
  • eva — Calculator
  • exa — Replacement for ls
  • fd — Replacement for find
  • hexyl — Hex viewer
  • hyperfine — Benchmarking tool
  • jq — awk/XPath for JSON
  • mdcat — Render Markdown in the terminal
  • pass — Password manager
  • Podman — Docker alternative
  • Restic — Encrypted backup tool
  • ripgrep — Fast, intelligent grep
  • shotgun — Take screenshots
  • skim — Fuzzy finder
  • slop — Graphical region selection
  • Syncthing — Decentralised file synchronisation
  • tig — TUI for git
  • titlecase — Convert text to title case
  • Universal Ctags — Maintained ctags fork
  • watchexec — Run commands in response to file system changes
  • z — Jump to directories
  • zola — Static site compiler
  • Changelog — The changelog for this page

Alacritty Language: Rust

Alacritty is a fast terminal emulator. Whilst not strictly a command line tool, it does host everything I do in the command line. It is the terminal emulator in use in all the screenshots on this page.

Homepage

alt Language: Rust

alt is a tool for finding the alternate to a file. E.g. the header for an implementation or the test for an implementation. I use it paired with Neovim to easily toggle between tests and implementation.

$ alt app/models/page.rb
spec/models/page_spec.rb

Homepage

bat Language: Rust

bat is an alternative to the common (mis)use of cat to print a file to the terminal. It supports syntax highlighting and git integration.

bat screenshot

Homepage

bb Language: Rust

bb is a system monitor like top. It shows overall CPU and memory usage as well as detailed information per process.

bb screenshot

Homepage

chars Language: Rust

chars shows information about Unicode characters matching a search term.

chars screenshot

Homepage

dot Language: Rust

dot is a dotfiles manager. It maintains a set of symlinks according to a mappings file. I use it to manage my dotfiles.

dot screenshot

Homepage

dust Language: Rust

dust is an alternative to du -sh. It calculates the size of a directory tree, printing a summary of the largest items.

dust screenshot

Homepage

exa Language: Rust

exa is a replacement for ls with sensible defaults and added features like a tree view, git integration, and optional icons. I have ls aliased to exa in my shell.

exa screenshot

Homepage

eva Language: Rust

eva is a command line calculator similar to bc, with syntax highlighting and persistent history.

eva screenshot

Homepage

fd Language: Rust

fd is an alternative to find with a more user-friendly command line interface; it respects ignore files, like .gitignore. The combination of its speed and ignore-file support makes it excellent for searching for files in git repositories.

fd screenshot

Homepage

hexyl Language: Rust

hexyl is a hex viewer that uses Unicode characters and colour to make the output more readable.

hexyl screenshot

Homepage

hyperfine Language: Rust

hyperfine is a command line benchmarking tool. It allows you to benchmark commands with warmup and statistical analysis.

hyperfine screenshot

Homepage

jq Language: C

jq is kind of like awk for JSON. It lets you transform and extract information from JSON documents.

jq screenshot

Homepage

mdcat Language: Rust

mdcat renders Markdown files in the terminal. In supported terminals (not Alacritty) links are clickable (without the url being visible like in a web browser) and images are rendered.

mdcat screenshot

Homepage

pass Language: sh

pass is a password manager that uses GPG to store the passwords. I use it with the passff Firefox extension and Pass for iOS on my phone.

pass screenshot

Homepage

Podman Language: Go

podman is an alternative to Docker that does not require a daemon. Containers are run as the user running Podman so files written into the host don't end up owned by root. The CLI is largely compatible with the docker CLI.

podman screenshot

Homepage

Restic Language: Go

restic is a backup tool that performs client side encryption, de-duplication and supports a variety of local and remote storage backends.

Homepage

ripgrep Language: Rust

ripgrep (rg) recursively searches file trees for content in files matching a regular expression. It's extremely fast, and respects ignore files and binary files by default.

ripgrep screenshot

Homepage

shotgun Language: Rust

shotgun is a tool for taking screenshots on X.org based environments. All the screenshots in this post were taken with it. It pairs well with slop.

$ shotgun $(slop -c 0,0,0,0.75 -l -f "-i %i -g %g") eva.png

Homepage

skim Language: Rust

skim is a fuzzy finder. It can be used to fuzzy match input fed to it. I use it with Neovim and zsh for fuzzy matching file names.

skim screenshot

Homepage

slop Language: C++

slop (Select Operation) presents a UI to select a region of the screen or a window and prints the region to stdout. Works well with shotgun.

$ slop -c 0,0,0,0.75 -l -f "-i %i -g %g"
-i 8389044 -g 1464x1008+291+818

Homepage

Syncthing Language: Go

Syncthing is a decentralised file synchronisation tool. Like Dropbox but self hosted and without the need for a central third-party file store.

Homepage

tig Language: C

tig is a ncurses TUI for git. It's great for reviewing and staging changes, viewing history and diffs.

tig screenshot

Homepage

titlecase Language: Rust

titlecase is a little tool I wrote to format text using a title case format described by John Gruber. It correctly handles punctuation, and words like iPhone. I use it to obtain consistent titles on all my blog posts.

$ echo 'an illustrated guide to useful command line tools' | titlecase
An Illustrated Guide to Useful Command Line Tools

I typically use it from within Neovim where selected text is piped through it in-place. This is done by creating a visual selection and then typing: :!titlecase.

Homepage

Universal Ctags Language: C

Universal Ctags is a fork of exuberant ctags that is actively maintained. ctags is used to generate a tags file that vim and other tools can use to navigate to the definition of symbols in files.

$ ctags --recurse src

Homepage

watchexec Language: Rust

watchexec is a file and directory watcher that can run commands in response to file-system changes. Handy for auto running tests or restarting a development web server when source files change.

# run command on file change
$ watchexec -w content cobalt build

# kill and restart server on file change
$ watchexec -w src -s SIGINT -r 'cargo run'

Homepage

z Language: sh

z tracks your most used directories and allows you to jump to them with a partial name.

z screenshot

Homepage

zola Language: Rust

zola is a full-featured very fast static site compiler.

zola screenshot

Homepage

Changelog

  • 31 Oct 2019 -- Add bb, and brief descriptions to the table of contents
  • 28 Oct 2019 -- Add hyperfine


Bogdan Popa (bogdan)

Announcing nemea October 31, 2019 09:00 AM

I just open sourced one of the very first Racket code bases I’ve worked on. The project is called nemea and it’s a tiny, privacy-preserving, website analytics tracker. It doesn’t do anything fancy, but it does enough for my needs and, possibly, yours too so check it out!

October 29, 2019

Carlos Fenollosa (carlesfe)

Mass cellphone surveillance experiment in Spain October 29, 2019 02:18 PM

Spanish Statistics Institute will track all cellphones for eight days (2 min, link in Spanish, via)

A few facts first:

  • Carriers geotrack all users by default, using cell tower triangulation. They also store logs of your calls and sms, but that is a story for another day.
  • This data is anonymized and sold to third parties constantly; it's part of the carriers' business model
  • With a court order, this data can be used to identify and track an individual...
  • ... which means that it is stored de-anonymized in the carrier servers
  • This has nothing to do with Facebook, Google or Apple tracking with cookies or apps
  • You cannot disable it with software, it is done at a hardware level. If you have any kind of phone, even a dumbphone, you are being tracked
  • It is unclear whether enabling airplane mode stops this tracking. The only way to make sure is to remove the SIM card and battery from the phone.

This is news because it's not a business deal but rather a collaboration between Spain's National Statistics Institute and all Spanish carriers, and because it's run at a large scale. But, as I said above, this is not technically novel.

On paper, and also thinking as a scientist, it sounds very interesting. The actual experiment consists of tracking most Spanish phones for eight days in order to learn about holiday trips. With the results, the Government expects to improve public services and infrastructures during holiday season.

The agreement indicates that no personally identifiable data will be transferred to the INE, and I truly believe that. There is nothing wrong with using aggregated data to improve public services per se, but I am concerned about two things.

First of all, Spain is a country where Congress passed a law to create political profiles of citizens by scraping social networks —fortunately rejected by the Supreme Court— and also blocked the entire IPFS gateway to silence political dissent.

I'd say it is quite reasonable to be a bit suspicious of the use that the Institutions will make of our data. This is just a first warning for Spanish citizens: if there is no strong backlash, the next experiment will maybe work with some personally identifiable data, "just to improve the accuracy of results". And yada yada yada, slippery slope, we end up tracking individuals in the open.

Second, and most important. This is no longer a topic of debate! We reached a compromise a few years ago, and the key word is consent.

All scientists have to obtain an informed and specific consent to work with personal data, even if it is anonymous, because it is trivially easy to de-anonymize individuals when you cross-reference the anonymous data with known data: credit cards, public cameras, public check-ins, etc. In this case, once again, the Spanish institutions are above the law, and also above what is ethically correct.

No consent, no data shared, end of story. Nobody consented to this nor were we given an option to opt out.

P.S. Of course, this is a breach of GDPR, but nobody cares.

Tags: law, security

Comments? Tweet

October 28, 2019

Andrew Owen (yumaikas)

Nim: Scripting ease in a compiled language October 28, 2019 08:19 PM

Nim went 1.0 around the beginning of October, 2019. I’ve taken the chance to dig into it for the past three weeks, and while I can’t call myself a Nim expert as of yet, I think I can offer a decently informed opinion.

I suppose I should add a bit of context: I've used Go as a side-project language since 2014, mostly for Gills, PISC, and the blog you're reading right now. It's definitely served me well, but I've been growing rather tired of some of its… choices. It's not been terrible, but I've grown to miss generics and exceptions from C#, and Go just tends to be verbose in general.

That being said, there are some things I like about Go: It compiles on both Windows and Linux, it makes executable files that are easy to deploy (at least for my amateur sysadmin sensibilities), and it gets web servers spun up relatively quickly, all things considered.

For a while, I got around Go’s verbosity by using PISC and then Lua as scripting languages for the projects I had going in Go. I see myself using Lua as a scripting layer for Gills for quite some time to come, just due to how nice a search/tag driven CMS is.

But, when Nim announced their 1.0 status, I decided to do a few small projects to see if it was the language I wanted to make my side-project language of choice for the next 5-10 years, the way that Go has been my side-project language of choice (outside of games, where I’ve mostly used Lua) for the past 4 years.

Project Zero: Porting Jumplist

So, a project that I’ve landed on as a decent way to exercise a programming language without requiring the language to have a full web stack (which is a very heavy requirement) is to implement a directory jumplist manager. I first wrote one in bash that was an abomination of sed, awk, and bash that had slightly self-modifying code. The concept was to avoid calling into windows processes in git-bash, since that seemed to bring a non-trivial amount of lag.

When I wanted to bring the same jumping experience to cmd.exe as a result of using that a lot more at my new job, I opted to write a jumplist manager in Go and some wrapper batch scripts.

Porting this program to Nim was mostly a pleasant experience. I was able to remove about half my code, because Nim had a pre-existing insertion-ordered table, so I didn't have to write my own insertion-ordered key-value store like I did in Go. The only grumble I had from this project was that I had some difficulties figuring out some of the type errors I got at first. But seeing half of the code I wrote disappear, and the code that remained being that much smaller, was really nice.
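
For illustration only (not the author's Go or Nim code), the insertion-ordered key-value store a jumplist needs can be sketched in Python, whose dicts preserve insertion order much like an ordered table:

```
# A rough illustration of the insertion-ordered key-value store a directory
# jumplist needs. Python dicts (3.7+) preserve insertion order.
jumps = {}

def remember(alias, path):
    """Record or update an alias; first-seen ordering of aliases is kept."""
    jumps[alias] = path

def jump(alias):
    """Return the directory for an alias, or None if it is unknown."""
    return jumps.get(alias)

remember("blog", "/home/me/src/blog")
remember("dots", "/home/me/dotfiles")
remember("blog", "/home/me/blog")      # update keeps "blog" in first position

print(list(jumps))                     # ['blog', 'dots']
print(jump("blog"))                    # /home/me/blog
```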

Project One: Porting kb_wiki

After I ported the Jumplist script, I had a pretty solid idea of my next target: the various instances of the kilobyte_wiki software. This was software I’d written in Erlang a while back. It was still running in tmux panes because I’d not figured out how to do proper Erlang releases while using erlang.mk as my build system. (Side note: I recommend trying out rebar3 over erlang.mk, as with erlang.mk you have to debug a complicated makefile on top of the Erlang tools. From what I’ve heard, rebar3 is more user-friendly than erlang.mk, which was difficult for this Erlang and make newb to use.) I also wanted to stand up an instance for tracking various articles and the best comments I’d found on Lobsters and Hacker News in a given month. In the process of porting kb_wiki from Erlang to Nim, I ended up finding a few things that kinda confused me at first glance, but that make sense in retrospect.

First, the standard library database modules only return database results as seq[string], roughly a List<string> if you’re from C#, or an ArrayList<string> if you’re from Java. This is a deliberate decision to optimize for the likely case where most of the data is going back into other strings in other parts of the application, and to help keep the database code from having to expose different sets of types for different databases (as I gathered from talking to Araq, the creator of Nim, on IRC). While it’s not the choice I’d go with for a database application where I was heavily concerned with I/O performance, it’s certainly not a terrible choice for the more standard Sqlite + web server type stuff I’ve done a lot of lately.

Project One and a half: Understanding Jester

Also, though it currently (in October of 2019) does not look the most kept up to date, Jester is definitely very nice to use as a web framework. Here are a few things that I ran across when porting the wiki from Erlang to Nim:

  • You’ll want to make sure to read up on the settings: macro if you want to set what ports and hostnames your web server is binding to. I use it in kbwiki, if you want a small example.
  • cond expressions in the router that evaluate to false just won’t match, so you’ll get 404 errors, which can make debugging tricky. I recommend making them if statements if you’re trying to debug why a route isn’t matching, and then use your debugging method of choice to figure out what’s going on.
  • For now, Jester only properly supports HTML forms that use enctype="multipart/form-data" in their HTTP POSTs. I don’t think this is a major problem, but it tripped me up when I was trying to get forms to work on my website, so I figured I’d mention it here to save the next person making websites some time.
  • The Jester README currently states that it requires a devel version of the Nim compiler, but I think at least the version in Nimble works fine with Nim 1.0. I am investigating this, but for now, it’s not something huge to worry about.
  • Jester is a bit underdocumented, which means that understanding certain parts will require reading its source. I didn’t have much difficulty there, and I’d like to help with the docs for it as I have time. This paragraph is a first attempt at helping with them.

Cool things learned from two and a half projects

So, I had already learned from my first project in Nim that it was a lot more concise than Go, but it was in this second project where that truly became evident. See, Nim is the language I’ve had the easiest experience creating executables in. If you look at the code for kb_wiki, you’ll notice the main app, kbwiki.nim, and that it requires database.nim, kb_config.nim and views.nim. Unrelated to the main kbwiki app are mdtest.nim, todo.nim, and scraper.nim. They are all separate applications that can be compiled into executables using nim c mdtest, nim c todo or nim c scraper, and they can use other nim files.

But, because files are that easy to re-use, and executables are that easy to make, code re-use becomes much easier, at least in the small, than I’ve ever had it be until now. In Go, creating new executables involves a lot more ceremony, in terms of making sure that a new folder structure is created and such. In C#, unless you use LinqPad, you either have MSBuild/VS projects, or you have to create a whole folder and then use the dotnet tool to create projects and explain to the C# compiler how to turn a set of source files into an executable. But in Nim, you just write your file, install the right things via nimble, and then you can compile it into an executable.

Project Three: Scripts and utilities at work

This nimbleness of executable creation makes Nim the first compiled language I’d use for general purpose command-line tools, where the value of writing 10 lines of code isn’t dominated by the process of setting up a folder structure and project files to get everything in place.

So, I’ve made some small utilities for my day job. An program for spitting out “banners”, aka the “-” with a user-specified color in the background. A little command-line punch-clock for helping me keep track of what time I’m working (when working flex hours, it’s helpful to be able to punch in and out for personal tracking). A tool for copying these little tools from the directory they get developed in to the designated “My command-line tools live here to be on the $PATH” directory.

Project the Fourth: Scraping kb_wiki

Most recently, like I mentioned above, I wrote a scraper that can be pointed at instances of kbwiki and download their content. The motivation for the project was pretty simple: I had two instances of the Erlang version of kbwiki floating around, and I wanted to get the content hosted on them in Sqlite files.

The Erlang versions of the wiki use dets tables, because that’s what was easiest to get access to when I was writing the wiki in the first place. So, at first, I looked into trying to connect my Erlang code to SQLite. But, after digging into that for about 90 minutes, I was running into issues around gcc compiler versions on my Linux box. Not wanting to spend until the wee hours of the morning fighting Erlang, GCC, and Sqlite all at once, I decided to switch from trying to dump the content into Sqlite files from Erlang to writing a Nim scraper that basically gloms over the HTML, logs in as an admin, and then downloads all of the markdown for the articles. That took me just under two and a half hours (give or take) to finish, and then I was able to scrape down the wikis in a matter of seconds.
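
As a rough sketch of that scrape-and-store approach (in Python rather than the author's Nim; the base URL, endpoints, form fields, and index format below are all made up):

```
# Not the author's Nim scraper: a rough Python sketch of the same approach
# (log in as admin, walk the article list, dump raw markdown into SQLite).
# The base URL, endpoints, form fields, and index format are all hypothetical.
import sqlite3
import requests

BASE = "https://wiki.example.com"

db = sqlite3.connect("wiki-dump.db")
db.execute("CREATE TABLE IF NOT EXISTS articles (slug TEXT PRIMARY KEY, body TEXT)")

session = requests.Session()
session.post(BASE + "/login", data={"user": "admin", "password": "..."})

# Pretend the index endpoint lists one slug per line; a real scraper
# would parse the HTML of the article listing instead.
slugs = session.get(BASE + "/index.txt").text.splitlines()

for slug in slugs:
    body = session.get(BASE + "/page/" + slug + "?format=markdown").text
    db.execute("INSERT OR REPLACE INTO articles VALUES (?, ?)", (slug, body))

db.commit()
```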

After that, I moved https://idea.junglecoder.com and https://drift.junglecoder.com from running on Erlang to running on Nim. And I didn’t have to fuss with certificates or the like, due to them running behind nginx. I also put up https://feed.junglecoder.com, where I’ve been keeping a feed of what seem to me to be notable comments, articles, and such.

Conclusions

So, having done 4.314 things in Nim in the space of just under a month, I’ve got to say that I quite like Nim. I don’t think it’s quite ready to be a programmer’s first language, a la Python or JavaScript, mostly due to a lack of beginner-friendly documentation. I also think it still has a lot of room to grow. But I’d like to try to help it grow. Jester is super handy for making websites with. I’ve discovered a whole new level of code reusability, and the language is just so darn handy to build small/medium things with. I’d like to try to build something bigger with it, probably something off my idea backlog.

David Wilson (dw)

Operon: Extreme Performance For Ansible October 28, 2019 12:30 PM

I'm very excited to unveil Operon, a high performance replacement for Ansible® Engine, tailored for large installations and offered by subscription. Operon runs your existing playbooks, modules, plug-ins and third party tools without modification using an upgraded engine, dramatically increasing the practical number of nodes addressable in a single run, and potentially saving hours on every invocation.

Operon can be installed independently or side-by-side with Ansible Engine, enabling it to be gradually introduced to your existing projects or employed on a per-run basis.

Here is the runtime for 416 tasks of common.yml from DebOps 0.7.2 deployed via SSH:


Operon reduces runtime by around 60% compared to Ansible for a single node, but things really heat up for large runs. See how runtime scales using a 24 GiB, 8 core Xeon E5530 deploying to Google Cloud VMs over an 18 ms SSH connection:


Each run executed 416 tasks per node, including loop items. In the 1,024 node run, 490,496 tasks executed in 54 minutes, giving an average throughput of 151 tasks per second. Linear scaling is apparent, with just under 4x time increase moving from 256 to 1,024 nodes.

The 256 node Ansible run was cancelled following a lengthy period with no output, after many re-runs to iteratively reduce forks from 40 to 10, so Ansible would not exceed RAM. A 13 fork run may have succeeded, but further attempts were abandoned having consumed two days worth of compute time.

In the final run, Ansible completed 89% of tasks in 6h 13m prior to cancellation:

256 Nodes, DebOps common.yml

Operon deployed to all nodes in parallel for every run presented. Operon has imperceptible overhead executing 1,024 forks given 8 cores and cleanly scales to at least 6,144 given 24 cores. Had these results been recorded using 16 cores rather than 8, we expect the 1,024 node run would complete in 27 minutes rather than 54 minutes.

Memory usage is highly predictable and significantly decoupled from forks. With 256 forks, Operon uses 4x less RAM than Ansible uses for 10 forks, while consuming at least 15x less controller CPU time to achieve the same outcome.


This graph is crooked as the 64 node Ansible run executed with 40 forks, while the 256 node run executed with 10 forks. Ansible required 1.6 GiB per fork for the 256 node run, placing a severe restraint on achievable parallelism regardless of available RAM.

Operon is the progression of a design approach first debuted in Mitogen for Ansible. It inherits massive low-level efficiency improvements from that work, already depended on by thousands of users:



Beyond software

Performance is a secondary effect of a culture shift towards stronger user friendliness, compatibility and cost internalization. There is a lot to reveal here, but to offer a taste of what's planned, I'm pleased to announce a forwards-compatible playbook syntax guarantee, in addition to restoration of specific Ansible Engine constructs marked deprecated.

include:

    - include: "i-will-always-work.yml"

"with" loops:

    - debug: msg={{item}}
      with_items: ["i", "will", "always", "work"]

"squash actions":

    - apt:
        name: "{{item}}"
      with_items: ["i", "will", "always", "work"]

hyphens in group names:

    $ cat hosts
    [i-will-always-work.us.mycorp.com]
    host1

hash merging:

    # I will always work
    [defaults]
    hash_behaviour = merge

The Ansible 2.9-compatible syntax Operon ships will always be supported, and future syntax deprecations in Ansible Engine do not apply in Operon. Changes like these harm working configurations without improving capability, and are a major source of error-prone labour during upgrades.

Over time this guarantee will progressively extend to engine semantics and outwards.

How can I get this?

Operon is initially distributed with support from Network Genomics, backed by experience and dedication to service unavailable elsewhere. If your team are gridlocked by deployments or fatigued by years of breaking upgrades, consider requesting an evaluation, and don't hesitate to drop me an e-mail with any questions and concerns.

Software is always better in the open, so a public release will happen when some level of free support can be provided. Subscribe to the operon-announce mailing list to learn about future releases.

Will Operon help Windows performance?

Yes. If you're struggling with performance deploying to Windows, please get in touch.

Will Operon help network device performance?

Yes. Operon features an architectural redesign that extends far beyond the transport layer and applies to all connection types equally.

Is Operon a fork of Ansible?

No. Operon is an incremental rewrite of the engine, a small component of around 60k code lines, of which around a quarter are replaced. Every Ansible installation includes around 715k lines, of which the vast majority is independently maintained by the wider Ansible community, just as Operon is.

Will Operon help improve Ansible Engine?

Yes. Operon is already promoting improvement within Ansible Engine, and since it remains an upstream, an incentive exists to contribute code upstream where practical.

Is Operon free software?

Yes. Operon is covered by same GPL license that covers Ansible, and you are free to make use of the code to the full extent of that license.

Does Operon break compatibility?

No. Operon does not break compatibility with the standard module collection, plug-in interfaces, or the surrounding Ansible ecosystem, and never plans to. Compatibility is a primary deliverable, including to keep pace with future improvements, and backwards compatibility such as improved playbook syntax stability.

I target only one node, what can Operon do for me?

Operon will help ensure the continued marketability of skills you have heavily invested in. It offers a powerful new flexibility that previously could not exist: your freedom to choose an engine. Whether you use it directly or not, you already benefit from Operon.


David


October 27, 2019

Derek Jones (derek-jones)

Projects chapter of ‘evidence-based software engineering’ reworked October 27, 2019 11:44 PM

The Projects chapter of my evidence-based software engineering book has been reworked; draft pdf available here.

A lot of developers spend their time working on projects, and there ought to be loads of data available. But, as we all know, few companies measure anything, and fewer hang on to the data.

Every now and again I actively contact companies asking for data, but work on the book prevents me from spending more time doing this. Data is out there; it’s a matter of asking the right people.

There is enough evidence in this chapter to slice-and-dice much of the nonsense that passes for software project wisdom. The problem is, there is no evidence to suggest what might be useful and effective theories of software development. My experience is that there is no point in debunking folktales unless there is something available to replace them. Nature abhors a vacuum; a debunked theory has to be replaced by something else, otherwise people continue with their existing beliefs.

There is still some polishing to be done, and a few promises of data need to be chased-up.

As always, if you know of any interesting software engineering data, please tell me.

Next, the Reliability chapter.

Ponylang (SeanTAllen)

Last Week in Pony - October 27, 2019 October 27, 2019 02:59 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

October 25, 2019

Gergely Nagy (algernon)

Firmware porting project update October 25, 2019 02:45 PM

A while ago I put up a project page, where I offered a few deals: send me a keyboard to port Kaleidoscope to, or donate to my Liberapay fund, and once there are enough funds there, I'll get a board from the wishlist and do the porting work. There is some news to share about the project!

A few people contacted me about boards, but nothing came out of that yet. Others generously donated to the Liberapay fund, enough to allow me to order an Azeron keypad! I paid the VAT & shipping cost, but the rest has been covered from donations. Thank you, all of you!

Now to wait 'till the board gets here, and then I can figure out all the nuances of porting Kaleidoscope to something wild.

October 24, 2019

Unrelenting Technology (myfreeweb)

FreeBSD and custom firmware on the Google Pixelbook October 24, 2019 12:00 AM

Back in 2015, I jumped on the ThinkPad bandwagon by getting an X240 to run FreeBSD on. Unlike most people in the ThinkPad crowd, I actually liked the clickpad and didn’t use the trackpoint much. But this summer I’ve decided that it was time for something newer. I wanted something..

  • lighter and thinner (ha, turns out this is actually important, I got tired of carrying a T H I C C laptop - Apple was right all along);
  • with a 3:2 display (why is Lenovo making these Serious Work™ laptops 16:9 in the first place?? 16:9 is awful in below-13-inch sizes especially);
  • with a HiDPI display (and ideally with a good size for exact 2x scaling instead of fractional);
  • with USB-C ports;
  • without a dGPU, especially without an NVIDIA GPU;
  • assembled with screws and not glue (I don’t necessarily need expansion and stuff in a laptop all that much, but being able to replace the battery without dealing with a glued chassis is good);
  • supported by FreeBSD of course (“some development required” is okay but I’m not going to write big drivers);
  • how about something with open source firmware, that would be fun.

The Qualcomm aarch64 laptops were out because embedded GPU drivers like freedreno (and UFS storage drivers, and Qualcomm Wi-Fi..) are not ported to FreeBSD. And because Qualcomm firmware is very cursed.

Samsung’s RK3399 Chromebook or the new Pinebook Pro would’ve been awesome.. if I were a Linux user. No embedded GPU drivers on FreeBSD, again. No one has added the stuff needed for FDT/OFW attachment to LinuxKPI. It’s rather tedious work, so we only support PCIe right now. (Can someone please make an ARM laptop with a PCIe GPU, say with an MXM slot?)

So it’s still gonna be amd64 (x86) then.

I really liked the design of the Microsoft Surface Book, but the iFixit score of 1 (one) and especially the Marvell Wi-Fi chip that doesn’t have a driver in FreeBSD are dealbreakers.

I was considering a ThinkPad X1 Carbon from an old generation - the one from the same year as the X230 is corebootable, so that’s fun. But going back in processor generations just doesn’t feel great. I want something more efficient, not less!

And then I discovered the Pixelbook. Other than the big huge large bezels around the screen, I liked everything about it. Thin aluminum design, a 3:2 HiDPI screen, rubber palm rests (why isn’t every laptop ever doing that?!), the “convertibleness” (flip the screen around to turn it into.. something rather big for a tablet, but it is useful actually), a Wacom touchscreen that supports a pen, mostly reasonable hardware (Intel Wi-Fi), and that famous coreboot support (Chromebooks’ stock firmware is coreboot + depthcharge).

So here it is, my new laptop, a Google Pixelbook.

What is a Chromebook, even

The write protect screw is kind of a meme. All these years later, it’s That Thing everyone on various developer forums associates with Chromebooks. But times have moved on. As a reaction to glued devices and stuff, the Chrome firmware team has discovered a new innovative way of asserting physical presence: sitting around for a few minutes, pressing the power button when asked. Is actually pretty clever though, it is more secure than.. not doing that.

Wait, what was that about?

Let’s go back to the beginning and look at firmware security in Chromebooks and other laptops.

These devices are designed for the mass market first. Your average consumer trusts the vendor and (because they’ve read a lot of scary news) might be afraid of scary attackers out to install a stealthy rootkit right into their firmware. Businesses are even more afraid of that, and they push for boot security on company laptops even more. This is why Intel Boot Guard is a thing that the vast majority of laptops have these days. It’s a thing that makes sure only the vendor can update firmware. Evil rootkits are out. Unfortunately, the user is also out.

Google is not like most laptop vendors.

Yes, Google is kind of a surveillance capitalism / advertising monster, but that’s not what I’m talking about here. Large parts of Google are very much driven by FOSS enthusiasts. Or something. Anyway, the point is that Chromebooks are based on FOSS firmware and support user control as much as possible. (Without compromising regular-user security, but turns out these are not conflicting goals and we can all be happy.)

Instead of Boot Guard, Google has its own way of securing the boot process. The root of trust in modern (>=2017) devices is a special Google Security Chip, which in normal circumstances also ensures that only Google firmware runs on the machine, but:

  • if you sit through the aforementioned power-button-clicking procedure, you get into Developer Mode: OS verification is off, you have a warning screen at boot, and you can press Ctrl-D to boot into Chrome OS, or (if you enabled this via a command run as root) Ctrl-L to open SeaBIOS.

    • Here’s the fun part.. it doesn’t have to be SeaBIOS. You can flash any Coreboot payload into the RW_LEGACY slot right from Chrome OS, reboot, press a key and you’re booting that payload!
  • if you also buy or solder a special cable (“SuzyQable”) and do the procedure a couple times more, your laptop turns into the Ultimate Open Intel Firmware Development Machine. Seriously.

    • the security chip is a debug chip too! Case Closed Debugging gives you serial consoles for the security chip itself, the embedded controller (EC) and the application processor (AP, i.e. your main CPU), and it also gives you a flasher (via special flashrom for now, but I’m told there’s plans to upstream) that allows you to write AP and EC firmware;
    • some security is still preserved with all the debugging: you can (and should) set a CCD password, which lets you lock-unlock the debug capabilities and change write-protect whenever you want, so that only you can flash firmware (at least without opening the case and doing very invasive things, the flash chip is not even SOIC anymore I think);
    • and you can hack without fear: the security chip is not brickable! Yes, yes, that means the chip is only a “look but don’t touch” kind of open source, it will only boot Google-signed firmware. Some especially paranoid people think this is An NSA Backdoor™. I think this is an awesome way to allow FULL control of the main processor and the EC, over just a cable, with no way of bricking the device! And to solve the paranoia, reproducible builds would be great.

You mentioned something about FreeBSD in the title?

Okay, okay, let’s go. I didn’t even want to write an introduction to Chromebooks but here we are. Anyway, while waiting for the debug cable to arrive, I’ve done a lot of work on FreeBSD, using the first method above (RW_LEGACY).

SeaBIOS does not have display output working in OSes that don’t specifically support the Coreboot framebuffer (OpenBSD does, FreeBSD doesn’t), and I really just hate legacy BIOS, so I’ve had to install a UEFI implementation into RW_LEGACY since I didn’t have the cable yet. My own EDK2 build did not work (now I see that it’s probably because it was a debug build and that has failing assertions). So I’ve downloaded MrChromebox’s full ROM image, extracted the payload using cbfstool and flashed that. Boom. Here we go, press Ctrl-L for UEFI. Nice. Let’s install FreeBSD.

The live USB booted fine. With the EFI framebuffer, an NVMe SSD and a PS/2 keyboard it was a working basic system. I’ve resized the Chrome OS data partition (Chrome OS recovers from that fine, without touching custom partitions), found that there’s already an EFI system partition (with a GRUB2 setup to boot Chrome OS, which didn’t boot like that o_0), installed everything and went on with configuration and developing support for more hardware.

(note: I’m leaving out the desktop configuration part here, it’s mostly a development post; I use Wayfire as my display server if you’re curious.)

So how’s the hardware?

Wi-Fi and Bluetooth

Well, that was easy. The Pixelbook has an Intel 7265. The exact same wireless chip that was in my ThinkPad. So, Wi-Fi works great with iwm.

Bluetooth.. if this was the newer 8265, would’ve already just worked :D

These Intel devices present a “normal” ubt USB Bluetooth adapter, except it only becomes normal if you upload firmware into it, otherwise it’s kinda dead. (And in that dead state, it spews interrupts, raising the idle power consumption by preventing the system from going into package C7 state! So usbconfig -d 0.3 power_off that stuff.) FreeBSD now has a firmware uploader for the 8260/8265, but it does not support the older protocol used by the 7260/7265. It wouldn’t be that hard to add that, but no one has done it yet.

Input devices

Keyboard

Google kept the keyboard as good old PS/2, which is great for ensuring that you can get started with a custom OS with a for-sure working keyboard.

About the only interesting thing with the keyboard was the Google Assistant key, where the Win key usually is. It was not recognized as anything at all. I used DTrace to detect the scancode without adding prints into the kernel and rebooting:

dtrace -n 'fbt::*scancode2key:entry { printf("[st %x] %x?\n", *(int*)arg0, arg1); } \
  fbt::*scancode2key:return { printf("%x\n", arg1);  }'

And wrote a patch to interpret it as a useful key (right meta, couldn’t think of anything better).

Touch*

The touchpad and touchscreen are HID-over-I²C, like on many other modern laptops. I don’t know why this cursed bus from the 80s is gaining popularity, but it is. At least FreeBSD has a driver for Intel (Synopsys DesignWare really) I²C controllers.

(Meanwhile Apple MacBooks now use SPI for even the keyboard. FreeBSD has an Intel SPI driver but right now it only supports ACPI attachment for Atoms and such, not PCIe yet.)

The even better news is that there is a nice HID-over-I²C driver in development as well. (note: the corresponding patch for configuring the devices via ACPI is pretty much a requirement, uncomment -DHAVE_ACPI_IICBUS in the iichid makefile too to get that to work. Also, upcoming Intel I²C improvement patch.)

The touchscreen started working with that driver instantly.

The touchpad was.. a lot more “fun”. The I²C bus it was on would just appear dead. After some debugging, it turned out that the in-progress iichid driver was sending a wrong extra out-of-spec command, which was causing Google’s touchpad firmware to throw up and lock up the whole bus.

But hey, it was a nice bug to discover: now that it’s fixed, if any other device turns out to be as strict about the input it accepts, no one else will hit this problem.

Another touchpad thing: by default, you have to touch it with a lot of pressure. Easily fixed in libinput:

% cat /usr/local/etc/libinput/local-overrides.quirks
[Eve touchpad]
MatchUdevType=touchpad
AttrPressureRange=12:6

UPD 2019-10-24 Pixelbook Pen

The touchscreen in the Pixelbook is made by Wacom, and supports stylus input like the usual Wacom tablets. For USB ones, on FreeBSD you can just use webcamd to run the Linux driver in userspace. Can’t exactly do that with I²C.

But! Thankfully, it exposes generic HID stylus reports, zero Wacom specifics required. I’ve been able to write a driver for that quite easily. Now it works. With pressure, tilt, the button, all the things :)

Display backlight brightness

This was another “fun” debugging experience. The intel_backlight console utility (which was still the thing to use on FreeBSD) did nothing.

I knew that the i915 driver on Chrome OS could adjust the brightness, so I made it work here too, and all it took was:

  • adding more things to LinuxKPI to allow uncommenting the brightness controls in i915kms;
  • (and naturally, uncommenting them);
  • finding out that this panel uses native DisplayPort brightness configured via DPCD (DisplayPort Configuration Data), enabling compat.linuxkpi.i915_enable_dpcd_backlight="1" in /boot/loader.conf;
  • finding out that there’s a fun bug in the.. hardware, sort of:

    • the panel reports that it supports both DPCD backlight and a direct PWM line (which is true);
    • Google/Quanta/whoever did not connect the PWM line;
    • (the panel is not aware of that);
    • the i915 driver prefers the PWM line when it’s reported as available.

Turns out there was a patch sent to Linux to add a “prefer DPCD” toggle, but for some reason it was not merged. The patch does not apply cleanly so I just did a simpler hack version:

--- i/drivers/gpu/drm/i915/intel_dp_aux_backlight.c
+++ w/drivers/gpu/drm/i915/intel_dp_aux_backlight.c
@@ -252,8 +252,12 @@ intel_dp_aux_display_control_capable(struct intel_connector *connector)
         * the panel can support backlight control over the aux channel
         */
        if (intel_dp->edp_dpcd[1] & DP_EDP_TCON_BACKLIGHT_ADJUSTMENT_CAP &&
-           (intel_dp->edp_dpcd[2] & DP_EDP_BACKLIGHT_BRIGHTNESS_AUX_SET_CAP) &&
-           !(intel_dp->edp_dpcd[2] & DP_EDP_BACKLIGHT_BRIGHTNESS_PWM_PIN_CAP)) {
+           (intel_dp->edp_dpcd[2] & DP_EDP_BACKLIGHT_BRIGHTNESS_AUX_SET_CAP)
+/* for Pixelbook (eve), simpler version of https://patchwork.kernel.org/patch/9618065/ */
+#if 0
+            && !(intel_dp->edp_dpcd[2] & DP_EDP_BACKLIGHT_BRIGHTNESS_PWM_PIN_CAP)
+#endif
+                       ) {
                DRM_DEBUG_KMS("AUX Backlight Control Supported!\n");
                return true;
        }

And with that, it works, with 65536 steps of brightness adjustment even.

Suspend/resume

The Pixelbook uses regular old ACPI S3 sleep, not the fancy new S0ix thing, so that’s good.

On every machine with a TPM though, you have to tell the TPM to save state before suspending, otherwise you get a reset on resume. I already knew this because I’ve experienced that on the ThinkPad.

The Google Security Chip runs an open-source TPM 2.0 implementation (fun fact, written by Microsoft) and it’s connected via… *drum roll* I²C. Big surprise (not).

FreeBSD already has TPM 2.0 support in the kernel, the userspace tool stack was recently added to Ports as well. But of course there was no support for connecting to the TPM over I²C, and especially not to the Cr50 (GSC) TPM specifically. (it has quirks!)

I wrote a driver (WIP) hooking up the I²C transport (it relies on the aforementioned ACPI-discovery-of-I²C patch). It does not use the interrupt (I found it buggy: at first attachment it fires continuously, and after a reattach it stops completely), and the first command after attach (or after system resume) errors out, but that can be fixed; other than that, it works. Resume is fixed, entropy can be harvested, and it could be used for SSH keys too.

Another thing with resume: I’ve had to build the kernel with nodevice sdhci to prevent the Intel SD/MMC controller (which is not attached to anything here - I’ve heard that the 128GB model might be using eMMC instead of NVMe but that’s unclear) from hanging for a couple minutes on resume.

Dynamic CPU frequency

At least on the stock firmware, the old-school Intel SpeedStep did not work because the driver could not find some required ACPI nodes (perf or something).

Forget that, the new Intel Speed Shift (which lets the CPU adjust frequency on its own) works nicely with the linked patch.

Tablet mode switch

When the lid is flipped around, the keyboard is disabled (unless you turn the display brightness to zero, I’ve heard - which is fun because that means you can connect a monitor and have a sort-of computer-in-a-keyboard look, like retro computers) and the system gets a notification (Chrome OS reacts to that by enabling tablet mode).

Looking at the DSDT table in ACPI, it was quite obvious how to support that notification:

Device (TBMC) {
    Name (_HID, "GOOG0006")  // _HID: Hardware ID
    Name (_UID, One)  // _UID: Unique ID
    Name (_DDN, "Tablet Motion Control")  // _DDN: DOS Device Name
    Method (TBMC, 0, NotSerialized) {
        If ((RCTM () == One)) { Return (One) }
        Else { Return (Zero) }
    }
}

On Linux, this is exposed as an evdev device with switch events. I was able to replicate that quite easily. My display server does not support doing anything with that yet, but I’d like to do something like enabling an on-screen keyboard to pop up automatically when tablet mode is active.

Keyboard backlight brightness

I generally leave it off because I don’t look at the keyboard, but this was a fun and easy driver to write.

Also obvious how it works when looking at ACPI:

Device (KBLT) {
    Name (_HID, "GOOG0002")  // _HID: Hardware ID
    Name (_UID, One)  // _UID: Unique ID
    Method (KBQC, 0, NotSerialized) {
        Return (^^PCI0.LPCB.EC0.KBLV) /* \_SB_.PCI0.LPCB.EC0_.KBLV */
    }
    Method (KBCM, 1, NotSerialized) {
        ^^PCI0.LPCB.EC0.KBLV = Arg0
    }
}

Using the debug cable on FreeBSD

The debug cable presents serial consoles as bulk endpoints without any configuration capabilities. On Linux, they are supported by the “simple” USB serial driver.

Adding the device to the “simple” FreeBSD driver ugensa took some debugging. The driver was clearing USB stalls when the port is opened. That’s allowed by the USB spec and quite necessary on some devices. Unfortunately, the debug interface throws up when it sees that request. The responsible code in the device has a /* Something we need to add support for? */ comment :D

Audio?

The only thing that’s unsupported is onboard audio. The usual HDA controller only exposes the DisplayPort audio-through-the-monitor thing. The speakers, mic and headphone jack are all connected to various codecs exposed via… yet again, I²C. I am not about to write the drivers for these codecs, since I’m not really interested in audio on laptops.

Firmware is Fun

After the debug cable arrived, I’ve spent some time debugging the console-on-FreeBSD thing mentioned above, and then started messing with coreboot and TianoCore EDK2.

My discoveries so far:

  • there’s nothing on the AP console on stock firmware because Google compiles release FW with serial output off, I think to save on power or something;
  • me_cleaner needs to be run with -S -w MFS. As mentioned in the --help, the MFS partition contains PCIe related stuff. Removing it causes the NVMe drive to detach soon after boot;
  • upstream Coreboot (including MrChromebox’s builds) fails to initialize the TPM, just gets zero in response to the vendor ID request. Funnily enough, that would’ve solved the resume problem without me having to write the I²C TPM driver for FreeBSD - but now that I’ve written it, I’d prefer to actually have the ability to use the TPM;
  • EDK2’s recent UefiPayloadPkg doesn’t support PS/2 keyboard and NVMe out of the box, but they’re very easy to add (hopefully someone would add them upstream after seeing my bug reports);
  • UefiPayloadPkg supports getting the framebuffer from coreboot very well;
  • coreboot can run Intel’s GOP driver before the payload (it’s funny that we’re running a UEFI module before running the UEFI implementation) and that works well;
  • but libgfxinit - the nice FOSS, written-in-Ada, verified-with-SPARK implementation of Intel GPU initialization and framebuffer configuration - supports Kaby Lake now!

    • however, we have a DPCD thing again with this display panel here - it reports max lane bandwidth as 0x00, libgfxinit interprets that as the slowest speed and we end up not having enough bandwidth for the high-res screen;
    • I’ve been told that this is because there’s a new way of conveying this information that’s unsupported. I’ll dig around in the Linux i915 code and try to implement it properly here but for now, I just did a quick hack, hardcoding the faster bandwidth. Ta-da! My display is initialized with formally verified open source code! Minus one blob running at boot!
  • persistent storage of EFI variables needs some SMM magic. There’s a quick patch that changes EDK2’s emulated variable store to use coreboot’s SMM store. EDK2 has a proper SMM store of its own, I’d like to look into making that coreboot-compatible or at least just writing a separate coreboot-compatible store module.
  • UPD 2019-10-24 for external displays, DisplayPort alt mode on USB-C can be used. Things to note:

    • DP++ (DP cables literally becoming HDMI cables) can’t work over USB Type C, which is why there are no HDMI-A-n connectors on the GPU, so a passive HDMI-mDP dongle plugged into a mDP-TypeC dongle won’t work;
    • the Chrome EC runs the alt mode negotiation, the OS doesn’t need any special support;
    • for DP dongles to work at all, the EC must run RW firmware and that doesn’t happen as-is with upstream coreboot. There is a jump command on the EC console. Also this patch should help?? (+ this)

An aside: why mess with firmware?

If you’re not the kind of person who’s made happy by just the fact that some more code during the boot process of their laptop is now open and verified, and you just want things to work, you might not be as excited about open source firmware development as I am.

But you can do cool things with firmware that give you practical benefit. The best example I’m seeing is better Hackintosh support. Instead of patching macOS to work on your machine, you could patch your machine to pretend to almost be a Mac:

Is this cool or what?

Conclusion

Pixelbook, FreeBSD, coreboot, EDK2 good.

Seriously, I have no big words to say, other than just recommending this laptop to FOSS enthusiasts :)

October 23, 2019

Aaron Bieber (qbit)

Websockets with OpenBSD's relayd October 23, 2019 03:00 PM

The need

I am in the process of replacing all my NGINX instances with httpd/relayd. So far this has been going pretty smoothly.

I did, however, run into an issue with websockets in Safari on iOS and macOS which made me think they weren’t working at all! Further testing proved they were working fine in other browsers, so .. more digging needs to be done!

The configs

I tested this in a VM running on OpenBSD. Its ‘external’ IP is 10.10.10.15.

This config also works with TLS but for simplicity, this example will be plain text.

relayd.conf

ext_addr="10.10.10.15"

log connection errors

table <websocketd> { 127.0.0.1 }

http protocol ws {
	match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
	match request header append "X-Forwarded-By" \
		value "$SERVER_ADDR:$SERVER_PORT"
	match request header "Host" value "10.10.10.15" forward to <websocketd>

	http websockets
}

relay ws {
	listen on $ext_addr port 8000
	protocol ws
	forward to <websocketd> port 9999
}

Here we are setting up a “websocket” listener on port 8000 and forwarding it to port 9999 on 127.0.0.1 where we will be running websocketd.

The key directive is http websockets in the http block. Without this the proper headers won’t be set and the connection will not work.

httpd.conf

# $OpenBSD: httpd.conf,v 1.20 2018/06/13 15:08:24 reyk Exp $

server "10.10.10.15" {
	listen on * port 80
	location "/*" {
		directory auto index
	}
}

Pretty simple. We are just going to serve the html file below.

/var/www/htdocs/index.html

This html blurb simply creates a websocket and pumps the data it receives into a div that we can see.

<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>ws test</title>
</head>
<body>
  <div id="output"></div>
</body>
<script>
let ws = new WebSocket("ws://10.10.10.15:8000/weechat");
let d = document.getElementById('output');
ws.onopen = function() {
  ws.send("hi");
};

ws.onmessage = function (e) { 
  d.innerText = d.innerText + " " + e.data;
};

ws.onclose = function() { 
  d.innerText += (' done.'); 
};
</script>
</html>

websocketd

Now we use websocketd to serve up some sweet sweet websocket action!

#!/bin/sh

echo 'hi'

for i in $(jot 5); do
	echo $i;
	sleep 1;
done

Use websocketd to run the above script:

websocketd --port 9999 --address 127.0.0.1 ./above_script.sh

Now point your browser at http://10.10.10.15/! You will see “hi” and every second for five seconds you will see a count appended to <div id="output"></div>!
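
If you want to poke at the relay from the command line instead of a browser, here is a small client sketch. This is my addition, not part of the original setup; it assumes Python 3.7+ with the third-party websockets package installed, and reuses the same ws://10.10.10.15:8000/weechat URL as the HTML above.

# test_ws.py - minimal client for checking the relayd websocket setup.
# Assumes: pip install websockets
import asyncio
import websockets

async def main():
    async with websockets.connect("ws://10.10.10.15:8000/weechat") as ws:
        await ws.send("hi")
        try:
            # Print everything the script behind websocketd sends us,
            # until it exits and the connection is closed.
            while True:
                print(await ws.recv())
        except websockets.ConnectionClosed:
            print("done.")

asyncio.run(main())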

The issues

The error I saw on Safari on iOS and macOS is:

'Connection' header value is not 'Upgrade'

Which is strange, because I can see that it is in fact set to ‘Upgrade’ in a tcpdump.

October 22, 2019

Simon Zelazny (pzel)

I'm co-authoring a new blog October 22, 2019 10:00 PM

An announcement

I've started a new blog with my friend Rafal Studnicki. We'll be covering a wide variety of software engineering topics, focusing chiefly on Elixir & Erlang, functional programming, testing, and software correctness.

The idea is to try to put in writing our shared and individual experience from the world of software development. This blog will continue to serve as my personal scratchpad.

October 21, 2019

Unrelenting Technology (myfreeweb)

It’s nice that Microsoft is pushing for all pen tablet (stylus) support in laptops... October 21, 2019 08:12 PM

It’s nice that Microsoft is pushing for all pen tablet (stylus) support in laptops to use the obvious generic set of HID reports. Quite probably, Microsoft is to thank for the Wacom touchscreen in my Pixelbook implementing that. I’ve seen the heaps of code in the Linux kernel to support Wacom’s custom protocols, that would’ve been very NOT fun to implement :)

Took like an hour max to get to working reports in console (dmesg), all that’s left is to evdev-ify it. Coming to iichid pull requests soon (but for now there’s no multiple device support in hidbus, so won’t be mergeable yet).

asrpo (asrp)

Roll your own GUI automation library October 21, 2019 10:37 AM

Sikuli is a tool for automating repetitive tasks.

Screenshot of Sikuli

To automate a program, it makes use of screenshots and image recognition to decide where to click and type [automation_methods]. As you can see above, Sikuli uses Python as its scripting language (plus rendered images). But under the hood, it’s implemented in Java and runs Jython. Since nothing Sikuli uses really needs Java, we’ll try to implement some GUI automation in pure Python.
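
To make the idea concrete, here is a rough sketch of the screenshot-then-match-then-click loop. It is my own illustration rather than anything from Sikuli or the code built later in the post; it assumes pyautogui and Pillow are installed, and that button.png is a reference image of the UI element you want to click.

import pyautogui
from PIL import Image

def find_subimage(haystack, needle, stride=4):
    # Naive search for needle inside haystack: compare a sampled grid of
    # pixels at every offset. Slow and exact-match only, but dependency-free.
    hw, hh = haystack.size
    nw, nh = needle.size
    hay, ned = haystack.load(), needle.load()
    xs, ys = range(0, nw, stride), range(0, nh, stride)
    for y in range(hh - nh + 1):
        for x in range(hw - nw + 1):
            if all(hay[x + dx, y + dy] == ned[dx, dy] for dy in ys for dx in xs):
                return x, y
    return None

screen = pyautogui.screenshot().convert("RGB")    # full-screen PIL image
target = Image.open("button.png").convert("RGB")  # reference image to look for

hit = find_subimage(screen, target)
if hit is not None:
    x, y = hit
    # On HiDPI displays the screenshot can be larger than the logical screen,
    # so these coordinates may need scaling before use.
    pyautogui.click(x + target.width // 2, y + target.height // 2)
    pyautogui.typewrite("hello")

Real tools replace the exact-match search with fuzzy template matching (e.g. normalized cross-correlation), which is what makes them robust to anti-aliasing and theme changes.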

October 20, 2019

Derek Jones (derek-jones)

Three books discuss three small data sets October 20, 2019 07:47 PM

During the early years of a new field, experimental data relating to important topics can be very thin on the ground. Ever since the first computer was built, there has been a lot of data on the characteristics of the hardware. Data on the characteristics of software, and the people who write it has been (and often continues to be) very thin on the ground.

Books are sometimes written by the researchers who produce the first data associated with an important topic, even if the data set is tiny; being first often generates enough interest for a book length treatment to be considered worthwhile.

As a field progresses, lots more data becomes available, and the discussion in subsequent books can be based on findings from more experiments and lots more data.

Software engineering is a field where a few ‘first’ data books have been published, followed by silence, or rather lots of arm waving and little new data. The fall of Rome has been followed by a 40-year dark-age, from which we are slowly emerging.

Three of these ‘first’ data books are:

  • “Man-Computer Problem Solving” by Harold Sackman, published in 1970, relating to experimental data from 1966. The experiments investigated the impact of two different approaches to developing software, on programmer performance (i.e., batch processing vs. on-line development; code+data). The first paper on this work appeared in an obscure journal in 1967, and was followed in the same issue by a critique pointing out the wide margin of uncertainty in the measurements (the critique agreed that running such experiments was a laudable goal).

    Failing to deal with experimental uncertainty is nothing compared to what happened next. A 1968 paper in a widely read journal, the Communications of the ACM, contained the following table (extracted from a higher quality scan of a 1966 report by the same authors, and available online).

    Developer performance ratios.

    The tale of the 1:28 ratio of programmer performance, found in an experiment by Grant/Sackman, took off (the technical detail that a lot of the difference was down to the techniques the subjects used, and not the people themselves, got lost). The Grant/Sackman ‘finding’ used to be frequently quoted in some circles (or at least it was when I moved in them; I don’t know how often it is cited today). In 1999, Lutz Prechelt wrote an expose on the sorry tale.

    Sackman’s book is very readable, and contains lots of details and data not present in the papers, including survey data and a discussion of the intrinsic uncertainties associated with the experiment; it also contains the table above.

  • “Software Engineering Economics” by Barry W. Boehm, published in 1981. I wrote about the poor analysis of the data contained in this book a few years ago.

    The rest of this book contains plenty of interesting material, and even sounds modern (because books moving the topic forward have not been written).

  • “Program Evolution: Process of Software Change” edited by M. M. Lehman and L. A. Belady, published in 1985, relating to experimental data from 1977 and before. Lehman and Belady managed to obtain data relating to 19 releases of an IBM software product (yes, 19, not nineteen-thousand); the data was primarily the date and number of modules contained in each release, plus less specific information about number of statements. This data was sliced and diced every which way, and the book contains many papers with the same data appearing in the same plot with different captions (had the book not been a collection of papers it would have been considerably shorter).

    With a lot less data than Isaac Newton had available to formulate his three laws, Lehman and Belady came up with five, six, seven… “laws of software evolution” (which themselves evolved with the publication of successive papers).

    The availability of Open source repositories means there is now a lot more software system evolution data available. Lehman’s laws have not stood the test of more data, although people still cite them every now and again.

Ponylang (SeanTAllen)

Last Week in Pony - October 20, 2019 October 20, 2019 03:04 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

October 19, 2019

Pierre Chapuis (catwell)

Opinions October 19, 2019 03:30 PM

You know how some discussions can make you pause and introspect for a while after they happen? Well, I have had such a discussion recently about how strong my opinions are nowadays, and I decided to write something about it.

Forming opinions

People who have spent significant time around me know all too well how I behave when I get interested in a topic. I can spend weeks (and sometimes even much longer) reading all I can find about it until I know its ins and outs. This results in me being able to discuss and have opinions about things which are apparently completely random, but which I ended up picking up along the way for some reason.

For the longest time, I have believed in strong opinions (somewhat) weakly held, and for the most part I still do. In particular, I agree with the "somewhat" part, which says that the strength with which you should hold an opinion depends on how you formed it.

What has changed in the last 5-7 years is my view on how explicit those opinions should be, and how much I should fight for them.

Expressing opinions

I was strongly influenced by the Open Source community of the early 2000s where, if you wanted something to happen, you had to state your opinion and be ready to defend it - sometimes ferociously - with arguments. If you didn't, you would be ignored.

What outsiders often overlook about this method is that it worked, and it was pretty efficient. Most people got used to it, and in general it did not turn into endless debates, because in the end we had benevolent dictators to settle the matter.

But a problem I could not see for a long time is that this excludes a lot of people from the discussion. People who, because of their education or the position society put them in, won't express their own opinion if it contradicts the current consensus or that of someone they either respect or fear; or people who won't engage in anything remotely resembling conflict because that's not in their nature.

It is especially easy to ignore the existence of these people when the only way you are communicating is through asynchronous text over the Internet. However they often have very interesting things to say, with different points of view that can make headway or prevent big mistakes.

There are lots of ways to work around that issue, including timeboxing contributions on a topic while keeping them secret, or simply having people in positions of power and people more comfortable with their opinion express it last.

Defending opinions

Another matter is how much you should fight for your opinion. I am not talking about the famous XKCD comic, I know I can be this guy sometimes, but that is something else.

Some time ago I was struggling with how to handle disagreement at work, and I became an adept of the way Amazon does things, its Leadership Principles, and in particular Disagree and Commit:

Leaders are obligated to respectfully challenge decisions when they disagree, even when doing so is uncomfortable or exhausting. Leaders have conviction and are tenacious. They do not compromise for the sake of social cohesion. Once a decision is determined, they commit wholly.

When organizations follow this principle, it helps a lot with the issue, because it makes it clear that you can - and should - express dissenting opinion, and how you can align on things you do not agree with without making it look like you changed your mind when you did not.

(On a side note, I even think in some cases making dissent mandatory by instituting a Tenth Man / Devil's Advocate rule can be beneficial.)

However, it turns out this principle is more complicated than it sounds, and there are two important points I did not immediately understand, which are much more explicit in Bezos' 2016 letter to shareholders.

The first is that the person who "disagrees but commits" is not necessarily the subordinate in a power relationship. Bezos says:

This isn’t one way. If you’re the boss, you should do this too. I disagree and commit all the time. We recently greenlit a particular Amazon Studios original. I told the team my view: debatable whether it would be interesting enough, complicated to produce, the business terms aren’t that good, and we have lots of other opportunities. They had a completely different opinion and wanted to go ahead. I wrote back right away with “I disagree and commit and hope it becomes the most watched thing we’ve ever made.” Consider how much slower this decision cycle would have been if the team had actually had to convince me rather than simply get my commitment.

The second point I missed is the meaning of the sentence "they do not compromise for the sake of social cohesion." The "compromise" part is clear enough; here compromising would have meant giving the green light but with, say, fewer resources. From experience compromises like this often end up poorly. But the hard part is "for the sake of social cohesion". Here is what Bezos has to say:

Note what this example is not: it’s not me thinking to myself "well, these guys are wrong and missing the point, but this isn’t worth me chasing." It’s a genuine disagreement of opinion, a candid expression of my view, a chance for the team to weigh my view, and a quick, sincere commitment to go their way.

Unlike Bezos my current (weakly held) opinion is that sometimes keeping social peace is worth not engaging in some minor issues. Some people will always appreciate being told when you think they're mistaken, but others won't even if they end up admitting they were wrong in the end, so it only makes sense to contradict them if it is worth risking being resented for it.

Tolerating opinions

The first step for all this to work is probably to admit that opinions can exist without being "right" or "wrong".

That may be obvious to you but for a long time it was not for me. When I was a kid I looked at things in binary: true or false, right or wrong, better or worse. This made me enjoy CS and math, but paradoxically learning more advanced math (partial orders, Simpson's paradox, Gödel's incompleteness theorems...) showed me that even in the hardest of sciences things were more nuanced.

Anyway, there are reasons why people come to hold a set of beliefs which makes sense at least locally. Learning about this background is important, whether it helps you convince them or changes your own mind.

Don't worry though, I still hold a few strong opinions, both in my field (some quite strongly - you won't easily make me change those for instance) and outside of it!

October 18, 2019

Carlos Fenollosa (carlesfe)

October 17, 2019

Andreas Zwinkau (qznc)

TLA+ is easier than I thought October 17, 2019 12:00 AM

Did a small exercise with TLA+, an easy model checker.

Read full article!

October 15, 2019

Gustaf Erikson (gerikson)

The Big Short: Inside the Doomsday Machine by Michael Lewis October 15, 2019 12:13 PM

I caught the movie adaptation of this book on a flight, and wanted to get some more background. Honestly, the movie does a good job of summarizing the contents, but also adds some humanizing touches such as the visit to the ground zero of the housing market: Florida. Both are recommended.

October 14, 2019

Pete Corey (petecorey)

Rendering a React Application Across Multiple Containers October 14, 2019 12:00 AM

A few of my recent articles have been embedding limited builds of Glorious Voice Leader directly into the page. At first, this presented an interesting challenge. How could I render a single React application across multiple container nodes, while maintaining shared state between all of them?

While the solution I came up with probably isn’t best practice, it works!

As a quick example, imagine you have a simple React component that manages a single piece of state. The user can change that state by pressing one of two buttons:


const App = () => {
  let [value, setValue] = useState("foo");
  return (
    <div>
      <button onClick={() => setValue("foo")}>
        Value is "{value}". Click to change to "foo"!
      </button>
      <button onClick={() => setValue("bar")}>
        Value is "{value}". Click to change to "bar"!
      </button>
    </div>
  );
};

Normally, we’d render our App component into a container in the DOM using ReactDOM.render:


ReactDOM.render(<App />, document.getElementById('root'));

But what if we want to render our buttons in two different div elements, spread across the page? Obviously, we could build out two different components, one for each button, and render these components in two different DOM containers:


const Foo = () => {
  let [value, setValue] = useState("foo");
  return (
    <button onClick={() => setValue("foo")}>
      Value is "{value}". Click to change to "foo"!
    </button>
  );
};

const Bar = () => {
  let [value, setValue] = useState("foo");
  return (
    <button onClick={() => setValue("bar")}>
      Value is "{value}". Click to change to "bar"!
    </button>
  );
};

ReactDOM.render(<Foo />, document.getElementById('foo'));
ReactDOM.render(<Bar />, document.getElementById('bar'));

But this solution has a problem. Our Foo and Bar components maintain their own versions of value, so a change in one component won’t affect the other.

Amazingly, it turns out that we can create an App component which maintains our shared state, render that component into our #root container, and within App we can make additional calls to ReactDOM.render to render our Foo and Bar components. When we call ReactDOM.render we can pass down our state value and setters for later use in Foo and Bar:


const App = () => {
  let [value, setValue] = useState("foo");
  return (
    <>
      {ReactDOM.render(
        <Foo value={value} setValue={setValue} />,
        document.getElementById("foo")
      )}
      {ReactDOM.render(
        <Bar value={value} setValue={setValue} />,
        document.getElementById("bar")
      )}
    </>
  );
};

Our Foo and Bar components can now use the value and setValue props provided to them instead of maintaining their own isolated state:


const Foo = ({ value, setValue }) => {
  return (
    <button onClick={() => setValue("foo")}>
      Value is "{value}". Click to change to "foo"!
    </button>
  );
};

const Bar = ({ value, setValue }) => {
  return (
    <button onClick={() => setValue("bar")}>
      Value is "{value}". Click to change to "bar"!
    </button>
  );
};

And everything works! Our App is “rendered” to our #root DOM element, though nothing actually appears there, and our Foo and Bar components are rendered into #foo and #bar respectively.

Honestly, I’m amazed this works at all. I can’t imagine this is an intended use case of React, but the fact that it’s still a possibility made my life much easier.

Happy hacking.

October 13, 2019

Derek Jones (derek-jones)

Comparing expression usage in mathematics and C source October 13, 2019 10:11 PM

Why does a particular expression appear in source code?

One reason is that the expression is the coded form of a formula from the application domain, e.g., E=mc^2.

Another reason is that the expression calculates an algorithm/housekeeping related address, or offset, to where a value of interest is held.

Most people (including me, many years ago) think that the majority of source code expressions relate to the application domain, in one-way or another.

Work on a compiler related optimizer, and you will soon learn the truth; most expressions are simple and calculate addresses/offsets. Optimizing compilers would not have much to do, if they only relied on expressions from the application domain (my numbers tool throws something up every now and again).

What are the characteristics of application domain expressions?

I like to think of them as being complicated, but that’s because it used to be in my interest for them to be complicated (I used to work on optimizers, which have the potential to make big savings if things are complicated).

Measurements of expressions in scientific papers are needed, but who is going to be interested in measuring the characteristics of mathematical expressions appearing in papers? I’m interested, but not enough to do the work. Then, a few weeks ago, I discovered An Analysis of Mathematical Expressions Used in Practice, by Clare So; an analysis of 20,000 mathematical papers submitted to arXiv between 2000 and 2004.

The following discussion uses the measurements made for my C book, as the representative source code (I keep suggesting that detailed measurements of other languages is needed, but nobody has jumped in and made them, yet).

The table below shows percentage occurrence of operators in expressions. Minus is much more common than plus in mathematical expressions, the opposite of C source; the ‘popularity’ of the relational operators is also reversed.

Operator  Mathematics   C source
=         0.39          3.08
-         0.35          0.19
+         0.24          0.38
<=        0.06          0.04
>         0.041         0.11
<         0.037         0.22

The most common single binary operator expression in mathematics is n-1 (the data counts expressions using different variable names as different expressions; yes, n is the most popular variable name, and adding up other uses does not change relative frequency by much). In C source, var+int_constant is around twice as common as var-int_constant.

The plot below shows the percentage of expressions containing a given number of operators (I've made a big assumption about exactly what Clare So is counting; code+data). The operator count starts at two because that is where the count starts for the mathematics data. In C source, around 99% of expressions have less than two operators, so the simple case completely dominates.

Percentage of expressions containing a given number of operators.

For expressions containing between two and five operators, frequency of occurrence is sort of about the same in mathematics and C, with C frequency decreasing more rapidly. The data disagrees with me again...
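
If you want a rough feel for the C-source side of these numbers on your own code, a crude counter is easy to write. This is my sketch, not the book’s measurement tooling: it tokenizes with a regex and will happily miscount operators inside strings, comments, and unary uses of - and *, and it counts operator occurrences rather than expressions, so it is only loosely comparable to the table above.

import re
import sys
from collections import Counter

# Multi-character operators must come before their single-character prefixes.
OPS = ["==", "<=", ">=", "!=", "+", "-", "*", "/", "<", ">", "="]
pattern = re.compile("|".join(re.escape(op) for op in OPS))

counts = Counter()
for path in sys.argv[1:]:
    with open(path, errors="replace") as f:
        counts.update(pattern.findall(f.read()))

total = sum(counts.values()) or 1
for op, n in counts.most_common():
    print(f"{op:2} {100 * n / total:6.2f}%")

Run it as, say, python count_ops.py *.c to get a percentage breakdown per operator.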

Ponylang (SeanTAllen)

Last Week in Pony - October 13, 2019 October 13, 2019 03:19 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Carlos Fenollosa (carlesfe)

October 12, 2019

Carlos Fenollosa (carlesfe)

US Software companies comply with international law, to their great regret October 12, 2019 04:58 PM

This week has been very heavy on China-related software scandals:

On Apple's side, as usual, there has been more media coverage:

US companies and entities are forced to apply international law, sometimes breaking universal human rights.

This is a difficult topic. On one hand, States are sovereign. On the other, we should push for a better world. However, to what degree does a private company have the right to ignore state rulings? They can, and suffer the consequences; that would at least be consistent. But are they ready to boycott a whole country, or risk being banned from it?

As an individual, the take-home message is that if you delegate some of your tasks to a private company, or rely on one to some degree, you risk losing access to your data or virtual possessions at any time, be it due to international law or to some stupid enforcement or terms-of-service bullshit.

Please follow the HN discussions on the "via" links above, they are very informative.

Tags: internet, law


October 10, 2019

Chris Allen (bitemyapp)

Why I use and prefer GitLab CI October 10, 2019 12:00 AM

In the past, I talked about how to make your CI builds faster using Drone CI. I don't use DroneCI any more and haven't for a couple years now so I wanted to talk about what I use now.

October 09, 2019

eta (eta)

Distributed state and network topologies in chat systems October 09, 2019 11:00 PM

[This is post 3 about designing a new chat system. Have a look at the first post in the series for more context!]

The funny thing about trying to design communications protocols seems to be how much of the protocol’s code ends up just dealing with networking – or rather, that certainly seems to be one of the main things a design is focused around. We’ve talked previously about federation, and the benefits and downsides associated with having a completely open approach, as well as the issues of distributed state and spam that inevitably come up. In this blog post, I want to propose and explore the networking and federation parts of an idealistic new chat protocol, and attempt to come up with my own solution1! (If you’re just interested in a summary, skip past the following ~2,000 words to the “Conclusions” header.)

The basic model: IRC server linking

I’m going to assume, for the purposes of this blog post, that “full mesh” networking is a bad design choice (as discussed in the previous post). This is partially because there are already protocols like ActivityPub, XMPP, and Matrix that use full mesh networking, so it’s worth exploring something different for a change; I also believe that full mesh is a rather inefficient way to run a network, but you’re free to disagree with me on this2.

So, let’s start with the very simplistic network model used by IRC: a simple spanning-tree arrangement, where servers are linked to one another with neither loops nor redundancy. This looks a bit like this:

Picture of multiple IRC servers linked together in a spanning tree arrangement

This is stupidly simple for server implementors to implement. Here’s how it works: if you’re connected to servers A, B, and C, and you receive a message from server A, you just process the message, and send it on to servers B and C (i.e. the rest of the network). You don’t even need to worry about deduplication, because there’s no way you could get sent the message again (if all the servers behave, that is3). As Rachel Kroll notes in “IRC netsplits, spanning trees, and distributed state” (a post from 2013 – go and read it, it’s pretty interesting!):

…the entire network is a series of links which themselves constitute a single point of failure for people on either side of it. When something goes wrong between two servers, this manifests as a “netsplit”.

Whole swaths of users (appear to) sign off en masse when this happens, and then will similarly seem to rejoin after the net relinks, whether via the same two servers or through some other path.

This was the situation back in the ’90s, and it’s still happening today.

She then goes on to talk about a number of ways in which this situation could be rather trivially avoided – for example, using the Spanning Tree Protocol, which uses an algorithm to detect and avoid possible loops between interconnected network switches, such that you can have a bunch of redundant links that only need to be used when one of the links fail, or by simply using some smarts to tag messages with an ID and the list of servers they’ve already been through to avoid duplicates.

This all makes things less than stupidly simple, but there’s a difference between necessary and unnecessary complexity. As Rachel notes:

There are solutions to problems which don’t exist, and then there are assortments of technologies which might be useful for problems which technically exist, but for which nobody may care about fixing. I suspect the whole IRC netsplit thing is a case of the latter.

1-to-1 vs group chats

So, delivering messages from one place to another is all fine and good, but the real fun starts when you try and set up a group chat, which involves multiple people, and – quite crucially – some idea of distributed state, which isn’t the easiest thing to achieve. This state is required for all sorts of common features to work – from things as simple as setting the name and topic of the chatroom to being able to give people administrator powers, kick and ban users, etc. Getting this wrong has real consequences; in the olden days, netsplits could lead to IRC takeovers, where users would gain administrator access on one side of the netsplit (perhaps due to the channel having no users left on that side, which leaves the first user to rejoin with admin powers) and then retain it after the network reformed, with disastrous consequences.

The Matrix protocol, mentioned before, is essentially a massive effort to solve this problem through modeling state changes that occur in a chatroom as a directed acyclic graph, and running the state resolution algorithm to resolve multiple conflicting pieces of state after network partitions (or at any time, really). This algorithm is pretty complex stuff (and it seems like they came up with it mostly on their own!); I’ve stated in the first blog post of this series that doing things this way is ‘questionable’ – but it was later pointed out that my view was somewhat out of date4, so I’m not really an authority on this.

The Matrix approach relies on the room state being independently calculated by each participating server, using the algorithm to determine which events to accept and reject – which makes servers not really depend on one another that much, at the cost of some additional complexity. It does, however, mean that rooms continue to operate – and can accept state changes – even when network partitions occur; the state resolution algorithm will eventually restore consistency. In other words, this satisfies the AP parts of the CAP theorem, with eventual consistency (things will eventually be consistent after servers reconnect, but are not, of course, consistent all the time).

XEP-0045 (XMPP’s group chat extension), on the other hand, is at the complete other end of the spectrum: group chats seem to be exclusively owned by one server, and are unusable if said server goes down. This obviously makes things a lot simpler, but means that a group chat – even if it were to have hundreds or thousands of participants – is entirely reliant on that one server to relay messages and handle state updates, which is a massive single point of failure. Other servers couldn’t possibly step in under this model, because otherwise the group chat would be open to attack; any random server could jump in and claim ownership of the room, resulting in chaos similar to that of the IRC takeovers mentioned earlier.

What are we defending against?

If we lived in a world where everyone was honest and well-meaning, the lives of chat protocol designers worldwide would be made much easier. In such a world, we could just blindly accept whatever people said had happened to the room, from any server that sent us a message about it. We don’t live in that world, though; if you were to list the things that could go wrong, it might look like:

  1. A malicious actor could introduce their own server to a group chat, and manipulate its state in unwanted ways – like making themselves administrator without the consent of the prior administrators, or ‘resetting’ it to an earlier time when they weren’t banned, or something like that.
  2. A malicious actor could introduce their own server to a group chat, and start sending large volumes of spam into said group chat.
  3. A malicious server owner could take control of certain important user accounts on their server, like accounts that have administrator powers in big popular rooms, and use those to take control of the rooms by sending legitimate messages coming from that user.

These are all things that systems like Matrix, XMPP, IRC and others try to prevent. IRC doesn’t let people introduce their own untrusted servers into an IRC network, which makes things a lot easier; the others do allow for this possibility, meaning they now have to decide what information is trustworthy and what is not, using a varied set of methods to determine this.

A ‘middle way’

Through our earlier comparison, it becomes apparent that there’s a sort of spectrum of how much trust the protocol requires you to have in other people and their servers. XMPP is at one end, with “trust one server absolutely, and no others”; Matrix is at the other, with “individually verify the room state on each participating server, with a fancy algorithm that helps keep things in check”. With things laid out like this, is it not worth asking the question of whether our new chat standard can try an approach somewhere in the middle of those two?

How about trusting some servers – more than one, to provide a degree of redundancy, but not going as far as allowing any server to change the room’s state? I posit that, in practice, users are having to do this anyway in protocols like Matrix, simply because pretty much any protocol that exists today is vulnerable to problem (3) from the earlier list5. So, for each group chat, we could define a set of “sponsoring servers” that are responsible for upholding law and order in the chatroom.

The sponsoring server model

Rather like in XMPP, all traffic for the group chat has to go through one of the sponsoring servers. If you want to send a message, ban someone, change the topic, or do anything, you have to send your message through a sponsoring server first. These servers rebroadcast the message you’ve sent to other connected servers, after checking that you’re allowed to make that change given their current view of the world.

Other servers can now authenticate messages very easily: if the message was sent by a sponsoring server, it’s valid, because we trust the sponsoring servers to validate everything for us. If it wasn’t, it isn’t. Spam, therefore, can be curtailed; since servers won’t accept messages that don’t come from sponsoring servers, sponsoring servers can do whatever they want to make sure that messages aren’t spam before passing them on, like asking new users to fill out a CAPTCHA or whatever.
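
As a tiny illustration of that gatekeeping rule, here is a sketch with made-up server names and a deliberately simplified message shape; a real protocol would authenticate the relaying server cryptographically rather than trusting a field in the message.

# Leaf servers only accept traffic relayed by one of the room's sponsoring servers.
SPONSORING = {
    "#example-room": {"alpha.example.org", "beta.example.net"},
}

def accept(message):
    sponsors = SPONSORING.get(message["room"], set())
    return message["relayed_by"] in sponsors

print(accept({"room": "#example-room", "relayed_by": "alpha.example.org"}))  # True
print(accept({"room": "#example-room", "relayed_by": "evil.example.com"}))   # False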

What about distributed state?

The Paxos consensus protocol – actually something mentioned by Rachel Kroll in the cited blog post from earlier6 – is a tried and tested way to generate consensus among a network of servers, some of which may disconnect occasionally, go offline, or otherwise fail. It’s not guaranteed to always make progress, especially not when some of the servers are being evil (although a variant, called Byzantine Paxos, is supposed to fix this), but it’s reliable enough to be used in a number of prominent places, such as in various Google, IBM and Microsoft production services. Paxos is also not the only consensus protocol out there – the Raft consensus protocol is another, apparently known for being easier to implement7, which has similar properties. Here’s a cool Raft visualization!

As you can probably see, we can use Paxos or Raft to resolve state among our sponsoring servers, in a way that’s tolerant of these servers occasionally going down; as long as a majority remains (or whatever criteria the chosen consensus protocol specifies), the group chat will continue to operate in a safe manner. We trust all of our sponsoring servers, so we don’t have to worry about them trying to break the consensus protocol (in fact, we can even elect to use something like Byzantine Paxos which provides additional protections if we’re really paranoid).
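
To be clear about what the consensus layer buys us, here is a toy version of the quorum idea only (not an implementation of Paxos or Raft, which also have to handle leaders, log ordering, retries and so on): a state change counts as committed once a strict majority of the sponsoring servers has acknowledged it.

def committed(acks, sponsoring_servers):
    # True once a strict majority of the sponsoring servers has acknowledged.
    return len(set(acks) & set(sponsoring_servers)) > len(sponsoring_servers) // 2

sponsors = ["a.example", "b.example", "c.example"]
print(committed(["a.example", "b.example"], sponsors))  # True: 2 of 3
print(committed(["a.example"], sponsors))               # False: 1 of 3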

What about network partitions?

Both Paxos and Raft are equipped to deal with the problem of network partitions; depending on the exact nature of the problem, some sponsoring servers may be unable to update the state of the group chat while the partition is in effect, but all servers should reach an eventually consistent view of the state. Messages sent while partitioned can simply be queued, and resent once the network link reestablishes itself.

How do we figure out which servers should be sponsoring servers?

The rules will probably vary depending on the circumstances – for example, a private group chat where all members trust one another can have all participating servers be sponsoring servers. In contrast, large, public group chats, with potentially lots of untrusted servers taking part, can choose a small number of servers which they trust – servers which contain chat administrators would be a good target, seeing as a great deal of trust is placed in those servers anyway.

Here’s an example policy: if users are invited by a pre-existing administrator of the room, their server becomes a sponsoring server. If users ask to join a room (by talking to one of the existing sponsoring servers), their server does not become a sponsoring server. That way, new sponsoring servers are added organically; if you’re inviting someone, you presumably trust them and their server anyway. (However, as I said, this may not be the only way of doing things.)

Networking considerations

We can also go and look back at the networking improvements mentioned earlier, and fit these in to our sponsoring-server model. For one-to-one private conversations, it makes a certain degree of sense to just connect the two communicating servers directly – while this may be somewhat more resource-intensive than having some linked network which the messages could travel through, doing it any other way would raise privacy concerns, as well as issues of trust (how can you be sure that the 3rd-party server you’re relaying messages through is actually delivering them and sending you the responses, for example?).

However, group chats are now free to use one of the protocols mentioned earlier (e.g. Ethernet Spanning Tree Protocol) for routing messages, but only amongst the sponsoring servers. So-called ‘leaf’ servers (ones that aren’t sponsoring) must connect to a sponsoring server and communicate via it. This model makes the network slightly less painful than full-mesh: leaf servers only ever need to talk to sponsoring servers (making group chat involvement less of a pain for them), while sponsoring servers deal with their own batch of leaf servers, and route messages intelligently to other sponsoring servers.

Conclusions

This blog post proposes a new model for handling distributed state in chat protocols, which involves choosing a set of trusted “sponsoring servers” for each group chat. These servers are responsible for maintaining consensus about the state of the chat (through consensus algorithms like Paxos or Raft), and act as the ‘gatekeepers’ for new messages and state changes, providing a bunch of helpful functionality against spam and abuse.

I believe this new model is a sort of compromise solution that makes the protocol less complex, but still distributed and somewhat fault-tolerant, at the expense of requiring users to trust some servers/people not to be malicious. Of course, at this point, this is mostly speculation8; I’ll have to write some code and see how it actually works in the wild, though!


  1. Here’s where things get interesting; instead of blithely criticizing other people’s hard work, I’m actually having to do some myself… 

  2. A counterargument might be, as before, “we already have the Internet Protocol to do networking things, so why should we bother implementing more stuff on top of that?” 

  3. An open federation obviously needs to account for the fact that all servers might not behave. 

  4. The protocol linked in this paragraph is their new state resolution algorithm, which prevents all of the security issues that plagued the first iteration and that I ranted about originally. 

  5. Maybe something like Keybase Chat isn’t, but I’m pretty sure all the mainstream ones don’t have this property. And for good reason; doing your own public/private key management as a user is painful, or rather something that most users don’t seem to bother with at present. 

  6. This is how I first heard about it! 

  7. Citation: some random person on HN 

  8. If anyone who actually reads this blog and reckons they know stuff feels like chiming in in the comments, that would be much appreciated! 

October 07, 2019

Benaiah Mischenko (benaiah)

Configuring Go Apps with TOML October 07, 2019 08:10 PM

Configuring Go Apps with TOML

So you’ve been writing an application in Go, and you’re getting to the point where you have a lot of different options in your program. You’ll likely want a configuration file, as specifying every option on the command-line can get difficult and clunky, and launching applications from a desktop environment makes specifying options at launch even more difficult.

This post will cover configuring Go apps using a simple, INI-like configuration language called TOML, as well as some related difficulties and pitfalls.

TOML has quite a few implementations, including several libraries for Go. I particularly like BurntSushi’s TOML parser and decoder, as it lets you marshal a TOML file directly into a struct. This means your configuration can be fully typed and you can easily do custom conversions (such as parsing a time.Duration) as you read the config, so you don’t have to do them in the rest of your application.

Configuration location

The first question you should ask when adding config files to any app is "where should they go?". For tools that aren’t designed to be run as a service, as root, or under a custom user (in other words, most of them), you should be putting them in the user’s home directory, so they’re easily changed. A few notes:

  • Even if you currently have only one file, you should use a folder and put the config file within it. That way, if and when you do need other files there, you won’t have to clutter the user’s home directory or deal with loading config files that could be in two different locations (Emacs, for instance, supports both ~/.emacs.d/init.el and ~/.emacs for historical reasons, which ends up causing confusing problems when both exist).

  • You should name your configuration directory after your program.

  • You should typically prefix your config directory with a . (but see the final note for Linux, as configuration directories within XDG_CONFIG_HOME should not be so prefixed).

  • On most OSs, putting your configuration files in the user's “home” directory is typical. I recommend the go-homedir library, rather than the User.HomeDir field available in the stdlib's os/user. This is because os/user uses cgo, which, while useful in many situations, also causes a number of difficulties that can otherwise be avoided - most notably, cross-compilation is no longer simple, and the ease of deploying a static Go binary gets a number of caveats.

  • On Linux specifically, I strongly encourage that you do not put your configuration directory directly in the user’s home directory. Most commonly-used modern Linux distributions use the XDG Base Directory Specification from freedesktop.org, which specifies standard locations for various directories on an end-user Linux system. (Despite this, many applications don’t respect the standard and put their configurations directly in ~ anyway). By default, this is ~/.config/, but it can also be set with the XDG_CONFIG_HOME environment variable. Directories within this should not use a leading ., as the directory is already hidden by default.

The following function should get you the correct location for your config directory on all platforms (if there’s a platform with a specific convention for config locations which I’ve missed, I’d appreciate you letting me know so I can update the post - my email is at the bottom of the page).

import (
    "path/filepath"
    "os"
    "runtime"

    "github.com/mitchellh/go-homedir"
)

var configDirName = "example"

func GetDefaultConfigDir() (string, error) {
    var configDirLocation string

    homeDir, err := homedir.Dir()
    if err != nil {
        return "", err
    }

    switch runtime.GOOS {
    case "linux":
        // Use $XDG_CONFIG_HOME/example if XDG_CONFIG_HOME is set,
        // otherwise fall back to $HOME/.config/example
        xdgConfigHome := os.Getenv("XDG_CONFIG_HOME")
        if xdgConfigHome != "" {
            configDirLocation = filepath.Join(xdgConfigHome, configDirName)
        } else {
            configDirLocation = filepath.Join(homeDir, ".config", configDirName)
        }

    default:
        // On other platforms we just use $HOME/.example
        hiddenConfigDirName := "." + configDirName
        configDirLocation = filepath.Join(homeDir, hiddenConfigDirName)
    }

    return configDirLocation, nil
}

Within the config folder, you can use any filename you want for your config - I suggest config.toml.
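
As a rough sketch, you might wrap that in a small helper that joins the filename onto the directory and makes sure the directory exists (the GetDefaultConfigFile name is just an illustration, and this reuses the os and path/filepath imports from the snippet above):

func GetDefaultConfigFile() (string, error) {
    // Illustrative helper, not tied to any particular app: resolve the
    // config directory, ensure it exists, and return the full path to
    // config.toml inside it.
    configDir, err := GetDefaultConfigDir()
    if err != nil {
        return "", err
    }

    // 0700 keeps the directory private to the current user.
    if err := os.MkdirAll(configDir, 0700); err != nil {
        return "", err
    }

    return filepath.Join(configDir, "config.toml"), nil
}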

Loading the config file

To load a config file, you'll first want to define what config values you'll use. burntsushi/toml will ignore options in the TOML file that you don't use, so you don't have to worry about extra options causing errors. For instance, here's the proposed configuration for a project I'm maintaining, wuzz (the keybindings aren't currently implemented, but I've left them in for the sake of demonstration):

type Config struct {
    General GeneralOptions
    Keys    map[string]map[string]string
}

type GeneralOptions struct {
    FormatJSON             bool
    Insecure               bool
    PreserveScrollPosition bool
    DefaultURLScheme       string
}

It’s pretty simple. Note that we use a named struct for GeneralOptions, rather than making Config.General an anonymous struct. This makes nesting options simpler and aids tooling.

Loading the config is quite easy:

import (
    "errors"
    "os"
    
    "github.com/BurntSush/toml"
)

func LoadConfig(configFile string) (*Config, error) {
    if _, err := os.Stat(configFile); os.IsNotExist(err) {
        return nil, errors.New("Config file does not exist.")
    } else if err != nil {
        return nil, err
    }
    
    var conf Config
    if _, err := toml.DecodeFile(configFile, &conf); err != nil {
        return nil, err
    }

    return &conf, nil
}

toml.DecodeFile will automatically populate conf with the values set in the TOML file. (Note that we pass &conf to toml.DecodeFile, not conf - we need to populate the struct we actually have, not a copy). Given the above Config type and the following TOML file…

[general]
defaultURLScheme = "https"
formatJSON = true
preserveScrollPosition = true
insecure = false

[keys]

  [keys.general]
  "C-j" = "next-view"
  "C-k" = "previous-view"
  
  [keys.response-view]
  "<down>" = "scroll-down"

…we’ll get a Config like the following:

Config{
    General: GeneralOptions{
        DefaultURLScheme:       "https",
        FormatJSON:             true,
        PreserveScrollPosition: true,
        Insecure:               false,
    },
    Keys: map[string]map[string]string{
        "general": map[string]string{
            "C-j": "next-view",
            "C-k": "previous-view",
        },
        "response-view": map[string]string{
            "<down>": "scroll-down",
        },
    },
}

Automatically decoding values

wuzz actually uses another value in its config - a default HTTP timeout. In this case, though, there’s no native TOML value that cleanly maps to the type we want - a time.Duration. Fortunately, the TOML library we’re using supports automatically decoding TOML values into custom Go values. To do so, we’ll need a type that wraps time.Duration:

type Duration struct {
    time.Duration
}

Next we’ll need to add an UnmarshalText method, so we satisfy the toml.TextUnmarshaler interface. This will let toml know that we expect a string value which will be passed into our UnmarshalText method.

func (d *Duration) UnmarshalText(text []byte) error {
    var err error
    d.Duration, err = time.ParseDuration(string(text))
    return err
}

Finally, we’ll need to add it to our Config type. This will go in Config.General, so we’ll add it to GeneralOptions:

type GeneralOptions struct {
    Timeout                Duration
    // ...
}

Now we can add it to our TOML file, and toml.DecodeFile will automatically populate our struct with a Duration value!

Input:

[general]
timeout = "1m"
# ...

Equivalent output:

Config{
    General: GeneralOptions{
        Timeout: Duration{
            Duration: 1 * time.Minute
        },
        // ...
    }
}

Default config values

We now have configuration loading, and we're even decoding a text field to a custom Go type - we're nearly finished! Next we'll want to specify defaults for the configuration. We want values specified in the config to override our defaults. Fortunately, toml makes this really easy to do.

Remember how we passed in &conf to toml.DecodeFile? That was an empty Config struct - but we can also pass one with its values pre-populated. toml.DecodeFile will set any values that exist in the TOML file, and ignore the rest. First we’ll create the default values:

import (
    "time"
)
var DefaultConfig = Config{
    General: GeneralOptions{
        DefaultURLScheme:       "https",
        FormatJSON:             true,
        Insecure:               false,
        PreserveScrollPosition: true,
        Timeout: Duration{
            Duration: 1 * time.Minute,
        },
    },
    // You can omit stuff from the default config if you'd like - in
    // this case we don't specify Config.Keys
}

Next, we simply modify the LoadConfig function to use DefaultConfig:

func LoadConfig(configFile string) (*Config, error) {
    if _, err := os.Stat(configFile); os.IsNotExist(err) {
        return nil, errors.New("Config file does not exist.")
    } else if err != nil {
        return nil, err
    }

    conf := DefaultConfig
    if _, err := toml.DecodeFile(configFile, &conf); err != nil {
        return nil, err
    }

    return &conf, nil
}

The important line here is conf := DefaultConfig - now when conf is passed to toml.DecodeFile it will populate that.
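
To sketch how these pieces might fit together in an application's entry point: the import path below is a placeholder, and falling back to DefaultConfig when loading fails is just one possible policy, not something LoadConfig does on its own.

package main

import (
    "fmt"
    "log"
    "path/filepath"

    "example.com/yourapp/config" // placeholder path for the config package above
)

func main() {
    configDir, err := config.GetDefaultConfigDir()
    if err != nil {
        log.Fatal(err)
    }

    conf, err := config.LoadConfig(filepath.Join(configDir, "config.toml"))
    if err != nil {
        // If the config file is missing or unreadable, fall back to the defaults.
        defaults := config.DefaultConfig
        conf = &defaults
    }

    fmt.Printf("timeout: %s\n", conf.General.Timeout.Duration)
}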

Summary

I hope this post helped you! You should now be able to configure Go apps using TOML with ease.

If this post was helpful to you, or you have comments or corrections, please let me know! My email address is at the bottom of the page. I’m also looking for work at the moment, so feel free to get in touch if you’re looking for developers.

Complete code

package config

import (
    "errors"
    "path/filepath"
    "os"
    "runtime"
    "time"

    "github.com/BurntSushi/toml"
    "github.com/mitchellh/go-homedir"
)

var configDirName = "example"

func GetDefaultConfigDir() (string, error) {
    var configDirLocation string

    homeDir, err := homedir.Dir()
    if err != nil {
        return "", err
    }

    switch runtime.GOOS {
    case "linux":
        // Use $XDG_CONFIG_HOME/example if XDG_CONFIG_HOME is set,
        // otherwise fall back to $HOME/.config/example
        xdgConfigHome := os.Getenv("XDG_CONFIG_HOME")
        if xdgConfigHome != "" {
            configDirLocation = filepath.Join(xdgConfigHome, configDirName)
        } else {
            configDirLocation = filepath.Join(homeDir, ".config", configDirName)
        }

    default:
        // On other platforms we just use $HOME/.example
        hiddenConfigDirName := "." + configDirName
        configDirLocation = filepath.Join(homeDir, hiddenConfigDirName)
    }

    return configDirLocation, nil
}

type Config struct {
    General GeneralOptions
    Keys    map[string]map[string]string
}

type GeneralOptions struct {
    DefaultURLScheme       string
    FormatJSON             bool
    Insecure               bool
    PreserveScrollPosition bool
    Timeout                Duration
}

type Duration struct {
    time.Duration
}

func (d *Duration) UnmarshalText(text []byte) error {
    var err error
    d.Duration, err = time.ParseDuration(string(text))
    return err
}

var DefaultConfig = Config{
    General: GeneralOptions{
        DefaultURLScheme:       "https",
        FormatJSON:             true,
        Insecure:               false,
        PreserveScrollPosition: true,
        Timeout: Duration{
            Duration: 1 * time.Minute,
        },
    },
}

func LoadConfig(configFile string) (*Config, error) {
    if _, err := os.Stat(configFile); os.IsNotExist(err) {
        return nil, errors.New("Config file does not exist.")
    } else if err != nil {
        return nil, err
    }

    conf := DefaultConfig
    if _, err := toml.DecodeFile(configFile, &conf); err != nil {
        return nil, err
    }

    return &conf, nil
}

If you’d like to leave a comment, please email benaiah@mischenko.com

Andrew Owen (yumaikas)

What 8 years of side projects has taught me October 07, 2019 08:10 PM

I’ve been a professional software developer for almost 8 years now. I’ve been paid to write a lot of software in those years. Far more interesting to me has been the recurring themes that have come up in my side-projects and in the software I’ve been personally compelled to write.

Lesson 0: Programming in a void is worthless

When I wanted to learn programming, I always had to come to the keyboard with a purpose. I couldn't just sit down and start writing code; I had to have an idea of where I was going.

For that reason, I've always used side-projects as a means of learning programming languages. When I wanted to learn QBasic, I worked on a number of games: one set in space, another a fantasy game. When I wanted to learn Envelop Basic, I attempted a Yahtzee clone, a Space Invaders clone, a hotel-running game, and a MicroGame based on the ones I'd seen videos of in WarioWare DIY.

When I wanted to learn C#, I went through a book, but I also, at the same time, worked on building a Scientific Calculator. (put a pin in that idea). When I wanted to learn Go, I wrote the CMS for the blog you’re reading right now. To pick up Lua, I used Love2D in several game jams. The only reason I have more than a passing familiarity with Erlang is because I used it for idea.junglecoder.com

Most times, when I’ve tried to learn a programming technology without a concrete goal to get something built, it is hard for me to maintain interest. That hasn’t kept me from trying out a lot of things in the past, but it’s the ones that allowed me to build useful or interesting things that have stuck with me the most. Right now, for side-projects, that list includes Go, Lua, Tcl, and Bash.

Lesson 1: Building a programming language is hard, but rewarding

Ever since I first started cutting my teeth on C#, the ideas of parsing have held a certain fascination for me. Like I said before, I started out wanting to write a scientific calculator. But, because I was a new programmer with no idea of what building a scientific calculator should look like, I did a lot of inventing things from first principles. It felt like a divine revelation when I reasoned out an add/multiply algorithm for parsing numbers. It also took me the better part of two weeks to puzzle it out.

I was so proud of that, in fact, that I copied that code into one of my work projects, a fact which really amuses me now that I know about the existence of Regex and Int.Parse().

Eventually, I worked out a very basic notion of doing recursive descent parsing, and some tree evaluation, so that, on a good day, I had a basic, but working, math expression evaluator.

Working on that calculator, however, set me on a course of wanting to understand how programming languages worked. In the process of wanting to understand them, I've looked over papers on compilers more than once, but never quite had the patience to actually write one out. In the process of wanting to make a programming language, I ended up writing two before PISC. One was an “I want to write something in a night that is Turing complete” language that was basically a bastard version of assembly. At the time, I called it SpearVM. I had intended it to be a compilation target for some higher-level language, but it mostly just served as a stepping stone for the next two projects.

The second one was a semester-long moonshot project where I wanted to try to make a visual programming language, using either Java Swing or JavaFX, inspired by Google's Blockly environment. Unfortunately, I could not figure out nesting, so I ended up giving the ideas I'd had in SpearVM a visual representation and using that for my class assignment.

The combination of all of these experiences, and discovering the Factor language, set me thinking about trying to build a programming language that was stack-based, especially since parsing it seemed a far easier task than what I’d been trying to do until then. A couple late nights later, and I’d built out a prototype in Go.

I've had a number of co-workers impressed that I've written a scripting language. Thing is, it took me something like 7 false starts to find a way to do it that made sense to me (and that was a stack-based language with almost 0 lexing). It's only now, on the other side of that learning experience, that I'd feel comfortable approaching a language with C-like syntax. PISC, as I've had time to work on it, has actually started to develop more things like parsing, lexing, and even compiling. In fact, I've got a small prototype of a language called Tinscript that isn't nearly so post-fix oriented as PISC, though it's still stack based.

And PISC, to boot, is still what I'd consider easy mode when it comes to developing a programming language. Factor, Poprc, or even a run-of-the-mill C-like language all strike me as projects that take more tenacity to pull off.

Lesson 2: Organizing my thoughts is important, but tricky to figure out

If the early years of my programming side-projects often focused on how to build programming languages, and how to better understand computers, the more recent years have had a much stronger focus on how to harness computers to augment my mind. A big focus here was for me to find ways to reduce the working set I needed in my mind at any given time. This has resulted in no less than 7 different systems for trying to help keep track of things.

  • A TCL application for launching programs I used on a regular basis.
  • A C# application for the same, but with a few more bells and whistles
  • Trying out Trello
  • Trying out various online outliners, ultimately being satisfied with none of them.
  • A months-long foray into trying to learn and apply Org-mode and Magit in Emacs, and ultimately giving up due to slowness on Windows, and the fact that my org-files kept getting too messy.
  • ideas.junglecoder.com, a place meant for me to shunt my stray thoughts to get them off my mind during the work day.
  • Another TCL application called PasteKit, which was designed to help me juggle all of the 4-6 digit numbers I was juggling at one of my jobs.
  • A C# version of PasteKit, that also had a customizable list of launchers
  • .jumplist.sh
  • Bashmarks, but for CMD.exe

These are all approaches I’ve invested non-trivial amounts of time into over the past three years, trying to figure out a way to organize my thoughts as a software developer, but none of them lasted much longer than a month or so.

All of this came to a head during Thanksgiving weekend of 2018. My work at Greenshades often involved diving deep into tickets and opening a lot of SQL scripts in SSMS, and I had found no good way to organize them all. So, in a move that felt rather desperate at the time, I wrote a C# program that was a simple journal, but one that had a persistent search bar, and stored all of its entries in a SQLite database. And I used a simple tagging scheme for the entries, of marking them with things like @ticket65334, and displaying the most recent 5 notes.

It was finally a system that seemed to actually work for how I liked to think about things. The UI was a fairly simple 3-column layout. In the leftmost column, I had a “scratchpad” where I kept daily notes of what I’d worked on, in the middle I had my draft pad, and on the right I had the feed of notes, based on the search I’d done. I also had a separate screen dedicated to searching through all the notes that had previously been recorded.

There were several benefits to how this system worked:

  • It allowed me to forget things by putting notes under different tags. That meant they wouldn't show up on my focused feed, but that I could get back to them later.
  • It allowed me to regain context much more easily after getting interrupted.
  • It gave me a virtual rubber duck. Since I often was trying to figure out issues by writing them to my teammates anyway, the journal gave me a very good first port of call when my stream of consciousness got blocked by an obstacle. This helped dramatically with keeping me off distracting websites like Hacker News or Reddit.
  • It allowed old information to fall out of relevance. One of the biggest problems with all the various tracking systems I'd used before, especially Trello and Org-mode, is that as the system filled up, it was hard for old items to fall out of relevance without also becoming just a bit harder to access. Due to the nature of the feed, this system made it much more natural for information to fall off. And if I wanted something to stick around, I could just copy the relevant bits to a new note, which I often do.

All of this added up to me feeling like I’d found a missing piece of my mind. Almost like I’d created a REPL for my thought process.

Unfortunately, I don’t have that C# version any more. I do have a Go/Lua version, which is webapp based, though I still need to put some time into making the feedback-loop for it tighter, since my first versions weren’t quite as tightly focused on that, as much as they were focused on replicating the UI layout of the C# version. I’d argue that the tight feedback loop that the C# version had would be more important now, and I’ve slowly been working on adding it back.

The nice thing about the Go/Lua journal is that it’s far more flexible than the C# version, due to being able to write pages in Lua. Which means I’ll be able to

Lesson 3: Search is a great tool for debugging and flexible organization

Exhaustive string search of both code and notes has proven to be a surprisingly effective tool for understanding and cataloging large systems for me. To this end, Gills (my journaling software), Everything Search (search over the paths and file names on your laptop) and RipGrep have been extremely handy tools to have on hand. The nice thing about search as a tool is that it can be adapted into other things quite nicely. In fact, I would argue that fast search, both via Google, and via the tools I’d mentioned above, is one of the more influential changes we’ve seen in programming in the last 20 years.

Coda: Sticky ideas

8 years is a long time, and there are a lot more ideas that I'd like to get into later. However, these are the ideas and things I've worked on that have proven to be surprisingly sticky. Perhaps they might help you, or give you some ideas of where to focus.

Published September 9th, 2019

Jan van den Berg (j11g)

Iedere dag vrij – Bob Crébas October 07, 2019 08:00 PM

I remember exactly where I was when, in 2004, I heard that Dutch ad site marktplaats.nl was sold to eBay for a staggering 224.5 million euros. A polder Cinderella story.

This success was, however, no accident. Of course, luck was involved, but this is true of all successful businesses. A few years after this deal Bob Crébas (don't forget the acute accent) wrote down his experiences that led to this deal. And this has resulted in a very fun autobiography.

Iedere dag vrij (Every day off) – Bob Crébas (2006) – 238 pages

Bob

A former, staunchly anti-nuclear-energy, jobless and musically inclined hippie from a farming family, who was not particularly concerned with appearance and image, first grew a thrift-store chain into a multi-million euro business before deciding to jump on the internet bandwagon. And then he defied several odds (there were *many* competitors) before striking gold with this internet thing.

I read this book in one sitting: it is an absolutely fun, well-written and energizing story. And the internet deal (my primary interest) is only a small part of this story. Which is proof that this is a balanced story and that there is more to the writer than just this one deal.

I particularly liked how he weaved the entrepreneurial and pioneering spirit of the new land and the zeitgeist of the 60s and 70s into this story. And of course, the bands he played in are absolutely fantastic fun to read about.

Bob comes off as an interesting character and this autobiography seems to be the work of someone who is able to define and articulate his life philosophy succinctly: it's about being, not having, and it's about creating and giving, not taking.

The post Iedere dag vrij – Bob Crébas appeared first on Jan van den Berg.

A Moveable Feast – Ernest Hemingway October 07, 2019 07:49 PM

Hemingway, the writers’ writer, is famously known for having spent his early years in Paris. Freshly married, this struggling and then unknown writer was honing his craft and subsequently defining what it means to be a writer in a vibrant post World War I Paris. Where he wrote his first big novel.

A Moveable Feast – Ernest Hemingway (1964) – 192 pages

In later life Hemingway wrote up his five-year experience in a couple of loosely related stories. Which involve interactions with other writers (mainly Scott Fitzgerald) and poets. And which offer very specific details (drinks, prices, addresses etc.). Which is almost odd, since the stories were written some forty years later. These stories were posthumously bundled and released as this memoir.

This memoir offers perfect insight into understanding Hemingway and his writing better. Blunt, sparse, dead-serious (to the point of being humourless even), and without pretense, A Moveable Feast is quintessential Hemingway and a must-read for anyone who wants to better understand him.

The post A Moveable Feast – Ernest Hemingway appeared first on Jan van den Berg.

Pete Corey (petecorey)

Generating Guitar Chords with Cartesian Products October 07, 2019 12:00 AM

Given two or more lists, like [1, 2] and [3, 4], the Cartesian product of those lists contains all ordered combinations of the elements within those lists: [1, 3], [1, 4], [2, 3], and [2, 4]. This may not seem like much, but Cartesian products are an algorithmic superpower. Maybe it's J's subtle influence over my programming style, but I find myself reaching more and more for Cartesian products in the algorithms I write, and I'm constantly awed by the simplicity and clarity they bring to my solutions.

As an example of how useful they can be, let’s look at the problem of generating all possible guitar chord voicings, like I do in Glorious Voice Leader. As a quick aside, if you want to know more about Glorious Voice Leader, check out last week’s post!

Imagine we’re trying to generate all possible C major chord voicings across a guitar’s fretboard. That is, we’re trying to find all playable combinations of the notes C, E, and G. How would we do this?

One approach, as you’ve probably guessed, is to use Cartesian products!

Let’s assume that we have a function, findNoteOnFretboard, that gives us all the locations (zero-based string/fret pairs) of a given note across the fretboard. For example, if we pass it a C (0 for our purposes), we’ll receive an array of string/fret pairs pointing to every C note on the fretboard:


[[0,8],[1,3],[1,15],[2,10],[3,5],[3,17],[4,1],[4,13],[5,8]]

Plotted on an actual guitar fretboard, we’d see all of our C notes exactly where we’d expect them to be:

Now imagine we’ve done this for each of our notes, C, E, and G:


let cs = findNoteOnFretboard(frets, strings, tuning)(0);
let es = findNoteOnFretboard(frets, strings, tuning)(4);
let gs = findNoteOnFretboard(frets, strings, tuning)(7);

The set of all possible voicings of our C major chord, or voicings that contain one of each of our C, E, and G notes, is just the Cartesian product of our cs, es, and gs lists!


let voicings = _.product(cs, es, gs);

We're using lodash.product here, rather than going through the process of writing our own Cartesian product generator.

We can even generalize this to any given array of notes, and wrap it up in a function:


const voicings = (
  notes,
  tuning = [40, 45, 50, 55, 59, 64],
  frets = 18,
  strings = _.size(tuning)
) =>
  _.chain(notes)
    .map(findNoteOnFretboard(frets, strings, tuning))
    .thru(notesOnFretboard => _.product(...notesOnFretboard))
    .value();

Finding Notes on the Fretboard

So that’s great and all, but how do we implement our findNoteOnFretboard function? With Cartesian products, of course! We’ll generate a list of every string and fret position on the fretboard by computing the Cartesian product of each of our possible string and fret values:


const findNoteOnFretboard = (frets, strings, tuning) => note =>
  _.chain(_.product(_.range(strings), _.range(frets)))
    .value();

Next, we’ll need to filter down to just the string/fret pairs that point to the specified note:


const isNote = (note, tuning) => ([string, fret]) =>
  (tuning[string] + fret) % 12 === note;

const findNoteOnFretboard = (frets, strings, tuning) => note =>
  _.chain(_.product(_.range(strings), _.range(frets)))
    .filter(isNote(note, tuning))
    .value();

The isNote helper function returns whether the note at the given string/fret is the note we’re looking for, regardless of octave.

Filtering Out Doubled Strings

Currently, our chord voicing generator looks like this:


const isNote = (note, tuning) => ([string, fret]) =>
  (tuning[string] + fret) % 12 === note;

const findNoteOnFretboard = (frets, strings, tuning) => note =>
  _.chain(_.product(_.range(strings), _.range(frets)))
  .filter(isNote(note, tuning))
  .value();

const voicings = (
  notes,
  tuning = [40, 45, 50, 55, 59, 64],
  frets = 18,
  strings = _.size(tuning)
) =>
  _.chain(notes)
    .map(findNoteOnFretboard(frets, strings, tuning))
    .thru(notesOnFretboard => _.product(...notesOnFretboard))
    .value();

Not bad. We’ve managed to generate all possible voicings for a given chord in less than twenty lines of code! Unfortunately, we have a problem. Our solution generates impossible voicings!

The first problem is that it can generate voicings with two notes on the same string:

On a stringed instrument like the guitar, it’s impossible to sound both the C and E notes simultaneously. We’ll need to reject these voicings by looking for voicings with “doubled strings”. That is, voicings with two or more notes played on the same string:


const voicings = (
  notes,
  tuning = [40, 45, 50, 55, 59, 64],
  frets = 18,
  strings = _.size(tuning)
) =>
  _.chain(notes)
    .map(findNoteOnFretboard(frets, strings, tuning))
    .thru(notesOnFretboard => _.product(...notesOnFretboard))
    .reject(hasDoubledStrings)
    .value();

Our hasDoubledStrings helper simply checks if the size of the original voicing doesn’t match the size of our voicing after removing duplicated strings:


const hasDoubledStrings = chord =>
  _.size(chord) !==
  _.chain(chord)
    .map(_.first)
    .uniq()
    .size()
    .value();

Filtering Out Impossible Stretches

Unfortunately, our solution has one last problem. It can generate chords that are simply too spread out for any human to play. Imagine trying to stretch your hand enough to play this monster of a voicing:

No good. We’ll need to reject these voicings that have an unplayable stretch:


const voicings = (
  notes,
  tuning = [40, 45, 50, 55, 59, 64],
  frets = 18,
  maxStretch = 5,
  strings = _.size(tuning)
) =>
  _.chain(notes)
    .map(findNoteOnFretboard(frets, strings, tuning))
    .thru(notesOnFretboard => _.product(...notesOnFretboard))
    .reject(hasDoubledStrings)
    .reject(hasUnplayableStretch(maxStretch))
    .value();

Let’s keep things simple for now and assume that an “unplayable stretch” is anything over five frets in distance from one note in the voicing to another.


const hasUnplayableStretch = maxStretch => chord => {
  let [, min] = _.minBy(chord, ([string, fret]) => fret);
  let [, max] = _.maxBy(chord, ([string, fret]) => fret);
  return max - min > maxStretch;
};

Expansion and Contraction

Our voicings function now generates all possible voicings for any given set of notes. A nice way of visualizing all of these voicings on the fretboard is with a heat map. Here are all of the C major voicings we’ve generated with our new Cartesian product powered voicings function:

The darker the fret, the more frequently that fret is used in the set of possible voicings. Click any fret to narrow down the set of voicings.

The Cartesian product, at least in the context of algorithms, embodies the idea of expansion and contraction. I’ve found that over-generating possible results, and culling out impossibilities leads to incredibly clear and concise solutions.

Be sure to add the Cartesian product to your programming tool box!


October 06, 2019

Derek Jones (derek-jones)

Cost ratio for bespoke hardware+software October 06, 2019 09:32 PM

What percentage of the budget for a bespoke hardware/software system is spent on software, compared to hardware?

The plot below has become synonymous with this question (without the red line, which highlights 1973), and is often used to claim that software costs are many times more than hardware costs.

USAF bespoke hardware/Software cost ratio from 1955 to 1980.

The paper containing this plot was published in 1973 (the original source is a Rome period report), and is an extrapolation of data I assume was available in 1973, into what was then the future. The software and hardware costs are for bespoke command and control systems delivered to the U.S. Air Force, not commercial off-the-shelf solutions or even bespoke commercial systems.

Does bespoke software cost many times more than the hardware it runs on?

I don’t have any data that might be used to answer this questions, to any worthwhile degree of accuracy. I know of situations where I believe the bespoke software did cost a lot more than the hardware, and I know of some where the hardware cost more (I have never been privy to exact numbers on large projects).

Where did the pre-1973 data come from?

The USAF funded the creation of lots of source code, and the reports cite hardware and software figures from 1972.

To summarise: the above plot is for USAF spending on bespoke command and control hardware and software, and is extrapolated from 1973 into the future.

Bogdan Popa (bogdan)

Announcing redis-rkt October 06, 2019 04:00 PM

Another Racket thing! redis-rkt is a new Redis client for Racket that I've been working on these past few weeks. Compared to the existing redis and rackdis packages, it:

  • is fully documented,
  • is safer due to strict use of contracts,
  • is faster,
  • supports more commands, and
  • its API tries to be idiomatic, rather than being just a thin wrapper around Redis commands.

Check it out!

Ponylang (SeanTAllen)

Last Week in Pony - October 6, 2019 October 06, 2019 03:57 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, or our Zulip community.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Carlos Fenollosa (carlesfe)

October 05, 2019

Bit Cannon (wezm)

Ryzen 9 SFF PC October 05, 2019 11:11 PM

I built this machine for work use. I started a new job in March 2019 that involves working with two compiled languages: Mercury and Rust. I wanted a machine with lots of cores/threads to keep the edit-compile-test cycle as short as possible.

Specifications

  • CPU: AMD Ryzen 9 3900X (Base: 3.80GHz, Boost: 4.60GHz, Cores: 12, Threads: 24)
  • CPU Cooling: Noctua NH-U9S
  • Case Cooling: Noctua NF-A14-PWM
  • Motherboard: Gigabyte GA-X570 I Aorus Pro WiFi
  • Memory: Corsair Vengeance LPX 16GB (2x8GB), PC4-25600 (3200MHz) DDR4, 16-18-18-36
  • Storage: Samsung 500GB SSD, 970 EVO Plus, M.2 NVMe
  • Case: Streacom DA2
  • Power Supply: Corsair 450W SF450 High Performance SFX
  • Graphics Card: Gigabyte Radeon RX 560 16CU Gaming OC
  • Display: Dell P2415Q 23.8-inch 4K (3840 × 2160) LCD

José Padilla (jpadilla)

Richard Kallos (rkallos)

Presentation: Hybrid Logical Clocks @ PWLMTL October 05, 2019 03:22 PM

This past Thursday (2019-10-03), I presented this paper at Papers We Love Montreal. I had a really fun time!

Here are my slides.

October 03, 2019

Chris Double (doublec)

Defining Types in Shen October 03, 2019 05:00 AM

The Shen programming language has an extensible type system. Types are defined using sequent calculus and the system is powerful enough to create a variety of exotic types but it can be difficult when first starting with Shen to know how to use that power. In this post I hope to go through some basic examples of defining types in Shen without needing to know too much sequent calculus details.

For an overview of Shen there is Shen in 15 minutes and the Shen OS Kernel Manual. An interactive JavaScript REPL exists to try examples in the browser or pick on one of the existing Shen language ports. For these examples I'm using my Wasp Lisp port of Shen.

Shen is optionally typed. The type checker can be turned off and on. By default it is off and this can be seen in the Shen prompt by the presence of a '-' character:

(0-) ...shen code...

Turning type checking on is done by using (tc +). The '-' in the prompt changes to a '+' to show type checking is active. It can be turned off again with (tc -):

(0-) (tc +)
(1+) (tc -)
(2-) ...

Types in Shen are defined using datatype. The body of the datatype definition contains a series of sequent calculus rules. These rules define how an object in Shen can be proved to belong to a particular type. Rather than go through a detailed description of sequent calculus, I'm going to present common examples of types in Shen to learn by example and dive into details as needed. There's the Shen Language Book for much more detail if needed.

Records

One way of storing collections of data in Shen is to use lists or vectors. For example, given the concept of a 'person' that has a name and age, this can be stored in a list with functions to get the relevant data:

(tc -)

(define make-person
  Name Age -> [Name Age])

(define get-name
  [Name Age] -> Name)

(define get-age
  [Name Age] -> Age)

(get-age (make-person "Person1" 42))
 => 42

In the typed subset of Shen we can define a type for this person object using datatype:

(datatype person
  N : string; A : number;
  _______________________
  [N A] : person;)

This defines one sequent calculus rule. The way to read it is starting with the code below the underscore line, followed by the code above it. In this case the rule states that if an expression matching the pattern [N A] is encountered, where N is a string and A is a number, then type that expression as person. With that rule defined, we can ask Shen if lists are of the type person:

(0+) ["Person" 42] : person
["Person1" 42] : person

(1+) ["Person1" "Person1"] : person
[error shen "type error"]

(2+) ["Person 42"]
["Person1" 42] : person

Given this person type, we might write a get-age function that is typed such that it only works on person objects as follows (The { ...} syntax in function definitions provides the expected type of the function):

(define get-age
  { person --> number }
  [N A] -> A)
[error shen "type error in rule 1 of get-age"]

Shen rejects this definition as not being type safe. The reason for this is because our datatype definition only states that [N A] is a person if N is a string and A is a number. It does not state that a person object is constructed only of a string and number. For example, we could have an additional definition as follows:

(datatype person2
  N : string; A : string;
  _______________________
  [N A] : person;)

Now we can create different types of person objects:

(0+) ["Person" 42 ] : person
["Person" 42] : person

(1+) ["Person" "young"] : person
["Person" "young"] : person

get-age is obviously badly typed in the presence of this additional type of person, which is why Shen rejected it originally. To resolve this we need to tell Shen that an [N A] is a person if and only if N is a string and A is a number. This is done with what is called a 'left rule'. Such a rule defines how a person object can be deconstructed. It looks like this:

(datatype person3
  N : string, A: number >> P;
  ___________________________
  [N A] : person >> P;)

The way to read this type of rule is that, if [N A] is a person then N is a string and A is a number. With that loaded into Shen, get-age type checks:

(define get-age
   { person --> number }
   [N A] -> A)
get-age : (person --> number)

(0+) (get-age ["Person" 42])
42 : number

The need to create a left rule, dual to the right rule, is common enough that Shen has a short method of defining both in one definition. It looks like this - note the use of '=' instead of '_' in the separator line:

(datatype person
   N : string; A : number;
   =======================
   [N A] : person;)

(define get-age
   { person --> number }
   [N A] -> A)
get-age : (person --> number)

(0+) (get-age ["Person" 42])
42 : number

The above datatype is equivalent to declaring the two rules:

(datatype person
  N : string; A : number;
  _______________________
  [N A] : person;

  N : string, A: number >> P;
  ___________________________
  [N A] : person >> P;)

Controlling type checking

When programming at the REPL of Shen it's common to create datatype definitions that are no longer needed, or that are part of a line of thought you don't want to pursue. Shen provides ways of excluding or including rules in the typechecker as needed. When defining a set of rules in a datatype, that datatype is given a name:

(datatype this-is-the-name
   ...
)

The rules within that definition can be removed from selection by the typechecker using preclude, which takes a list of datatype names to ignore during type checking:

(preclude [this-is-the-name])

To re-add a dataype, use include:

(include [this-is-the-name])

There is also include-all-but and preclude-all-but to include or remove all but the listed names. These commands are useful for removing definitions you no longer want to use at the REPL, but also for speeding up type checking in a given file if you know the file only uses a particular set of datatypes.

Enumerations

An example of an enumeration type would be days of the week. In an ML style language this can be done like:

datatype days =   monday | tuesday | wednesday
                | thursday | friday | saturday | sunday

In Shen this would be done using multiple sequent calculus rules.

(datatype days
    ____________
    monday : day;

    ____________
    tuesday : day;

    ____________
    wednesday : day;

    ____________
    thursday : day;

    ____________
    friday : day;

    ____________
    saturday : day;

    ____________
    sunday : day;)

Here there are no rules above the dashed underscore line, meaning that the given symbol is of the type day. A function that uses this type would look like:

(define day-number
  { day --> number }
  monday    -> 0
  tuesday   -> 1
  wednesday -> 2
  thursday  -> 3
  friday    -> 4
  saturday  -> 5
  sunday    -> 6)

It's quite verbose to define a number of enumeration types like this. It's possible to add a test above the dashed underline which allows being more concise. The test is introduced using if:

(datatype days
  if (element? Day [monday tuesday wednesday thursday friday saturday sunday])
  ____________________________________________________________________________
  Day : day;)

(0+) monday : day
monday : day

Any Shen code can be used in these test conditions. Multiple tests can be combined:

(datatype more-tests
  if (number? X)
  if (>= X 5)
  if (<= X 10)
  ___________
  X : between-5-and-10;)

  (2+) 5 : between-5-and-10
  5 : between-5-and-10

  (3+) 4 : between-5-and-10
  [error shen "type error\n"]

Polymorphic types

To create types that are polymorphic (ie. generic), like the built-in list type, include a free variable representing the type. For example, something like the built in list where the list elements are stored as pairs can be approximated with:

(datatype my-list
   _____________________
   my-nil : (my-list A);

   X : A; Y : (my-list A);
   ========================
   (@p X Y) : (my-list A);)


(define my-cons
  { A --> (my-list A) --> (my-list A) }
  X Y -> (@p X Y))

(0+) (my-cons 1 my-nil)
(@p 1 my-nil) : (my-list number)

(1+) (my-cons 1 (my-cons 2 my-nil))
(@p 1 (@p 2 my-nil)) : (my-list number)

(2+) (my-cons "a" (my-cons "b" my-nil))
(@p "a" (@p "b" my-nil)) : (my-list string)

Notice the use of the '=====' rule to combine left and right rules. This is required to enable writing something like my-car which requires proving that the type of the car of the list is of type A:

(define my-car
   { (my-list A) --> A }
   (@p X Y) -> X)

List encoded with size

Using peano numbers we can create a list where the length of the list is part of the type:

(datatype list-n
  ______
  [] : (list-n zero A);

  X : A; Y : (list-n N A);
  ================================
  [ X | Y ] : (list-n (succ N) A);)      

(define my-tail
  { (list-n (succ N) A) --> (list-n N A) }
  [Hd | Tl] -> Tl)

(define my-head
  { (list-n (succ N) A) --> A }
  [Hd | Tl] -> Hd)

This gives a typesafe head and tail operation whereby they can't be called on an empty list:

(0+) [] : (list-n zero number)
[] : (list-n zero number)

(1+) [1] : (list-n (succ zero) number)
[1] : (list-n (succ zero) number)

(2+) (my-head [])
[error shen "type error\n"]

(3+) (my-head [1])
1 : number

(4+) (my-tail [1 2 3])
[2 3] : (list-n (succ (succ zero)) number)

(5+) (my-tail [])
[error shen "type error\n"]      

Power and Responsibility

Shen gives a lot of power in creating types, but trusts you to make those types consistent. For example, the following creates an inconsistent type:

(datatype person
  N : string; A : number;
  _______________________
  [N A] : person;

  N : string; A : string;
  _______________________
  [N A] : person;

  N : string, A: number >> P;
  ___________________________
  [N A] : person >> P;)

Here we are telling Shen that a string and a number in a list is a person, and so too is a string and another string. But the third rule states that, given a person, it is composed of a string and a number only. This leads to:

(0+) (get-age ["Person" "Person"])
...

This will hang for a long time as Shen attempts to resolve the error we've created.

Conclusion

Shen provides a programmable type system, but the responsibility lies with the programmer for making sure the types are consistent. The examples given here provide a brief overview. For much more see The Book of Shen. The Shen OS Kernel Manual also gives some examples. There are posts on the Shen Mailing List that have more advanced examples of Shen types. Mark Tarver has a case study showing converting a lisp interpreter in Shen to use types.

October 01, 2019

Simon Zelazny (pzel)

For focused reading, disconnect wifi October 01, 2019 10:00 PM

Today I had a 50-page PDF whitepaper to read. I didn't know how long it was going to take, and, wanting to conserve my laptop battery charge, I disabled my wifi. It took me surprisingly little time to skim the paper and dig into the more relevant parts in detail, taking some notes as I went along.

During the ~2 hours it took me to work through the document, I tried to access the Internet about ten times. Either by clicking a link in the whitepaper itself, or trying to follow up on a tangential thought with some info online. But! My wifi was switched off, and I'd need to click the network manager icon to re-connect to the net. I chose not to do that, and instead continue reading.

Each time my impulse to access information was frustrated, I realized that had I been online, I would have wasted precious minutes reading tangentially-related web pages, and then some more time again, trying to get back to reading the original PDF, reestablishing the reading context and exerting willpower to stay in the PDF reader.

Acting on these distractions would have definitely prevented me from ingesting the whitepaper in 2 hours.

After the fact, I realized that what I'd achieved accidentally is the productivity hack identified by Matt Might as crippling your technology. By removing functionality from our tools and keeping that which is strictly necessary for completing the task at hand, we remove the 'friction' that gets in the way of sustained attention. Yes, we do "lose" some capabilities, but we make up for it by making it easier for ourselves to focus on the goal.

I'll try to keep this technique in my tool-belt, especially when I need long periods of focus.

Apart from Matt Might's productivity writings, a lot more in this vein can be found in Cal Newport's books & blog posts.

Gustaf Erikson (gerikson)

Pete Corey (petecorey)

Animating a Canvas with Phoenix LiveView: An Update October 01, 2019 12:00 AM

In my previous post on animating an HTML5 canvas using Phoenix LiveView, we used both a phx-hook attribute and a phx-update="ignore" attribute simultaneously on a single DOM element. The goal was to ignore DOM updates (phx-update="ignore"), while still receiving updated data from our server (phx-hook) via our data-particles attribute.

Unfortunately, the technique of using both phx-hook and phx-update="ignore" on a single component no longer works as of phoenix_live_view version 0.2.0. The "ignore" update rule causes our hook’s updated callback to not be called with updates. In hindsight, the previous behavior doesn’t even make sense, and the new behavior seems much more consistent with the metaphors in play.

Joxy pointed this issue out to me, and helped me come up with a workaround. The solution we landed on is to wrap our canvas component in another DOM element, like a div. We leave our phx-update="ignore" on our canvas to preserve our computed width and height attributes, but move our phx-hook and data attributes to the wrapping div:


<div
  phx-hook="canvas"
  data-particles="<%= Jason.encode!(@particles) %>"
>
  <canvas phx-update="ignore">
    Canvas is not supported!
  </canvas>
</div>

In the mounted callback of our canvas hook, we need to look to the first child of our div to find our canvas element:


mounted() {
  let canvas = this.el.firstElementChild;
  ...
}

Finally, we need to pass a reference to a Phoenix Socket directly into our LiveSocket constructor to be compatible with our new version of phoenix_live_view:


import { Socket } from "phoenix";
let liveSocket = new LiveSocket("/live", Socket, { hooks });

And that’s all there is to it! Our LiveView-powered confetti generator is back up and running with the addition of a small layer of markup.

For more information on this update, be sure to check out this issue I filed to try to get clarity on the situation. And I’d like to give a huge thanks to Joxy for doing all the hard work in putting this fix together!

September 30, 2019

Pete Corey (petecorey)

All Hail Glorious Voice Leader! September 30, 2019 12:00 AM

I’ve been writing code that generates guitar chords for over a year now, in various languages and with varying degrees of success. My newest addition to this family of chord-creating programs is Glorious Voice Leader!

Glorious Voice Leader is an enigmatic leader tool whose mission is to help you voice lead smoothly between chords. It does this by generating all possible (and some impossible) voicings of a given chord, and sorting them based on the chromatic distance from the previous chord in the progression.

Glorious Voice Leader says, “the less you move, the more you groove!”

Obviously, this robotic “rule” needs to be tempered by human taste and aesthetic, so the various choices are presented to you, the user, in the form of a heat map laid over a guitar fretboard. The notes in the voicings that Glorious Voice Leader think lead better from the previous chord are darkened, and notes that don’t lead as well are lightened.

To get a grasp on this, let’s consider an example.

Let’s pretend we’re trying to play a ii-V-I progression on the guitar in the key of C. When we tell Glorious Voice Leader that our first chord will be a Dm7, it gives us a heat map of the various initial voicings to choose from:

With this initial chord, darker notes in the heat map are used more frequently by the generated voicings, and lighter notes are used more rarely. Click on the notes of the Dm7 voicing you want to start with.

Once we’ve told Glorious Voice Leader where to start, we can tell it where we want to go next. In our case, our next chord will be a G7. Here’s where things get interesting. Glorious Voice Leader generates all possible G7 voicings, and ranks them according to how well they lead from the Dm7 we just picked out.

Pick out a G7 voicing with darkened notes:

Now we tell Glorious Voice Leader that we want to end our progression with a Cmaj7 chord.

Choose your Cmaj7 voicing:

That’s it! With Glorious Voice Leader’s help, we’ve come up with an entire ii-V-I chord progression. Grab yourself a guitar and play through the whole progression. I’m willing to bet it sounds pretty nice.

For this example, we’ve embedded a small, reluctant version of Glorious Voice Leader directly into this page. Check out the above example in its full-fledged glory at the Glorious Voice Leader website. If you’re eager for another example, here’s the entire series of diatonic seventh chords descending in fourths, as suggested by Glorious Voice Leader.

If you find this interesting, be sure to give Glorious Voice Leader a try and let me know what you think! Expect more features and write-ups in the near future.


September 29, 2019

Carlos Fenollosa (carlesfe)

checkm8: What you need to know to keep your iPhone safe September 29, 2019 06:12 PM

A couple of days ago, Twitter user axi0mX introduced checkm8, a permanent, unpatchable bootrom exploit for iPhones 4S to X.

The jailbreak community celebrated this great achievement, the netsec community was astounded at the scope of this exploit, and regular users worried what this meant for their phone's security.

Even though I've jailbroken my iPhone in the past, I have no interest in doing it now. If you want to read about the implications for the jailbreak community, join the party on /r/jailbreak

I have been reading articles on the topic to understand what the implications are for regular people's security and privacy. All my family has A9 iPhones, which are exploitable, and I wanted to know whether our data was at risk and, if so, what we could do to mitigate attacks.

I think the best way to present the findings is with a FAQ so people can understand what's going on.

1-Line TL;DR

If you have an iPhone 4s, 5, or 5c, somebody who has physical access to your phone can get all the data inside it. If your phone is more modern and the attacker doesn't know your password, they can still install malware, but rebooting your phone makes it safe again.

What is Jailbreak?

Your iPhone is controlled by Apple. You own it, but you are limited in what you can do with it.

Some people like this approach, others prefer to have total control of their phone.

A jailbreak is a way of breaking these limitations so you can 100% control what's running on your phone.

The goal of jailbreaking is not necessarily malicious. In fact, the term "jailbreak" has the connotation that the user is doing it willingly.

However, the existence of a jailbreak method means that an attacker could use this same technique to compromise your phone. Therefore, you must understand what is going on and how to protect yourself from these attackers.

Jailbreaking has existed since the first iPhone. Why is this one different?

Typically, jailbreaking methods exploit a software bug. This means that Apple can (and does) fix that bug in the next software release, negating the method and any related security issues.

This method, however, exploits a hardware bug on the bootrom. The bootrom is a physical chip in your iPhone that has some commands literally hard-wired in the chip. Apple cannot fix the bug without replacing the chip, which is unfeasible.

Therefore, it is not possible to fix this bug, and it will live with your phone until you replace it.

These kinds of bugs are very rare. This exact one has already been patched on recent phones (XS and above), and it has been a long time since the last one was found.

☑ This bug is extremely rare and that is why it's important to know the consequences.

How can an attacker exploit this bug? Can I be affected by it without my knowledge?

This exploit requires an attacker to connect your phone to a computer via Lightning cable.

It cannot be triggered by visiting a website, receiving an email, installing an app, or any non-suspicious action.

☑ If your phone never leaves your sight, you are safe.

I left my phone somewhere out of sight. May it be compromised?

Yes. However, if you reboot your phone, it goes back to safety. The exploit does not persist across reboots, at least at this point in time. If that changes, this text will be updated to reflect that.

Any virus or attack vector will be uninstalled or disabled by Apple's usual protections after a reboot.

If you feel that you are targeted by a resourceful attacker, read below "Is there a feasible way to persist the malware upon reboot?"

☑ If you are not sure about the safety of your phone, reboot it.

Can my personal data be accessed if an attacker gets physical access to my phone?

For iPhones 4S, 5 and 5c, your data may be accessed regardless of your password. For iPhones 5s and above (6, 6s, SE, 7, 8, X), your data is safe as long as you have a strong password.

If you have an iPhone 4s, 5, or 5c, anybody with physical access to your phone will have access to its contents if your password is weak (a 4 to 8 digit PIN code, or an alphanumeric code shorter than 8 characters)

If your iPhone 4s-5-5c has a strong password, and the attacker does not know it and cannot guess it, they may need a long time (months to years) to extract the data. Therefore this attack cannot be run in the scenario where the phone leaves your sight for a few minutes, but you get it back quickly afterwards. However, if your phone 4s-5-5c is stolen, assume that your data is compromised.

It is unknown if this exploit allows the attacker to guess your password quicker than a "months to years" period on older iPhones.

iPhones 5s and above have a separate chip called the Secure Enclave which manages access to your personal data. Your data is encrypted on the device and can not be accessed. The Secure Enclave does not know your password, but uses some math to decrypt it with your password.

If you have an iPhone 5s and above, an attacker can only access your data if they know, or can easily guess, your password.

☑ Use a strong password (>8 alphanumeric characters) that an attacker cannot guess

Can it be used to disable iCloud lock, and therefore re-use stolen phones?

It is unknown at this point.

Assuming the scenario where iCloud lock is not broken, and the Secure Enclave is not affected, what is the worst that can happen to my phone?

You may suffer a phishing attack: the attacker installs a fake login screen on your iPhone, or replaces the OS with an exact copy that works as expected but also sends all your keystrokes and data to them.

The fake environment may be indistinguishable from the real one. If you are not aware of this attack, you will fall for it.

Fortunately, this malware will be purged or disabled upon reboot.

All phones (4s to X) are vulnerable to this attack.

☑ Always reboot your phone if you think it may be compromised.

Is there a feasible way to persist the malware upon reboot?

Unlikely. The jailbreak is tethered, which means that the phone must be connected to a computer every time it boots.

However, somebody may develop a tiny device that connects to the Lightning port of the iPhone and conveniently injects code/malware every time it is rebooted.

Such a device may be used on purpose by jailbreakers for convenience (e.g. a Lightning-USB key, or a small computer), or covertly installed by a sophisticated attacker (e.g. a phone case that taps the Lightning port without the victim knowing).

In most cases, this external device will be easy to spot even to the untrained eye.

An extremely sophisticated attacker may develop a custom chip that is connected internally to the Lightning port of the iPhone and runs the malware automatically and invisibly. To do so, they would need physical access to your phone for around 10 minutes, the time it takes to open the phone, solder the new chip, and close it again.

☑ Watch out for unexpected devices connected to your Lightning port

Who are these "attackers" you talk about?

Three-letter agencies (NSA, FBI, KGB, Mossad...) and also private companies who research their own exploits (Cellebrite, Greyshift) to sell them to the former.

It is entirely possible that the above already knew about this exploit, however.

Other attackers may be regular thieves, crackers, pranksters, or anybody interested in developing a virus for the iPhone.

If you are a regular user who is not the target of a Government or Big Criminal, remember:

  1. Don't let people connect your iPhone to an untrusted device
  2. Otherwise, reboot it when you get it back
  3. Watch out for small devices on your Lightning port
~~~~~~

Tags: apple, security


Famous public figure in tech suffers the consequences for asshole-ish behavior September 29, 2019 12:05 PM

This last month, a very famous computer guy who regularly appears in public and has amassed a cult-like following has been forced to step down due to pressure from journalists.

Let's make a list of all the unacceptable behaviours of Computer Guy:

Living in his office and smelling disgusting

He is not really homeless, but Computer Guy used to sleep in his office.

Coworkers and friends reported that he reeked and that they would avoid contact with him.

Sexually harassing employees

It has been reported, even on video, that Computer Guy made inappropriate sexual remarks to his colleagues.

Of course, Computer Guy denied it.

Drug intake

Computer Guy is a known hippie, I mean, just look at his appearance.

He is not ashamed to admit that he has taken illegal drugs and that they are an important part of his life.

Psychological abuse to women

Not many people know this, but Computer Guy has a daughter whom he denied for a long time.

Computer Guy basically abandoned his former partner who was pregnant with their daughter, denied her alimony, and even abused the child psychologically when she was 9.

Keeping payments from group projects for himself

In one of his projects, Computer Guy profited more than he had earned by lying to colleagues. Instead of fairly distributing the money from the project, he decided to take most of it for himself.

In a similar case, he denied fair compensation to an old friend of his.

Bad temper

All these examples can be summarized as: Computer Guy is an asshole who must be taken down.

Even though Computer Guy did nothing technically illegal, being such a big asshole must not be acceptable in our society and the right thing to do is to pressure him to resign from his public positions.

~~~~~~

Since mobs don't read the news, only the headlines, and I don't want any association with any of the parties in this drama, I think I must write the non-snarky interpretation of the events.

Of course, the headlines above are about Steve Jobs, not Richard Stallman.

I only had one goal with this piece: to reflect on the double standards in society.

Being an asshole is acceptable if you are a respected powerful businessman. You are portrayed as a quirky millionaire. However, it is not acceptable if you're a contrarian weird hippie. You are portrayed as a disgusting creep.

I obviously have no interest or authority to defend or justify their actions. They're adults and their behavior is their own. Screw their asshole-ism. They should have been better people. Stallman is a stubborn asshole, Jobs was an even bigger stubborn asshole.

The truth is, there is a strong correlation between being a powerful public figure and being a stubborn asshole. This is because after some point, non-assholes quit the race because they are not willing to pay the toll it takes to be at the top. That is unfortunate, and we should definitely push for respectful leaders.

Why did two independent journalists take Stallman down, and not Jobs, or any of the other assholes in the world?

Probably, because they could.

It's their right to free speech, and ultimately it was a consequence of Stallman's actions. And I can't reflect on whether it's fair or good that Stallman is forced to step down, because I'm not smart enough to foresee the positive or negative consequences. So maybe after a few months we all realize it was the right thing to do, and end this discussion once and for all.

However, one thing is still true, again, the only point that should be taken from this article: to hell with double standards when representing public figures.

~~~~~~

Not that it matters for this article, and it's outside the scope of my point, but I want to share my personal vision on Stallman and Jobs. The thing is, this was a difficult article to write. They are both people who I strongly admire and have had a great influence in my life.

Reading Stallman's essays is what got me into Free Software. I have attended his conferences twice and his brave stance on freedom and privacy is flawless and admirable. I have a small laptop that Stallman signed and many of his books. He has constantly fought for the rights of the people against corporations. I hope he keeps doing it.

The world is a better place thanks to Stallman.

Jobs was an inspiration. I own most books about him, an Apple "Think Different" poster hangs in my office, and I treasure the issue Time released after his death. He was a genius and a visionary; he basically invented consumer computers and smartphones. I do not doubt that the contributions of Woz and other people at Apple were instrumental, but he was the mastermind behind the strategy. What Jobs achieved with his work is beyond belief and 100% worthy of praise.

The world is a better place thanks to Jobs.

If you want more context about the actual facts, I wrote about the news a week ago.

Tags: news


September 28, 2019

Frederik Braun (freddyb)

Remote Code Execution in Firefox beyond memory corruptions September 28, 2019 10:00 PM

This is the blog post version of my presentation from OWASP Global AppSec in Amsterdam 2019. It was presented in the AllStars track.

Abstract:

Browsers are complicated enough to have attack surface beyond memory safety issues. This talk will look into injection flaws in the user interface of Mozilla Firefox, which is implemented in JS, HTML, and an XML dialect called XUL. With a Cross-Site Scripting (XSS) vulnerability in the user interface, attackers can execute arbitrary code in the context of the main browser application process. This allows for cross-platform exploits of high reliability. The talk discusses past vulnerabilities and will also suggest mitigations that benefit Single Page Applications and other platforms that may suffer from DOM-based XSS, like Electron.

Prologue

(This is the part, where we reduce the lighting and shine a flashlight into my face)

Listen well, young folks. Old people, browser hackers, or Mozilla fanboys might use this as an opportunity to lean back and stroke their mighty neckbeards, as they have heard all of this before.

It was the year 1997, and people thought XML was a great idea. In fact, it was so much better than its warty and unparseable predecessor HTML. While XHTML was the clear winner and successor for great web applications, it was obvious that XML would make a great user interface markup language to create a powerful cross-platform toolkit dialect. This folly marks the hour of birth for XUL. XUL was created as the XML User Interface Language at Netscape (the company that created the origins of the Mozilla source code. Long story. The younger folks might want to read up on Wikipedia or watch the amazing movie "Code Rush", which is available on archive.org). Jokingly, XUL was also a reference to the classic 1984 movie Ghostbusters, in which an evil deity called Zuul (with a Z) possesses innocent people.

Time went by and XUL did not take off as a widely recognized standard for cross-platform user interfaces. Firefox has almost moved away from XUL and re-implemented many parts in HTML. Aptly named after an evil spirit, XUL, as we will see, still haunts us today.

Mapping the attack surface

Let's look into Firefox to find some remnants of XUL by visiting some internal pages. Open about:preferences in a new tab (I won't be able to link to it for various good reasons). Now either look at the source code using the Developer Tools (right-click, "Inspect Element") or view the source code of Firefox Nightly using the source code search at searchfox.org.

We can also open the developer console and poke around with the obscure objects and functions that are available for JavaScript in privileged pages. As a proof-of-concept, we may alert(Components.stack), which gives us a stringified JavaScript call stack - notably this is a JavaScript object that is left undefined for normal web content.
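
As a tiny illustration of that difference (run from the console attached to a privileged page; this is not part of the original talk, just a restatement of the point above):

// In the console of a chrome-privileged page:
alert(Components.stack);   // pops up a stringified JavaScript call stack
typeof Components;         // "object"

// In the console of a normal web page, the object is not exposed:
typeof Components;         // "undefined"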

Inspecting the source code, we also see some markup that screams both XML and XML dialect. While still in our information-gathering phase, we will not go too deep, but make note of two observations:

  • XUL is not HTML. To get a better understanding of elements like <command>, <colorpicker> or <toolbar>, we can look at the XUL Reference on MDN.
  • XUL is scriptable! A <script> tag exists and it may contain JavaScript.

There are also some newer pages like about:crashes, which holds previously submitted (or unsubmitted) crash reports. Whether those internal pages are written in (X)HTML or XUL, most of the interactive parts are written in JavaScript. I suppose most of you will by now understand that we are looking for Cross-Site Scripting (XSS) vulnerabilities in the browser interface. What's notable here is that this bypasses the sandbox.

As an aside, the page behind about:cache is actually implemented using C++ that emits HTML-ish markup.

Let's start with search and grep

Equipped with the right kind of knowledge and the craving for a critical Firefox bug under my name, I started using our code search more smartly. Behold:

Search: .innerHTML =

Number of results: 1000 (maximum is 1000)

Hm. Excluding test files.

Search: innerHTML =

Number of results: 414

That's still a lot. And that's not even all kinds of XSS sinks. I would also look for outerHTML, insertAdjacentHTML and friends.

Search (long and hairy regular expression that tries to find more than innerHTML)

Number of results: 997

That's bad. Let's try to be smarter!

JavaScript Parsing - Abstract Syntax Trees. ESLint to the rescue!

I've actually dabbled in this space for a while before. That would be another talk, but a less interesting one. So I'll skip ahead and tell you that I wrote an eslint plugin that analyzes JavaScript files to look for the following:

  1. Checking the right-hand side in assignments (+, +=) where the left part ends with either innerHTML or outerHTML.
  2. Checking the first argument in calls to document.write(), document.writeln(), eval and the second argument for insertAdjacentHTML.

For both, we'll check whether they contain a variable; string literals or empty strings are ignored. The plugin is available as eslint-plugin-no-unsanitized and can be configured to detect and ignore built-in escape and sanitize functions. If you're worried about DOM XSS, I recommend you check it out.
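
For illustration, enabling the plugin in a project looks roughly like this (a minimal sketch; the rule names come from eslint-plugin-no-unsanitized, while the surrounding configuration is just an assumption about your setup):

// .eslintrc.js -- minimal sketch of wiring up the plugin
module.exports = {
  plugins: ["no-unsanitized"],
  rules: {
    // flag assignments to innerHTML/outerHTML whose right-hand side is not a safe literal
    "no-unsanitized/property": "error",
    // flag calls like document.write() or insertAdjacentHTML() with unsanitized arguments
    "no-unsanitized/method": "error",
  },
};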

Discovered Vulnerabilities

Using this nice extension to scan all of Firefox yields a handy 32 matches. We create a spreadsheet and audit all of them by hand, following long call chains with unclear input values, and patterns that escape HTML either close to the final innerHTML assignment, upon creation, or, for stuff extracted from databases (like the browsing history), upon insertion.


Many nights later


A first bug appears

Heureka! This sounds interesting:

  let html = `
    <div style="flex: 1;
                display: flex;
                padding: ${IMAGE_PADDING}px;
                align-items: center;
                justify-content: center;
                min-height: 1px;">
      <img class="${imageClass}"
           src="${imageUrl}"/>       <----- boing
    </div>`;
  // …
  div.innerHTML = html;

When hovering over markup that points to an image in the web developer tools, they will helpfully create a tooltip that preloads and shows the image for the web developer to enjoy. Unfortunately, that URL is not escaped.

Firefox Developer Tools Inspector opening images in a tooltip when hovering an image element's source attribute

Writing the exploit

After spending a few sleepless nights on this, I didn't get anything beyond an XML-conformant proof of concept of <button>i</button>. At some point I filed the bug as sec-moderate, i.e., this is almost bad but likely needs another bug to be actually terrible. I wrote:

I poked a bit again and I did not get further than <button>i</button> for various reasons … In summary: I'd be amazed to see if someone else gets any farther.

A few nights later, I actually came up with an exploit that breaks the existing syntax while staying XML conformant. We visit an evil web page that looks like this:

<img src='data:bb"/><button><img src="x" onerror="alert(Components.stack)" /></button><img src="x'>

The image URL that is used in the vulnerable code spans all the way from data: to the closing single quote at the end. Our injection alerts Components.stack, which indicates that we have left the realms of mortal humans.

This is Bug 1372112 (CVE-2017-7795). Further hikes through our spreadsheets of eslint violations led to Bug 1371586 (CVE-2017-7798). Both were fixed in Firefox 56, which was released in the fall of 2017.

We find and fix some minor self-XSS bugs (e.g., creating a custom preference in about:config with the name <button>hi</button> led to XUL injections). All of them are fixed, and we're fearful that mistakes will be made again.

Critical bugs are a great way to influence coding-style discussions, and it is decided that the plugin might as well be included in our standard set of linters. innerHTML and related badness is forbidden and we rub our hands in glee. Unfortunately, it turns out that lots of legacy code will not be rewritten, and security engineers do not want to deal with the affairs of front-end engineers (joke's on me in the end though, I promise). So we allow some well-audited and finely escaped call sites with a granular exception that gives us a confident feeling of absolute security (it's a trap!):

// eslint-disable-next-line no-unsanitized/property

A Dark Shadow

I feel like I have eradicated the bug class from the entirety of our codebase. We may now look for more complicated bugs and our days get more exciting.

Of course, I wander through the office bragging about my cleverness, warning young folks about the dangers of XSS, and proudly wearing my security t-shirts. There are lots of colorful war stories to be told and even more free snacks and fizzy drinks to be consumed.

Meanwhile, my great colleagues keep contributing and actually developing useful stuff. On top of their good work, some of them even mentor aspiring students and enthusiastic open source fans. Having listened to my stories of secure and well-audited code that should eventually be replaced, they make an effort to get someone to remove all of the danger, so we can live in an exception-less world that truly disallows everything, without these pesky eslint-disable-next-line comments.

Naturally, code is being moved around, refactored and improved by lots of other people in the organization.

So, while I'm sitting there, enjoying my internet fame (just browsing memes, really), people show up at my desk asking me for a quick look at something suspicious:

// eslint-disable-next-line no-unsanitized/property
doc.getElementById("addon-webext-perm-header").innerHTML = strings.header;

// data coming *mostly* from localization-templates
    let strings = {
      header: gNavigatorBundle.getFormattedString("webextPerms.header", [data.name]),
      text: gNavigatorBundle.getFormattedString("lwthemeInstallRequest.message2",
                                                [uri.host]),
// ..
// but of course all goes through _sanitizeTheme(aData, aBaseURI, aLocal)
// (which does not actually sanitize HTML)

I feel massively stupid and re-create my spreadsheet. Setting eslint to ignore the disable-next-line stuff locally allows me to start all over. We build an easy exploit that pops calc. How funny! We also notice that a few more bugs like that have crept in, since the "safe" call sites were whitelisted. Yikes.

Having learned about XML namespaces, a simpler example payload (without the injection trigger) would look like this:

<html:img onerror='Components.utils.import("resource://gre/modules/Subprocess.jsm");Subprocess.call({ command: "/usr/bin/gnome-open" });' src='x'/>,

This is Bug 1432778.

Hope on the horizon

A good patch is made and circulated among a carefully selected group of senior engineers. We have various people working on the code and are concerned about this being noticed by bad actors. With the help of the aforementioned group, we convince engineering leadership that this warrants an unscheduled release of Firefox. We start a simplified briefing for Release Management and QA.

People point out that updates always take a while to apply to all of our release base and shipping a new version with a single commit that replaces .innerHTML with .textContent seems a bit careless. Anyone with a less-than sign on their keyboard could write a "1-day exploit" that would affect lots of users.

What can we do? We agree that DOM XSS deserves a heavier hammer and change our implementation of HTML parsing (which is used for innerHTML, outerHTML, insertAdjacentHTML, etc.). Normally, this function parses the markup into a DOM tree and inserts it where assigned. But now, for privileged JavaScript, we parse the DOM tree and omit all kinds of badness before insertion. Luckily, we have something like that in our source tree. In fact, I tested it back in 2013. We also use it in Thunderbird to strip <script> and its friends from HTML email, so it's even battle-tested. On top of that, we do some additional manual testing and identify some problems around leaving form elements in, which warrants follow-up patches in the future.

A nice benefit is that a commit which changes how DOM parsing works doesn't allow reverse-engineering our vulnerability from the patch. Neat.

In the next cycles, we've been able to make it stricter and remove more badness (e.g., form elements). This was Bug 1432966: Sanitize HTML fragments created for chrome-privileged documents (CVE-2018-5124).

Closing credits and Acknowledgements

Exploitation and remediation were achieved with the support of various people. Thanks to security folks, Firefox engineers, release engineers, and QA testers. Especially to Johnathan Kingston (co-maintainer of the eslint plugin) and Johann Hofman, who found the bad 0day in 2018 and helped with testing, shaping, and arguing for an unscheduled release of Firefox.

No real geckos were harmed in the making of this blog post.

September 27, 2019

Carlos Fenollosa (carlesfe)

The absolute best puzzle game for your phone September 27, 2019 01:23 PM

Sorry for the clickbaity title. I was slightly misleading.

Simon Tatham's Portable Puzzle Collection is not only the best puzzle game for your phone, it is actually a collection of the best puzzle games.

Wait! It is actually the best puzzle game collection for any device, since all games are playable via web (js and *cough* Java), and there are native binaries for Windows and UNIX.

In Simon's own words:

[This is] a collection of small computer programs which implement one-player puzzle games. All of them run natively on Unix (GTK), on Windows, and on Mac OS X. They can also be played on the web, as Java or Javascript applets.

I wrote this collection because I thought there should be more small desktop toys available: little games you can pop up in a window and play for two or three minutes while you take a break from whatever else you were doing.

Simon's collection consists of very popular single player puzzle games, like Sudoku, Minesweeper, Same Game, Pegs, and Master Mind, and some lesser known, at least for me, but extremely fun to play: Pattern, Signpost, Tents, Unequal.

All games are extremely configurable and can usually be learned by reading the instructions and trying to play on a small board where the solution is usually trivial. Then, when you are ready, start expanding the board size and enabling some of the higher difficulty board generators!

Greg Hegwill ported the games to iOS and Chris Boyle ported them to Android. Other people have ported the collection to more platforms, like Palm, Symbian, or Windows Phone.

The games can of course run on old devices, are 3x free (free of charge, free software, and free of ads) and, for context, each game takes around 300 KB of space (yes, KB). The full collection weighs 3.5 MB on iOS. For reference, Simon's mines.exe is 295 KB, whereas Windows 3.1's winmine.exe was 28 KB.

Simon's last commit is from April 2019 and he is still adding improvements to make the games more fun.

I don't really know how this could not be the Best Puzzle Game Ever. Download it right now on your phone (iOS, Android) and you'll thank me later.

Tags: software, mobile, retro


Scott Sievert (stsievert)

Better and faster hyperparameter optimization with Dask September 27, 2019 05:00 AM

Dask’s machine learning package, Dask-ML, now implements Hyperband, an advanced “hyperparameter optimization” algorithm that performs rather well. This post will

  • describe “hyperparameter optimization”, a common problem in machine learning
  • describe Hyperband’s benefits and why it works
  • show how to use Hyperband via example alongside performance comparisons

In this post, I’ll walk through a practical example and highlight key portions of the paper “Better and faster hyperparameter optimization with Dask”, which is also summarized in a ~25 minute SciPy 2019 talk.

Problem

Machine learning requires data, an untrained model and “hyperparameters”, parameters that are chosen before training begins that help with cohesion between the model and data. The user needs to specify values for these hyperparameters in order to use the model. A good example is adapting ridge regression or LASSO to the amount of noise in the data with the regularization parameter.1

Model performance strongly depends on the hyperparameters provided. A fairly complex example is with a particular visualization tool, t-SNE. This tool requires (at least) three hyperparameters and performance depends radically on the hyperparameters. In fact, the first section in “How to Use t-SNE Effectively” is titled “Those hyperparameters really matter”.

Finding good values for these hyperparameters is critical and has an entire Scikit-learn documentation page, “Tuning the hyperparameters of an estimator.” Briefly, finding decent values of hyperparameters is difficult and requires guessing or searching.

How can these hyperparameters be found quickly and efficiently with an advanced task scheduler like Dask? Parallelism will pose some challenges, but the Dask architecture enables some advanced algorithms.

Note: this post presumes knowledge of Dask basics. This material is covered in Dask’s documentation on Why Dask?, a ~15 minute video introduction to Dask, a video introduction to Dask-ML and a blog post I wrote on my first use of Dask.

Contributions

Dask-ML can quickly find high-performing hyperparameters. I will back this claim with intuition and experimental evidence.

Specifically, this is because Dask-ML now implements an algorithm introduced by Li et. al. in “Hyperband: A novel bandit-based approach to hyperparameter optimization”. Pairing of Dask and Hyperband enables some exciting new performance opportunities, especially because Hyperband has a simple implementation and Dask is an advanced task scheduler.2

Let’s go through the basics of Hyperband then illustrate its use and performance with an example. This will highlight some key points of the corresponding paper.

Hyperband basics

The motivation for Hyperband is to find high performing hyperparameters with minimal training. Given this goal, it makes sense to spend more time training high performing models – why waste more training time on a model if it has done poorly in the past?

One method to spend more time on high performing models is to initialize many models, start training all of them, and then stop training low performing models before training is finished. That’s what Hyperband does. At the most basic level, Hyperband is a (principled) early-stopping scheme for RandomizedSearchCV.
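
To make that idea concrete, here is a minimal sketch of the successive-halving loop that Hyperband builds on (not Dask-ML's actual implementation; it assumes Scikit-learn-style models with partial_fit and score):

import numpy as np

def successive_halving(models, X, y, X_val, y_val, calls_per_round=5):
    """Train every model a little, then repeatedly drop the worst half."""
    models = list(models)
    classes = np.unique(y)
    while len(models) > 1:
        for model in models:
            for _ in range(calls_per_round):
                model.partial_fit(X, y, classes=classes)
        # keep the better-scoring half of the surviving models
        models.sort(key=lambda m: m.score(X_val, y_val), reverse=True)
        models = models[: max(1, len(models) // 2)]
    return models[0]

Hyperband essentially runs several of these brackets, each stopping models at a different rate; that is the sweep over stopping frequency described next.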

Deciding when to stop the training of models depends on how strongly the training data affects the score. There are two extremes:

  1. when only the training data matter
    • i.e., when the hyperparameters don’t influence the score at all
  2. when only the hyperparameters matter
    • i.e., when the training data don’t influence the score at all

Hyperband balances these two extremes by sweeping over how frequently models are stopped. This sweep allows a mathematical proof that Hyperband will find the best model possible with minimal partial_fit calls3.

Hyperband has significant parallelism because it has two “embarrassingly parallel” for-loops – Dask can exploit this. Hyperband has been implemented in Dask, specifically in Dask’s machine learning library Dask-ML.

How well does it perform? Let’s illustrate via example. Some setup is required before the performance comparison in Performance.

Example

Note: want to try HyperbandSearchCV out yourself? Dask has an example use. It can even be run in-browser!

I’ll illustrate with a synthetic example. Let’s build a dataset with 4 classes:

>>> from experiment import make_circles
>>> X, y = make_circles(n_classes=4, n_features=6, n_informative=2)
>>> scatter(X[:, :2], color=y)

Note: this content is pulled from stsievert/dask-hyperband-comparison, or makes slight modifications.

Let’s build a fully connected neural net with 24 neurons for classification:

>>> from sklearn.neural_network import MLPClassifier
>>> model = MLPClassifier()

Building the neural net with PyTorch is also possible4 (and what I used in development).

This neural net’s behavior is dictated by 7 hyperparameters. Only one controls the model architecture (hidden_layer_sizes, the number of neurons in each layer). The rest control finding the best model of that architecture. Details on the hyperparameters are in the Appendix.

>>> params = ...  # details in appendix
>>> params.keys()
dict_keys(['hidden_layer_sizes', 'alpha', 'batch_size', 'learning_rate',
           'learning_rate_init', 'power_t', 'momentum'])
>>> params["hidden_layer_sizes"]  # always 24 neurons
[(24, ), (12, 12), (6, 6, 6, 6), (4, 4, 4, 4, 4, 4), (12, 6, 3, 3)]

I chose these hyperparameters to create a complex search space that mimics the searches performed for most neural networks. These searches typically involve hyperparameters like “dropout”, “learning rate”, “momentum” and “weight decay”.5 End users don’t care about hyperparameters like these; they don’t change the model architecture, only how the best model of a particular architecture is found.

How can high performing hyperparameter values be found quickly?

Finding the best parameters

First, let’s look at the parameters required for Dask-ML’s implementation of Hyperband (which is in the class HyperbandSearchCV).

Hyperband parameters: rule-of-thumb

HyperbandSearchCV has two inputs:

  1. max_iter, which determines how many times to call partial_fit
  2. the chunk size of the Dask array, which determines how much data each partial_fit call receives.

These fall out pretty naturally once it’s known how long to train the best model and very approximately how many parameters to sample:

n_examples = 50 * len(X_train)  # 50 passes through dataset for best model
n_params = 299  # sample about 300 parameters

# inputs to hyperband
max_iter = n_params
chunk_size = n_examples // n_params

The inputs to this rule-of-thumb are exactly what the user cares about:

  • a measure of how complex the search space is (via n_params)
  • how long to train the best model (via n_examples)

Notably, there’s no tradeoff between n_examples and n_params like with Scikit-learn’s RandomizedSearchCV because n_examples is only for some models, not for all models. There are more details on this rule-of-thumb in the “Notes” section of the HyperbandSearchCV docs.

With these inputs a HyperbandSearchCV object can easily be created.

Finding the best performing hyperparameters

This model selection algorithm Hyperband is implemented in the class HyperbandSearchCV. Let’s create an instance of that class:

>>> from dask_ml.model_selection import HyperbandSearchCV
>>>
>>> search = HyperbandSearchCV(
...     model, params, max_iter=max_iter, aggressiveness=4
... )

aggressiveness defaults to 3. aggressiveness=4 is chosen because this is an initial search; I know nothing about this search space, so the search should be more aggressive in culling off bad models.

Hyperband hides some details from the user (which enables the mathematical guarantees), specifically the details on the amount of training and the number of models created. These details are available in the metadata attribute:

>>> search.metadata["n_models"]
378
>>> search.metadata["partial_fit_calls"]
5721

Now that we have some idea on how long the computation will take, let’s ask it to find the best set of hyperparameters:

>>> from dask_ml.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>>
>>> X_train = X_train.rechunk(chunk_size)
>>> y_train = y_train.rechunk(chunk_size)
>>>
>>> search.fit(X_train, y_train)

The dashboard will be active during this time6:

[Video: the Dask dashboard during the search.]

How well do these hyperparameters perform?

>>> search.best_score_
0.9019221418447483

HyperbandSearchCV mirrors Scikit-learn’s API for RandomizedSearchCV, so it has access to all the expected attributes and methods:

>>> search.best_params_
{"batch_size": 64, "hidden_layer_sizes": [6, 6, 6, 6], ...}
>>> search.score(X_test, y_test)
0.8989070100111217
>>> search.best_model_
MLPClassifier(...)

Details on the attributes and methods are in the HyperbandSearchCV documentation.

Performance

I ran this 200 times on my personal laptop with 4 cores. Let’s look at the distribution of final validation scores:

The “passive” comparison is really RandomizedSearchCV configured so it takes an equal amount of work as HyperbandSearchCV. Let’s see how this does over time:

This graph shows the mean score over the 200 runs with the solid line, and the shaded region represents the interquartile range. The dotted green line indicates the data required to train 4 models to completion. “Passes through the dataset” is a good proxy for “time to solution” because there are only 4 workers.

This graph shows that HyperbandSearchCV will find parameters at least 3 times quicker than RandomizedSearchCV.

Dask opportunities

What opportunities does combining Hyperband and Dask create? HyperbandSearchCV has a lot of internal parallelism and Dask is an advanced task scheduler.

The most obvious opportunity involves job prioritization. Hyperband fits many models in parallel, and Dask might not have that many workers available. This means some jobs have to wait for other jobs to finish. Of course, Dask can prioritize jobs7 and choose which models to fit first.

Let’s assign the priority for fitting a certain model to be the model’s most recent score. How does this prioritization scheme influence the score? Let’s compare the prioritization schemes in a single run of the 200 above:

These two lines are the same in every way except for the prioritization scheme. This graph compares the “high scores” prioritization scheme and Dask’s default prioritization scheme (“fifo”).

This graph is certainly helped by the fact that it is run with only 4 workers. Job priority does not matter if every job can be run right away (there’s nothing to assign priority to!).

Amenability to parallelism

How does Hyperband scale with the number of workers?

I ran another, separate experiment to measure this. The experiment is described more in the corresponding paper, but the relevant difference is that a PyTorch neural network is used through skorch instead of Scikit-learn’s MLPClassifier.

I ran the same experiment with a different number of Dask workers.8 Here’s how HyperbandSearchCV scales:

Training one model to completion requires 243 seconds (which is marked by the white line). This is a comparison with patience, which stops training models if their scores aren’t increasing enough. Functionally, this is very useful because the user might accidentally specify n_examples to be too large.

It looks like the speedups start to saturate somewhere between 16 and 24 workers, at least for this example. Of course, patience doesn’t work as well for a large number of workers.9

Future work

There are some ongoing pull requests to improve HyperbandSearchCV. The most significant of these involves tweaking some Hyperband internals so HyperbandSearchCV works better with initial or very exploratory searches (dask/dask-ml #532).

The biggest improvement I see is treating dataset size as the scarce resource that needs to be preserved instead of training time. This would allow Hyperband to work with any model, instead of only models that implement partial_fit.

Serialization is an important part of the distributed Hyperband implementation in HyperbandSearchCV. Scikit-learn and PyTorch can easily handle this because they support the Pickle protocol10, but Keras/Tensorflow/MXNet present challenges. The use of HyperbandSearchCV could be increased by resolving this issue.

Appendix

I choose to tune 7 hyperparameters, which are

  • hidden_layer_sizes, which controls the number of neurons in each layer
  • alpha, which controls the amount of regularization

More hyperparameters control finding the best neural network:

  • batch_size, which controls the number of examples the optimizer uses to approximate the gradient
  • learning_rate, learning_rate_init, power_t, which control some basic hyperparameters for the SGD optimizer I’ll be using
  • momentum, a more advanced hyperparameter for SGD with Nesterov’s momentum.
  1. Which amounts to choosing alpha in Scikit-learn’s Ridge or LASSO 

  2. To the best of my knowledge, this is the first implementation of Hyperband with an advanced task scheduler 

  3. More accurately, Hyperband will find close to the best model possible with $N$ partial_fit calls in expected score with high probability, where “close” means “within log terms of the upper bound on score”. For details, see Corollary 1 of the corresponding paper or Theorem 5 of Hyperband’s paper

  4. through the Scikit-learn API wrapper skorch 

  5. There’s less tuning for adaptive step size methods like Adam or Adagrad, but they might under-perform on the test data (see “The Marginal Value of Adaptive Gradient Methods for Machine Learning”) 

  6. But it probably won’t be this fast: the video is sped up by a factor of 3. 

  7. See Dask’s documentation on Prioritizing Work 

  8. Everything is the same between different runs: the hyperparameters sampled, the model’s internal random state, the data passed for fitting. Only the number of workers varies. 

  9. There’s no time benefit to stopping jobs early if there are infinite workers; there’s never a queue of jobs waiting to be run 

  10. “Pickle isn’t slow, it’s a protocol” by Matthew Rocklin 

September 26, 2019

Carlos Fenollosa (carlesfe)

Sourcehut, the free software development cloud September 26, 2019 01:00 PM

I've been following sourcehut for some time and realized that I hadn't talked about it yet.

Sourcehut, at first glance, is a service that provides git repos and code/project management tools, but it's much more than that.

It runs CI through virtualised builds on various Linuxes and BSDs, provides code review tools, tasks, and third-party integrations, and of course mailing lists and wikis.

I think the landing page does a good job of explaining how it is different from other code hosting services. Especially:

  • Composable Unix-style mini-services
  • Powerful APIs and webhooks
  • Secure, reliable, and safe
  • Absolutely no tracking or advertising
  • All features work without JavaScript
  • 100% free and open source software

These bullet points are quite important: sourcehut is the web equivalent of piping UNIX commands for development and is built entirely on free software. The fact that it works without js is just a great bonus.

Sourcehut is the project of a single developer, Drew DeVault, better known as sir_cmpwn on the internet, and he's quite active on Mastodon in case you want to follow him. Amazing work!

Tags: software, unix


September 25, 2019

eta (eta)

Designing a new chat system - federation September 25, 2019 11:00 PM

[This is post 2 about designing a new chat system. See the previous post on chat systems for more context!]

Okay, so, the previous post on chat systems seemed to generate a fair deal of discussion (in that the Matrix lead developer showed up and started refuting all of my points1…!).

Partially since I said I’d try my own hand at making something new in that post, and partially because this project is also going to be my Computing A-level Non-Examined Assessment2, I now need to actually get on with it and try and make something! So, without further ado…

Key implementation goals

From the frustrations expressed in the previous post, and my own personal experience, I believe a working chat system probably wants to include some of the following points:

  • Reliability. First and foremost, the system needs to deliver the messages to the recipient. Notifications should work, and be as timely as possible. The system should avoid just dropping messages on the floor without any indication as to this happening.
    • If messages cannot be delivered, the system should make this abundantly clear to the user, so they know about it and can try again.
    • Read / delivery receipts, as well as presence3, should be taken into account.
  • Interoperability. As discussed previously, your chat system is useless if it doesn’t talk to the ones that are already out there, unless you’re really good at persuading people. There should be some provision for bridging to existing systems, and potentially some for federation as well.
    • More on this later; federation is arguably a questionable design decision, but we’ll discuss this.
  • Persistence. People nowadays have phones and computers that aren’t online 24/7. The server should account for this, and implement some sort of basic “store-and-forward” mechanism, so you can send messages to people when they’re offline.
  • Flexibility. Using an online chat system often means losing a degree of control about how you come across, and what information you send. Things like read receipts and presence should be configurable, to allow for users having different privacy preferences. Notifications should also be relatively granular, so you can avoid your phone vibrating every time someone says something in one of your many chatrooms.

Ideally I’d do some kind of research to prove that these goals are actually prized by your average user4 – who knows, maybe sending emoji or something is actually way more important, and users will sacrifice one or more of the above in order to get special features like that5 – but we’ll roll with this for now. (In fact, if you happen to be reading this blog and have strong opinions on the matter, please leave a comment; it’d be really helpful!) I suppose we’ll just say this blog post is a living document, and leave things at that for now.

The first one of these bullet-points I plan to tackle is interoperability – and specifically, whether this magical new ideal chat system should federate or not.

The question of federation

Federation refers to the practice of a bunch of people agreeing on a common standard, and designing software using this standard to create an open network of interconnected servers. The example cited in the previous post was Mastodon, which uses the ActivityPub standard to enable distributed social networking; Matrix is another federated standard for chat6. Federated protocols, at least at first glance, aren’t doing too badly; this site gives an overview of various federated protocols and their uptake in terms of users.

In fact, the biggest federated protocol in the world is email – you can set up your own mail server with your own domain (e.g. hi@theta.eu.org), and start emailing with people on any other server; email is based on a set of open standards that anyone can implement.

The spam problem

However, being federated can be both a blessing and a curse! Having your communications platform be wide open, so anybody can create an account, or federate their messages into it, or whatever, seems like a good idea at first. However, there’s another side to it: how do you control spam and abuse? Most people get spam emails nowadays, which is one of the unfortunate side effects of this openness; if anyone can connect to your mail server and send you email, without having to do anything first to confirm themselves as legitimate or trusted, you’re opening yourself up to receiving a whole bunch of junk.

There are ways to combat this – spam email checkers nowadays, like the venerable SpamAssassin7, tend to use a rule-based approach, essentially giving emails a ‘spam score’ based on various metrics of sketchiness: does the email pass SPF and DKIM checks? Does the subject line contain something like “get rich quick”? Is the sender using a commonly abused free email service, like Gmail or Yahoo Mail? (and so on and so forth). These work pretty well for the run-of-the-mill spam, but still can’t really protect against actively malicious people who conform to all the standards, and still send you spam.
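
As a toy illustration of that rule-based idea (this is not SpamAssassin's actual rule set, and the message fields used here are hypothetical):

# Toy rule-based spam scoring, loosely in the spirit of SpamAssassin.
# `msg` is a hypothetical object with the fields used below.
def spam_score(msg):
    score = 0.0
    if not msg.passes_spf:
        score += 2.0   # failed SPF check
    if not msg.passes_dkim:
        score += 2.0   # failed DKIM check
    if "get rich quick" in msg.subject.lower():
        score += 3.0   # classic spammy subject line
    if msg.sender_domain in {"gmail.com", "yahoo.com"}:
        score += 0.5   # commonly abused free mail providers
    return score       # above some threshold, treat the mail as spam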

In many chat protocols, we see a similar story. IRC has suffered greatly at the hands of robots that connect to any IRC servers they can find, join all the channels they can, and start flooding text until something notices and kicks them off (having still managed to cause a considerable amount of annoyance). The freenode IRC network suffered a recent spamwave that was actually targeted at them, with people spamming rude and unhelpful messages about the network administration until something was done about it8. Matrix looks like it’s vulnerable to the same kind of thing as well910. Spam is, unfortunately, an unavoidable part of the internet, and efforts to fight it are almost as old as the internet itself!

Mastodon manages to avoid a lot of these problems (at least from what I can see) simply as a result of what it is – in something like Twitter, you don’t really get spam in your timeline unless you explicitly choose to follow the spammers, and posts only start federating from instance to instance once an actual human issues that follow request. This is in contrast to, say, a wide-open federated chatroom, where people can just join on as many anonymous clients as they want whenever they feel like flooding a channel with text.

The full-mesh problem

The other main issue with federation is somewhat more practical: if your chat system, or social network, or whatever, is now distributed across multiple servers, how do you actually shunt messages around between them in a vaguely efficient manner? So far, the answer seems to just be “give up and use a full mesh architecture” – that is, if we need to send messages out to users on 20 different servers, connect individually to each of the 20 and deliver the message. (‘what happens when you activity post’ is a good, short explainer for how this works in terms of ActivityPub, Mastodon’s federation protocol.)

This works fine for smaller use-cases, but can be somewhat problematic when it comes to either (a) large follower counts, or (b) sending big files like images. As Ted Unangst says in his honk 0.1 announcement:

I post a honk, with a picture of a delicious pickle, and it goes out to all my followers. These are small messages, great. Their instances will in turn immediately come back and download the attached picture, however, resulting in a flood of traffic. If a particularly popular follower shares the post, even bigger flood.

There’s only so much honk can do here, since even trickling out outbound notes doesn’t control what happens when another instance shares it. Ironically, I think some [Mastodon] instances are spared from overload because other instances are already overloaded and unable to immediately process inbound messages.

Essentially, the issue is twofold: firstly, when you’re sending something in a federated environment, you usually have to talk to each server in your chatroom, or each server with your followers on it, or whatever, to deliver your message, which takes time and OS resources (most OSes limit the number of outgoing TCP connections, for example, and constructing/destructing them isn’t free in any case). Secondly, as mentioned above, if you do something like join a new chatroom or have your popular message shared by another larger server, hundreds of interested servers might go and hammer yours asking for information of some kind, causing you to become somewhat overloaded. (For example, protocols often require users to have cryptographic keys, in order to be able to verify message authenticity, and getting these keys usually involves going to the server and asking for them – which is a problem if there are suddenly loads of new servers that have never heard of your user before!).

In contrast, non-openly-federated protocols like IRC don’t have this problem; IRC is as old as dirt and still realised that having everything be full-mesh isn’t the best of ideas. Instead, servers are linked in a spanning tree (see ircdocs.horse for a better explanation and pretty diagram), such that servers only need to broadcast messages to the servers they’re directly connected to, which will forward the messages on further, propagating them throughout the tree without any need for non-connected servers to ever talk to one another. This is far more efficient – but it does assume a high degree of trust in all the servers that make up the network, which wouldn’t work in a federated context where you can’t trust anyone.

The CAP theorem problem

The other big problem is that this thing called the CAP theorem exists, which says we can only have at most two of the following three guarantees, if trying to store state across a distributed system:

  • Consistency - all servers have a consistent view of the world
  • Availability - the system can be queried at all times, although asking two servers may return different results
  • Partition tolerance - the system doesn’t break if there’s a network split

Essentially, what this distils down to is the idea that you will have a network split or partition somewhere (where some servers are unreachable for whatever reason), and, when this happens, you will either have two sides of the network that have a different view of the world (thus violating consistency), or you will have to stop accepting new data in order to make sure that stuff doesn’t get out of sync (thus violating availability). This isn’t really something you can get around; it’s a fact of life when designing distributed systems.

Different systems deal with this in different ways:

  • If an IRC network suffers a netsplit, the two sides of the network see all the users on the other side quitting, and they can’t talk to them any more.
  • ActivityPub doesn’t really have to care about this, because there’s no distributed state anywhere; if an AP server wants to find out something about a user, it asks that user’s server.
  • Matrix actually has this rather nifty “eventual consistency” property, where you sacrifice consistency in the short-term when netsplits occur, but the system eventually sorts itself out when everyone gets reconnected again.

Given that we’ve specified our ideal chat system wants to be persistent as well as interoperable, we’re going to have to consider this problem somewhere – unless we get rid of distributed state entirely, that is. However, chat systems usually do have some idea of state, like who has administrator permissions in a chatroom.

Conclusions

Federation has been the source of many good things (interoperability! no lock-in!) and many bad ones (spam! networking complications!). As such, it’s not immediately obvious that completely open federation works for chat – contrary to what was said in the previous post, wholesale copying ActivityPub might not really work for something like chat, due to spam and issues with distributed state. Instead, we’re going to need something else – and we’ll explore what that something else could perhaps be in further blog posts in this series!


  1. It’s worth reading through his comments, and coming to your own judgement as to whether you think what I’ve said is fair or not. I didn’t really respond, because I think we fundamentally disagree on whether the Matrix architecture is a good idea, and there wasn’t much point debating that further; the comments are there, and you’re welcome to form your own opinion! 

  2. (Blogging about this process is arguably a questionable plan.) 

  3. Presence is the feature that lets you know whether someone’s online or not, or when they were last online. WhatsApp calls it “last seen”; IRC has the /away command; multiple other examples exist. 

  4. The A-level examiners tend to like that sort of stuff – and for a good reason; it’s nice to know that you aren’t just blindly spending time implementing something nobody wants. 

  5. Snapchat is a pretty good example of people sacrificing both interoperability (you can’t bridge Snapchat to anything else) and flexibility (you have very little control over read receipts, presence, and things like that). However, the added functionality seems to be worth it! 

  6. …which I somewhat harshly criticised in the last blog post. 

  7. This is what theta.eu.org runs for spam detection, actually, and it mostly works! 

  8. Actually, I’m still not entirely sure why this spamwave stopped; I’d be tempted to believe that the spammers giving up and stopping it themselves was probably the main reason, although I know some additional filtering protections were put in place. 

  9. Citation: I was lurking in #matrix on Freenode yesterday (2019-09-25, ~20:00) when some random angry user started coming in and flooding the channel with random spam images. 

  10. (I swear, I don’t have a vendetta against them or anything!) 

Unrelenting Technology (myfreeweb)

Noticed something on dmesgd… looks like MIPS (64) isn’t that dea... September 25, 2019 03:34 PM

Noticed something on dmesgd… looks like MIPS (64) isn’t that dead: new(ish) Ubiquiti EdgeRouters have newer Octeon processors — quad-core 1GHz (and with an FPU). And 1GB RAM. That’s much better than the Lite’s dual-core 500MHz && 512MB RAM.

…wait, actually, there’s even big 16-cores (and 16GB RAM) in 10G routers!

September 23, 2019

Pete Corey (petecorey)

Apollo Quirks: Polling After Refetching with New Variables September 23, 2019 12:00 AM

While working on a recent client project, Estelle and I ran into a fun Apollo quirk. It turns out that an Apollo query with an active pollInterval won’t respect new variables provided by calls to refetch.

To demonstrate, imagine we’re rendering a paginated table filled with data pulled from the server:


const Table = () => {
    let { data } = useQuery(gql`
        query items($page: Int!) {
            items(page: $page) {
                pages
                results {
                    _id
                    result
                }
            }
        }
    `, {
        pollInterval: 5000
    });
    
    return (
        <>
            <table>
                {data.items.results.map(({ _id, result }) => (
                    <tr key={_id}>
                        <td>{result}</td>
                    </tr>
                ))}
            </table>
        </>
    );
};

The items in our table change over time, so we’re polling our query every five seconds.

We also want to give the user buttons to quickly navigate to a given page of results. Whenever a user presses the “Page 2” button, for example, we want to refetch our query with our variables set to { page: 2 }:


 const Table = () => {
-    let { data } = useQuery(gql`
+    let { data, refetch } = useQuery(gql`
         query items($page: Int!) {
             items(page: $page) {
                 pages
                 results {
                     _id
                     result
                 }
             }
         }
     `, {
         pollInterval: 5000
     });
     
+    const onClick = page => {
+        refetch({ variables: { page } });
+    };
     
     return (
         <>
             <table>
                 {data.items.results.map(({ _id, result }) => (
                     <tr key={_id}>
                         <td>{result}</td>
                     </tr>
                 ))}
             </table>
+            {_.chain(data.items.pages)
+                .map(page => (
+                    <Button onClick={() => onClick(page)}>
+                        Page {page + 1}
+                    </Button>
+                ))
+                .value()}
         </>
     );
 };

This works… for a few seconds. But then we’re unexpectedly brought back to the first page. What’s happening here?

It turns out that our polling query will always query the server with the variables it was given at the time polling was initialized. So in our case, even though the user advanced to page two, our polling query will fetch page one and render those results.

So how do we deal with this? This GitHub issue on the apollo-client project suggests calling stopPolling before changing the query’s variables, and startPolling to re-enable polling with those new variables.

In our case, that would look something like this:


 const Table = () => {
-    let { data, refetch } = useQuery(gql`
+    let { data, refetch, startPolling, stopPolling } = useQuery(gql`
         query items($page: Int!) {
             items(page: $page) {
                 pages
                 results {
                     _id
                     result
                 }
             }
         }
     `, {
         pollInterval: 5000
     });
     
     const onClick = page => {
+        stopPolling();
         refetch({ variables: { page } });
+        startPolling(5000);
     };
     
     return (
         <>
             <table>
                 {data.items.results.map(({ _id, result }) => (
                     <tr key={_id}>
                         <td>{result}</td>
                     </tr>
                 ))}
             </table>
             {_.chain(data.items.pages)
                 .map(page => (
                 <Button onClick={() => onClick(page)}>
                     Page {page + 1}
                 </Button>
                 ))
                 .value()}
         </>
     );
 };

And it works! Now our polling queries will fetch from the server with the correctly updated variables. When a user navigates to page two, they’ll stay on page two!

My best guess for why this is happening, and why the stopPolling/startPolling solution works, is that when polling is started, the value of variables is trapped in a closure. When refetch is called, it changes which object options.variables points to, but the polling callback still holds a reference to the old object. This means the variables seen within the polling interval never change.
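
A tiny standalone sketch of that suspected behavior (plain JavaScript, not Apollo's actual internals):

// Polling captures whatever the variables object is *right now*.
let options = { variables: { page: 1 } };
const captured = options.variables;

setInterval(() => {
  console.log("polling with page", captured.page); // always logs 1
}, 5000);

// A later "refetch" swaps in a brand-new object...
options.variables = { page: 2 };
// ...but the interval callback still holds a reference to the old { page: 1 } object.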

Calling stopPolling and startPolling forces our polling interval to restart under a new closure with our new variables values.

September 22, 2019

Carlos Fenollosa (carlesfe)

September 21, 2019

Gonçalo Valério (dethos)

kinspect – quickly look into PGP public key details September 21, 2019 07:59 PM

Sometimes I just need to look into the details of a PGP key that is provided in its “armored” form by some website (not everyone is publishing their keys to the keyservers).

Normally I would have to import that key into my keyring, or save it to a file and use gnupg to inspect it (as described in these Stack Overflow answers).

To avoid this hassle I just created a simple page with a text area where you can paste the public key and it will display some basic information about it. Perhaps an extension would be a better approach, but for now this works for me.

You can use it on: https://kinspect.ovalerio.net

In case you would like to contribute in order to improve it or extend the information displayed about the keys, the source code is available on Github using a Free Software license: https://github.com/dethos/kinspect