Planet Crustaceans

This is a Planet instance for lobste.rs community feeds. To add/update an entry or otherwise improve things, fork this repo.

December 12, 2018

Unrelenting Technology (myfreeweb)

The best Cloud-to-Butt replacement I've seen yet, thanks to this page: Butthole was released... December 12, 2018 12:52 PM

The best Cloud-to-Butt replacement I've seen yet, thanks to this page:

Butthole was released in 2016 for Firefox to make Buttflare captchas less painful

Pages From The Fire (kghose)

Tests and code coverage in Python December 12, 2018 03:30 AM

Not only is Python a nice language but it has always had a lot of tooling around it. I’ve always taken advantage of Python’s tooling around testing (okay, not always …) and recently I began to pay attention to code coverage again. Python makes it all so simple and delicious. I have used nose in …

Gokberk Yaltirakli (gkbrk)

Free Hotel Wifi with Python and Selenium December 12, 2018 02:33 AM

Recently I took my annual leave and decided to visit my friend during the holidays. I stayed at a hotel for a few days but to my surprise, the hotel charged money to use their wifi. In $DEITY’s year 2000 + 18, can you imagine?

But they are not so cruel. You see, these generous people let you use the wifi for 20 minutes. 20 whole minutes. That’s almost half a Minecraft video.

If they let each device use the internet for a limited amount of time, they must have a way of identifying each device. And the way a router tells devices apart is by their MAC addresses. Fortunately for us, we can change our MAC address easily.

Enter macchanger

There is a really useful command-line tool called macchanger. It lets you manually change, randomize and restore the MAC address of your devices. The idea here is randomizing our MAC regularly (every 20 minutes) in order to use the free wifi over and over indefinitely.

There are 3 small commands you need to run. This is needed because macchanger can’t work while your network interface is connected to the router.

# Bring network interface down
ifconfig wlp3s0 down

# Get random MAC address
macchanger -r wlp3s0

# Bring the interface back up
ifconfig wlp3s0 up

In the commands above, wlp3s0 is the name of my network interface. You can find yours by running ip a. If you run those commands, you can fire up your browser and you will be greeted with the page asking you to pay or try it for 20 minutes. After your time is up, you can run the commands again and keep doing it.

But this is manual labor, and doing it 3 times an hour is too repetitive. Hmm. What’s a good tool to automate repetitive stuff?

Enter Selenium

First, let’s get those commands out of the way. Using the os module, we can run macchanger from our script.

import os

interface = 'wlp3s0'

os.system(f'sudo ifconfig {interface} down')
os.system(f'sudo macchanger -r {interface}')
os.system(f'sudo ifconfig {interface} up')

After these commands our computer should automatically connect to the network as a completely different device. Let’s fire up a browser and try to use the internet.

from selenium import webdriver

d = webdriver.Chrome()
d.get('http://example.com')
d.get('https://www.wifiportal.example/cp/sponsored.php')

The sponsored.php URL is where I ended up after pressing the Free Wifi link, so the script should open the registration form for us. Let’s fill the form.

In my case, all it asked for was an email address and a full name. If there are more fields, you can fill them in a similar fashion.

import random

num   = random.randint(0, 99999)
email = f'test{num}@gmail.com'

d.find_element_by_name('email').send_keys(email)
d.find_element_by_name('name').send_keys('John Doe\n')

This should fill the form and press enter to submit it. Afterwards, the portal asked me if I wanted to subscribe to their emails or something like that. Of course, we click Reject without even reading it and close the browser.

d.find_elements_by_class_name('reject')[0].click()
d.close()

After this, you should have an internet connection. You can either run the script whenever you notice your connection is gone, or put it on a cron job / while loop.
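
If you go the while-loop route, a rough end-to-end sketch could look like this (the 18-minute sleep is an arbitrary margin, and the portal details are the same assumptions as above):

import os
import random
import time

from selenium import webdriver

interface = 'wlp3s0'

while True:
    # New MAC address, new device as far as the router is concerned
    os.system(f'sudo ifconfig {interface} down')
    os.system(f'sudo macchanger -r {interface}')
    os.system(f'sudo ifconfig {interface} up')

    # Redo the captive portal dance
    d = webdriver.Chrome()
    d.get('https://www.wifiportal.example/cp/sponsored.php')
    d.find_element_by_name('email').send_keys(f'test{random.randint(0, 99999)}@gmail.com')
    d.find_element_by_name('name').send_keys('John Doe\n')
    d.find_elements_by_class_name('reject')[0].click()
    d.close()

    time.sleep(18 * 60)  # come back a little before the 20 minutes run out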

December 11, 2018

Gustaf Erikson (gerikson)

Advent of Code 2018 December 11, 2018 09:33 PM

This blog post is a work in progress

Project website: Advent of Code 2018.

Previous years: 2015, 2016, 2017.

I use Perl for all the solutions.

Most assume the input data is in a file called input.txt in the same directory as the file.

A note on scoring

I score my problems to mark where I’ve finished a solution myself or given up and looked for hints. A score of 2 means I solved both the daily problems myself, a score of 1 means I looked up a hint for one of the problems, and a zero score means I didn’t solve any of the problems myself.

My goals for this year (in descending order of priority):

  • match or beat last year’s score of 49/50.
  • solve all problems within 24 hours of release

Link to Github repo.

TODO

This year’s todo list:

  • Day 11 - faster solution
  • general - implement blogging and listing publication using templates
  • Day 8 - clean up and publish solution

Day 1 - Day 2 - Day 3 - Day 4 - Day 5 - Day 6 - Day 7 - Day 8 - Day 9 - Day 10 - Day 11

Day 1 - Chronal Calibration

Day 1 - complete solution

A nice start to this year’s puzzles.

I had planned on getting up at 6AM to start with this but I spent the night before playing Fallout 4 so that didn’t happen.

Score: 2.

Day 2 - Inventory Management System

Day 2 - complete solution

Another O(n^2) solution, but for the small sizes involved it barely matters.

Perl’s string munging is helpful as always.

Score: 2.

Day 3 - No Matter How You Slice It

Day 3 - complete solution

Nice and clean. I was worried I’d have to do some complicated lookups to single out the correct swatch in part 2, but it turns out that there was only one possible solution in my input.

Score: 2.

Day 4 - Repose Record

Day 4 - complete solution

Getting the correct answer without resorting to an external run through sort -n was the hardest part of this problem!

There are some nice solutions using a Schwartzian transform out there, but to be honest I’d rather have something straightforward and dumb that can be understood later.

TODO: better variable naming.

Score: 2.

Day 5 - Alchemical Reduction

Day 5 - complete solution

A quick and simple problem.

Score: 2.

Day 6 - Chronal Coordinates

Day 6 - complete solution

This took some careful perusal of the text and example to figure out. I had a lot of problems with my solution when I tried to merge part 1 and part 2 into one solution. Unfortunately I had overwritten the code that actually gave me the first star, and I broke it.

I felt justified in checking out other solutions and finally re-wrote mine using those (essentially using a representation of the board to keep track of the data). The day after, I took the time to re-write it again into a version closer to my original idea.

Runtime is a dismal 16s.

Score: 2.

Day 7 - The Sum of Its Parts

Day 7 - part 1 Day 7 - part 2

I’m really happy with solving this. I’m graph-aphasic, as in I’ve never ever grokked them in code, and was dreading finding some random Perl module that I didn’t really understand. In the end I just found the “endpoints” and processed it using a queue.

This made part 2 reasonably easy to code.

Score: 2.

Day 8 - Memory Maneuver

Day 8 - complete solution

I had a lot of trouble with this, which wasn’t helped by stuff like people coming over to visit, dinner to cook and eat, and wine to drink. After these distractions were done with I could revisit the problem with fresh eyes.

Score: 2.

Day 9 - Marble Mania

Day 9 - complete solution

This was a fun problem that was marred by some irritating off-by-one errors in the results. We were blessed with a wealth of example inputs, and I could make most of them work by taking the largest or the next largest value. This gave me my second star, but it was hardly repeatable.

Double-checking my logic against other solutions revealed I was thinking correctly, but had a stupid overrun where I managed to take more marbles than were actually available…

Runtime for part 2 is 30s, but you need quite a lot of memory to run it. My wimpy VPS cannot run it natively.

Score: 2.

Day 10 - The Stars Align

Day 10 - complete solution

A nice palate-cleanser after the horrors of the week-end.

Runtime: 4.6s.

Score: 2.

Day 11 - Chronal Charge

Day 11 - complete solution

Blog entry not yet written.

Átila on Code (atilaneves)

Improvements I’d like to see in D December 11, 2018 10:43 AM

D, like any language that stands on the shoulders of giants, was conceived to not repeat the errors of the past, and I think it’s done an admirable job at that. However, and perfectly predictably, it made a few of its own. Sometimes, similar to the ones it avoided! In my opinion, some of them […]

Andreas Zwinkau (qznc)

Waterfall December 11, 2018 12:00 AM

The "Waterfall" methodology was a historic accident and they knew it.

Read full article!

Alex Wilson (mrwilson)

Notes from the Week #12 December 11, 2018 12:00 AM

I was expecting a big parliamentary slap-fight today, but the Prime Minister had other ideas, so I’m going to write my week-notes instead — it’s a very short one because we’re about to start the new quarter, so most of my time is taken up with discussions that I can’t write about yet!

Two Things That Happened

One

There’s an art to enabling yourself to do things, at work and at home — and as with anything, there is a balance.

On one hand, if you only ever do what you want then there’s likely going to be conflict with those around you. On the other hand, if you never do what you want and put everyone else’s needs above your own, then this is equally destructive but far more insidious.

Eventually your self-censuring will bottle up your own needs inside you and possibly explode in a very unhelpful way, depending on how you handle stress.

It’s been pointed out to me this week that I have a habit of doing the latter, and getting inevitably frustrated when I don’t get to attend to my needs for an entirely arbitrary and self-imposed reason.

I’ll be trying to keep an eye out for this in future and making an effort to be kinder to myself.

Two

This week Shift have been working with other teams to improve our department-wide business continuity plans — coming up with better, more efficient answers to questions like

“What do we do if the office is consumed in a tragic X-based accident?”

Where X can be:

  • Fire
  • Politics
  • Blancmange
  • Irony

It’s really felt like having multiple pots boiling on the hob, which is great for the part of me that loves context switching and dashing from problem to problem, but not great for my ability to generally keep calm and collected.

On the flipside, it was a great opportunity for us (Shift) to dive into the nitty-gritty with other teams and show them what we’re able to do — I’ve written a bit about building social capital in previous week notes and this is exactly what we’ve been doing this week.

Originally published at blog.probablyfine.co.uk on December 11, 2018.

December 10, 2018

Jeff Carpenter (jeffcarp)

Book Review: Mindset: The New Psychology of Success December 10, 2018 11:38 PM

This book is about two ways of thinking: the fixed mindset and the growth mindset. In the fixed mindset you’re a finished product. Expending any extra effort is unthinkable because supposedly you’re already perfect. Then there’s the growth mindset, which tells us the only way you learn is from mistakes, talent doesn’t get you very far, and the people who succeed are the ones who work the hardest.

Frederik Braun (freddyb)

logging with MOZ_LOG on the try server December 10, 2018 11:00 PM

Preamble

NB: This is mostly for my own public reference. I had written about this elsewhere in 2016 but, when arriving at a similar problem, failed to reproduce this. You may skip the following section if you're familiar with the terminology in the title.

what is MOZ_LOG?

MOZ_LOG is an environment variable Firefox developers can use to tell specific code sections to emit verbose (or very verbose) status messages for some of its inner workings. This is also called Gecko Logging.

what is the try server

The try server is a repository that allows you to submit code without actually checking it into the public repository. Pushes to try get run through all of our tests, which helps identify problems and test failures before they are part of our code.

logging with MOZ_LOG on the try server

There is a test failure on Mac OS X that I can hardly debug. As a first step, I'll push this to the try server with more logging output enabled.

My test is a mochitest, so I modified testing/mochitest/runtests.py:

diff --git a/testing/mochitest/runtests.py b/testing/mochitest/runtests.py
index 45545b4..5afdffd 100644
--- a/testing/mochitest/runtests.py
+++ b/testing/mochitest/runtests.py
@@ -91,7 +91,7 @@ here = os.path.abspath(os.path.dirname(__file__))
 # Try run will then put a download link for all log files
 # on tbpl.mozilla.org.

-MOZ_LOG = ""
+MOZ_LOG = "nsDocShellLogger:4,CSPParser:4,CSPUtils:4,CSPContext:4,CSP:4"

And now we play the waiting game.

Derek Jones (derek-jones)

Impact of group size and practice on manual performance December 10, 2018 02:04 PM

How performance varies with group size is an interesting question that is still an unresearched area of software engineering. The impact of learning is also an interesting question and there has been some software engineering research in this area.

I recently read a very interesting study involving both group size and learning, and Jaakko Peltokorpi kindly sent me a copy of the data.

That is the good news; the not so good news is that the experiment was not about software engineering, but the manual assembly of a contraption of the experimenters' devising. Still, this experiment is an example of the impact of group size and learning (through repeating the task) on the time to complete a task.

Subjects worked in groups of one to four people and repeated the task four times. Time taken to assemble a bespoke, floor standing rack with some odd-looking connections between components was measured (the image in the paper shows something that might function as a floor standing book-case, if shelves were added, apart from some component connections getting in the way).

The following equation is a very good fit to the data (code+data). There is theory explaining why log(repetitions) applies, but the division by group-size was found by suck-it-and-see (in another post I found that time spent planning increased with team size).

There is a strong repetition/group-size interaction. As the group size increases, repetition has less of an impact on improving performance.

time = 0.16 + 0.53/(group size) - log(repetitions) * (0.1 + 0.22/(group size))
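
As a rough illustration of the fitting step (not the linked code+data), the same functional form can be fitted with scipy; the points below are synthesized from the quoted coefficients purely to exercise the code, and natural log is assumed:

    import numpy as np
    from scipy.optimize import curve_fit

    # Model form from the post: time = a + b/g - log(r) * (c + d/g)
    def model(X, a, b, c, d):
        g, r = X
        return a + b / g - np.log(r) * (c + d / g)

    # Synthetic points generated from the quoted coefficients -- NOT the study's data.
    g = np.repeat([1, 2, 3, 4], 4)   # group sizes
    r = np.tile([1, 2, 3, 4], 4)     # repetitions
    t = model((g, r), 0.16, 0.53, 0.1, 0.22)

    params, _ = curve_fit(model, (g, r), t)
    print(params)  # recovers roughly (0.16, 0.53, 0.1, 0.22)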

The following plot shows one way of looking at the data (larger groups take less time, but the difference declines with practice), lines are from the fitted regression model:

Time taken (hours) for various group sizes, by repetition.

and here is another (a group of two is not twice as fast as a group of one; with practice smaller groups are converging on the performance of larger groups):

Time taken (hours) for various repetitions, by group size.

Would the same kind of equation fit the results from solving a software engineering task? Hopefully somebody will run an experiment to find out :-)

December 09, 2018

Ponylang (SeanTAllen)

Last Week in Pony - December 9, 2018 December 09, 2018 02:41 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

December 08, 2018

Benjamin Pollack (gecko)

The Death of Edge December 08, 2018 01:16 AM

Edge is dead. Yes, its shell will continue, but its rendering engine is dead, which throws Edge into the also-ran pile of WebKit/Blink wrappers. And no, I’m not thrilled. Ignoring anything else, I think EdgeHTML was a solid rendering engine, and I wish it had survived because I do believe diversity is good for the web. But I’m not nearly as upset as lots of other pundits I’m seeing, and I was trying to figure out why.

I think it’s because the other pundits are lamenting the death of some sort of utopia that never existed, whereas I’m looking at the diversity that actually exists in practice.

The people upset about Edge’s death, in general, are upset because they have this idea that the web is (at least in theory) a utopia, where anyone could write a web browser that conformed to the specs and (again, theoretically) dethrone the dominant engine. They know this hasn’t existed de facto for at least some time–the specs that now exist for the web are so complicated that only Mozilla, with literally hundreds of millions of dollars of donations, can meaningfully compete with Google–but it’s at least theoretically possible. The death of Edge means one less browser engine to push back against Chrome, and one more nail in the coffin of that not-ever-quite-here utopia.

Thing is, that’s the wrong dynamic.

The dynamic isn’t Gecko v. EdgeHTML v. Blink v. WebKit. It’s any engine v. native. That’s it. The rendering engine wars are largely over: while I hope that Gecko survives, and I do use Firefox as my daily driver, that’s largely irrelevant; Gecko has lost by at least as much as Mac OS Classic ever lost. What does matter is that most people access the web via mobile apps now. It’s not about whether you like that, or whether I like that, or whether it’s the ideal situation; that’s irrelevant. The simple fact is, most people use the web through apps, period. In that world, Gecko v. Blink v. WebKit is an implementation detail; what matters is the quality of mobile app you ship.

And in that world, the battle’s not over. Google agrees. You know how I know? Because they’re throwing a tremendous amount of effort at Flutter, which is basically a proprietary version of Electron that doesn’t even do desktop apps.1 That only makes sense if you’re looking past the rendering engine wars–and if you already control effectively all rendering engines, then that fight only matters if you think the rendering engine wars are already passé.

So EdgeHTML’s death is sad, but the counterbalance isn’t Gecko; it’s Cocoa Touch. And on that front, there’s still plenty of diversity. Here’s to the fight.


  1. Yeah, I know there’s an effort to make Flutter work on desktops. I also know that effort isn’t driven by Google, though. [return]

Pete Corey (petecorey)

Advent of Code: Memory Maneuver December 08, 2018 12:00 AM

Today’s Advent of Code challenge asks us to parse a sequence of numbers that describe a tree. Each node of the tree consists of metadata (a list of numbers) and zero or more children. We’re asked to find the sum of all metadata entries throughout the tree. Let’s use the J programming language to solve this problem!

My gut reaction when I hear the word “tree” is to reach for recursion. Let’s write a recursive verb in J that processes each node described by our input and builds up our tree as we go:

    process =. 3 : 0
      siblings =. 0 {:: y
      childrens =. 0 { 1 {:: y
      metadatas =. 1 { 1 {:: y
      rest =. 2 }. 1 {:: y
      if. childrens = 0 do.
        children =. 0 1 $ 0
      else.
        next =. process^:childrens (0 1 $ 0);rest
        children =. 0 {:: next
        rest =. 1 {:: next
      end.
      metadata =. (i. metadatas) { rest
      rest =. metadatas }. rest
      (siblings,children,metadata);rest
    )

The recursion here is fairly straightforward. If the current node has children, I’m using the ^: adverb to repeatedly, recursively apply the process verb to each of its sibling nodes.

I return any passed in siblings appended to the children we just processed, along with the set of metadata on each node.

We can find our final answer by raveling together all of the collected metadata and summing it:

    echo +/,0{::process (0 1 $ 0);input

Part Two

Part two revealed that the metadata in each node actually refers to the (1-based) indexes of that node’s children. Calculating the cost of nodes with children is done by adding up the cost of each node specified in the metadata list. The cost of a leaf node is the sum of its metadata.
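
For readers who don’t speak J, roughly the same recursive idea can be sketched in Python (hypothetical helper names, not part of the original solution):

    def parse_node(nums, i=0):
        """Parse one node starting at index i; return ((children, metadata), next_i)."""
        n_children, n_meta = nums[i], nums[i + 1]
        i += 2
        children = []
        for _ in range(n_children):
            child, i = parse_node(nums, i)
            children.append(child)
        metadata = nums[i:i + n_meta]
        return (children, metadata), i + n_meta

    def value(node):
        """Leaf value is the metadata sum; otherwise metadata entries are
        1-based child indexes, with out-of-range entries skipped."""
        children, metadata = node
        if not children:
            return sum(metadata)
        return sum(value(children[m - 1]) for m in metadata
                   if 1 <= m <= len(children))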

I figured that the best way to tackle this was to rework my process verb to return the entire, correctly structured tree:

    process =. 3 : 0
      siblings =. 0 {:: y
      childrens =. 0 { 1 {:: y
      metadatas =. 1 { 1 {:: y
      rest =. 2 }. 1 {:: y
      if. childrens = 0 do.
        children =. 0 1 $ 0
      else.
        next =. process^:childrens (0 1 $ 0);rest
        children =. 0 {:: next
        rest =. 1 {:: next
      end.
      metadata =. (i. metadatas) { rest
      node =. metadata;<children
      rest =. metadatas }. rest
      (siblings,node);rest
    )

The final structure of the sample input looks like this:

┌─────┬─────────────────┐
│1 1 2│┌────────┬──────┐│
│     ││10 11 12│      ││
│     │├────────┼──────┤│
│     ││2       │┌──┬─┐││
│     ││        ││99│ │││
│     ││        │└──┴─┘││
│     │└────────┴──────┘│
└─────┴─────────────────┘

For each node, the metadata is on the left, and the boxed list of children is on the right.

I wrote a count verb that recursively counts the cost of a given node. If the node has no children, I return the sum of its metadata. Otherwise, I return the sum of count applied to the children referenced by its metadata (ignoring out-of-range indexes):

    count =. 3 : 0
      metadata =. 0{::y
      children =. 1{::y
      if. 0 = # children do.
        +/ metadata
      else.
        indexes =. 1 -~ metadata
        indexes =. indexes #~ _1 < indexes
        indexes =. indexes #~ -. (1 -~ # children) < indexes
        +/ count"_1 indexes { children
      end.
    )

I can use these two together to get my final answer:

    tree =. 0{0{::process(0 1 $ 0);input
    echo count tree

Notes

  • This page on working with trees in J was incredibly helpful.
  • I’ve been using #~ quite a bit to build a mask and remove items from an array based on that mask.
  • I made heavy use of the if control structure when solving these problems. No need to be a hero.

Andreas Zwinkau (qznc)

Dependency Abstraction December 08, 2018 12:00 AM

A design pattern which generalizes Dependency Inversion and can also be applied on an architectural level.

Read full article!

December 07, 2018

Pete Corey (petecorey)

Advent of Code: The Sum of Its Parts December 07, 2018 12:00 AM

Day seven of this year’s Advent of Code asks us to find the order in which we must complete a set of steps in a directed graph. Let’s see how well we can do with this task using the J programming language!

My high level plan of attack for this task is to keep each pair of dependencies in their current structure. I’ll build a verb that takes a list of “completed” steps, and the list of pairs relating to uncompleted steps. My verb will find the first (alphabetically) step that doesn’t have an unmet dependency in our list, append that step to our list of completed steps, and remove all pairs that are waiting for that step to be completed.
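
The same plan is easy to state in Python; a rough sketch (it assumes each pair is ordered (prerequisite, dependent step), whereas the J code below stores the pairs reversed):

    def step_order(pairs):
        """pairs: list of (prerequisite, step) tuples."""
        steps = sorted({s for pair in pairs for s in pair})
        done = []
        while steps:
            # a step is blocked while any of its prerequisites is not done
            blocked = {after for before, after in pairs if before not in done}
            current = next(s for s in steps if s not in blocked)
            done.append(current)
            steps.remove(current)
        return "".join(done)

    # step_order([("C","A"), ("C","F"), ("A","B"), ("A","D"),
    #             ("B","E"), ("D","E"), ("F","E")]) == "CABDFE"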

Thankfully, parsing our input is easy today:

    parse =. (5 36)&{"1
    pairs =. |."1 parse input
AC
FC
BA
DA
EB
ED
EF

We can write a helper that takes our list of pairs and returns all of the steps referenced in them in a raveled list:

    steps =. [: /:~ [: ~. ,
    steps pairs
ABCDEF

Now we can write our verb that completes each step of our instructions:

    next =. 3 : 0
      done =. 0 {:: y
      pairs =. 1 {:: y
      steps =. steps pairs
      left =. {."1 pairs
      right =. {:"1 pairs
      next_step =. {. steps #~ -. steps e. ~. left
      next_pairs =. pairs #~ -. right e. next_step
      remaining_pairs =. pairs #~ right e. next_step

      append =. (done,next_step)"_
      return =. (done)"_
      next_step =. (append ` return @. (0 = # remaining_pairs)"_) _

      next_step;next_pairs
    )

I’m trying to be more explicit here, and rely less on tacit verbs. Last time I found myself getting lost and hacking together solutions that I didn’t fully understand. I’m trying to pull back a bit and do things more intentionally.

We can converge on the result of repeatedly applying next to our list of pairs and an empty starting set of completed steps:

    0{:: next^:_ '' ; pairs
CABDF

An unfortunate side effect of our algorithm is that the last step in our graph is never appended to our list. We need to find this step and append it ourselves:

    append_last =. 4 : 0
      steps =. steps x
      missing =. steps #~ -. steps e. ~. y
      y,missing
    )
    echo pairs append_last 0{:: next^:_ '' ; pairs
CABDFE

And that’s all there is to it!

Part Two

Part two was much more complicated than part one. Each step takes a specified amount of time to complete, and we’re allowed to work on each step with up to four workers, concurrently.

This was the hardest problem I’ve solved so far throughout this year’s Advent of Code. My general strategy was to modify my next verb (now called tick) to additionally keep track of steps that were actively being worked on by concurrent workers.

Every tick, I check if there are any available steps and any space in the worker queue. If there are, I move the step over. Next, I go through each step being worked on by each worker and subtract 1. If a step being worked on reaches 0 seconds of work remaining, I add it to the done list.

Eventually, this solution converges on my answer.
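
Sketched in Python, the tick loop described above might look roughly like this (a sketch, not a translation of the J; the defaults follow the puzzle statement of five workers and 60 seconds plus the letter’s alphabet position):

    def total_time(pairs, workers=5, base=60):
        """pairs: (prerequisite, step) tuples. Returns seconds until everything is done."""
        steps = sorted({s for pair in pairs for s in pair})
        done, in_progress, t = [], {}, 0          # in_progress: step -> seconds left
        while len(done) < len(steps):
            blocked = {after for before, after in pairs if before not in done}
            ready = [s for s in steps
                     if s not in blocked and s not in done and s not in in_progress]
            for s in ready[:workers - len(in_progress)]:   # hand work to free workers
                in_progress[s] = base + ord(s) - ord("A") + 1
            t += 1                                         # advance one second
            for s in list(in_progress):
                in_progress[s] -= 1
                if in_progress[s] == 0:
                    done.append(s)
                    del in_progress[s]
        return t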

I’m not at all happy with my code. I found myself getting deeply lost in the shape of my data. After much struggling, I started to make heavy use of $ to inspect the shape of nearly everything, and I peppered my code with echo debug statements. The final solution is a nasty blob of code that I only just barely understand.

Enjoy.

Notes

December 06, 2018

Stjepan Golemac (stjepangolemac)

The first thing that comes to my mind is throttling the logout saga. Did you try that? December 06, 2018 10:33 AM

The first thing that comes to my mind is throttling the logout saga. Did you try that?

https://redux-saga.js.org/docs/api/#throttlems-pattern-saga-args

Pete Corey (petecorey)

Advent of Code: Chronal Coordinates December 06, 2018 12:00 AM

Today’s Advent of Code challenge asked us to plot a Manhattan distance Voronoi diagram of a collection of points, and to find the area of the largest, but finite, cell within our diagram.

I’ll be honest. This was a difficult problem for me to solve with my current level of J-fu.

My high level plan of attack was to build up a “distance matrix” for each of the points in our diagram. The location of a point would have a value of 0, neighbors would have a value of 1, and so on. In theory, I’d be able to write a verb that combines two matrices and returns a new matrix, with tied distances represented as _1. I could insert (/) this verb between each of my matrices, reducing them down to a final matrix representing our Voronoi diagram.
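
As a point of reference, the same plan can be sketched in Python (a rough version, not a translation of the J below): mark each grid cell with its nearest input point, mark ties, discard any region touching the bounding box, and count the largest remaining region.

    from collections import Counter

    def largest_finite_area(points):
        """points: list of (x, y) coordinates."""
        xs, ys = zip(*points)
        min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)

        def closest(x, y):
            dists = [abs(x - px) + abs(y - py) for px, py in points]
            best = min(dists)
            return dists.index(best) if dists.count(best) == 1 else -1  # -1 marks a tie

        owner = {(x, y): closest(x, y)
                 for x in range(min_x, max_x + 1)
                 for y in range(min_y, max_y + 1)}

        # regions touching the border of the bounding box extend to infinity
        infinite = {o for (x, y), o in owner.items()
                    if x in (min_x, max_x) or y in (min_y, max_y)}

        counts = Counter(o for o in owner.values()
                         if o != -1 and o not in infinite)
        return max(counts.values())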

I wrote some quick helper verbs to find the distance between two points:

    d =. [: +/ |@-

Find the width and height of the bounding rectangle of my input:

    wh =. 1 + >./ - <./

Generate the set of coordinates for my matrices (this one took some serious trial and error):

    coords =. 3 : 0
      'w h' =. (1 + >./ - <./) y
      (<./y) +"1  (i. h) ,"0~ ((h,w) $ i. w)
    )

And to fill that matrix with the distances to a given point:

    grid =. 4 : 0
      (x d ])"1 coords > y
    )

The process of adding together two matrices was more complicated. I went through many horribly broken iterations of this process, but I finally landed on this code:

    compare =. 4 : 0
      'vx ix' =. x
      'vy iy' =. y
      vx = vy
    )

    tie =. 4 : 0
      (0 {:: x);_1
    )

    pick =. 4 : 0
      'vx ix' =. x
      'vy iy' =. y
      v =. vx ((y"_) ` (x"_) @. <) vy
    )

    add =. 4 : 0
      x (pick ` tie @. compare) y
    )

With that, I could compute my final grid:

    numbers =. ". input
    grids =. ([ ;"0 i.@#) numbers grid"1 <numbers
    sum =. add"1/ grids

Our sum keeps track of the closest input point at each position on our grid, and also the actual distance value to that point. The closest input point is what we’re trying to count, so it’s probably the more interesting of the two values:

    groups =. (1&{::)"1 sum
 0  0  0 0 _1 2 2  2
 0  0  3 3  4 2 2  2
 0  3  3 3  4 2 2  2
_1  3  3 3  4 4 2  2
 1 _1  3 4  4 4 4  2
 1  1 _1 4  4 4 4 _1
 1  1 _1 4  4 4 5  5
 1  1 _1 4  4 5 5  5
 1  1 _1 5  5 5 5  5

We could even render the grid using J’s viewmat utility. Awesome!

Our sample inputs, visualized with viewmat.

Using viewmat to visualize my matrices like this actually helped me find and fix a bug in my solution incredibly quickly. I’m a big fan and plan on using it more in the future.

Because of how Manhattan distance works, cells with infinite volume are the cells that live on the border of our final matrix.

To find those infinite groups that live along the edges of my final matrix, I appended each edge of my matrix together and returned the nub of those values. I got the idea for this matrix rotation helper from this video on J I watched many months ago. I’m glad I remembered it!

    rot =. [: |. |:

    edges =. 3 : 0
      top =. 0 { y
      right =. 0 { rot^:1 y
      bottom =. 0 { rot^:2 y
      left =. 0 { rot^:3 y
      ~. top , right , bottom , left
    )

To find my final answer, I raveled my matrix, removed the infinite groups, used the “key” (/.) adverb to count the size of each group, and returned the size of the largest group.

    without =. _1 , edges groups
    raveled =. ,groups
    0 0 {:: \:~ ({. ;~ #)/.~ raveled #~ -. raveled e. without

This definitely isn’t the most efficient solution, but it works. At this point, I’m happy with that.

Part Two

Part two turned out to be much easier than part one. We simply needed to iterate over each point in our grid, counting the total distance to each of our input points. The set of points that was less than a fixed number from all input points defined a circular “landing area”. We were asked to find the size of that area.
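
The Python version of that idea is short enough to show in full (again just a sketch; the 10000 threshold is the one given in the puzzle):

    def safe_region_size(points, limit=10000):
        """Count grid cells whose total Manhattan distance to all points is under limit."""
        xs, ys = zip(*points)
        return sum(
            1
            for x in range(min(xs), max(xs) + 1)
            for y in range(min(ys), max(ys) + 1)
            if sum(abs(x - px) + abs(y - py) for px, py in points) < limit
        )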

I gutted most of my part one solution and replaced the values returned by my grid verb with the total distance to each input point:

    distances =. 4 : 0
      +/ ((>x) d"1~ ])"1 y
    )

    grid =. 3 : 0
      (<y) distances"1 coords y
    )

Finding my final answer was as easy as calculating my grid, checking which points were less than 10000, removing all of the 0 values, and counting the result.

    numbers =. ". input
    # #~ , 10000 > grid numbers

Notes

  • Rotating a matrix (|. |:) is a great trick.
  • viewmat is awesome. It very quickly helped me find and fix a bug in my solution.
  • Boxes can be treated like arrays in most cases. I was under the wrong impression that a box was a single unit in terms of rank.

December 05, 2018

Derek Jones (derek-jones)

Coding guidelines should specify what constructs can be used December 05, 2018 05:42 PM

There is a widespread belief that an important component of creating reliable software includes specifying coding constructs that should not be used, i.e., coding guidelines. Given that the number of possible coding constructs is greater than the number of atoms in the universe, this approach is hopelessly impractical.

A more practical approach is to specify the small set of constructs that developers are allowed to use. Want a for-loop? Then pick one from the top-10 most frequently occurring looping constructs (found by measuring existing usage); the top-10 covers 70% of existing C usage, the top-5 55%.

Specifying the set of coding constructs that can be used, removes the need for developers to learn lots of stuff that hardly ever gets used, allowing them to focus on learning a basic set of techniques. A small set of constructs significantly simplifies the task of automatically checking code for problems; many of the problems currently encountered will not occur; many edge cases disappear.

Developer coding mistakes have two root causes:

  • what was written is not what was intended. A common example is the conditional in the if-statement: if (x = y), where the developer intended to write if (x == y). This kind of typo is the kind of construct flagged as suspicious by static analysis tools.

    People make mistakes, and developers will continue to make this kind of typographical mistake in whatever language is used,

  • what was written does not have the behavior that the developer believes it has, i.e., there is a fault in the developer’s understanding of the language semantics.

    Incorrect beliefs, about a language, can be reduced by reducing the amount of language knowledge developers need to remember.

Developer mistakes are also caused by misunderstandings of the requirements, but this is not language specific.

Why do people invest so much effort on guidelines specifying what constructs not to use (these discussions essentially have the form of literary criticism)? Reasons include:

  • providing a way for developers to be part of the conversation, through telling others about their personal experiences,
  • tool vendors want a regular revenue stream, and product updates flagging uses of even more constructs (that developers could misunderstand or might find confusing; something that could be claimed for any language construct) is a way of extracting more money from existing customers,
  • it avoids discussing the elephant in the room. Many developers see themselves as creative artists, and as such are entitled to write whatever they think necessary. Developers don’t seem to be affronted by the suggestion that their artistic pretensions and entitlements be curtailed, probably because they don’t take the idea seriously.

Stjepan Golemac (stjepangolemac)

Hi Shawn, December 05, 2018 03:38 PM

Hi Shawn,

Yes, I would go with an isRefreshing flag somewhere in your store too. The first refresh sets it to true, and all subsequent ones wait until it is changed.

You can have a refreshError value in your store too that will be null by default, and that would change if the refresh fails.

After the refresh finishes, all sagas that were waiting for it can check whether it was successful or not by checking the refreshError value, and act accordingly.

You could even race the refreshSuccess and logout events, and if the logout comes first you cancel the refresh.

You can find more info here:

I hope this helps you!

Pete Corey (petecorey)

Advent of Code: Alchemical Reduction December 05, 2018 12:00 AM

Today’s Advent of Code problem is to repeatedly remove corresponding “units” from a “polymer” until we’re left with an irreducible polymer string. Unit pairs that can be removed are neighboring, matching characters with differing cases, like C and c. The answer to the problem is the length of the resulting string.
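
For comparison, the usual approach in a more conventional language is a single pass with a stack; a minimal Python sketch (not the repeated-pass approach used in the J below):

    def reduce_polymer(polymer):
        stack = []
        for unit in polymer:
            if stack and unit != stack[-1] and unit.lower() == stack[-1].lower():
                stack.pop()          # the two units react and annihilate
            else:
                stack.append(unit)
        return "".join(stack)

    # len(reduce_polymer("dabAcCaCBAcCcaDA")) == 10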

I’m really happy with my J-based solution to this problem, which is a relief after yesterday’s disaster. I started by writing a function that compares two units and returns a boolean that says whether they react:

    test =. ([ -.@:= ]) * tolower@:[ = tolower@:]
    'C' test 'c'
1

Next, I wrote a dyadic verb that takes a single unit as its x argument, and a string of units as its y argument. If x and the head of y react, it returns the beheaded (}.) y. Otherwise it returns x prepended to y:

    pass =. ([,]) ` (}.@:]) @. ([ test {.@:])
    'C' pass 'cab'
ab

This pass verb can be placed between each element of our polymer string using the insert (/) adverb. This gives us a reduced polymer string.

    pass/ 'dabAcCaCBAcCcaDA'
dabCBAcaDA

Finally, we can repeatedly apply pass until the result is stable, essentially converging on a solution. Once we’ve got our fully reduced polymer string, we count its length and print the result:

    echo # pass/^:_ 'dabAcCaCBAcCcaDA'
10

And that’s it!

Part Two

Part two tells us that one of the unit pairs is causing trouble with our polymer reduction. It wants us to remove each possible unit pair from the input string, count the length of the resulting reduction, and return the lowest final polymer string length.

My solution to part two builds nicely off of part one.

We’ll keep test and pass as they are. We’ll start by writing a remove verb that takes a character to remove as x, and a string to remove it from as y. I use i. to build a map that shows me where x isn’t in y, and then use # to omit those matching characters.

    remove =. ] #~ [ i. [: tolower ]
    'y' remove 'xyz'
xz

Next I wrote a remove_nubs verb that calculates the nub of our polymer string, and uses remove to remove each nub from our original string. I box up the results to avoid J appending spaces to the end of my strings to fill the matrix.

    remove_nubs =. [ <@:remove"1 (1 , #@:nub) $ nub =. [: ~. tolower
    remove_nubs 'aabc'
┌──┬───┬───┐
│bc│aac│aab│
└──┴───┴───┘

Finally, I apply remove_nubs to my input, converge on a solution for each new polymer string, count the resulting lengths, and return the minimum length:

    echo <./ ([: # [: pass/^:_"1 >)"0 remove_nubs 'dabAcCaCBAcCcaDA'
4

Notes

  • The application of / modified verbs is from right to left. I would have expected left to right, for some reason. This makes sense though, considering J’s execution model.
  • Visualizing verb trains makes it so much easier to write them. I actually found myself getting them right the first time, thanks to “tree view” ((9!:3) 4).
  • Boxing can be helpful when I don’t want J to pad the value to fit the dimensions of the array it lives in.

Pepijn de Vos (pepijndevos)

Building Bad Amplifiers December 05, 2018 12:00 AM

My brother scavenged and repaired an old bass guitar, and asked if I could make him an equally bad amplifier to go with it to create the ultimate bad sound.

I was happy to oblige, so I threw everything I know about good amplifiers out of the window and googled around for some inspiration. This resulted in a lot of badly designed amplifiers that subject your speaker to DC currents or don’t deliver any power at all.

So I started making some myself. My first thought was to make a class A amplifier with a single power MOSFET and a power resistor. I got two 1 Ω resistors in series, rated for 5 W. This gives a maximum voltage of 2.2 V per resistor.

The MOSFET is rated for 30 A, so that’s probably fine. Then I used a potentiometer to bias the gate and a capacitor to drive it. Something like this.

class a

Problem is, while it’s a very nice and simple space heater, it doesn’t sound bad enough. It’s inefficient and non-linear, but the sound is kind of fine.

So the next step was to make an amplifier that sounds worse. What better than a pure class B output stage with its sweet unmitigated crossover distortion?

I pulled a complementary pair of BJTs from a drawer, and drove it with some random small-signal MOSFET and a 1K resistor. A diode was added to reduce the cross-over a bit. The output cap is just a large one. At the bottom of the MOSFET I added a 100 Ω degeneration resistor that I bypassed in AC with a capacitor for more gain. I again added a potentiometer to bias the MOSFET.

My brother liked the bad sound, but it wasn’t loud enough, so I added another MOSFET gain stage. Same story, 1K resistor, small bypassed degeneration resistor, and a potentiometer to bias the MOSFET. Except now I put the potentiometer as the degeneration resistor, for no good reason.

class b

Neither of these amplifiers involved much design, calculation, or simulation. They were directly constructed on overboard with potentiometers where I would have needed math to find the correct value, and I just made these drawings for this post.

Normally what you’d do is calculate the gate voltage created by the voltage divider:

V_gate ≈ V_supply × R_bottom / (R_top + R_bottom)

The voltage across the degeneration resistor is then roughly:

V_degen ≈ V_gate − V_threshold

For a BJT the threshold voltage is roughly 0.6 V while for a small-signal MOSFET it is more like 2 V. Ohm’s law then gives you the current through the degeneration resistor, which is the same current as through the 1K resistor at the top, so you know how much voltage drops across that one.
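
To make that concrete, here is a rough bias calculation in Python (all component values are made-up examples, not the ones used in this build):

    def bias_point(v_supply, r_top, r_bottom, r_degen, v_threshold):
        """Rough bias estimate: the divider sets the gate voltage,
        the degeneration resistor then sets the quiescent current."""
        v_gate = v_supply * r_bottom / (r_top + r_bottom)
        v_degen = v_gate - v_threshold      # voltage across the degeneration resistor
        i_quiescent = v_degen / r_degen     # Ohm's law
        return v_gate, i_quiescent

    # e.g. a 12 V supply, a 100k/22k divider, 100 ohm degeneration, MOSFET Vth ~ 2 V
    print(bias_point(12, 100e3, 22e3, 100, 2.0))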

December 04, 2018

Gustaf Erikson (gerikson)

November December 04, 2018 03:15 PM

Pete Corey (petecorey)

Advent of Code: Repose Record December 04, 2018 12:00 AM

Today’s Advent of Code challenge asks us to parse and process a set of time-series data that describes when guards start their shift, when they fall asleep, and when they wake up. Our task is to find the guard that sleeps the most. We need to multiply their ID by the minute they’re asleep the most.

This was an incredibly difficult problem for me to solve using J. My plan of attack was to build a “sleep matrix” for each guard. Each matrix would have a row for each day the guard was on duty, and each row would be sixty columns wide, with each row/column representing whether the guard was asleep during that minute of the twelfth hour of that day.
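
For contrast, the same bookkeeping is fairly compact in a language with dictionaries; here is a hedged Python sketch (the fixed character offsets assume the puzzle’s timestamp format):

    import re
    from collections import defaultdict

    def sleep_minutes(log_lines):
        """Return {guard_id: per-minute counts of how often that guard was asleep}."""
        minutes = defaultdict(lambda: [0] * 60)
        guard = asleep_at = None
        for line in sorted(log_lines):          # zero-padded timestamps sort lexicographically
            minute = int(line[15:17])
            if "Guard" in line:
                guard = int(re.search(r"#(\d+)", line).group(1))
            elif "falls asleep" in line:
                asleep_at = minute
            else:                                # "wakes up"
                for m in range(asleep_at, minute):
                    minutes[guard][m] += 1
        return minutes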

I was immediately stumped by how to parse each string and organize all of the data into useful, easily manipulatable structures.

After sorting my lines (/:~ input), I checked if each line had a 'G' character at index nineteen. If it did, I raised a boolean and used +/\ e. and # to build a set of groups for each guard’s shift. Once I’d grouped each shift, I could build my sleep matrix and box it together with the guard’s ID:

    sleep_map =. 3 : 0
      1 y } 60 $ 0
    )

    filter =. (' ' -.@:= 0&{)"1 # ]

    parse_log =. 3 : 0
      head =. {. y
      rest =. filter }. y
      'Y M D h m id' =. numbers head
      sleep =. sleep_map"0 ({:"1 numbers"1 rest #~ (# rest) $ 1 0)
      wake =. _1 * sleep_map"0 ({:"1 numbers"1 rest #~ (# rest) $ 0 1)
      id&;"2 +/\"1 sleep + wake
    )

    parse =. 3 : 0
      groups =. +/\ ('G' = 19&{::)"1 y
      masks =. groups&e."0 ~. groups
      parse_log"_1 masks # y
    )

Next I needed to consolidate each guard’s set of shifts and sleep matrices into a single sleep matrix:

    group_days =. 3 : 0
      id =. 0 {:: {. y
      days =. ,/ 1 {::"_1 y
      id;days
    )

    group =. 3 : 0
      ids =. 0 {::"1 y
      ids group_days/. y
    )

Finally I could box up the needed statistics for each guard and sleep matrix, sort the results, and return the desired calculation:

    stats =. 3 : 0
      id =. 0 {:: y
      days =. 1 {:: y
      overlap =. +/ days
      most =. overlap i. >./ overlap
      slept =. +/ overlap
      slept; most; id; days
    )

    result =. (2&{:: * 1&{::) {. \:~ stats"1 group parse log

Part Two

Part two just wants us to find the guard that is asleep most frequently on the same minute of the night. We’re to return that guard’s ID multiplied by the minute they’re usually asleep.

Thankfully, I was able to recycle all of the hard work I put into part one when it came time to solve part two. All I really needed to do was make a change to the set of statistics I boxed up in my final step:

    stats =. 3 : 0
      id =. 0 {:: y
      days =. 1 {:: y
      overlap =. +/ days
      most =. >./ overlap
      minute =. overlap i. most
      most; minute; id; days
    )

The rest of my code was left unchanged.

Notes

  • The “key” verb (/.) can be incredibly useful for grouping data and performing actions on those subsets.
  • Sorting is interesting in J.
  • Any type of data can be sorted. Sorting arrays of boxes behaves like sorting lists of tuples in Elixir, which is a very handy trick.
  • (9!:3) 4 renders verb trains in “tree view” which I find very helpful.

Alex Wilson (mrwilson)

Notes from the Week #11 December 04, 2018 12:00 AM

I’m trying a slightly different style this week, let’s see how it feels!

Four Things That Happened

One

It’s the end of ProDev’s quarter, and we celebrated with a … science fair. We’ve done one of these before and I absolutely love the creativity that comes out of such a simple proposition. Each team prepares a “stall” that we take into our clubhouse meeting room and we can go around and see what each team has done during the last quarter.

We had:

  • Super-detailed artwork on movable wipe-boards
  • Large monitors showing off new reporting capabilities
  • Kahoot quizzes about our data platform’s learnings
  • And more …

We themed ours on our Opsgenie integration and had “3 Wishes” that we’ve fulfilled during the last quarter.

These events are a great way to down tools (kind of) and create something to show off what we’ve been working on — a side-effect of Agile/XP that I’ve observed is that working in vanishingly thin slices and focusing on incremental delivery removes a lot of the sense of progress as we’re only taking small steps.

Events like these science fairs give us a way to take a step back and recognise an entire quarter’s worth of work.

Two

Shift have been informal adopters of the Occupy Hand Signals as a way of self-moderating group conversations and making sure everyone has opportunities to speak whilst avoiding the “loudest person wins” degenerate case.

We learned this technique from observing another larger team (their team lead wrote about it here) and found ourselves picking up the very basic signals like “I would like to speak” and “I have a direct response”. Over time we’ve gotten pretty good at the self-enforcement part, such as calling out “X first, then Y” when two people put their hands up to speak.

One of our working agreements in our last retrospective was that we would formally adopt this for meetings of more than two people — a largely ceremonial action but it’s now encoded within our working practices.

There’s also a great GDS blog-post about using these signals.

Finally, I read a brief Twitter thread about conversational interactions, the seed of which was this article — I like to think that over the last few years I’ve become more a member of the Church of Strong Civility than the Church of Interruption.

DOCTRINES OF THE CHURCH OF STRONG CIVILITY
Thou shalt not interrupt.
Thou shalt speak briefly.
Thou shalt use physical cues to indicate your understanding and desire to speak.

Definite food for thought, and a reminder that conversations about how we have conversations are often vital preludes to making conversations themselves productive.

Three

We resumed work on improving our log-aggregation/query platform — our initial assessment of the AWS hosted ElasticSearch was that it was missing key features for our usecase, and that we would be trading off too much configurability for reduced operational overhead.

We’re now looking at Elastic’s own cloud offering and we repeated our investigation workflow that we used for our incident management application trials:

  • Identify use cases and trials to run (with input from our stakeholders)
  • Get our ducks in a row to engage a free trial (i.e. set up infrastructure ready for it)
  • Commence the free trial and evaluate our criteria

My use of the word trial rather than experiment is deliberate, and comes from our CTO’s most recent fortnightly ProDev all-hands session — paraphrasing Linda Rising, there’s no null hypothesis or statistical validation going on, so what we’re performing are trials and not experiments.

This is not a bad thing, but if we don’t have the resources to do proper experimental validation we should at least keep our trials short, effective, and frugal. Our free trial is temporally capped at 14 days, and has no revenue cost, so provided we’re effective with the criteria we evaluate this is shaping up to be a good trial.

Four

We now maintain a number of tools to orchestrate production systems that have historically been un-loved — a Shift developer, Stephen, took it upon himself to spend a bit of time to spike a cleaner version with more user-friendliness.

Shift are lucky: if we scrunched up a post-it and lobbed it a few feet, we’d hit one of our stakeholders.

The original XP book (Extreme Programming Explained) talks about the benefits of having an embedded customer for validating work quickly — we’ll go and user-research stuff that we’re building with the users that will be using them by embedding and watching them use the tool, whether it’s a senior developer, a new hire, or one of our experienced Site-Reliability Engineers.

The tool he spiked, to improve our puppet node management workflow, opens up a lot of opportunities for us to try new technologies. Now that we know there’s desire for the product, we’re umming-and-ahhing about whether we:

  1. Keep it in Bash
  2. TDD it from scratch in Python, a language we’re familiar with
  3. TDD it from scratch in Go/Rust, languages we’re not familiar with.

We’ll be invoking the Improve, No Change, Worsen workflow (outlined in a previous blogpost) to establish pros and cons of each approach — (2) and (3) are almost certainly slower, but (3) gives us the opportunity to broaden our horizons.

Will (3) be worth the overhead of learning a new language/toolchain?

Stay tuned to find out!

Reflections

I’m starting to meditate again, once a day for 15–20 minutes if I can. My mind is full of stuff right now, both at work and in my personal life so I am trying to get better at taking time for myself.

I’m basically being hit in the face with my own oft-repeated quote

If you don’t take time, or make time, how can you ever have time?

I finished Killing Commendatore and have downloaded an audio-book of A Wild Sheep Chase, another Murakami book. There’s something relaxing about his prose that takes me out of myself for a bit.

Originally published at blog.probablyfine.co.uk on December 4, 2018.

December 03, 2018

Jan van den Berg (j11g)

Advent of Code December 03, 2018 06:42 AM

Currently the yearly Advent of Code contest created by Eric Wastl is being held at adventofcode.com. That means the site is sprouting two daily programming challenges, until Christmas, to see who can solve them fastest. But it is not just about being fast of course, Advent of Code is a great way to improve your programming skills through daily clever puzzles. And because everyone gets the same puzzles it is also a great way to share and discuss results, and above all, learn.

Though I knew of Advent of Code, I hadn’t participated before, but it seems that, two days and four puzzles later, I am sort of in.

Or at least, after only two days I am already fascinated by what I have seen, so I thought I’d share!

Fascinating findings

  • Python seems to be the most popular language, by far. At least judging by Github repo names, which is of course not an exact measure, but it is more or less an indicator, as a lot of people tend to share their solutions there. Python is popular, and it is not even close:

  • Browsing through the code, it once again becomes apparent that even with the exact same tools (e.g. Python) we all bring different experiences and education to the battle resulting in a colorful variation of solutions for the exact same puzzles. I’ve seen 100+ lines of Python code generate the exact same result as 10 lines. Nowhere is it more clear than a coding challenge that we are all unique individuals, and there is not a specific right way, as long as you get there (at least for the sake of this contest, don’t @ me).
  • If I had to guess I would have picked JavaScript to be the most popular language, but as you can see it comes in second. Ruby, Go and C# are also not surprising entries on this list, but Haskell and Elixir are (to me). These two functional languages seem to have quite a bit of buzz around them as people passionately seem to choose either one, which is interesting as I know very little about either. The creator of Elixir even participates himself in AoC! 
  • Very few people seem to pick PHP. Which I also find surprising, because gigantic parts of the web run PHP. But it seems to have little appeal when it comes to coding challenges?
  • Some people are fast, I mean really fast! Just look at the times on the leader board. Judging from these times, this means some people are able to read around 1000 words explaining a puzzle, and then code up not one, but two solutions and submit the correct answer in under four minutes! I kid you not. This next person live-streamed it, and clocks in around 5 minutes (even without using the command-line shortcut CTRL-R), but it didn’t even put him in the top 20!

  • Of course you can use any language you like or even pen and paper, it is a puzzle after all. And people use some really crazy stuff, I love it. Anything goes (even Excel), and that is of course part of the goal of the contest: try to learn new things. There is one person who deliberately tried a new language for each challenge.

Notable entries

So it’s not all about speed, it’s also about trying new things. Here are some other unexpected examples.

    • Minecraft: This one takes the cake for me. See if you can wrap your head around what is happening here:

Learnings so far

So apart from the fascinating findings, I also got involved myself. I think that’s because I solved the very first challenge with a simple AWK one-liner. But solving the follow-up challenge seemed trickier in AWK, though people seem to have done so (of course).

Being completely new to Python, and seeing how popular it is, I decided to give it a go, and I must say I think I understand a bit better now why and how Python is so popular. Yes, it is well known that it deliberately forces clean code, but it also provides ways for incredibly succinct code (my favorite!). So far I have learned about map(), collections.Counter, zip(), and cycle() (from itertools): very handy built-in functions and datatypes that I was unaware of, but which are incredibly powerful.

Some people tend to disagree (probably very few), as I found this comment on StackOverflow when researching the Counter dict.

I don’t think that’s fair, because in a sense every higher level programming language is an abstraction of differently expressed machine code. So unless you’re typing in machine code directly you are also using general purpose tools, and how narrow or general something is, who’s to say? And as long as it helps humans do things faster, programming languages are tools after all, I’m all for it. And let the computer worry about the zeros and ones.

I was very surprised and pleased with the mentioned Python functions. For example, I brute-forced a solution in Bash which took probably more than 10 minutes to run, while a couple of lines of Python solved the same puzzle in 0.07 seconds. So of course the knowledge of the right functions and data structures once again proved to be the difference between 100 lines or 10, which reminded me of the famous Linus Torvalds quote about good programmers worrying about data structures rather than code.
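
To give a flavour of the kind of thing cycle() and a set make easy, here is a sketch of the day 1, part 2 puzzle (not the author’s code):

    from itertools import cycle

    def first_repeated_frequency(changes):
        """Advent of Code 2018, day 1, part 2: first running total seen twice."""
        seen, total = {0}, 0
        for change in cycle(changes):
            total += change
            if total in seen:
                return total
            seen.add(total)

    # first_repeated_frequency([+3, +3, +4, -2, -4]) == 10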

So that’s it and if you want to learn something new, go give it a try!


Pete Corey (petecorey)

Advent of Code: No Matter How You Slice It December 03, 2018 12:00 AM

Today’s Advent of Code challenge wants us to model many rectangular intersections. Given a large number of rectangles laid out on a grid, we’re asked to find the total number of square inches of overlap between all of these rectangles.
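
Before looking at the J, the whole of part one can be sketched in Python with a Counter, just to fix the idea (a sketch, not the approach used below):

    import re
    from collections import Counter

    def overlap_area(claims):
        """claims: lines like '#1 @ 1,3: 4x4'."""
        covered = Counter()
        for claim in claims:
            cid, left, top, w, h = map(int, re.findall(r"\d+", claim))
            for x in range(left, left + w):
                for y in range(top, top + h):
                    covered[(x, y)] += 1
        return sum(1 for count in covered.values() if count > 1)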

Using J to parse the input string for this challenge turned out to be incredibly difficult. Frustratingly difficult. It seems there’s no nice, easy, built-in way to do this kind of string processing in J. At least none that I could find.

I could have used this example from the Strings phrase page, but I didn’t want to pull in a helper function I didn’t fully understand, and I didn’t want to rely on a package, if I could avoid it.

At the end of the day, I used the “words” verb (;:) to build a finite state machine that pulls sequences of digits out of my input string. This guide was a huge help in understanding and building out this code.

    parse =. 3 : 0
      m =. a. e. '1234567890'
      s =. 1   2 2 $ 0 0  1 1
      s =. s , 2 2 $ 0 3  1 0
      ". > (0;s;m;0 _1 0 0) ;:"1 y
    )
    parse '#1 @ 1,3: 4x4'
1 1 3 4 4

Once I was able to parse out the offsets and dimensions of each rectangle, solving the problem was relatively straightforward. I first created a width by height matrix of 1s to represent each rectangle. I shifted the matrix down by appending (,) rows of zeros, and right by prepending zeros to each row. J is nice enough to fill in the gaps.

    cut_cloth =. 1 $~ |.

    cut_and_shift =. 3 : 0
      left  =. 1 {:: y
      top   =. 2 {:: y
      cloth =. cut_cloth (0 0 0 1 1 # y)
      cloth =. 0 ,^:top cloth
      cloth =. 0 ,"1^:left cloth
      cloth
    )
    
    cut_and_shift 0 1 1 2 2
0 0 0
0 1 1
0 1 1

Once each rectangle is sized and positioned, we can add them all together:

    +/ cut_and_shift"1 parse input
0 0 0 0 0 0 0
0 0 0 1 1 1 1
0 0 0 1 1 1 1
0 1 1 2 2 1 1
0 1 1 2 2 1 1
0 1 1 1 1 1 1
0 1 1 1 1 1 1

This gives us a visual depiction of where our rectangles overlap, where each positive number represents the number of intersections at that location. To find the answer to our problem, we ravel this grid (,), filter out all elements that aren’t greater than 1, and count the remaining elements:

    # (>&1 # ]) , +/ cut_and_shift"1 parse input
4
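
For readers who don’t know J, here is roughly the same paint-and-count idea sketched in Python with NumPy (the 1000x1000 grid size is an assumption taken from the puzzle text, not something the J code needs):

    import numpy as np

    claims = [(1, 1, 3, 4, 4), (2, 3, 1, 4, 4)]   # (id, left, top, width, height)

    # Paint each claim onto a shared grid of counters...
    grid = np.zeros((1000, 1000), dtype=int)
    for _, left, top, width, height in claims:
        grid[top:top + height, left:left + width] += 1

    # ...then count the cells claimed more than once.
    print(int((grid > 1).sum()))   # 4 for these two sample claims

The J above performs the same computation, just with the positioning expressed as zero padding instead of slicing into a fixed grid.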

Part Two

This tweet reply from Raul Miller sent me down a rabbit hole related to improving my string-parsing-fu. After coming out the other side I had learned that the inv adverb, or ^:_1, when paired with # can be used to preserve the gaps on a filtered list, or string:

    ((1 0 1 0 1 1 0)&#^:_1) 'abcd'
a b cd

This led me to a much better parse function:

    parse =. ".@:(]#^:_1]#[) (e.&' 123456789')

Part two of today’s challenge asks us for the ID of the only rectangle in the set of inputs that doesn’t intersect with any other rectangle.

My strategy for this part was different than the first part. Instead of building a matrix representation of each rectangle, I decided to transform each description into a set of left, right, top, and bottom coordinates (with a little destructuring help):

    repackage =. 3 : 0
      'i l t w h' =. y
      i,l,t,(l + w),(t + h)
    )
    repackage 0 1 1 2 2
0 1 1 3 3

Once I had the bounds of each rectangle described, I could determine if any two rectangles intersect:

    intersect =. 4 : 0
      'i_1 l_1 t_1 r_1 b_1' =. x
      'i_2 l_2 t_2 r_2 b_2' =. y
      -.+./(l_2>:r_1),(r_2<:l_1),(t_2>:b_1),(b_2<:t_1)
    )
    0 1 1 3 3 intersect 0 2 2 4 4
1

Using my new intersect verb, I could build an “intersection table”, and find the rectangle that has only one intersection: the intersection with itself!

    parsed =. parse"1 input
    ({."1 parsed) #~ -. 1 i. +/ intersect"1/~ repackage"1 parsed
3
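
If the J is hard to follow, the bounds test reads fairly naturally in Python (a sketch of the same logic, not a translation of the full solution):

    def intersect(a, b):
        # Two rectangles overlap unless one lies entirely to the left of,
        # right of, above, or below the other.
        _, l1, t1, r1, b1 = a
        _, l2, t2, r2, b2 = b
        return not (l2 >= r1 or r2 <= l1 or t2 >= b1 or b2 <= t1)

    print(intersect((0, 1, 1, 3, 3), (0, 2, 2, 4, 4)))   # True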

Notes

  • I had trouble with the subtle differences between intersect"1/~ and intersect/~"1. I need to dig deeper here.
  • The inversion (^:_1) of # is a special case.

Allow Yourself to do Things Poorly December 03, 2018 12:00 AM

I often find myself stuck at the beginning of things. Haunted by past mistakes, I vow to myself that “I’ll do things right this time.” Unfortunately, insisting on doing everything “right” is crippling. Largely because doing everything “right” is impossible.

Lately I’ve stopped beating myself over the head for doing things that I know aren’t “best practice”, and instead I’ve given myself the freedom to start doing things poorly.

It’s been liberating.

Analysis Paralysis

My Elixir-based Chord project is coming along nicely. While my ASCII-based chord chart renderer is incredibly helpful for visualizing chords in the terminal, it’s still difficult to sift through several thousand chords at a time.

Reluctantly, I realized that my Chord project needed a web-based front-end.

These days, my go-to tool for building a front-end is React. However, I’ve built enough React applications for clients and in my personal work to know that if you’re not vigilant and strict, the complexity of a React project can quickly spiral out of control. I was determined not to allow that to happen this time around.

Not only that, but I wasn’t sure how best to build the user interface. Chord offers users a huge number of potential knobs to tweak, and the resulting sets of data can be massive and hard to sift through. How do we best present everything to the user?

I was paralyzed.

Get Going

After several days of hemming, hawing, and wringing my hands over architectural decisions and uncertainties over how I wanted the user experience to go, I decided that I was wasting my time.

It’s better, I convinced myself, to just get something built. Even if it’s built poorly and even if it isn’t an ideal interface for the tool, something is better than nothing. I essentially decided to start building a rough draft of the application.

The first thing I needed was a way to render chord charts in the browser. After an hour or so of writing some absolutely awful React code, I was there.

Randomly generated chords.

Next, I needed a way to pull chords from the back-end Elixir server and render them using our new chord chart component. After another couple hours of hacking together a (poorly designed and roughly implemented) GraphQL data layer, I was greeted with several hundred Cmaj7 chords:

All Cmaj7 chords.

At this point, I was stuck on how to proceed. How could I easily let users build chord progressions from the possibilities presented? I started iterating on a few ideas, mostly involving nested trees, but nothing seemed to click.

Awash in Inspiration

Several days later, I was browsing Reddit and I stumbled across this screenshot from /r/unixporn. I was completely inspired. This is what I wanted my project to look like!

My inspiration.

I fired up CodePen and started hashing out some mockups of how the application might look.

Codepen mockup.

Happy with the direction I was heading, I quickly translated the hard-coded, mocked-up HTML into my already existing React components. The results were promising.

With data.

Seeing real data on the screen gave me even more ideas on how to interact with the application.

With data.

I was awash in inspiration.

Riding the Wave

This cycle of building and playing with what I’d built kept up for days. Every few hours I’d pester my wife and show her what I’d added, because I was so excited. I didn’t take the time during this process to focus on best practices or code quality. Instead, I focused on getting results.

Eventually, that wave rolled back, and I was left with a mostly functional and mostly ugly codebase. It became more difficult to make changes to the codebase, and with my inspiration fading, I wasn’t motivated to push through the pain.

At this point, I turned my attention towards refactoring my original code. Now was the time to focus on best practices and doing things “right”.

While I’m still not fully happy with the codebase or the user interface, I’m very happy with how far I’ve come in such a short time. I never would have made this progress if I didn’t allow myself to do things poorly, just for the sake of getting things done. If I was still at the beginning, fixated on engineering a “correct” solution, I wouldn’t have had the raw materials required to turn my inspiration into a tangible product.

The current state of things.

December 02, 2018

Ponylang (SeanTAllen)

Last Week in Pony - December 2, 2018 December 02, 2018 11:40 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Indrek Lasn (indreklasn)

Here are the most interesting developer podcasts — 2019 edition December 02, 2018 05:51 PM


Photo by Matthias Wagner on Unsplash

Who doesn’t love to hear the opinions of peer developers? Frankly put, some of the most useful things I’ve picked up have come from listening to different opinions and thoughts. I’ve put together a small but comprehensive list of great podcasts. The core topics revolve around coding, well-being, philosophy, and how to cope with different sets of challenges.

I find myself listening to podcasts while doing the dishes, on the airplane, driving, and commuting. Learning from podcasts adds up really quickly — there’s so much to learn yet so little time — why not try to maximize ways to improve?

If yours didn’t make it to the list, please post it in the comments and I’ll make sure to check it out!

Assume that the Person You’re Listening to Might Know Something You Don’t —Jordan Bernt Peterson

Laracasts

https://laracasts.simplecast.fm/

The Laracasts snippet, each episode, offers a single thought on some aspect of web development. Nothing more, nothing less. Hosted by Jeffrey Way.

I truly — truly enjoy the way Jeffrey Way unfolds his opinions on web development, life, parenting, hiring and running a small business while keeping the atmosphere friendly. Great podcast with great topics — definitely helps to grow as a person.

Syntax

https://syntax.fm/

A Tasty Treats Podcast for Web Developers.

If you’re looking into how to land freelance gigs, improve your coding or overall get your career to the next level — Syntax is the podcast for you.

Wes Bos and Scott Tolinski are both great teachers and express themselves very fluently. Keeping up with the latest trends can be a drag. Syntax helps me to keep up to date with my skillsets.

There’s nothing more embarrassing than not knowing the latest trends. Especially crucial in the freelancing landscape since knowing the latest tools can be a huge time saver.

Developer Tea

https://spec.fm/podcasts/developer-tea

A podcast for developers designed to fit inside your tea break

From industry professionals, for industry professionals

In January 2015, two independent podcasts — Design Details and Developer Tea — were started by three individuals who wanted to talk about the work they do every day. After an amazing response from the web community, we’ve teamed up to create the Spec Network to help designers and developers to learn, find great resources and connect with one another.

FreeCodeCamp podcast

https://freecodecamp.libsyn.com/

The official podcast of the freeCodeCamp open source community. Learn to code with free online courses, programming projects, and interview preparation for developer jobs.

FreeCodeCamp has your back from how to get your first job to how to negotiate salaries, and much more!

Coding Blocks

https://www.codingblocks.net/category/podcast/

We are a few guys who’ve been professional programmers for years. As avid listeners of podcasts and consumers of many things code-related, we were frustrated by the lack of quality programming (pun) available in listenable formats. Given our years of experience and real-world problem solving skills, we thought it might be worth getting into this world of podcasting and “giving back” a shot.

Programming Throwdown

https://www.programmingthrowdown.com/?m=0

From teaching kids to code to concurrency, Patrick Wheeler and Jason Gauci have you covered. Programming Throwdown recommends the best developer tools, books and coding patterns. They’ve been available since 2010 and are a great overall listening experience.

Away from the keyboard

http://awayfromthekeyboard.com/episodes/

Away From The Keyboard is a podcast that talks to technologists and tells their stories. Stories about how they started, how they grew, how they learned, and how they unwind. It’s hosted by Cecil Phillip and Richie Rump and new episodes are released every Tuesday.

Full Stack Radio

http://www.fullstackradio.com/

A podcast for developers interested in building great software products. Every episode, Adam Wathan is joined by a guest to talk about everything from product design and user experience to unit testing and system administration.

This should give you a nice range of topics to listen to! Stay awesome and thanks for reading!

If you found this article useful, give it some claps and follow the cleversonder medium publication for more!

Never miss an article — stay up to date by following me on Twitter

Here are some articles you might enjoy as well;

Want to write for Cleversonder? Great — send me a private note or an email ❤


Here are the most interesting developer podcasts — 2019 edition was originally published in cleversonder on Medium, where people are continuing the conversation by highlighting and responding to this story.

Pete Corey (petecorey)

Advent of Code: Inventory Management System December 02, 2018 12:00 AM

Today’s Advent of Code challenge asks us to scan through a list of IDs, looking for IDs that contain letters repeated exactly two or exactly three times. Once we have the total number of IDs with a letter repeated twice and the total with a letter repeated three times, we’re asked to multiply them together to get a pseudo-checksum of our list of IDs.

At first I had no idea about how I would tackle this problem in J. After digging through the vocabulary sheet, I stumbled upon the monadic form of e. and I was shown the light.

The e.y verb iterates over every element of y, building a boolean map of where that element exists in y. We can use that result to group together each element from y. Once we’ve grouped each element, we can count the length of each group. Because J pads extra 0/' ' values at the end of arrays to guarantee filled matrices, we’ll measure the length of each group by finding the index of this padding character in every group. Finally, we can see if 2 and 3 exist in each set of character counts, and sum and multiply the results.

    read_lines =. >@:cutopen@:(1!:1) <
    lines =. read_lines '/Users/pcorey/advent_of_code_2018/day_02/input'

    group =. ([: ~. e.) # ]
    count =. [: i."1&' ' group"1
    twos_and_threes =. [: (2 3)&e."1 count

    */ +/ twos_and_threes lines
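
The same checksum is easy to sanity-check in Python (a sketch using the sample IDs from the puzzle description, not my real input):

    from collections import Counter

    ids = ["abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab"]

    # Count IDs containing some letter exactly twice, and IDs containing some
    # letter exactly three times, then multiply the two counts together.
    twos = sum(1 for word in ids if 2 in Counter(word).values())
    threes = sum(1 for word in ids if 3 in Counter(word).values())
    print(twos * threes)   # 12 for this sample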

I’ve been trying to make more use of forks, hooks, and verb trains after reading through Forks, Hooks, and Compound Adverbs and Trains on the J website.

Part Two

Part two asks us to find two IDs that differ by only a single letter. Once we’ve found those two IDs, we’re to return the set of letters they have in common.

My basic idea here was to compare every possible word combination and pack up the two compared words with the number of differences between them:

    compare =. 4 : 0
      differences =. x ([: +/ [: -. =) y
      differences;x;y
    )

    comparisons =. ,/ input compare"1/ input

Using that, we could pick out the set of word pairs that have one difference between them:

    ones =. 1&= 0&{::"1 comparisons

Unbox the word pairs from that set:

    words =. > }."1 ones # comparisons

And then use an inverted nub sieve to find the letters those words share in common, using another nub to filter out the duplicate caused by the inverse comparison:

    ([: ~. (-.@:~: # ])"1/) words
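
In Python, the same pairwise search is short enough to spell out directly (again a sketch on the puzzle’s sample IDs, not a translation of the J):

    from itertools import combinations

    ids = ["abcde", "fghij", "klmno", "pqrst", "fguij", "axcye", "wvxyz"]

    # Find the pair of IDs that differ in exactly one position and print the
    # letters they share, in order.
    for a, b in combinations(ids, 2):
        common = [x for x, y in zip(a, b) if x == y]
        if len(common) == len(a) - 1:
            print("".join(common))   # fgij
            break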

I’ll admit, I found myself getting lost in what felt like a sea of boxes and arrays while working on this solution. I found it difficult to keep track of where I was in my data, and found myself building intermediate solutions in the REPL before moving them over to my project. I need to get better at inspecting my data and getting a feel for how to manipulate these structures in J.

I also found myself heavily relying on the REPL for building my verb chains. I constantly found myself building chains by trial and error.

Does this work? No? Maybe if I add a cap at the end? Yes!

I can handle single hooks and forks, but when things expand beyond that I find myself getting lost. I’m hoping I get better with that over time.

Notes

  • Trying to open a box that holds numbers and strings will throw a domain error, because the two types can’t live together in an array.

December 01, 2018

Bogdan Popa (bogdan)

Advent of Racket 2018 December 01, 2018 05:45 AM

I decided to do this year’s Advent of Code in Racket and stream the whole thing. We’ll see how far I make it (getting up this early is rough!), but so far I finished day one. The code is here and the playlist for the recordings is here. If you want to get notified as soon as I jump on to start streaming, you can follow me on Twitch.

Vincent (vfoley)

vi, my favorite config-less editor December 01, 2018 03:14 AM

Before I became an Emacs user, I was a vi user. And though Emacs is now my primary editor, I still use vi every day: I use it when I want to quickly view the content of a file, for quick edits to configuration files, when I’m on a remote server, or when a command-line utility invokes $EDITOR. The main reason why I still use vi is because I’m able to use it very efficiently even without a configuration file.

I learned vi circa 2001 from a magazine and from vimtutor and it was my primary text editor until 2007 or 2008 when I switched to Emacs. Back then, I had a rather long .vimrc, complete with configurations for Linux and Windows, GTK vim and curses vim, and I used a few plugins such as bufexplorer.vim.

Recently, I deleted my old .vimrc and started a new one from scratch. I no longer use plugins and I don’t use a GUI version of vim. For these reasons, I now consider myself more a vi user than a vim user. (I am however a big fan of text objects, and I want unlimited undo.) My .vimrc is now only 12 lines and most of these changes are conservative and uncontroversial: backspace works “normally” in insert mode all the time, I enable incremental search, I use 4 spaces for indentations rather than hard tabs, I activate syntax highlighting. Even though vim offers a myriad of settings, a vanilla setup is extremely capable and usable.

Being able to use an editor without custom configuration is extremely useful. Here are a couple of examples: I learned how to exploit Linux binaries from the book Art of Exploitation. The book has a companion live CD that you can use to follow examples from the book and try to perform the hacks yourself. Emacs is not installed, but vim is, and I was perfectly happy and comfortable using it to read the source code of the examples and modify them. Another example: recently at work, I wanted to write a Python script on a server that didn’t have Python installed, but had Docker. I launched a Python container, installed vim-tiny, and proceeded to write, debug, and improve my script without ever feeling that I was editing with a hand tied behind my back.

It is oddly satisfying to know that no matter the kind of machine I find myself on or the kind of restrictions it has (no network to download Emacs and/or my custom configuration, corporate policy against fetching packages from MELPA, etc.), that there will always be an editor that I can happily use without needing to make it “my own”.

Tim O’Reilly and Paul Graham have also cited usability without configuration as a reason for preferring vi to Emacs.

Thanks to Richard Kallos for proof-reading an early version of this article.

Pete Corey (petecorey)

Advent of Code: Chronal Calibration December 01, 2018 12:00 AM

I’ve been a huge fan of the Advent of Code challenges since I first stumbled across them a few years ago. Last year, I completed all of the 2017 challenges using Elixir. This year, I decided to challenge myself and use a much more esoteric language that’s held my interest for the past year or so.

It’s my goal to complete all of this year’s Advent of Code challenges using the J programming language.

Before we get into this, I should make it clear that I’m no J expert. In fact, I wouldn’t even say that I’m a J beginner. I’ve used J a handful of times, and repeatedly struggled under the strangeness of the language. That being said, there’s something about it that keeps pulling me back. My hope is that after a month of daily exposure, I’ll surmount the learning curve and get something valuable out of the experience and the language itself.


The first Advent of Code challenge of 2018 asks us to read in a series of “changes” as input, and apply those changes, in order, to a starting value of zero. The solution to the challenge is the value we land on after applying all of our changes. Put simply, we need to parse and add a list of numbers.
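
(For contrast, the whole of part one is a one-line sketch in Python, assuming the changes sit one per line in a file called input; the J version is where the fun is.)

    # Parse and add a list of numbers; int() happily accepts the leading '+'.
    print(sum(int(line) for line in open("input")))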

The first thing I wanted to do was read my input from a file. It turns out that I do know one or two things about J, and one of the things I know is that you can use foreigns to read files. In particular, the 1!:1 foreign reads and returns the contents of a file.

Or does it?

    1!:1 'input'
file name error

Apparently 1!:1 doesn’t read relative to the current script. I’m guessing it reads relative to the path of the jconsole executable? Either way, using an absolute path fixes the issue.

    input =. 1!:1 < '/Users/pcorey/advent_of_code_2018/day_01/input'

Now input is a string with multiple lines. Each line represents one of our frequency changes. We could use ". to convert each of those lines into a number, but because input is a single string, and not an array of lines, we can’t map ". over input:

    0 "./ input
0

After scrambling for ways of splitting a string into an array of lines, I stumbled across cutopen, which takes a string and puts each line into a box. That’s helpful.

    boxed =. cutopen input
┌──┬──┬──┬──┐
│+1│-2│+3│+1│
└──┴──┴──┴──┘

Now if we open boxed, we’ll have our array of lines:

    lines =. > boxed
+1
-2
+3
+1

And now we can map ". over that array to get our array of numbers.

    numbers =. 0 "./ lines
1 _2 3 1

And the answer to our problem is the sum of numbers.

    +/ numbers
3

Here’s my first working solution:

    input =. 1!:1 < '/Users/pcorey/advent_of_code_2018/day_01/input'
    boxed =. cutopen input
    lines =. > boxed
    numbers =. 0 "./ lines
    +/ numbers

Part Two

My first instinct for solving this problem is to do it recursively. I might be able to define a dyadic verb that accepts my current list of frequencies and a list of changes. If the last frequency in my array exists earlier in the array, I’ll return that frequency. Otherwise, I’ll append the last frequency plus the first change to my frequencies array, rotate my changes array, and recurse.

After many struggles, I finally landed on this solution:

    input =. 1!:1 < '/Users/pcorey/advent_of_code_2018/day_01/sample'
    boxed =. cutopen input
    lines =. > boxed
    numbers =. 0 "./ lines

    change_frequency =. 4 : 0
      frequency =. {: x
      change =. {. y
      frequency_repeated =. frequency e. (}: x)
      next_x =. x , (frequency + change)
      next_y =. 1 |. y
      next =. change_frequency ` ({:@}:@[) @. frequency_repeated
      next_x next next_y
    )

    0 change_frequency numbers

This works great for example inputs, but blows the top off my stack for larger inputs. It looks like J’s max stack size is relatively small. Recursion might not be the best approach for these problems.
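
For contrast, the looping version is tiny in Python (a sketch on the sample changes, using a set to spot the first repeated running total):

    from itertools import cycle

    changes = [+1, -2, +3, +1]

    seen = {0}
    frequency = 0
    for change in cycle(changes):     # loop over the changes forever
        frequency += change
        if frequency in seen:         # the first total we have seen before
            print(frequency)          # 2 for this sample
            break
        seen.add(frequency)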

Looking into other techniques for working without loops, I learned that you can use the ^:_ verb to “converge” on a result. It will repeatedly apply the modified verb until the same result is returned.

I refactored my verb to take and return my frequencies array and my changes array as a boxed tuple, and converge on that verb until I get a repeated result. That repeated result holds my repeated frequency:

    input =. 1!:1 < '/Users/pcorey/advent_of_code_2018/day_01/sample'
    boxed =. cutopen input
    lines =. > boxed
    numbers =. 0 "./ lines

    package_next =. 4 : 0
      (x,({:x)+({.y));(1|.y)
    )

    package_result =. 4 : 0
      x;y
    )

    change =. 3 : 0
      frequencies =. >@{. y
      changes =. >@{: y
      frequency =. {: frequencies
      change =. {. changes
      repeated =. frequency e. (}: frequencies)
      next =. package_next ` package_result @. repeated
      frequencies next changes
    )

    result =. change^:_ (0;numbers)
    echo 'Repeated frequency:'
    {:@:{.@:> result

Notes

  • $: doesn’t seem to refer to the outermost named verb. Recursion wasn’t working as I expected with $:. Replacing it with the named verb worked perfectly.
  • J seems to have a short stack. Note to self: avoid deep recursion.
  • J doesn’t support tail call optimization.
  • ^:_ and variants can be used as an iterative alternative to recursion.
  • Use boxes like tuples.
  • Use echo for debug printing.

November 30, 2018

Unrelenting Technology (myfreeweb)

Looks like NetBSD is already working on the EC2 AArch64 instances! My attempt at... November 30, 2018 11:32 PM

Looks like NetBSD is already working on the EC2 AArch64 instances! My attempt at running FreeBSD there failed: for mysterious reasons, the system reboots just after the last loader.efi message..

Trying to do anything system-level on EC2 is incredibly frustrating. There is STILL no read-write access to the serial console, because Bezos doesn't believe in debugging or something >_<

Also, about the ARM instances themselves. I am happy to see a big player enter the ARM space. And with custom (Annapurna) chips, even. (Though they'd have much better performance if they just bought some Ampere eMAGs or Cavium ThunderX2s.)

But what's up with that price? Did anyone at AWS ever look at Scaleway's pricing page?! On-demand pricing for a single core EC2 ARM instance is almost 20 bucks per month! While Scaleway offers four ThunderX cores for three euros per month!! Sure sure Scaleway is not a big player and doesn't have a huge ecosystem and is getting close to being out of stock on these ARM instances.. but still, 1/4 the cores for 5x the price.

(Spot pricing is better of course.)

Frederic Cambus (fcambus)

Running a free public API, a post-mortem November 30, 2018 10:29 PM

It's been a little bit more than three years since Telize public API was permanently shut down on November 15th, 2015. I have previously written about the adventure itself, and about the decommissioning of the API.

Before shutting down the public API of Telize, a paid version was launched on Mashape to ease the transition for those who couldn't host their own instances. The Mashape API Marketplace became a part of RapidAPI last year; the service is still running and will keep doing so for the foreseeable future. You can support my work on Telize by subscribing to the service.

While a small fraction of the userbase switched to the paid API, the vast majority didn't, and the number of requests exploded due to retries, as detailed in the article about the API decommission. One thing I wondered at the time was how long it would take for the traffic to become negligible. The Internet is a very strange place and things can go unnoticed for a very long time, sometimes years. Of course, Telize's case is no exception.

Every year since the public API was closed down, I've been logging requests for a few days in a row to get a glimpse of how many of them were still being made. While the number of unique IP addresses querying the API kept decreasing, the amount of requests themselves went up again compared to last year.

2016-11-06 - Requests: 51,896,923 - Unique IPs: 2,543,814
2016-11-07 - Requests: 56,427,258 - Unique IPs: 2,756,065
2016-11-08 - Requests: 53,641,121 - Unique IPs: 2,746,005
2016-11-09 - Requests: 53,704,140 - Unique IPs: 2,536,632
2016-11-10 - Requests: 53,194,946 - Unique IPs: 2,525,167
2016-11-11 - Requests: 50,444,003 - Unique IPs: 2,652,730
2016-11-12 - Requests: 49,224,863 - Unique IPs: 2,670,926
2016-11-13 - Requests: 48,526,303 - Unique IPs: 2,492,765
2017-11-10 - Requests: 35,325,037 - Unique IPs: 1,736,815
2017-11-11 - Requests: 33,582,167 - Unique IPs: 1,613,161
2017-11-12 - Requests: 33,334,836 - Unique IPs: 1,587,549
2017-11-13 - Requests: 36,131,909 - Unique IPs: 1,593,255
2017-11-14 - Requests: 34,457,433 - Unique IPs: 1,571,144
2017-11-15 - Requests: 33,225,149 - Unique IPs: 1,563,845
2018-11-12 - Requests: 50,612,559 - Unique IPs:   611,302
2018-11-13 - Requests: 50,858,236 - Unique IPs:   640,836
2018-11-14 - Requests: 51,991,454 - Unique IPs:   661,410
2018-11-15 - Requests: 53,008,712 - Unique IPs:   689,646
2018-11-16 - Requests: 51,651,814 - Unique IPs:   686,646
2018-11-17 - Requests: 49,236,779 - Unique IPs:   662,717
2018-11-18 - Requests: 47,237,596 - Unique IPs:   692,718
2018-11-19 - Requests: 51,679,888 - Unique IPs:   735,396
2018-11-20 - Requests: 50,245,134 - Unique IPs:   755,177
2018-11-21 - Requests: 50,745,725 - Unique IPs:   773,949
2018-11-22 - Requests: 50,609,750 - Unique IPs:   786,963
2018-11-23 - Requests: 49,991,775 - Unique IPs:   687,652
2018-11-24 - Requests: 47,479,703 - Unique IPs:   584,058
2018-11-25 - Requests: 47,346,829 - Unique IPs:   597,153

Bandwidth usage, measured with nload:

Incoming:  Curr: 2.44 MBit/s | Avg: 2.08 MBit/s | Min: 1.76 MBit/s | Max: 2.70 MBit/s | Ttl: 2217.35 GByte
Outgoing:  Curr: 1.22 MBit/s | Avg: 1.07 MBit/s | Min: 904.10 kBit/s | Max: 1.45 MBit/s | Ttl: 1111.68 GByte

So more than 3 years after the decommission, I'm still getting around 50 million daily requests. I'm honestly quite astonished to notice that the numbers went up again significantly this year.

Below is a report of user agents which performed more than 1M daily requests on November 12th 2018, the top offenders being Android applications and WordPress sites… How surprising.

Apache-HttpClient/UNAVAILABLE                  24,729,923
Dalvik/2.1.0                                    1,113,530
WordPress/4.2.21                                3,200,750
WordPress/4.1.24                                2,223,350
WordPress/4.3.17                                1,212,849

On a more positive note, those are recent WordPress releases, which means it might be possible to identify the plugins performing those requests and contact their authors.

Regarding the open source project itself, I released 2.0.0 back in March, which is now using GeoIP2/GeoLite2 databases, as GeoIP/GeoLite databases have been deprecated since April. I'm currently working on a rewrite in C using Kore, which will bring in a couple of improvements compared to the current version. I will write about the new iteration in a following post.

Wallaroo Labs (chuckblake)

Reasons to Scale Horizontally November 30, 2018 03:00 PM

Here at Wallaroo Labs, we build Wallaroo, a distributed stream processor designed to make it easy to scale real-time Python data processing applications. That’s a real mouthful. What does it mean? To me, the critical part is the “scale” from “easy to scale.” What does it mean to easily scale a data processing application? In the case of Wallaroo applications, it means that it’s easy to scale those applications horizontally.

Derek Jones (derek-jones)

The 520’th post November 30, 2018 01:34 AM

This is the 520’th post on this blog, which will be 10-years old tomorrow. Regular readers may have noticed an increase in the rate of posting over the last few months; at the start of this month I needed to write 10 posts to hit my one-post a week target (which has depleted the list of things I keep meaning to write about).

What has happened in the last 10-years?

I probably missed several major events hiding in plain sight, either because I am too close to them or blinkered.

What did not happen in the last 10 years?

  • No major new languages. These require major new hardware ecosystems; in the smartphone market Android used Java and iOS made use of existing languages. There was the usual selection of fashion/vanity-driven wannabes, e.g., Julia, Rust, and Go. The R language started to get noticed, but it has been around since 1995, and Python looks set to eventually kill it off,
  • no accident killing 100+ people has been attributed to faults in software. Until this happens, software engineering has a dead bodies problem,
  • the creation of new software did not slow down from its break-neck speed,
  • in the first few years of this blog I used to make yearly predictions, which did not happen (most of the time).

Now I can relax for 9.5 years, before scurrying to complete 1,040 posts, i.e., the rate of posting will now resume its previous, more sedate, pace.

November 29, 2018

Bogdan Popa (bogdan)

Announcing geoip November 29, 2018 10:00 AM

I released geoip today. It’s a Racket library for working with MaxMind’s geolocation databases. It’s got a tiny API surface (3 functions!), but it should be useful to anyone needing to do geolocation in Racket. As always, check it out and let me know what you think! BTW, I streamed the whole process on Twitch so, if that’s your thing, you can check out the recordings here.

November 28, 2018

Derek Jones (derek-jones)

Half-life of software as a service, services November 28, 2018 12:07 AM

How is software used to provide a service (e.g., the software behind gmail) different from software used to create a product (e.g., sold as something that can be installed)?

This post focuses on one aspect of the question, software lifetime.

The Killed by Google website lists Google services and products that are no more. Cody Ogden, the creator of the site, has open sourced the code of the website; there are product start/end dates!

After removing 20 hardware products from the list, we are left with 134 software services. Some of the software behind these services came from companies acquired by Google, so the software may have been used to provide a service pre-acquisition, i.e., some calculated lifetimes are underestimates.

The plot below shows the number of Google software services (red) having a given lifetime (calculated as days between Google starting/withdrawing service), mainframe software from the 1990s (blue; only available at yearly resolution), along with fitted exponential regression lines (code+data):

Number of software systems having a given lifetime, in days

Overall, an exponential is a good fit (squinting to ignore the dozen red points), although product culling is not exponentially ruthless at short lifetimes (newly launched products are given a chance to prove themselves).

The Google service software half-life is 1,500 days, about 4.1 years (assuming the error/uncertainty is additive, if it is multiplicative {i.e., a percentage} the half-life is 1,300 days); the half-life of mainframe software is 2,600 days (with the same assumption about the kind of error/uncertainty).
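
For anyone who wants to check the arithmetic, a half-life follows directly from a fitted exponential decay rate as ln(2)/rate; a minimal sketch (the rate below is an illustrative value, not a number from the code+data above):

    import math

    rate = 0.00046                 # illustrative decay rate, per day
    half_life = math.log(2) / rate
    print(round(half_life))        # roughly 1,500 days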

One explanation of the difference is market maturity. Mainframe software has been evolving since the 1950s and probably turned over at the kind of rate we saw a few years ago with Internet services. By the 1990s things had settled down a bit in the mainframe world. Will software-based services on the Internet settle down faster than mainframe software? Who knows.

Based on this Google data, the cost/benefit ratio when deciding whether to invest in reducing future software maintenance costs, is going to have to be significantly better than the ratio calculated for mainframe software.

Software system lifetime data is extremely hard to find (this is only the second set I have found). Any pointers to other lifetime data very welcome, e.g., a collection of Microsoft product start/end dates :-)

November 27, 2018

Wallaroo Labs (chuckblake)

Using Wallaroo with PostgreSQL November 27, 2018 12:00 PM

Introduction In the last blog post, I gave an overview of our new Connectors APIs, discussing how they work under the hood and going over some of the examples we provide. In this post, we are going to take a deeper dive into how to build a connector to pull data from PostgreSQL. We’ll also talk a bit about some of the different approaches for building both external source and sink connectors.

Indrek Lasn (indreklasn)

Well written article! November 27, 2018 09:41 AM

Well written article!

November 26, 2018

Jan van den Berg (j11g)

The Good, the Bad and the Ugly of Red Dead Redemption 2 November 26, 2018 09:12 PM

It’s been over a week since I ‘finished’ the most anticipated game of 2018 (or maybe even of the last five years); Red Dead Redemption 2. I say finished in quotes, because I clocked in around 50 hours in a little less than three weeks just to finish the main campaign, but the more I learn about the game after finishing, the more I become convinced I haven’t even seen 10% of what there is to do and see in this enormous game. Still, I have some thoughts on this game, especially compared to the original Red Dead Redemption, which is among my favorite games of all time.

So let’s dissect in true Western fashion, here’s the Good, Bad and Ugly of RDR2. (Beware, major spoilers are ahead).

The Good

Let me just start by pointing out that there is very little wrong with this game. This is a triple A game by a triple A studio who actively tried to push boundaries with their latest title and succeeded. So you can call it a quad A game if you like, I will not argue with you.

  • Details

What every blog, vlog and review has already pointed out is that the attention to detail in this game is just off the charts. There are many moments where you think “wait, they really thought about that, that is insane!”. My default mode is already that I am perpetually in awe of open world games. I always think that every piece of grass, every rock, every doorknob has been designed, created by someone and put there specifically. And considering the scale of most open world games, it never ceases to blow my mind. And what RDR2 does is take this to another level. Not just the size of the game (I mean the entire map of RDR1 is playable, are you kidding me?), but also the decisions and possibilities seem endless (everyone plays as Arthur but nobody looks alike, the combinations of clothes and hairstyles seem infinite). It is absolutely mind-boggling to me to think that people have created this, translated it to zeros and ones, put those on a disc that you can buy and put in a magical device which allows you to enter and participate in an entirely made up world. Just to get an idea of the level of detail, you can watch this video or many others about (hidden) details in this game.

  • Music/Soundtrack

This game has at least three original songs that could easily chart by themselves. But apart from these very well placed songs in the game story, it’s the instrumental songs in fight missions  where I think I had the most fun. Running, shooting, horse riding while the most classic Western cowboy tunes you can think of accompany your actions. I love it.

Check out That’s The Way It Is by Daniel Lanois (number 1) and May I? Unshaken by D’Angelo (number 6) from this list:

Or, Cruel, Cruel World by Willie Nelson:

  • Combat system

The original combat system from RDR1 is more or less unchanged in RDR2, which is the right decision. This system offers great balance between story and action. Meaning the game itself is not particularly difficult, and that is a good thing. Countless reloading of end-boss missions as most games do is tedious and frustrating and does damage to the story immersion. RDR2 does this just right.

  • Story 

Games I enjoy the most, tend to be immersive story games. This is the reason why I like RDR1 so much. RDR2 has another great immersive story. It is not specifically unique, but it’s deep and broad and interesting enough to keep you engaged for 107 missions. And that’s quite a feat.

  • Characters

There is no other game where so much attention has been given to the NPCs. They really seem to have their own life, particularly the fellow gang members. Usually great games, at most, have a partner backstory to the protagonist/ antagonist story. But not RDR2. Everything seems to be full of life and people and animals minding their own business. Things happen without you having anything to do with it. You are not playing the game so much as you are participating in the story.

Taking these five good aspects together, I’d like to argue that RDR2 is not a game in the classic sense, but it is not a movie either. It is something else. For lack of a better word, it is a hybrid of a movie and a game. Playing RDR2 means you participate in a live-action movie where you are the director. It is a unique experience.

There is a great NY Times piece, that more or less states the same, titled: Red Dead Redemption 2 Is True Art. I recommend reading it.

Bonus: personal favorite part

As a big fan of the first RDR, the part where John Marston gets his outfit from the first game was a goosebump part for me.

The Bad

  • Arthur dies

It’s pretty clear from the beginning, Arthur will die. And if you played RDR1 you know this for a fact, because there is no Arthur in RDR1. And maybe that’s part of the reason I never really felt as emotionally connected to Arthur as to John Marston from RDR1. It wasn’t until towards the end when Arthur falls out with Dutch that I started liking him more, but I think it was too late by then. Because not soon after, Arthur dies. But even before that, Arthur could sometimes be quite the asshole. In missions where he has to beat up some poor dad in front of his wife and kids, I particularly didn’t like that I had no say in this and it certainly didn’t help to increase his likeability. Also the backstory for Arthur is unclear (to me), how did he and Dutch meet, what happened there? So, even after spending 50 hours with Arthur and of course generally liking him and feeling bummed out in how he died (why!?) I didn’t really miss Arthur after that, like the first RDR. I remember after finishing RDR1 it struck a chord and I thought about it for a couple of days.

  • Redemption?

And this is where the comparison with the first RDR comes in. John Marston from RDR1 is one of the greatest game characters of all time. He is the torn and wounded protagonist who is trying his best to put his bad past behind him. He is looking for redemption! It was just that bad luck (or bandits) seemed to follow him around. This context provides a great and rich story and character paradox. As stated above, with Arthur the actual redemption part is short-lived and a bit unfulfilling.

The John Marston of RDR2 comes across as a young loose cannon, with his heart in the right place nonetheless. So that seems to fit with the first RDR which, in timeline, follows after RDR2. Because he is just a little bit older and wiser and determined to do the right thing. So the John in RDR1 is much more grounded in what is good and bad (and therefore more likeable).

Speaking about great game characters!

The Ugly

  • Finale: choice

Throughout the game there is plenty of freedom of how to go about it. Sometimes there are choices that might or might not impact the outcome of a (side)mission, but they all seem to be insignificant (no great impact on the main story). So I was rather surprised when in the final mission of the Arthur timeline I had to make a REALLY CRITICAL decision. To go with John or to go back for the money. I thought this was really unbalanced with the rest of the story. Because what happened was that during the following scene part of my brain kept thinking: have I made the right choice, what would happen if I had chosen the other option? And that’s not really what you want to be thinking about going into the final mission. I don’t understand why the game/story designers put this there. I did however make the right choice, so much became clear after googling the horrible alternative ending (do so at your own risk).

  • Dialogue

These rampaging gang members and their criminal ways seemed to cause less of a problem for me than the potty-mouthed conversations. There is a lot of comedy and character development within the dialogue, which is certainly a strong suit of this game, but the swearing for swearing’s sake? For a game with this level of attention to detail (and accents, clothes and horses etc.) I would think that this is not how they talked back then. I am not a big fan of gore and violence either, but to a point I can understand what’s needed for a game like this. But the dialogue was sometimes just unnecessarily over the top with profanity.

  • Trees

So here is a nitpick. I did not like the trees, I said so from the first minutes, and I still thought so when finishing the game. The game is gorgeous and the level of attention to detail is astounding, the shoes, the spurs, the saddles, the faces, the fingernails, I have never seen it before in a game. But the trees? I think I’ve seen better.  So maybe in RDR3 they’ll fix that?

As we await RDR Online, there are still hundreds of possible gaming hours left in this incredible game, I mean I still have to explore all of New Austin (and I will get to it as soon as I finish Undertale)! So, to point back to the NY Times article about RDR2, it features this great quote: “As a technical achievement, it has no peers.” I strongly agree. But from an emotional engagement point of view, in my memory I still think the first RDR has one up on RDR2.

The post The Good, the Bad and the Ugly of Red Dead Redemption 2 appeared first on Jan van den Berg.

Derek Jones (derek-jones)

Ecosystems as major drivers of software development November 26, 2018 01:59 AM

During the age of the Algorithm, developers wrote most of the code in their programs. In the age of the Ecosystem, developers make extensive use of code supplied by third-parties.

Software ecosystems are one of the primary drivers of software development.

The early computers were essentially sold as bare metal, with the customer having to write all the software. Having to write all the software was expensive, time-consuming, and created a barrier to more companies using computers (i.e., it was limiting sales). The amount of software that came bundled with a new computer grew over time; the following plot (code+data) shows the amount of code (thousands of instructions) bundled with various IBM computers up to 1968 (an anti-trust case eventually prevented IBM bundling software with its computers):

Instructions contained in IBM computers shipped during the 1960s.

Some tasks performed using computers are common to many computer users, and users soon started to meet together, to share experiences and software. SHARE, founded in 1955, was the first computer user group.

SHARE was one of several nascent ecosystems that formed at the start of the software age, another is the Association for Computing Machinery; a great source of information about the ecosystems existing at the time is COMPUTERS and AUTOMATION.

Until the introduction of the IBM System/360, manufacturers introduced new ranges of computers that were incompatible with their previous range, i.e., existing software did not work.

Compatibility with existing code became a major issue. What had gone before started to have a strong influence on what was commercially viable to do next. Software cultures had come into being and distinct ecosystems were springing up.

A platform is an ecosystem which is primarily controlled by one vendor; Microsoft Windows is the poster child for software ecosystems. Over the years Microsoft has added more and more functionality to Windows, and I don’t know enough to suggest the date when substantial Windows programs substantially depended on third-party code; certainly small apps may be mostly Windows code. The Windows GUI certainly ties developers very closely to a Windows way of doing things (I have had many people tell me that porting to a non-Windows GUI was a lot of work, but then this statement seems to be generally true of porting between different GUIs).

Does Facebook’s support for the writing of simple apps make it a platform? Bill Gates thought not: “A platform is when the economic value of everybody that uses it, exceeds the value of the company that creates it.”, which some have called the Gates line.

The rise of open source has made it viable for substantial language ecosystems to flower, or rather substantial package ecosystems, with each based around a particular language. For practical purposes, language choice is now about the quality and quantity of their ecosystem. The dedicated followers of fashion like to tell everybody about the wonders of Go or Rust (in fashion when I wrote this post), but without a substantial package ecosystem, no language stands a chance of being widely used over the long term.

Major new software ecosystems have been created on a regular basis (regular, as in several per decade), e.g., mainframes in the 1960s, minicomputers and workstations in the 1970s, microcomputers in the 1980s, the Internet in the 1990s, smart phones in the 2000s, the cloud in the 2010s.

Will a major new software ecosystem come into being in the future? Major software ecosystems tend to be hardware driven; is hardware development now basically done, or should we expect something major to come along? A major hardware change requires a major new market to conquer. The smartphone has conquered a large percentage of the world’s population; there is no larger market left to conquer. Now, it’s about filling in the gaps, i.e., lots of niche markets that are still waiting to be exploited.

Software ecosystems are created through lots of people working together, over many years, e.g., the huge number of quality Python packages. Perhaps somebody will emerge who has the skills and charisma needed to get many developers to build a new ecosystem.

Software ecosystems can disappear; I think this may be happening with Perl.

Can a date be put on the start of the age of the Ecosystem? Ideas for defining the start of the age of the Ecosystem include:

  • requiring a huge effort to port programs from one ecosystem to another. It used to be very difficult to port between ecosystems because they were so different (it has always been in vendors’ interests to support unique functionality). Using this method gives an early start date,
  • by the amount of code/functionality in a program derived from third-party packages. In 2018, it’s certainly possible to write a relatively short Python program containing a huge amount of functionality, all thanks to third-party packages. Was this true for any ecosystems in the 1980s, 1990s?

An ecosystems reading list.

Ponylang (SeanTAllen)

Last Week in Pony - November 25, 2018 November 26, 2018 01:05 AM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Alex Wilson (mrwilson)

Notes from the Week #10 November 26, 2018 12:00 AM

I’m a bit pleased that I’ve managed to keep up weeknoting for 10 weeks now — I’m usually rubbish at building and maintaining habits.

Hubble, Hubble, Toil, and Trouble

Whilst I lead the Shift team, my “first team” in the Lencioni sense is the Team of Team Leads. Thanks to our highly-collaborative structure, we’re great at building rapport within our own teams, since we work closely together day-in-day-out.

The Team of Team Leads (or TTL), however, don’t spend that much time together, as we’ve got our own teams to be concerned with, but we’re trying to become a better team in the truest sense of the word — one way we’re doing that is to just spend time with each other and bond, to understand each other’s motivations, contexts, and problems.

On Tuesday the TTL took a day-trip to the Science Museum. There was no plan (not even for when we’d take lunch!), forcing us to self-organise.

  • We ended up having a good ol’ wander around the exhibits
  • I learned some cool facts about foghorns
  • We took advantage of the Science Museum’s IMAX cinema to catch a short film about a Hubble repair mission in 3D — absolutely gorgeous and fascinating. (Even in space, there’s no escape from “Just hit it with a hammer until it does the thing”)
  • We shared trivia about ourselves during lunch, including things like “What’s the strangest dream you’ve had recently?”
  • We did end up coalescing into separate conversations but they were rich, deep, and left me with lots to think about.

All in all, a rewarding and fun day out!

Smokin’ on the Dock(er) of the bay

Last week I mentioned that we were going to try using containers to speed up the feedback loop of our developed-in-the-open Puppet code. In traditional Shift-style we kept a detailed log of the experiment and its objectives and I’m pleased with how it’s turned out.

We’ve tried dedicated testing solutions like Beaker and Kitchen before, but in the end all we needed was a very simple smoke test evaluation loop:

  1. Build test image containing SystemD and Puppet
  2. Copy module code into container
  3. Run puppet apply against a test manifest using the module
  4. Verify that the exit code is 2

The reason we test for exit code 2 rather than 0 is that puppet apply has non-standard exit codes - 2 means changes were applied and there were no errors, whereas 0 means no changes applied.
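
For the curious, the whole loop fits in a short script; the image name, module path, and manifest path below are illustrative assumptions rather than our actual layout:

    import os
    import subprocess
    import sys

    here = os.getcwd()

    # 1. Build a test image containing SystemD and Puppet (Dockerfile assumed).
    subprocess.run(["docker", "build", "-t", "puppet-smoke", "."], check=True)

    # 2-3. Mount the module and a test manifest into a fresh container and run
    # puppet apply. With --detailed-exitcodes, Puppet exits 2 when changes were
    # applied cleanly and 0 when nothing changed at all.
    result = subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{here}/module:/etc/puppetlabs/code/environments/production/modules/mymodule",
        "-v", f"{here}/test.pp:/tmp/test.pp",
        "puppet-smoke",
        "puppet", "apply", "--detailed-exitcodes", "/tmp/test.pp",
    ])

    # 4. Verify the exit code: 2 means "changes applied, no errors".
    sys.exit(0 if result.returncode == 2 else 1)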

Our aims, which we fulfilled, are below:

  • Feedback loop for testing and implementing code is less than 5min.
  • Spin up/down of new container takes less than 1min.
  • Puppet can apply manifests cleanly.
  • We can develop in the open.
  • Provide a potential smoke-test environment for further developing production systems.
  • Timeboxed. If the working cost of setting up a Docker container for smoke tests was prohibitive we would abandon the experiment.

The spin-up/spin-down of the container is actually fast enough that the run-time of the smoke tests is dominated by the application of the manifest (installing packages, etc).

We chose to develop this in the open because we’re being strict about separating configuration from data, and in this case there’s no reason not to — check it out on our public GitHub!

Papers, Please

This week was the paper review deadline for ICSE 2019.

I’m honoured to be on the program committee for the Software Engineering in Practice track and while I can’t talk about the papers that are coming through the system, I will say that there are some absolute bangers this year and I’m looking forward to spending a week in Montréal next year at the conference.

P.S. if anyone has any good food recommendations for Montréal, please let me know!

Originally published at blog.probablyfine.co.uk on November 26, 2018.

November 25, 2018

Stig Brautaset (stig)

Extracting Minecraft Music with Python November 25, 2018 09:23 PM

I create a Python script to extract music files from Minecraft's assets.

My hiring experience as a submarine sonar operator in the Norwegian Navy November 25, 2018 07:43 PM

This is a transcript of a lightning talk I gave at a company "all hands" get-together. I tell the story of my "hiring experience" as a sonar operator on a submarine in the Norwegian Navy. To tie it into work, I included a little hiring-related lesson at the end.

November 24, 2018

Bit Cannon (wezm)

Stardew Valley on FreeBSD November 24, 2018 11:57 PM

In A Year Away From Mac OS, I wrote about my switch to FreeBSD on my desktop computer and noted that one of the downsides was losing Stardew Valley:

I initially missed playing the game Stardew Valley on FreeBSD. It was consuming a few hours of my time each week prior to the FreeBSD install. The extra friction of rebooting into Arch to run the game basically stopped me playing, which wasn’t entirely a bad thing. There was some recent progress running Stardew Valley on OpenBSD so I could look into porting that work… I have enough side projects as it is though.

Fortunately Mariusz Zaborski (oshogbo) did the porting work and you can now play Stardew Valley (and other games) on FreeBSD. In this post I’ll describe the steps I took to get it running.

Stardew Valley running on FreeBSD

Prerequisites

You will need the following packages installed

Note: To get audio to work I had to build openal-soft from ports with pulseaudio support enabled, as this was not in the default configuration.

You will also need the FreeBSD port of the fnaify script, which can be cloned or downloaded from GitHub.

git clone https://github.com/oshogbo/fbsd-fnaify.git

fnaify

Assuming you have purchased Stardew Valley from GOG, log in and visit your games collection, click on Stardew Valley, change the system to Linux, and download the game.

Once the download has finished, extract it with unzip. Note that the file name may differ if there has been an update to the game:

cd ~/Downloads
mkdir stardew-valley
cd stardew-valley
unzip ../stardew_valley_1_3_32_25307.sh

Now run fnaify to check that all the dependencies are installed, fix up paths to system libraries and create a new launcher script. Change the path to fnaify to match where you cloned/downloaded it:

cd data/noarch/game
~/Source/fbsd-fnaify/fnaify

You should see output similar to this:

Checking installed libraries...

Result of configuration testing: SUCCESS

Adjusting config files for BSD...
Replacing launcher script with BSD variant...

You should now be able to start the game by running:

$ ./StardewValley

As the script suggests, you should now be able to start the game by running ./StardewValley. Happy farming!

Derek Jones (derek-jones)

Polished statistical analysis chapters in evidence-based software engineering November 24, 2018 11:56 PM

I have completed the polishing/correcting/fiddling of the eight statistical analysis related chapters of my evidence-based software engineering book, and an updated draft pdf is now available (download here).

The material was in much better shape than I recalled, after abandoning it to the world two years ago to work on the software engineering chapters.

Changes include moving more figures into the margin (which is responsible for a lot of the reduction in page count), fixing grammatical typos, removing place-holders for statistical techniques that are unlikely to be of general interest to software engineers, and mostly minor shuffling around (the only big change was moving a lot of material from the Experiments chapter to the Statistics chapter).

There is still some work to be done in places (most notably the section on surveys).

What next? My collection of data waiting to be analysed has been piling up, so I will spend the next month reducing the backlog.

The six chapters covering the major areas of software engineering need to be polished and fleshed out, from their current bare-bones state. All being well, this time next year a beta release will be ready.

While working on the statistical material, I have been making monthly updates to the pdf+data available. If it makes sense to do this for the rest of the material, then it will happen. I’m not going to write a blog post every month; perhaps a post after what look like important milestones.

As always, if you know of any interesting software engineering data, please tell me.

Unrelenting Technology (myfreeweb)

what's this? :) November 24, 2018 09:45 PM

what's this? :)


November 23, 2018

Jeff Carpenter (jeffcarp)

Words You Can't Find in the Dictionary November 23, 2018 11:44 PM

Hello! While studying Chinese, I've found that some words just can't be found in the dictionary. Here they are.

叽叽歪歪: neither my dictionary nor Google Translate has a translation. After searching online, I think 叽叽歪歪 describes a person who talks a lot; I believe it's an onomatopoeia.

半袖: my dictionary has 长袖 (long sleeve) and 短袖 (short sleeve), but not 半袖 ("half-sleeve shirt"). I learned this word in Chinese class.

蓝领, 白领, 金领: English has 蓝领 ("blue collar") and 白领 ("white collar"), but it should also have 金领 ("gold collar"). VPs and CEOs are gold-collar people.

白白胖胖: a funny word; you might describe a baby as 白白胖胖 (chubby and fair-skinned).

Thanks for reading my blog! Jeff

November 22, 2018

Derek Jones (derek-jones)

Waiting for the funerals: culture in software engineering research November 22, 2018 03:19 PM

A while ago I changed my opinion about why software engineering academics very rarely got/get involved in empirical/experimental research.

I used to think it was because commercial data was so hard to get hold of.

In practice, commercial data does not seem to be that hard to get hold of, at least for academics in business schools, and I have not experienced problems gaining access to commercial data myself (though it is very hard to find a company willing to allow me to make an anonymised version of its data public). There are many evidence-based papers published using confidential data (i.e., data that cannot be made public).

I now think the reasons for the lack of evidence-based research are culture and a preference for non-people-based research.

In the academic world the software side of computing often has a strong association with mathematics departments (I know that in some universities it sits in engineering). I have had several researchers tell me that it would raise eyebrows if they started doing more people-oriented research, because this kind of research is viewed as being the purview of other departments.

Software had its algorithm era, which is now long gone; but unfortunately, many academics still live in a world where the mindset of TAOCP holds sway.

Baffled looks are common, when I talk to software engineering academics. They are baffled by the idea that it is possible to run experiments in software engineering, and they are baffled by the idea of evidence-based theories. I am still struggling to understand the mindset that produces the arguments they make against the possibility of experiments and evidence being useful.

In the past I know that some researchers have had problems getting experiment-based papers published. Hopefully this problem is now in the past, given that empirical/experimental papers are becoming more common.

Max Planck, one of the founders of quantum mechanics, found that physicists trained in what we now call classical physics were not willing to teach or adopt a quantum mechanics world view; Planck observed: “Science advances one funeral at a time”.

Grzegorz Antoniak (dark_grimoire)

Bringing Visual Studio compiler into MSYS2 environment November 22, 2018 06:00 AM

Hi,

Using Windows is a painful experience for a command-line oriented person like me. Unfortunately, Windows doesn't offer any decent command-line oriented environment, because well, it has different goals, and that's understandable. The problem is when one is forced to use Windows when it's clear that those goals are incompatible …

November 21, 2018

Caius Durling (caius)

Prefixing Git Branch With Initials November 21, 2018 08:30 PM

Working somewhere where we prefix our branches with the creator's initials, I sometimes forget to do so.1 This leads to me having to rename the branch, typing out the whole name again after adding cd/ to the start of it.

Computers are meant to solve repetitive problems for us, so let's put it to work in this case too. My ~/bin contains git current-branch, which returns the current branch name.

If we hardcode the initials, this becomes a simple command to recall from our history:2

git branch --move --force cd/$(git current-branch)

But computers are supposed to solve all repetitive work, including knowing who I am, right? Correct: my local user account knows my full name, so we can work out my initials from that. Let's lean on the id(1) command to look up the user's details, then strip them down to just the initials.34

id -F
# => "Caius Durling"

id -F | sed -Ee 's/(^| )(.)[^ ]+/\2/g' | tr 'A-Z' 'a-z'
# => cd

Bingo, we can wrap that into a subshell passed to the branch move command and we're done in a one-liner.

git branch --move --force "$(id -F | sed -Ee 's/(^| )(.)[^ ]+/\2/g' | tr 'A-Z' 'a-z')/$(git current-branch)"

  1. I don't follow that policy for my personal repos, or working on forks of other people's code. And I'm human, so I forget. [return]
  2. You can also replace --move --force with -M: git branch -M newname [return]
  3. On macOS you can use id -F to return the full name of the user. Doing this on other platforms is left as an exercise for the reader. [return]
  4. Yes, this is an incredibly naive way to initialize a name, but it's good enough for the people I work with. Handling edge cases is left as … you got it, an exercise for the reader. [return]

Grzegorz Antoniak (dark_grimoire)

Generating preprocessed sources in CMake projects November 21, 2018 06:00 AM

Hi,

It appears you can use CMake to quickly generate preprocessed source of your chosen .cpp file without modifying the CMakeLists.txt project file. This trick works only when your generator is make -- meaning, it won't work with the ninja builder.

But still, if you're using make as your generator …

Derek Jones (derek-jones)

Some pair programming benefits may be mathematical artefacts November 21, 2018 01:42 AM

Many claims are made about the advantages of pair programming. The claim that the performance of pairs is better than the performance of individuals may actually be the result of the mathematical consequences of two people working together, rather than working independently (at least for some tasks).

Let’s say that individuals have to find a fault in code, and then fix it. Some people will find the fault and then its fix much more quickly than others. The data for the following analysis comes from the report Experimental results on software debugging (late Rome period), via Lutz Prechelt, and shows the density of the time taken by each developer to find and fix a fault in a short Fortran program.

Fixing faults is different from many other development tasks in that it often requires a specific insight to spot the mistake; once found, the fixing task tends to be trivial.

Density plot of time taken to find a fault by developers.

The mean time taken, for task t1, is 22.2 minutes (standard deviation 13).

How long might pairs of developers have taken to solve the same problem? We can take the existing data, create pairs, and estimate (based on individual developer time) how long the pair might take (code+data).

Averaging over every pair of 17 individuals would take too much compute time, so I used bootstrapping. Assuming the time taken by a pair was the shortest time taken by the two of them, when working individually, sampling without replacement produces a mean of 14.9 minutes (sd 1.4) (sampling with replacement is complicated…).

By switching to pairs we appear to have reduced the average time taken by 30%. However, the apparent saving is nothing more than the mathematical consequence of removing larger values from the sample.

The larger the variability of individuals, the larger the apparent saving from working in pairs.

When working as a pair, there will be some communication overhead (unless one is much faster and ignores the other developer), so the saving will be slightly less.

If the performance of a pair was the mean of their individual times, then pairing would not change the mean performance compared to working alone. The performance of a pair has to be less than the mean of the performance of the two individuals for pairs to show an improved performance.

There is an analytic solution for the distribution of the minimum of two values drawn from the same distribution. If f(x) is a probability density function and F(x) the corresponding cumulative distribution function, then the corresponding functions for the minimum of a pair of values drawn from this distribution is given by: F_p(x)=1-(1-F(x))^2 and f_p(x)=2f(x)(1-F(x)).

The presence of two peaks in the above plot means the data is not going to be described by a single distribution. So, the above formulas look interesting but are not useful (in this case).

When pairs of values are drawn from a Normal distribution, a rough calculation suggests that the mean is shifted down by approximately half the standard deviation.
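
As a quick numerical check of that last claim, the following minimal C sketch (an illustration, not the R code+data linked above; it assumes Normally distributed times, unlike the bimodal data in the plot) draws pairs of times, takes a pair's time to be the minimum of the two, and reports the shift of the mean in units of the standard deviation.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* One standard Normal deviate via the Box-Muller transform. */
static double std_normal(void)
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979323846 * u2);
}

int main(void)
{
    const int pairs = 1000000;
    const double mean = 22.2, sd = 13.0;  /* figures quoted above */
    double sum_single = 0.0, sum_min = 0.0;

    srand(1);
    for (int i = 0; i < pairs; i++) {
        double a = mean + sd * std_normal();
        double b = mean + sd * std_normal();
        sum_single += (a + b) / 2.0;    /* mean of the two individuals */
        sum_min += (a < b) ? a : b;     /* pair's time = the minimum   */
    }

    printf("mean individual time: %.2f\n", sum_single / pairs);
    printf("mean pair time:       %.2f\n", sum_min / pairs);
    printf("shift in sd units:    %.2f\n",
           (sum_single - sum_min) / pairs / sd);
    return 0;
}

Built with something like cc -O2 pairs.c -lm (the file name is arbitrary), it reports a shift of roughly 0.56 standard deviations, in line with the rough calculation above.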

November 20, 2018

Jeff Carpenter (jeffcarp)

How to Solve Every Software Engineering Interview Question November 20, 2018 10:13 PM

The Googleplex at dusk

This post unfortunately does not contain a secret skeleton key that will unlock every tricky Software Engineering interview question. What’s below is a framework that you can apply to every interview question that will set you up for success every time. Software engineering interviews are not primarily about seeing if you can pull the #1 most perfect solution to a problem out of your hat.

November 19, 2018

Gustaf Erikson (gerikson)

A Trilogy of Trilogies: re-reading William Gibson November 19, 2018 07:41 PM

Sprawl

Neuromancer, Count Zero, Mona Lisa Overdrive

Bridge

Virtual Light, Idoru, All Tomorrow’s Parties

Blue Ant

Pattern Recognition, Spook Country, Zero History

It’s hard to overstate the effect Gibson’s fiction had on me as a young person. While I never went as far as dressing as a cyberpunk (like some people I hung out with), the ideas and images from Neuromancer were permanently burned into my brain.

Re-reading all of these books on a holiday was instructive. I’ve re-read some of them multiple times (probably Count Zero most often, but I’ve only read Zero History once) and that makes it pretty easy to re-read quickly.

One thing that stands out is that while the Sprawl and Bridge trilogies have protagonists who are from the underclass of society, Blue Ant makes a sharp swing into the upper middle class for its main characters. This, in conjunction with them being in the employ of the unimaginably wealthy Hubertus Bigend, leads to the books resembling some sort of technothriller Sex and the City (except there’s very little actual sex).

I think Gibson realized this and the next novel The Peripheral returns to the theme of the underclass confronting tech and society.

I like the second books in each trilogy better than the first or third. Idoru especially is good with its discussion of celebrity culture.

Technology - Gibson is credited with inventing “cyberspace”, but the “shared consensual hallucination” as depicted in the Sprawl books makes no goddamn sense from a user perspective. It’s a bit like the famous “I know Unix!” scene in Jurassic Park - great eye candy for someone who doesn’t know how a computer works, but not really productive.

But there’s so much else in the Sprawl books that’s just there, standard SF for the time. Orbital space platforms. Super-fast SST planes. No global warming. Sure, the US seems to have collapsed and the Eastern Seaboard is one vast shantytown, but the reasons for that are more because Blade Runner is cool, not really explained. And people still smoke, and read papers, even if it’s delivered by fax.

And a secondary plot point - the ability to virtually inhabit someone else’s entire sensorium - is so far away from anything we have now, it’s not even funny. Come to think of it, so are orbital space stations.

Sadly, as Gibson nears the “real world” in the later books, the tech gets less gee-whiz and more dated. Sure, cyberspace is useless but it’s so goddamn cool. The later books are instantly anchored in time with specific Apple products and forum software versions.

That’s why the earlier books have aged better. Sure, it’s funny to point and laugh at the things Gibson “got wrong” - but why not see them as an alternate future that diverged sometime in the 1970s? (This is explored in Gibson’s short story “The Gernsback Continuum”, so it’s apropos.)

In summary, I actually think the Bridge trilogy is where Gibson is at his best - lucid writing, telling stories about hard-luck characters trying to do good in a crapsack world, and a great mix of plausible and sensawunda SF.

Pete Corey (petecorey)

Elixir Mix November 19, 2018 12:00 AM

I was lucky enough, recently, to be given the opportunity to speak with Mark Ericksen and Josh Adams on an episode of Elixir Mix. We covered a wide range of Elixir-related topics like binary manipulation and pattern matching, Erlang’s extensive standard library, and property-based testing.

A good portion of our time together was spent going over the progress I’ve made implementing my in-progress Bitcoin full node. I think the fact that we had so much to talk about in the context of that project goes to show that it’s a fantastic platform to show off many of Elixir’s strengths. Discussing it with Mark and Josh rekindled my fire for the project, and I’m hoping to make more progress soon!

We also talked a bit about playing guitar and Chord, my current Elixir-based passion project.

I highly recommend you check out the podcast, and sign up to hear more from Mark, Josh, and the entire Elixir Mix crew. Also, be sure to check out my “pick”, The Sparrow by Mary Doria Russell, if you’re into science fiction and you’re looking for a new book to read. I’m still thinking about the book weeks after finishing it.

Thanks again for having me on!

Property Testing a Permutation Generator November 19, 2018 12:00 AM

Last time we spent some time writing a function to generate permutations of length k of a given list. Our final solution was fairly concise, but there are quite a few places where we could have made a mistake in our implementation.

Our first instinct might be to check our work using unit tests. Unfortunately, using unit tests to check the correctness of a permutation generator leaves something to be desired. The length of our resulting set of permutations grows rapidly as k and list increase in size, making it feasible to manually calculate and test only the smallest possible permutations.

Thankfully there’s another way. We can use property testing to test the underlying properties of our solution and the permutations we’re generating. This will give us quite a bit more variation on the inputs we test and might uncover some hidden bugs!

Our Permutation Generator

The permutation generator that we’ll be testing looks like this:


defmodule Permutation do
  def generate(list, k \\ nil, repetitions \\ false)
  def generate([], _k, _repetitions), do: [[]]
  def generate(_list, 0, _repetitions), do: [[]]

  def generate(list, k, repetitions) do
    for head <- list,
        tail <- generate(next_list(list, head, repetitions), next_k(k)),
        do: [head | tail]
  end

  defp next_k(k) when is_number(k),
    do: k - 1

  defp next_k(k),
    do: k

  defp next_list(list, _head, true),
    do: list

  defp next_list(list, head, false),
    do: list -- [head]
end

You’ll notice that it’s a little different than the generator we built last time. This generator supports the creation of permutations both with and without repetitions of elements from list, and lets you optionally pass in the length of the final permutations, k.

We can use our Permutation.generate/3 function like so:


iex(1)> Permutation.generate([1, 2, 3])
[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

We can also specify a value for k:


iex(2)> Permutation.generate([1, 2, 3], 2)
[[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]

And we can tell our generator to allow repetitions in the final permutations:


iex(3)> Permutation.generate([1, 2, 3], 2, true)
[[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3]]

All looks good so far, but dragons hide in the depths. Let’s dig deeper.

List of Lists

Permutations have some fairly well-known and easily testable properties. For example, we know that the results of our calls to Permutation.generate/3 should always have the structure of a list of lists.

Using Elixir’s StreamData package, we can easily model and check that this property holds for our Permutation.generate/3 function across a wide range of inputs. Let’s start by creating a new property test to verify this for us:


property "returns a list of lists" do
end

We start by telling StreamData that we want it to generate lists of, at most, @max_length (which we’ll define at 5 for now) integers.


property "returns a list of lists" do
+  check all list <- list_of(integer(), max_length: @max_length) do
+  end
end

Next, we call our Permutation.generate/3 function to create the permutations of the list that was just generated for us:


  property "returns a list of lists" do
    check all list <- list_of(integer(), max_length: @max_length) do
+      permutations = Permutation.generate(list)
    end
  end

Finally we’ll make assertions about the structure of our resulting permutations. We want to assert that the result of our call to Permutation.generate/3 is a list, but also that every element in that result is a list as well:


property "returns a list of lists" do
  check all list <- list_of(integer(), max_length: @max_length) do
    permutations = Permutation.generate(list)
+    assert permutations |> is_list
+    Enum.map(permutations, &assert(&1 |> is_list))
  end
end

And that’s all there is to it. Running our test suite, we’ll see that our first property test passed with flying colors (well, mostly green).

.

Finished in 0.06 seconds
1 property, 0 failures

Correct Number of Permutations

Now that we know that the structure of our resulting list of permutations is correct, the next obvious property that we can test is that the number of permutations returned by our Permutation.generate/3 function is what we’d expect.

Permutations are a well-defined mathematical concept, and so a nice equation exists to determine how many k-length permutations exist for a list of n elements:


P(n, k) = n! / (n - k)!

Let’s write a quick factorial function to help calculate this value:


defp factorial(n) when n <= 0,
  do: 1

defp factorial(n),
  do: n * factorial(n - 1)

Let’s also rewrite our P(n, k) calculation as an Elixir helper function:


  defp pnk(list, k),
    do: div(factorial(length(list)), factorial(length(list) - k))

Great!

Now we’re set up to test that our Permutation.generate/3 function is giving us the correct number of permutations for a given list and value of k.


property "returns the correct number of permutations" do
end

This time we’ll generate our list, along with a value for k that ranges from 0 to the length of list:


property "returns the correct number of permutations" do
+  check all list <- list_of(integer(), max_length: @max_length),
+            k <- integer(0..length(list)) do
  end
end

Once we have values for list and k, we can generate our set of permutations and make an assertion about its length:


property "returns the correct number of permutations" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
+    assert pnk(list, k) ==
+            list
+            |> Permutation.generate(k)
+            |> length
  end
end

Once again, our tests pass.

..

Finished in 0.06 seconds
2 properties, 0 failures

Only Include Elements From the List

Another neatly testable property of the permutations we’re generating is that they should only contain values from the list being permutated. Once again, we’ll start by defining the property we’ll be testing and generate our values for list and k:


property "permutations only include elements from list" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
  end
end

Next, we’ll want to generate our set of permutations for list and k, and reject any permutations from that set that include values not found in list:


property "permutations only include elements from list" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
+    assert [] ==
+              list
+             |> Permutation.generate(k)
+             |> Enum.reject(fn permutation ->
+                [] ==
+                  permutation
+                 |> Enum.reject(&Enum.member?(list, &1))
+              end)
  end
end

We’re asserting that the resulting list of permutations should be an empty list ([]). There should be no permutations left that contain elements not found in list!

And, as expected, our suite still passes.

...

Finished in 0.08 seconds
3 properties, 0 failures

Use Each Item Once

Our current implementation of Permutation.generate/3 allows for duplicate items to be passed in through list. When we generate each possible permutation, it’s important that each of these duplicate items, and more generally any item in the list, only be used once.

That is, if list is [1, 2, 2], our set of possible permutations should look like this:


[[1, 2, 2], [1, 2, 2], [2, 1, 2], [2, 2, 1], [2, 1, 2], [2, 2, 1]]

Note that 2 is used twice in each permutation, but never more than twice.

At first, it seems like we might need a new test to verify this property of our permutation generator. It’s conceivable that we could group each set of equal elements, count them, and verify that the resulting permutations have the correct count of each element group. But that sounds complicated, and an added test just introduces more code into our codebase that we need to maintain.

It turns out that we can tweak our previous property test to verify this new property.

Instead of identifying duplicates, counting them, and verifying the correct counts in the final set of permutations, let’s take a simpler approach. Let’s ensure that each element in list is unique by using Elixir’s Enum.with_index/1 function to bundle the element with its index value.

For example, our previous [1, 2, 2] value for list would be transformed into:


[{1, 0}, {2, 1}, {2, 2}]

Now both of our 2 elements are unique. The first is {2, 1}, and the second is {2, 2}. Using this technique, we can recycle our “permutations only include elements from list” test with a few slight tweaks:


  property "permutations only include elements from list" do
    check all list <- list_of(integer(), max_length: @max_length),
              k <- integer(0..length(list)) do
      assert [] ==
               list
+              |> Enum.with_index()
               |> Permutation.generate(k)
               |> Enum.reject(fn permutation ->
                 [] ==
                   permutation
-                  |> Enum.reject(&Enum.member?(list, &1))
+                  |> Enum.reject(&Enum.member?(list |> Enum.with_index(), &1))
               end)
    end
  end

And once again, our suite passes:


...

Finished in 0.08 seconds
3 properties, 0 failures

Final Thoughts

The current version of our property tests only covers values of list composed entirely of integers. This doesn’t necessarily need to be the case. The list being permutated can contain literally anything, in theory. Expanding our property tests to support a wider range of types might be a great opportunity to try out writing a custom generator function!

Our current test suite also defines @max_length as 5. While testing, I noticed that values of @max_length up to 8 were very performant and finished in under a second on my machine. Running the suite with a @max_length of 9 took several seconds to complete, and using a value of 10 took nearly two minutes to come back green. I’m not sure if these performance problems can easily be improved, but I’m happy about how obvious they became through property testing.

You’ll also note that none of this testing covers generating permutations that allow infinite repetitions of elements from list. The properties for these sets of permutations are completely different, so I’m going to leave this as an exercise for the truly motivated reader.

I’m enjoying my experiences with property testing so far. Hopefully you find these write-ups useful as well. If so, let me know on Twitter!

Alex Wilson (mrwilson)

Notes from the Week #9 November 19, 2018 12:00 AM

The consistency of my mind feels akin to my thoughts being wrapped in some kind of cling-film as I’m coming out of last week’s cold, but the main theme of this week was coming back to things we’ve done or thought before.

Re(visitations)

Last year we attempted and subsequently abandoned a rebuild of our Nagios infrastructure. The reasons were many, including the state of the configuration (which was close to spaghetti) and the application’s ubiquity, which meant it had a lot of inertia and rebuilding it was quite dangerous — would we know if it wasn’t alerting properly?

We’re about to attempt this again, but now we feel confident that we can succeed:

  • The team are much more experienced in rebuilding our existing infrastructure
  • We’ve gotten good at delivering value in small slices and can reason much more effectively about the dependencies within our systems
  • We’ve been deep in the Nagios weeds during our paging system rebuild to use Opsgenie (see last week’s weeknotes for more details)

I’m excited about this not only as we’ve been wanting to do this for ages but also because it showcases how much the team has grown in experience and confidence, to want to tackle this behemoth of our tech stack again.

Re(evaluations)

I have maintained a healthy suspicion of Docker for some time now — a combination of having been burned by it in the past (when its RHEL support was much less stable than now) and a general skepticism towards something that seemingly spreads like wildfire through a community.

While I’m a strong advocate of “solve the problem you have, not the one you want to have”, we’re experimenting with it for a fast feedback loop on testing our infrastructure code, and so far the experience has been quite slick!

In this scenario, I’m happy to be a bit late to the party if most of the kinks have been worked out already.

Re(st) and Re(cuperation)

I am absolutely terrible at being even slightly ill.

I took sick leave on Friday and spent the weekend relaxing with family, and for the first time in a long time managed to get lost in a fiction book.

I’m reading Haruki Murakami’s Killing Commendatore — I’ve been a fan of Murakami’s books for years after a friend loaned me a copy of A Wild Sheep Chase.

Here’s to hoping that I’m able to continue my rediscovery of reading for pleasure!

Re(mixes)

I picked up on the excellent Mouth Moods this week, a continuous mashup-cum-remix of many unexpected combinations (see Drowning Pool’s Let The Bodies Hit The Floor vs Popcorn, and the innuendo-laden remix of Ghostbusters).

Originally published at blog.probablyfine.co.uk on November 19, 2018.

Noon van der Silk (silky)

Reliable training hack on the Google Colaboratory November 19, 2018 12:00 AM

Posted on November 19, 2018 by Noon van der Silk

Google’s Colaboratory is a hosted notebook environment, with access to GPUs, and even TPUs!

It’s really quite handy, but by far the biggest downside is that the sessions time out. It makes sense; I’m sure even Google can’t give out an unlimited amount of compute-resources for free to every person.

Background/Problem

On the weekend, I wanted to train a few sketch-rnn models on the quickdraw data.

Naively, I figured this would be really easy with Google colab. While it was straightforward to start training, what I noticed is that getting data on to and off of the instance was frustrating, and the timeouts blocked me from getting a good amount of training time.

Solution

Happily, colab supports very nice integration with Google services, so my plan was:

  1. Download data from Google Cloud Platform (GCP),
  2. Train, or continue training,
  3. Push a checkpoint to Google Drive occasionally,
  4. Repeat until happy.

Here’s how it looks, in code:

Download data from GCP

As I’m working with the quickdraw data, it’s already on the Google Cloud Platform, so this was very easy. In a cell, I simply ran the following to get the “eye” quickdraw data:

!gsutil cp gs://quickdraw_dataset/sketchrnn/eye.npz .

(Note that the gsutil command is already installed on the instance.)

Train, or continue training, and save to Drive

As I’m using the sketch_rnn model, I first simply install magenta (and I have to pick a Python 2 environment.)

!pip install magenta

Now, there are some considerations. Recalling that I’m going to be pushing my checkpoints to Google Drive, I need to authenticate with Google Drive. This is how that looks:

from google.colab import auth
auth.authenticate_user()

Then you’ll be prompted to copy in a code. Once that’s done, you can connect to Google Drive like so:

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

Now, if I’m training from scratch, I’ll run something like this:

!sketch_rnn_train --log_root=logs --data_dir=./ --hparams="data_set=[eye.npz],num_steps=501"

This will run for however long, and ultimately produce checkpoints in the ./logs folder, supposing that eye.npz exists in the present directory.

Once that’s completed, I start my main training-pushing loop. Firstly, there’s a bit of busywork to zip files, get the latest checkpoint number, and upload it to Google Drive:

import os
import zipfile
from googleapiclient.http import MediaFileUpload

def get_largest_num (dir="logs", prefix="vector"):
  
  files = os.listdir(dir)
  
  biggest = 0
  
  for f in files:
    if f.startswith(prefix):
      k = int( f.split(".")[0].split("-")[1] ) 
      if k > biggest:
        biggest = k
  
  return biggest


def zip_model (name, k):
  sk     = str(k)
  zipobj = zipfile.ZipFile(name + ".zip", "w", zipfile.ZIP_DEFLATED)

  files = [ "checkpoint"
          , "model_config.json"
          , "vector-" + sk + ".meta"
          , "vector-" + sk + ".index"
          , "vector-" + sk + ".data-00000-of-00001"]
  
  for f in files:
    zipobj.write("logs/" + f, f)



def upload_to_drive (name="model.zip"):
  file_metadata = {
    "name":     name,
    "mimeType": "binary/octet-stream" }

  media = MediaFileUpload(name, 
                          mimetype="binary/octet-stream",
                          resumable=True)

  created = drive_service.files().create(body=file_metadata,
                                         media_body=media,
                                         fields="id").execute()
  file_id = created.get("id")
  return file_id

Then, the main loop:

iterations = 200
for k in range(iterations):
  print("Iteration " + str(k))
  cmd = 'sketch_rnn_train --log_root=logs --resume_training --data_dir=./ ' + \
        ' --hparams="data_set=[eye.npz],num_steps=1001"'
  x = os.system(cmd)
  zip_model("model", get_largest_num())
  upload_to_drive()

So, all that does is run the main training command to reload the model from the latest checkpoint and continue training, then zip and upload!

Set the iterations to whatever you wish; chances are your instance will never run for that long anyway. The main point is to push up the checkpoints every so often (for me, every 1000 steps of the sketch_rnn model, which takes about an hour or so, depending on params).

Bringing down the most recent Drive checkpoint

Now, when your instance goes away, you’ll need to bring down the most recent checkpoint from Drive. I did this somewhat manually, but it works well enough:

# Mount Google Drive as a folder
from google.colab import drive
drive.mount('/content/gdrive')
# Extract latest model zip file
!cp /content/gdrive/My\ Drive/model\ \(3\).zip logs/model.zip && cd logs && unzip model.zip

Note that Google Drive numbers all the files as copies, like “model (4).zip”, “model (5).zip”, when you upload the same name. On the web interface, it only shows one file, but gives you history. Do as you wish here; I was a bit lazy.

That’s it!

Hope this helps you do some training!

You can read more about other ways to access data from Google Colaboratory here.

November 18, 2018

Derek Jones (derek-jones)

Christmas books for 2018 November 18, 2018 09:12 PM

The following are the really interesting books I read this year (only one of which was actually published in 2018, everything has to work its way through several piles). The list is short because I did not read many books and/or there is lots of nonsense out there.

The English and their history by Robert Tombs. A hefty paperback, at nearly 1,000 pages, it has been the book I read on train journeys, for most of this year. Full of insights, along with dull sections, a narrative that explains lots of goings-on in a straight-forward manner. I still have a few hundred pages left to go.

The mind is flat by Nick Chater. We experience the world through a few low bandwidth serial links and the brain stitches things together to make it appear that our cognitive hardware/software is a lot more sophisticated. Chater’s background is in cognitive psychology (these days he’s an academic more connected with the business world) and describes the experimental evidence to back up his “mind is flat” model. I found that some of the analogues dragged on too long.

In the readable social learning and evolution category there is: Darwin’s unfinished symphony by Leland and The secret of our success by Henrich. Flipping through them now, I cannot decide which is best. Read the reviews and pick one.

Group problem solving by Laughin. Eye opening. A slim volume, packed with data and analysis.

I have already written about Experimental Psychology by Woodworth.

The Digital Flood: The Diffusion of Information Technology Across the U.S., Europe, and Asia by Cortada. Something of a specialist topic, but if you are into the diffusion of technology, this is surely the definitive book on the diffusion of software systems (covers mostly hardware).

Ponylang (SeanTAllen)

Last Week in Pony - November 18, 2018 November 18, 2018 04:40 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Artemis (Artemix)

CSS Re-work November 18, 2018 12:00 AM

I made this entire website by myself, and, since CSS isn't my forte, I lost a lot of time monkey-patching.

The document grew a lot, and I'm convinced that there's a lot of mess, duplication and badly designed parts.

Clear examples of that were the spacing in the title and the video iframes (e.g. in the Google AMP article).

The biggest problem I encountered when designing the website was that I had no real knowledge about UX, typography or proper web design.

In that sense, I was just "placing" elements, and when I managed to place them right, it was "ok, done" for me.

Line spacing was fucked up, as were overflow and font rendering, but it "worked".

Now that I'm working on redesigning my blog, I'm taking those lessons into account, and I'll try to make it much simpler, and adaptive.

# Analysing the existing structure

The overall blog layout is simple enough for me to not directly care about it.

Instead, I'll focus my attention on the article, which is, after all, my main content!

I first analyzed all the "top-level" node types for each article, which allowed me to build the following list.

[
    "p",
    "h1",
    "h2",
    "h3",
    "ol",
    "ul",
    "blockquote",
    "pre",
    "iframe"
]
An example tag analysis result on the Google AMP article

Example result with Google AMP. I decided to exclude the script tag, since it doesn't render anything.

The script tag is part of the Vimeo player's iframe loading.

I then separated every content type into type categories: typography, blocks and lists.

{
    "typography": [
        "p",
        "h1",
        "h2",
        "h3",
        "h4",
        "h5",
        "h6"
    ],
    "lists": [
        "ol",
        "ul"
    ],
    "blocks": [
        "blockquote",
        "pre",
        "iframe",
    ]
}

Typography is the general-purpose text; lists are separate, because their spacing, alignment and such require extra caution; and finally, blocks are a completely different entity, as they'll have special rendering applied.

With the current markdown renderer, the images were scoped inside a p block, which means they were placed as inline-level tags.

I needed to wrap the images inside figures instead, but that'll come later on.

In that sense, I decided to not care about images right away.

# First document schemas

Since we're re-designing everything from the ground up (while obviously trying to keep the design close to the "old" one), I worked with my favourite tool to build the schemas: My whiteboard.

This photo shows the overall page structure, as designed with typography and alignment in mind.

I decided to drop the two-column footer, which was filled with too much information, to keep it simple and light to read.

This is the resulting footer, made simple

After integrating the syntax-colouring CSS rules, the website starts to look like something!

An example code snippet, syntax-coloured

# Rules fixing

I decided to change some rules to add custom behaviour.

The first one is obviously images, wrapping them in <figure></figure> tags, and using the alt text as figure caption.

This was a bit tricky, since I just discovered how to tweak the markdown library I used (which is Marked), but the following code snippets managed to do what I wanted to do.

The hardest bit was the fact that img is an inline tag, which means that I also had to overwrite the paragraph renderer.

const renderer = new marked.Renderer();
renderer.image = (href, _, alt) =>
    `<figure><img src="${href}" alt="${alt}" /><figcaption>${alt}</figcaption></figure>`;
renderer.paragraph = text =>
    (text.startsWith('<figure') && text.endsWith('</figure>')) ? `${text}\n` : `<p>${text}</p>\n`;

The second feature I wanted to add was an auto-prefix for heading tags, to allow easy anchor-link picking (using the well known #).

renderer.heading = (text, level) => {
    const escapedText = slugify(text);

    return `<h${level}><a name="${escapedText}" href="#${escapedText}">#</a> ${text}</h${level}>\n`;
};

# Video

Vimeo offers a quality service, but one thing's bothering me: I need to use their player.

Thing is, it's an iframe, requiring javascript and pinging a few domains. Way too heavy for me.

Thankfully, after a little research, I found that Cloudinary is the best service to get a direct video feed, which I can then directly put inside an HTML5 video tag!

With that came two bonuses:

  • The ability to resize a video "on-the-fly", allowing me to cap the video image size. After some trial and error, I found that 1600 for width is the best compromise between quality and size.
  • The ability to convert a video in my two target formats (mp4 and webm), "on-the-fly" again.

For now, I'm uploading video media through their management dashboard, but that'll change in time, once I have enough video content to make uploading worth the bother.

But since Marked was a real bother to extend, and since my video tags are pretty simple (the tag follows the format { video <name> }), I decided to go with a regex parse/replace to insert HTML5 video tags just before passing the document through the Marked renderer.

const cloudinaryURL = 'https://res.cloudinary.com/nyx/video/upload/w_1600/blog/';
content.rendered = content.content
    .replace(/^{ ?video ([\w-]+) ?}$/gm, `<video controls>
<source src="${cloudinaryURL}$1.mp4" type="video/mp4">
<source src="${cloudinaryURL}$1.webm" type="video/webm">
<p>Your browser sadly doesn't support videos. You can directly access and download the video <a href="${cloudinaryURL}$1.mp4">here</a>.</p>
</video>`);
content.rendered = marked(content.rendered);

This rather... "ugly" method picks the name I've put inside the tag, and uses it to build the Cloudinary URL.

# Future plans

I still have some things I want to do on this blog's new design, like adding a right-aligned "Language: ${lang}" label to code blocks.

I'd also like some form of comment space, but that's still in the discovery process, so that'll take a while.

This redesign is quite refreshing, and allowed me to really simplify my blog's code, rendering it even lighter and more compatible.

November 17, 2018

Robin Schroer (sulami)

Genetic Programming in Clojure November 17, 2018 12:00 AM

The Theory

Like most programmers I have always had a vague interest in AI, and one of its branches that requires less complicated maths than recurrent neural networks (the best-known branch) is genetic programming. The idea of genetic programming is quite simple (if you are into more visual examples, I believe this is a very good practical one: http://rednuht.org/genetic_cars_2/):

  1. You build something that is parameterised in key places
  2. You build a scoring function to assess the performance of your something with a set of parameters
  3. You randomly adjust (mutate) your parameters in some way a couple of times and compare the score of each set
  4. You take the best one or ones as a base to start a new round of mutations and scoring
  5. Basically just repeat steps 3 & 4 for a while and your parameters will tend towards a maximum score

Depending on a variety of meta-parameters that control for example the size of each generation or the nature of the mutations you might just find a local maximum, but often times this can yield pretty good results for a variety of problems.

The Practice

I have toyed around with this over the last couple of days and built a very simple abstract implementation in Clojure, which I am going to share here (and it will eventually be somewhere on Github as well). Let us explore it from the inside out.

First of all we need to be able to generate some mutations of our specimen. Because we do not assume anything about the specimen, this ends up being quite simple because a lot of the heavy lifting is done outside of this implementation as it is specific to the problem in question.

This returns a potentially infinite list containing first the specimen passed in, and then as many mutations of it as we want.

(defn mutate
  "Generator that mutates a base, the first element being the base."
  [base mutator]
  (concat [base]
          (repeatedly #(mutator base))))

Next we also need to be able to score it. In this case we would like to attach the scores to the specimens so that we can use them to sort and select specimens without losing the specimens themselves.

(defn attach-score
  "Attaches the score to a specimen."
  [score-fn specimen]
  [specimen (score-fn specimen)])

Now let us begin to tie these together. A single generation should take a base specimen, mutate it a couple of times, score each of them, and then select the “fittest” based on the scores. (This also only keeps the best specimen in every generation, which makes the code much simpler. For actual real-world usage it might be beneficial to keep the best n specimens in every generation to avoid running into local maxima. This would make the mutation slightly more complex, though, because there would be several base specimens which need to be mutated, so I decided to leave out this feature for the purposes of explanation.) Note that in this implementation a lower score is better. To change this, just reverse the sorting.

(defn generation
  "Picks out the best one from a generation."
  [base mutator score-fn gen-size]
  (->> (mutate base mutator)
       (take gen-size)
       (map (partial attach-score score-fn))
       (sort-by second)
       first
       first))

And to finish off, we just need to run a number of generations, each based on the previous one’s winner.

(defn evolution
  "Generator for generations."
  [base mutator score-fn gen-size]
  (iterate #(generation % mutator score-fn gen-size) base))

The lazy nature of this implementation allows us to inspect intermediate results easily, as we can see the path evolution has taken in the form of each generation’s winner.

The Actual Practice

Now, the above is actually not that much code, and it is very abstract in nature, so let us have a look at what it looks like when we actually use it. A simple example would be approximating a single number that is hard to approximate, like √2.

Our specimen is just a float, and any will do as the initial seed. It is itself the only parameter.

(def base 0.0)

To mutate it, we just adjust it by a random amount within 0.5 in either direction.

(defn mutator [base]
  (-> (rand)
      (- 0.5)
      (+ base)))

Our scoring function is cheating a little, because we already know the target, we can just compare against it and use the distance as the score.

(defn score-fn [x]
  (-> x
      (- (Math/sqrt 2))
      Math/abs))

Now when we run this, we can see how it approximates the target value over time (√2 ≈ 1.4142).

(take 6 (evolution base mutator score-fn 25))
;; => (0.0
;;     0.33079046010191426
;;     0.7509224756253191
;;     1.2164225056336746
;;     1.3768753691848903
;;     1.4125030676422798)

Because evolution returns an infinite sequence, we can just use nth on it to get the winner after a certain number of generations.

While this is a very simple example, I am currently working on a way of using this to build and mutate a Clojure S-expression and score it by running a series of unit tests against the generated code. If this works out I might write about it here soon.

November 16, 2018

Bogdan Popa (bogdan)

Announcing net-ip November 16, 2018 11:00 AM

I released net-ip – a small Racket library for working with IP (v4 and v6) addresses and networks – today. I needed this to be able to work on another library I’m going to release at some point in the future for doing geo-location based on Maxmind’s databases. Check it out and let me know what you think!

November 15, 2018

Joe Nelson (begriffs)

C Portability Lessons from Weird Machines November 15, 2018 12:00 AM

In this article we’ll go on a journey from 4-bit microcontrollers to room-sized mainframes and learn how porting C to each of them helped people separate the essence of the language from the environment of its birth. I’ve found technical manuals and videos for this article to help bring each computer to life.

It’s amazing that, by carefully writing portable ANSI C code and sticking to standard library functions, you can create a program that will compile and work without modification on almost any of these weird systems.

Cover of Portable C

I hope that being exposed to these examples will help you write code more portably, and dispel the belief that current computers with their multiple cores, cache hierarchy, and pipelining, are somehow too alien for C. A language tough enough to handle the diversity of old machines is tough enough to handle today’s relatively homogeneous CPUs.

To prepare this article I worked backward from the book “Portable C” (by Henry Rabinowitz), searching for architectures that illustrate each of the pitfalls he points out. You should read the book for a great explanation of what the author calls “C-World,” a semantic model of the execution of a C program.

Unisys 1100/2200

While this video doesn’t show the true computer in operation, you can still see the shape of one of the control panels. The gentleman in the video has a quixotic fascination with it.

Video source: youtube

The first unusual thing about the architecture is its word size. You may be familiar with datatypes having powers-of-two bit sizes, but these Unisys series went with multiples of 9! The word size is 36 bits, and the C compiler for the platform uses:

  • char — 9
  • short — 18
  • int — 36
  • long — 36
  • long long — 72

(The Honeywell 6000 was another machine with 9-bit char and 36-bit word.)

Just to make matters more interesting, the oddly sized integers use ones’ complement binary arithmetic. That’s right, in this system there are distinct values for positive and negative zero. (CDC computers also used ones’ complement.)
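
As a hedged aside on what portable code looks like against this backdrop: rather than assuming 8-bit bytes or two's complement, ask <limits.h> and let the compiler do the representation-specific work. A minimal sketch:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT is 9 here, 8 on most machines you will meet today. */
    printf("bits per char:        %d\n", CHAR_BIT);

    /* Total storage bits; the number of value bits can be smaller if */
    /* the implementation uses padding bits.                          */
    printf("storage bits per int: %d\n", (int)(sizeof(int) * CHAR_BIT));

    /* Portable sign test: compare against zero instead of poking at a */
    /* "sign bit", whose position and meaning differ under ones'       */
    /* complement.                                                     */
    int x = -1;
    printf("x is %s\n", x < 0 ? "negative" : "non-negative");
    return 0;
}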

The thirty-six-bit integers can hold a lot, but guess what they can’t hold on this architecture? Pointer values. Section 8.6.1 of the C manual for the Unisys 2200 says:

A pointer in UC cannot be treated as an integer. A UC pointer is a two-word structure with the base virtual address (VA) of a bank in the first word and a bit-word pointer in the second word. The bit-word pointer is necessary since the 2200 hardware does not have byte pointers; the basic pointer in the 2200 hardware is a word (36-bit) VA pointer that can only point to words. The bit-word portion of the UC pointer has a bit offset in the first 6 bits of the word and a word offset in the lower 24 bits of the word. If you convert (cast) a UC pointer to a 36-bit integer (int, long, or unsigned), the bit offset is lost. Converting it back to a C pointer results in it pointing to a word boundary. If you add 1 to the integer before converting it back to a pointer, the pointer points to the next word, not the next byte. A 36-bit integer is not capable of holding all the information in a UC pointer.
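
The portable habit this teaches is to never store a pointer in a plain int or long. If a pointer really must pass through an integer type, C99's optional uintptr_t is the only one with a round-trip guarantee; a minimal sketch, assuming the implementation provides it:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int word = 42;
    char *p = (char *)&word;

    /* Non-portable: a cast like (long)p loses the bit/byte offset on   */
    /* the 2200, and may simply truncate on other machines.             */

    /* Portable round trip through uintptr_t (optional in C99, but      */
    /* present wherever pointers fit into an integer type at all).      */
    uintptr_t bits = (uintptr_t)p;
    char *q = (char *)bits;
    printf("round trip ok: %d\n", q == p);

    /* Arithmetic on bits has no portable meaning; do arithmetic on the */
    /* pointer itself instead, e.g. p + 1 for the next char.            */
    return 0;
}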

If you think regular pointers are demanding, section 8.6.2 says that a function pointer requires a full eight words!

A function pointer is 8 words long and has a completely different format. Only two words out of the 8-word function pointer are actually (currently) used by UC-generated code. (The second and third words.) (More of the words are used by other UCS languages such as FORTRAN and COBOL.) You can cast UC data pointers to function pointers and function pointers to data pointers and not lose any information. The two words that hold information are simply moved back and forth.

Finally, if you think the Unisys is confined to the pages of history, you’re mostly right, but not entirely. They still manufacture and sell the “ClearPath Dorado” which uses the 2200 architecture.

Unisys ClearPath A Series

Video source: youtube

Like the previous Unisys machine, the ClearPath has an unusual word size. Here are the integral data type sizes for the ClearPath C compiler:

  • char — 8
  • short — 48
  • int — 48
  • long — 48
  • long long — ??

This machine uses neither two's complement nor ones' complement signed arithmetic – it uses sign-magnitude form instead.

AT&T 3B (or 3B2)

A reliable old machine that had a devoted community. Fairly normal architecture, except it is big endian, unlike most computers nowadays. The char datatype is unsigned by default. Finally, the standard compiler for this architecture guarantees that function arguments are evaluated from left to right.
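
Here is a little sketch (mine, not from any 3B documentation) that pokes at both assumptions, argument evaluation order and the signedness of char:

#include <stdio.h>

static int counter = 0;
static int next(void) { return ++counter; }

int main(void)
{
    char c = -1;

    /* Argument evaluation order is unspecified in C; the 3B compiler's
       left-to-right guarantee lets order-dependent calls like this one
       appear to work. */
    printf("%d %d\n", next(), next());

    /* With char unsigned by default, c holds 255 and this prints 0;
       where char is signed it prints 1. */
    printf("%d\n", c < 0);

    return 0;
}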

Symbolics Lisp Machine 3600

Video source: youtube

C is so portable that someone wrote a compiler – Symbolics C – for a computer running Lisp natively. Targeting the Symbolics Lisp machine required some creativity. For instance, a pointer is represented as a pair consisting of a reference to a list and a numerical offset into the list. In particular, the NULL pointer is <NIL, 0>, basically a NIL list with no offset. Certainly not a bitwise zero integral value.

The word size is 16 bits. There are no alignment requirements for data, although instructions must be on a 16-bit boundary. Here are the sizes of integer types defined by the compiler on the machine:

  • char — 8
  • short — 16
  • int — 16
  • long — 16

Motorola 68000

Video source: youtube

This processor found its way into many game consoles, embedded systems, and printers. It’s a pretty normal architecture, though big endian with a compiler default of unsigned chars. Also pointers (32 bits) are a different size than ints (16 bits).

One significant quirk is that the machine is very sensitive to data alignment. The processor had two-byte granularity and lacked the circuitry to cope with unaligned addresses. When presented with such an address, the processor would throw an exception. The original Mac (also based on the 68000) would usually demand the user restart the machine after an alignment error. (Similarly, some Sparc machines would raise a SIGBUS exception for alignment problems.)
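
Here is a minimal sketch (mine) of the sort of type punning the 68000 punishes:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char *p = buf + 1;   /* an address with no particular alignment */
    int value;

    /* Non-portable: dereferencing a misaligned int pointer.  The 68000
       (and some Sparc machines) fault on it, even though x86 quietly
       tolerates it. */
    /* value = *(int *)p; */

    /* Portable: copy the bytes instead of punning the pointer. */
    memcpy(&value, p, sizeof value);
    printf("%d\n", value);
    return 0;
}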

Data General Eclipse

Video source: youtube

This machine uses a different numbering scheme for character- and integer-pointers. The same location in memory must be referred to by different addresses depending on the pointer type. A cast between char* and int* actually changes the address inside the pointer. Chris Torek recounts the details.

Cray T90

This machine provides another cautionary tale about trying to manipulate pointer values as if they were integers. On this architecture char* and void* are secretly word pointers with an offset stored in the three unused high-order bits. Thus incrementing a char* as an integer value would move to the next word but keep the same offset.

Prime 50 series

Video source: youtube

Notable for using a NULL pointer address that is not bitwise zero. In particular it uses segment 07777, offset 0 for the null pointer. (Some Honeywell-Bull mainframes use 06000 for the NULL pointer value, which is another example of non-zero NULL.)
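
A quick sketch (mine) of the assumption this breaks, namely that a null pointer is all-bits-zero:

#include <stdlib.h>
#include <string.h>

struct node {
    struct node *next;
    int value;
};

int main(void)
{
    struct node n;

    /* Non-portable: memset gives next an all-bits-zero representation,
       which on a Prime 50 is not the null pointer (segment 07777,
       offset 0 is). */
    memset(&n, 0, sizeof n);

    /* Portable: assign NULL and let the compiler emit whatever bit
       pattern the machine actually uses. */
    n.next = NULL;
    n.value = 0;

    return n.next == NULL ? 0 : 1;
}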

R3000 MIPS

Video source: youtube

The DECstation uses the R3000 processor. It could be switched into either little- or big-endian mode at the programmer's discretion. One quirk is that the processor raises an exception for signed integer overflow, unlike many other processors which silently wrap to negative values. Allowing a signed integer to overflow (in a loop, for instance) is thus not portable.
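
Here is a sketch (mine) of the non-portable pattern in question, a signed counter allowed to run off the end:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int i;

    /* Non-portable (and undefined behaviour): waiting for i to wrap
       negative.  The R3000 traps at the overflow rather than wrapping.
       for (i = 1; i > 0; i++) ; */

    /* Portable: test against the limit before stepping past it. */
    for (i = 1; i <= INT_MAX / 2; i *= 2)
        ;
    printf("%d\n", i);
    return 0;
}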

Acorn Archimedes A3010

Video source: youtube

This computer is actually the origin of the ARM architecture which we commonly find in mobile phones and Arduinos. The Acorn in particular uses ARM2, and has a 32-bit data bus and a 26-bit address space. Like the Motorola 68000, the ARM2 raises a SIGBUS exception for unaligned memory access. (Note that the Arduino is an instance of a compiler still using 16-bit ints.)

8086/8088/80286

Video source: youtube

Everyone who writes about programming the Intel 286 says what a pain its segmented memory architecture was. Each memory segment can address up to 64 KB, which is the largest contiguous region of memory that C could allocate per data object. (Thus on this architecture size_t is smaller than the size of a pointer.)

Because the full address of any word in memory was specified by a segment and offset, there are 4096 ways of referring to it by some combination of the two. (For instance address 0x1234 can be referenced as 0123:0004, 0122:0014, etc.) Also variables declared next to one another may live in different segments, far apart in memory. This breaks some highly inadvisable tricks people used, like zeroing out a block of several variables by memset’ing the whole memory range between their addresses.
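
For the curious, here is a sketch (mine) of that inadvisable trick, which the segmented model, and modern optimizing compilers in general, will happily break:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int a = 1, b = 2, c = 3;

    /* The "trick": assume a, b and c sit next to each other in memory
       and zero them all with one call.  Under a segmented model they
       may live in different segments entirely, and even on flat
       machines the compiler is free to reorder or pad them.
       memset(&a, 0, (char *)&c - (char *)&a + sizeof c); */

    /* Portable: zero each object by name. */
    a = b = c = 0;

    printf("%d\n", a + b + c);
    return 0;
}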

Despite this awkwardness, the personal computer was hot, and as of 1983 Byte Magazine (Vol 8, Issue 8) identified that there were nine different C compilers for the IBM PC! I found the manual for one of them, Lattice C. It’s the same compiler used on other IBM products such as the System 370.

In Lattice C both short and int are 16 bits, but long is 32. Char is signed by default, and of course the x86 is little endian.

Intel 8051

Sticking with the theme of memory complications, enter the 8051. It’s a microcontroller that uses a “Harvard architecture.” This means it communicates with different types of memory attached to the same system. It uses word-oriented addressing for the ROM space and byte-oriented addressing for the RAM space. It needs differently sized pointers for each.

Many addresses are ambiguous, and could meaningfully point to either the RAM or ROM bank. 8051 compilers such as Crossware use a slightly larger “generic” pointer which tags the memory class in its high byte to resolve the ambiguity.

HP Saturn

Video source: youtube

The Saturn family are 4-bit microprocessors developed by Hewlett-Packard in the 1980s for programmable scientific calculators and microcomputers. The video above shows the HP-71B calculator, which is really more of a general purpose computer in strange packaging. You load bulk data into it by zipping a magnetic band through some kind of a reader slot.

The Saturn processor doesn’t have hardware instructions to do signed arithmetic. That has to be emulated using combinations of other assembly instructions. Thus unsigned number operations are more efficient. No surprise that char defaults to being unsigned.

The memory is interesting. Its addresses are nibble-based, and can address 1M nibbles = 512 KB. Pointers are 20 bits, but stored as 32 bits. Saturn C datatypes are pretty normal:

  • char — 8
  • short — 16
  • int — 32
  • long — 32
  • long long — 64
  • float — 64
  • double — 64

MOS 6502

Video source: youtube

This was one of the first low-cost 8-bit microprocessors, and it found its way into all kinds of systems including the Apple II, Commodore 64, and Nintendo Entertainment System. This processor is downright hostile to C compilers. Check out the crazy optimization suggestions for the CC65 compiler.

Some of the troublesome things about this architecture:

  • There is no multiply or divide operation in the assembly, it has to be emulated with other instructions.
  • Accessing any address higher than the “zero page” (0x0 to 0xFF) causes a performance penalty.
  • The CPU does not provide any 16-bit register or any support for 16-bit operations.
  • It only comes with a single “true” register.
  • However, the zero page can be accessed in one cycle, so a programmer can use it as a pool of 256 8-bit registers.

The 6502 helps reveal the edge of portability, the place where C’s “luxuries” are too costly.

PDP-11

Video source: youtube

The C home planet. Not much to say about it, because things work smoothly. The real surprises happened when porting PDP code to other machines. Pointers of all types and integers can be interchanged without casting.

One strange thing about this machine is that whereas 16-bit words are stored in little endian, 32-bit long ints use a weird mixed endian format. The four bytes in the string “Unix” when stored in the PDP-11 are arranged as “nUxi” if interpreted as big endian. In fact that scrambled string itself resulted when porting code from the PDP to a big endian machine.
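
Here is a sketch (mine) of the kind of byte-order assumption behind that story:

#include <stdio.h>

int main(void)
{
    unsigned short pair = ('U' << 8) | 'n';
    unsigned char *bytes = (unsigned char *)&pair;

    /* Non-portable: packing characters into an integer and then
       reading the result byte by byte exposes the machine's byte
       order.  A little endian PDP-11 prints "nU" here; a big endian
       target prints "Un", the same swap that produced "nUxi". */
    printf("%c%c\n", bytes[0], bytes[1]);
    return 0;
}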

VAX-11/780

Video source: youtube

The VAX is like a 32-bit PDP. It’s the next machine in PDP evolution. People enjoyed coding for the VAX with its nice flat memory and uniform pointers of all types. People liked it so much that the term “VAXocentric” referred to sloppy coding by those who got too comfortable with the architecture and who didn’t bother to learn how other computers differed.

The assembly for x86-64 looks externally similar to VAX, and people originally believed VAX would outlast Intel. This proved incorrect, as the “attack of the micros” destroyed the mainframe and minicomputer markets.

Datatype sizes:

  • char — 8
  • short — 16
  • int — 32
  • long — 32

Programs run faster with data aligned properly, but there is no strict alignment requirement, unlike previous architectures such as the IBM 360, PDP-11, Interdata 8/32 etc. Size and alignment attributes are logically independent. The VAX-11 C compiler aligns all the basic data types on address boundaries which are multiples of the size of each type.

Other facts: the VAX C compiler doesn’t guarantee left-to-right evaluation of function arguments. Chars are signed by default. The PDP was OK with division by zero and returned the dividend, but VAX causes an unmaskable trap.

Bell Labs wrote an interesting report about porting programs from the PDP to VAX, and some of their recommendations were adopted by ANSI C.

Conclusion

If these digital delectations make you want to learn more about writing portable code, then the best place to learn is in one of the excellent books on the topic. The one by Henry Rabinowitz mentioned earlier is great, as is another by Mark Horton. Good books are another strength of the C language. Unlike new trendy languages, C has been around long enough to accumulate professional and well regarded literature.

November 14, 2018

Jeff Carpenter (jeffcarp)

Book Review: The Manager's Path November 14, 2018 11:32 PM

I can't recommend The Manager's Path by Camille Fournier highly enough for Software Engineers. I'm not a tech lead; however, I found this book super useful for understanding the structure of technical organizations. It contains many gems that I want to write on sticky notes and post above my desk at work. Like this: Especially as you become more senior, remember that your manager expects you to bring solutions, not problems.

Bogdan Popa (bogdan)

Announcing component November 14, 2018 07:10 PM

I released component the other day. It’s a Racket library for managing the lifecycle of stateful objects in long-running applications and for doing dependency injection. It was inspired by the Clojure library of the same name by Stuart Sierra. Check it out and let me know what you think! P.S. Expect more Racket libraries from me in the coming weeks and months. I’m really enjoying the language so far!

Gergely Nagy (algernon)

Chrysalis Progress Report #3 November 14, 2018 12:30 PM

It's been a while since I wrote a progress report, yet there's so much to share! Many, many things have happened in the 18 months since the last update, some good, some bad. This report will not be a completely accurate and thorough account of those months, but rather a summary. Let's start with the most glaring fact: Chrysalis is still not ready for a beta. But it is closer than it ever was before.

For this reason, I recently tagged a new release; feel free to take it for a spin, but do expect bugs. If you are on OSX, please read the release notes. Actually, just read them anyway.

Before you continue, be aware that I'm still not a front-end guy, and a grumpy one at that. There is some mild dissatisfaction and complaining involved in the paragraphs below. Thankfully there's a way to cheer me up: read the end to see how!

One of the major pain points of Chrysalis development for me was that I was doing it. I'm not a front-end guy. I know next to nothing about UI/UX, nor about frameworks and libraries; I'm not even familiar with the tooling. That's why I went with ClojureScript, so I'd have at least something familiar in the set. It was a good choice at the time, but it had one huge problem: the number of people willing and able to contribute to a ClojureScript project was small. That's not a good outlook when your long term plan is to let others take over the project. You see, I started Chrysalis because I felt there was a need for it, and no one else was working on it. I never intended to develop and maintain it forever; the goal has always been that the community will eventually take it over. ClojureScript turned out to be a roadblock.

So a couple of months ago, after having let Chrysalis linger in limbo for many more, the decision was made to rewrite it in JavaScript, because there are many more people familiar with that language, so we can both attract new contributors, and it will hopefully make it easier to pass the torch down the road. This wasn't an easy decision. I'm still not a front-end person, and don't intend to become one either. Previously I had ClojureScript to cling on to, but now that's been pulled out from under me (by my own self, even!). I had to fight with tooling (did I mention I'm not familiar with it?), and a language that's quite far from my preferred ones. It was a rough start. However, there were a number of lessons learned from the previous implementation of Chrysalis, all of which are proving to be useful in this new iteration.

First of all, Chrysalis was too ambitious in its goals. It tried to be everything, for everyone, and ended up being nothing useful to anyone. It tried to support many different devices, flashing, LED theme editing, keymap editing, debugging capabilities, a REPL. All in one package. This simply did not scale. The new implementation sports a very different architecture, and is developed rather differently too. While the ClojureScript implementation was all in one repository (with the goal of eventually splitting it up), the JavaScript rewrite started off as a few tiny libraries I can build a UI on top of. The older version was supposed to support many keyboards with the same application; with the new architecture, we expect vendors to do their own bundles, to ship their own UI on top of the same building blocks. This way they all have complete control over how it looks and behaves, and Chrysalis itself does not need to support theming or branding in any way. That's a huge relief. Of course, we still want to share as much code as possible, so the common, reusable parts will be in their own little place, separate from the bundle itself. I'm still working out this part, mind you. Right now there are a few closely tied libraries, split across a few repositories. It might make sense to pull them together into one repository instead, but still keep them as separate packages. But I'd need to learn yet another tool I'll never use ever again to do that, so it's not at the top of my priorities at the moment.

Second, the ClojureScript version tried to be well designed, but due to my lack of experience with building such applications, this didn't quite happen, and the code was more complex in places than it really needed to be. With the JavaScript rewrite, the goal is to have something usable out the door as soon as possible. It may look horrible under the hood (it does), it may not be the most efficient (it isn't), nor would it follow best practices (it doesn't), but it will do what most people want to do with it: allow one to remap keys. This was another problem of the old implementation: key editing came after the LED editor. I deemed it easier to implement LED theme editing (it is easier), but that came at the cost of keymap editing lagging behind. I had people help me out there, and I want to thank both James Cash and Simon-Claudius for their amazing work on the old code base. They took it where I wasn't able to.

With the rewrite, the primary focus has been keymap editing. I'm happy to report that it works. It's not perfect. It's not exactly pretty, or very friendly, but it works, and is - at least in my opinion - usable. A quick demonstration is shown in the video below:


It was a long road that led to this, and it doesn't have many of the features the old Chrysalis did, but on the other hand, it sports a usable keymap editor, which works with the upstream firmware (from git master, the factory firmware doesn't have the necessary bits yet) out of the box. It is at a stage where I feel comfortable editing my own keymap with it: the last few changes I made (adding a + and a = key to my numpad area), I did with Chrysalis first to try them out, and only added them to my sketch after. I used to implement changes in my sketch, copy the keymap over to EEPROM with a tool, and only then try them. Now I can try a bunch of things without having to do that dance, and the feeling is liberating.

Closing words

In the intro I warned you, my dear reader, that this is a slightly grumpy post. If you read this far, you now have a better idea why I'm grumpy: I'm working in a field I have no experience with, and no desire to stay there long term. I also hinted at a way to cheer me up, if you'd like: contribute! Make my life easier by letting me work on things I do better and enjoy much more (the firmware), by helping push Chrysalis forward. You don't need to be a wizard, or a React and frontend guru (but that certainly helps, and if you are, here, have the reins!), no. If you can give Chrysalis a try, submit issues, ideas, that helps a lot too. Critique my code! Tell me how to change the looks so it becomes friendlier! Tell me what it would take to make the application useful to you! If you are up for it, I'm more than happy to accept pull requests too. There are plenty of things in there that could be improved, big and small, enough to do for all levels of experience.

Head over to the chrysalis bundle, and have a go at it.

November 13, 2018

Derek Jones (derek-jones)

Is it worth attending an academic conference or workshop? November 13, 2018 04:32 PM

If you work in industry, is it worth attending an academic conference or workshop?

The following observations are based on my attending around 50 software engineering and compiler related conferences/workshops, plus discussion with a few other people from industry who have attended such events.

Short answer: No.

Slightly longer answer: Perhaps, if you are looking to hire somebody knowledgeable in a particular domain.

Much longer answer: Academics go to conferences to network. They are looking for future collaborators, funding, jobs, and general gossip. What is the point of talking to somebody from industry? Academics will make small talk and be generally friendly, but they don’t know how to interact, at the professional level, with people from industry.

Why are academics generally hopeless at interacting, at the professional level, with people from industry?

Part of the problem is lack of practice, many academic researchers live in a world that rarely intersects with people from industry.

Impostor syndrome is another. I have noticed that academics often think that people in industry have a much better understanding of the realities of their field. Those who have had more contact with people from industry might have noticed that impostor syndrome is not limited to academia.

Talking of impostor syndrome, and feeling of being a fraud, academics don’t seem to know how to handle direct criticism. Again I think it is a matter of practice. Industry does not operate according to: I won’t laugh at your idea, if you don’t laugh at mine, which means people within industry are practiced at ‘robust’ discussion (this does not mean they like it, and being good at handling such discussions smooths the path into management).

At the other end of the impostor spectrum, some academics really do regard people working in industry as simpletons. I regularly have academics express surprise that somebody in industry, i.e., me, knows about this-that-or-the-other. My standard reply is to say that it's because I paid more for my degree and did not have the usual lobotomy before graduating. Not a reply guaranteed to improve industry/academic relations, but I enjoy the look on their faces (and I don't expect they express that opinion again to anyone else from industry).

The other reason why I don’t recommend attending academic conferences/workshops, is that lots of background knowledge is needed to understand what is being said. There is no point attending ‘cold’, you will not understand what is being presented (academic presentations tend to be much better organized than those given by people in industry, so don’t blame the speaker). Lots of reading is required. The point of attending is to talk to people, which means knowing something about the current state of research in their area of interest. Attending simply to learn something about a new topic is a very poor use of time (unless the purpose is to burnish your c.v.).

Why do I continue to attend conferences/workshops?

If a conference/workshop looks like it will be attended by people who I will find interesting, and it’s not too much hassle to attend, then I’m willing to go in search of gold nuggets. One gold nugget per day is a good return on investment.

Frederic Cambus (fcambus)

OpenBSD/arm64 on the NanoPi NEO2 November 13, 2018 11:20 AM

I bought the NanoPi NEO2 solely for its form factor, and I haven't been disappointed. It's a cute little board (40x40mm), which is to the best of my knowledge the smallest possible device one can run OpenBSD on.

The CPU is a quad-core ARM Cortex-A53 which is quite capable, a GENERIC.MP kernel build taking 15 minutes. On the downside, the board only has 512MB of RAM.

A USB to TTL serial cable is required to connect to the board and perform installation. The system doesn't have a supported miniroot so the preparation steps detailed in the INSTALL.arm64 file have to be performed to get a working installation image.

The following packages need to be installed:

pkg_add dtb u-boot-aarch64

After writing the miniroot image to an SD card, the correct DTB should be copied:

mount /dev/sdXi /mnt
mkdir /mnt/allwinner
cp /usr/local/share/dtb/arm64/allwinner/sun50i-h5-nanopi-neo2.dtb /mnt/allwinner
umount /mnt

Lastly, the correct U-Boot image should be written:

dd if=/usr/local/share/u-boot/nanopi_neo2/u-boot-sunxi-with-spl.bin of=/dev/sdXc bs=1024 seek=8

After performing the installation process, the DTB should be copied again to the SD card before attempting to boot the system.

Here is the output of running file on executables:

ELF 64-bit LSB shared object, AArch64, version 1

And this is the result of the md5 -t benchmark:

MD5 time trial.  Processing 10000 10000-byte blocks...
Digest = 52e5f9c9e6f656f3e1800dfa5579d089
Time   = 1.070000 seconds
Speed  = 93457943.925234 bytes/second

For the record, LibreSSL speed benchmark results are available here.

System message buffer (dmesg output):

OpenBSD 6.4-current (GENERIC.MP) #262: Mon Nov 12 01:54:10 MST 2018
    deraadt@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 407707648 (388MB)
avail mem = 367030272 (350MB)
mainbus0 at root: FriendlyARM NanoPi NEO 2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 512KB 64b/line 16-way L2 cache
efi0 at mainbus0: UEFI 2.7
efi0: Das U-Boot rev 0x0
sxiccmu0 at mainbus0
psci0 at mainbus0: PSCI 0.2
simplebus0 at mainbus0: "soc"
syscon0 at simplebus0: "syscon"
sxiccmu1 at simplebus0
sxipio0 at simplebus0: 94 pins
ampintc0 at simplebus0 nirq 224, ncpu 4 ipi: 0, 1: "interrupt-controller"
sxiccmu2 at simplebus0
sxipio1 at simplebus0: 12 pins
sximmc0 at simplebus0
sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1
dwxe0 at simplebus0: address 02:01:f7:f9:2f:67
rgephy0 at dwxe0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
com0 at simplebus0: ns16550, no working fifo
com0: console
sxirtc0 at simplebus0
gpio0 at sxipio0: 32 pins
gpio1 at sxipio0: 32 pins
gpio2 at sxipio0: 32 pins
gpio3 at sxipio0: 32 pins
gpio4 at sxipio0: 32 pins
gpio5 at sxipio0: 32 pins
gpio6 at sxipio0: 32 pins
gpio7 at sxipio1: 32 pins
agtimer0 at mainbus0: tick rate 24000 KHz
cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu1: 512KB 64b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4
cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu2: 512KB 64b/line 16-way L2 cache
cpu3 at mainbus0 mpidr 3: ARM Cortex-A53 r0p4
cpu3: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu3: 512KB 64b/line 16-way L2 cache
scsibus0 at sdmmc0: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SC64G, 0080> SCSI2 0/direct removable
sd0: 60906MB, 512 bytes/sector, 124735488 sectors
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
bootfile: sd0a:/bsd
boot device: sd0
root on sd0a (1fbfe51d132e41c0.a) swap on sd0b dump on sd0b

Alex Wilson (mrwilson)

Notes from the Week #8 November 13, 2018 12:00 AM

Despite my best efforts, I seem to be succumbing to the same cold that’s going around at work right now, so a brief and late weeknotes from last week — my brain is full of fluff and being slow. Blerggh.

Money talks

Steve and I had another excellent Wednesday chat, and we talked a bit about our experiences with budgeting and procurement. We’re both in a similar position where the real value of our teams is often in second or third order effects rather than direct revenue.

E.g. replacing a system, which shaves off toil (in the SRE Book sense of the word) for each team, increasing productivity and streamlining onboarding, but from a budgetary point of view it’s still just a cost and the effects might have a long lead time.

In these cases, it’s often hard to make compelling business cases (particularly if they are risk-based, and we’ve accepted the risk up until this point).

There seems to be no easy answer to this question other than making good faith arguments about the perceived benefits of the change, enumerating costs and savings wherever possible, and trying to monitor things like time spent on toil within each team.

If anyone has good suggestions, please let me know through Twitter or other means!

Adopting a new platform

After a few weeks of trialling and figuring out the cost-benefits, the Shift team has moved from an in-house paging service to Opsgenie — we’re a small team so reap the benefits of the free plan, but we like it so much that we’re putting together a business case to roll this out across the whole of Product Development at Unruly.

The integration was delightfully easy, so major kudos to the Opsgenie team and their thorough documentation!

Moving towards SLx

Before the Shift team came into existence, shared systems were collectively owned and so ‘everyone’ was responsible for them. Now that there is a dedicated team to drive and improve, we feel a need to have a conversation tool to talk about reliability of the mission-critical shared services we maintain (like metric collection, monitoring, paging).

We’re going to be experimenting with SLx (service level indicators/objectives/agreements) and error budgets as a way to communicate these things.

There’s a good chapter in the SRE book about these things.

Originally published at blog.probablyfine.co.uk on November 13, 2018.

November 12, 2018

Derek Jones (derek-jones)

Practical ecosystem books for software engineers November 12, 2018 02:45 PM

So you have read my (draft) book on evidence-based software engineering and want to learn more about ecosystems. What books do I suggest?

Biologists have been studying ecosystems for a long time, and more recently social scientists have been investigating cultural ecosystems. Many of the books written in these fields are oriented towards solving differential equations and are rather subject specific.

The study of software ecosystems has been something of a niche topic for a long time. Problems for researchers have included gaining access to ecosystems and the seeming proliferation of distinct ecosystems. The state of ecosystem research in software engineering is rudimentary; historians are starting to piece together what has happened.

Most software ecosystems are not even close to being in what might be considered a steady state. Eventually most software will be really old, and this will be considered normal (“Shock Of The Old: Technology and Global History since 1900″ by Edgerton; newness is a marketing ploy to get people to buy stuff). In the meantime, I have concentrated on the study of ecosystems in a state of change.

Understanding ecosystems is about understanding how the interaction of participants' motivations evolves the environment in which they operate.

“Modern Principles of Economics” by Cowen and Tabarrok, is a very readable introduction to economics. Economics might be thought of as a study of the consequences of optimizing the motivation of maximizing return on investment. “Principles of Corporate Finance” by Brealey and Myers, focuses on the topic in its title.

“The Control Revolution: Technological and Economic Origins of the Information Society” by Beniger: the ecosystems in which software ecosystems coexist and their motivations.

“Evolutionary dynamics: exploring the equations of life” by Nowak, is a readable mathematical introduction to the subject given in the title.

“Mathematical Models of Social Evolution: A Guide for the Perplexed” by McElreath and Boyd, is another readable mathematical introduction, but focusing on social evolution.

“Social Learning: An Introduction to Mechanisms, Methods, and Models” by Hoppitt and Laland: developers learn from each other and from their own experience. What are the trade-offs for the viability of an ecosystem that preferentially contains people with specific ways of learning?

“Robustness and evolvability in living systems” by Wagner, survival analysis of systems built from components (DNA in this case). Rather specialised.

Books with a connection to technology ecosystems.

“Increasing returns and path dependence in the economy” by Arthur, is now a classic, containing all the basic ideas.

“The red queen among organizations” by Barnett, includes a chapter on computer manufacturers (has promised me data, but busy right now).

“Information Foraging Theory: Adaptive Interaction with Information” by Pirolli, is an application of ecosystem know-how, i.e., how best to find information within a given environment. Rather specialised.

“How Buildings Learn: What Happens After They’re Built” by Brand, yes, buildings are changed just like software and the changes are just as messy and expensive.

Several good books have probably been omitted, because I failed to spot them sitting on the shelf. Suggestions for books covering topics I have missed welcome, or your own preferences.

Andreas Zwinkau (qznc)

Model View Controller isn't November 12, 2018 12:00 AM

Maybe the most misunderstood design pattern.

Read full article!

Pete Corey (petecorey)

Permutations With and Without Repetition in Elixir November 12, 2018 12:00 AM

I’ve been hacking away at my ongoing Chord project, and I ran into a situation where I needed to generate all possible permutations of length k for a given list of elements, where repetitions of elements from our list are allowed. I figured this would be an excellent opportunity to flex our Elixir muscles and dive into a few possible solutions.

Base Cases

Let’s get the ball rolling by defining a Permutation module to hold our solution and a with_repetitions/2 function that accepts a list of elements, and a value for k:


defmodule Permutation do
  def with_repetitions(list, k)
end

We'll start by defining a few base cases for our with_repetitions/2 function. First, if list is empty, we'll want to return a list whose first element is an empty list:


def with_repetitions([], _k), do: [[]]

We’ll do the same if k is 0:


def with_repetitions(_list, 0), do: [[]]

Note that we're returning [[]] because the only possible permutation of an empty list, and the only possible zero-length permutation of any list, is []. The list of all possible permutations, in that case, is [[]].

Building Our Permutator

Now we come to the interesting case where both list and k have workable values:


def with_repetitions(list, k) do
  # ...
end

We’ll start by mapping over every element in our list. We’ll use each of these elements as the head of a new k-length permutation we’re building.


list
|> Enum.map(fn head ->
  # ...
end)

For each value of head, we want to calculate all k - 1 length sub-permutations of our list, and concatenate each of these sub-permutations to our head:


list
|> with_repetitions(k - 1)
|> Enum.map(fn tail ->
  [head] ++ tail
end)

At this point, the result of our with_repetitions/2 function is a list of a list of permutations. For every head, we’re returning a list of all k-length permutations starting with that head. What we really want is a singly nested list of all permutations we’ve found.

To reduce our list of lists of permutations, we need to append each list of permutations for every value of head together:


|> Enum.reduce(&(&2 ++ &1))

This will order our permutations in lexical order (assuming our initial list was sorted). If we wanted our final set of permutations in reverse order, we could switch the order of our concatenation:


|> Enum.reduce(&(&1 ++ &2))

Or we could just use Kernel.++/2 to accomplish the same thing:


|> Enum.reduce(&Kernel.++/2)

Simplifying with Special Forms

This pattern of nested mapping and consolidating our result into a flat list of results is a fairly common pattern that we run into when writing functional code. It’s so common, in fact, that Elixir includes a special form specifically designed to make this kind of computation easier to handle.

Behold the list comprehension:


def with_repetitions(list, k) do
  for head <- list, tail <- with_repetitions(list, k - 1), do: [head | tail]
end

This single line is functionally equivalent to our original with_repetitions/2 function. The for tells us that we’re starting a list comprehension. For every value of head and every value of tail, we build a new permutation ([head | tail]). The result of our list comprehension is simply a list of all of these permutations.

Without Repetition

We’ve been trying to find permutations where repetitions of elements from list are allowed. How would we solve a similar problem where repetitions aren’t allowed?

It turns out that the solution is fairly straight-forward. When we generate our set of sub-permutations, or tail values, we can simply pass in list without the current value of head:


def with_repetitions(list, k) do
  for head <- list, tail <- with_repetitions(list -- [head], k - 1), 
    do: [head | tail]
end

That’s it! Each level of our permutation removes the current value of head preventing it from being re-used in future sub-permutations. It’s fantastically convenient that previously computed values in our list comprehension, like head, can be used in the subsequent values we iterate over.

November 11, 2018

Ponylang (SeanTAllen)

Last Week in Pony - November 11, 2018 November 11, 2018 11:07 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

November 10, 2018

Mark J. Nelson (mjn)

Mistaken machine translations of 'Trump' November 10, 2018 12:00 PM

The name Trump is transliterated into Greek as Τραμπ. This is a fairly straightforward transliteration, but the reverse direction is surprisingly ambiguous. I found one news article where Google Translate translates Τραμπ back into English three different ways in the same article, none of them correct:

  • US President Donald Trub
  • President Trabble
  • President Trabid

What's going on here? There are two sources of ambiguity in the transliteration that keeps it from roundtripping successfully.

Ambiguous vowel

The first source of ambiguity is the vowel. The four consonants in Trump are transliterated to pretty straightforward equivalents. The vowel however becomes an 'α' (alpha), because Greek phonology doesn't have an 'uh' sound, and an 'ah' is the closest approximation. (This is a common feature of Greek accents in English for the same reason, with the word 'but' pronounced roughly 'baht'.)

Of course, 'α' can also be used to transliterate words where the sound was originally an 'ah' in English. Therefore when going back to English, there's ambiguity as to which sound it originally represented.

Ambiguous consonant cluster

The second source of ambiguity is the final consonant cluster, 'mp'. In most contexts, m↔μ and p↔π are unproblematically equivalent. However, when combined into the cluster 'μπ', there are three possible pronunciations: 'mp', 'mb', or 'b'.

In many cases, the choice is allophonic; which of the three is realized depends on the surrounding phonemes and on how quickly or formally someone is speaking. But Greek has no letter that's equivalent to English 'b', so 'μπ' can also be used as a digraph for plain 'b', especially in loanwords and transliterations. For example, the name Robert becomes Ρόμπερτ.

Combined ambiguity plus autocorrect

Combining these two sources of ambiguity, there are six reasonable guesses for how you'd transliterate Τραμπ back into English, given no additional information:

  • Trab
  • Tramb
  • Tramp
  • Trub
  • Trumb
  • Trump

You'd think that with this particular name, Google's corpus would have additional information, but apparently not. Let's go back to the three machine translations of Τραμπ at the start of this post: Trub, Trabble, Trabid. Here's a guess as to what happened.

First, of the six possibilities, Trab and Trub might be preferred because word-final '-μπ' occurs most often in loanwords and transliterations, where it was used to represent a word-final '-b' in the source language. Note that I have not checked this guess against any kind of corpus. But if true, that would explain Trub.

Trabble and Trabid then seem likely to be spurious spelling autocorrections of Trab. Trabble is the name of an app, and Trabid is an enzyme. I've noticed before that some of the stranger Google Translate errors are due to a spurious spelling correction having happened on either the source or destination side. When you combine that with the other sources of ambiguity in language, you can end up with results pretty far away from where you started.

November 09, 2018

Derek Jones (derek-jones)

Practical psychology books for software engineers November 09, 2018 03:37 AM

So you have read my (draft) book on evidence-based software engineering and want to learn more about human psychology. What books do I suggest?

I wrote a book about C that attempted to use results from cognitive psychology to understand developer characteristics. This work dates from around 2000, and some of my book choices may have been different, had I studied the subject 10 years later. Another consequence is that this list is very weak on social psychology.

I own all the following books, but it may have been a few years since I last took them off the shelf.

There are two very good books providing a broad introduction: “Cognitive psychology and its implications” by Anderson, and “Cognitive psychology: A student’s handbook” by Eysenck and Keane. They have both been through many editions, and buying a copy that is a few editions earlier than current, saves money for little loss of content.

“Engineering psychology and human performance” by Wickens and Hollands, is a general introduction oriented towards stuff that engineering requires people to do.

Brain functioning: “Reading in the brain” by Dehaene (a bit harder going than “The number sense”). For those who want to get down among the neurons “Biological psychology” by Kalat.

Consciousness: This issue always comes up, so let’s kill it here and now: “The illusion of conscious will” by Wegner, and “The mind is flat” by Chater.

Decision making: What is the difference between decision making and reasoning? In psychology those with a practical orientation study decision making, while those into mathematical logic study reasoning. “Rational choice in an uncertain world” by Hastie and Dawes, is a general introduction; “The adaptive decision maker” by Payne, Bettman and Johnson, is a readable discussion of decision making models. “Judgment under Uncertainty: Heuristics and Biases” by Kahneman, Slovic and Tversky, is a famous collection of papers that kick started the field at the start of the 1980s.

Evolutionary psychology: “Human evolutionary psychology” by Barrett, Dunbar and Lycett. How did we get to be the way we are? Watch out for the hand waving (bones can be dug up for study, but not the software of our mind), but it weaves a coherent’ish story. If you want to go deeper, “The Adapted Mind: Evolutionary Psychology and the Generation of Culture” by Barkow, Tooby and Cosmides, is a collection of papers that took the world by storm at the start of the 1990s.

Language: “The psychology of language” by Harley, is the book to read on psycholinguistics; it is engrossing (although I have not read the latest edition).

Memory: I have almost a dozen books discussing memory. What these say is that there is a collection of memory systems having various characteristics, which is what the chapters in the general coverage books say.

Modeling: So you want to model the human brain. ACT-R is the market leader in general cognitive modeling. “Bayesian cognitive modeling” by Lee and Wagenmakers, is a good introduction for those who prefer a more abstract approach (“Computational modeling of cognition” by Farrell and Lewandowsky, is a big disappointment {they have written some great papers} and best avoided).

Reasoning: The study of reasoning is something of a backwater in psychology. Early experiments showed that people did not reason according to the rules of mathematical logic, and this was treated as a serious fault (whose fault it was, shifted around). Eventually most researchers realised that the purpose of reasoning was to aid survival and reproduction, not following the recently (100 years or so) invented rules of mathematical logic (a few die-hards continue to cling to the belief that human reasoning has a strong connection to mathematical logic, e.g., Evans and Johnson-Laird; I have nearly all their books, but have not inflicted them on the local charity shop yet). Gigerenzer has written several good books: “Adaptive thinking: Rationality in the real world” is a readable introduction, also “Simple heuristics that make us smart”.

Social psychology: “Social learning” by Hoppitt and Laland, analyzes the advantages and disadvantages of social learning; “The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter” by Henrich, is a more populist book (by a leader in the field).

Vision: “Visual intelligence” by Hoffman is a readable introduction to how we go about interpreting the photons entering our eyes, while “Graph design for the eye and mind” by Kosslyn is a rule based guide to visual presentation. “Vision science: Photons to phenomenology” by Palmer, for those who are really keen.

Several good books have probably been omitted, because I failed to spot them sitting on the shelf. Suggestions for books covering topics I have missed welcome, or your own preferences.

Caius Durling (caius)

Struct-uring your data November 09, 2018 12:31 AM

In Ruby it's easy to structure data in hashes and pass it around, and usually that leads to errors with calling methods on nil, or misspelling the name of a key, or some such silly bug that we might catch earlier given a defined object with a custom Class behind it. But it's so much work to create a Class just to represent some grab bag of data we've been handed, right? Well, maybe!

Lets say we have some event data that we're being sent and we want to do some stuff with it in memory, we could just represent this as an array of hashes:

events = [
  {
    name: "Writing",
    duration: 15,
    started: true,
    finished: false,
  },
  {
    name: "Evening walk",
    duration: 60,
    started: true,
    finished: true,
  }
]

This is not a bad way to represent the data, but if we want to start asking questions of it like "find all events currently happening" it becomes trickier. We could filter the collection to just those "in progress" events with the following

events.select { |event| event[:started] && !event[:finished] }

Next time someone reads this though, they have to figure out what it means to have an event that's started but not finished. Also, what happens when someone in future misremembers :finished as :completed when running over the data in new code?[1] Wouldn't it be better if we could do the following instead?[2]

events.select { |event| event.in_progress? }

An easy way to do this is to just create a Struct for the event, with the extra method defined internally. Whilst we're in there, we could add a couple more methods to make querying the state of boolean attributes nicer to read (eh?).

Event = Struct.new(:name, :duration, :started, :finished) do
  alias_method :started?, :started
  alias_method :finished?, :finished

  def in_progress?
    started? && !finished?
  end
end

And then to create the objects, we can either use the positional arguments to .new (same order as the symbols given to Struct.new), or tap the object and use the setters directly for each attribute.

events = [
  Event.new("Writing", 15, true, false),
  Event.new.tap { |e|
    e.name = "Evening Walk"
    e.duration = 60
    e.started = true
    e.finished = false
  },
]
# => [#<struct Event
#      name="Writing",
#      duration=15,
#      started=true,
#      finished=false>,
#     #<struct Event
#      name="Evening Walk",
#      duration=60,
#      started=true,
#      finished=false>]

Now we can use our easier-to-read code for selecting all in-progress events, or ignoring all of those. Or if we just want to grab all finished events, we now have a method to call—Event#finished?—that conveys the intent of what it returns without having to look up the data structure of the hash to work out if that field is a String or Boolean.[3]

For super-powered structs, you don't even need to assign them to a Constant. You can just assign them to normal variables and use them locally in that scope.

class Grabber
  def call
    result = Struct.new(:success, :output) do
      alias_method :success?, :success
    end

    if (data = grab_data)
      result.new(true, data)
    else
      result.new(false, nil)
    end
  end
end

That'll handily return you an object you can interrogate for success? and ask for the output if it was successful. And no Constants were created in the making of this method. 🎉

Keep an eye out for where you can Struct-ure your data. It might be more often than you expect.


  1. It would always return true - !nil.
  2. That's events.select(&:in_progress?) for the golfers amongst you.
  3. …? methods in ruby are truthy/falsy by convention.

Noon van der Silk (silky)

A quick note on budgeting ... November 09, 2018 12:00 AM

Posted on November 9, 2018 by Noon van der Silk

In “The Oregon Experiment”, the authors consider the following budgeting scenario.

Suppose you’re in the following situation: You have ~$2,500,000 that you’d like to allocate to construction projects in your community.

There are many ways you can allocate this money to projects of varying sizes. Consider the following options:


Option 1 - All projects considered equally

category                  number of projects    rough total cost based on averages
A  < $1000                         1            $500
B  $1000-$10,000                   1            $5,000
C  $10,000-$100,000                1            $50,000
D  $100,000-$1,000,000             1            $500,000
E  > $1,000,000                    1            $2,000,000
totals                             5            ~$2,600,000

Option 2 - Projects considered unequally

category                  number of projects    rough total cost based on averages
A  < $1000                      1000            $500,000
B  $1000-$10,000                 100            $500,000
C  $10,000-$100,000               10            $500,000
D  $100,000-$1,000,000             1            $500,000
E  > $1,000,000          ⅒ th of a project      $500,000
totals                         ~1100            $2,500,000

Option 3 - A middle ground

category                  number of projects    rough total cost based on averages
A  < $1000                       500            $250,000
B  $1000-$10,000                  50            $250,000
C  $10,000-$100,000               10            $500,000
D  $100,000-$1,000,000             1            $500,000
E  > $1,000,000                    1            $1,000,000
totals                          ~550            $2,500,000

The main conclusion is that, for the same amount of money, we can choose either to support lots of small projects, or only a few large ones.

One of the main premises of the book is that good change is made locally, by locals. In this way, Options 2 and 3 are a significant improvement over Option 1.

One of the best initiatives that Victoria is doing along these lines is the “Pick My Project” program.

However, I think it’s also interesting to think about this in relation to other areas:

  • Health: Should you make one big change? Or many small ones?
  • Programming: Should you write one big program? Or many little ones?
  • Management: Should you set big goals from the top? Or should you empower the people below you to set their own goals?

November 08, 2018

Wallaroo Labs (chuckblake)

Python Python Python! Python 3 Comes to Wallaroo November 08, 2018 06:50 PM

If you’ve tried to build a scalable distributed system in Python you could be excused for thinking that the world is conspiring against you; in spite of Python’s popularity as a programming language for everything from data processing to robotics, there just aren’t that many options when it comes to using it to create resilient stateful applications that scale easily across multiple workers. About a year ago we created the Wallaroo Python API to help address this problem.

Derek Jones (derek-jones)

Practical statistics books for software engineers November 08, 2018 03:50 AM

So you have read my (draft) book on evidence-based software engineering and want to learn more about the statistical techniques used, but are not interested in lots of detailed mathematics. What books do I suggest?

All the following books are sitting on the shelf next to where I write (not that they get read that much these days).

Before I took the training wheels off my R usage, my general go to book was (I still look at it from time to time): “The R Book” by Crawley, second edition; “R in Action” by Kabacoff is a good general read.

In alphabetical subject order:

Categorical data: “Categorical Data Analysis” by Agresti, the third edition is a weighty tome (in content and heaviness). Plenty of maths+examples; more of a reference.

Compositional data: “Analyzing compositional data with R” by van den Boogaart and Tolosana-Delgado, is more or less the only book of its kind. Thankfully, it is quite good.

Count data: “Modeling count data” by Hilbe, may be more than you want to know about count data. Readable.

Circular data: “Circular statistics in R” by Pewsey, Neuhauser and Ruxton, is the only non-pure theory book available. The material seems to be there, but is brief.

Experiments: “Design and analysis of experiments” by Montgomery.

General: “Applied linear statistical models” by Kutner, Nachtsheim, Neter and Li, covers a wide range of topics (including experiments) using a basic level of mathematics.

Machine learning: “An Introduction to Statistical Learning: with Applications in R” by James, Witten, Hastie and Tibshirani, is more practical (but not dumbed down, like some) and less maths (a common problem with machine learning books, e.g., “The Elements of Statistical Learning”). Watch out for the snake-oil salesmen using machine learning.

Mixed-effects models: “Mixed-effects models in S and S-plus” by Pinheiro and Bates, is probably the book I prefer; “Mixed effects models and extensions in ecology with R” by Zuur, Ieno, Walker, Saveliev and Smith, is another view on an involved topic (plus lots of ecological examples).

Modeling: “Statistical rethinking” by McElreath, is full of interesting modeling ideas, using R and Stan. I wish I had some data to try out some of these ideas.

Regression analysis: “Applied Regression Analysis and Generalized Linear Models” by Fox, now in its third edition (I also have the second edition). I found this the most useful book, of those available, for a more detailed discussion of regression analysis. Some people like “Regression modeling strategies” by Harrell, but this does not appeal to me.

Survival analysis: “Introducing survival and event history analysis” by Mills, is a readable introduction covering everything; “Survival analysis” by Kleinbaum and Klein, is full of insights but more of a book to dip into.

Time series: The two ok books are: “Time series analysis and its applications: with R examples” by Shumway and Stoffer, contains more theory, while “Time series analysis: with applications in R” by Cryer and Chan, contains more R code.

There are lots of other R/statistics books on my shelves (just found out I have 31 of Springer’s R books), some ok, some not so. I have a few ‘programming in R’ style books; if you are a software developer, R the language is trivial to learn (its library is another matter).

Suggestions for books covering topics I have missed welcome, or your own preferences (as a software developer).

November 06, 2018

Josh Manders (joshmanders)

Failure of Launching in Under a Week. November 06, 2018 07:30 PM

A whoopsy daisy. I failed.

Kinda.

Let me backtrack a bit. A week ago I made the decision to pivot from my project Merched to App Metrics as my primary project. After having a discussion with a few people, I determined that Merched is too massive a project to be able to launch solo in a timely manner.

So I decided to see how quickly I can build an MVP of App Metrics and launch that in under a week.

Let me tell you, the grind is terrible. After about 3 days of pulling 16 hour days, going to bed about 3-4 hours past my normal time took a big toll not only on how I feel, but also on my productivity.

While I did end up getting about 60-75% of the MVP done in that timeframe, it slowed me down considerably: I have spent the last 2 days trying to come up with a more informative landing page, with not so great results.

All in all I feel the attempt was beneficial. It lit a fire under me that I hadn’t had in a while, and pushed me to skim off the fluff and only come up with the absolute minimal features to launch as soon as possible.

I will continue pressing on and hope to launch the project in the next week or two.

November 05, 2018

Luke Picciau (user545)

I made mistakes trying to chase new features rather than perfect existing ones. November 05, 2018 01:38 PM

Hi everyone. It’s been a little while since I posted news on my project PikaTrack. I know a lot of you expressed interest on Reddit as well as Lobsters. I thank you all for the support, and I think it’s about time I explained how things have been going over the last month. If you have been watching my commits you can see that there hasn’t been much activity in the past few weeks.

Robin Schroer (sulami)

Working remotely (Part 2) November 05, 2018 12:00 AM

It has now been 48 days since the first part of this post, and 42 days since I moved into my current flat in Amsterdam. Time for me to write the promised follow-up and explain what has happened so far and how things have been going.

The Good

First of all, I am a lot happier in my personal life; moving was definitely the right decision for me in general, all work things aside. Having made the step into a remote agreement frees me up to move when- and wherever I want, as my employer does not care where I actually work from at this point, apart from timezone-related concerns.

I also think that full control over my workspace can drastically improve my productivity, the main benefit being the lack of distractions.

Last time I mentioned that I would go back to London once a month. So far I am keeping to this schedule, and my first visit was quite pleasant, even if a little stressful. Being able to see my team members in person, catch up and meet new hires allows us to stay more in touch. My only minor complaint is that travel is quite a hassle, especially at this rate, so I might try to turn down the frequency a bit.

The Bad

After the moving process, which included being legally homeless for a weekend in France, I was quite happy to be able to stay home for a bit, so I ended up working from my living room for a couple of weeks. In addition to that, coworking spaces are (to me) surprisingly expensive for what they are. 200€/mo. for something they call “a desk” but which in reality is just a slot at a table, electricity, wifi and access to a kitchen with free coffee is quite a lot in my opinion. If I wanted an actual desk and office chair, ideally with a set of walls around it, I could expect to pay at least twice that.

But after a while I started suffering from both cabin fever and loneliness, so I bit the bullet and subscribed to one. This also ended up being the right decision because, just as I wrote last time, I now have a physical location I can go to for work, but it is purely optional.

In addition to potential extra costs and stress due to more traveling, some employers also seem to think that remote workers should by default earn less than the ones coming into the office. It is true that remote work in and of itself can be seen as a perk, but I also believe that correctly executed remote work is beneficial to everyone involved.

Closing Notes

If you want to go remote, I recommend you do what I (we) did and get a remote working agreement that is signed by you and your employer. It should capture the terms of the remote work, such as work hours, visits, who pays for transport, and how much advance warning needs to be given before you can be called into the office.

The last point is quite important especially if you are expected to pay for transport (and possibly lodging) which can be much more expensive if booked spontaneously.

Pete Corey (petecorey)

Bending Jest to Our Will: Caching Modules Across Tests November 05, 2018 12:00 AM

My test suite has grown to be unreliable. At first, a single red test raised its head. Not believing my eyes, I re-ran the suite, and as expected, everything came back green. As time went on and as more tests were offered up to the suite, random failures became more of a recurring problem. Eventually, the problem became so severe that the suite consistently failed, rather than consistently passed.

Something had to be done.

After nearly twenty four hours of banging my head against the wall and following various loose ends until they inevitably unraveled, I finally stumbled upon the cause of my problems.

Making Too Many Database Connections

Jest, the testing platform used by the project in question, insists on running tests in isolation. The idea is that tests run in isolation can also be run in parallel, which is the default behavior of Jest’s test runner. Due to decisions made far in the immutable past, our team decided to scrap parallel executions of tests and run each test sequentially with the --runInBand command line option.

The Jest documentation explains that running tests in band executes all of the tests sequentially within the same process, rather than spinning up a new process for every test:

Run all tests serially in the current process, rather than creating a worker pool of child processes that run tests.

However, when I ran the test suite I noticed that every test file was triggering a log statement indicating that it had just established a new database connection.


Connected to mongodb://localhost:27017/test

This explains a lot. If each test is spinning up its own database connection, it’s conceivable that our database simply can’t handle the number of connections we’re forcing on it. In that case, it would inevitably time out and fail on seemingly random tests.

But if all of our tests are sharing a single process, why aren’t they sharing a single database connection?

Jest Ignores the Require Cache

It turns out that this project instantiates its database connection in a dependent, shared sub-module. The code that handles the instantiation looks something like this:


let connection = mongoose.createConnection(MONGO_URL, ...);

connection.on('open', () => console.log(`Connected to ${MONGO_URL}`));

module.exports.connection = connection;

Normally, due to how Node.js’ require and require.cache work, the first time this shared module is required anywhere in our project, this code would be executed and our database connection would be established. Any subsequent requires of our module would return the previously cached value of module.exports. The module’s code would not re-run, and additional database connections would not be opened.
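
To illustrate that caching behaviour with plain Node.js, using the shared module from this post:


const first = require('shared-module');
const second = require('shared-module');

// Both requires share a single evaluation of the module, and therefore a
// single database connection; the second call is served from require.cache.
console.log(first === second); // true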

Unfortunately, Jest doesn’t honor require.cache. This means that every test file blows away any previously cached modules, and any require calls that test file makes will re-evaluate the required module’s source. In our case, this re-evaluation creates a new database connection, which is the root of our problem.

Mocking a Module with the Real Thing

The Github issue I linked above hints at a solution to our problem.

If you’d like to set a module, you can add it to setupFiles in a way that does jest.mock('module-name', () => { return moduleContents }).

Christoph is suggesting that we add a file to our setupFiles Jest configuration, which we’ll call test/setup.js, and mock our shared module in that setup file:


const mockSharedModule = require('shared-module');
jest.mock('shared-module', () => mockSharedModule);

Unfortunately, this doesn’t solve our problem. The test/setup.js script runs before each test (emphasis is my own):

The path to a module that runs some code to configure or set up the testing framework before each test.

We need to find a way to require our shared module once, before all tests run.

Thankfully, we can do this by creating a custom Jest environment, and instructing Jest to use our new environment with the testEnvironment configuration option. We can require our shared module within our new environment, and mock any subsequent imports of our module to return a reference to the instance we just instantiated.

Unfortunately, we can’t set up that mock within our environment. We need to do that within our test setup file.

This means we need some way of passing the contents of our shared module from our custom environment into our test/setup.js. Strangely, the only way I’ve found to accomplish this is through the use of globals:


const NodeEnvironment = require('jest-environment-node');
const sharedModule = require('shared-module');

class CustomEnvironment extends NodeEnvironment {
    constructor(config) {
        super(config);
    }

    async setup() {
        await super.setup();
        this.global.__SHARED_MODULE__ = sharedModule;
    }

    async teardown() {
        await super.teardown();
    }

    runScript(script) {
        return super.runScript(script);
    }
}

module.exports = CustomEnvironment;

Most of this custom environment class is boilerplate required by Jest. The interesting pieces are where we require our shared module, and most importantly, when we assign its contents to the __SHARED_MODULE__ global variable:


this.global.__SHARED_MODULE__ = sharedModule;

Note that __SHARED_MODULE__ is an arbitrary name I chose to avoid collisions with other variables defined in the global namespace. There’s no magic going on in the naming.

Now, we can go back to test/setup.js and create a mock of our shared module that returns the contents of the global __SHARED_MODULE__:


jest.mock('shared-module', () => global.__SHARED_MODULE__);

And that’s all there is to it.

Our custom environment requires and evaluates our shared module once, instantiating a single database connection. The reference to the shared module’s contents is passed into our test setup script through a global variable. Our setup script mocks any future requires of our shared module to return the provided reference, rather than re-evaluating the module, creating a new database connection, and returning the new reference.
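
For reference, the configuration tying these pieces together might look something like the sketch below; setupFiles and testEnvironment are standard Jest options, but the file paths are assumptions about the project layout, and --runInBand stays on the command line:


// jest.config.js (sketch; file locations are assumed, not the project's actual layout)
module.exports = {
    // evaluate the shared module once, inside our custom environment
    testEnvironment: '<rootDir>/test/environment.js',
    // mock subsequent requires of the shared module before each test file runs
    setupFiles: ['<rootDir>/test/setup.js'],
};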

Whew.

In Hindsight

After much blood, sweat, and tears, our test suite is once again consistently passing. Rejoice!

While this solution works, it highlights a fundamental problem. We’re not using Jest correctly. We came into this project with a preconceived notion of how testing and, by extension, our test framework should work. When we learned more about our tools and realized that they didn’t work how we expected, we didn’t retrace our steps and reassess our strategy. Instead, we applied quite a bit of time and pressure to force our tools to behave as we expected.

While having the know-how and ability to do this is great, actually doing it in your project isn’t recommended. Instead, see if you can use Jest in “The Jest Way”. If you cannot, maybe you shouldn’t be using Jest.

In hindsight, I should go have a talk with my team…

Noon van der Silk (silky)

Simple Dance Booth Open-Sourced (based on TensorFlow.js demo) November 05, 2018 12:00 AM

Posted on November 5, 2018 by Noon van der Silk

Over on the Silverpond GitHub account we recently open-sourced the little dance-booth project that we’ve been playing with for a while now.

It’s a very simple little wrapper around their PoseNet demo.

It’s a neat little repo that runs entirely offline and, from a webcam or anything else the browser can access, can be used to capture dances in three forms:

  • The raw video from the webcam,
  • JSON data of the poses per frame,
  • Video of the skeleton dancing on the dance floor.

You’ll find all of these saved under the date-and-time in the ./saved-videos folder. The JSON can be used to recreate the entire dance in any form you wish; it gives the coords of the various joints.

At the moment, the dance capture only starts when the people in frame all raise their hands above their heads.

Feel free to change it and play with it as you wish!

November 04, 2018

Ponylang (SeanTAllen)

Last Week in Pony - November 4, 2018 November 04, 2018 04:55 PM

Last Week In Pony is a weekly blog post to catch you up on the latest news for the Pony programming language. To learn more about Pony check out our website, our Twitter account @ponylang, our users’ mailing list or join us on IRC.

Got something you think should be featured? There’s a GitHub issue for that! Add a comment to the open “Last Week in Pony” issue.

Henry Robinson (henryr)

Beating hash tables with trees? The ART-ful radix trie November 04, 2018 05:04 AM

The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases

Leis et al., ICDE 2013 [paper]

Tries are an unloved third data structure for building key-value stores and indexes, after search trees (like B-trees and red-black trees) and hash tables. Yet they have a number of very appealing properties that make them worthy of consideration - for example, the height of a trie is independent of the number of keys it contains, and a trie requires no rebalancing when updated. Weighing against those advantages is the heavy memory cost that vanilla radix tries can incur, because each node contains a pointer for every possible value of the ‘next’ character in the key. With ASCII as an example, that’s 256 pointers for every node in the tree.

But the astute reader will feel in their bones that this is naive - there must be more efficient ways to store a set of pointers, indexed by a fixed-size set of keys (the trie’s alphabet). Indeed, there are - several of them, in fact, distinguished by the number of children the node actually has, not just how many it might potentially have.

This is where the Adaptive Radix Tree (ART) comes in. In this breezy, easy-to-read paper, the authors show how to reduce the memory cost of a regular radix trie by adapting the data structure used for each node to the number of children that it needs to store. In doing so they show, perhaps surprisingly, that the amount of space consumed by a single key can be bounded no matter how long the key is.
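
To make the idea concrete, here is a rough JavaScript sketch of two of the node shapes (the paper defines four inner node types: Node4, Node16, Node48 and Node256; the method names here are illustrative, not the paper's interface):


// Smallest inner node: up to 4 child pointers, looked up by linear search.
class Node4 {
    constructor() {
        this.keyBytes = [];
        this.children = [];
    }
    find(byte) {
        const i = this.keyBytes.indexOf(byte);
        return i === -1 ? null : this.children[i];
    }
    add(byte, child) {
        if (this.keyBytes.length === 4) {
            // Full: promote to a larger node type. The real ART grows to
            // Node16, then Node48, before falling back to the 256-slot node.
            const bigger = new Node256();
            this.keyBytes.forEach((k, i) => bigger.add(k, this.children[i]));
            return bigger.add(byte, child);
        }
        this.keyBytes.push(byte);
        this.children.push(child);
        return this;
    }
}

// Classic radix node: one slot per possible byte value, fast but memory-hungry.
class Node256 {
    constructor() {
        this.children = new Array(256).fill(null);
    }
    find(byte) {
        return this.children[byte];
    }
    add(byte, child) {
        this.children[byte] = child;
        return this;
    }
}

Because add() returns the node to keep using, a parent can swap in the larger replacement when a child fills up, which is how the trie adapts in place as keys are inserted.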

Artemis (Artemix)

Read - School-time projects November 04, 2018 12:00 AM

About a year has passed, and I'm back in school. With that, obviously, come school-time projects!

The project I worked on for the last two days (and which is now at v0.3 and working perfectly) is a "new" lightweight, minimalistic markdown renderer.

For context, I'm using a VPS to store some data, including markdown documents: article drafts, presentation notes or simply documents I want to share.

I wanted to have two things:

  • Listing of markdown documents in a directory
  • Rendering of the document if I load the file

For the first point, I still wanted to be able to use the markdown renderer on "private" documents, while hiding them from the directory listing.

The idea struck me a few days ago, while working on some markdown documents I wanted to share with work colleagues for a project, but that I didn't want to publish.

And this article is here to document this little journey.

# The idea

I wanted to have a small and easy-to-maintain server that'd follow this logic:

When asked for a given markdown file, load it, parse its front-matter tags, and return only the actual markdown body, either raw or rendered as HTML. I also want an index page that lists some files, but not all of them. This is done with a front-matter tag (`indexed`) and a small loop that only keeps files whose indexed status is set to true; everything else won't show up (see the sketch after this paragraph).
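
A minimal sketch of that filtering loop, assuming the front-matter package (mentioned below) and a flat directory of .md files; the function and variable names are mine, not the project's:


const fs = require('fs').promises;
const path = require('path');
const fm = require('front-matter');

// List only the documents whose front-matter sets `indexed: true`.
async function listIndexed(baseDir) {
    const indexed = [];
    for (const name of await fs.readdir(baseDir)) {
        if (!name.endsWith('.md')) continue;
        const raw = await fs.readFile(path.join(baseDir, name), 'utf8');
        const { attributes } = fm(raw);
        if (attributes.indexed === true) {
            indexed.push(name.replace(/\.md$/, ''));
        }
    }
    return indexed;
}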

I wanted two routes:

  • / will be the List (the actual index)
  • /:page will be the rendered page (or 404 Document not found if not found)

To access the raw document, I didn't want to have yet another route, so I decided to go with a GET parameter acting as a flag (raw). That means that on every valid /:page URL, if the user adds ?raw at the end, he'll receive the plaintext content.

That's all I care about for the first version.

# Chosen stack

I decided to go with NodeJS, ExpressJS and EJS for this quick project.

For front-matter tags parsing, I went with FrontMatter, and Marked is tasked with converting markdown to HTML.

Note that, right now, the project is at v0.3 and has introduced a few extra features, like syntax highlighting with highlight.js, caching with Redis, etc.

Also note that the full dependency list can be found on the project's readme.

For the CSS UI, Skeleton is more than enough; I just need a small tweak to limit images to a maximum width, which is quick to add.

# The development

Development started with structuring everything, especially the ExpressJS app, to allow for clean and easy development.

NodeJS gave me quite some trouble with filesystem manipulation, due to the sparseness of its standard library but also the ambiguity of its documentation, especially around basic functionality such as file listing and directory checking.

The lack of documentation around integrating EJS with ExpressJS had me fiddling around for some time, but some blog articles managed to fill the void.

The routes were pretty straightforward, as the following snippet shows.

app.get('/', async (req, res) => {
    const list = await reader.list();
    res.render('list', {
        list,
        count: list.length
    });
});
app.get('/:file', async (req, res) => {
    const fileId = req.params.file;

    const file = await reader.load(fileId);

    if (file === null) {
        return res
            .status(404)
            .send('Markdown document not found');
    }

    // Conditional rendering
});

Note that the reader object used in both functions is a small class I made for properly handling filesystem interactions (listing, reading etc).
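
The conditional rendering itself is left out above; based on the description earlier, it might look roughly like the following sketch, assuming the Marked and EJS setup mentioned before, a 'page' template, and that reader.load() returns an object with the markdown body (none of these names are guaranteed to match the project's actual code):

app.get('/:file', async (req, res) => {
    const fileId = req.params.file;
    const file = await reader.load(fileId);

    if (file === null) {
        return res.status(404).send('Markdown document not found');
    }

    if ('raw' in req.query) {
        // ?raw flag present: return the plain markdown body as text
        return res.type('text/plain').send(file.body);
    }

    // Otherwise render the markdown to HTML and hand it to an EJS template
    res.render('page', {
        title: fileId,
        content: marked(file.body)
    });
});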

The overall tool development was pretty straightforward for obtaining an MVP without any cache or export mechanism, but I later decided to develop another two versions: one for caching, one for PDF export (plus caching).

While the PDF export required quite a lot of work (including an embedded "cron" task to regularly clean up the PDF cache), the basic page-rendering cache (using Redis) was awfully straightforward.

For every route, I simply added cache.route().

app.get('/', cache.route('list'), async (req, res) => {
    // ..
});
app.get('/:file', cache.route(), async (req, res) => {
    // ..
});

The nicest thing about this tool is that there's no configuration file: I only need two environment variables, one for the markdown document base path and one for the cache folder.

# Conclusion

While the overall project, even the caching mechanism, was quite straightforward (besides a bit of fiddling with the docs), the biggest trouble I had in production was with wkhtmltopdf, which I chose for its lightness.

The problem was that I needed an X server running on my server, which is obviously unacceptable for a simple HTTP server.

After a bit of searching, I found a simple solution: install a lightweight virtual X server.

The following command managed to fix all my problems.

apt-get install wkhtmltopdf
apt-get install xvfb
printf '#!/bin/bash\nxvfb-run -a --server-args="-screen 0, 1024x768x24" /usr/bin/wkhtmltopdf -q $*' > /usr/bin/wkhtmltopdf.sh
chmod a+x /usr/bin/wkhtmltopdf.sh
ln -s /usr/bin/wkhtmltopdf.sh /usr/local/bin/wkhtmltopdf

From this point on, I have a pretty lightweight Markdown reader that also exports files to PDF!

November 03, 2018

Andreas Zwinkau (qznc)

How to implement strings November 03, 2018 12:00 AM

There are many alternatives to C strings. Here we explore the design space.

Read full article!

eta (eta)

How do cryptocurrencies work? November 03, 2018 12:00 AM

There’s been a lot of press over cryptocurrencies like Bitcoin, Bitcoin Cash, Ethereum and friends - particularly due to the wildly high prices some of these currencies can reach. However, how do they actually work under the hood?

Addresses

Fundamentally, cryptocurrencies have to have some way for people to own money, and to authenticate that someone is actually allowed to spend the money they say they have. In the real world, your bank knows who you are by name, has a record of your balance, and stops you from spending more than you actually have. Or, you have some physical banknotes which you can give to people, which are worth value in and of themselves. However, cryptocurrencies don’t have any central banks or paper money, so we have a problem.

To get around this problem, cryptocurrencies have addresses. An address has both a public and a secret component; the public bit is the bit you give to other people if you want them to pay you, and you keep the secret bit, well, secret, because it’s what you use to send other people money. Anyone can generate an address, using some maths called public-key cryptography to do so - for example, going to bitaddress.org and moving your mouse around for a bit will do just that. The secret bit is the key (literally); you’re only allowed to spend the money that people have sent to the public address if you have the secret component. And, since anyone can generate addresses using a common bit of maths, our problem is solved! But how does money actually move around?

Transactions

An important thing about the way addresses work is that, if you have the secret component of the address, you can sign stuff - and, because of how the maths works out, everyone else can check that only you, the person with the secret, could have been able to do that. We can use this to build transactions - to pay someone some money, I simply sign something saying “I, the owner of address ABC, would like to pay address XYZ 3 bitcoins”, and tell as many people as I can about this, so everybody knows. Everybody then does the maths and checks that my signature is valid, preventing anyone without the secret from stealing my money.
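
As a toy illustration of that mechanism in Node.js (real cryptocurrencies use the secp256k1 curve with their own serialization and address-derivation rules; this sketch only shows the general sign-and-verify idea, and needs Node 12 or later for crypto.sign and crypto.verify):


const crypto = require('crypto');

// The "address": a key pair. The public part can be shared with anyone.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ec', {
    namedCurve: 'secp256k1'
});

// "I, the owner of address ABC, would like to pay address XYZ 3 bitcoins."
const transaction = Buffer.from('ABC pays XYZ 3 BTC');

// Only the holder of the secret key can produce this signature...
const signature = crypto.sign('sha256', transaction, privateKey);

// ...but anyone holding the public key can check it.
console.log(crypto.verify('sha256', transaction, publicKey, signature)); // true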

In fact, everyone else also has to check that you indeed have the money you’re trying to use, which is why transactions also require you to tell everyone where the money came from - e.g. “I got 1 BTC from Bob and 2 BTC from Jim, and I’d like to send all 3 BTC to Dave”. This makes it easier for everyone to verify that what you say is true, by looking at the list of all transactions that have ever taken place and finding the ones you’re talking about. However, how do we make a list of transactions that everyone can agree on, and how do we check that the transactions are valid ones?

Blocks

To solve this problem, cryptocurrencies have this concept of a block. A block is simply a bunch of transactions that someone has checked over and shown to be valid, coupled with a reference to the last block (e.g. “this is block #3, and the last one I know about was block #2”) and the solution to a really hard maths problem, called a proof-of-work (PoW). The proof-of-work is the clever bit: it’s a maths problem (which is different for every block) that takes quite a lot of computing power to solve, meaning that only people with proper hardware (so-called miners) are able to solve it. This means that a significant investment is required to be able to confirm transactions, keeping people honest: nobody’s going to buy a whole bunch of fancy stuff to do bitcoin mining and then do evil things with it, because people will stop using Bitcoin if that happens, making their investment worthless.

When miners find the correct answer to the proof-of-work, they can include it in a block and get a bunch of miner’s fees for their hard work. These are made up of a reward, which started at 50 BTC per block and has been gradually decreasing over time, and transaction fees, which people add in to their transactions to get miners to put them in the blocks (e.g. “I want to send Dave 0.9 bitcoins, and I’ll give 0.1 bitcoins to any miner who checks my transaction”). The reward is how coins come into existence in the first place - and is also why the supply of, say, Bitcoin is limited, as the reward is set to decrease as more and more blocks are mined until it reaches zero.

(Incidentally, different cryptocurrencies can have different proof-of-work problems. For example, Bitcoin uses SHA256 hashing, while Litecoin uses a hash function called ‘scrypt’. Ethereum, a relatively new currency, requires that people run mini-programs embedded in the transactions instead.)
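
A toy version of a SHA256-based proof-of-work looks something like the sketch below: keep trying nonces until the block hash starts with enough zeroes. Real difficulty targets are far stricter and encoded differently, and Bitcoin actually hashes a block header twice, but the principle is the same.


const crypto = require('crypto');

// Find a nonce such that sha256(previousHash + transactions + nonce)
// starts with `difficulty` zero characters.
function mine(previousHash, transactions, difficulty) {
    let nonce = 0;
    for (;;) {
        const hash = crypto.createHash('sha256')
            .update(previousHash + transactions + nonce)
            .digest('hex');
        if (hash.startsWith('0'.repeat(difficulty))) {
            return { nonce, hash };
        }
        nonce += 1;
    }
}

console.log(mine('previous-block-hash', 'ABC pays XYZ 3 BTC', 4));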

The blockchain

The blocks that the miners mine are linked together in a blockchain (since every block has to say which block came before it, they’re all ‘linked together’). This blockchain is distributed amongst every person running the cryptocurrency software (e.g. the Bitcoin client) - essentially, it’s a list of all transactions that have ever taken place, together with proofs that show that a miner has verified each and every one of them. This way, everyone can agree upon who has what money, without having to trust anyone (it’s all maths!).

Conclusion

This has been a bit of a whistle-stop tour; if you want to learn more about any of the concepts behind cryptocurrencies, there are lots of resources available online! In particular, the original Bitcoin whitepaper, available at https://bitcoin.org/bitcoin.pdf, is worth a read, if you’re interested in what the creator of Bitcoin wrote about the protocol.

November 02, 2018

Gonçalo Valério (dethos)

Finding Product Market Fit November 02, 2018 03:41 PM

Interesting talk addressing the challenges of finding product market fit, in the form of a case study about Weebly. It’s worth the hour you will spend listening, if you are into this kind of stuff.

The transcript and slides can be found on the Startup School’s website.

Wallaroo Labs (chuckblake)

The Treacherous Tangle of Redundant Data: Resilience for Wallaroo November 02, 2018 09:00 AM

Introduction: we need data redundancy, but how, exactly? You now have your distributed system in production, congratulations! Your cluster is starting at six machines, but it is expected to grow quickly as it is assigned more work. The cluster’s main application is stateful, and that’s a problem. What if you lose a local disk drive? Or a sysadmin runs rm -rf on the wrong directory? Or else the entire machine cannot reboot, due to a power failure or administrator error that destroys an entire virtual machine?

Alex Wilson (mrwilson)

Ghosts from the Week #7 — Hallowe’en Edition November 02, 2018 12:00 AM

Ghosts from the Week #7 — Hallowe’en Edition

My main takeaway from this week, fittingly, is that sometimes the scariest thing to confront is your own fears and admitting your own failures.

Listening to Understand

I had quite a shock this week to learn that I’m not as good at listening to understand (versus, say, listening to respond) as I thought I was. It’s always jarring when your perception of yourself turns out not to match how other people perceive you.

Our CTO likes to remind us of, and warn us against falling victim to, the Fundamental Attribution Error, because it’s so subtle and insidious:

The tendency to believe that what people do reflects who they are.

Or, much less charitably

I do what I do because of the context that I’m in. You do what you do because you’re an idiot.

A group of us spent time together trying to ‘listen to understand’ through Reflective listening, defined as:

… seeking to understand a speaker’s idea, then offering the idea back to the speaker, to confirm the idea has been understood correctly.

The intent is to put ourselves in the shoes of the other people in the group, and to use our empathy to understand what they are thinking and feeling.

The most important thing I learned was that, because I’ve been at Unruly a long time (six years), I’m perceived (and perception is often more important than reality) to have more sway over decisions than I actually have. Also, my approach towards evaluating possible change, which is by my own admission quite analytical, could be perceived as being “not supportive of change”.

This was a really hard thing to hear, especially given how I perceive myself as being a strong driver of change and improvement.

Being aware of this throws my interactions into a new light, and I’ve resolved to take better care in situations where the perceived power dynamic is uneven.

Cross-Team Advocacy

I’ve mentioned a couple of times how one of the main problems my team faces is advocacy for, and adoption of, standards and consistent approaches to repeated tasks.

Chance has been kind enough to throw two really good blog posts at me this week that really hit home on some of the things we are trying to do.

Three Sales Mistakes Software Engineers Make

As I’ve mentioned in a previous weeknotes post, we (Shift) often have to sell better solutions to other teams. The cost of changing to the unknown means that these can be long and hard pitches, but we’ve largely kept to the points in this post.

Shift members often go and sit, ask questions, and watch other teams to make sure we’re solving the right problem. We’re good at building only when necessary, and I’ll write up a longer version of our journey towards Structured Diagnostic Logging in the future.

Monitoring Legislatures: The Long Game

I picked this one up from the Democracy Club Slack, and while it’s ostensibly about civic tech and not building scrapers for everything, the underlying philosophy about getting standards adopted hits really close to home for us.

I can rattle off a number of times where we’ve leveraged:

  • “It’s good to have friends on the inside.” — people who’ve rotated into Shift for a small amount of time go back to their home team with knowledge and advocacy for Shift
  • “Be polite and helpful.” — a lot of what we do is helped by ‘building social capital’ by helping and demonstrating good faith cooperation.

This was a week full of learning which I’ll no doubt follow up in future — learning about the motivations and concerns of the other team leads, learning about my own motivations which I may have not spoken aloud before, and learning about how we can be a better team.

Onwards into November!

Originally published at blog.probablyfine.co.uk on November 2, 2018.