Awkward solutions to Advent of Code challenges

In day 6 of the 2019 Advent of Code edition, we are presented with a system of objects orbiting in space. All objects orbit the center of mass of the system (modeled as object COM), either directly or by orbiting a larger object.

COM)B  # B orbits COM
B)C    # C orbits B
C)D    # D orbits C

These chains of orbits can be solved as a directed graph. It just so happens that the old Unix toolkit has tools to operate on directed graphs. They aren’t designed for the use that we will make of them, but that’s what makes this fun.

A legacy utility

Directed graphs are useful for modelling systems of dependencies (e.g., packages, build artefacts), which explains the existence of tsort.

Rooted in the way that early Unix linkers handled archive files, this primordial utility performs a topological sort of a directed graph. Its simple output format limits what we can do with it, but internally it builds a graph based on directed arcs (A → B, just like our orbits input). It checks for cycles in the graph, then lists all nodes in topological order.

In our scenario of objects orbiting in space, this topological order maps to “from innermost to outermost orbit”.

< input             \ # Read our problem input
    tr ')' ' '    | \ # Rewrite each orbit  "A)B" as "A B"
    tsort             # Sort all space objects

This output can provide insight when we are prototyping a solution to the orbits problem, but to fully solve it with tsort alone is a bit of a stretch. We can’t query for specific nodes or relationships: the sequence of ordered nodes is all we get.

That said, I did explore a solution that involves looping over the sorted output with awk to build a table associating each orbit between A and B with the distance to the center of mass (COM). It involves looping over all nodes for each node — and thus has O(n²) time complexity — and counting the number of orbits between a node and the center of mass. Snippet attached below for those feeling curious:

Attachment: Constructing a table of orbits
table=$(tr ')' $'\t' < input)

for o in $(tsort <<< "$table"); do
    awk "\$1==\"$o\"" <<< "$table"
done | awk '
    ns[$2] = ns[$1]+1;
    print $0 "\t" ns[$2]

B	C	2
B	G	2
G	H	3
C	D	3
D	E	4
D	I	4
E	F	5
E	J	5
J	K	6
K	L	7

Summing the values in the third column above will give us the result to question one.

However limited, our use of tsort is semantically correct. So we move on to a worse approach.

The wrong tool for the job

Celebrating its 45th anniversary this year, the make tool has long been a staple of build automation in software projects. Given a description of how each artefact is built, it works by internally building a graph of dependencies in order to determine the right sequence of build steps. For example, if we describe that file program depends on file source.o, which itself depends on source.c, then make will retrace the necessary steps to build every intermediate asset. This description is provided in the form of a Makefile.

What if we represent our orbital system as a Makefile?

# Convert input file into Makefile
awk -F ')' '{ printf "%s: %s\n\t# %s\n", $2, $1, $2 }
            END { print "COM:" }'
    # B
C: B
    # C
D: C
    # D

Now, make can answer the question of which objects a given object orbits, from outermost to innermost:

make D
# B
# C
# D

So wrong, yet so right.

We can package this into a shell function named chain that takes a space object as argument and returns the chain of orbits for that object.

Attachment: Defining chain
chain() {
    awk -F')' '
        { printf "%s: %s\n\t# %s\n", $2, $1, $2 }
        END { print "COM:" }
    ' | make -f- "$1"

The answer to the first question is the sum of the lengths of all objects’ chains, which we can refactor as the length of the concatenation of all chains:

Attachment: Counting all orbital chains
# Exclude special objects COM, SAN and YOU from the count.
objs=$(cut -d')' -f2 "$input" | grep -Ev 'COM|SAN|YOU')

for o in $objs; do
    chain "$o" < "$input"
done | wc -l

This works, but is immensely inefficient, since a new make process is run for every object, and each process needs to recompute the graph. The solution for the sample input (n = 13) is returned immediately, but for the proper input (n = 1714) the answer takes several minutes to return.

I’ve looked at makeʼs debug output to look for ways to compute these chains in a single run, but nothing stood out. If anyone as willing to waste their time as me finds a way, please let me know!

A strangely apt comparison tool

The second part of the challenge deals with “orbital transfers”: we are asked to determine how many orbits one has to “hop” between two objects (YOU and SAN). This amounts to finding the path through the closest common ancestor node between YOU and SAN.

So let’s compare the two chains with our friend comm:

chain YOU "$input" > chain-you
chain SAN "$input" > chain-san
comm -3 chain-you chain-san

And here is the same but taking advantage of Bash’s process substitution:

comm -3 <(chain YOU < "$input") <(chain SAN < "$input")
# E
	# I
# J
# K
	# SAN

comm compares two files line by line and produces three text columns as output: lines present only in file 1, lines present only in file 2, and lines present in both files. The flag -3 tells it to reject lines in the third category.

In our spatial problem, this means that comm -3 will reveal the orbital steps between YOU, SAN, and their common ancestor, while ignoring the common trunk. Looking at the output above, from YOU we have to hop inward to K, then to J, then to E, then to our common ancestor (D, omitted from the output); from there we hop outward to I, then to SAN.

To get the correct number of orbital “hops”, we need to exclude YOUʼs and SANʼ s own direct orbits by subtracting two:

transfers() {
    comm -3 <(chain YOU <"$1") <(chain SAN <"$1") \
        | wc -l \
        | dc - <(echo '2 - p') # Subtract YOU and SAN

That dc trick is just for fun. dc is a stack-based postfix calculator. There are much friendlier ways of obtaining the same result:

Attachment: Subtracting 2 from a line count
# With `wc -l` and Bash arithmetic
n=$(transfers input | wc -l)
echo $((n - 2))

# With AWK
transfers input | awk 'END { print NR - 2 }'

In the end, it’s clear that there are diminishing returns: asking make to return a single chain is relatively elegant, but solving the first part of the challenge with duct tape around make doesn’t scale at all. That said, for the second part, comparing just two chains with comm felt wonderfully easy and semantic; clocking at ~1s, it is acceptably slow.

Usurping traditional tools was a lot of fun, and we are left with an extremely concise implementation. See the full source.

Author: Miguel Fonseca

Engineer at Automattic. Linguist, cyclist, Lindy Hopper, tree climber, and headbanger.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: