<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Scattered Thoughts]]></title>
  <link href="http://scattered-thoughts.net/atom.xml" rel="self"/>
  <link href="http://scattered-thoughts.net/"/>
  <updated>2013-05-21T21:17:54+01:00</updated>
  <id>http://scattered-thoughts.net/</id>
  <author>
    <name><![CDATA[Jamie Brandon]]></name>
    <email><![CDATA[jamie@scattered-thoughts.net]]></email>
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Flowing faster: External memory]]></title>
    <link href="http://scattered-thoughts.net/blog/2013/05/21/flowing-faster-external-memory/"/>
    <updated>2013-05-21T20:43:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2013/05/21/flowing-faster-external-memory</id>
    <content type="html"><![CDATA[<p>I always want to be a better developer than I am. What work I do that is worthwhile happens in the few hours of flow I manage to achieve every week. A million different things break that flow every day. I suspect that a large part of achieving flow is keeping the current problem in working memory. To improve my chances I can improve my working memory, offload parts of the problem to the computer or prevent context switches. I&#8217;m on my own with the first option, but a better development environment can help with the latter two.</p>

<!--more-->


<p>The first thing that I want to fix in this series is offloading memory. There are basically two kinds of questions I regularly deal with:</p>

<ul>
<li><p>How did I solve this problem / build this software / configure this program X months ago?</p></li>
<li><p>What was I trying to remember to change X seconds ago?</p></li>
</ul>


<p>I&#8217;ve started using <a href="http://jblevins.org/projects/deft/">deft</a> to answer both of these. Deft stores notes in a folder full of flat files and adds an incremental search buffer to emacs (searching > organising). This means that my notes are simple plain text which I can easily edit, backup, grep or serve on the web.</p>

<p>For long-term memory I create a new note every time I solve a problem or learn something useful. Within emacs M-&#8217; brings up the deft window, typing triggers the incremental search and hiting Enter opens the first matching note.</p>

<p>For short-term memory I have a single note called stack. Hitting C-&#8217; opens the stack note with the cursor on a new blank line for adding items to the stack. Hitting C-DEL deletes the previous line and C-q closes the stack. Hopefully this is sufficiently low-friction that the extra memory makes up for the context switch.</p>

<p>My config is <a href="https://github.com/jamii/emacs-live-packs/blob/master/deft-pack/init.el">here</a>. I&#8217;m considering writing a gnome-shell extension which displays the last line of the stack in the status bar to remind me what I&#8217;m supposed to be doing when my mental stack gets rudely dumped. I also want to add the global key bindings to gnome-shell so I don&#8217;t have to navigate to emacs first.</p>

<p>This is a very simple tool, which is kind of the point. The more structure and options a note-taking tool has, the more effort it takes to actually use it and the more likely it is that I lose my entire mental stack whilst doing so.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Search trees and core.logic]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/12/19/search-trees-and-core-dot-logic/"/>
    <updated>2012-12-19T20:32:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/12/19/search-trees-and-core-dot-logic</id>
    <content type="html"><![CDATA[<p>I mentioned in an <a href="http://scattered-thoughts.net/blog/2012/12/02/hacker-school/">earlier post</a> that I had spent some time working on <a href="https://github.com/jamii/shackles">shackles</a>, an extensible <a href="http://en.wikipedia.org/wiki/Constraint_programming">constraint solver</a> based on <a href="http://www.gecode.org/">gecode</a> with extensions for <a href="http://en.wikipedia.org/wiki/Logic_programming">logic programming</a>. I eventually gave up working on shackles in favor of using <a href="https://github.com/clojure/core.logic">core.logic</a> which is much more mature and has actual maintainers. Last week David Nolen (the author of core.logic) was visiting Hacker School so I decided to poke around inside core.logic and see what could be brought over from shackles. The <a href="https://github.com/clojure/core.logic/pull/13">first chunk of work</a> adds fair conjunction, user-configurable search and a parallel solver.</p>

<!--more-->


<p>First, a little background. From a high-level point of view, a constraint solver does three things:</p>

<ul>
<li><p>specifies a search space in the form of a set of constraints</p></li>
<li><p>turns that search space into a search tree</p></li>
<li><p>searches the resulting tree for non-failed leaves</p></li>
</ul>


<p>Currently core.logic (and cKanren before it) complects all three of these. My patch partly decomplects the last of these from the first two, allowing different search algorithms to be chosen independently of the problem specification.</p>

<p>Let&#8217;s look at how core.logic works. I&#8217;m going to gloss over a lot of implementation details in order to make the core ideas clearer.</p>

<p>The search tree in core.logic is represented as a lazy stream of the non-failed leaves of the tree. This stream can be:</p>

<ul>
<li><p><code>nil</code> - the empty stream</p></li>
<li><p><code>(Choice. head tail)</code> - a cons cell</p></li>
</ul>


<p>Disjunction of two goals produces a new goal which contains the search trees of the two goals as adjacent branches. In core.logic, this is implemented by combining their streams with <code>mplus</code>. A naive implementation might look like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">mplus</span> <span class="p">[</span><span class="nv">stream1</span> <span class="nv">stream2</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">nil? </span><span class="nv">stream1</span><span class="p">)</span> <span class="nv">stream2</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">choice?</span> <span class="nv">stream1</span><span class="p">)</span> <span class="p">(</span><span class="nf">Choice.</span> <span class="p">(</span><span class="nf">.head</span> <span class="nv">stream1</span><span class="p">)</span> <span class="p">(</span><span class="nf">mplus</span> <span class="p">(</span><span class="nf">.tail</span> <span class="nv">stream1</span><span class="p">)</span> <span class="nv">stream2</span><span class="p">))))</span>
</span></code></pre></td></tr></table></div></figure>


<p>This amounts to a depth-first search of the leaves of the tree. Unfortunately, search trees in core.logic can be infinitely deep so a depth-first search can get stuck. If the first branch has an infinite subtree we will never see results from the second branch.</p>
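<p>To make the starvation concrete, here is a minimal Java sketch of the naive <code>mplus</code>. Java has no lazy sequences, so the sketch assumes each cons cell keeps its tail behind a <code>Supplier</code>; <code>NaiveStream</code>, <code>repeat</code> and <code>take</code> are hypothetical helpers, not core.logic API. An endless first stream stands in for an infinite first branch.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class NaiveStream {
    // A lazy cons cell: one result plus a suspended tail. null is the empty stream.
    static final class Choice {
        final Object head;
        final Supplier<Choice> tail;

        Choice(Object head, Supplier<Choice> tail) {
            this.head = head;
            this.tail = tail;
        }
    }

    // Naive mplus: everything from s1 comes before anything from s2 (depth-first).
    static Choice mplus(Choice s1, Choice s2) {
        if (s1 == null) return s2;
        return new Choice(s1.head, () -> mplus(s1.tail.get(), s2));
    }

    // An endless stream of the same result, standing in for an infinite first branch.
    static Choice repeat(Object x) {
        return new Choice(x, () -> repeat(x));
    }

    // Collect at most n results from the front of a stream.
    static List<Object> take(int n, Choice s) {
        List<Object> out = new ArrayList<>();
        while (s != null && out.size() < n) {
            out.add(s.head);
            s = s.tail.get();
        }
        return out;
    }

    public static void main(String[] args) {
        Choice one = new Choice(1, () -> null);
        System.out.println(take(4, mplus(repeat("x"), one))); // [x, x, x, x]
        System.out.println(take(4, mplus(one, repeat("x")))); // [1, x, x, x]
    }
}
```

<p>Because <code>mplus</code> only looks at <code>stream2</code> once <code>stream1</code> is exhausted, the result <code>1</code> is unreachable whenever the first stream never ends.</p>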

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; simple non-terminating goal</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">forevero</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">fresh</span> <span class="p">[]</span>
</span><span class='line'>    <span class="nv">forevero</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">run*</span> <span class="p">[</span><span class="nv">q</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">conde</span>
</span><span class='line'>    <span class="p">[</span><span class="nv">forevero</span><span class="p">]</span>
</span><span class='line'>    <span class="p">[(</span><span class="nb">== </span><span class="nv">q</span> <span class="mi">1</span><span class="p">)]))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">;; with depth-first search blocks immediately, returning (...)</span>
</span><span class='line'><span class="c1">;; with breadth-first search blocks after the first result, returning (1 ...)</span>
</span></code></pre></td></tr></table></div></figure>


<p>We can perform breadth-first search by adding a new stream type:</p>

<ul>
<li><code>(fn [] stream)</code> - a thunk representing a branch in the search tree</li>
</ul>


<p>And then interleaving results from each branch:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">mplus</span> <span class="p">[</span><span class="nv">stream1</span> <span class="nv">stream2</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>    <span class="nv">...</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">fn?</span> <span class="nv">stream1</span><span class="p">)</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="p">(</span><span class="nf">mplus</span> <span class="nv">stream2</span> <span class="p">(</span><span class="nf">stream1</span><span class="p">)))))</span>
</span></code></pre></td></tr></table></div></figure>
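<p>The same idea as a Java sketch, assuming a small hypothetical <code>Stream</code> type with <code>Choice</code> and <code>Thunk</code> cases (a <code>Supplier</code> plays the role of the zero-argument <code>fn</code>). It runs the <code>conde</code> example above with both versions of <code>mplus</code>; <code>take</code> gives up after a fixed number of steps so that a blocked search shows up as a short result list instead of hanging.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class FairStream {
    // A stream is null (empty), a Choice (result + rest) or a Thunk (suspended work).
    interface Stream {}

    static final class Choice implements Stream {
        final Object head;
        final Stream tail;
        Choice(Object head, Stream tail) { this.head = head; this.tail = tail; }
    }

    static final class Thunk implements Stream {
        final Supplier<Stream> body;
        Thunk(Supplier<Stream> body) { this.body = body; }
    }

    // Depth-first: keep working on s1 until it is exhausted.
    static Stream mplusNaive(Stream s1, Stream s2) {
        if (s1 == null) return s2;
        if (s1 instanceof Choice) {
            Choice c = (Choice) s1;
            return new Choice(c.head, mplusNaive(c.tail, s2));
        }
        Thunk t = (Thunk) s1;
        return new Thunk(() -> mplusNaive(t.body.get(), s2));
    }

    // Interleaving: when s1 suspends, swap the arguments so s2 gets the next step.
    static Stream mplus(Stream s1, Stream s2) {
        if (s1 == null) return s2;
        if (s1 instanceof Choice) {
            Choice c = (Choice) s1;
            return new Choice(c.head, mplus(c.tail, s2));
        }
        Thunk t = (Thunk) s1;
        return new Thunk(() -> mplus(s2, t.body.get()));
    }

    // Analogue of forevero: suspends forever without producing anything.
    static Stream forevero() {
        return new Thunk(FairStream::forevero);
    }

    // Collect up to n results, giving up after maxSteps so that blocking is observable.
    static List<Object> take(int n, Stream s, int maxSteps) {
        List<Object> out = new ArrayList<>();
        for (int step = 0; step < maxSteps && s != null && out.size() < n; step++) {
            if (s instanceof Choice) {
                out.add(((Choice) s).head);
                s = ((Choice) s).tail;
            } else {
                s = ((Thunk) s).body.get();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Stream unify = new Choice(1, null); // stands in for (== q 1)
        System.out.println(take(2, mplusNaive(forevero(), unify), 1000)); // []
        System.out.println(take(2, mplus(forevero(), unify), 1000));      // [1]
    }
}
```

<p>The only difference between the two versions is the argument swap in the thunk case: every time the first stream suspends, the second stream gets to take a step.</p>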


<p>This is how core.logic implements fair disjunction (fair in the sense that all branches of <code>conde</code> will be explored equally). However, conjunction is still unfair. Conjunction is performed in core.logic by running the second goal starting from each leaf of the first goal&#8217;s tree. In terms of the stream representation, this looks like:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">bind</span> <span class="p">[</span><span class="nv">stream</span> <span class="nv">goal</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">nil? </span><span class="nv">stream</span><span class="p">)</span> <span class="nv">nil</span> <span class="c1">;; failure</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">choice?</span> <span class="nv">stream</span><span class="p">)</span> <span class="p">(</span><span class="nf">mplus</span> <span class="p">(</span><span class="nf">goal</span> <span class="p">(</span><span class="nf">.head</span> <span class="nv">stream</span><span class="p">))</span> <span class="p">(</span><span class="nf">bind</span> <span class="p">(</span><span class="nf">.tail</span> <span class="nv">stream</span><span class="p">)</span> <span class="nv">goal</span><span class="p">))</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">fn?</span> <span class="nv">stream</span><span class="p">)</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="p">(</span><span class="nf">bind</span> <span class="p">(</span><span class="nf">stream</span><span class="p">)</span> <span class="nv">goal</span><span class="p">))))</span>
</span></code></pre></td></tr></table></div></figure>


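<p>This gives rise to the same problem as the naive version of <code>mplus</code>: if the first goal never produces a leaf, the second goal never runs, even when it would fail immediately:</p>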

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">run*</span> <span class="p">[</span><span class="nv">q</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">all</span>
</span><span class='line'>    <span class="nv">forevero</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">!=</span> <span class="nv">q</span> <span class="nv">q</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">;; with unfair conjunction blocks immediately, returning (...)</span>
</span><span class='line'><span class="c1">;; with fair conjunction the second branch causes failure, returning ()</span>
</span></code></pre></td></tr></table></div></figure>


<p>I suspect that core.logic lacked fair conjunction entirely because of this stream representation, which complects all three stages of constraint solving and hides the underlying search tree. Since shackles is based on gecode, it has the advantage of a much clearer theoretical framework (I strongly recommend <a href="http://www.gecode.org/paper.html?id=Tack:PhD:2009">this paper</a>, not just for the insight into gecode but as a shining example of how mathematical intuition can be used to guide software design).</p>

<p>The first step in introducing fair conjunction to core.logic is to explicitly represent the search tree. The types are similar:</p>

<ul>
<li><code>nil</code> - the empty tree</li>
<li><code>(Result. state)</code> - a leaf</li>
<li><code>(Choice. left right)</code> - a branch</li>
<li><code>(Thunk. state goal)</code> - a thunk containing the current state and a sub-goal</li>
</ul>


<p>Defining <code>mplus</code> is now trivial since it is no longer responsible for interleaving results:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">mplus</span> <span class="p">[</span><span class="nv">tree1</span> <span class="nv">tree2</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">Choice.</span> <span class="nv">tree1</span> <span class="nv">tree2</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>And we now have two variants of bind:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">bind-unfair</span> <span class="p">[</span><span class="nv">tree</span> <span class="nv">goal</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">nil? </span><span class="nv">tree</span><span class="p">)</span> <span class="nv">nil</span> <span class="c1">;; failure</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">result?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">goal</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">tree</span><span class="p">))</span> <span class="c1">;; success, start the second tree here</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">choice?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">Choice.</span> <span class="p">(</span><span class="nf">bind-unfair</span> <span class="p">(</span><span class="nf">.left</span> <span class="nv">tree</span><span class="p">)</span> <span class="nv">goal</span><span class="p">)</span> <span class="p">(</span><span class="nf">bind-unfair</span> <span class="p">(</span><span class="nf">.right</span> <span class="nv">tree</span><span class="p">)</span> <span class="nv">goal</span><span class="p">))</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">thunk?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">Thunk.</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">state</span><span class="p">]</span> <span class="p">(</span><span class="nf">bind-unfair</span> <span class="p">((</span><span class="nf">.goal</span> <span class="nv">tree</span><span class="p">)</span> <span class="nv">state</span><span class="p">)</span> <span class="nv">goal</span><span class="p">)))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">bind-fair</span> <span class="p">[</span><span class="nv">tree</span> <span class="nv">goal</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">nil? </span><span class="nv">tree</span><span class="p">)</span> <span class="nv">nil</span> <span class="c1">;; failure</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">result?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">goal</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">tree</span><span class="p">))</span> <span class="c1">;; success, start the second tree here</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">choice?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">Choice.</span> <span class="p">(</span><span class="nf">bind-fair</span> <span class="p">(</span><span class="nf">.left</span> <span class="nv">tree</span><span class="p">)</span> <span class="nv">goal</span><span class="p">)</span> <span class="p">(</span><span class="nf">bind-fair</span> <span class="p">(</span><span class="nf">.right</span> <span class="nv">tree</span><span class="p">)</span> <span class="nv">goal</span><span class="p">))</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">thunk?</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="nf">Thunk.</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">tree</span><span class="p">)</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">state</span><span class="p">]</span> <span class="p">(</span><span class="nf">bind-fair</span> <span class="p">(</span><span class="nf">goal</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nf">.goal</span> <span class="nv">tree</span><span class="p">))))))</span> <span class="c1">;; interleave!</span>
</span></code></pre></td></tr></table></div></figure>


<p>The crucial difference here is that bind-fair takes advantage of the continuation-like thunk to interleave both goals, allowing each to do one thunk&#8217;s worth of work before switching to the next.</p>

<p>(We keep bind-unfair around because it tends to be faster in practice: when you know the order in which your goals will run, you can use domain knowledge to choose the most efficient order. However, making program evaluation depend on goal ordering is less declarative, and there are also some problems that cannot be specified without fair conjunction. It&#8217;s nice to have both.)</p>
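<p>Putting the pieces together, here is a self-contained Java sketch of the explicit tree with both binds, under a few assumptions: states are opaque objects, goals are functions from a state to a tree, and the search is a simple lazy breadth-first queue with a step budget (an empty <code>Optional</code> means it gave up). <code>forevero</code> and <code>failo</code> are hypothetical stand-ins for the core.logic goals. It runs the <code>(all forevero (!= q q))</code> example from above with each bind.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SearchTree {
    // The explicit search tree. null is the empty (failed) tree.
    interface Tree {}

    // A leaf: a successful state.
    static final class Result implements Tree {
        final Object state;
        Result(Object state) { this.state = state; }
    }

    // A branch, e.g. the two sides of a disjunction.
    static final class Choice implements Tree {
        final Tree left, right;
        Choice(Tree left, Tree right) { this.left = left; this.right = right; }
    }

    // Suspended work: apply goal to state to get the rest of the tree.
    static final class Thunk implements Tree {
        final Object state;
        final Function<Object, Tree> goal;
        Thunk(Object state, Function<Object, Tree> goal) { this.state = state; this.goal = goal; }
    }

    // Disjunction just builds a branch; interleaving is left to the search.
    static Tree mplus(Tree t1, Tree t2) {
        return new Choice(t1, t2);
    }

    // Run goal at every leaf; the first goal's suspended work always runs before goal does.
    static Tree bindUnfair(Tree tree, Function<Object, Tree> goal) {
        if (tree == null) return null;
        if (tree instanceof Result) return goal.apply(((Result) tree).state);
        if (tree instanceof Choice) {
            Choice c = (Choice) tree;
            return new Choice(bindUnfair(c.left, goal), bindUnfair(c.right, goal));
        }
        Thunk t = (Thunk) tree;
        return new Thunk(t.state, s -> bindUnfair(t.goal.apply(s), goal));
    }

    // Same, but at each thunk swap which goal runs next, so both make progress.
    static Tree bindFair(Tree tree, Function<Object, Tree> goal) {
        if (tree == null) return null;
        if (tree instanceof Result) return goal.apply(((Result) tree).state);
        if (tree instanceof Choice) {
            Choice c = (Choice) tree;
            return new Choice(bindFair(c.left, goal), bindFair(c.right, goal));
        }
        Thunk t = (Thunk) tree;
        return new Thunk(t.state, s -> bindFair(goal.apply(s), t.goal));
    }

    // Lazy breadth-first search; Optional.empty() means it gave up after maxSteps.
    static Optional<List<Object>> search(Tree root, int maxSteps) {
        List<Object> results = new ArrayList<>();
        ArrayDeque<Tree> queue = new ArrayDeque<>();
        push(queue, root);
        for (int step = 0; step < maxSteps; step++) {
            Tree t = queue.poll();
            if (t == null) return Optional.of(results); // every branch finished
            if (t instanceof Result) {
                results.add(((Result) t).state);
            } else if (t instanceof Choice) {
                push(queue, ((Choice) t).left);
                push(queue, ((Choice) t).right);
            } else {
                Thunk k = (Thunk) t;
                push(queue, k.goal.apply(k.state)); // do one thunk's worth of work
            }
        }
        return Optional.empty();
    }

    // ArrayDeque rejects nulls, and a failed subtree has nothing left to explore anyway.
    static void push(ArrayDeque<Tree> queue, Tree t) {
        if (t != null) queue.add(t);
    }

    // Hypothetical stand-ins for core.logic goals.
    static Tree forevero(Object state) { return new Thunk(state, SearchTree::forevero); } // never finishes
    static Tree failo(Object state) { return null; }                                      // like (!= q q)

    public static void main(String[] args) {
        // (all forevero (!= q q)) starting from a dummy state
        System.out.println(search(bindUnfair(forevero("q"), SearchTree::failo), 1000)); // Optional.empty
        System.out.println(search(bindFair(forevero("q"), SearchTree::failo), 1000));   // Optional[[]]
    }
}
```

<p>With <code>bindUnfair</code> the failing goal never gets a turn, so the search runs out of steps; with <code>bindFair</code> it runs as soon as the first thunk is forced, and the whole tree fails finitely.</p>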

<p>Now that we represent the tree explicitly, we can use different search algorithms. My patch defaults to lazy, breadth-first search (to maintain the previous semantics) but it also supplies a variety of others including a <a href="https://github.com/jamii/core.logic/blob/flexible-search/src/main/clojure/clojure/core/logic/par.clj#L49">parallel depth-first search</a> using <a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html">fork-join</a>.</p>
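<p>Once the tree is explicit, a search strategy is just a function over it. The following is a hypothetical sketch in the spirit of a parallel depth-first search, not the code from the patch: a <code>RecursiveTask</code> that forks the right branch of each choice onto a <code>ForkJoinPool</code> and explores the left branch itself. It assumes a finite tree; an infinite subtree would keep a worker busy forever, which is one reason a lazy breadth-first search makes the safer default. The <code>bits</code> goal is a made-up example that enumerates every three-character string of 0s and 1s.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Function;

public class ParallelSearch {
    interface Tree {} // null is the failed tree

    static final class Result implements Tree {
        final Object state;
        Result(Object state) { this.state = state; }
    }

    static final class Choice implements Tree {
        final Tree left, right;
        Choice(Tree left, Tree right) { this.left = left; this.right = right; }
    }

    static final class Thunk implements Tree {
        final Object state;
        final Function<Object, Tree> goal;
        Thunk(Object state, Function<Object, Tree> goal) { this.state = state; this.goal = goal; }
    }

    // Parallel depth-first search: fork the right branch of every choice and explore
    // the left branch on the current worker. Assumes the tree is finite.
    static final class Dfs extends RecursiveTask<List<Object>> {
        private final Tree tree;
        Dfs(Tree tree) { this.tree = tree; }

        @Override
        protected List<Object> compute() {
            Tree t = tree;
            while (t instanceof Thunk) { // run suspended work inline
                Thunk k = (Thunk) t;
                t = k.goal.apply(k.state);
            }
            List<Object> out = new ArrayList<>();
            if (t instanceof Result) {
                out.add(((Result) t).state);
            } else if (t instanceof Choice) {
                Choice c = (Choice) t;
                Dfs right = new Dfs(c.right);
                right.fork();                          // an idle worker may steal this
                out.addAll(new Dfs(c.left).compute()); // keep going depth-first ourselves
                out.addAll(right.join());
            } // t == null: failed branch, no results
            return out;
        }
    }

    // Hypothetical goal: extend the state (a string) with "0" or "1" until it has length 3.
    static Tree bits(Object state) {
        String prefix = (String) state;
        if (prefix.length() == 3) return new Result(prefix);
        return new Choice(new Thunk(prefix + "0", ParallelSearch::bits),
                          new Thunk(prefix + "1", ParallelSearch::bits));
    }

    public static void main(String[] args) {
        List<Object> results = ForkJoinPool.commonPool().invoke(new Dfs(new Thunk("", ParallelSearch::bits)));
        System.out.println(results); // [000, 001, 010, 011, 100, 101, 110, 111]
    }
}
```

<p>Results come back in left-to-right order because each task appends its left branch&#8217;s results before joining the right branch.</p>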

<p>I still need to write a few more tests and sign the clojure contributor agreement before this can be considered for merging. I also have a pesky performance regression in lazy searches - this branch sometimes does more work than the original when only finding the first solution. I&#8217;m not sure yet whether this is down to a lack of laziness somewhere or maybe just a result of a slightly different search order. Either way, it needs to be fixed.</p>

<p>After this change, core.logic still complects the specification of the search space and the generation of the search tree (e.g. we have to choose between bind-unfair and bind-fair in the problem specification). At some point I would like to either fix that in core.logic or finish work on shackles. For now though, I&#8217;m going back to working on <a href="https://github.com/jamii/droplet">droplet</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Strucjure: motivation]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/12/04/strucjure-motivation/"/>
    <updated>2012-12-04T02:31:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/12/04/strucjure-motivation</id>
    <content type="html"><![CDATA[<p>I feel that the readme for <a href="https://github.com/jamii/strucjure">strucjure</a> does a reasonable job of explaining how to use the library but not of explaining why you would want to. I want to do that here. I&#8217;m going to focus on the motivation behind strucjure and the use cases for it rather than the internals, so try not to worry too much about how this all works and just focus on the ideas (the implementation itself is <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">very simple</a> but liable to keep changing).</p>

<!--more-->


<p>The core idea is that strucjure (and the <a href="http://lambda-the-ultimate.org/node/2477">OMeta</a> library on which it is based) is not just yet-another-parser, but is instead a concise language for describing, manipulating and transforming data structures. The <a href="http://www.vpri.org/">VPRI</a> folks have done some amazing things with OMeta. My goal with strucjure is to see how much further this idea can be taken.</p>

<p>(Note: For the purposes of this post I&#8217;ll use the terms pattern and view interchangeably. There <em>is</em> a difference, but the line between the two is not yet clear to me and will probably change in future implementations.)</p>

<h1>Pattern matching</h1>

<p>Pattern matching is a concept found in many functional languages. The basic idea is something like a switch statement, combined with a mini-language for describing patterns which the input should be tested against. The first pattern which matches has its corresponding branch executed.</p>

<p>As a very simple example, we can use strucjure to write fizzbuzz like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">i</span> <span class="p">(</span><span class="nb">range </span><span class="mi">100</span><span class="p">)]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prn</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">match</span> <span class="p">[(</span><span class="nf">mod</span> <span class="nv">i</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="nf">mod</span> <span class="nv">i</span> <span class="mi">5</span><span class="p">)]</span>
</span><span class='line'>          <span class="p">[</span><span class="mi">0</span> <span class="mi">0</span><span class="p">]</span> <span class="s">&quot;fizzbuzz&quot;</span>
</span><span class='line'>          <span class="p">[</span><span class="mi">0</span> <span class="nv">_</span><span class="p">]</span> <span class="s">&quot;fizz&quot;</span>
</span><span class='line'>          <span class="p">[</span><span class="nv">_</span> <span class="mi">0</span><span class="p">]</span> <span class="s">&quot;buzz&quot;</span>
</span><span class='line'>          <span class="nv">_</span>      <span class="nv">i</span><span class="p">)))</span>
</span></code></pre></td></tr></table></div></figure>


<p>This is a concise, readable description of the various cases and replaces a chain of if-statements.</p>
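<p>For comparison, this is roughly the chain of if-statements that the match replaces, sketched in Java. As with <code>match</code>, the order of the tests matters: the combined case has to come first, because the first test that succeeds wins.</p>

```java
public class FizzBuzz {
    // The if-chain equivalent of the match expression: first successful test wins.
    static String fizzbuzz(int i) {
        if (i % 3 == 0 && i % 5 == 0) return "fizzbuzz";
        if (i % 3 == 0) return "fizz";
        if (i % 5 == 0) return "buzz";
        return String.valueOf(i);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            System.out.println(fizzbuzz(i));
        }
    }
}
```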

<p>If we stopped there, you could be forgiven for not caring. Simple examples don&#8217;t really demonstrate the power of pattern matching. Let&#8217;s instead look at a more complicated example - <a href="http://en.wikipedia.org/wiki/Red%E2%80%93black_tree">red-black trees</a>. An important operation on red-black trees is re-establishing the balance invariants after inserting a new node. Here is a java implementation of the balance operation (from <a href="http://algs4.cs.princeton.edu/33balanced/RedBlackBST.java.html">this implementation</a>):</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
</pre></td><td class='code'><pre><code class='java'><span class='line'><span class="c1">// make a left-leaning link lean to the right</span>
</span><span class='line'><span class="kd">private</span> <span class="n">Node</span> <span class="nf">rotateRight</span><span class="o">(</span><span class="n">Node</span> <span class="n">h</span><span class="o">)</span> <span class="o">{</span>
</span><span class='line'>    <span class="k">assert</span> <span class="o">(</span><span class="n">h</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">);</span>
</span><span class='line'>    <span class="n">Node</span> <span class="n">x</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">left</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">right</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">right</span> <span class="o">=</span> <span class="n">h</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">right</span><span class="o">.</span><span class="na">color</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">right</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="n">RED</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">N</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="na">N</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">N</span> <span class="o">=</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">+</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">)</span> <span class="o">+</span> <span class="mi">1</span><span class="o">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">x</span><span class="o">;</span>
</span><span class='line'><span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// make a right-leaning link lean to the left</span>
</span><span class='line'><span class="kd">private</span> <span class="n">Node</span> <span class="nf">rotateLeft</span><span class="o">(</span><span class="n">Node</span> <span class="n">h</span><span class="o">)</span> <span class="o">{</span>
</span><span class='line'>    <span class="k">assert</span> <span class="o">(</span><span class="n">h</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">);</span>
</span><span class='line'>    <span class="n">Node</span> <span class="n">x</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">right</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">left</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">left</span> <span class="o">=</span> <span class="n">h</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">left</span><span class="o">.</span><span class="na">color</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">left</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="n">RED</span><span class="o">;</span>
</span><span class='line'>    <span class="n">x</span><span class="o">.</span><span class="na">N</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="na">N</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">N</span> <span class="o">=</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">+</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">)</span> <span class="o">+</span> <span class="mi">1</span><span class="o">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">x</span><span class="o">;</span>
</span><span class='line'><span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// flip the colors of a node and its two children</span>
</span><span class='line'><span class="kd">private</span> <span class="kt">void</span> <span class="nf">flipColors</span><span class="o">(</span><span class="n">Node</span> <span class="n">h</span><span class="o">)</span> <span class="o">{</span>
</span><span class='line'>    <span class="c1">// h must have opposite color of its two children</span>
</span><span class='line'>    <span class="k">assert</span> <span class="o">(</span><span class="n">h</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">);</span>
</span><span class='line'>    <span class="k">assert</span> <span class="o">(!</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">)</span> <span class="o">&amp;&amp;</span>  <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">&amp;&amp;</span>  <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">))</span>
</span><span class='line'>        <span class="o">||</span> <span class="o">(</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">)</span>  <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">));</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="o">!</span><span class="n">h</span><span class="o">.</span><span class="na">color</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="o">!</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">.</span><span class="na">color</span><span class="o">;</span>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">.</span><span class="na">color</span> <span class="o">=</span> <span class="o">!</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">.</span><span class="na">color</span><span class="o">;</span>
</span><span class='line'><span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// restore red-black tree invariant</span>
</span><span class='line'><span class="kd">private</span> <span class="n">Node</span> <span class="nf">balance</span><span class="o">(</span><span class="n">Node</span> <span class="n">h</span><span class="o">)</span> <span class="o">{</span>
</span><span class='line'>    <span class="k">assert</span> <span class="o">(</span><span class="n">h</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">if</span> <span class="o">(</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">))</span>                      <span class="n">h</span> <span class="o">=</span> <span class="n">rotateLeft</span><span class="o">(</span><span class="n">h</span><span class="o">);</span>
</span><span class='line'>    <span class="k">if</span> <span class="o">(</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">.</span><span class="na">left</span><span class="o">))</span> <span class="n">h</span> <span class="o">=</span> <span class="n">rotateRight</span><span class="o">(</span><span class="n">h</span><span class="o">);</span>
</span><span class='line'>    <span class="k">if</span> <span class="o">(</span><span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="n">isRed</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">))</span>     <span class="n">flipColors</span><span class="o">(</span><span class="n">h</span><span class="o">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">h</span><span class="o">.</span><span class="na">N</span> <span class="o">=</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">left</span><span class="o">)</span> <span class="o">+</span> <span class="n">size</span><span class="o">(</span><span class="n">h</span><span class="o">.</span><span class="na">right</span><span class="o">)</span> <span class="o">+</span> <span class="mi">1</span><span class="o">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">h</span><span class="o">;</span>
</span><span class='line'><span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>This pile of if-statements obscures the intent of the code, which is to re-arrange the tree so that no red node has a red child. What we really want to see is &#8216;if the tree looks like foo, replace it with bar&#8217;. Using pattern matching we can express this directly (code based on <a href="http://www.cs.cornell.edu/courses/cs3110/2009sp/lectures/lec11.html">this implementation</a>):</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defrecord </span><span class="nv">Leaf</span> <span class="p">[])</span>
</span><span class='line'><span class="p">(</span><span class="kd">defrecord </span><span class="nv">Red</span> <span class="p">[</span><span class="nb">left </span><span class="nv">value</span> <span class="nv">right</span><span class="p">])</span>
</span><span class='line'><span class="p">(</span><span class="kd">defrecord </span><span class="nv">Black</span> <span class="p">[</span><span class="nb">left </span><span class="nv">value</span> <span class="nv">right</span><span class="p">])</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">balance</span>
</span><span class='line'>  <span class="c1">;; if it looks like one of these...</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">or</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">Black.</span> <span class="p">(</span><span class="nf">Red.</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?a</span> <span class="nv">?x</span> <span class="nv">?b</span><span class="p">)</span> <span class="nv">?y</span> <span class="nv">?c</span><span class="p">)</span> <span class="nv">?z</span> <span class="nv">?d</span><span class="p">)</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">Black.</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?a</span> <span class="nv">?x</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?b</span> <span class="nv">?y</span> <span class="nv">?c</span><span class="p">))</span> <span class="nv">?z</span> <span class="nv">?d</span><span class="p">)</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">Black.</span> <span class="nv">?a</span> <span class="nv">?x</span> <span class="p">(</span><span class="nf">Red.</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?b</span> <span class="nv">?y</span> <span class="nv">?c</span><span class="p">)</span> <span class="nv">?z</span> <span class="nv">?d</span><span class="p">))</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">Black.</span> <span class="nv">?a</span> <span class="nv">?x</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?b</span> <span class="nv">?y</span> <span class="p">(</span><span class="nf">Red.</span> <span class="nv">?c</span> <span class="nv">?z</span> <span class="nv">?d</span><span class="p">))))</span>
</span><span class='line'>  <span class="c1">;; replace it with this...</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">Red.</span> <span class="p">(</span><span class="nf">Black.</span> <span class="nv">a</span> <span class="nv">x</span> <span class="nv">b</span><span class="p">)</span> <span class="nv">y</span> <span class="p">(</span><span class="nf">Black.</span> <span class="nv">c</span> <span class="nv">z</span> <span class="nv">d</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'>  <span class="c1">;; otherwise, leave it alone</span>
</span><span class='line'>  <span class="nv">?other</span>
</span><span class='line'>  <span class="nv">other</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>(Note that this isn&#8217;t exactly the same operation as the Java code above, because the implementation it is based on also uses a slightly different insertion algorithm. Nevertheless, translating this operation to Java would produce the same grotesque expansion of if-statements.)</p>

<p>Strucjure is not very optimized yet, but with a more mature pattern-matching library this code should be as fast as what you would write by hand. For complex patterns <a href="https://github.com/clojure/core.match">core.match</a> often does a better job of optimizing the decision tree than I can manage by hand, in much the same way that GCC writes better assembly code than I ever could.</p>
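<p>To give a feel for what that looks like, here is a rough sketch of the same balance operation written with core.match instead of strucjure. It assumes a different tree representation from the records above (each node is a vector <code>[color left value right]</code>, with <code>:red</code> or <code>:black</code> as the color and <code>nil</code> for an empty tree) and needs <code>org.clojure/core.match</code> on the classpath.</p>

```clojure
;; Assumes org.clojure/core.match is on the classpath.
(require '[clojure.core.match :refer [match]])

;; Assumed representation for this sketch (not strucjure's records):
;; a node is [color left value right], color is :red or :black,
;; and an empty tree is nil.
(defn balance
  "If node is a black node with a red child that itself has a red child,
  rebuild it as a red node with two black children. Otherwise return it
  unchanged."
  [node]
  (match [node]
    [(:or [:black [:red [:red a x b] y c] z d]
          [:black [:red a x [:red b y c]] z d]
          [:black a x [:red [:red b y c] z d]]
          [:black a x [:red b y [:red c z d]]])]
    [:red [:black a x b] y [:black c z d]]

    :else node))

;; (balance [:black [:red [:red nil 1 nil] 2 nil] 3 nil])
;; => [:red [:black nil 1 nil] 2 [:black nil 3 nil]]
```

<p>As in the strucjure version, the four alternatives share a single right-hand side; core.match compiles the whole expression into a decision tree rather than testing each alternative from scratch.</p>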

<p>Strucjure patterns are first-class values and can call other patterns or recursively call themselves, so they can express much more complex patterns than most other pattern matchers. For example:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">balanced-height</span>
</span><span class='line'>  <span class="nv">Leaf</span>
</span><span class='line'>  <span class="mi">0</span>
</span><span class='line'>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nf">Black.</span> <span class="p">(</span><span class="nf">balanced-height</span> <span class="nv">?l</span><span class="p">)</span>
</span><span class='line'>         <span class="nv">_</span>
</span><span class='line'>         <span class="p">(</span><span class="nf">balanced-height</span> <span class="nv">?r</span><span class="p">))</span>
</span><span class='line'>       <span class="o">#</span><span class="p">(</span><span class="nb">= </span><span class="nv">l</span> <span class="nv">r</span><span class="p">))</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">+ </span><span class="mi">1</span> <span class="nv">l</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nf">Red.</span> <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nb">not </span><span class="nv">Red</span><span class="p">)</span> <span class="p">(</span><span class="nf">balanced-height</span> <span class="nv">?l</span><span class="p">))</span>
</span><span class='line'>         <span class="nv">_</span>
</span><span class='line'>         <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nb">not </span><span class="nv">Red</span><span class="p">)</span> <span class="p">(</span><span class="nf">balanced-height</span> <span class="nv">?r</span><span class="p">)))</span>
</span><span class='line'>       <span class="o">#</span><span class="p">(</span><span class="nb">= </span><span class="nv">l</span> <span class="nv">r</span><span class="p">))</span>
</span><span class='line'>  <span class="nv">l</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>This pattern only matches balanced red-black trees. It recursively matches each branch, rejects red nodes with red children, and returns the number of black nodes on each path, failing if the two branches disagree (see property 5 <a href="http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Properties">here</a>).</p>
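<p>For contrast, here is a rough hand-written version of the same check in plain Clojure, without strucjure. It redefines the <code>Leaf</code>, <code>Red</code> and <code>Black</code> records from above so that it stands alone; <code>black-height</code> is a hypothetical name. It returns the black height of a valid tree and <code>nil</code> otherwise.</p>

```clojure
(defrecord Leaf [])
(defrecord Red [left value right])
(defrecord Black [left value right])

(defn black-height
  "Returns the number of black nodes on every path from tree down to a
  leaf, or nil if tree has a red node with a red child or has paths with
  differing black counts."
  [tree]
  (cond
    (instance? Leaf tree) 0

    (instance? Black tree)
    (let [l (black-height (:left tree))
          r (black-height (:right tree))]
      ;; both subtrees must be valid and agree; this node adds one
      (when (and l (= l r))
        (inc l)))

    (instance? Red tree)
    (let [{:keys [left right]} tree]
      ;; a red node may not have a red child
      (when-not (or (instance? Red left) (instance? Red right))
        (let [l (black-height left)
              r (black-height right)]
          ;; red nodes don't count towards the black height
          (when (and l (= l r))
            l))))

    :else nil))
```

<p>Every constraint that the view expresses through its shape has to be spelled out here as a separate test.</p>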

<h1>Parsing</h1>

<p>Strucjure supports patterns which only consume part of the input and can chain these patterns together. Combine that with pattern matching and you can very easily write back-tracking recursive-descent parsers.</p>

<p>We can use this for traditional text parsing (you have to be feeling a little masochistic at the moment because strucjure can&#8217;t directly handle strings yet, only sequences of \c \h \a \r \s). For example, strucjure <a href="http://scattered-thoughts.net/blog/2012/10/25/strucjure-reading-the-readme/">parses its own readme</a> to ensure all the examples are correct.</p>

<p>Parsing doesn&#8217;t have to be limited to text. We can apply the same techniques to any sequential data structure.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="nv">user&gt;</span> <span class="p">(</span><span class="nf">defnview</span> <span class="nv">zero-or-more-prefix</span> <span class="p">[</span><span class="nv">elem</span><span class="p">]</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">prefix</span> <span class="o">&amp;</span> <span class="p">(</span><span class="nf">elem</span> <span class="nv">?x</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more-prefix</span> <span class="nv">elem</span><span class="p">)</span> <span class="nv">?xs</span><span class="p">))</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">x</span> <span class="nv">xs</span><span class="p">)</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">prefix</span> <span class="p">)</span> <span class="nv">nil</span><span class="p">)</span>
</span><span class='line'><span class="o">#</span><span class="ss">&#39;user/zero-or-more-prefix</span>
</span><span class='line'><span class="nv">user&gt;</span> <span class="p">(</span><span class="nf">defview</span> <span class="nv">self-counting</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">prefix</span> <span class="mi">1</span><span class="p">)</span> <span class="ss">&#39;one</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">prefix</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">)</span> <span class="ss">&#39;two</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">prefix</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">3</span><span class="p">)</span> <span class="ss">&#39;three</span><span class="p">)</span>
</span><span class='line'><span class="o">#</span><span class="ss">&#39;user/self-counting</span>
</span><span class='line'><span class="nv">user&gt;</span> <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">zero-or-more-prefix</span> <span class="nv">self-counting</span><span class="p">)</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">])</span>
</span><span class='line'><span class="p">(</span><span class="nf">one</span> <span class="nv">three</span> <span class="nv">two</span> <span class="nv">one</span> <span class="nv">two</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
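<p>The idea of a pattern that consumes only part of its input can be sketched without strucjure by treating a parser as a function from a sequence to <code>[result remaining-input]</code>, or <code>nil</code> on failure. The helpers below (<code>literal-prefix</code>, <code>choice</code>, <code>zero-or-more</code>) are hypothetical and are not how strucjure works internally; they just reproduce the self-counting example.</p>

```clojure
;; A parser is a function: input seq -> [result remaining] or nil.

(defn literal-prefix
  "Parser that succeeds with `result` if the input starts with the
  elements of `expected`."
  [expected result]
  (fn [input]
    (let [n (count expected)]
      (when (= (take n input) expected)
        [result (drop n input)]))))

(defn choice
  "Tries each parser in order on the same input and returns the first
  success. Parsers never mutate the input, so a failed alternative
  simply falls through to the next one."
  [& parsers]
  (fn [input]
    (some #(% input) parsers)))

(defn zero-or-more
  "Applies parser repeatedly, collecting results until it fails.
  Assumes parser always consumes at least one element on success."
  [parser]
  (fn [input]
    (loop [acc [] remaining input]
      (if-let [[x more] (parser remaining)]
        (recur (conj acc x) more)
        [(seq acc) remaining]))))

(def self-counting
  (choice (literal-prefix [1] 'one)
          (literal-prefix [2 2] 'two)
          (literal-prefix [3 3 3] 'three)))

;; ((zero-or-more self-counting) [1 3 3 3 2 2 1 2 2])
;; => [(one three two one two) ()]
```

<p>The loop is greedy: once an element has been consumed it is never given back, which is enough for this example but less general than full backtracking.</p>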


<p>Since we live in Lisp land, code is data too, so we can use strucjure to operate over sexps easily and (hopefully) <em>readably</em>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; generic parser for (right-binding) infix operators with precedence</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">value?</span> <span class="p">[</span><span class="nv">all</span> <span class="nv">form</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">not-any? </span><span class="o">#</span><span class="p">(</span><span class="nb">contains? </span><span class="nv">%</span> <span class="nv">form</span><span class="p">)</span> <span class="nv">all</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">bind*</span> <span class="p">[</span><span class="nv">all</span> <span class="nv">current</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">ops</span> <span class="o">&amp;</span> <span class="nv">tighter</span><span class="p">]</span> <span class="nv">current</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">view</span>
</span><span class='line'>     <span class="p">(</span><span class="nf">prefix</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">bind*</span> <span class="nv">all</span> <span class="nv">tighter</span><span class="p">)</span> <span class="nv">?x</span><span class="p">)</span> <span class="p">(</span><span class="nb">and </span><span class="o">#</span><span class="p">(</span><span class="nb">contains? </span><span class="nv">ops</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">?op</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">bind*</span> <span class="nv">all</span> <span class="nv">current</span><span class="p">)</span> <span class="nv">?y</span><span class="p">))</span> <span class="o">`</span><span class="p">(</span><span class="o">~</span><span class="nv">op</span> <span class="o">~</span><span class="nv">x</span> <span class="o">~</span><span class="nv">y</span><span class="p">)</span>
</span><span class='line'>     <span class="p">(</span><span class="nf">prefix</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">bind*</span> <span class="nv">all</span> <span class="nv">tighter</span><span class="p">)</span> <span class="nv">?x</span><span class="p">))</span> <span class="nv">x</span><span class="p">)</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">view</span>
</span><span class='line'>     <span class="p">(</span><span class="nf">prefix</span> <span class="p">[((</span><span class="nf">bind*</span> <span class="nv">all</span> <span class="nv">all</span><span class="p">)</span> <span class="nv">?x</span><span class="p">)])</span> <span class="nv">x</span>
</span><span class='line'>     <span class="p">(</span><span class="nf">prefix</span> <span class="p">(</span><span class="nb">and </span><span class="o">#</span><span class="p">(</span><span class="nf">value?</span> <span class="nv">all</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">?x</span><span class="p">))</span> <span class="nv">x</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">bind</span> <span class="p">[</span><span class="nv">binding-levels</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">bind*</span> <span class="nv">binding-levels</span> <span class="nv">binding-levels</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">;; run &#39;bind with basic arithmetic precedences</span>
</span><span class='line'><span class="p">(</span><span class="kd">defmacro </span><span class="nv">math</span> <span class="p">[</span><span class="o">&amp;</span> <span class="nv">args</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">bind</span> <span class="p">[</span><span class="o">#</span><span class="p">{</span><span class="ss">&#39;+</span> <span class="ss">&#39;-</span><span class="p">}</span> <span class="o">#</span><span class="p">{</span><span class="ss">&#39;*</span> <span class="ss">&#39;/</span><span class="p">}])</span> <span class="nv">args</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nb">macroexpand </span><span class="o">&#39;</span><span class="p">(</span><span class="nf">math</span> <span class="mi">1</span> <span class="nb">- </span><span class="mi">2</span> <span class="nb">+ </span><span class="mi">3</span> <span class="nb">- </span><span class="mi">4</span><span class="p">))</span>
</span><span class='line'><span class="c1">;; (- 1 (+ 2 (- 3 4)))</span>
</span><span class='line'><span class="p">(</span><span class="nb">macroexpand </span><span class="o">&#39;</span><span class="p">(</span><span class="nf">math</span> <span class="mi">1</span> <span class="nb">+ </span><span class="mi">2</span> <span class="nb">* </span><span class="mi">7</span> <span class="nb">+ </span><span class="mi">1</span> <span class="nb">/ </span><span class="mi">2</span><span class="p">))</span>
</span><span class='line'><span class="c1">;; (+ 1 (+ (* 2 7) (/ 1 2)))</span>
</span><span class='line'><span class="p">(</span><span class="nb">macroexpand </span><span class="o">&#39;</span><span class="p">(</span><span class="nf">math</span> <span class="mi">1</span> <span class="nb">+ </span><span class="mi">2</span> <span class="nb">* </span><span class="p">(</span><span class="mi">7</span> <span class="nb">+ </span><span class="mi">1</span><span class="p">)</span> <span class="nb">/ </span><span class="mi">2</span><span class="p">))</span>
</span><span class='line'><span class="c1">;; (+ 1 (* 2 (/ (7 + 1) 2)))</span>
</span></code></pre></td></tr></table></div></figure>


<p>No more death by Polish notation!</p>

<p>(The operators above really ought to bind to the left, but unlike OMeta, strucjure doesn&#8217;t yet support <a href="http://en.wikipedia.org/wiki/Left_recursion">left recursion</a> and I&#8217;m too lazy to transform the grammar by hand. It&#8217;s a temporary limitation.)</p>
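<p>The usual manual workaround is to replace left recursion with iteration: parse one operand, then keep consuming an operator and the next operand, folding each pair onto the left-hand side. Here is a plain-Clojure sketch of that for the <code>math</code> precedence levels above; <code>parse-level</code> and <code>parse-infix</code> are hypothetical names, the sketch doesn&#8217;t use strucjure, and it ignores nested sub-expressions.</p>

```clojure
(defn parse-level
  "Parses infix tokens. `levels` is a seq of operator sets, loosest
  first; `all-ops` is the full list of levels, used to reject operators
  as operands. Returns [form remaining-tokens] or nil."
  [all-ops levels tokens]
  (if-let [[ops & tighter] (seq levels)]
    ;; parse the first operand at the next (tighter) level...
    (when-let [[lhs remaining] (parse-level all-ops tighter tokens)]
      ;; ...then fold each `op operand` pair onto the left-hand side
      (loop [lhs lhs
             remaining remaining]
        (let [[op & after-op] remaining]
          (if (contains? ops op)
            (when-let [[rhs more] (parse-level all-ops tighter after-op)]
              (recur (list op lhs rhs) more))
            [lhs remaining]))))
    ;; no levels left: an operand is any single non-operator token
    (when-let [[t & more] (seq tokens)]
      (when-not (some #(contains? % t) all-ops)
        [t more]))))

(defn parse-infix
  "Parses all of tokens, or returns nil if anything is left over."
  [levels tokens]
  (when-let [[form remaining] (parse-level levels levels tokens)]
    (when (empty? remaining)
      form)))

;; (parse-infix '[#{+ -} #{* /}] '[1 - 2 + 3 - 4])
;; => (- (+ (- 1 2) 3) 4)
```

<p>That groups <code>1 - 2 + 3 - 4</code> as <code>((1 - 2) + 3) - 4</code>, which evaluates to -2, whereas the right-binding expansion above evaluates to 0.</p>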

<p>Taking this to its logical conclusion, the syntax for patterns and views in strucjure is itself defined <a href="https://github.com/jamii/strucjure/blob/master/src/strucjure/parser.clj#L178">using views</a>. It is a fairly complex DSL, but strucjure made its parser very easy to write, read and modify.</p>

<h1>Generic programming</h1>

<p>Clojure has some great facilities for generic traversals in the form of clojure.walk:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">walk</span>
</span><span class='line'>  <span class="s">&quot;Traverses form, an arbitrary data structure.  inner and outer are</span>
</span><span class='line'><span class="s">  functions.  Applies inner to each element of form, building up a</span>
</span><span class='line'><span class="s">  data structure of the same type, then applies outer to the result.</span>
</span><span class='line'><span class="s">  Recognizes all Clojure data structures. Consumes seqs as with doall.&quot;</span>
</span><span class='line'>
</span><span class='line'>  <span class="p">{</span><span class="ss">:added</span> <span class="s">&quot;1.1&quot;</span><span class="p">}</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">inner</span> <span class="nv">outer</span> <span class="nv">form</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">cond</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">list?</span> <span class="nv">form</span><span class="p">)</span> <span class="p">(</span><span class="nf">outer</span> <span class="p">(</span><span class="nb">apply list </span><span class="p">(</span><span class="nb">map </span><span class="nv">inner</span> <span class="nv">form</span><span class="p">)))</span>
</span><span class='line'>   <span class="p">(</span><span class="nb">instance? </span><span class="nv">clojure.lang.IMapEntry</span> <span class="nv">form</span><span class="p">)</span> <span class="p">(</span><span class="nf">outer</span> <span class="p">(</span><span class="nf">vec</span> <span class="p">(</span><span class="nb">map </span><span class="nv">inner</span> <span class="nv">form</span><span class="p">)))</span>
</span><span class='line'>   <span class="p">(</span><span class="nb">seq? </span><span class="nv">form</span><span class="p">)</span> <span class="p">(</span><span class="nf">outer</span> <span class="p">(</span><span class="nb">doall </span><span class="p">(</span><span class="nb">map </span><span class="nv">inner</span> <span class="nv">form</span><span class="p">)))</span>
</span><span class='line'>   <span class="p">(</span><span class="nf">coll?</span> <span class="nv">form</span><span class="p">)</span> <span class="p">(</span><span class="nf">outer</span> <span class="p">(</span><span class="nb">into </span><span class="p">(</span><span class="nf">empty</span> <span class="nv">form</span><span class="p">)</span> <span class="p">(</span><span class="nb">map </span><span class="nv">inner</span> <span class="nv">form</span><span class="p">)))</span>
</span><span class='line'>   <span class="ss">:else</span> <span class="p">(</span><span class="nf">outer</span> <span class="nv">form</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">postwalk</span>
</span><span class='line'>  <span class="s">&quot;Performs a depth-first, post-order traversal of form. Calls f on</span>
</span><span class='line'><span class="s">each sub-form, uses f&#39;s return value in place of the original.</span>
</span><span class='line'><span class="s">Recognizes all Clojure data structures except sorted-map-by.</span>
</span><span class='line'><span class="s">Consumes seqs as with doall.&quot;</span>
</span><span class='line'>  <span class="p">{</span><span class="ss">:added</span> <span class="s">&quot;1.1&quot;</span><span class="p">}</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">f</span> <span class="nv">form</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">walk</span> <span class="p">(</span><span class="nb">partial </span><span class="nv">postwalk</span> <span class="nv">f</span><span class="p">)</span> <span class="nv">f</span> <span class="nv">form</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>
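<p>For reference, here is clojure.walk/postwalk in use (plain Clojure, no strucjure). The first call shows both halves of the job. Every collection is taken apart, the function is applied to each piece, and the collection is rebuilt with its original type. The second call records the visiting order, which is children before parents.</p>

```clojure
(require '[clojure.walk :as walk])

;; Increment every number, however deeply nested. Vectors stay vectors,
;; lists stay lists and maps stay maps, because walk rebuilds each one.
(def bumped
  (walk/postwalk #(if (number? %) (inc %) %)
                 {:a [1 2] :b '(3 {:c 4})}))
;; bumped => {:a [2 3], :b (4 {:c 5})}

;; Record each sub-form as f sees it: post-order, children first.
(def visits (atom []))
(walk/postwalk (fn [x] (swap! visits conj x) x) [1 [2]])
;; @visits => [1 2 [2] [1 [2]]]
```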


<p>Essentially, all this is doing is specifying how to take apart clojure data structures and how to put them back together again. Strucjure supports passing optional :pre-view and :post-view functions to modify the input to or output from any named view encountered during parsing, so we can do something very similar:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">clojure</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="nv">list?</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">clojure</span><span class="p">)</span> <span class="nv">?xs</span><span class="p">))</span> <span class="nv">xs</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="nv">clojure.lang.IMapEntry</span> <span class="p">[(</span><span class="nf">clojure</span> <span class="nv">?x</span><span class="p">)</span> <span class="p">(</span><span class="nf">clojure</span> <span class="nv">?y</span><span class="p">)])</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">y</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and seq? </span><span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">clojure</span><span class="p">)</span> <span class="nv">?xs</span><span class="p">))</span> <span class="nv">xs</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="nv">coll?</span> <span class="nv">?coll</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">clojure</span><span class="p">)</span> <span class="nv">?xs</span><span class="p">))</span> <span class="p">(</span><span class="nb">into </span><span class="p">(</span><span class="nf">empty</span> <span class="nv">coll</span><span class="p">)</span> <span class="nv">xs</span><span class="p">)</span>
</span><span class='line'>  <span class="nv">?other</span> <span class="nv">other</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">postwalk</span> <span class="p">[</span><span class="nv">form</span> <span class="nv">f</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">run</span> <span class="nv">clojure</span> <span class="nv">form</span> <span class="p">{</span><span class="ss">:post-view</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">_</span> <span class="nv">sub-form</span><span class="p">]</span> <span class="p">(</span><span class="nf">f</span> <span class="nv">sub-form</span><span class="p">)}))</span>
</span></code></pre></td></tr></table></div></figure>
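<p>To make the loss of context concrete, here is a plain clojure.walk sketch over data shaped like an l-seed rule. The rule syntax is borrowed from the plant language defined later in this post, and the names rule and seen are only for illustration. The function is handed the number 10 twice and has no way to tell which occurrence is a length and which is a direction.</p>

```clojure
(require '[clojure.walk :as walk])

;; Plain data shaped like an l-seed rule (no strucjure involved).
(def rule '(rule "sprout" whenever (grow-by 10) (turn-by 10)))

;; Record every number that postwalk hands to f, in visiting order.
(def seen (atom []))
(walk/postwalk (fn [x]
                 (when (number? x) (swap! seen conj x))
                 x)
               rule)
;; @seen => [10 10]. f only ever sees the bare value, so it cannot, say,
;; clamp directions without re-inspecting the parent form.
```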


<p>The problem with using this (or clojure.walk) for generic traversals is that it loses context. When a given sub-form is encountered, the function f is given no indication of where in the data structure that sub-form is or how it is being used. If we apply the above idea to domain-specific views, we can do generic traversals <em>with context</em>. The motivating example for this was a simple game I was porting called <a href="https://github.com/jamii/l-seed">l-seed</a> (I haven&#8217;t yet updated l-seed to use strucjure, but you can see a precursor to it in <a href="https://github.com/jamii/l-seed/blob/master/src/l_seed/syntax.clj">l-seed.syntax</a>). In l-seed, players submit programs defining the growth of their plant species and compete with other players&#8217; plants for sunlight and nutrients. The plant language can be defined like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+name+</span>
</span><span class='line'>  <span class="nb">string? </span><span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+tag+</span>
</span><span class='line'>  <span class="nb">string? </span><span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+length+</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="nv">number?</span> <span class="o">#</span><span class="p">(</span><span class="nb">&lt;= </span><span class="mi">0</span> <span class="nv">%</span><span class="p">))</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+direction+</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="nv">number?</span> <span class="o">#</span><span class="p">(</span><span class="nb">&lt;= </span><span class="mi">-360</span> <span class="nv">%</span> <span class="mi">360</span><span class="p">))</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+relation+</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">or </span><span class="ss">&#39;=</span> <span class="ss">&#39;&gt;</span> <span class="ss">&#39;&gt;=</span> <span class="ss">&#39;&lt;</span> <span class="ss">&#39;&lt;=</span><span class="p">)</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+property+</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">or </span><span class="ss">&#39;tag</span> <span class="ss">&#39;length</span> <span class="ss">&#39;direction</span><span class="p">)</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+condition+</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;and</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">+condition+</span><span class="p">)</span> <span class="nv">?conditions</span><span class="p">)]</span> <span class="p">(</span><span class="nb">cons </span><span class="ss">&#39;and</span> <span class="nv">conditions</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;or</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">+condition+</span><span class="p">)</span> <span class="nv">?conditions</span><span class="p">)]</span> <span class="p">(</span><span class="nb">cons </span><span class="ss">&#39;or</span> <span class="nv">conditions</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;not</span> <span class="p">(</span><span class="nf">+condition+</span> <span class="nv">?condition</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;not</span> <span class="nv">condition</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[(</span><span class="nf">+relation+</span> <span class="nv">?relation</span><span class="p">)</span> <span class="p">(</span><span class="nf">+property+</span> <span class="nv">?property</span><span class="p">)</span> <span class="nv">?value</span><span class="p">]</span> <span class="p">(</span><span class="nb">list </span><span class="nv">relation</span> <span class="nv">property</span> <span class="nv">value</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+condition-head+</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;when</span> <span class="p">(</span><span class="nf">+condition+</span> <span class="nv">?condition</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;when</span> <span class="nv">condition</span><span class="p">)</span>
</span><span class='line'>  <span class="ss">&#39;whenever</span> <span class="ss">&#39;whenever</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+action+</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;grow-by</span> <span class="p">(</span><span class="nf">+length+</span> <span class="nv">?length</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;grow-by</span> <span class="nv">length</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;turn-by</span> <span class="p">(</span><span class="nf">+direction+</span> <span class="nv">?direction</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;turn-by</span> <span class="nv">direction</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;turn-to</span> <span class="p">(</span><span class="nf">+direction+</span> <span class="nv">?direction</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;turn-to</span> <span class="nv">direction</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;tag</span> <span class="p">(</span><span class="nf">+tag+</span> <span class="nv">?tag</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;tag</span> <span class="nv">tag</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;blossom</span> <span class="p">(</span><span class="nf">+tag+</span> <span class="nv">?tag</span><span class="p">)]</span> <span class="p">(</span><span class="nb">list </span><span class="ss">&#39;blossom</span> <span class="nv">tag</span><span class="p">)</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;branch</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="p">(</span><span class="nf">zero-or-more</span> <span class="nv">+action+</span><span class="p">))</span> <span class="nv">?action-lists</span><span class="p">)]</span> <span class="p">(</span><span class="nb">cons </span><span class="ss">&#39;branch</span> <span class="nv">action-lists</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+rule+</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">&#39;rule</span> <span class="p">(</span><span class="nf">+name+</span> <span class="nv">?name</span><span class="p">)</span> <span class="p">(</span><span class="nf">+condition-head+</span> <span class="nv">?condition-head</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">+action+</span><span class="p">)</span> <span class="nv">?actions</span><span class="p">)]</span> <span class="p">(</span><span class="nb">apply list </span><span class="ss">&#39;rule</span> <span class="nb">name </span><span class="nv">condition-head</span> <span class="nv">actions</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">+rules+</span>
</span><span class='line'>  <span class="p">[</span><span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">+rule+</span><span class="p">)</span> <span class="nv">?rules</span><span class="p">)]</span> <span class="nv">rules</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
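<p>For comparison, here is roughly what the +condition+ view checks, written as an ordinary recursive predicate. This is a hypothetical plain-Clojure sketch: condition?, relations and properties are my own names, not part of strucjure or l-seed. Unlike the view, it only validates the form and does not rebuild it.</p>

```clojure
;; The symbols accepted by the +relation+ and +property+ views.
(def relations '#{= > >= < <=})
(def properties '#{tag length direction})

(defn condition?
  "True if form has the shape the +condition+ view accepts: and/or with any
  number of sub-conditions, not with exactly one, or a
  (relation property value) triple."
  [form]
  (and (sequential? form)
       (let [[head & args] form]
         (cond
           (contains? #{'and 'or} head) (every? condition? args)
           (= 'not head) (and (= 1 (count args))
                              (condition? (first args)))
           :else (and (= 2 (count args))
                      (contains? relations head)
                      (contains? properties (first args)))))))
```

<p>For example, (condition? '(and (>= length 20) (not (= tag "top")))) is true, while (condition? '(>= height 20)) is false because height is not a property.</p>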


<p>(Note that we specify both how to take apart a data structure and how to put it together. Really, the latter should be derived from the former. I think strucjure will eventually feature reversible patterns for this purpose.)</p>

<p>We can then operate on these programs in a generic way. For example, deciding which rule to execute next:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">select*</span> <span class="p">[</span><span class="nv">properties</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">defview</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+relation+</span> <span class="nv">?relation</span><span class="p">]</span> <span class="p">(</span><span class="nb">resolve </span><span class="nv">relation</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+property+</span> <span class="nv">?property</span><span class="p">]</span> <span class="p">(</span><span class="nb">get </span><span class="nv">properties</span> <span class="nv">property</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition+</span> <span class="p">[</span><span class="ss">&#39;and</span> <span class="o">&amp;</span> <span class="nv">?conds</span><span class="p">]]</span> <span class="p">(</span><span class="nb">every? true? </span><span class="nv">conds</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition+</span> <span class="p">[</span><span class="ss">&#39;or</span> <span class="o">&amp;</span> <span class="nv">?conds</span><span class="p">]]</span> <span class="p">(</span><span class="nb">some true? </span><span class="nv">conds</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition+</span> <span class="p">[</span><span class="ss">&#39;not</span> <span class="nv">?cond</span><span class="p">]]</span> <span class="p">(</span><span class="nb">not </span><span class="nv">cond</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition+</span> <span class="p">[</span><span class="nv">?relation</span> <span class="nv">?property</span> <span class="nv">?value</span><span class="p">]]</span> <span class="p">(</span><span class="nf">relation</span> <span class="nv">property</span> <span class="nv">value</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition-head+</span> <span class="p">[</span><span class="ss">&#39;when</span> <span class="nv">?condition</span><span class="p">]]</span> <span class="nv">condition</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+condition-head+</span> <span class="ss">&#39;whenever</span><span class="p">]</span> <span class="nv">true</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+rule+</span> <span class="p">[</span><span class="ss">&#39;rule</span> <span class="nv">_</span> <span class="nv">?condition</span> <span class="o">&amp;</span> <span class="nv">?actions</span><span class="p">]]</span> <span class="p">(</span><span class="nb">when </span><span class="nv">condition</span> <span class="nv">actions</span><span class="p">)</span>
</span><span class='line'>    <span class="p">[</span><span class="o">`</span><span class="nv">+rules+</span> <span class="p">[</span><span class="o">&amp;</span> <span class="nv">?rules</span><span class="p">]]</span> <span class="p">(</span><span class="nf">choose</span> <span class="p">(</span><span class="nb">filter seq </span><span class="nv">rules</span><span class="p">))</span>
</span><span class='line'>    <span class="p">[</span><span class="nv">_</span> <span class="nv">?other</span><span class="p">]</span> <span class="nv">other</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nb">select</span>
</span><span class='line'>  <span class="s">&quot;Pick a valid rule and return its list of actions (or nil if no rules are valid)&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">rules</span> <span class="nv">properties</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">util/postwalk</span> <span class="nv">+rules+</span> <span class="nv">rules</span> <span class="p">(</span><span class="nf">select*</span> <span class="nv">properties</span><span class="p">)))</span>
</span></code></pre></td></tr></table></div></figure>
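<p>As a sanity check on what select computes, here is the same decision written directly in plain Clojure without views. This is a hypothetical sketch. eval-condition, select-actions and relation-fns are illustrative names, and because choose is not shown in the post, the sketch simply assumes that the first applicable rule wins.</p>

```clojure
;; Lookup table from relation symbols to functions (the post uses resolve).
(def relation-fns {'= =, '> >, '>= >=, '< <, '<= <=})

(defn eval-condition
  "Evaluate a plant-language condition against a map of properties."
  [properties condition]
  (let [[head & args] condition]
    (case head
      and (every? #(eval-condition properties %) args)
      or  (boolean (some #(eval-condition properties %) args))
      not (not (eval-condition properties (first args)))
      ;; anything else is a leaf: (relation property value)
      (let [[property value] args
            relation (relation-fns head)]
        (boolean (relation (get properties property) value))))))

(defn select-actions
  "Return the actions of the first rule whose condition head holds and whose
  action list is non-empty, or nil if there is none (first-wins is an
  assumption standing in for choose)."
  [rules properties]
  (some (fn [[_rule _name head & actions]]
          (when (or (= 'whenever head)
                    (eval-condition properties (second head)))
            (seq actions)))
        rules))
```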


<p>Writing code like this allows us to separate the shape of the data from the computation we perform over it.</p>

<p>We&#8217;re also not limited to just walking over data structures. We can perform more complex operations in the same generic fashion.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">map-reduce</span>
</span><span class='line'>  <span class="s">&quot;Call map-op on every sub-form and reduce results with reduce-op&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">strucjure</span> <span class="nv">form</span> <span class="nv">map-op</span> <span class="nv">reduce-op</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">acc</span> <span class="p">(</span><span class="nf">atom</span> <span class="p">(</span><span class="nf">reduce-op</span><span class="p">))]</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">run</span> <span class="nv">strucjure</span> <span class="nv">form</span>
</span><span class='line'>           <span class="p">{</span><span class="ss">:post-view</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nb">name </span><span class="nv">form</span><span class="p">]</span>
</span><span class='line'>                         <span class="p">(</span><span class="nf">swap!</span> <span class="nv">acc</span> <span class="nv">reduce-op</span> <span class="p">(</span><span class="nf">map-op</span> <span class="nb">name </span><span class="nv">form</span><span class="p">))</span>
</span><span class='line'>                         <span class="nv">form</span><span class="p">)})</span>
</span><span class='line'>    <span class="err">@</span><span class="nv">acc</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">collect</span>
</span><span class='line'>  <span class="s">&quot;Return all sub-forms satisfying filter-op&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">strucjure</span> <span class="nv">form</span> <span class="nv">filter-op</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">acc</span> <span class="p">(</span><span class="nf">atom</span> <span class="nv">nil</span><span class="p">)]</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">run</span> <span class="nv">strucjure</span> <span class="nv">form</span>
</span><span class='line'>           <span class="p">{</span><span class="ss">:post-view</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nb">name </span><span class="nv">form</span><span class="p">]</span>
</span><span class='line'>                         <span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nf">filter-op</span> <span class="nb">name </span><span class="nv">form</span><span class="p">)</span>
</span><span class='line'>                           <span class="p">(</span><span class="nf">swap!</span> <span class="nv">acc</span> <span class="nb">conj </span><span class="nv">form</span><span class="p">))</span>
</span><span class='line'>                         <span class="c1">;; return form unchanged so the traversal output is unaffected</span>
</span><span class='line'>                         <span class="nv">form</span><span class="p">)})</span>
</span><span class='line'>    <span class="err">@</span><span class="nv">acc</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>
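<p>The same accumulate-while-walking trick works with plain clojure.walk, and that makes the difference visible. In this hypothetical sketch (map-reduce-forms and growth are illustrative names, not part of strucjure), map-op only ever sees a bare sub-form, so growth has to recognise (grow-by n) forms itself. The strucjure version also passes the name of the view that produced each sub-form.</p>

```clojure
(require '[clojure.walk :as walk])

(defn map-reduce-forms
  "Call map-op on every sub-form (post-order) and fold the results with
  reduce-op, starting from (reduce-op). The walk's own output is discarded."
  [form map-op reduce-op]
  (let [acc (atom (reduce-op))]
    (walk/postwalk (fn [sub-form]
                     (swap! acc reduce-op (map-op sub-form))
                     sub-form)
                   form)
    @acc))

;; Total growth requested by a genotype. Without view names we must
;; recognise the (grow-by n) forms by inspecting them ourselves.
(defn growth [sub-form]
  (if (and (seq? sub-form) (= 'grow-by (first sub-form)))
    (second sub-form)
    0))
```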


<h1>Types</h1>

<p>I originally learned to code in Haskell. One of the things I miss about strong static typing is that it automatically provides documentation about the data structures used in your program. Strucjure patterns can fulfill the same role. In l-seed, if you are confused about what a rule should look like, you can just go look at the +rule+ pattern.</p>

<p>We can&#8217;t quite get static typing out of this, but we do get checking for complex data structures at runtime (or, inside a macro, at compile time):</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defmacro </span><span class="nv">defgenotype</span> <span class="p">[</span><span class="nb">name </span><span class="o">&amp;</span> <span class="nv">rules</span><span class="p">]</span>
</span><span class='line'>  <span class="c1">;; compile-time syntax check for the genotype language</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">run</span> <span class="nv">+rules+</span> <span class="nv">rules</span><span class="p">)</span>
</span><span class='line'>  <span class="o">`</span><span class="p">(</span><span class="k">def </span><span class="o">~</span><span class="nb">name </span><span class="o">&#39;~</span><span class="p">(</span><span class="nf">vec</span> <span class="nv">rules</span><span class="p">)))</span>
</span></code></pre></td></tr></table></div></figure>
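<p>Without strucjure, the same compile-time check can be approximated with an ordinary predicate. This is a hypothetical sketch. rule-shape? only checks the outer shape of each rule, and defgenotype* is an illustrative variant of the macro above, not part of l-seed.</p>

```clojure
(defn rule-shape?
  "Shallow check that form looks like (rule \"name\" whenever ...) or
  (rule \"name\" (when ...) ...)."
  [form]
  (and (seq? form)
       (= 'rule (first form))
       (string? (second form))
       (let [head (nth form 2 nil)]
         (or (= 'whenever head)
             (and (seq? head) (= 'when (first head)))))))

(defmacro defgenotype*
  "Like defgenotype, but checks rules with rule-shape? instead of +rules+.
  A malformed rule throws during macro expansion, before any code runs."
  [name & rules]
  (when-let [bad (first (remove rule-shape? rules))]
    (throw (ex-info "Invalid genotype rule" {:rule bad})))
  `(def ~name '~(vec rules)))
```

<p>(defgenotype* g1 (rule "sprout" whenever (grow-by 5))) defines g1 as a vector of quoted rules, while (defgenotype* g2 (grow-by 5)) fails to compile.</p>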


<p>In theory, it should also be possible to generate random data structures satisfying a given pattern. This would be useful for providing examples and for <a href="https://github.com/clojure/test.generative">generative testing</a>. In Erlang, <a href="https://github.com/manopapad/proper">PropEr</a> allows using type-specs directly alongside hand-written generators. I haven&#8217;t yet implemented this in strucjure, but I think it should be reasonably easy once reversible patterns are implemented.</p>
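<p>Until then, a hand-written generator gives a feel for the idea. This hypothetical sketch (gen-condition, gen-relations and gen-properties are illustrative names) produces random conditions for the l-seed plant language. In principle, feeding many of them to a checker such as the +condition+ view is the essence of generative testing.</p>

```clojure
(def gen-relations '[= > >= < <=])
(def gen-properties '[tag length direction])

(defn gen-condition
  "Return a random condition, nesting and/or/not at most depth levels deep."
  [depth]
  (if (zero? depth)
    ;; leaf: (relation property value)
    (list (rand-nth gen-relations) (rand-nth gen-properties) (rand-int 100))
    (case (rand-int 4)
      0 (cons 'and (repeatedly (rand-int 3) #(gen-condition (dec depth))))
      1 (cons 'or (repeatedly (rand-int 3) #(gen-condition (dec depth))))
      2 (list 'not (gen-condition (dec depth)))
      3 (gen-condition 0))))
```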

<h1>State machines</h1>

<p>One can think of parsers in general as state machines with look-ahead and backtracking. OMeta takes this idea and runs with it:</p>

<blockquote><p>Most interesting ideas have more than one fruitful way to view them, and it occurred to us that, abstractly, one could think of TCP/IP as a kind of “non-deterministic parser with balancing heuristics”, in that it takes in a stream of things, does various kinds of pattern-matching on them, deals with errors by backing up and taking other paths, and produces a transformation of the input in a specified form as a result.</p>

<p>Since the language transformation techniques we use operate on arbitrary objects, not just strings (see above), and include some abilities of both standard and logic programming, it seemed that this could be used to make a very compact TCP/IP. Our first attempt was about 160 lines of code that was robust enough to run a website. We think this can be done even more compactly and clearly, and we plan to take another pass at this next year.</p></blockquote>

<p>I haven&#8217;t yet tried doing anything like this in strucjure, but all the machinery is there. It would make an interesting complement to <a href="https://github.com/jamii/droplet">droplet</a>.</p>
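<p>As a rough illustration of the parser-as-backtracking-machine view, here is a tiny matcher in plain Clojure using the classic "list of successes" technique. Each parser returns every possible leftover input, so when one branch dead-ends the others are still available. This is an independent sketch, not strucjure&#8217;s API. lit, alt, then, parses? and grammar are illustrative names.</p>

```clojure
;; A parser is a function from the remaining input to a sequence of every
;; possible remaining input after a successful match; empty means failure.

(defn lit
  "Match a single element equal to x."
  [x]
  (fn [in]
    (if (and (seq in) (= x (first in)))
      [(rest in)]
      [])))

(defn alt
  "Choice that keeps every way any alternative can succeed, so a later
  failure can fall back to another alternative."
  [& parsers]
  (fn [in] (mapcat #(% in) parsers)))

(defn then
  "Match p, then q on each leftover input that p can produce."
  [p q]
  (fn [in] (mapcat q (p in))))

(defn parses?
  "True if parser p can consume the whole input."
  [p input]
  (boolean (some empty? (p (seq input)))))

;; (a | a b) c : on [a b c] the first branch leaves (b c), which c rejects,
;; so the match backs up into the second branch.
(def grammar (then (alt (lit 'a) (then (lit 'a) (lit 'b))) (lit 'c)))
```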

<h1>Moving forward</h1>

<p>There are a lot of different directions for improvement and experimentation.</p>

<p>One of my top priorities is better error reporting. This sucks:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="nv">clojure.lang.ExceptionInfo</span><span class="err">:</span> <span class="nv">throw+</span><span class="err">:</span> <span class="o">#</span><span class="nv">strucjure.view.PartialMatch</span><span class="p">{</span><span class="ss">:view</span> <span class="o">#</span><span class="nv">strucjure.view.Or</span><span class="p">{</span><span class="ss">:views</span> <span class="p">[</span><span class="o">#</span><span class="nv">strucjure.view.Match</span><span class="p">{</span><span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Seq</span><span class="p">{</span><span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Chain</span><span class="p">{</span><span class="ss">:patterns</span> <span class="p">[</span><span class="o">#</span><span class="nv">strucjure.view.Import</span><span class="p">{</span><span class="ss">:view-fun</span> <span class="o">#</span><span class="nv">&lt;test$bind_STAR_$fn__2339</span> <span class="nv">test$bind_STAR_$fn__2339</span><span class="err">@</span><span class="mi">60</span><span class="nv">a896b8&gt;</span>, <span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Bind</span><span class="p">{</span><span class="ss">:symbol</span> <span class="nv">x</span><span class="p">}}</span> <span class="o">#</span><span class="nv">strucjure.pattern.Head</span><span class="p">{</span><span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.And</span><span class="p">{</span><span class="ss">:patterns</span> <span class="p">[</span><span class="o">#</span><span class="nv">strucjure.pattern.Guard</span><span class="p">{</span><span class="ss">:fun</span> <span class="o">#</span><span class="nb">&lt; </span><span class="nv">clojure.lang.AFunction$1</span><span class="err">@</span><span class="mi">5</span><span 
class="nv">c3f3b9b&gt;</span><span class="p">}</span> <span class="o">#</span><span class="nv">strucjure.pattern.Bind</span><span class="p">{</span><span class="ss">:symbol</span> <span class="nv">op</span><span class="p">}]}}</span> <span class="o">#</span><span class="nv">strucjure.view.Import</span><span class="p">{</span><span class="ss">:view-fun</span> <span class="o">#</span><span class="nv">&lt;test$bind_STAR_$fn__2343</span> <span class="nv">test$bind_STAR_$fn__2343</span><span class="err">@</span><span class="mi">3</span><span class="nv">b626c6d&gt;</span>, <span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Bind</span><span class="p">{</span><span class="ss">:symbol</span> <span class="nv">y</span><span class="p">}}]}}</span>, <span class="ss">:result-fun</span> <span class="o">#</span><span class="nb">&lt; </span><span class="nv">clojure.lang.AFunction$1</span><span class="err">@</span><span class="mi">3</span><span class="nv">abc8690&gt;</span><span class="p">}</span> <span class="o">#</span><span class="nv">strucjure.view.Match</span><span class="p">{</span><span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Seq</span><span class="p">{</span><span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Chain</span><span class="p">{</span><span class="ss">:patterns</span> <span class="p">[</span><span class="o">#</span><span class="nv">strucjure.view.Import</span><span class="p">{</span><span class="ss">:view-fun</span> <span class="o">#</span><span class="nv">&lt;test$bind_STAR_$fn__2347</span> <span class="nv">test$bind_STAR_$fn__2347</span><span class="err">@</span><span class="mi">2</span><span class="nv">f267610&gt;</span>, <span class="ss">:pattern</span> <span class="o">#</span><span class="nv">strucjure.pattern.Bind</span><span class="p">{</span><span class="ss">:symbol</span> <span class="nv">x</span><span class="p">}}]}}</span>, <span 
class="ss">:result-fun</span> <span class="o">#</span><span class="nb">&lt; </span><span class="nv">clojure.lang.AFunction$1</span><span class="err">@</span><span class="mi">6112</span><span class="nv">c9f&gt;</span><span class="p">}]}</span>, <span class="ss">:input</span> <span class="p">(</span><span class="mi">1</span> <span class="nb">- </span><span class="mi">2</span> <span class="nb">+ </span><span class="mi">3</span> <span class="nb">- </span><span class="mi">4</span><span class="p">)</span>, <span class="ss">:remaining</span> <span class="p">(</span><span class="nb">- </span><span class="mi">2</span> <span class="nb">+ </span><span class="mi">3</span> <span class="nb">- </span><span class="mi">4</span><span class="p">)</span>, <span class="ss">:output</span> <span class="mi">1</span><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>I have some ideas about how to improve this but nothing totally concrete. I could, at the very least, return the bindings that existed at the point of failure along with some kind of failure stack. If I can figure out a reasonable way to implement <a href="http://en.wikipedia.org/wiki/Cut_%28logic_programming%29">cut</a> that will also help.</p>

<p>Another short-term priority is some form of <a href="http://en.wikipedia.org/wiki/Tail_call#Tail_recursion_modulo_cons">tail call elimination</a>. Many patterns and views are naturally implemented recursively:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defnview</span> <span class="nv">zero-or-more</span> <span class="p">[</span><span class="nv">elem</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prefix</span> <span class="p">(</span><span class="nf">elem</span> <span class="nv">?x</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">elem</span><span class="p">)</span> <span class="nv">?xs</span><span class="p">))</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">x</span> <span class="nv">xs</span><span class="p">)</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prefix</span> <span class="p">)</span> <span class="nv">nil</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>In the current implementation of strucjure, however, every recursive call adds a stack frame, so this will quickly overflow the stack on long inputs. The current workaround is to define such views by hand:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defrecord </span><span class="nv">ZeroOrMore</span> <span class="p">[</span><span class="nv">view</span><span class="p">]</span>
</span><span class='line'>  <span class="nv">View</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">run*</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">input</span> <span class="nv">opts</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nf">or</span>
</span><span class='line'>           <span class="p">(</span><span class="nb">nil? </span><span class="nv">input</span><span class="p">)</span>
</span><span class='line'>           <span class="p">(</span><span class="nb">instance? </span><span class="nv">clojure.lang.Seqable</span> <span class="nv">input</span><span class="p">))</span>
</span><span class='line'>      <span class="p">(</span><span class="k">loop </span><span class="p">[</span><span class="nv">elems</span> <span class="p">(</span><span class="nb">seq </span><span class="nv">input</span><span class="p">)</span>
</span><span class='line'>             <span class="nv">outputs</span> <span class="nv">nil</span><span class="p">]</span>
</span><span class='line'>        <span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">elem</span> <span class="o">&amp;</span> <span class="nv">elems</span><span class="p">]</span> <span class="nv">elems</span><span class="p">]</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">remaining</span> <span class="nv">output</span><span class="p">]</span> <span class="p">(</span><span class="nf">run</span> <span class="nv">view</span> <span class="nv">elem</span> <span class="nv">opts</span><span class="p">)]</span>
</span><span class='line'>            <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">remaining</span><span class="p">)</span>
</span><span class='line'>              <span class="p">(</span><span class="nf">recur</span> <span class="nv">elems</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">output</span> <span class="nv">outputs</span><span class="p">))</span>
</span><span class='line'>              <span class="p">[(</span><span class="nb">cons </span><span class="nv">elem</span> <span class="nv">elems</span><span class="p">)</span> <span class="p">(</span><span class="nb">reverse </span><span class="nv">outputs</span><span class="p">)])</span>
</span><span class='line'>            <span class="p">[(</span><span class="nb">cons </span><span class="nv">elem</span> <span class="nv">elems</span><span class="p">)</span> <span class="p">(</span><span class="nb">reverse </span><span class="nv">outputs</span><span class="p">)])</span>
</span><span class='line'>          <span class="p">[</span><span class="nv">nil</span> <span class="p">(</span><span class="nb">reverse </span><span class="nv">outputs</span><span class="p">)])))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">zero-or-more</span> <span class="nv">-&gt;ZeroOrMore</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>This is gross. I don&#8217;t have any ideas on how to overcome this.</p>
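<p>For comparison, here is the same rewrite without strucjure&#8217;s protocols: a minimal plain-clojure sketch over a predicate rather than a view. The function names are hypothetical and not part of strucjure. The naive version mirrors the recursive defnview by consing onto the result of the recursive call; the second version is the accumulator-and-reverse transformation that the hand-written ZeroOrMore performs. Both return [remaining matched].</p>

```clojure
;; Hypothetical helpers, not part of strucjure: consume the longest prefix
;; of xs whose elements satisfy pred, returning [remaining matched].

(defn zero-or-more-naive
  "Direct recursive definition. The recursive call is not in tail position
  (x is still consed onto its result), so each matched element costs a
  stack frame and long inputs can throw StackOverflowError."
  [pred xs]
  (if-let [[x & more :as s] (seq xs)]
    (if (pred x)
      (let [[remaining matched] (zero-or-more-naive pred more)]
        [remaining (cons x matched)])
      [s ()])
    [nil ()]))

(defn zero-or-more-acc
  "Same result, but matched elements are pushed onto an accumulator and
  reversed at the end, so the loop runs in constant stack via recur."
  [pred xs]
  (loop [s   (seq xs)
         acc ()]
    (if-let [[x & more] s]
      (if (pred x)
        (recur more (cons x acc))
        [s (reverse acc)])
      [nil (reverse acc)])))

(zero-or-more-acc odd? [1 3 5 2 7])
;; => [(2 7) (1 3 5)]
```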

<p>I&#8217;ve already briefly mentioned reversible patterns. At the beginning of this post I warned that I would use the terms view and pattern interchangeably. The line between them in strucjure is currently blurry, but I think the distinction should be that patterns must be reversible, while views are allowed to destroy information.</p>
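<p>To make that distinction concrete, here is a toy sketch in plain clojure. It is not strucjure&#8217;s API: representing a reversible pattern as a map of match and unmatch functions, and the names pair-pattern, round-trips? and length-view, are all hypothetical. A pattern can rebuild its input from the bindings it produced; a view like count maps many inputs to one output, so it cannot be run backwards.</p>

```clojure
;; Hypothetical illustration, not strucjure's API.

;; A reversible pattern pairs a match fn with an unmatch fn that can
;; rebuild the exact input from the bindings it produced.
(def pair-pattern
  {:match   (fn [input]
              ;; only two-element vectors match; nil signals failure
              (when (and (vector? input) (= 2 (count input)))
                (let [[a b] input]
                  {:a a :b b})))
   :unmatch (fn [{:keys [a b]}]
              [a b])})

(defn round-trips?
  "True if input matches and unmatching the bindings gives it back."
  [{:keys [match unmatch]} input]
  (let [bindings (match input)]
    (and (some? bindings)
         (= input (unmatch bindings)))))

;; A view is allowed to destroy information: many inputs share one
;; output, so there is no way to run it backwards.
(def length-view count)

(round-trips? pair-pattern [1 2])            ;; => true
(= (length-view [1 2]) (length-view [3 4]))  ;; => true
```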

<p>Lastly, there will eventually be a need for some level of optimization. Given the extra flexibility in strucjure, I don&#8217;t expect it ever to be as fast as core.match, but there is plenty of room for improvement in the current code. Originally, strucjure patterns were compiled into efficient clojure code, but the implementation was complicated and made it difficult to iterate quickly. I will probably return to compilation once the semantics and interface settle down.</p>

<p>For now, I&#8217;m going to dogfood strucjure in various projects while ruminating on improvements. I&#8217;m already very happy with how much leverage such a simple idea provides, and I&#8217;ll be happier still if I can fix the problems above. Hopefully the examples here will get other people thinking along the same lines.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Hacker School]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/12/02/hacker-school/"/>
    <updated>2012-12-02T01:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/12/02/hacker-school</id>
    <content type="html"><![CDATA[<p>I&#8217;ve spent the last ten weeks or so at <a href="https://www.hackerschool.com/">Hacker School</a>. It&#8217;s something like a writer&#8217;s retreat for programmers. Unlike a traditional school there is very little structure and the focus is on project-based learning. In order to make the most of this environment, it&#8217;s important to be clear exactly what your goals are.</p>

<!--more-->


<p>So here is my goal: to create better tools for the problems I regularly encounter. My focus is on building distributed systems and p2p networks, but I suspect that these tools will be generally useful. When working as a freelancer, I am necessarily constrained to using proven ideas and techniques because I am not the one bearing the risk. Hacker School is a chance for me to explore some more far-out ideas. These ideas are drawn primarily from two places: the <a href="http://vpri.org/">Viewpoint Research Institute</a> and the <a href="http://boom.cs.berkeley.edu/">Berkeley Order Of Magnitude</a> project.</p>

<h1>Viewpoint Research Institute</h1>

<p>Specifically, I&#8217;m interested in the <a href="http://www.vpri.org/pdf/tr2011004_steps11.pdf">Steps Towards Expressive Programming</a> project. Their goal is no less than the reinvention of programming. As a proof of concept, they aim to develop an entire computing system, from OS to compilers to applications, in fewer than 20k LOC. Such a system would be compact enough to be understood in its entirety by a single person, something that is unthinkable in today&#8217;s world of multi-million LOC systems. Amazingly, their initial prototypes of various subsystems actually approach this goal.</p>

<p>Their approach relies heavily on the use of <a href="http://en.wikipedia.org/wiki/Domain-specific_language">DSLs</a> to capture high-level, domain-specific expressions of intent, which are then compiled into efficient code. For example, they describe their TCP/IP stack as follows:</p>

<blockquote><p>Most interesting ideas have more than one fruitful way to view them, and it occurred to us that, abstractly, one could think of TCP/IP as a kind of “non-deterministic parser with balancing heuristics”, in that it takes in a stream of things, does various kinds of pattern-matching on them, deals with errors by backing up and taking other paths, and produces a transformation of the input in a specified form as a result.</p>

<p>Since the language transformation techniques we use operate on arbitrary objects, not just strings (see above), and include some abilities of both standard and logic programming, it seemed that this could be used to make a very compact TCP/IP. Our first attempt was about 160 lines of code that was robust enough to run a website. We think this can be done even more compactly and clearly, and we plan to take another pass at this next year.</p></blockquote>

<p>The &#8216;language transformation techniques&#8217; they refer to are embodied in <a href="http://lambda-the-ultimate.org/node/2477">OMeta</a>, a <a href="http://en.wikipedia.org/wiki/PEG">PEG</a>-based language for parsing and pattern-matching. OMeta provides an incredible amount of leverage for such a simple abstraction. For starters, it leads to very concise and readable descriptions of tokenisers, parsers and tree transformers which are all crucial for developing DSLs.</p>
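<p>To give a flavour of why such a simple abstraction goes so far, here is a sketch of a few PEG-style combinators in plain clojure. This is not OMeta or strucjure; the names lit, sequence-of, choice and many are hypothetical. A parser is a function from an input sequence to [remaining output], or nil on failure. Because the input is immutable, backtracking in ordered choice is just handing the same input to the next alternative.</p>

```clojure
;; Hypothetical PEG-style combinators (not OMeta's or strucjure's API).
;; A parser takes an input seq and returns [remaining output], or nil.

(defn lit
  "Matches a single element equal to x."
  [x]
  (fn [input]
    (when (and (seq input) (= x (first input)))
      [(rest input) x])))

(defn sequence-of
  "Runs parsers one after another; fails if any of them fails."
  [& parsers]
  (fn [input]
    (reduce (fn [[in outputs] p]
              (if-let [[in' out] (p in)]
                [in' (conj outputs out)]
                (reduced nil)))
            [input []]
            parsers)))

(defn choice
  "Ordered choice: the first alternative that succeeds wins. Each
  alternative sees the same input, so backtracking needs no undo."
  [& parsers]
  (fn [input]
    (some #(% input) parsers)))

(defn many
  "Zero or more repetitions. Assumes p consumes input whenever it
  succeeds; otherwise this would loop forever."
  [p]
  (fn [input]
    (loop [in      input
           outputs []]
      (if-let [[in' out] (p in)]
        (recur in' (conj outputs out))
        [in outputs]))))

(def ab-or-ac (sequence-of (lit \a) (choice (lit \b) (lit \c))))

(ab-or-ac "ac")         ;; => [() [\a \c]]
(ab-or-ac "ad")         ;; => nil
((many (lit \a)) "aab") ;; => [(\b) [\a \a]]
```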

<h1>Berkeley Order Of Magnitude</h1>

<p>The Berkeley Order Of Magnitude project has spent a number of years experimenting with using logic languages for distributed systems. Like the STEPS project, their goals are audaciously ambitious.</p>

<blockquote><p>Enter BOOM, an effort to explore implementing Cloud software using disorderly, data-centric languages. BOOM stands for the Berkeley Orders Of Magnitude project, because we seek to enable people to build systems that are OOM bigger than are building today, with OOM less effort than traditional programming methodologies.</p></blockquote>

<p>Among their <a href="http://boom.cs.berkeley.edu/papers.html">myriad publications</a>, they describe an <a href="http://www.srcf.ucam.org/~ms705/temp/eurosys2010/boom.pdf">API-compliant reimplementation of Hadoop and HDFS</a> in ~1K lines of Overlog code, which they then extend with a variety of features (e.g. master-node failover via MultiPaxos) not yet found in Hadoop. Thanks to a number of high-level optimisations enabled by the simpler code-base, their implementation is almost as fast as the original.</p>

<p>For me, the most interesting aspect is the amount of reflective power gained by treating everything as data:</p>

<blockquote><p>One key to our approach is that everything is data, i.e. rows in tables that can be queried and manipulated. This includes persistent data (e.g. filesystem metadata), runtime state (e.g. Hadoop scheduler bookkeeping), summary stats (e.g. for advanced straggler scheduling), in-flight msgs and system events, even parsed code. When everything in a system is data, it becomes easy to do things like parallelize computations on the state, make it fault tolerant, and express (and enforce) invariants on legal states of the system.</p></blockquote>

<p>The latest project from the BOOM group is the <a href="http://www.bloom-lang.net/">Bloom language</a>. Bloom has a more solid theoretical foundation than their previous languages and also enables an amazing level of static analysis, even being able to guarantee that certain Bloom programs are eventually consistent.</p>

<h1>Core Ideas</h1>

<p>What can I take away from these projects? Here are some vague ideas, which to my mind all seem related.</p>

<h2>Higher-level reasoning</h2>

<p>The STEPS notes talk about &#8216;separating meaning from tactics&#8217;. It&#8217;s often easier to specify what a correct solution to a problem looks like than it is to actually find it. In many domains, finding a solution is then just a matter of applying a suitable search algorithm. For example, constraint solvers such as <a href="http://www.gecode.org/">gecode</a> or <a href="https://github.com/clojure/core.logic">core.logic</a> express a problem as a set of logical constraints on the possible solutions and then search through the space of variable assignments to find a solution. By automatically pruning parts of the search space which break one or more constraints and applying user-specified search heuristics, constraint solvers can often be faster than hand-coded solvers for complex problems whilst at the same time allowing a clear, concise, declarative specification of the problem.</p>
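<p>As a rough sketch of that split between meaning and tactics, the search below is plain clojure and deliberately naive; it is not how gecode or core.logic work internally, and solve and when-bound are hypothetical names. The problem is pure data (variables, domains and constraint predicates), and a generic backtracking search discards any partial assignment that already violates a constraint.</p>

```clojure
;; Hypothetical toy solver: the problem is data, the search is generic.
;; Each constraint is a predicate on a (possibly partial) assignment map
;; and should return true while the variables it needs are still unbound.

(defn solve
  "Lazily returns every complete assignment of vars (a map var -> value)
  that satisfies all constraints."
  [vars domains constraints]
  (letfn [(consistent? [assignment]
            (every? #(% assignment) constraints))
          (search [assignment [v & more :as remaining]]
            (if (empty? remaining)
              [assignment]
              (for [value (domains v)
                    :let [assignment' (assoc assignment v value)]
                    ;; prune: never extend an assignment that already fails
                    :when (consistent? assignment')
                    solution (search assignment' more)]
                solution)))]
    (search {} vars)))

(defn when-bound
  "Wraps a predicate on the given vars so it only checks them once bound."
  [ks pred]
  (fn [assignment]
    (or (not-every? #(contains? assignment %) ks)
        (apply pred (map assignment ks)))))

;; x + y = 10 and x < y, with both drawn from 0..9
(solve [:x :y]
       {:x (range 10) :y (range 10)}
       [(when-bound [:x :y] #(= 10 (+ %1 %2)))
        (when-bound [:x :y] <)])
;; => ({:x 1, :y 9} {:x 2, :y 8} {:x 3, :y 7} {:x 4, :y 6})
```

<p>Changing the tactics (variable ordering, smarter pruning) only touches solve, while the problem description stays the same.</p>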

<h2>Everything is data</h2>

<p>Constraint solving is enabled by treating both the problem specification and the solution space as data, reducing the problem to search. In lisps, treating code as data enables macros and code rewriting. In Overlog, everything from persistent data to scheduler state to the language runtime is available as data and can be queried and manipulated using the same powerful abstractions. Tracing in Overlog is as simple as adding a rule that fires whenever a new fact is derived, because the derivation itself is stored alongside the fact.</p>
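<p>A toy illustration of derivations-as-data, in plain clojure rather than Overlog: facts are keys in a map whose values record how each fact was derived, so tracing a conclusion is an ordinary lookup. The rule (transitive closure of parent into ancestor), the data layout and the function name are all made up for this example.</p>

```clojure
;; Hypothetical sketch, not Overlog: naive fixpoint evaluation of
;;   ancestor(X, Y) :- parent(X, Y).
;;   ancestor(X, Z) :- ancestor(X, Y), ancestor(Y, Z).
;; Every derived fact is stored together with its derivation.

(defn derive-ancestors
  "Takes parent pairs [x y] and returns a map from
  [:ancestor x y] to the derivation that produced it."
  [parents]
  (loop [facts (into {} (for [[x y] parents]
                          [[:ancestor x y] [:parent x y]]))]
    (let [new-facts (into {}
                          (for [[[_ x y] _] facts
                                [[_ y2 z] _] facts
                                :when (= y y2)
                                :let [fact [:ancestor x z]]
                                :when (not (contains? facts fact))]
                            [fact [:join [:ancestor x y] [:ancestor y z]]]))]
      (if (empty? new-facts)
        facts
        (recur (merge facts new-facts))))))

(def facts (derive-ancestors [[:alice :bob] [:bob :carol]]))

;; Tracing is just a query over the same data:
(get facts [:ancestor :alice :carol])
;; => [:join [:ancestor :alice :bob] [:ancestor :bob :carol]]
```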

<p>Whatever you are working on, making it accessible as plain data lets you turn the full power and expressivity of your language directly onto the problem. This is where OO falls down: it tries to hide data behind custom interfaces. As Rob Pike recently put it, &#8220;It has become clear that OO zealots are afraid of data&#8221;.</p>

<h2>Reflection</h2>

<p>When you expose the internals of a system as data to that same system, amazing (and, yes, sometimes terrifying) things happen. The STEPS folks manage to stay within their code budget by building highly dynamic, self-hosting, meta-circular, introspective languages. Many of the amazing results of the Overlog project, from the optimising compiler to declarative distributed tracing, resulted from exposing the language runtime and program source code to the same logic engine that it implements. Turning a system in on itself and allowing it to reason about its own behaviour is an incredibly powerful idea. Certainly it can be dangerous, and it&#8217;s all too easy to tangle oneself in knots, but the results speak for themselves. This is an idea that has been <a href="http://steve-yegge.blogspot.com/2007/01/pinocchio-problem.html">expounded</a> <a href="http://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">many</a> <a href="http://www.paulgraham.com/diff.html">times</a> before, but I think there is still so much more to explore here.</p>

<h1>Progress</h1>

<p>My attempts to put these ideas into practice have focused on three projects.</p>

<p><a href="https://github.com/jamii/shackles">Shackles</a> is a constraint solver supporting both finite-domain and logical constraints. It was originally an experiment to see what, if any, extra power could be gained from implementing a gecode-style solver using persistent data-structures (constraint solvers in traditional languages spend much of their time cloning program state to enable back-tracking). Fortunately, <a href="https://github.com/clojure/core.logic">core.logic</a> now supports finite domain variables with constraint propagation, and there has been talk of implementing user-specified search heuristics, so that&#8217;s one less piece of code I need to write :D</p>
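<p>The reason persistent data-structures are attractive here can be shown in a few lines. In the sketch below (plain clojure; the state layout and the narrow helper are hypothetical, not shackles&#8217; actual representation), narrowing a variable&#8217;s domain returns a new state that shares structure with the old one, so backtracking is simply going back to the old value rather than restoring a copy.</p>

```clojure
;; Hypothetical solver state: a map from variable to its remaining domain.
(def state0 {:x #{1 2 3} :y #{1 2 3}})

(defn narrow
  "Restricts var's domain to the values satisfying pred. Returns nil
  if the domain becomes empty (a failure that forces backtracking)."
  [state var pred]
  (let [domain (set (filter pred (get state var)))]
    (when (seq domain)
      (assoc state var domain))))

;; Try a branch: x must be greater than 2.
;; state1 equals {:x #{3} :y #{1 2 3}}
(def state1 (narrow state0 :x #(> % 2)))

;; A later constraint fails on this branch (returns nil)...
(narrow state1 :x even?)

;; ...so we backtrack by reusing state0, which was never modified.
;; This equals {:x #{2} :y #{1 2 3}}
(narrow state0 :x even?)
```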

<p><a href="https://github.com/jamii/strucjure">Strucjure</a> is similar to OMeta but aims to be a good clojure citizen rather than a totally separate tool. As such, all of its core components are <a href="http://clojure.org/protocols">protocols</a>, semantic actions are plain clojure code and the resulting patterns and views are just nested <a href="http://clojure.org/datatypes">records</a> which can be manipulated by regular clojure code. Following the principles above, the syntax of strucjure patterns/views is <a href="https://github.com/jamii/strucjure/blob/master/src/strucjure/parser.clj#L94">self-defined using views</a> and the test suite <a href="https://github.com/jamii/strucjure/blob/master/src/strucjure/test.clj#L1">parses the documentation</a> to verify the correctness of the examples.</p>

<p><a href="https://github.com/jamii/droplet">Droplet</a> is based on the Bloom<sup>L</sup> language (an extension of the Bloom language that operates over arbitrary semi-lattices). Droplet is less developed than the other projects so far, but the core interpreter and basic datalog-like rules are working. Again, droplet attempts to be a good clojure citizen. Rules are just clojure functions. The datalog syntax is implemented via a simple macro which produces a rule function. Individual droplets are held in <a href="http://clojure.org/agents">agents</a> and communicate either via agent sends or over <a href="https://github.com/ztellman/lamina">lamina</a> queues. I&#8217;m currently working out a composable, extensible query language that is able to operate over arbitrary semi-lattices, rather than just sets. In its current (and largely imaginary) form, it looks something like <a href="https://gist.github.com/4171094">this</a>.</p>
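<p>The semi-lattice idea is small enough to sketch directly. This is plain clojure, not droplet&#8217;s or Bloom<sup>L</sup>&#8217;s API; the map-of-functions encoding and the names below are hypothetical. A lattice supplies a bottom element and a join that is associative, commutative and idempotent, which is why replicas can merge updates in any order, with duplicates, and still agree.</p>

```clojure
(ns example.lattice
  (:require [clojure.set :as set]))

;; Hypothetical encoding: a lattice is a bottom element plus a join fn.
(def max-lattice {:bottom Long/MIN_VALUE :join max})
(def set-lattice {:bottom #{} :join set/union})

(defn laws-hold?
  "Checks the semi-lattice laws for join on the sample values a, b, c."
  [{:keys [join]} a b c]
  (and (= (join a (join b c)) (join (join a b) c))  ; associative
       (= (join a b) (join b a))                    ; commutative
       (= (join a a) a)))                           ; idempotent

(defn merge-all
  "Folds a collection of updates into a single lattice value."
  [{:keys [bottom join]} updates]
  (reduce join bottom updates))

;; Order and duplication of updates make no difference to the result.
(= (merge-all set-lattice [#{1} #{2 3} #{1}])
   (merge-all set-lattice [#{2 3} #{1}]))
;; => true
```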

<p>I&#8217;ll go into more detail on the latter two projects soon but for now I&#8217;m content to just throw these ideas out into the world, without justification, and see what bounces back.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Strucjure: reading the readme]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/10/25/strucjure-reading-the-readme/"/>
    <updated>2012-10-25T19:37:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/10/25/strucjure-reading-the-readme</id>
    <content type="html"><![CDATA[<p>I just released <a href="https://github.com/jamii/strucjure">strucjure</a>, a clojure library and DSL for parsing and pattern matching based on <a href="http://lambda-the-ultimate.org/node/2477">OMeta</a>.</p>

<p>The readme on github describes the syntax in detail, so I won&#8217;t repeat that here. Instead, I want to run through a realistic example.</p>

<!--more-->


<p>The readme has a large number of examples and I want to be sure that these are all correct and up to date. As part of the test-suite for strucjure I parse the <a href="https://raw.github.com/jamii/strucjure/master/README.md">readme source</a>, pull out all the examples and make sure that they all run correctly and return the expected output.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>jamie@alien:~/strucjure<span class="nv">$ </span>lein <span class="nb">test </span>strucjure.test
</span><span class='line'>WARNING: newline already refers to: <span class="c">#&#39;clojure.core/newline in namespace: strucjure.test, being replaced by: #&#39;strucjure.test/newline</span>
</span><span class='line'>
</span><span class='line'>lein <span class="nb">test </span>strucjure.test
</span><span class='line'>
</span><span class='line'>Ran 1 tests containing 166 assertions.
</span><span class='line'>0 failures, 0 errors.
</span></code></pre></td></tr></table></div></figure>


<p>The readme parser is pretty simple. Since I control both the parser and the readme source, it doesn&#8217;t need to be bullet-proof, just the simplest thing that will get the job done. Strucjure is very bare-bones at the moment, though, so we have to create a lot of simple views that really belong in a library somewhere.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">space</span>
</span><span class='line'>  <span class="sc">\s</span><span class="nv">pace</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">newline</span>
</span><span class='line'>  <span class="sc">\n</span><span class="nv">ewline</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">not-newline</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">not </span><span class="sc">\n</span><span class="nv">ewline</span><span class="p">)</span> <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">line</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nb">not </span><span class="p">[])</span> <span class="c1">; have to consume at least one char</span>
</span><span class='line'>       <span class="p">(</span><span class="nf">prefix</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more</span> <span class="nv">not-newline</span><span class="p">)</span> <span class="nv">?line</span><span class="p">)</span>
</span><span class='line'>               <span class="o">&amp;</span> <span class="p">((</span><span class="nf">optional</span> <span class="nv">newline</span><span class="p">)</span> <span class="nv">?end</span><span class="p">)))</span>
</span><span class='line'>  <span class="nv">line</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">indented-line</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prefix</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">one-or-more</span> <span class="nv">space</span><span class="p">)</span> <span class="nv">_</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="nf">line</span> <span class="nv">?line</span><span class="p">))</span>
</span><span class='line'>  <span class="nv">line</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>We want a tokeniser for various parts of the readme. We could write it like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defnview</span> <span class="nv">tokenise</span> <span class="p">[</span><span class="nv">sep</span><span class="p">]</span>
</span><span class='line'>  <span class="c1">;; empty input</span>
</span><span class='line'>  <span class="p">[]</span> <span class="o">&#39;</span><span class="p">(())</span>
</span><span class='line'>  <span class="c1">;; throw away separator, start a new token</span>
</span><span class='line'>  <span class="p">[</span><span class="o">&amp;</span> <span class="p">(</span><span class="nf">sep</span> <span class="nv">_</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">tokenise</span> <span class="nv">sep</span><span class="p">)</span> <span class="nv">?results</span><span class="p">)]</span> <span class="p">(</span><span class="nb">cons </span><span class="p">()</span> <span class="nv">results</span><span class="p">)</span>
</span><span class='line'>  <span class="c1">;; add the current char to the first token</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">?char</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">tokenise</span> <span class="nv">sep</span><span class="p">)</span> <span class="p">[</span><span class="nv">?result</span> <span class="o">&amp;</span> <span class="nv">?results</span><span class="p">])]</span> <span class="p">(</span><span class="nb">cons </span><span class="p">(</span><span class="nb">cons char </span><span class="nv">result</span><span class="p">)</span> <span class="nv">results</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>Unfortunately, in the current implementation of strucjure that recursive call goes on the stack, so this view will blow up on large inputs. For now we have to implement this view by hand to get access to recur.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">tokenise</span> <span class="p">[</span><span class="nv">sep</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">view/-&gt;Raw</span>
</span><span class='line'>   <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">input</span> <span class="nv">opts</span><span class="p">]</span>
</span><span class='line'>     <span class="p">(</span><span class="nb">when-let </span><span class="p">[</span><span class="nv">elems</span> <span class="p">(</span><span class="nb">seq </span><span class="nv">input</span><span class="p">)]</span>
</span><span class='line'>       <span class="p">(</span><span class="k">loop </span><span class="p">[</span><span class="nv">elems</span> <span class="nv">elems</span>
</span><span class='line'>              <span class="nv">token-acc</span> <span class="nv">nil</span>
</span><span class='line'>              <span class="nv">tokens-acc</span> <span class="nv">nil</span><span class="p">]</span>
</span><span class='line'>         <span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">remaining</span> <span class="nv">_</span><span class="p">]</span> <span class="p">(</span><span class="nf">view/run</span> <span class="nv">sep</span> <span class="nv">elems</span> <span class="nv">opts</span><span class="p">)]</span>
</span><span class='line'>           <span class="p">(</span><span class="nf">recur</span> <span class="nv">remaining</span> <span class="nv">nil</span> <span class="p">(</span><span class="nb">cons </span><span class="p">(</span><span class="nb">reverse </span><span class="nv">token-acc</span><span class="p">)</span> <span class="nv">tokens-acc</span><span class="p">))</span>
</span><span class='line'>           <span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">elem</span> <span class="o">&amp;</span> <span class="nv">elems</span><span class="p">]</span> <span class="nv">elems</span><span class="p">]</span>
</span><span class='line'>             <span class="p">(</span><span class="nf">recur</span> <span class="nv">elems</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">elem</span> <span class="nv">token-acc</span><span class="p">)</span> <span class="nv">tokens-acc</span><span class="p">)</span>
</span><span class='line'>             <span class="p">[</span><span class="nv">nil</span> <span class="p">(</span><span class="nb">reverse </span><span class="p">(</span><span class="nb">cons </span><span class="p">(</span><span class="nb">reverse </span><span class="nv">token-acc</span><span class="p">)</span> <span class="nv">tokens-acc</span><span class="p">))])))))))</span>
</span></code></pre></td></tr></table></div></figure>
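<p>If you don&#8217;t have strucjure to hand, here is a rough plain-Clojure analogue of <code>tokenise</code>. It is a sketch that makes one simplifying assumption: the separator is a literal sequence rather than a view, so matching it is a plain prefix comparison. The name <code>split-on-seq</code> is hypothetical and not part of strucjure.</p>

<pre><code class='clojure'>;; Hypothetical helper, not part of strucjure: split input wherever the
;; literal (non-empty) sequence sep occurs, as tokenise does with a view.
(defn split-on-seq [sep input]
  (let [n (count sep)]
    (loop [elems (seq input)
           token-acc []
           tokens-acc []]
      (cond
        ;; separator found: close the current token and skip past sep
        (and elems (pos? n) (= (seq sep) (take n elems)))
        (recur (seq (drop n elems)) [] (conj tokens-acc token-acc))

        ;; ordinary element: add it to the current token
        elems
        (recur (next elems) (conj token-acc (first elems)) tokens-acc)

        ;; input exhausted: close the final token
        :else
        (conj tokens-acc token-acc)))))
</code></pre>

<p>For example, <code>(map #(apply str %) (split-on-seq "```" "text```code```more"))</code> gives <code>("text" "code" "more")</code>. Accumulating into vectors avoids the <code>reverse</code> calls in the original, because <code>conj</code> adds to the end of a vector.</p>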


<p>The rest of the parser makes more sense read in reverse order. We start by splitting the readme on code delimiters (triple backticks). This gives us chunks of alternating text and code, so we parse every second chunk, starting from the second, as a block of code.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">code-delim</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prefix</span> <span class="sc">\`</span> <span class="sc">\`</span> <span class="sc">\`</span><span class="p">)</span>
</span><span class='line'>  <span class="ss">:code-delim</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">readme</span>
</span><span class='line'>  <span class="p">((</span><span class="nf">tokenise</span> <span class="nv">code-delim</span><span class="p">)</span> <span class="nv">?chunks</span><span class="p">)</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">apply concat </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="nb">partial </span><span class="nv">run</span> <span class="nv">code-block</span><span class="p">)</span> <span class="p">(</span><span class="nb">take-nth </span><span class="mi">2</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">chunks</span><span class="p">)))))</span>
</span></code></pre></td></tr></table></div></figure>


<p>We only want to look at code blocks that are marked as clojure code.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">code-block</span>
</span><span class='line'>  <span class="p">[</span><span class="sc">\c</span> <span class="sc">\l</span> <span class="sc">\o</span> <span class="sc">\j</span> <span class="sc">\u</span> <span class="sc">\r</span> <span class="sc">\e</span> <span class="sc">\n</span><span class="nv">ewline</span> <span class="o">&amp;</span> <span class="p">(</span><span class="nf">code-block-inner</span> <span class="nv">?result</span><span class="p">)]</span>
</span><span class='line'>  <span class="nv">result</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
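<p>The <code>readme</code> and <code>code-block</code> steps can also be approximated with ordinary string functions. This is a sketch rather than how the strucjure test works: the helper name <code>clojure-blocks</code> is hypothetical, and it assumes each block is tagged exactly <code>clojure</code> followed by a newline.</p>

<pre><code class='clojure'>(require '[clojure.string :as str])

;; Hypothetical helper: return the bodies of the code blocks tagged
;; "clojure" in a markdown string.
(defn clojure-blocks [markdown]
  (let [tag "clojure\n"
        chunks (str/split markdown #"```")]     ; text, code, text, code, ...
    (for [chunk (take-nth 2 (rest chunks))      ; every second chunk is code
          :when (.startsWith ^String chunk tag)]
      (subs chunk (count tag)))))
</code></pre>

<p>One difference from <code>tokenise</code>: <code>clojure.string/split</code> drops trailing empty strings. That doesn&#8217;t matter here, since only the code chunks are kept.</p>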


<p>A few of the code blocks don&#8217;t contain examples - we can detect these because they don&#8217;t start with a &#8220;user> &#8221; prompt. All the other blocks contain a list of examples separated by prompts.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">prompt</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">prefix</span> <span class="sc">\u</span> <span class="sc">\s</span> <span class="sc">\e</span> <span class="sc">\r</span> <span class="sc">\&gt;</span> <span class="sc">\s</span><span class="nv">pace</span><span class="p">)</span>
</span><span class='line'>  <span class="ss">:prompt</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">code-block-inner</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nf">prompt</span> <span class="nv">_</span><span class="p">)</span>
</span><span class='line'>       <span class="p">((</span><span class="nf">tokenise</span> <span class="nv">prompt</span><span class="p">)</span> <span class="nv">?chunks</span><span class="p">))</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="nb">partial </span><span class="nv">run</span> <span class="nv">example</span><span class="p">)</span> <span class="p">(</span><span class="nb">filter </span><span class="o">#</span><span class="p">(</span><span class="nb">not </span><span class="p">(</span><span class="nf">empty?</span> <span class="nv">%</span><span class="p">))</span> <span class="nv">chunks</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'>  <span class="nv">_</span> <span class="c1">;; not a block of examples</span>
</span><span class='line'>  <span class="nv">nil</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>An example consists of an input (which may span several lines), zero or more lines of printed output and, finally, a result.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">example</span>
</span><span class='line'>  <span class="p">[</span><span class="o">&amp;</span> <span class="p">(</span><span class="nf">line</span> <span class="nv">?input-first</span><span class="p">)</span>
</span><span class='line'>   <span class="o">&amp;</span> <span class="p">((</span><span class="nf">zero-or-more-prefix</span> <span class="nv">indented-line</span><span class="p">)</span> <span class="nv">?input-rest</span><span class="p">)</span>
</span><span class='line'>   <span class="o">&amp;</span> <span class="p">((</span><span class="nf">one-or-more-prefix</span> <span class="nv">line</span><span class="p">)</span> <span class="nv">?output-lines</span><span class="p">)]</span>
</span><span class='line'>  <span class="p">{</span><span class="ss">:input</span> <span class="p">(</span><span class="nb">with-out-str </span><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">line</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">input-first</span> <span class="nv">input-rest</span><span class="p">)]</span> <span class="p">(</span><span class="nb">print </span><span class="p">(</span><span class="nb">apply str </span><span class="nv">line</span><span class="p">)</span> <span class="sc">\s</span><span class="nv">pace</span><span class="p">)))</span>
</span><span class='line'>   <span class="ss">:prints</span> <span class="p">(</span><span class="nb">with-out-str </span><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">line</span> <span class="p">(</span><span class="nb">butlast </span><span class="nv">output-lines</span><span class="p">)]</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">apply str </span><span class="nv">line</span><span class="p">))))</span>
</span><span class='line'>   <span class="ss">:result</span> <span class="p">(</span><span class="nf">run</span> <span class="nv">result</span> <span class="p">(</span><span class="nb">last </span><span class="nv">output-lines</span><span class="p">))})</span>
</span></code></pre></td></tr></table></div></figure>
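<p>Without the view machinery, the <code>example</code> view amounts to splitting a list of lines. Here is a sketch in plain Clojure. <code>parse-example</code> is a hypothetical name, and the sketch assumes that continuation lines of the input are indented with spaces and that each example has at least one output line.</p>

<pre><code class='clojure'>(require '[clojure.string :as str])

;; Hypothetical helper mirroring the example view. The first line plus any
;; indented continuation lines form the input; of the remaining lines, the
;; last is the result and the rest are printed output.
(defn parse-example [lines]
  (let [indented? #(.startsWith ^String % " ")
        [input-rest output-lines] (split-with indented? (rest lines))]
    {:input  (str/join " " (cons (first lines) input-rest))
     :prints (apply str (map #(str % "\n") (butlast output-lines)))
     :result (last output-lines)}))
</code></pre>

<p>For instance, the lines <code>"(+ 1"</code>, <code>"   2)"</code>, <code>"hello"</code> and <code>"3"</code> give an input that reads as <code>(+ 1 2)</code>, printed output <code>"hello\n"</code> and the result <code>"3"</code>.</p>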


<p>The result is either a return value or an exception.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; #&quot;[a-zA-Z\.]&quot;</span>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">exception-chars</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">or </span><span class="sc">\.</span>
</span><span class='line'>      <span class="o">#</span><span class="p">(</span><span class="nb">&lt;= </span><span class="p">(</span><span class="nb">int </span><span class="sc">\a</span><span class="p">)</span> <span class="p">(</span><span class="nb">int </span><span class="nv">%</span><span class="p">)</span> <span class="p">(</span><span class="nb">int </span><span class="sc">\z</span><span class="p">))</span>
</span><span class='line'>      <span class="o">#</span><span class="p">(</span><span class="nb">&lt;= </span><span class="p">(</span><span class="nb">int </span><span class="sc">\A</span><span class="p">)</span> <span class="p">(</span><span class="nb">int </span><span class="nv">%</span><span class="p">)</span> <span class="p">(</span><span class="nb">int </span><span class="sc">\Z</span><span class="p">)))</span>
</span><span class='line'>  <span class="nv">%</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">defview</span> <span class="nv">result</span>
</span><span class='line'>  <span class="p">[</span><span class="sc">\E</span> <span class="sc">\x</span> <span class="sc">\c</span> <span class="sc">\e</span> <span class="sc">\p</span> <span class="sc">\t</span> <span class="sc">\i</span> <span class="sc">\o</span> <span class="sc">\n</span> <span class="sc">\I</span> <span class="sc">\n</span> <span class="sc">\f</span> <span class="sc">\o</span> <span class="sc">\s</span><span class="nv">pace</span>
</span><span class='line'>   <span class="sc">\t</span> <span class="sc">\h</span> <span class="sc">\r</span> <span class="sc">\o</span> <span class="sc">\w</span> <span class="sc">\+</span> <span class="sc">\:</span> <span class="sc">\s</span><span class="nv">pace</span>
</span><span class='line'>   <span class="sc">\#</span> <span class="o">&amp;</span> <span class="p">((</span><span class="nf">one-or-more</span> <span class="nv">exception-chars</span><span class="p">)</span> <span class="nv">?exception</span><span class="p">)</span>
</span><span class='line'>   <span class="o">&amp;</span> <span class="nv">_</span><span class="p">]</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">:throws</span> <span class="p">(</span><span class="nb">apply str </span><span class="nv">exception</span><span class="p">)]</span>
</span><span class='line'>
</span><span class='line'>  <span class="nv">?data</span>
</span><span class='line'>  <span class="p">[</span><span class="ss">:returns</span> <span class="p">(</span><span class="nb">apply str </span><span class="nv">data</span><span class="p">)])</span>
</span></code></pre></td></tr></table></div></figure>
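<p><code>exception-chars</code> is just the character class in the comment above it, so the whole <code>result</code> view can be approximated by a single regular expression. A sketch, using the hypothetical name <code>classify-result</code>:</p>

<pre><code class='clojure'>;; Hypothetical helper: a line reporting a throw+ becomes [:throws class],
;; anything else is treated as a printed return value.
(defn classify-result [line]
  (if-let [[_ exception] (re-find #"^ExceptionInfo throw\+: #([a-zA-Z.]+)" line)]
    [:throws exception]
    [:returns line]))
</code></pre>

<p>For example, <code>(classify-result "ExceptionInfo throw+: #user.Oops{:x 1}")</code> returns <code>[:throws "user.Oops"]</code>, while <code>(classify-result "3")</code> returns <code>[:returns "3"]</code>.</p>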


<p>That&#8217;s it - parsing done.</p>

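<p>Now we just have to turn the results into unit tests. We have to be careful when comparing results, because they might contain closures: a function prints as <code>#&lt;...&gt;</code> with a hash code that changes from run to run. So before comparing, we replace anything printed in that form with a fixed placeholder.</p>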

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">replace-fun</span> <span class="p">[</span><span class="nv">unread-form</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">.replaceAll</span> <span class="nv">unread-form</span> <span class="s">&quot;#&lt;[^&gt;]*&gt;&quot;</span> <span class="s">&quot;#&lt;fun&gt;&quot;</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">prints-as</span> <span class="p">[</span><span class="nv">string</span> <span class="nv">form</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">= </span><span class="p">(</span><span class="nf">replace-fun</span> <span class="nv">string</span><span class="p">)</span> <span class="p">(</span><span class="nf">replace-fun</span> <span class="p">(</span><span class="nb">with-out-str </span><span class="p">(</span><span class="nb">pr </span><span class="nv">form</span><span class="p">)))))</span>
</span></code></pre></td></tr></table></div></figure>
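<p>One caveat if you try this today: <code>replace-fun</code> depends on functions printing as <code>#&lt;...&gt;</code>, which is how Clojure printed them at the time. More recent Clojure releases print unreadable objects such as functions as <code>#object[class 0xhash "description"]</code>, so the regex above would no longer mask them. Here is a sketch of a variant for the newer format. The names are hypothetical, and it assumes the printed object contains no <code>]</code> of its own, which holds for functions.</p>

<pre><code class='clojure'>(require '[clojure.string :as str])

;; Hypothetical variant of replace-fun for the #object[...] print format:
;; mask every unreadable object so that two closures compare as equal.
(defn replace-object [printed]
  (str/replace printed #"#object\[[^\]]*\]" "#object[fun]"))

(defn prints-as-object [string form]
  (= (replace-object string) (replace-object (pr-str form))))
</code></pre>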


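<p>Running the examples is a little tricky because some of them create bindings or classes that later examples use. If every example were compiled directly into the test, a later example would refer to a var that doesn&#8217;t exist yet when the test is compiled. Instead, we eval each example&#8217;s code at runtime.</p>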

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">example-test</span> <span class="p">[</span><span class="nv">input</span> <span class="nv">prints</span> <span class="nv">result</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">match</span> <span class="nv">result</span>
</span><span class='line'>         <span class="p">[</span><span class="ss">:returns</span> <span class="nv">?value</span><span class="p">]</span>
</span><span class='line'>         <span class="p">(</span><span class="nf">do</span>
</span><span class='line'>           <span class="p">(</span><span class="nf">is</span> <span class="p">(</span><span class="nf">prints-as</span> <span class="nv">value</span> <span class="p">(</span><span class="nf">input</span><span class="p">)))</span>
</span><span class='line'>           <span class="p">(</span><span class="nf">is</span> <span class="p">(</span><span class="nb">= </span><span class="nv">prints</span> <span class="p">(</span><span class="nb">with-out-str </span><span class="p">(</span><span class="nf">input</span><span class="p">)))))</span>
</span><span class='line'>
</span><span class='line'>         <span class="p">[</span><span class="ss">:throws</span> <span class="nv">?exception</span><span class="p">]</span>
</span><span class='line'>         <span class="p">(</span><span class="nf">do</span>
</span><span class='line'>           <span class="p">(</span><span class="nf">is</span> <span class="p">(</span><span class="nf">try+</span> <span class="p">(</span><span class="nf">input</span><span class="p">)</span>
</span><span class='line'>                     <span class="nv">nil</span>
</span><span class='line'>                     <span class="p">(</span><span class="nf">catch</span> <span class="nv">java.lang.Object</span> <span class="nv">thrown</span>
</span><span class='line'>                       <span class="p">(</span><span class="nf">prints-as</span> <span class="nv">exception</span> <span class="p">(</span><span class="nb">class </span><span class="nv">thrown</span><span class="p">)))))</span>
</span><span class='line'>           <span class="p">(</span><span class="nf">is</span> <span class="p">(</span><span class="nb">= </span><span class="nv">prints</span> <span class="p">(</span><span class="nf">with-out-str</span>
</span><span class='line'>                           <span class="p">(</span><span class="nf">try+</span> <span class="p">(</span><span class="nf">input</span><span class="p">)</span>
</span><span class='line'>                                 <span class="p">(</span><span class="nf">catch</span> <span class="nv">java.lang.Object</span> <span class="nv">_</span> <span class="nv">nil</span><span class="p">))))))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defmacro </span><span class="nv">insert-example-test</span> <span class="p">[{</span><span class="ss">:keys</span> <span class="p">[</span><span class="nv">input</span> <span class="nv">prints</span> <span class="nv">result</span><span class="p">]}]</span>
</span><span class='line'>  <span class="o">`</span><span class="p">(</span><span class="nf">example-test</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="p">(</span><span class="nb">eval </span><span class="o">&#39;</span><span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">use</span> <span class="o">&#39;~</span><span class="ss">&#39;strucjure</span><span class="p">)</span> <span class="o">~</span><span class="p">(</span><span class="nf">read-string</span> <span class="nv">input</span><span class="p">))))</span> <span class="o">~</span><span class="nv">prints</span> <span class="o">&#39;~</span><span class="nv">result</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defmacro </span><span class="nv">insert-readme-test</span> <span class="p">[</span><span class="nv">file</span><span class="p">]</span>
</span><span class='line'>  <span class="o">`</span><span class="p">(</span><span class="nf">do</span>
</span><span class='line'>     <span class="o">~@</span><span class="p">(</span><span class="nb">for </span><span class="p">[</span><span class="nv">example</span> <span class="p">(</span><span class="nf">run</span> <span class="nv">readme</span> <span class="p">(</span><span class="nb">seq </span><span class="p">(</span><span class="nb">slurp </span><span class="p">(</span><span class="nb">eval </span><span class="nv">file</span><span class="p">))))]</span>
</span><span class='line'>         <span class="o">`</span><span class="p">(</span><span class="nf">insert-example-test</span> <span class="o">~</span><span class="nv">example</span><span class="p">))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">deftest</span> <span class="nv">readme-test</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">insert-readme-test</span> <span class="s">&quot;README.md&quot;</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>This is fun. Not only does strucjure parse its own syntax, it reads its own documentation!</p>

<p>Parts of this were a little painful. The next version of strucjure will definitely have improved string matching. I&#8217;m also looking at optimising/compiling views, as well as memoisation. Previous versions of strucjure supported both but were hard to maintain. For now, I&#8217;m moving on to using strucjure to build other useful DSLs.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Causal ordering]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/08/16/causal-ordering/"/>
    <updated>2012-08-16T05:16:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/08/16/causal-ordering</id>
    <content type="html"><![CDATA[<p>Causal ordering is a vital tool for thinking about distributed systems. Once you understand it, many other concepts become much simpler.</p>

<!--more-->


<p>We&#8217;ll start with the fundamental property of distributed systems:</p>

<pre><code>Messages sent between machines may arrive zero or more times at any point after they are sent
</code></pre>

<p>This is the sole reason that building distributed systems is hard.</p>

<p>For example, because of this property it is impossible for two computers communicating over a network to agree on the exact time. You can send me a message saying &#8220;it is now 10:00:00&#8221; but I don&#8217;t know how long it took for that message to arrive. We can send messages back and forth all day but we will never know for sure that we are synchronised.</p>
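<p>To see why the uncertainty never goes away, consider the simplest estimate we could make, which is essentially Cristian&#8217;s algorithm. I note my clock when I send you a request and again when your reply arrives, and you put your clock reading in the reply. Your reading was taken at some unknown point in between, so the best I can do is assume the midpoint and accept an error of up to half the round trip. This is a sketch, and the function name is hypothetical:</p>

<pre><code class='clojure'>;; Hypothetical sketch: estimate the offset between your clock and mine
;; from one round trip. t0 and t1 are my clock readings when I sent the
;; request and received the reply; their-time is your clock reading,
;; taken at some unknown point in between.
(defn estimate-offset [t0 their-time t1]
  (let [half-rtt (/ (- t1 t0) 2)]
    ;; assume your reading was taken at the midpoint of the round trip
    {:offset (- their-time (+ t0 half-rtt))
     :error  half-rtt}))
</code></pre>

<p>For example, <code>(estimate-offset 100 1000 140)</code> gives an offset of 880 with an error bound of 20. More round trips can tighten the estimate, but because the delivery delay is unknown, the bound never reaches zero.</p>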

<p>If we can&#8217;t agree on the time, then we can&#8217;t always agree on the order in which things happen. Suppose I say &#8220;my user logged on at 10:00:00&#8221; and you say &#8220;my user logged on at 10:00:01&#8221;. Maybe mine was first, or maybe my clock is just fast relative to yours. The only way to know for sure is if something connects those two events. For example, if my user logged on and then sent your user an email, and you received that email before your user logged on, then we know for sure that mine was first.</p>

<p>This concept is called causal ordering and is written like this:</p>

<pre><code>A-&gt;B (event A is causally ordered before event B)
</code></pre>

<p>Let&#8217;s define it a little more formally. We model the world as follows: we have a number of machines on which we observe a series of events. These events are either specific to one machine (e.g. user input) or are communications between machines. We define the causal ordering of these events by three rules:</p>

<pre><code>If A and B happen on the same machine and A happens before B then A-&gt;B

If I send you some message M and you receive it then (send M)-&gt;(recv M)

If A-&gt;B and B-&gt;C then A-&gt;C
</code></pre>
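<p>These rules are mechanical enough to compute. In the sketch below, the names and the trace format are hypothetical. It turns a trace into direct edges using the first two rules, then applies the third rule by following those edges transitively. The example trace is the login-and-email scenario from above.</p>

<pre><code class='clojure'>;; Hypothetical trace format: logs maps each machine to its events in
;; local order; messages maps each send event to its receive event.
(defn direct-edges [logs messages]
  (concat
   ;; rule 1: each event precedes the next one on the same machine
   (mapcat (fn [events] (map vector events (rest events))) (vals logs))
   ;; rule 2: a send precedes the matching receive
   (seq messages)))

(defn happened-before? [logs messages a b]
  ;; rule 3 (transitivity): search for a chain of direct edges from a to b
  (let [succs (reduce (fn [m [x y]] (update m x (fnil conj #{}) y))
                      {}
                      (direct-edges logs messages))]
    (loop [frontier (get succs a #{})
           seen #{}]
      (cond
        (contains? frontier b) true
        (empty? frontier) false
        :else (let [seen (into seen frontier)]
                (recur (set (remove seen (mapcat succs frontier))) seen))))))

(defn concurrent? [logs messages a b]
  (not (or (happened-before? logs messages a b)
           (happened-before? logs messages b a))))

(def logs {:me  [:my-login :send-email :my-logout]
           :you [:recv-email :your-login]})
(def messages {:send-email :recv-email})

(happened-before? logs messages :my-login :your-login)   ;; true
(concurrent? logs messages :my-logout :your-login)       ;; true
</code></pre>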

<p>We are used to thinking of ordering by time, which is a <a href="http://en.wikipedia.org/wiki/Total_order">total order</a>: every pair of events can be placed in some order. In contrast, causal ordering is only a <a href="http://en.wikipedia.org/wiki/Partially_ordered_set">partial order</a>: some pairs of events have no possible causal relationship, i.e. not (A->B or B->A).</p>

<p><a href="http://upload.wikimedia.org/wikipedia/commons/5/55/Vector_Clock.svg">This image</a> shows a nice way to picture these relationships.</p>

<p>On a single machine causal ordering is exactly the same as time ordering (actually, on a multi-core machine the situation is <a href="http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html">more complicated</a>, but let&#8217;s forget about that for now). Between machines causal ordering is conveyed by messages. Since sending messages is the only way for machines to affect each other this gives rise to a nice property:</p>

<pre><code>If not(A-&gt;B) then A cannot possibly have caused B
</code></pre>

<p>Since we don&#8217;t have a single global time this is the only thing that allows us to reason about causality in a distributed system. This is really important so let&#8217;s say it again:</p>

<pre><code>Communication bounds causality
</code></pre>

<p>The lack of a total global order is not just an accidental property of computer systems; it is a <a href="http://en.wikipedia.org/wiki/Light_cone">fundamental property</a> of the laws of physics. I claimed that understanding causal order makes many other concepts much simpler. Let&#8217;s skim over some examples.</p>

<h2>Vector Clocks</h2>

<p><a href="http://en.wikipedia.org/wiki/Lamport_timestamps">Lamport clocks</a> and <a href="http://en.wikipedia.org/wiki/Vector_clock">Vector clocks</a> are data-structures which efficiently approximate the causal ordering and so can be used by programs to reason about causality.</p>

<pre><code>If A-&gt;B then LC_A &lt; LC_B

If VC_A &lt; VC_B then A-&gt;B
</code></pre>
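<p>Both kinds of clock take only a few lines. This is a minimal sketch with hypothetical function names. It represents a vector clock as a map from machine id to counter, treating missing entries as zero.</p>

<pre><code class='clojure'>;; Lamport clock: one counter per machine. Tick on every local event or
;; send; on receive, jump past the timestamp carried by the message.
(defn lamport-tick [clock] (inc clock))
(defn lamport-recv [clock msg-clock] (inc (max clock msg-clock)))

;; Vector clock: a map from machine id to counter; missing entries are 0.
(defn vc-tick [vc machine] (update vc machine (fnil inc 0)))

(defn vc-recv [vc machine msg-vc]
  ;; take the entry-wise maximum, then count the receive as a local event
  (vc-tick (merge-with max vc msg-vc) machine))

(defn vc-leq? [vc1 vc2]
  ;; every entry of vc1 is no larger than the matching entry of vc2
  (every? (fn [[k v]] (let [w (get vc2 k 0)] (= w (max v w)))) vc1))

(defn vc-before? [vc1 vc2]
  ;; VC_A strictly less than VC_B
  (and (vc-leq? vc1 vc2) (not (vc-leq? vc2 vc1))))

;; machine a sends a message to machine b
(def send-a (vc-tick {} :a))           ;; {:a 1}
(def recv-b (vc-recv {} :b send-a))    ;; {:a 1, :b 1}
(vc-before? send-a recv-b)             ;; true
(vc-before? send-a (vc-tick {} :b))    ;; false: the events are concurrent
</code></pre>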

<p>Different types of vector clock trade off compression against accuracy by storing smaller or larger portions of the causal history of an event.</p>

<h2>Consistency</h2>

<p>When mutable state is distributed over multiple machines, each machine can receive update events at different times and in different orders. If the final state depends on the order of updates, the system must choose a single serialisation of the events, imposing a global total order. A distributed system is consistent exactly when the outside world can never observe two different serialisations.</p>
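<p>Here is a small illustration of why order matters. It treats update events as functions applied to a register, and the names are hypothetical. Overwriting and incrementing don&#8217;t commute, so two machines that apply the same events in different orders end up in different states.</p>

<pre><code class='clojure'>(defn apply-updates [state updates]
  ;; apply update events in the given serialisation order
  (reduce (fn [s update-fn] (update-fn s)) state updates))

(def set-to-5 (constantly 5))

(apply-updates 0 [set-to-5 inc])   ;; 6
(apply-updates 0 [inc set-to-5])   ;; 5

;; increments commute, so every serialisation gives the same result
(apply-updates 0 [inc #(+ % 2)])   ;; 3
(apply-updates 0 [#(+ % 2) inc])   ;; 3
</code></pre>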

<h2>CAP Theorem</h2>

<p>The CAP (Consistency, Availability, Partition tolerance) theorem also boils down to causality. When a machine in a distributed system is asked to perform an action that depends on its current state, it must determine that state by choosing a serialisation of the events it has seen. It has two options:</p>

<ol>
<li>Choose a serialisation of its current events immediately</li>
<li>Wait until it is sure it has seen all concurrent events before choosing a serialisation</li>
</ol>


<p>The first choice risks violating consistency if some other machine makes the same choice with a different set of events. The second violates availability by waiting for every other machine that could possibly have received a conflicting event before performing the requested action. There is no need for an actual network partition to happen - the trade-off between availability and consistency exists whenever communication between components is not instant. We can state this even more simply:</p>

<pre><code>Ordering requires waiting
</code></pre>

<p>Even your hardware <a href="http://en.wikipedia.org/wiki/Memory_barrier">cannot escape</a> this law. It provides the illusion of synchronous access to memory at the cost of availability. If you want to write fast parallel programs, you need to understand the messaging model used by the underlying hardware.</p>

<h2>Eventual Consistency</h2>

<p>A system is eventually consistent if the final state of each machine is the same regardless of how we choose to serialise update events. An eventually consistent system lets us choose availability over consistency without having the state of different machines diverge irreparably. It doesn&#8217;t save us from having the outside world see different serialisations of update events. It is also difficult to construct eventually consistent data structures and to reason about their composition.</p>
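<p>A grow-only counter is one of the simplest data structures with this property. In the sketch below, whose function names are hypothetical, each machine increments only its own entry, and replicas merge by taking the entry-wise maximum. That merge is commutative, associative and idempotent, so replicas that have seen the same increments agree regardless of the order in which they exchanged state, or how many times they did so.</p>

<pre><code class='clojure'>;; Grow-only counter: a map from machine id to that machine's count.
(defn g-increment [counter machine] (update counter machine (fnil inc 0)))

(defn g-merge [c1 c2] (merge-with max c1 c2))

(defn g-value [counter] (reduce + 0 (vals counter)))

(def replica-a (g-increment (g-increment {} :a) :a))   ;; {:a 2}
(def replica-b (g-increment {} :b))                    ;; {:b 1}

(g-value (g-merge replica-a replica-b))                 ;; 3
(= (g-merge replica-a replica-b)
   (g-merge replica-b (g-merge replica-a replica-a)))   ;; true
</code></pre>

<p>The counter&#8217;s state has the same shape as a vector clock. Reading <code>g-value</code> on two replicas before they have merged can still give different answers, which is the limitation described above.</p>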

<h2>Further reading</h2>

<p><a href="http://hal.inria.fr/inria-00397981/en/">CRDTs</a> provide guidance on constructing eventually consistent data-structures.</p>

<p><a href="http://www.bloom-lang.net/">Bloom</a> is a logic-based DSL for writing distributed systems. The core observation is that there is a natural connection between monotonic logic programs (logic programs which do not have to retract output when given additional inputs) and available distributed systems (where individual machines do not have to wait until all possible inputs have been received before producing output). <a href="http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf">Recent work</a> from the Bloom group shows how to merge their approach with the CRDT approach to get the best of both worlds.</p>

<p>Nathan Marz suggests <a href="http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html">an architecture for data processing systems</a> which avoids much of the pain caused by the CAP theorem. In short, combine a consistent batch-processing layer with an available, eventually consistent real-time layer so that the system as a whole is available but any errors in the (complicated, difficult to program) eventually consistent layer are transient and cannot corrupt the consistent data store.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Frustrations with erlang]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/01/03/frustrations-with-erlang/"/>
    <updated>2012-01-03T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/01/03/frustrations-with-erlang</id>
    <content type="html"><![CDATA[<p>With my work on erl-telehash and at Smarkets I find myself fighting erlang more and more. The biggest pains are the dearth of libraries, the lack of polymorphism and being forced into a single model of concurrency.</p>

<!--more-->


<p>The first is self-explanatory and pretty well-known. I frequently have to fire up a python process through a port to do something simple like send an email. Even the standard library is incomplete and inconsistent.</p>

<p>The second doesn&#8217;t start to hurt until your codebase gets a bit bigger. For example, Smarkets makes heavy use of fixed-precision decimal arithmetic, which leads to code like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nn">decimal</span><span class="p">:</span><span class="n">mult</span><span class="p">(</span><span class="nv">Qty</span><span class="p">,</span>  <span class="nn">decimal</span><span class="p">:</span><span class="n">sub</span><span class="p">(</span><span class="nn">decimal</span><span class="p">:</span><span class="n">to_decimal</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="nv">Price</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>It also means any time you want to change a data-structure for one with an equivalent interface you have to rewrite whole swathes of code.</p>
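<p>For contrast, here is roughly what parametric modules buy you in OCaml. This is a sketch with hypothetical names (<code>NUM</code>, <code>Fixed4</code>, <code>qty_times_complement</code>), not the Smarkets decimal library. The expression from the erlang snippet is written once against a small signature, and switching number representations becomes a one-line change at the functor application.</p>

```ocaml
(* Hypothetical numeric interface; not the Smarkets decimal API. *)
module type NUM = sig
  type t
  val of_int : int -> t
  val sub : t -> t -> t
  val mult : t -> t -> t
end

(* Qty * (1 - Price), written once against the interface. *)
module Pricing (N : NUM) = struct
  let qty_times_complement qty price = N.mult qty (N.sub (N.of_int 1) price)
end

module Float_num = struct
  type t = float
  let of_int = float_of_int
  let sub = ( -. )
  let mult = ( *. )
end

(* Toy fixed-point decimal with four places, stored as a scaled int. *)
module Fixed4 = struct
  type t = int
  let scale = 10_000
  let of_int n = n * scale
  let sub = ( - )
  let mult a b = a * b / scale
end

module Float_pricing = Pricing (Float_num)
module Fixed_pricing = Pricing (Fixed4)
```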

<p>The third point is a bit more contentious. I&#8217;m fairly convinced that the erlang philosophy of fail-early, crash-only, restartable tasks is the right solution for most problems. What bugs me is that erlang conflates addresses, queues and actors by giving each process a single mailbox. This leads to problems like requiring the recipient of a message to have a global name if it is to be independently restartable, which means you can&#8217;t run more than one copy of that message topology on the same node. It also encourages processes to send messages directly to other processes which makes it difficult to create flexible, rewirable topologies or to isolate pieces of a topology for testing. I would prefer a model in which processes send and receive messages through queues which are wired together outside of the process. This would also allow restarting a process (and clearing but not deleting its queues) without giving it a global name.</p>
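<p>Here is a sequential OCaml sketch of the queue model I have in mind, using the standard <code>Queue</code> module. It only illustrates the wiring, not concurrency or supervision, and all the names are hypothetical. Each process is a step function that drains whatever inbox it is handed into whatever outbox it is handed. The topology is assembled outside the processes.</p>

```ocaml
(* Processes only see the queues handed to them; they never name their peers. *)
type 'a chan = 'a Queue.t

(* Drain the inbox, doubling each message into the outbox. *)
let doubler (inbox : int chan) (outbox : int chan) =
  while not (Queue.is_empty inbox) do
    Queue.push (2 * Queue.pop inbox) outbox
  done

(* Drain the inbox, incrementing each message into the outbox. *)
let incrementer (inbox : int chan) (outbox : int chan) =
  while not (Queue.is_empty inbox) do
    Queue.push (Queue.pop inbox + 1) outbox
  done

(* The wiring lives here, outside the processes: a -> doubler -> b -> incrementer -> c. *)
let run_topology (input : int list) : int list =
  let a = Queue.create () and b = Queue.create () and c = Queue.create () in
  List.iter (fun x -> Queue.push x a) input;
  doubler a b;
  incrementer b c;
  List.of_seq (Queue.to_seq c)
```

<p>Because <code>doubler</code> never names its peers, it can be tested against a pair of fresh queues, and two copies of the topology can run side by side. "Restarting" a process amounts to calling <code>Queue.clear</code> on its inbox while the queue itself stays wired in place.</p>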

<p>I&#8217;m not about to run out now and rewrite erl-telehash in another language. It&#8217;s close enough to complete (for my purposes at least) that I&#8217;ll just continue with the existing code. For future experiments, however, I want something better.</p>

<p>The top candidate at the moment is clojure. It has the potential to replace my use of erlang and python, saving lots of cross-language pain. Agents look a lot like a (cleaner, saner) implementation of the <a href="http://scattered-thoughts.net/one/1300/292121/72985">mealy machines</a> that I wrote at Smarkets. <a href="https://github.com/ztellman/lamina">Lamina</a> neatly solves the queue pains I described above. <a href="http://code.google.com/p/clojure-contrib/wiki/DatalogOverview">Datalog</a> is the natural way to describe a lot of collections, including <a href="https://github.com/jamii/erl-telehash/blob/master/src/th_bucket.erl">th_bucket</a>, which in its current form is not obviously correct. The clojure community just seems to churn out well-designed libraries (lamina, aleph, slice, incanter, pallet, cascalog, storm, overtone, etc.).</p>

<p>In the short term I will get started by rewriting <a href="https://github.com/jamii/binmap">binmap</a>, since it&#8217;s fresh in my mind and simple enough to finish quickly. If that goes well it might eventually become an educational port of <a href="http://libswift.org">swift</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Binmaps: compressed bitmaps]]></title>
    <link href="http://scattered-thoughts.net/blog/2012/01/03/binmaps-compressed-bitmaps/"/>
    <updated>2012-01-03T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2012/01/03/binmaps-compressed-bitmaps</id>
    <content type="html"><![CDATA[<p>Lately I&#8217;ve been porting some code from c++. The code in question is a compressed bitmap used in <a href="http://libswift.org">swift</a> to track which parts of a download have already been retrieved. To reduce the memory usage the original uses lots of pointer tricks. Replicating these in ocaml is interesting.</p>

<!--more-->


<p>Here is the basic idea. Conceptually a binmap is a tree of bitmaps. In a leaf at the bottom of the tree, each bit in the bitmap represents one bit. In a leaf one layer above the bottom, each bit represents two bits. In a leaf two layers above the bottom, each bit represents four bits, and so on.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="n">t</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">{</span> <span class="n">layers</span> <span class="o">:</span> <span class="kt">int</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">tree</span> <span class="o">:</span> <span class="n">tree</span> <span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="k">type</span> <span class="n">tree</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Bitmap</span> <span class="k">of</span> <span class="kt">int</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Branch</span> <span class="k">of</span> <span class="n">tree</span> <span class="o">*</span> <span class="n">tree</span>
</span></code></pre></td></tr></table></div></figure>


<p>Let&#8217;s pretend for simplicity our bitmaps are only 1 bit wide. Then the string 00000000 would be represented as:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="o">{</span> <span class="n">layers</span> <span class="o">=</span> <span class="mi">3</span>
</span><span class='line'><span class="o">;</span> <span class="n">tree</span> <span class="o">=</span> <span class="nc">Bitmap</span> <span class="mi">0</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>And the string 00001100 would be:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="o">{</span> <span class="n">layers</span> <span class="o">=</span> <span class="mi">3</span>
</span><span class='line'><span class="o">;</span> <span class="n">tree</span> <span class="o">=</span>
</span><span class='line'>    <span class="nc">Branch</span>
</span><span class='line'>      <span class="o">(</span><span class="nc">Bitmap</span> <span class="mi">0</span><span class="o">)</span>
</span><span class='line'>      <span class="o">(</span><span class="nc">Branch</span>
</span><span class='line'>        <span class="o">(</span><span class="nc">Bitmap</span> <span class="mi">1</span><span class="o">)</span>
</span><span class='line'>        <span class="o">(</span><span class="nc">Bitmap</span> <span class="mi">0</span><span class="o">))</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>
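<p>One way to check these examples is to expand a tree back into its string. Here is a minimal sketch, assuming 1-bit bitmaps as above. <code>to_string</code> is a hypothetical helper, not part of the binmap code. A <code>Bitmap</code> at layer <em>l</em> stands for 2<sup><em>l</em></sup> copies of its bit.</p>

```ocaml
(* Same types as above, repeated so the sketch is self-contained. *)
type tree =
  | Bitmap of int
  | Branch of tree * tree

type t = { layers : int; tree : tree }

(* Expand a binmap into a string of '0'/'1', assuming 1-bit-wide bitmaps:
   a Bitmap at layer l stands for 2^l copies of its bit. *)
let to_string (b : t) : string =
  let buf = Buffer.create (1 lsl b.layers) in
  let rec go layer node =
    match node with
    | Bitmap bit ->
        let c = if bit = 0 then '0' else '1' in
        Buffer.add_string buf (String.make (1 lsl layer) c)
    | Branch (left, right) ->
        (* each child covers half of its parent's range *)
        go (layer - 1) left;
        go (layer - 1) right
  in
  go b.layers b.tree;
  Buffer.contents buf
```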


<p>The worst case for this data structure is the string 0101010101&#8230; In this case we use about 6.5x as much memory as needed by a plain bitmap (3 words for a Branch with two pointers, 4 words for a Bitmap with a pointer to a boxed Int32). The c++ version uses some simple tricks to reduce this overhead to just over 2x that of a plain bitmap. We can replicate these in ocaml by using a bigarray to simulate raw memory access.</p>

<p>Our data structure looks like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">module</span> <span class="nc">Array</span> <span class="o">=</span>
</span><span class='line'><span class="k">struct</span>
</span><span class='line'>  <span class="k">include</span> <span class="nn">Bigarray</span><span class="p">.</span><span class="nc">Array1</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">geti</span> <span class="kt">array</span> <span class="n">i</span> <span class="o">=</span> <span class="nn">Bitmap</span><span class="p">.</span><span class="n">to_int</span> <span class="o">(</span><span class="nn">Bigarray</span><span class="p">.</span><span class="nn">Array1</span><span class="p">.</span><span class="n">get</span> <span class="kt">array</span> <span class="n">i</span><span class="o">)</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">seti</span> <span class="kt">array</span> <span class="n">i</span> <span class="n">v</span> <span class="o">=</span> <span class="nn">Bigarray</span><span class="p">.</span><span class="nn">Array1</span><span class="p">.</span><span class="n">set</span> <span class="kt">array</span> <span class="n">i</span> <span class="o">(</span><span class="nn">Bitmap</span><span class="p">.</span><span class="n">of_int</span> <span class="n">v</span><span class="o">)</span>
</span><span class='line'><span class="k">end</span>
</span><span class='line'>
</span><span class='line'><span class="k">type</span> <span class="n">t</span> <span class="o">=</span>
</span><span class='line'>    <span class="o">{</span> <span class="n">length</span> <span class="o">:</span> <span class="kt">int</span>
</span><span class='line'>    <span class="o">;</span> <span class="n">layers</span> <span class="o">:</span> <span class="kt">int</span>
</span><span class='line'>    <span class="o">;</span> <span class="k">mutable</span> <span class="kt">array</span> <span class="o">:</span> <span class="o">(</span><span class="nn">Bitmap</span><span class="p">.</span><span class="n">t</span><span class="o">,</span> <span class="nn">Bitmap</span><span class="p">.</span><span class="n">bigarray_elt</span><span class="o">,</span> <span class="nn">Bigarray</span><span class="p">.</span><span class="n">c_layout</span><span class="o">)</span> <span class="nn">Array</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>    <span class="o">;</span> <span class="n">pointers</span> <span class="o">:</span> <span class="nn">Widemap</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>    <span class="o">;</span> <span class="k">mutable</span> <span class="n">free</span> <span class="o">:</span> <span class="kt">int</span> <span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="k">type</span> <span class="n">node</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Bitmap</span> <span class="k">of</span> <span class="nn">Bitmap</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Pointer</span> <span class="k">of</span> <span class="kt">int</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">get_node</span> <span class="n">binmap</span> <span class="n">node_addr</span> <span class="n">is_left</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">index</span> <span class="o">=</span> <span class="n">node_addr</span> <span class="o">+</span> <span class="o">(</span><span class="k">if</span> <span class="n">is_left</span> <span class="k">then</span> <span class="mi">0</span> <span class="k">else</span> <span class="mi">1</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">match</span> <span class="nn">Widemap</span><span class="p">.</span><span class="n">get</span> <span class="n">binmap</span><span class="o">.</span><span class="n">pointers</span> <span class="n">index</span> <span class="k">with</span>
</span><span class='line'>  <span class="o">|</span> <span class="bp">false</span> <span class="o">-&gt;</span> <span class="nc">Bitmap</span> <span class="o">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">get</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">index</span><span class="o">)</span>
</span><span class='line'>  <span class="o">|</span> <span class="bp">true</span> <span class="o">-&gt;</span> <span class="nc">Pointer</span> <span class="o">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">geti</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">index</span><span class="o">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">set_node</span> <span class="n">binmap</span> <span class="n">node_addr</span> <span class="n">is_left</span> <span class="n">node</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">index</span> <span class="o">=</span> <span class="n">node_addr</span> <span class="o">+</span> <span class="o">(</span><span class="k">if</span> <span class="n">is_left</span> <span class="k">then</span> <span class="mi">0</span> <span class="k">else</span> <span class="mi">1</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">match</span> <span class="n">node</span> <span class="k">with</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Bitmap</span> <span class="n">bitmap</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nn">Widemap</span><span class="p">.</span><span class="n">set</span> <span class="n">binmap</span><span class="o">.</span><span class="n">pointers</span> <span class="n">index</span> <span class="bp">false</span><span class="o">;</span>
</span><span class='line'>      <span class="nn">Array</span><span class="p">.</span><span class="n">set</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">index</span> <span class="n">bitmap</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Pointer</span> <span class="kt">int</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nn">Widemap</span><span class="p">.</span><span class="n">set</span> <span class="n">binmap</span><span class="o">.</span><span class="n">pointers</span> <span class="n">index</span> <span class="bp">true</span><span class="o">;</span>
</span><span class='line'>      <span class="nn">Array</span><span class="p">.</span><span class="n">seti</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">index</span> <span class="kt">int</span>
</span></code></pre></td></tr></table></div></figure>


<p>Each pair of cells in the array represents a branch. Leaves are hoisted into their parent branch: a leaf's bitmap is stored directly in the parent's cell, in place of a pointer to it. Widemap.t is an extensible bitmap which we use here to track whether a given cell in the array is a pointer or a bitmap. The length field is the number of bits represented by the binmap. The free field will be explained later.</p>

<p>Our previous example string 00001100 would now be represented like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="c">(*</span>
</span><span class='line'><span class="c">  0 -&gt; Bitmap 0</span>
</span><span class='line'><span class="c">  1 -&gt; Pointer 2</span>
</span><span class='line'><span class="c">  2 -&gt; Bitmap 1</span>
</span><span class='line'><span class="c">  3 -&gt; Bitmap 0</span>
</span><span class='line'><span class="c">*)</span>
</span><span class='line'>
</span><span class='line'><span class="o">{</span> <span class="n">length</span> <span class="o">=</span> <span class="mi">8</span><span class="o">;</span>
</span><span class='line'><span class="o">;</span> <span class="n">layers</span> <span class="o">=</span> <span class="mi">3</span><span class="o">;</span>
</span><span class='line'><span class="o">;</span> <span class="kt">array</span> <span class="o">=</span> <span class="o">[|</span> <span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">0</span> <span class="o">|]</span>
</span><span class='line'><span class="o">;</span> <span class="n">pointers</span> <span class="o">=</span> <span class="nn">Widemap</span><span class="p">.</span><span class="n">of_string</span> <span class="s2">&quot;0100&quot;</span>
</span><span class='line'><span class="o">;</span> <span class="n">free</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>
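<p>The mapping from the tree to the flat array can be sketched with plain OCaml arrays standing in for the bigarray and the <code>Widemap</code>. This is an illustration under assumptions, not the repo's code. Pairs are allocated in depth-first order starting from the root at address 0, and a root that is a single bitmap is stored as a pair of two identical halves. <code>flatten</code> is a hypothetical name.</p>

```ocaml
type tree =
  | Bitmap of int
  | Branch of tree * tree

(* Number of Branch nodes, i.e. number of pairs needed. *)
let rec branches = function
  | Bitmap _ -> 0
  | Branch (l, r) -> 1 + branches l + branches r

(* Returns (cells, is_ptr): is_ptr.(i) says whether cells.(i) is a pointer. *)
let flatten (root : tree) : int array * bool array =
  (* Assumption: a uniform root is stored as two identical halves,
     so address 0 always holds the root pair. *)
  let root = match root with Bitmap b -> Branch (Bitmap b, Bitmap b) | br -> br in
  let cells = Array.make (2 * branches root) 0 in
  let is_ptr = Array.make (Array.length cells) false in
  let next = ref 0 in (* next unused pair address *)
  (* Allocate a pair for a Branch and return its address. Leaf children are
     hoisted into the parent's cell; branch children become pointers. *)
  let rec alloc node =
    match node with
    | Bitmap _ -> assert false (* only ever called on branches *)
    | Branch (l, r) ->
        let addr = !next in
        next := !next + 2;
        let fill i child =
          match child with
          | Bitmap b -> cells.(i) <- b
          | Branch _ ->
              is_ptr.(i) <- true;
              cells.(i) <- alloc child
        in
        fill addr l;
        fill (addr + 1) r;
        addr
  in
  ignore (alloc root);
  (cells, is_ptr)
```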


<p>When the binmap is changed we may have to add or delete pairs. For example, if the string above changed to 00001111, cell 1 would hold Bitmap 1 directly and the pair at 2 would no longer be needed:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="c">(*</span>
</span><span class='line'><span class="c">  0 -&gt; Bitmap 0</span>
</span><span class='line'><span class="c">  1 -&gt; Bitmap 1</span>
</span><span class='line'><span class="c">  2 -&gt; ?</span>
</span><span class='line'><span class="c">  3 -&gt; ?</span>
</span><span class='line'><span class="c">*)</span>
</span></code></pre></td></tr></table></div></figure>
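<p>On the simple tree from earlier, both cases are easy to see. Setting a bit inside a uniform leaf splits it into a new pair, and a pair whose halves become identical leaves collapses back into a single leaf. This is a sketch assuming 1-bit bitmaps. <code>set</code> and <code>join</code> are hypothetical names rather than functions from the binmap repo.</p>

```ocaml
type tree =
  | Bitmap of int
  | Branch of tree * tree

(* Rebuild a branch, collapsing it into one leaf when both halves are
   identical leaves: this is where pairs get deleted. *)
let join l r =
  match l, r with
  | Bitmap a, Bitmap b when a = b -> Bitmap a
  | _ -> Branch (l, r)

(* Set bit i (0-based, left to right) to v in a tree covering 2^layer bits,
   assuming 1-bit-wide bitmaps. Splitting a uniform leaf adds a pair. *)
let rec set tree layer i v =
  match tree with
  | Bitmap b when b = v -> tree
  | Bitmap _ when layer = 0 -> Bitmap v
  | _ ->
      let l, r = match tree with
        | Bitmap _ -> tree, tree (* split a uniform leaf into two halves *)
        | Branch (l, r) -> l, r
      in
      let half = 1 lsl (layer - 1) in
      if i < half then join (set l (layer - 1) i v) r
      else join l (set r (layer - 1) (i - half) v)
```

<p>Setting bits 6 and 7 of 00001100 first adds a pair (00001110 needs a deeper branch) and then deletes two, leaving the tree for 00001111.</p>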


<p>We can grow and shrink the array as necessary, but since deleted pairs won&#8217;t necessarily be at the end of the used space, the bigarray will become fragmented. To avoid wasting space we can write a linked list into the empty pairs to keep track of free space. Address 0 always holds the root pair, so it is never free and can safely serve as the list terminator. The free field points at the head of the list.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="n">del_pair</span> <span class="n">binmap</span> <span class="n">node_addr</span> <span class="o">=</span>
</span><span class='line'>  <span class="nn">Array</span><span class="p">.</span><span class="n">seti</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">node_addr</span> <span class="n">binmap</span><span class="o">.</span><span class="n">free</span><span class="o">;</span>
</span><span class='line'>  <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="o">&lt;-</span> <span class="n">node_addr</span>
</span><span class='line'>
</span><span class='line'><span class="c">(* double the size of a full array and then initialise the freelist *)</span>
</span><span class='line'><span class="k">let</span> <span class="n">grow_array</span> <span class="n">binmap</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">assert</span> <span class="o">(</span><span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="o">=</span> <span class="mi">0</span><span class="o">);</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">old_len</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">dim</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">assert</span> <span class="o">(</span><span class="n">old_len</span> <span class="ow">mod</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span><span class="o">);</span>
</span><span class='line'>  <span class="k">assert</span> <span class="o">(</span><span class="n">old_len</span> <span class="o">&lt;=</span> <span class="n">max_int</span><span class="o">);</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">new_len</span> <span class="o">=</span> <span class="n">min</span> <span class="n">max_int</span> <span class="o">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">old_len</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">assert</span> <span class="o">(</span><span class="n">new_len</span> <span class="ow">mod</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span><span class="o">);</span>
</span><span class='line'>  <span class="k">let</span> <span class="kt">array</span> <span class="o">=</span> <span class="n">create_array</span> <span class="n">new_len</span> <span class="k">in</span>
</span><span class='line'>  <span class="nn">Array</span><span class="p">.</span><span class="n">blit</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="o">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">sub</span> <span class="kt">array</span> <span class="mi">0</span> <span class="n">old_len</span><span class="o">);</span>
</span><span class='line'>  <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="o">&lt;-</span> <span class="kt">array</span><span class="o">;</span>
</span><span class='line'>  <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="o">&lt;-</span> <span class="n">old_len</span><span class="o">;</span>
</span><span class='line'>  <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="n">old_len</span> <span class="k">to</span> <span class="n">new_len</span><span class="o">-</span><span class="mi">4</span> <span class="k">do</span>
</span><span class='line'>    <span class="k">if</span> <span class="n">i</span> <span class="ow">mod</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span>  <span class="k">then</span> <span class="nn">Array</span><span class="p">.</span><span class="n">seti</span> <span class="kt">array</span> <span class="n">i</span> <span class="o">(</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="o">)</span>
</span><span class='line'>  <span class="k">done</span><span class="o">;</span>
</span><span class='line'>  <span class="nn">Array</span><span class="p">.</span><span class="n">seti</span> <span class="kt">array</span> <span class="o">(</span><span class="n">new_len</span><span class="o">-</span><span class="mi">2</span><span class="o">)</span> <span class="mi">0</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">add_pair</span> <span class="n">binmap</span> <span class="n">node_left</span> <span class="n">node_right</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">(</span><span class="k">if</span> <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span> <span class="n">grow_array</span> <span class="n">binmap</span><span class="o">);</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">node_addr</span> <span class="o">=</span> <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">free_next</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">geti</span> <span class="n">binmap</span><span class="o">.</span><span class="kt">array</span> <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="k">in</span>
</span><span class='line'>  <span class="n">binmap</span><span class="o">.</span><span class="n">free</span> <span class="o">&lt;-</span> <span class="n">free_next</span><span class="o">;</span>
</span><span class='line'>  <span class="n">set_node</span> <span class="n">binmap</span> <span class="n">node_addr</span> <span class="bp">true</span> <span class="n">node_left</span><span class="o">;</span>
</span><span class='line'>  <span class="n">set_node</span> <span class="n">binmap</span> <span class="n">node_addr</span> <span class="bp">false</span> <span class="n">node_right</span><span class="o">;</span>
</span><span class='line'>  <span class="n">node_addr</span>
</span></code></pre></td></tr></table></div></figure>


<p>I haven&#8217;t yet written any code to shrink the array but it should be fairly straightforward to recursively copy the tree into a new array and rewrite the pointers.</p>
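<p>Here is a sketch of that copy. It again uses a plain <code>int array</code> for the cells and a <code>bool array</code> for the pointer flags, standing in for the bigarray and <code>Widemap</code>. <code>compact</code> is a hypothetical helper, not code from the repo. Free pairs are never reachable from the root, so they simply drop out of the copy.</p>

```ocaml
(* Copy the live tree rooted at pair 0 into fresh, exactly-sized arrays,
   rewriting each pointer to the child's new address. *)
let compact (cells : int array) (is_ptr : bool array) : int array * bool array =
  (* First pass: count live pairs reachable from the root. *)
  let rec live addr =
    let child i = if is_ptr.(i) then live cells.(i) else 0 in
    1 + child addr + child (addr + 1)
  in
  let n = 2 * live 0 in
  let cells' = Array.make n 0 and is_ptr' = Array.make n false in
  let next = ref 0 in (* next unused pair address in the copy *)
  (* Copy the pair at addr to the next slot and return its new address. *)
  let rec copy addr =
    let addr' = !next in
    next := !next + 2;
    for k = 0 to 1 do
      let i = addr + k in
      if is_ptr.(i) then begin
        is_ptr'.(addr' + k) <- true;
        cells'.(addr' + k) <- copy cells.(i)
      end else cells'.(addr' + k) <- cells.(i)
    done;
    addr'
  in
  ignore (copy 0);
  (cells', is_ptr')
```

<p>The copy is exactly full, so its freelist starts out empty (free = 0) and the next add_pair would grow it again.</p>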

<p>With the freelist our modified example now looks like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="o">{</span> <span class="n">length</span> <span class="o">=</span> <span class="mi">8</span><span class="o">;</span>
</span><span class='line'><span class="o">;</span> <span class="n">layers</span> <span class="o">=</span> <span class="mi">3</span><span class="o">;</span>
</span><span class='line'><span class="o">;</span> <span class="kt">array</span> <span class="o">=</span> <span class="o">[|</span> <span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">0</span><span class="o">,</span> <span class="mi">0</span> <span class="o">|]</span>
</span><span class='line'><span class="o">;</span> <span class="n">pointers</span> <span class="o">=</span> <span class="nn">Widemap</span><span class="p">.</span><span class="n">of_string</span> <span class="s2">&quot;0100&quot;</span>
</span><span class='line'><span class="o">;</span> <span class="n">free</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>With the representation sorted the rest of the code more or less writes itself.</p>

<p>The only difficulty lies in choosing the width of the bitmaps used. Using smaller bitmaps increases the granularity of the binmap, allowing better compression by compacting more nodes. Since pointers are stored in the same cells as bitmaps, using larger bitmaps also widens the pointers, allowing larger binmaps to be represented. I&#8217;ve written the binmap code to be width-agnostic; it can easily be made into a functor of the bitmap module.</p>
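<p>Here is a sketch of what that functor might look like. The <code>BITMAP</code> signature below is a guess at the minimum the binmap needs, not the repo's actual <code>Bitmap</code> module. <code>bits_covered</code> and <code>max_cells</code> are hypothetical helpers that just make the width trade-off visible. Each extra layer doubles what one cell covers, while the cell width caps how many cells a pointer can address.</p>

```ocaml
(* Hypothetical signature; the real Bitmap module may differ. *)
module type BITMAP = sig
  type t
  val width : int (* bits per cell *)
  val to_int : t -> int
  val of_int : int -> t
end

module Make (B : BITMAP) = struct
  (* Bits covered by one cell sitting [layer] levels above the bottom. *)
  let bits_covered layer = B.width lsl layer

  (* Pointers share cells with bitmaps, so a pointer has B.width bits and
     can address at most 2^width cells (ignoring any reserved values). *)
  let max_cells = if B.width >= Sys.int_size - 1 then max_int else 1 lsl B.width
end

module Bitmap8 = struct
  type t = int
  let width = 8
  let to_int x = x
  let of_int x = x land 0xff
end

module Binmap8 = Make (Bitmap8)
```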

<p>The paper linked below suggests using a layered address scheme to expand the effective pointer size, where the first bit of the pointer is a flag indicating which layer the address is in. Rather than putting the flag in the pointer, I think it would be simpler to use information implicit in the structure of the tree, e.g. whether the current layer mod 8 = 0. Either way, this hugely increases the size of the address space at the cost of a little extra complexity.</p>

<p>The original version is <a href="https://github.com/gritzko/swift/blob/master/doc/binmaps-alenex.pdf">here</a> and my version is <a href="https://github.com/jamii/binmap">here</a>. This is just an experiment so far; I certainly wouldn&#8217;t suggest using it without some serious testing.</p>

<p>Overall I&#8217;m not sure how useful this particular data structure is but this method of compacting tree-like types in ocaml is certainly interesting. I suspect it could be at least partially automated.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Dial-a-stranger]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/07/10/dial-a-stranger/"/>
    <updated>2011-07-10T06:16:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/07/10/dial-a-stranger</id>
    <content type="html"><![CDATA[<p><a href="https://github.com/jamii/dial-a-stranger">This spawnfest entry</a> is inspired by traveling. I love the idea behind sites like chatroulette and omegle but if I had an internet connection I wouldn&#8217;t be bored enough to use them. I want a version I can use entirely over the phone network to while away the hours spent stuck in airports and train stations.</p>

<!--more-->


<p>I&#8217;ve built a quick proof of concept using twilio. Dial +1 (650) 763-8833 and you will be put on hold. As soon as there are two people on hold they will be linked together into a conference call.</p>
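<p>The pairing rule is small enough to sketch in OCaml. This is a toy model that ignores callers hanging up while on hold. <code>arrive</code> is a hypothetical name, caller ids are plain strings, and nothing here is Twilio's API.</p>

```ocaml
(* At most one caller is ever left waiting on hold. *)
type state = { mutable waiting : string option }

(* Returns Some (first, second) when this caller completes a pair,
   or None if they have to wait for the next arrival. *)
let arrive (s : state) (caller : string) : (string * string) option =
  match s.waiting with
  | None ->
      s.waiting <- Some caller;
      None
  | Some other ->
      s.waiting <- None;
      Some (other, caller)
```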

<p>This isn&#8217;t a great solution, since you have to sit around and wait for the next person to arrive. Perhaps a better method would be to have users register by SMS and then make outbound calls to both users once a connection is ready.</p>

<p>I also hooked up a chat bot to the SMS api. Eliza is ready and waiting on +1 (650) 763-8782 to listen to your problems and ask vague questions.</p>

<p>I&#8217;m not sure where to go with this. I have a vague idea that it could be turned into a game but the details elude me.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: router]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/04/19/telehash-router/"/>
    <updated>2011-04-19T06:16:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/04/19/telehash-router</id>
    <content type="html"><![CDATA[<p>Now that we have all the necessary data structures we can build the router itself. Most of the routing table logic is handled by the bit_tree and bucket modules; the router just ties these together and handles I/O.</p>

<!--more-->


<p>Before it can start managing the routing table, the router has to find out its own address as seen from the outside world. It does this by sending +end signals to a list of known telehash nodes (e.g. telehash.org:42424).</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="n">bootstrap</span><span class="p">,</span> <span class="p">{</span> <span class="c">% the state of the router when bootstrapping</span>
</span><span class='line'>    <span class="n">timeout</span><span class="p">,</span> <span class="c">% give up if no address received before this time</span>
</span><span class='line'>    <span class="n">addresses</span> <span class="c">% list of addresses contacted to find out our address</span>
</span><span class='line'>   <span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">bootstrap</span><span class="p">(</span><span class="nv">Addresses</span><span class="p">,</span> <span class="nv">Timeout</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">bootstrapping</span><span class="p">]),</span>
</span><span class='line'>    <span class="nv">State</span> <span class="o">=</span> <span class="nl">#bootstrap</span><span class="p">{</span><span class="n">timeout</span><span class="o">=</span><span class="nv">Timeout</span><span class="p">,</span> <span class="n">addresses</span><span class="o">=</span><span class="nv">Addresses</span><span class="p">},</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">_</span><span class="nv">Pid</span><span class="p">}</span> <span class="o">=</span> <span class="nn">gen_server</span><span class="p">:</span><span class="n">start_link</span><span class="p">(</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="nv">State</span><span class="p">,</span> <span class="p">[]).</span>
</span><span class='line'>          
</span><span class='line'><span class="nf">init</span><span class="p">(</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">switch</span><span class="p">:</span><span class="n">listen</span><span class="p">(),</span>
</span><span class='line'>    <span class="k">case</span> <span class="nv">State</span> <span class="k">of</span>
</span><span class='line'>  <span class="nl">#bootstrap</span><span class="p">{</span><span class="n">timeout</span><span class="o">=</span><span class="nv">Timeout</span><span class="p">,</span> <span class="n">addresses</span><span class="o">=</span><span class="nv">Addresses</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nv">Telex</span> <span class="o">=</span> <span class="nn">telex</span><span class="p">:</span><span class="n">end_signal</span><span class="p">(</span><span class="nn">util</span><span class="p">:</span><span class="n">random_end</span><span class="p">()),</span>
</span><span class='line'>      <span class="nn">lists</span><span class="p">:</span><span class="n">foreach</span><span class="p">(</span><span class="k">fun</span> <span class="p">(</span><span class="nv">Address</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nn">switch</span><span class="p">:</span><span class="nb">send</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">)</span> <span class="k">end</span><span class="p">,</span> <span class="nv">Addresses</span><span class="p">),</span>
</span><span class='line'>      <span class="nn">erlang</span><span class="p">:</span><span class="nb">send_after</span><span class="p">(</span><span class="nv">Timeout</span><span class="p">,</span> <span class="n">self</span><span class="p">(),</span> <span class="n">giveup</span><span class="p">);</span>
</span><span class='line'>  <span class="nl">#state</span><span class="p">{}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">ok</span>
</span><span class='line'>    <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>


<p>Then we listen until we either get a reply with a _to field or run out of time.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">switch</span><span class="p">,</span> <span class="p">{</span><span class="n">recv</span><span class="p">,</span> <span class="nv">From</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">}},</span> <span class="nl">#bootstrap</span><span class="p">{</span><span class="n">addresses</span><span class="o">=</span><span class="nv">Addresses</span><span class="p">}</span><span class="o">=</span><span class="nv">Bootstrap</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% bootstrapping, waiting to receive a message telling us our own address</span>
</span><span class='line'>    <span class="k">case</span> <span class="p">{</span><span class="nn">lists</span><span class="p">:</span><span class="n">member</span><span class="p">(</span><span class="nv">From</span><span class="p">,</span> <span class="nv">Addresses</span><span class="p">),</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">Telex</span><span class="p">,</span> <span class="n">&#39;_to&#39;</span><span class="p">)}</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">true</span><span class="p">,</span> <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Binary</span><span class="p">}}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">try</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_end</span><span class="p">(</span><span class="nn">util</span><span class="p">:</span><span class="n">binary_to_address</span><span class="p">(</span><span class="nv">Binary</span><span class="p">))</span> <span class="k">of</span>
</span><span class='line'>      <span class="nv">End</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="nv">Self</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">End</span><span class="p">),</span>
</span><span class='line'>          <span class="nv">Table</span> <span class="o">=</span> <span class="n">touched</span><span class="p">(</span><span class="nv">From</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="n">empty_table</span><span class="p">(</span><span class="nv">Self</span><span class="p">)),</span>
</span><span class='line'>          <span class="nn">dialer</span><span class="p">:</span><span class="n">dial</span><span class="p">(</span><span class="nv">End</span><span class="p">,</span> <span class="p">[</span><span class="nv">From</span><span class="p">],</span> <span class="o">?</span><span class="nv">ROUTER_DIAL_TIMEOUT</span><span class="p">),</span>
</span><span class='line'>          <span class="n">refresh</span><span class="p">(</span><span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">),</span>
</span><span class='line'>          <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">bootstrap</span><span class="p">,</span> <span class="n">finished</span><span class="p">,</span> <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="nv">Binary</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>          <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">self</span><span class="o">=</span><span class="nv">Self</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nn">sets</span><span class="p">:</span><span class="n">new</span><span class="p">(),</span> <span class="n">table</span><span class="o">=</span><span class="nv">Table</span><span class="p">}}</span>
</span><span class='line'>      <span class="k">catch</span>
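</span><span class='line'>      <span class="p">_</span><span class="p">:</span><span class="p">_</span> <span class="o">-&gt;</span> <span class="c">% any class (error, exit or throw), not just throws</span>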
</span><span class='line'>          <span class="o">?</span><span class="nv">WARN</span><span class="p">([</span><span class="n">bootstrap</span><span class="p">,</span> <span class="n">bad_self</span><span class="p">,</span> <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="nv">Binary</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>          <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">Bootstrap</span><span class="p">}</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="p">_</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">Bootstrap</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_info</span><span class="p">(</span><span class="n">giveup</span><span class="p">,</span> <span class="nl">#bootstrap</span><span class="p">{}</span><span class="o">=</span><span class="nv">Bootstrap</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% failed to bootstrap, die</span>
</span><span class='line'>    <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">giveup</span><span class="p">,</span> <span class="p">{</span><span class="n">state</span><span class="p">,</span> <span class="nv">Bootstrap</span><span class="p">}]),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">stop</span><span class="p">,</span> <span class="p">{</span><span class="n">shutdown</span><span class="p">,</span> <span class="n">gaveup</span><span class="p">},</span> <span class="nv">Bootstrap</span><span class="p">};</span>
</span></code></pre></td></tr></table></div></figure>
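
<p>The handler above relies on <code>util:binary_to_address</code>, which this post doesn&#8217;t show. As a rough idea of what it has to do, assuming the <code>_to</code> field holds an IPv4 &#8220;IP:PORT&#8221; string as in the original telehash spec, here is a hypothetical <code>to_address:parse/1</code>. It returns an <code>{Ip, Port}</code> tuple and raises an error on malformed input, which is the kind of failure the <code>try</code> in the handler is there to absorb.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical stand-in for util:binary_to_address/1, not the router's code.
%% string:split/3 needs OTP 20 or later.
-module(to_address).
-export([parse/1]).

%% e.g. "1.2.3.4:42424" as a binary gives {{1,2,3,4}, 42424}.
%% Raises badmatch or badarg if the field is malformed.
parse(Binary) when is_binary(Binary) ->
    %% Split on the last colon so the port is always the final component.
    [Host, PortStr] = string:split(binary_to_list(Binary), ":", trailing),
    {ok, Ip} = inet:parse_ipv4_address(Host),
    Port = list_to_integer(PortStr),
    {Ip, Port}.
</code></pre></div></figure>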


<p>Once we know our own address we can fill in the state record and start managing the routing table.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">state</span><span class="p">,</span> <span class="p">{</span> <span class="c">% the state of the router in normal operation</span>
</span><span class='line'>    <span class="n">self</span><span class="p">,</span> <span class="c">% the bits of the routers own end</span>
</span><span class='line'>    <span class="n">pinged</span><span class="p">,</span> <span class="c">% set of addresses which have been pinged and not yet replied/timedout</span>
</span><span class='line'>    <span class="n">table</span> <span class="c">% the routing table, a bit_tree containing buckets of nodes</span>
</span><span class='line'>   <span class="p">}).</span>
</span></code></pre></td></tr></table></div></figure>


<p>One of the jobs of the router is to remove unresponsive nodes from the routing table. To check whether a node is responsive we just send it a random +end signal and wait for a reply. If the node doesn&#8217;t reply in time, it is marked as stale and we try to find a suitable replacement. The node won&#8217;t actually be dropped from the table until a replacement is found; this prevents the table from being flushed if our own network connection goes down.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">ping</span><span class="p">(</span><span class="nv">To</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Telex</span> <span class="o">=</span> <span class="nn">telex</span><span class="p">:</span><span class="n">end_signal</span><span class="p">(</span><span class="nn">util</span><span class="p">:</span><span class="n">random_end</span><span class="p">()),</span>
</span><span class='line'>    <span class="c">% do this in a message to self to avoid some awkward control flow</span>
</span><span class='line'>    <span class="n">self</span><span class="p">()</span> <span class="o">!</span> <span class="p">{</span><span class="n">pinging</span><span class="p">,</span> <span class="nv">To</span><span class="p">},</span>
</span><span class='line'>    <span class="nn">switch</span><span class="p">:</span><span class="nb">send</span><span class="p">(</span><span class="nv">To</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">),</span>
</span><span class='line'>    <span class="nn">erlang</span><span class="p">:</span><span class="nb">send_after</span><span class="p">(</span><span class="o">?</span><span class="nv">ROUTER_PING_TIMEOUT</span><span class="p">,</span> <span class="n">self</span><span class="p">(),</span> <span class="p">{</span><span class="n">timeout</span><span class="p">,</span> <span class="nv">To</span><span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">timedout</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(_</span><span class="nv">Suffix</span><span class="p">,</span> <span class="p">_</span><span class="nv">Depth</span><span class="p">,</span> <span class="p">_</span><span class="nv">Gap</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="k">case</span> <span class="nn">bucket</span><span class="p">:</span><span class="n">timedout</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>        <span class="p">{</span><span class="nb">node</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Update</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="c">% try to touch this node, might be suitable replacement</span>
</span><span class='line'>            <span class="n">ping</span><span class="p">(</span><span class="nv">Node</span><span class="p">),</span>
</span><span class='line'>            <span class="nv">Update</span><span class="p">;</span>
</span><span class='line'>        <span class="nv">Update</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="nv">Update</span>
</span><span class='line'>        <span class="k">end</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">Table</span>
</span><span class='line'>     <span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">pinging</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% record that Address has an outstanding ping</span>
</span><span class='line'>    <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">recording_ping</span><span class="p">,</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span> <span class="nv">Address</span><span class="p">}]),</span>
</span><span class='line'>    <span class="nv">Pinged2</span> <span class="o">=</span> <span class="nn">sets</span><span class="p">:</span><span class="n">add_element</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged2</span><span class="p">}};</span>
</span><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">timeout</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">self</span><span class="o">=</span><span class="nv">Self</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">,</span> <span class="n">table</span><span class="o">=</span><span class="nv">Table</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">sets</span><span class="p">:</span><span class="n">is_element</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% ping timed out: update the table and stop tracking the ping</span>
</span><span class='line'>      <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">timeout</span><span class="p">,</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span> <span class="nv">Address</span><span class="p">}]),</span>
</span><span class='line'>      <span class="nv">Table2</span> <span class="o">=</span> <span class="n">timedout</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">),</span>
</span><span class='line'>      <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">table</span><span class="o">=</span><span class="nv">Table2</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nn">sets</span><span class="p">:</span><span class="n">del_element</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">)}};</span>
</span><span class='line'>  <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% address already replied</span>
</span><span class='line'>      <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>
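
<p>The replacement policy itself lives in the <code>bucket</code> module. To pin down the behaviour described above, here is a simplified, standalone model of it, not the actual <code>bucket</code> code. A bucket is assumed to be a plain list of <code>{Address, fresh | stale}</code> pairs holding at most <code>K</code> entries, and the module and function names are hypothetical. A timeout only marks a node stale. A stale node is evicted only when a newly confirmed node needs its slot, so an outage that times out every node leaves the bucket intact.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical model of the stale-replacement policy; not the real bucket module.
-module(stale_bucket).
-export([timed_out/2, confirmed/3]).

%% A ping to Address timed out: mark it stale but keep it in the bucket.
timed_out(Address, Bucket) ->
    lists:map(fun ({A, _}) when A =:= Address -> {A, stale};
                  (Entry) -> Entry
              end, Bucket).

%% Address has been confirmed to exist (it sent us a message).
confirmed(Address, K, Bucket) ->
    case lists:keymember(Address, 1, Bucket) of
        true ->
            %% Already known: mark it fresh again.
            lists:keystore(Address, 1, Bucket, {Address, fresh});
        false when length(Bucket) >= K ->
            %% Full: replace the first stale node, if any; otherwise ignore.
            case lists:keyfind(stale, 2, Bucket) of
                {Stale, stale} ->
                    lists:keyreplace(Stale, 1, Bucket, {Address, fresh});
                false ->
                    Bucket
            end;
        false ->
            %% Room to spare: just add it.
            Bucket ++ [{Address, fresh}]
    end.
</code></pre></div></figure>

<p>If both nodes in a two-slot bucket time out, <code>timed_out/2</code> leaves <code>[{a, stale}, {b, stale}]</code>. Nothing is lost until <code>confirmed/3</code> brings in a live replacement.</p>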


<p>One of the rules of the router is that it should never pass on information about a node that it hasn&#8217;t personally confirmed to exist. Once we receive a message from a node, we know that it exists (later we will implement _ring/_line to protect against address spoofing):</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">touched</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="p">_</span><span class="nv">Depth</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="nv">May_split</span> <span class="o">=</span> <span class="p">(</span><span class="nv">Gap</span> <span class="o">&lt;</span> <span class="o">?</span><span class="nv">K</span><span class="p">),</span> <span class="c">% !!! or (Depth &lt; ?ROUTER_TABLE_EXPANSION)</span>
</span><span class='line'>        <span class="nn">bucket</span><span class="p">:</span><span class="n">touched</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="n">now</span><span class="p">(),</span> <span class="nv">Bucket</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">Table</span>
</span><span class='line'>     <span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>


<p>On receiving a .see command we record all the contained addresses as potential nodes and ping them to try to confirm their existence.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">seen</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="p">_</span><span class="nv">Depth</span><span class="p">,</span> <span class="p">_</span><span class="nv">Gap</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="k">case</span> <span class="nn">bucket</span><span class="p">:</span><span class="n">seen</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="n">now</span><span class="p">(),</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>        <span class="p">{</span><span class="nb">node</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Update</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="c">% check if this node is stale</span>
</span><span class='line'>            <span class="n">ping</span><span class="p">(</span><span class="nv">Node</span><span class="p">),</span>
</span><span class='line'>            <span class="nv">Update</span><span class="p">;</span>
</span><span class='line'>        <span class="nv">Update</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="nv">Update</span>
</span><span class='line'>        <span class="k">end</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">Table</span>
</span><span class='line'>     <span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>


<p>On receiving a +end signal we reply with a .see command containing the nearest K addresses which we have confirmed to exist.</p>
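
<p>The router&#8217;s <code>nearest/3</code> pulls its answer out of the routing table with the <code>iter</code> module. To pin down what &#8220;nearest&#8221; means, here is a brute-force reference version. It assumes the Kademlia-style XOR metric and treats ends as plain integers (for instance a 160-bit SHA-1 read as an unsigned integer). The <code>xor_nearest</code> module is hypothetical and is only useful for checking the tree walk against, since it sorts every known end.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical brute-force reference for "the K nearest ends".
-module(xor_nearest).
-export([nearest/3]).

%% Distance from Target to an end E is Target bxor E; smaller means closer.
nearest(K, Target, Ends) when is_integer(K), K >= 0 ->
    Keyed = lists:map(fun (E) -> {Target bxor E, E} end, Ends),
    Sorted = lists:keysort(1, Keyed),
    lists:map(fun ({_Distance, E}) -> E end, lists:sublist(Sorted, K)).
</code></pre></div></figure>

<p>For example, <code>nearest(2, 2#1000, [2#0001, 2#1001, 2#1100, 2#1010])</code> returns <code>[2#1001, 2#1010]</code> (printed by the shell as <code>[9,10]</code>), the two ends at XOR distances 1 and 2.</p>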

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">see</span><span class="p">(</span><span class="nv">To</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Telex</span> <span class="o">=</span> <span class="nn">telex</span><span class="p">:</span><span class="n">see_command</span><span class="p">(</span><span class="n">nearest</span><span class="p">(</span><span class="o">?</span><span class="nv">K</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)),</span>
</span><span class='line'>    <span class="nn">switch</span><span class="p">:</span><span class="nb">send</span><span class="p">(</span><span class="nv">To</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">nearest</span><span class="p">(</span><span class="nv">N</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="k">when</span> <span class="nv">N</span><span class="o">&gt;=</span><span class="mi">0</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Bits</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">End</span><span class="p">),</span>
</span><span class='line'>    <span class="nn">iter</span><span class="p">:</span><span class="n">take</span><span class="p">(</span>
</span><span class='line'>      <span class="nv">N</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">iter</span><span class="p">:</span><span class="n">flatten</span><span class="p">(</span>
</span><span class='line'>  <span class="nn">iter</span><span class="p">:</span><span class="n">map</span><span class="p">(</span>
</span><span class='line'>    <span class="k">fun</span> <span class="p">({_</span><span class="nv">Prefix</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">})</span> <span class="o">-&gt;</span> <span class="nn">bucket</span><span class="p">:</span><span class="n">by_dist</span><span class="p">(</span><span class="nv">End</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)))).</span>
</span></code></pre></td></tr></table></div></figure>
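<p>Without the bit_tree and iter modules, the same idea can be sketched over a flat list. This minimal sketch assumes ends are plain non-negative integers (for example a SHA-1 hash read as a 160-bit number) and ranks them by Kademlia&#8217;s XOR distance to the target. The module name xor_nearest is hypothetical and not part of the repo.</p>

```erlang
-module(xor_nearest).
-export([nearest/3]).

%% Return the N ends closest to Target under the XOR metric.
%% Assumption: ends are non-negative integers, not bit_tree buckets.
nearest(N, Target, Ends) when is_integer(N), N >= 0 ->
    %% Pair each end with its distance so lists:sort/1 orders by distance first.
    Keyed = [{Target bxor End, End} || End <- Ends],
    [End || {_Distance, End} <- lists:sublist(lists:sort(Keyed), N)].
```

<p>This version sorts every candidate on each lookup. Walking the bit_tree outwards from the target&#8217;s prefix and taking lazily from per-bucket sorted lists, as nearest/3 above appears to do, should avoid examining buckets that cannot contribute to the first N results.</p>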


<p>On receiving a message we have to handle all three of the above cases, which gets a little ugly.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">switch</span><span class="p">,</span> <span class="p">{</span><span class="n">recv</span><span class="p">,</span> <span class="nv">From</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">}},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">self</span><span class="o">=</span><span class="nv">Self</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">,</span> <span class="n">table</span><span class="o">=</span><span class="nv">Table</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% this counts as a reply</span>
</span><span class='line'>    <span class="nv">Pinged2</span> <span class="o">=</span> <span class="nn">sets</span><span class="p">:</span><span class="n">del_element</span><span class="p">(</span><span class="nv">From</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">),</span>
</span><span class='line'>    <span class="c">% touched the sender</span>
</span><span class='line'>    <span class="c">% !!! eventually will check _line here</span>
</span><span class='line'>    <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">touched</span><span class="p">,</span> <span class="p">{</span><span class="nb">node</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>    <span class="nv">Table2</span> <span class="o">=</span> <span class="n">touched</span><span class="p">(</span><span class="nv">From</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">),</span>
</span><span class='line'>    <span class="c">% maybe seen some nodes</span>
</span><span class='line'>    <span class="nv">Table3</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">case</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">Telex</span><span class="p">,</span> <span class="n">&#39;.see&#39;</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>      <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Binaries</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">try</span> <span class="p">[</span><span class="nn">util</span><span class="p">:</span><span class="n">binary_to_address</span><span class="p">(</span><span class="nv">Bin</span><span class="p">)</span> <span class="p">||</span> <span class="nv">Bin</span> <span class="o">&lt;-</span> <span class="nv">Binaries</span><span class="p">]</span> <span class="k">of</span>
</span><span class='line'>          <span class="nv">Addresses</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">seen</span><span class="p">,</span> <span class="p">{</span><span class="nb">nodes</span><span class="p">,</span> <span class="nv">Addresses</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>          <span class="nn">lists</span><span class="p">:</span><span class="n">foldl</span><span class="p">(</span><span class="k">fun</span> <span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Table_acc</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">seen</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table_acc</span><span class="p">)</span> <span class="k">end</span><span class="p">,</span> <span class="nv">Table2</span><span class="p">,</span> <span class="nv">Addresses</span><span class="p">)</span>
</span><span class='line'>      <span class="k">catch</span>
</span><span class='line'>          <span class="p">_:_</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">bad_seen</span><span class="p">,</span> <span class="p">{</span><span class="nb">nodes</span><span class="p">,</span> <span class="nv">Binaries</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>          <span class="nv">Table2</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>      <span class="p">_</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nv">Table2</span>
</span><span class='line'>  <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="c">% maybe send some nodes back</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">Telex</span><span class="p">,</span> <span class="n">&#39;+end&#39;</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Hex</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">try</span> <span class="nn">util</span><span class="p">:</span><span class="n">hex_to_end</span><span class="p">(</span><span class="nv">Hex</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>      <span class="nv">End</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">see</span><span class="p">,</span> <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="nv">End</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}]),</span>
</span><span class='line'>          <span class="n">see</span><span class="p">(</span><span class="nv">From</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nv">Table3</span><span class="p">)</span>
</span><span class='line'>      <span class="k">catch</span>
</span><span class='line'>      <span class="p">_:_</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="o">?</span><span class="nv">WARN</span><span class="p">([</span><span class="n">bad_see</span><span class="p">,</span> <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="nv">Hex</span><span class="p">},</span> <span class="p">{</span><span class="n">from</span><span class="p">,</span> <span class="nv">From</span><span class="p">}])</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="p">_</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">ok</span>
</span><span class='line'>    <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged2</span><span class="p">,</span> <span class="n">table</span><span class="o">=</span><span class="nv">Table3</span><span class="p">}};</span>
</span></code></pre></td></tr></table></div></figure>
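<p>One detail worth double-checking in a handler like this: in an Erlang try expression, a catch clause whose pattern has no Class: prefix only matches exceptions of class throw. If util:binary_to_address or util:hex_to_end signal bad input with error (say, badarg), which the post does not specify, such a clause lets the exception escape and crash the router instead of logging it. The hypothetical module below shows the difference.</p>

```erlang
-module(catch_demo).
-export([bare_pattern/1, class_pattern/1]).

%% A catch clause with no Class: prefix only matches the throw class.
bare_pattern(F) ->
    try F() of
        Value -> {ok, Value}
    catch
        _ -> caught
    end.

%% Class:Reason matches throw, error and exit alike.
class_pattern(F) ->
    try F() of
        Value -> {ok, Value}
    catch
        _:_ -> caught
    end.
```

<p>When parsing input from other nodes, _:_ (or Class:Reason, to log what went wrong) is the safer choice.</p>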


<p>The last responsibility of the router is to periodically refresh buckets which haven&#8217;t recently seen any activity.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_info</span><span class="p">(</span><span class="n">refresh</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">self</span><span class="o">=</span><span class="nv">Self</span><span class="p">,</span> <span class="n">table</span><span class="o">=</span><span class="nv">Table</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">refreshing_table</span><span class="p">]),</span>
</span><span class='line'>    <span class="n">refresh</span><span class="p">(</span><span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">};</span>
</span><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">dialed</span><span class="p">,</span> <span class="p">_,</span> <span class="p">_},</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% response from a bucket refresh, we don&#39;t care</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">};</span>
</span><span class='line'>
</span><span class='line'><span class="nf">dialed</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(_</span><span class="nv">Suffix</span><span class="p">,</span> <span class="p">_</span><span class="nv">Depth</span><span class="p">,</span> <span class="p">_</span><span class="nv">Gap</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="nn">bucket</span><span class="p">:</span><span class="n">dialed</span><span class="p">(</span><span class="n">now</span><span class="p">(),</span> <span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">Table</span>
</span><span class='line'>     <span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">needs_refresh</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">,</span> <span class="nv">Now</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">bucket</span><span class="p">:</span><span class="n">last_dialed</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">never</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">true</span><span class="p">;</span>
</span><span class='line'>  <span class="nv">Last</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">(</span><span class="nn">timer</span><span class="p">:</span><span class="n">now_diff</span><span class="p">(</span><span class="nv">Now</span><span class="p">,</span> <span class="nv">Last</span><span class="p">)</span> <span class="ow">div</span> <span class="mi">1000</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="o">?</span><span class="nv">ROUTER_REFRESH_TIME</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">refresh</span><span class="p">(</span><span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Now</span> <span class="o">=</span> <span class="n">now</span><span class="p">(),</span>
</span><span class='line'>    <span class="nn">iter</span><span class="p">:</span><span class="n">foreach</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">({</span><span class="nv">Prefix</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="k">case</span> <span class="n">needs_refresh</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">,</span> <span class="nv">Now</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>        <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="o">?</span><span class="nv">INFO</span><span class="p">([</span><span class="n">refreshing_bucket</span><span class="p">,</span> <span class="p">{</span><span class="n">prefix</span><span class="p">,</span> <span class="nv">Prefix</span><span class="p">},</span> <span class="p">{</span><span class="n">bucket</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">}]),</span>
</span><span class='line'>            <span class="nv">To</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">random_end</span><span class="p">(</span><span class="nv">Prefix</span><span class="p">),</span>
</span><span class='line'>            <span class="nv">From</span> <span class="o">=</span> <span class="n">nearest</span><span class="p">(</span><span class="o">?</span><span class="nv">K</span><span class="p">,</span> <span class="nv">To</span><span class="p">,</span> <span class="nv">Table</span><span class="p">),</span>
</span><span class='line'>            <span class="nn">dialer</span><span class="p">:</span><span class="n">dial</span><span class="p">(</span><span class="nv">To</span><span class="p">,</span> <span class="nv">From</span><span class="p">,</span> <span class="o">?</span><span class="nv">ROUTER_DIAL_TIMEOUT</span><span class="p">);</span>
</span><span class='line'>        <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="n">ok</span>
</span><span class='line'>        <span class="k">end</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nn">bit_tree</span><span class="p">:</span><span class="n">iter</span><span class="p">(</span><span class="nv">Self</span><span class="p">,</span> <span class="nv">Table</span><span class="p">)</span>
</span><span class='line'>     <span class="p">),</span>
</span><span class='line'>    <span class="nn">erlang</span><span class="p">:</span><span class="nb">send_after</span><span class="p">(</span><span class="o">?</span><span class="nv">ROUTER_REFRESH_TIME</span><span class="p">,</span> <span class="n">self</span><span class="p">(),</span> <span class="n">refresh</span><span class="p">),</span>
</span><span class='line'>    <span class="n">ok</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
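<p>Two notes on the timing, offered as a sketch rather than a drop-in replacement. erlang:now/0 has been deprecated since OTP 18 in favour of erlang:monotonic_time/1 for measuring elapsed time. Also, a bucket should be due for a refresh once at least the refresh interval has passed since it was last dialed. The hypothetical module below expresses that check with millisecond monotonic timestamps. RefreshMs stands in for ?ROUTER_REFRESH_TIME.</p>

```erlang
-module(refresh_check).
-export([needs_refresh/3, now_ms/0]).

%% Current time on Erlang's monotonic clock, in milliseconds.
%% Only differences between two such values are meaningful.
now_ms() ->
    erlang:monotonic_time(millisecond).

%% LastDialed is either the atom never or an earlier now_ms() value.
%% RefreshMs plays the role of ?ROUTER_REFRESH_TIME (hypothetical parameter).
needs_refresh(never, _Now, _RefreshMs) ->
    true;
needs_refresh(LastDialed, Now, RefreshMs) ->
    Now - LastDialed >= RefreshMs.
```

<p>Monotonic time never jumps backwards when the system clock is adjusted, which is what an elapsed-time check needs.</p>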


<p>That&#8217;s it. As usual the (untested) code is in the <a href="https://github.com/jamii/erl-telehash">repo</a>. The next post will probably deal with taps.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: gen_event woes]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/04/19/telehash-gen-event-woes/"/>
    <updated>2011-04-19T06:16:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/04/19/telehash-gen-event-woes</id>
    <content type="html"><![CDATA[<p>I ran into some tricky bugs caused by a misconception I had about gen_event. Since this is not explicitly stated in the gen_event documentation I will say it here: gen_event does NOT spawn individual processes for each handler. Each handler is run sequentially in the event manager process.</p>

<!--more-->


<p>Now obviously the documentation is not at fault here. I assumed that each handler got its own process solely because the callbacks resembled gen_server. However, a little googling reveals that several other people made the same mistake so I thought it was worth mentioning.</p>
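<p>This is easy to confirm with a throwaway handler. The hypothetical module below answers a gen_event:call with its own self(). demo/0 returns true because the pid the handler reports is the event manager&#8217;s pid.</p>

```erlang
-module(where_handler).
-behaviour(gen_event).
-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start an anonymous event manager, install the handler, and ask the
%% handler which process it is running in.
demo() ->
    {ok, Mgr} = gen_event:start_link(),
    ok = gen_event:add_handler(Mgr, ?MODULE, []),
    HandlerPid = gen_event:call(Mgr, ?MODULE, which_pid),
    ok = gen_event:stop(Mgr),
    HandlerPid =:= Mgr.

init([]) -> {ok, no_state}.
handle_event(_Event, State) -> {ok, State}.
%% self() is whichever process executes the callback: the manager.
handle_call(which_pid, State) -> {ok, self(), State}.
handle_info(_Info, State) -> {ok, State}.
terminate(_Arg, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```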

<p>Here is how I found this out. I was working on the router implementation for telehash. When I tested the bootstrapping algorithm everything looked fine until the first dial, after which nothing else happened. Straight away I suspected a bug in the dialer, but repeating the exact same call in the console worked fine. After a few dead ends I opened pman to look for anything suspicious but couldn&#8217;t find the dialer process (because it doesn&#8217;t exist: it&#8217;s an event handler). I assumed that it was somehow crashing silently and wasted an hour or so reading and rereading the code and stepping through various calls in the debugger. No matter what I tried, the dialer worked absolutely perfectly unless it was called by the router.</p>

<p>Eventually I noticed that the switch_event process was blocking inside a receive and the whole thing unravelled. The dialer is an event handler so when started it calls:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'>  <span class="nn">gen_event</span><span class="p">:</span><span class="n">add_handler</span><span class="p">(</span><span class="n">switch_event</span><span class="p">,</span> <span class="n">dialer</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>which is a synchronous call to the switch_event process. The router event handler is itself running inside the switch_event process, so when the router tries to dial, switch_event ends up waiting for a reply that only it can send, and deadlocks.</p>

<p>The moral of this story is <em>RTFM</em>.</p>

<p>This is easily fixed by changing</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'>  <span class="nn">dialer</span><span class="p">:</span><span class="n">dial</span><span class="p">(</span><span class="nv">End</span><span class="p">,</span> <span class="p">[</span><span class="nv">Address</span><span class="p">])</span>
</span></code></pre></td></tr></table></div></figure>


<p>to</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'>  <span class="nb">spawn</span><span class="p">(</span><span class="k">fun</span> <span class="p">()</span> <span class="o">-&gt;</span> <span class="nn">dialer</span><span class="p">:</span><span class="n">dial</span><span class="p">(</span><span class="nv">End</span><span class="p">,</span> <span class="p">[</span><span class="nv">Address</span><span class="p">])</span> <span class="k">end</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>but there were more problems. Most of the event handlers used erlang:send_after to handle timeouts, but since they all run in the same process they all receive each other&#8217;s timeouts (gen_event passes any message it does not recognise to the handle_info callback of every installed handler). Also, every event handler is run sequentially, so the switch_event process becomes a huge bottleneck.</p>
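<p>The timeout problem can be reproduced with a small hypothetical handler module. Two instances are installed under the ids {crosstalk_handler, a} and {crosstalk_handler, b}, and a message sent straight to the manager stands in for handler a&#8217;s timer firing. Because the manager hands that message to every handler&#8217;s handle_info/2, demo/0 returns {[{timeout, a}], [{timeout, a}]}: handler b sees a&#8217;s timeout too.</p>

```erlang
-module(crosstalk_handler).
-behaviour(gen_event).
-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

demo() ->
    {ok, Mgr} = gen_event:start_link(),
    %% Two instances of this module, distinguished by the Id in {Module, Id}.
    ok = gen_event:add_handler(Mgr, {?MODULE, a}, a),
    ok = gen_event:add_handler(Mgr, {?MODULE, b}, b),
    %% Stand-in for handler a's send_after timer: the message lands in the
    %% manager's mailbox, because that is the process the handlers run in.
    Mgr ! {timeout, a},
    SeenByA = gen_event:call(Mgr, {?MODULE, a}, seen),
    SeenByB = gen_event:call(Mgr, {?MODULE, b}, seen),
    ok = gen_event:stop(Mgr),
    {SeenByA, SeenByB}.

%% State is {Id, MessagesSeenNewestFirst}.
init(Id) -> {ok, {Id, []}}.
handle_event(_Event, State) -> {ok, State}.
handle_call(seen, {_Id, Seen} = State) -> {ok, lists:reverse(Seen), State}.
%% Every stray message is recorded, whichever handler it was meant for.
handle_info(Msg, {Id, Seen}) -> {ok, {Id, [Msg | Seen]}}.
terminate(_Arg, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```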

<p>The solution I settled on was to change each event handler into a gen_server and write a simple event handler that just forwards events to its owner. By installing the forwarder with gen_event:add_sup_handler and listening for the {gen_event_EXIT, Handler, Reason} message sent when it is removed, we can keep the two in sync.</p>
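<p>A minimal sketch of such a forwarding handler, assuming the owner is the process that installs it (add_sup_handler links the handler to the calling process). The module name forwarder and the {event, Event} message format are illustrative, not taken from the repo. In the full design the owner would be a gen_server receiving these messages in its handle_info/2, where a {gen_event_EXIT, Handler, Reason} message tells it the handler has gone.</p>

```erlang
-module(forwarder).
-behaviour(gen_event).
-export([attach/1, demo/0]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Install a handler on Mgr that forwards every event to the calling
%% process. add_sup_handler/3 links the handler to the caller, so the
%% caller later gets {gen_event_EXIT, Handler, Reason} if it is removed.
attach(Mgr) ->
    Owner = self(),
    gen_event:add_sup_handler(Mgr, {?MODULE, Owner}, Owner).

init(Owner) -> {ok, Owner}.

%% Do no real work in the manager process: just pass the event on.
handle_event(Event, Owner) ->
    Owner ! {event, Event},
    {ok, Owner}.

handle_call(_Request, Owner) -> {ok, ok, Owner}.
handle_info(_Info, Owner) -> {ok, Owner}.
terminate(_Arg, _Owner) -> ok.
code_change(_OldVsn, Owner, _Extra) -> {ok, Owner}.

%% Forward one event, then remove the handler and observe the exit notice.
demo() ->
    {ok, Mgr} = gen_event:start_link(),
    ok = attach(Mgr),
    ok = gen_event:sync_notify(Mgr, hello),
    Got = receive {event, E} -> E after 1000 -> timeout end,
    ok = gen_event:delete_handler(Mgr, {?MODULE, self()}, removed),
    Exit = receive
               {gen_event_EXIT, {?MODULE, _}, Reason} -> Reason
           after 1000 -> timeout
           end,
    ok = gen_event:stop(Mgr),
    {Got, Exit}.
```

<p>Because the forwarder only sends a message, each owner does its real work (and receives its own send_after timeouts) in its own process, which addresses both the cross-talk and the bottleneck.</p>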
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: buckets]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/30/telehash-buckets/"/>
    <updated>2011-03-30T06:16:00+01:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/30/telehash-buckets</id>
    <content type="html"><![CDATA[<p>The other half of the routing table is the buckets which store node addresses.</p>

<!--more-->


<p>Usual disclaimer: none of this is properly tested yet.</p>

<p>The Kademlia paper has much to say on the issue of routing, most of it contradictory. My takeaway from many readings, and from browsing the source code of various implementations, is that the following points are the most important:</p>

<ul>
<li>each bucket should contain at most <em>K</em> nodes</li>
<li>we should only ever report node addresses which we have personally confirmed exist</li>
<li>responsive nodes should never be removed from buckets</li>
<li>nodes should never be removed from buckets unless a suitable replacement exists</li>
</ul>


<p>The first three points make the routing table very resistant to flooding and spoofing. In particular, they prevent a common attack for p2p networks where some bad guy floods the routing tables of all the other nodes so that all traffic is routed through nodes controlled by the bad guy. The last point prevents nodes from flushing their routing tables if their own network connection goes down.</p>
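<p>To make the four rules concrete, here is a minimal sketch of what a bucket might do when a node proves it exists by sending us a message. It deliberately uses plain lists in a map instead of the records and priority queues described next, and the module and function names are hypothetical. Live and stale nodes together never exceed K, a live node is never evicted, a stale node is dropped only when a confirmed node takes its place, and the cache holds candidates that are never reported to others.</p>

```erlang
-module(bucket_policy).
-export([on_reply/3]).

%% Hypothetical bucket: a map of three address lists.
%%   live  - nodes that have replied recently
%%   stale - nodes that stopped replying, oldest first
%%   cache - candidate addresses not in the bucket; never reported to others
%% on_reply/3 is called when Addr has sent us a message, confirming it exists.
on_reply(Addr, #{live := Live, stale := Stale, cache := Cache} = Bucket, K) ->
    Cache2 = lists:delete(Addr, Cache),
    case {lists:member(Addr, Live), lists:member(Addr, Stale)} of
        {true, _} ->
            %% Already live: just make sure it is not also cached.
            Bucket#{cache := Cache2};
        {false, true} ->
            %% A stale node answered again: it goes back to live.
            Bucket#{live := [Addr | Live],
                    stale := lists:delete(Addr, Stale),
                    cache := Cache2};
        {false, false} when length(Live) + length(Stale) < K ->
            %% There is room: admit the newly confirmed node.
            Bucket#{live := [Addr | Live], cache := Cache2};
        {false, false} ->
            case Stale of
                [_Oldest | Rest] ->
                    %% Full, but the oldest stale node now has a confirmed replacement.
                    Bucket#{live := [Addr | Live], stale := Rest, cache := Cache2};
                [] ->
                    %% Full of responsive nodes: never evict them; keep Addr as a candidate.
                    Bucket#{cache := [Addr | Cache2]}
            end
    end.
```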

<p>I think the implementation I have come up with is fairly clean, if a little lengthy. Like the bit_tree I want the bucket to be completely pure. All side effects will be handled by the router itself. The main data structures are explained pretty well by the comments:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">K</span><span class="p">,</span> <span class="o">?</span><span class="nv">DIAL_DEPTH</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">node</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="n">address</span><span class="p">,</span> <span class="c">% node #address{} record</span>
</span><span class='line'>    <span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="c">% node end</span>
</span><span class='line'>    <span class="n">suffix</span><span class="p">,</span> <span class="c">% the remaining bits of the node&#39;s end left over from the bit_tree</span>
</span><span class='line'>    <span class="n">status</span><span class="p">,</span> <span class="c">% one of [live, stale, cache]</span>
</span><span class='line'>    <span class="n">last_seen</span> <span class="c">% for live/stale nodes, the time of the last received message. for cache nodes the time of the last .see reference to the node</span>
</span><span class='line'>   <span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">bucket</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="nb">nodes</span><span class="p">,</span> <span class="c">% gb_tree mapping addresses to {Status, Last_seen}</span>
</span><span class='line'>    <span class="c">% remaining fields are pq&#39;s of nodes sorted by their last_seen field</span>
</span><span class='line'>    <span class="n">live</span><span class="p">,</span> <span class="c">% nodes currently expected to be alive</span>
</span><span class='line'>    <span class="n">stale</span><span class="p">,</span> <span class="c">% nodes which have not replied recently</span>
</span><span class='line'>    <span class="n">cache</span> <span class="c">% potential nodes which we have not yet verified </span>
</span><span class='line'>   <span class="p">}).</span> <span class="c">% invariant: pq_maps:size(live) + pq_maps:size(stale) &lt;= ?K</span>
</span></code></pre></td></tr></table></div></figure>


<p>The bucket is a two-stage data structure. This lets us keep the nodes of each status sorted by their last_seen time while still being able to get or delete a node knowing only its address. The <em>get_node</em> function should make it clear how this works:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">get_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span>
</span><span class='line'>   <span class="nl">#bucket</span><span class="p">{</span><span class="nb">nodes</span><span class="o">=</span><span class="nv">Nodes</span><span class="p">,</span> <span class="n">live</span><span class="o">=</span><span class="nv">Live</span><span class="p">,</span> <span class="n">stale</span><span class="o">=</span><span class="nv">Stale</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="nv">Cache</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">gb_trees</span><span class="p">:</span><span class="n">lookup</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Nodes</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">value</span><span class="p">,</span> <span class="p">{</span><span class="nv">Status</span><span class="p">,</span> <span class="nv">Last_seen</span><span class="p">}}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">case</span> <span class="nv">Status</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">live</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="nb">get</span><span class="p">({</span><span class="nv">Last_seen</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span> <span class="nv">Live</span><span class="p">)};</span>
</span><span class='line'>      <span class="n">stale</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="nb">get</span><span class="p">({</span><span class="nv">Last_seen</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span> <span class="nv">Stale</span><span class="p">)};</span>
</span><span class='line'>      <span class="n">cache</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="nb">get</span><span class="p">({</span><span class="nv">Last_seen</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span> <span class="nv">Cache</span><span class="p">)}</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="n">none</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">none</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>


<p>This is only long because records are purely a compile-time structure, i.e. we can&#8217;t write <em>Bucket#bucket.Status</em> to pick a field at runtime, so we have to pattern match on <em>Status</em> instead. We also define <em>add_node/2</em>, <em>del_node/2</em> and <em>update_node/2</em>, which look pretty similar, as well as <em>to_list/1</em>, <em>from_list/1</em> and <em>sizes/1</em>.</p>
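<p>The two-stage layout is easiest to see with the helpers filled in. What follows is a minimal, self-contained sketch, not the original code. It assumes that <em>pq_maps</em> behaves like an ordered map keyed by <em>{Last_seen, Address}</em>, and it uses the standard <em>gb_trees</em> module in its place. The module name <em>bucket_sketch</em>, the trimmed three-field <em>node</em> record and the helpers <em>make_node/3</em>, <em>queue/2</em> and <em>set_queue/3</em> are all hypothetical.</p>

```erlang
%% Hypothetical sketch of a two-stage bucket. gb_trees stands in for the
%% author's pq_maps module; node records are trimmed to three fields.
-module(bucket_sketch).
-export([new/0, make_node/3, add_node/2, del_node/2, update_node/2,
         get_node/2, sizes/1]).

-record(node, {address, status, last_seen}).
-record(bucket, {
          nodes = gb_trees:empty(), % Address -> {Status, Last_seen}
          live  = gb_trees:empty(), % {Last_seen, Address} -> #node{}
          stale = gb_trees:empty(),
          cache = gb_trees:empty()
         }).

new() -> #bucket{}.

make_node(Address, Status, Last_seen) ->
    #node{address = Address, status = Status, last_seen = Last_seen}.

%% Records are compile-time only, so the status is mapped to a field by
%% pattern matching rather than by a computed field name.
queue(live, B)  -> B#bucket.live;
queue(stale, B) -> B#bucket.stale;
queue(cache, B) -> B#bucket.cache.

set_queue(live, Q, B)  -> B#bucket{live = Q};
set_queue(stale, Q, B) -> B#bucket{stale = Q};
set_queue(cache, Q, B) -> B#bucket{cache = Q}.

%% Crashes (via gb_trees:insert/3) if the address is already present.
add_node(#node{address = A, status = S, last_seen = T} = Node, B) ->
    Nodes = gb_trees:insert(A, {S, T}, B#bucket.nodes),
    Q = gb_trees:insert({T, A}, Node, queue(S, B)),
    set_queue(S, Q, B#bucket{nodes = Nodes}).

%% Only the address of the argument is used: the old status and time come
%% from the index, so this also works when given an already-modified node.
del_node(#node{address = A}, B) ->
    {S, T} = gb_trees:get(A, B#bucket.nodes),
    Q = gb_trees:delete({T, A}, queue(S, B)),
    set_queue(S, Q, B#bucket{nodes = gb_trees:delete(A, B#bucket.nodes)}).

%% Re-file a node whose status and/or last_seen may have changed.
update_node(Node, B) ->
    add_node(Node, del_node(Node, B)).

get_node(A, B) ->
    case gb_trees:lookup(A, B#bucket.nodes) of
        {value, {S, T}} -> {ok, gb_trees:get({T, A}, queue(S, B))};
        none -> none
    end.

%% Returns {Lives, Stales, Caches}.
sizes(B) ->
    {gb_trees:size(B#bucket.live),
     gb_trees:size(B#bucket.stale),
     gb_trees:size(B#bucket.cache)}.
```

<p>Each queue in this sketch is ordered by <em>{Last_seen, Address}</em>. As a result, the least and most recently seen nodes of a given status are <em>gb_trees:smallest/1</em> and <em>gb_trees:largest/1</em>, and ties on <em>Last_seen</em> are broken by address.</p>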

<p>The router is going to react to various events by calling the appropriate bucket functions and possibly sending out messages based on the result. The first event it has to handle is a node becoming unresponsive. The bucket will mark this node as stale and return a cache node which the router can attempt to verify.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% this address failed to reply in a timely manner</span>
</span><span class='line'><span class="nf">timedout</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">timing_out</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>    <span class="k">case</span> <span class="n">get_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">case</span> <span class="nv">Node</span><span class="nl">#node.status</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">live</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% mark as stale, return a cache node that might be a suitable replacement</span>
</span><span class='line'>          <span class="nv">Bucket2</span> <span class="o">=</span> <span class="n">update_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">status</span><span class="o">=</span><span class="n">stale</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>          <span class="n">pop_cache_hi</span><span class="p">(</span><span class="nv">Bucket2</span><span class="p">);</span>
</span><span class='line'>      <span class="p">_</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% if cache or stale already we don&#39;t care </span>
</span><span class='line'>          <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="n">none</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% wtf? we don&#39;t even know this node?</span>
</span><span class='line'>      <span class="c">% one way this could happen: </span>
</span><span class='line'>      <span class="c">% send N1, send N1, timedout N1, add N2 (pushing N1 out of stale), timedout N1</span>
</span><span class='line'>      <span class="nn">log</span><span class="p">:</span><span class="n">warning</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">unknown_node_timedout</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>      <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% return most recently seen cache node, if any exist</span>
</span><span class='line'><span class="nf">pop_cache_hi</span><span class="p">(</span><span class="nl">#bucket</span><span class="p">{</span><span class="n">cache</span><span class="o">=</span><span class="nv">Cache</span><span class="p">}</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="n">pop_hi</span><span class="p">(</span><span class="nv">Cache</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{_</span><span class="nv">Key</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Cache2</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">{</span><span class="nb">node</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket</span><span class="nl">#bucket</span><span class="p">{</span><span class="n">cache</span><span class="o">=</span><span class="nv">Cache2</span><span class="p">})};</span>
</span><span class='line'>  <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
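<p>Here is a hedged sketch of the same timeout step on a simplified bucket. It assumes that a priority-queue map can be modelled as a <em>gb_trees</em> tree keyed by <em>{Last_seen, Address}</em>, so <em>pop_hi</em> becomes <em>gb_trees:take_largest/1</em>. To keep it short, the sketch has no address index, so the caller passes the node&#8217;s current <em>Last_seen</em>. The module <em>timeout_sketch</em> and all of its functions are hypothetical.</p>

```erlang
%% Hypothetical sketch of the timeout step. Each queue is a gb_trees tree
%% keyed by {Last_seen, Address}, an assumed stand-in for pq_maps.
-module(timeout_sketch).
-export([new/0, insert/4, timedout/3, queue_size/2]).

new() ->
    #{live => gb_trees:empty(), stale => gb_trees:empty(),
      cache => gb_trees:empty()}.

insert(Status, Address, Last_seen, Bucket) ->
    Q = maps:get(Status, Bucket),
    Bucket#{Status := gb_trees:insert({Last_seen, Address}, Address, Q)}.

queue_size(Status, Bucket) ->
    gb_trees:size(maps:get(Status, Bucket)).

%% Move a live node to stale (keeping its last_seen, as the original does),
%% then offer the most recently seen cache node as a replacement candidate.
timedout(Address, Last_seen, #{live := Live, stale := Stale} = Bucket) ->
    Key = {Last_seen, Address},
    case gb_trees:lookup(Key, Live) of
        {value, _} ->
            Bucket2 = Bucket#{live := gb_trees:delete(Key, Live),
                              stale := gb_trees:insert(Key, Address, Stale)},
            pop_cache_hi(Bucket2);
        none ->
            %% already stale, only cached, or unknown: nothing to do
            {ok, Bucket}
    end.

%% Take the cache entry with the largest {Last_seen, Address} key, if any.
pop_cache_hi(#{cache := Cache} = Bucket) ->
    case gb_trees:is_empty(Cache) of
        true ->
            {ok, Bucket};
        false ->
            {_Key, Candidate, Cache2} = gb_trees:take_largest(Cache),
            {node, Candidate, Bucket#{cache := Cache2}}
    end.
```

<p>The emptiness check matters because <em>gb_trees:take_largest/1</em> crashes on an empty tree, whereas the original <em>pq_maps:pop_hi</em> appears to signal this case by returning <em>false</em>.</p>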


<p>The next event is receiving a <em>.see</em> command. This may be the result of a <em>+end</em> sent by the router, but is more likely to be part of a dialing process happening elsewhere. The beauty of Kademlia is that the router can populate the routing table just by listening in on dialing attempts.</p>

<p>For each node listed in the <em>.see</em> command, the router will call <em>seen</em>. This adds the node to the cache and returns the least recently seen live node so the router can check that it is still responsive.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% this address has been reported to exist by another node</span>
</span><span class='line'><span class="nf">seen</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">seeing</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>    <span class="k">case</span> <span class="n">get_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">case</span> <span class="nv">Node</span><span class="nl">#node.status</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">cache</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% for cache nodes being in a .see is good enough</span>
</span><span class='line'>          <span class="n">ok</span><span class="p">(</span><span class="n">update_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">last_seen</span><span class="o">=</span><span class="nv">Time</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">));</span>
</span><span class='line'>      <span class="p">_</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% for live/stale nodes we require direct contact so ignore this</span>
</span><span class='line'>          <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="n">none</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% put node in cache, return a live node to ping</span>
</span><span class='line'>      <span class="nv">Node</span> <span class="o">=</span> <span class="nl">#node</span><span class="p">{</span>
</span><span class='line'>        <span class="n">address</span> <span class="o">=</span> <span class="nv">Address</span><span class="p">,</span>
</span><span class='line'>        <span class="n">&#39;end&#39;</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_end</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>        <span class="n">suffix</span> <span class="o">=</span> <span class="nv">Suffix</span><span class="p">,</span>
</span><span class='line'>        <span class="n">status</span> <span class="o">=</span> <span class="n">cache</span><span class="p">,</span>
</span><span class='line'>        <span class="n">last_seen</span> <span class="o">=</span> <span class="nv">Time</span>
</span><span class='line'>       <span class="p">},</span>
</span><span class='line'>      <span class="nv">Bucket2</span> <span class="o">=</span> <span class="n">add_node</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>      <span class="k">case</span> <span class="n">peek_live_lo</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">none</span> <span class="o">-&gt;</span> <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket2</span><span class="p">);</span>
</span><span class='line'>      <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Live_node</span><span class="p">}</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="nb">node</span><span class="p">,</span> <span class="nv">Live_node</span><span class="p">,</span> <span class="n">ok</span><span class="p">(</span><span class="nv">Bucket2</span><span class="p">)}</span>
</span><span class='line'>      <span class="k">end</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% return the oldest live node</span>
</span><span class='line'><span class="nf">peek_live_lo</span><span class="p">(</span><span class="nl">#bucket</span><span class="p">{</span><span class="n">live</span><span class="o">=</span><span class="nv">Live</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="n">peek_lo</span><span class="p">(</span><span class="nv">Live</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">none</span> <span class="o">-&gt;</span> <span class="n">none</span><span class="p">;</span>
</span><span class='line'>  <span class="p">{_,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
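<p>The same logic, together with the router&#8217;s loop over a <em>.see</em> listing, can be sketched as below. This is an illustration under assumptions, not the author&#8217;s code:</p>
<ul>
<li>The bucket is a map holding an address index plus <em>live</em> and <em>cache</em> queues. Stale nodes are left out for brevity.</li>
<li><em>gb_trees:smallest/1</em> plays the role of <em>pq_maps:peek_lo</em>.</li>
<li>Results are plain <em>{ok, Bucket}</em> or <em>{node, Node, Bucket}</em> tuples.</li>
<li><em>see_sketch</em>, <em>add_live/3</em> and <em>handle_see/3</em> are hypothetical names.</li>
</ul>

```erlang
%% Hypothetical sketch of reacting to a .see message. The bucket is a map:
%% index (Address -> {Status, Last_seen}) plus live and cache queues, each a
%% gb_trees tree keyed by {Last_seen, Address} (a stand-in for pq_maps).
-module(see_sketch).
-export([new/0, add_live/3, handle_see/3]).

new() ->
    #{index => #{}, live => gb_trees:empty(), cache => gb_trees:empty()}.

add_live(Address, Time, #{index := Ix, live := Live} = B) ->
    B#{index := Ix#{Address => {live, Time}},
       live := gb_trees:insert({Time, Address}, Address, Live)}.

%% One address from a .see: returns {ok, B} or {node, OldestLive, B}.
seen(Address, Time, #{index := Ix, cache := Cache} = B) ->
    case maps:find(Address, Ix) of
        {ok, {cache, Old}} ->
            %% already cached: just refresh its time
            Cache2 = gb_trees:insert({Time, Address}, Address,
                                     gb_trees:delete({Old, Address}, Cache)),
            {ok, B#{index := Ix#{Address := {cache, Time}}, cache := Cache2}};
        {ok, _LiveOrStale} ->
            %% hearsay does not count for nodes we have contacted directly
            {ok, B};
        error ->
            %% new address: cache it and suggest the oldest live node to ping
            B2 = B#{index := Ix#{Address => {cache, Time}},
                    cache := gb_trees:insert({Time, Address}, Address, Cache)},
            case peek_live_lo(B) of
                none -> {ok, B2};
                {ok, Oldest} -> {node, Oldest, B2}
            end
    end.

peek_live_lo(#{live := Live}) ->
    case gb_trees:is_empty(Live) of
        true -> none;
        false -> {_Key, Oldest} = gb_trees:smallest(Live), {ok, Oldest}
    end.

%% Router side: call seen/3 for every listed address and collect the
%% distinct live nodes that should be pinged.
handle_see(Addresses, Time, Bucket) ->
    {Pings, Bucket2} =
        lists:foldl(
          fun(Address, {Acc, B}) ->
                  case seen(Address, Time, B) of
                      {ok, B2} -> {Acc, B2};
                      {node, Node, B2} -> {[Node | Acc], B2}
                  end
          end, {[], Bucket}, Addresses),
    {lists:usort(Pings), Bucket2}.
```

<p>In this sketch, collecting the pings with <em>lists:usort/1</em> means that a single <em>.see</em> listing several new addresses produces one ping to the oldest live node rather than one per address.</p>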


<p>Any time we receive a message, we learn that the node sending it exists (or not; we&#8217;ll deal with address spoofing in a later post), so we can potentially mark it as a live node. The <em>touched</em> function checks whether the node is already in the bucket or needs to be added.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% this address has been verified as actually existing</span>
</span><span class='line'><span class="nf">touched</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">touching</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>    <span class="k">case</span> <span class="n">get_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>  <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">case</span> <span class="nv">Node</span><span class="nl">#node.status</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">live</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% update last_seen time</span>
</span><span class='line'>          <span class="n">ok</span><span class="p">(</span><span class="n">update_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">last_seen</span><span class="o">=</span><span class="nv">Time</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">));</span>
</span><span class='line'>      <span class="n">stale</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% update last_seen time and promote to live</span>
</span><span class='line'>          <span class="n">ok</span><span class="p">(</span><span class="n">update_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">last_seen</span><span class="o">=</span><span class="nv">Time</span><span class="p">,</span> <span class="n">status</span><span class="o">=</span><span class="n">live</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">));</span>
</span><span class='line'>      <span class="n">cache</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="c">% potentially promote the node to live</span>
</span><span class='line'>          <span class="nv">Bucket2</span> <span class="o">=</span> <span class="n">del_node</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>          <span class="n">new_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">Bucket2</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="n">none</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% potentially add the node to live</span>
</span><span class='line'>      <span class="n">new_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">)</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
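<p>Taken together, <em>timedout</em>, <em>seen</em> and <em>touched</em> form a small state machine over a node&#8217;s status. The hypothetical function below only restates the case analysis of those three functions as data. In it, <em>none</em> stands for an address the bucket does not know. <em>candidate</em> is a made-up marker for a node that is handed to <em>new_node</em>, which may or may not find room for it.</p>

```erlang
%% Hypothetical summary of how each router event moves a node between
%% statuses. 'none' means the bucket does not know the address; 'candidate'
%% means the node is handed to new_node/5, which decides if there is room.
-module(status_sketch).
-export([next_status/2]).

next_status(timedout, live)  -> stale;     % stops answering: demote
next_status(timedout, Other) -> Other;     % stale, cache or unknown: no change
next_status(seen, none)      -> cache;     % hearsay about a new node
next_status(seen, Other)     -> Other;     % cache: refresh time; live/stale: ignore
next_status(touched, live)   -> live;      % refresh last_seen
next_status(touched, stale)  -> live;      % direct contact revives it
next_status(touched, cache)  -> candidate; % removed from cache, then new_node
next_status(touched, none)   -> candidate. % unknown: straight to new_node
```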


<p>If the node needs to be added, <em>touched</em> calls <em>new_node</em>, which decides whether there is space in the bucket (possibly by dropping a stale node) and, if so, adds the new node. If the bucket is full, <em>May_split</em> is true and the node still has suffix bits left, <em>new_node</em> will split the bucket before adding the new node. Deciding whether or not splitting is allowed is the router&#8217;s job.</p>
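<p>The capacity check at the heart of <em>new_node</em> can be separated from the bucket bookkeeping. The sketch below is a hypothetical pure version of that decision, with <em>K</em> passed in rather than read from <em>?DIAL_DEPTH</em>. What the real function does when none of the conditions hold lies outside this excerpt, so <em>full</em> is a placeholder. <em>split_by_bit/1</em> likewise only assumes that a split sends each node to the <em>false</em> or <em>true</em> half according to the next bit of its suffix, dropping that bit.</p>

```erlang
%% Hypothetical sketch of the capacity check in new_node/5, as a pure
%% decision over the bucket's sizes. K is passed in rather than read from
%% ?DIAL_DEPTH, and Suffix is the list of remaining (boolean) address bits.
-module(capacity_sketch).
-export([decide/5, split_by_bit/1]).

decide(Lives, Stales, K, May_split, Suffix) ->
    if
        Lives + Stales < K ->
            add_live;                 % room without evicting anything
        (Lives < K) and (Stales > 0) ->
            drop_stale_then_add;      % evict a stale node to make room
        May_split and (Suffix /= []) ->
            split;                    % router allowed a split, bits remain
        true ->
            full                      % placeholder: no room for the node
    end.

%% Partition {Address, Suffix} pairs on their next bit, dropping that bit,
%% giving the contents of the 'false' and 'true' halves of a split.
split_by_bit(Entries) ->
    {Trues, Falses} = lists:partition(fun({_A, [Bit | _]}) -> Bit end, Entries),
    {[{A, S} || {A, [_ | S]} <- Falses], [{A, S} || {A, [_ | S]} <- Trues]}.
```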

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% assumes Address is not already in Bucket, otherwise crashes</span>
</span><span class='line'><span class="nf">new_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Node</span> <span class="o">=</span> <span class="nl">#node</span><span class="p">{</span>
</span><span class='line'>      <span class="n">address</span> <span class="o">=</span> <span class="nv">Address</span><span class="p">,</span>
</span><span class='line'>      <span class="n">&#39;end&#39;</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_end</span><span class="p">(</span><span class="nv">Address</span><span class="p">),</span>
</span><span class='line'>      <span class="n">suffix</span> <span class="o">=</span> <span class="nv">Suffix</span><span class="p">,</span>
</span><span class='line'>      <span class="n">status</span> <span class="o">=</span> <span class="n">undefined</span><span class="p">,</span>
</span><span class='line'>      <span class="n">last_seen</span> <span class="o">=</span> <span class="nv">Time</span>
</span><span class='line'>     <span class="p">},</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Lives</span><span class="p">,</span> <span class="nv">Stales</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span> <span class="n">sizes</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>    <span class="k">if</span>
</span><span class='line'>  <span class="nv">Lives</span> <span class="o">+</span> <span class="nv">Stales</span> <span class="o">&lt;</span> <span class="o">?</span><span class="nv">K</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% space left in live</span>
</span><span class='line'>      <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">adding</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>      <span class="n">ok</span><span class="p">(</span><span class="n">add_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">status</span><span class="o">=</span><span class="n">live</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">));</span>
</span><span class='line'>  <span class="p">(</span><span class="nv">Lives</span> <span class="o">&lt;</span> <span class="o">?</span><span class="nv">K</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="nv">Stales</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% space left in live if we push something out of stale</span>
</span><span class='line'>      <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">adding</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>      <span class="nv">Bucket2</span> <span class="o">=</span> <span class="n">drop_stale</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>      <span class="n">ok</span><span class="p">(</span><span class="n">add_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">status</span><span class="o">=</span><span class="n">live</span><span class="p">},</span> <span class="nv">Bucket2</span><span class="p">));</span>
</span><span class='line'>  <span class="nv">May_split</span> <span class="ow">and</span> <span class="p">(</span><span class="nv">Suffix</span> <span class="o">/=</span> <span class="p">[])</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% allowed to split the bucket to make space</span>
</span><span class='line'>      <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">splitting</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>      <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="nv">BucketF</span><span class="p">,</span> <span class="nv">BucketT</span><span class="p">}</span> <span class="o">=</span> <span class="n">split</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>      <span class="p">[</span><span class="nv">Bit</span> <span class="p">|</span> <span class="nv">Suffix2</span><span class="p">]</span> <span class="o">=</span> <span class="nv">Suffix</span><span class="p">,</span>
</span><span class='line'>      <span class="k">case</span> <span class="nv">Bit</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="nv">BucketF2</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">BucketF</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">),</span>
</span><span class='line'>          <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="nv">BucketF2</span><span class="p">,</span> <span class="nv">BucketT</span><span class="p">};</span>
</span><span class='line'>      <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>          <span class="nv">BucketT2</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Time</span><span class="p">,</span> <span class="nv">BucketT</span><span class="p">,</span> <span class="nv">May_split</span><span class="p">),</span>
</span><span class='line'>          <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="nv">BucketF</span><span class="p">,</span> <span class="nv">BucketT2</span><span class="p">}</span>
</span><span class='line'>      <span class="k">end</span><span class="p">;</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% not allowed to split, will have to go in the cache</span>
</span><span class='line'>      <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">caching</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">]),</span>
</span><span class='line'>      <span class="n">ok</span><span class="p">(</span><span class="n">add_node</span><span class="p">(</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">status</span><span class="o">=</span><span class="n">cache</span><span class="p">},</span> <span class="nv">Bucket</span><span class="p">))</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% drop the oldest stale node, crashes if none exist</span>
</span><span class='line'><span class="nf">drop_stale</span><span class="p">(</span><span class="nl">#bucket</span><span class="p">{</span><span class="n">stale</span><span class="o">=</span><span class="nv">Stale</span><span class="p">}</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{_</span><span class="nv">Key</span><span class="p">,</span> <span class="p">_</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Stale2</span><span class="p">}</span> <span class="o">=</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="n">pop_one_hi</span><span class="p">(</span><span class="nv">Stale</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Bucket</span><span class="nl">#bucket</span><span class="p">{</span><span class="n">stale</span><span class="o">=</span><span class="nv">Stale2</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">split</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Nodes</span> <span class="o">=</span> <span class="n">to_list</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">NodesF</span> <span class="o">=</span> <span class="p">[</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">suffix</span><span class="o">=</span><span class="nv">Suffix2</span><span class="p">}</span> <span class="p">||</span> <span class="nl">#node</span><span class="p">{</span><span class="n">suffix</span><span class="o">=</span><span class="p">[</span><span class="n">false</span><span class="p">|</span><span class="nv">Suffix2</span><span class="p">]}</span><span class="o">=</span><span class="nv">Node</span> <span class="o">&lt;-</span> <span class="nv">Nodes</span><span class="p">],</span>
</span><span class='line'>    <span class="nv">NodesT</span> <span class="o">=</span> <span class="p">[</span><span class="nv">Node</span><span class="nl">#node</span><span class="p">{</span><span class="n">suffix</span><span class="o">=</span><span class="nv">Suffix2</span><span class="p">}</span> <span class="p">||</span> <span class="nl">#node</span><span class="p">{</span><span class="n">suffix</span><span class="o">=</span><span class="p">[</span><span class="n">true</span><span class="p">|</span><span class="nv">Suffix2</span><span class="p">]}</span><span class="o">=</span><span class="nv">Node</span> <span class="o">&lt;-</span> <span class="nv">Nodes</span><span class="p">],</span>
</span><span class='line'>    <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="n">from_list</span><span class="p">(</span><span class="nv">NodesF</span><span class="p">),</span> <span class="n">from_list</span><span class="p">(</span><span class="nv">NodesT</span><span class="p">)}.</span>
</span></code></pre></td></tr></table></div></figure>


<p>Finally, upon receiving a <em>+end</em> signal, the router needs to reply with a <em>.see</em> command listing the <em>K</em> nearest nodes to the specified end. This is done by combining <em>bit_tree:iter</em>, which visits the buckets in ascending order of XOR distance from the end, with <em>bucket:nearest</em>, which picks the closest nodes from within a single bucket.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers">
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">nearest</span><span class="p">(</span><span class="nv">N</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nl">#bucket</span><span class="p">{</span><span class="n">live</span><span class="o">=</span><span class="nv">Live</span><span class="p">,</span> <span class="n">stale</span><span class="o">=</span><span class="nv">Stale</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Nodes</span> <span class="o">=</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="n">to_list</span><span class="p">(</span><span class="nv">Live</span><span class="p">)</span> <span class="o">++</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="n">to_list</span><span class="p">(</span><span class="nv">Stale</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Num_nodes</span> <span class="o">=</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="nb">size</span><span class="p">(</span><span class="nv">Live</span><span class="p">)</span> <span class="o">+</span> <span class="nn">pq_maps</span><span class="p">:</span><span class="nb">size</span><span class="p">(</span><span class="nv">Stale</span><span class="p">),</span>
</span><span class='line'>    <span class="k">if</span>
</span><span class='line'>  <span class="nv">Num_nodes</span> <span class="o">=&lt;</span> <span class="nv">N</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">[</span><span class="nv">Node</span><span class="nl">#node.address</span> <span class="p">||</span> <span class="p">{_</span><span class="nv">Key</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Nodes</span><span class="p">];</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% !!! maybe should prefer to return live nodes even if further away</span>
</span><span class='line'>      <span class="nv">Nodes_by_dist</span> <span class="o">=</span> <span class="p">[{</span><span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">(</span><span class="nv">End</span><span class="p">,</span> <span class="nv">Node</span><span class="nl">#node.&#39;end&#39;</span><span class="p">),</span> <span class="nv">Node</span><span class="p">}</span> <span class="p">||</span> <span class="p">{_</span><span class="nv">Key</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Nodes</span><span class="p">],</span>
</span><span class='line'>      <span class="p">{</span><span class="nv">Closest</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span> <span class="nn">lists</span><span class="p">:</span><span class="n">split</span><span class="p">(</span><span class="nv">N</span><span class="p">,</span> <span class="nn">lists</span><span class="p">:</span><span class="n">sort</span><span class="p">(</span><span class="nv">Nodes_by_dist</span><span class="p">)),</span>
</span><span class='line'>      <span class="p">[</span><span class="nv">Node</span><span class="nl">#node.address</span> <span class="p">||</span> <span class="p">{_</span><span class="nv">Dist</span><span class="p">,</span> <span class="nv">Node</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Closest</span><span class="p">]</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
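<p>The code that chains <em>bit_tree:iter</em> and <em>bucket:nearest</em> together is not shown above, so here is a rough sketch of how it could work, assuming only the iterator shape that <em>bit_tree:iter/2</em> returns (a zero-arity fun yielding either <em>done</em> or <em>{Bucket, NextIter}</em>). Because XOR distance is decided by the most significant differing bit, every node in an earlier bucket should be closer to the end than any node in a later one, so taking nodes bucket by bucket until <em>K</em> have been found ought to give the <em>K</em> nearest overall. The module <em>nearest_sketch</em> and its functions are hypothetical, not taken from the repo.</p>

```erlang
-module(nearest_sketch).
-export([k_nearest/3, list_iter/1]).

%% Hypothetical helper (not from erl-telehash): pull buckets from an iterator
%% shaped like the one bit_tree:iter/2 returns and collect at most K results.
%% NearestFun(N, Bucket) must return at most N results from Bucket; in the
%% router it would presumably wrap bucket:nearest(N, End, Bucket).
k_nearest(K, Iter, NearestFun) when is_integer(K), K >= 0 ->
    collect(K, Iter, NearestFun, []).

%% Acc holds one list of results per visited bucket, most recent first.
collect(0, _Iter, _NearestFun, Acc) ->
    lists:append(lists:reverse(Acc));
collect(K, Iter, NearestFun, Acc) ->
    case Iter() of
        done ->
            %% ran out of buckets before finding K results
            lists:append(lists:reverse(Acc));
        {Bucket, Next} ->
            Found = NearestFun(K, Bucket),
            collect(K - length(Found), Next, NearestFun, [Found | Acc])
    end.

%% Test helper: turn a list of buckets, already in distance order, into an
%% iterator of the same shape as bit_tree:iter/2 produces.
list_iter([]) -> fun() -> done end;
list_iter([Bucket | Rest]) -> fun() -> {Bucket, list_iter(Rest)} end.
```

<p>In the router this might be invoked as <em>nearest_sketch:k_nearest(?K, bit_tree:iter(End, Tree), fun(N, B) -&gt; bucket:nearest(N, End, B) end)</em>, where <em>Tree</em> is the router's bit_tree; that wiring is an assumption rather than the repo's actual code.</p>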


<p>As usual, all the code is in the <a href="https://github.com/jamii/erl-telehash">repo</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: bit_trees revisited]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/24/telehash-bittrees-revisited/"/>
    <updated>2011-03-24T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/24/telehash-bittrees-revisited</id>
    <content type="html"><![CDATA[<p>It has been suggested that the bit_trees presented in my last post are overly complicated. Indeed, in the cold light of the morning there is absolutely no need for that zipper. Without further ado, here is the much simpler version.</p>

<!--more-->




<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers">
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% implements the tree part of Kademlia&#39;s k-buckets</span>
</span><span class='line'><span class="c">% a bit_tree maps ends (lists of bits) to buckets</span>
</span><span class='line'><span class="c">% as far as the bit_tree is concerned the buckets are completely opaque</span>
</span><span class='line'><span class="c">% the bit_tree also calculates various numbers needed for splitting decisions</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">module</span><span class="p">(</span><span class="n">bit_tree</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">include</span><span class="p">(</span><span class="s">&quot;conf.hrl&quot;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">empty</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">update</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span> <span class="n">iter</span><span class="o">/</span><span class="mi">2</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% a bit_tree is either a leaf or a branch</span>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">leaf</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="nb">size</span><span class="p">,</span> <span class="c">% size of bucket</span>
</span><span class='line'>    <span class="n">bucket</span> <span class="c">% some opaque bucket of stuff</span>
</span><span class='line'>   <span class="p">}).</span>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">branch</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="nb">size</span><span class="p">,</span> <span class="c">% size(childF) + size(childT)</span>
</span><span class='line'>    <span class="n">childF</span><span class="p">,</span> <span class="c">% tree containing nodes whose next bit is false</span>
</span><span class='line'>    <span class="n">childT</span> <span class="c">% tree containing nodes whose next bit is true</span>
</span><span class='line'>   <span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- api ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">empty</span><span class="p">(</span><span class="nv">Size</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nl">#leaf</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">}.</span>
</span><span class='line'>      
</span><span class='line'><span class="nf">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="k">when</span> <span class="nb">is_function</span><span class="p">(</span><span class="nv">Fun</span><span class="p">),</span> <span class="nb">is_list</span><span class="p">(</span><span class="nv">Bits</span><span class="p">),</span> <span class="nb">is_list</span><span class="p">(</span><span class="nv">Self</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="nv">Self</span><span class="p">},</span> <span class="mi">0</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">,</span> <span class="nv">Depth</span><span class="p">,</span> <span class="nl">#leaf</span><span class="p">{</span><span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Gap_size</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">case</span> <span class="nv">Gap</span> <span class="k">of</span>
</span><span class='line'>      <span class="p">{</span><span class="n">gap</span><span class="p">,</span> <span class="nv">G</span><span class="p">}</span> <span class="o">-&gt;</span> <span class="nv">G</span><span class="p">;</span>
</span><span class='line'>      <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="p">_}</span> <span class="o">-&gt;</span> <span class="mi">0</span>
</span><span class='line'>  <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Fun</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Depth</span><span class="p">,</span> <span class="nv">Gap_size</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">));</span>
</span><span class='line'><span class="nf">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="nv">Self</span><span class="p">,</span> <span class="nv">Depth</span><span class="p">,</span> <span class="nl">#branch</span><span class="p">{</span><span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">[</span><span class="nv">Next</span><span class="p">|</span><span class="nv">Bits2</span><span class="p">]</span> <span class="o">=</span> <span class="nv">Bits</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Self2</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">case</span> <span class="nv">Self</span> <span class="k">of</span>
</span><span class='line'>      <span class="p">{</span><span class="n">gap</span><span class="p">,</span> <span class="p">_}</span> <span class="o">-&gt;</span> <span class="nv">Self</span><span class="p">;</span>
</span><span class='line'>      <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="p">[</span><span class="nv">Next</span><span class="p">|</span><span class="nv">Rest</span><span class="p">]}</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="nv">Rest</span><span class="p">};</span>
</span><span class='line'>      <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="p">[</span><span class="n">false</span><span class="p">|_]}</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="n">gap</span><span class="p">,</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">)};</span>
</span><span class='line'>      <span class="p">{</span><span class="n">self</span><span class="p">,</span> <span class="p">[</span><span class="n">true</span><span class="p">|_]}</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="n">gap</span><span class="p">,</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">)}</span>
</span><span class='line'>  <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Depth2</span> <span class="o">=</span> <span class="nv">Depth</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span>
</span><span class='line'>    <span class="k">case</span> <span class="nv">Next</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nv">ChildT2</span> <span class="o">=</span> <span class="n">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits2</span><span class="p">,</span> <span class="nv">Self2</span><span class="p">,</span> <span class="nv">Depth2</span><span class="p">,</span> <span class="nv">ChildT</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Size</span> <span class="o">=</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">)</span> <span class="o">+</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildT2</span><span class="p">),</span>
</span><span class='line'>      <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT2</span><span class="p">};</span>
</span><span class='line'>  <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nv">ChildF2</span> <span class="o">=</span> <span class="n">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span> <span class="nv">Bits2</span><span class="p">,</span> <span class="nv">Self2</span><span class="p">,</span> <span class="nv">Depth2</span><span class="p">,</span> <span class="nv">ChildF</span><span class="p">),</span>
</span><span class='line'>      <span class="nv">Size</span> <span class="o">=</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildF2</span><span class="p">)</span> <span class="o">+</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">),</span>
</span><span class='line'>      <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">childF</span><span class="o">=</span><span class="nv">ChildF2</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% iterate through buckets in ascending order of xor distance to Bits</span>
</span><span class='line'><span class="nf">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">,</span> <span class="k">fun</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">done</span> <span class="k">end</span><span class="p">).</span>
</span><span class='line'>          
</span><span class='line'><span class="nf">iter</span><span class="p">(_</span><span class="nv">Bits</span><span class="p">,</span> <span class="nl">#leaf</span><span class="p">{</span><span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">},</span> <span class="nv">Iter</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">fun</span> <span class="p">()</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">{</span><span class="nv">Bucket</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span><span class='line'><span class="nf">iter</span><span class="p">([</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Bits</span><span class="p">],</span> <span class="nl">#branch</span><span class="p">{</span><span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">},</span> <span class="nv">Iter</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nv">Bit</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">ChildT</span><span class="p">,</span> <span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">ChildF</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">));</span>
</span><span class='line'>  <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">ChildF</span><span class="p">,</span> <span class="n">iter</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">ChildT</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">))</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- internal functions ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">tree_size</span><span class="p">(</span><span class="nl">#leaf</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Size</span><span class="p">;</span>
</span><span class='line'><span class="nf">tree_size</span><span class="p">(</span><span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Size</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">bucket_update_to_tree</span><span class="p">({</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Size</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nl">#leaf</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">};</span>
</span><span class='line'><span class="nf">bucket_update_to_tree</span><span class="p">({</span><span class="n">split</span><span class="p">,</span> <span class="nv">SplitF</span><span class="p">,</span> <span class="nv">SplitT</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">ChildF</span> <span class="o">=</span> <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">SplitF</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">ChildT</span> <span class="o">=</span> <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">SplitT</span><span class="p">),</span>
</span><span class='line'>    <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">)</span><span class="o">+</span><span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">),</span> <span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- end ---</span>
</span></code></pre></td></tr></table></div></figure>


<p>And the corresponding test code.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% simple buckets used for testing bit_tree</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">module</span><span class="p">(</span><span class="n">test_bucket</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">include</span><span class="p">(</span><span class="s">&quot;conf.hrl&quot;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">bits</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">add</span><span class="o">/</span><span class="mi">3</span><span class="p">,</span> <span class="n">split</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">add_to_tree</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">make_tree</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">distance</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">list_from</span><span class="o">/</span><span class="mi">2</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">MAX_SIZE</span><span class="p">,</span> <span class="mi">3</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">BITS</span><span class="p">,</span> <span class="o">?</span><span class="nv">END_BITS</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="o">&lt;&lt;</span><span class="nv">Int</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">split</span><span class="p">([{</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">}</span> <span class="p">|</span> <span class="nv">Bucket</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">split</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">if</span>
</span><span class='line'>  <span class="nb">length</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">&gt;</span> <span class="o">?</span><span class="nv">MAX_SIZE</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nv">BucketF</span> <span class="o">=</span> <span class="p">[{</span><span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Int2</span><span class="p">}</span> <span class="p">||</span> <span class="p">{[</span><span class="n">false</span> <span class="p">|</span> <span class="nv">Suffix2</span><span class="p">],</span> <span class="nv">Int2</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">],</span>
</span><span class='line'>      <span class="nv">BucketT</span> <span class="o">=</span> <span class="p">[{</span><span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Int2</span><span class="p">}</span> <span class="p">||</span> <span class="p">{[</span><span class="n">true</span> <span class="p">|</span> <span class="nv">Suffix2</span><span class="p">],</span> <span class="nv">Int2</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">],</span>
</span><span class='line'>      <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="n">split</span><span class="p">(</span><span class="nv">BucketF</span><span class="p">),</span> <span class="n">split</span><span class="p">(</span><span class="nv">BucketT</span><span class="p">)};</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nb">length</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span> <span class="nv">Bucket</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add_to_tree</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="p">_</span><span class="nv">Depth</span><span class="p">,</span> <span class="p">_</span><span class="nv">Gap_size</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="n">add</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span>
</span><span class='line'>      <span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span> <span class="c">% don&#39;t care about gap for now</span>
</span><span class='line'>      <span class="nv">Tree</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">make_tree</span><span class="p">(</span><span class="nv">Ints</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Tree</span> <span class="o">=</span> <span class="nn">bit_tree</span><span class="p">:</span><span class="n">empty</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[]),</span>
</span><span class='line'>    <span class="nn">lists</span><span class="p">:</span><span class="n">foldl</span><span class="p">(</span><span class="k">fun</span> <span class="n">add_to_tree</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">,</span> <span class="nv">Ints</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">distance</span><span class="p">(</span><span class="nv">IntA</span><span class="p">,</span> <span class="nv">IntB</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">({</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="o">&lt;&lt;</span><span class="nv">IntA</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">},</span> <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="o">&lt;&lt;</span><span class="nv">IntB</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% output *should* be in ascending order</span>
</span><span class='line'><span class="nf">list_from</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">List</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">iter_to_list</span><span class="p">(</span><span class="nn">bit_tree</span><span class="p">:</span><span class="n">iter</span><span class="p">(</span><span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span> <span class="nv">Tree</span><span class="p">)),</span>
</span><span class='line'>    <span class="nn">lists</span><span class="p">:</span><span class="n">map</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="nn">lists</span><span class="p">:</span><span class="n">sort</span><span class="p">([{</span><span class="n">distance</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Elem</span><span class="p">),</span> <span class="nv">Elem</span><span class="p">}</span> <span class="p">||</span> <span class="p">{_,</span><span class="nv">Elem</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">])</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">List</span><span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: bit_trees]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/23/telehash-bittrees/"/>
    <updated>2011-03-23T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/23/telehash-bittrees</id>
    <content type="html"><![CDATA[<p>The next step in building a switch is managing a routing table. Actually, the next step is handling sessions via <code>_ring</code>/<code>_line</code>, but I&#8217;m still mulling over the protocol, so we&#8217;ll skip to the routing table.</p>

<!--more-->


<p>I&#8217;ll add the usual &#8216;I don&#8217;t understand Kademlia and I didn&#8217;t test my code&#8217; disclaimer in here.</p>

<p>Routing in the Kademlia paper is described using what can best be called the &#8216;mash everything together and be vague about the details&#8217; pattern. I want my switch to be a bit cleaner than that so I&#8217;ve split it into three modules. The first of these is the bit_tree.</p>

<p>The bit_tree is a suffix tree which maps ends (lists of bits) to buckets. The bit_tree neither knows nor cares what a bucket is, and for now you don&#8217;t either. The utility of this tree comes down to one important property: the floor of the log (base 2) of the XOR distance between two ends is one less than the height of the smallest sub-tree which contains both of them. Got that? For example, if log2(distance(EndA,EndB)) == 7.234&#8230; then EndA and EndB first differ at the bit worth 2<sup>7</sup>, so the smallest sub-tree containing both of them has a height of 8: one level for that bit and one for each of the 7 bits after it. This makes it easy to locate the nearest known nodes to a specified end, something we are supposed to do in response to a <em>.see</em> command.</p>
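
<p>A quick way to convince yourself of this property is to check it numerically. The following is a minimal stand-alone sketch; the module name <code>xor_height</code> and its functions are made up for illustration and are not part of the switch. It assumes an end is a list of booleans, most significant bit first, computes the XOR distance between two ends, and compares the floor of its base-2 log with the number of bits left after the shared prefix, which is the height of the smallest sub-tree containing both ends.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical stand-alone module, not part of the switch.
%% Ends are lists of booleans, most significant bit first.
-module(xor_height).
-export([distance/2, floor_log2/1, subtree_height/2]).

%% Interpret an end as an unsigned integer.
to_int(Bits) -&gt;
    lists:foldl(fun (true, Acc) -&gt; Acc * 2 + 1;
                    (false, Acc) -&gt; Acc * 2
                end, 0, Bits).

%% XOR distance between two ends of the same length.
distance(EndA, EndB) -&gt;
    to_int(EndA) bxor to_int(EndB).

%% floor(log2(N)) for N &gt;= 1, using integer shifts to avoid float rounding.
%% (Identical ends have distance 0, where the log is undefined.)
floor_log2(1) -&gt; 0;
floor_log2(N) when N &gt; 1 -&gt; 1 + floor_log2(N bsr 1).

%% Height of the smallest sub-tree containing both ends: strip the shared
%% prefix, then count the bits that remain (the first differing bit included).
subtree_height([Bit | RestA], [Bit | RestB]) -&gt; subtree_height(RestA, RestB);
subtree_height(RestA, _RestB) -&gt; length(RestA).
</code></pre></div></figure>

<p>For example, with 10-bit ends where EndA is all zeros and EndB encodes 150, <code>distance/2</code> gives 150, <code>floor_log2/1</code> gives 7 and <code>subtree_height/2</code> gives 8.</p>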

<p>So here is a bog-standard binary suffix tree:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% a bit_tree is either a leaf or a branch</span>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">leaf</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="nb">size</span><span class="p">,</span> <span class="c">% size of bucket</span>
</span><span class='line'>    <span class="n">bucket</span> <span class="c">% some opaque bucket of stuff</span>
</span><span class='line'>   <span class="p">}).</span>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">branch</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="nb">size</span><span class="p">,</span> <span class="c">% size(childF) + size(childT)</span>
</span><span class='line'>    <span class="n">childF</span><span class="p">,</span> <span class="c">% tree containing nodes whose next bit is false</span>
</span><span class='line'>    <span class="n">childT</span> <span class="c">% tree containing nodes whose next bit is true</span>
</span><span class='line'>   <span class="p">}).</span>
</span></code></pre></td></tr></table></div></figure>
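
<p>To make &#8216;maps ends to buckets&#8217; concrete, here is a minimal sketch of following an end down a tree built from these two records until it reaches a leaf. <code>lookup/2</code> and <code>example/0</code> are hypothetical helpers written for illustration; the real module moves around the tree with the finger structure instead.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical illustration module; only the records come from the post.
-module(bit_tree_lookup).
-export([lookup/2, example/0]).

-record(leaf, {size, bucket}).
-record(branch, {size, childF, childT}).

%% Walk the end bit by bit. Returns {Suffix, Bucket}: the bits left
%% unconsumed when we hit a leaf, and that leaf&#39;s bucket.
lookup(Bits, #leaf{bucket = Bucket}) -&gt;
    {Bits, Bucket};
lookup([false | Bits], #branch{childF = ChildF}) -&gt;
    lookup(Bits, ChildF);
lookup([true | Bits], #branch{childT = ChildT}) -&gt;
    lookup(Bits, ChildT).

%% A tree split once on the first bit: ends starting with false land in
%% bucket [a], ends starting with true land in bucket [b, c].
example() -&gt;
    #branch{size = 3,
            childF = #leaf{size = 1, bucket = [a]},
            childT = #leaf{size = 2, bucket = [b, c]}}.
</code></pre></div></figure>

<p>With that example tree, <code>lookup([false, true, false], example())</code> returns <code>{[true, false], [a]}</code>: the first bit selects <em>childF</em>, and the remaining bits are the suffix handed to the bucket.</p>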


<p>When adding nodes to a bucket, we need to keep track of certain numbers which will be used by the router to decide when to split buckets. Some of these are quite complicated, so to make this easier we will work with a zipper-like structure instead of using <em>leaf</em> and <em>branch</em> directly. If you know what a zipper is, the code in this post will make sense. If you don&#8217;t know what a zipper is, go find out. When you come back, the code in this post will make sense.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% zipper-esque structure marking a position in a bit_tree</span>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">finger</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>    <span class="n">sizer</span><span class="p">,</span> <span class="c">% a size function for buckets</span>
</span><span class='line'>    <span class="n">tree</span><span class="p">,</span> <span class="c">% current sub-tree</span>
</span><span class='line'>    <span class="n">self</span><span class="p">,</span> <span class="c">% the path *to* self (the node&#39;s own end). Either {down, Down_bits} or {up, Up_bits, Down_bits, Gap}</span>
</span><span class='line'>      <span class="c">% where Gap is the size of the largest tree containing self but not touching this finger</span>
</span><span class='line'>    <span class="n">depth</span><span class="p">,</span> <span class="c">% the number of bits away from the root tree</span>
</span><span class='line'>    <span class="n">zipper</span> <span class="c">% a list of {Bit, Tree} pairs marking branches NOT taken</span>
</span><span class='line'>   <span class="p">}).</span>
</span></code></pre></td></tr></table></div></figure>
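
<p>If you would rather see the zipper idea in isolation first, here is a minimal sketch over the same <em>leaf</em> and <em>branch</em> records, without the sizer, self or depth bookkeeping. A position is a pair of the current sub-tree and a list of <code>{Bit, Tree}</code> pairs for the branches not taken, innermost first, as in the finger&#8217;s <em>zipper</em> field. The module and function names are made up for illustration.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical plain zipper over bit_tree records (no gap tracking).
-module(bit_zipper).
-export([down/2, up/1, root/1]).

-record(leaf, {size, bucket}).
-record(branch, {size, childF, childT}).

tree_size(#leaf{size = Size}) -&gt; Size;
tree_size(#branch{size = Size}) -&gt; Size.

%% Step into one child of a branch, remembering the sibling we skipped
%% together with its bit. There is no clause for leaves: you cannot go lower.
down(false, {#branch{childF = ChildF, childT = ChildT}, Path}) -&gt;
    {ChildF, [{true, ChildT} | Path]};
down(true, {#branch{childF = ChildF, childT = ChildT}, Path}) -&gt;
    {ChildT, [{false, ChildF} | Path]}.

%% Step back up, rebuilding the parent branch (and its size) from the
%% current sub-tree and the stored sibling.
up({Tree, [{false, Missed} | Path]}) -&gt;
    Size = tree_size(Tree) + tree_size(Missed),
    {#branch{size = Size, childF = Missed, childT = Tree}, Path};
up({Tree, [{true, Missed} | Path]}) -&gt;
    Size = tree_size(Tree) + tree_size(Missed),
    {#branch{size = Size, childF = Tree, childT = Missed}, Path}.

%% Climb all the way back to the root tree.
root({Tree, []}) -&gt; Tree;
root(Position) -&gt; root(up(Position)).
</code></pre></div></figure>

<p>Going down costs nothing but a cons. Going up rebuilds only the branch nodes on the path back to the root, recomputing their sizes, so replacing one leaf never requires copying the rest of the tree.</p>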


<p>The finger keeps track of where the node&#8217;s own end is located in the tree in order to calculate something I have termed the gap: the size of the largest sub-tree containing the node&#8217;s own end but not touching the finger.</p>
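
<p>The gap only needs to be computed at one moment: the first step where the finger&#8217;s path disagrees with the node&#8217;s own end. Up to that point every sub-tree the finger is in also contains the node&#8217;s own end; at that step, the sibling branch the finger skips is exactly the largest sub-tree that contains the node&#8217;s own end but not the finger. The sketch below (a hypothetical <code>gap/3</code>, written for illustration rather than taken from the switch) computes just that number for a finger that walks <code>Path</code> down from the root, assuming <code>Path</code> is long enough to reach a leaf.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>%% Hypothetical illustration of the gap calculation on its own.
-module(bit_gap).
-export([gap/3]).

-record(leaf, {size, bucket}).
-record(branch, {size, childF, childT}).

tree_size(#leaf{size = Size}) -&gt; Size;
tree_size(#branch{size = Size}) -&gt; Size.

child(false, ChildF, _ChildT) -&gt; ChildF;
child(true, _ChildF, ChildT) -&gt; ChildT.

%% Returns {gap, Size}, or still_in_gap if the finger reaches a leaf
%% without ever leaving the path to Self.
gap(_Path, _Self, #leaf{}) -&gt;
    still_in_gap;
gap([Bit | Path], [Bit | Self], #branch{childF = ChildF, childT = ChildT}) -&gt;
    %% Same bit as Self: keep descending along Self&#39;s path.
    gap(Path, Self, child(Bit, ChildF, ChildT));
gap([Next | _Path], [_SelfBit | _Self], #branch{childF = ChildF, childT = ChildT}) -&gt;
    %% Diverging: the branch we do not take is the one containing Self.
    {gap, tree_size(child(not Next, ChildF, ChildT))}.
</code></pre></div></figure>

<p>This mirrors the <code>{down, ...}</code> to <code>{up, ...}</code> transition in <code>extend</code>, which records <code>tree_size(Branch_missed)</code> at the same step and then carries it along unchanged.</p>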

<p>The empty bit_tree is easy to define:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">empty</span><span class="p">(</span><span class="nv">Self</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">,</span> <span class="nv">Sizer</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>       <span class="n">sizer</span> <span class="o">=</span> <span class="nv">Sizer</span><span class="p">,</span>
</span><span class='line'>       <span class="n">tree</span> <span class="o">=</span> <span class="nl">#leaf</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Sizer</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span> <span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">},</span>
</span><span class='line'>       <span class="n">self</span> <span class="o">=</span> <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="nv">Self</span><span class="p">},</span>
</span><span class='line'>       <span class="n">depth</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
</span><span class='line'>       <span class="n">zipper</span> <span class="o">=</span> <span class="p">[]</span>
</span><span class='line'>      <span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>


<p>Moving around within the tree is a little more complicated but if you already went away and read about zippers it should feel familiar. Most of the work is in keeping track of the gap.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
<span class='line-number'>54</span>
<span class='line-number'>55</span>
<span class='line-number'>56</span>
<span class='line-number'>57</span>
<span class='line-number'>58</span>
<span class='line-number'>59</span>
<span class='line-number'>60</span>
<span class='line-number'>61</span>
<span class='line-number'>62</span>
<span class='line-number'>63</span>
<span class='line-number'>64</span>
<span class='line-number'>65</span>
<span class='line-number'>66</span>
<span class='line-number'>67</span>
<span class='line-number'>68</span>
<span class='line-number'>69</span>
<span class='line-number'>70</span>
<span class='line-number'>71</span>
<span class='line-number'>72</span>
<span class='line-number'>73</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">extend</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nl">#finger</span><span class="p">{</span><span class="n">tree</span><span class="o">=</span><span class="nl">#leaf</span><span class="p">{}}</span><span class="o">=</span><span class="nv">Finger</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="c">% must always end on a leaf</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Finger</span><span class="p">};</span>
</span><span class='line'><span class="nf">extend</span><span class="p">([</span><span class="nv">Next</span> <span class="p">|</span> <span class="nv">Bits</span><span class="p">],</span>
</span><span class='line'>       <span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>   <span class="n">tree</span> <span class="o">=</span> <span class="nl">#branch</span><span class="p">{</span><span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">},</span>
</span><span class='line'>   <span class="n">self</span> <span class="o">=</span> <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>   <span class="n">depth</span> <span class="o">=</span> <span class="nv">Depth</span><span class="p">,</span>
</span><span class='line'>   <span class="n">zipper</span> <span class="o">=</span> <span class="nv">Zipper</span>
</span><span class='line'>  <span class="p">}</span><span class="o">=</span><span class="nv">Finger</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Branch_taken</span><span class="p">,</span> <span class="nv">Branch_missed</span><span class="p">}</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">case</span> <span class="nv">Next</span> <span class="k">of</span>
</span><span class='line'>      <span class="n">false</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="nv">ChildF</span><span class="p">,</span> <span class="nv">ChildT</span><span class="p">};</span>
</span><span class='line'>      <span class="n">true</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="nv">ChildT</span><span class="p">,</span> <span class="nv">ChildF</span><span class="p">}</span>
</span><span class='line'>  <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Self2</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">case</span> <span class="nv">Self</span> <span class="k">of</span>
</span><span class='line'>      <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="nv">Up</span><span class="p">,</span> <span class="nv">Down</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% already stepped out of gap</span>
</span><span class='line'>      <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="p">[</span><span class="ow">not</span><span class="p">(</span><span class="nv">Next</span><span class="p">)|</span><span class="nv">Up</span><span class="p">],</span> <span class="nv">Down</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">};</span>
</span><span class='line'>      <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="p">[</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Down</span><span class="p">]}</span> <span class="k">when</span> <span class="nv">Bit</span> <span class="o">==</span> <span class="nv">Next</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% still in the gap</span>
</span><span class='line'>      <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="nv">Down</span><span class="p">};</span>
</span><span class='line'>      <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="p">[</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Down</span><span class="p">]}</span> <span class="k">when</span> <span class="nv">Bit</span> <span class="o">/=</span> <span class="nv">Next</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="c">% leaving gap, check its size</span>
</span><span class='line'>      <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="p">[</span><span class="ow">not</span><span class="p">(</span><span class="nv">Next</span><span class="p">)],</span> <span class="p">[</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Down</span><span class="p">],</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">Branch_missed</span><span class="p">)}</span>
</span><span class='line'>  <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Depth2</span> <span class="o">=</span> <span class="nv">Depth</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Zipper2</span> <span class="o">=</span> <span class="p">[{</span><span class="ow">not</span><span class="p">(</span><span class="nv">Next</span><span class="p">),</span> <span class="nv">Branch_missed</span><span class="p">}</span> <span class="p">|</span> <span class="nv">Zipper</span><span class="p">],</span>
</span><span class='line'>    <span class="nv">Finger2</span> <span class="o">=</span> <span class="nv">Finger</span><span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>      <span class="n">tree</span> <span class="o">=</span> <span class="nv">Branch_taken</span><span class="p">,</span>
</span><span class='line'>      <span class="n">self</span> <span class="o">=</span> <span class="nv">Self2</span><span class="p">,</span>
</span><span class='line'>      <span class="n">depth</span> <span class="o">=</span> <span class="nv">Depth2</span><span class="p">,</span>
</span><span class='line'>      <span class="n">zipper</span> <span class="o">=</span> <span class="nv">Zipper2</span>
</span><span class='line'>     <span class="p">},</span>
</span><span class='line'>    <span class="n">extend</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Finger2</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">retract</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nv">Finger</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Finger</span><span class="p">;</span>
</span><span class='line'><span class="nf">retract</span><span class="p">(</span><span class="nv">N</span><span class="p">,</span>
</span><span class='line'>  <span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>    <span class="n">tree</span> <span class="o">=</span> <span class="nv">Tree</span><span class="p">,</span>
</span><span class='line'>    <span class="n">self</span> <span class="o">=</span> <span class="nv">Self</span><span class="p">,</span>
</span><span class='line'>    <span class="n">depth</span> <span class="o">=</span> <span class="nv">Depth</span><span class="p">,</span>
</span><span class='line'>    <span class="n">zipper</span> <span class="o">=</span> <span class="p">[{</span><span class="nv">Last</span><span class="p">,</span><span class="nv">Branch</span><span class="p">}|</span><span class="nv">Zipper</span><span class="p">]</span>
</span><span class='line'>   <span class="p">}</span><span class="o">=</span><span class="nv">Finger</span><span class="p">)</span> <span class="k">when</span> <span class="nv">N</span><span class="o">&gt;</span><span class="mi">0</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Size</span> <span class="o">=</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">Tree</span><span class="p">)</span> <span class="o">+</span> <span class="n">tree_size</span><span class="p">(</span><span class="nv">Branch</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Tree2</span> <span class="o">=</span>
</span><span class='line'>        <span class="k">case</span> <span class="nv">Last</span> <span class="k">of</span>
</span><span class='line'>            <span class="n">false</span> <span class="o">-&gt;</span> <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">childF</span><span class="o">=</span><span class="nv">Branch</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">Tree</span><span class="p">};</span>
</span><span class='line'>            <span class="n">true</span> <span class="o">-&gt;</span> <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Size</span><span class="p">,</span> <span class="n">childF</span><span class="o">=</span><span class="nv">Tree</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">Branch</span><span class="p">}</span>
</span><span class='line'>        <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Self2</span> <span class="o">=</span>
</span><span class='line'>        <span class="k">case</span> <span class="nv">Self</span> <span class="k">of</span>
</span><span class='line'>            <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="nv">Down</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>                <span class="c">% already in gap</span>
</span><span class='line'>                <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="p">[</span><span class="nv">Last</span><span class="p">|</span><span class="nv">Down</span><span class="p">]};</span>
</span><span class='line'>            <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="p">[],</span> <span class="nv">Down</span><span class="p">,</span> <span class="p">_</span><span class="nv">Gap</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>                <span class="c">% just entered gap</span>
</span><span class='line'>                <span class="p">{</span><span class="n">down</span><span class="p">,</span> <span class="p">[</span><span class="nv">Last</span><span class="p">|</span><span class="nv">Down</span><span class="p">]};</span>
</span><span class='line'>            <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="p">[</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Up</span><span class="p">],</span> <span class="nv">Down</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>                <span class="c">% still outside gap</span>
</span><span class='line'>                <span class="n">true</span> <span class="o">=</span> <span class="p">(</span><span class="nv">Bit</span><span class="o">==</span><span class="nv">Last</span><span class="p">),</span> <span class="c">% assert</span>
</span><span class='line'>                <span class="p">{</span><span class="n">up</span><span class="p">,</span> <span class="nv">Up</span><span class="p">,</span> <span class="nv">Down</span><span class="p">,</span> <span class="nv">Gap</span><span class="p">}</span>
</span><span class='line'>        <span class="k">end</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Depth2</span> <span class="o">=</span> <span class="nv">Depth</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Finger2</span> <span class="o">=</span>
</span><span class='line'>        <span class="nv">Finger</span><span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>          <span class="n">tree</span><span class="o">=</span><span class="nv">Tree2</span><span class="p">,</span>
</span><span class='line'>          <span class="n">self</span><span class="o">=</span><span class="nv">Self2</span><span class="p">,</span>
</span><span class='line'>          <span class="n">depth</span><span class="o">=</span><span class="nv">Depth2</span><span class="p">,</span>
</span><span class='line'>          <span class="n">zipper</span><span class="o">=</span><span class="nv">Zipper</span>
</span><span class='line'>         <span class="p">},</span>
</span><span class='line'>    <span class="n">retract</span><span class="p">(</span><span class="nv">N</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="nv">Finger2</span><span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>
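<p>The <em>extend</em> and <em>retract</em> functions form a zipper over the bit_tree. Going down records each branch that was not taken, and going up rebuilds the parent from that record. Below is a stripped-down sketch of that mechanism alone. It assumes plain tuples <em>{leaf, Value}</em> and <em>{branch, ChildF, ChildT}</em> in place of the real records, and it omits the size, depth and self bookkeeping. The module and function names are made up for illustration:</p>

```erlang
-module(zipper_sketch).
-export([down/2, up/1, root/1]).

%% A cut-down binary-tree zipper in the style of extend/retract, without the
%% size, depth and self bookkeeping. Trees are {leaf, Value} or
%% {branch, ChildF, ChildT}. A focus is {Tree, Zipper}, where each zipper
%% entry is {BitOfMissedBranch, MissedBranch}, innermost first.

%% Step into the child selected by Bit, remembering the sibling we skipped.
down(false, {{branch, ChildF, ChildT}, Zipper}) ->
    {ChildF, [{true, ChildT} | Zipper]};
down(true, {{branch, ChildF, ChildT}, Zipper}) ->
    {ChildT, [{false, ChildF} | Zipper]}.

%% Step back up, rebuilding the parent branch from the remembered sibling.
up({Tree, [{false, Missed} | Zipper]}) ->
    {{branch, Missed, Tree}, Zipper};
up({Tree, [{true, Missed} | Zipper]}) ->
    {{branch, Tree, Missed}, Zipper}.

%% Climb all the way back to the root, as retract(Depth, Finger) does.
root({Tree, []}) ->
    Tree;
root(Focus) ->
    root(up(Focus)).
```

<p>A change at a leaf propagates back up when you swap the focused subtree and then call <em>root/1</em>. The real <em>retract</em> also recomputes the branch size at each step.</p>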


<p>The <em>extend</em> and <em>retract</em> functions are only used internally. We export a much simpler function, <em>move_to</em>. It moves the finger to the bucket covering the specified end and returns the remaining bits of that end (the suffix below the bucket) along with the updated finger.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">move_to</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="nl">#finger</span><span class="p">{</span><span class="n">depth</span><span class="o">=</span><span class="nv">Depth</span><span class="p">}</span><span class="o">=</span><span class="nv">Finger</span><span class="p">)</span> <span class="k">when</span> <span class="nb">length</span><span class="p">(</span><span class="nv">Bits</span><span class="p">)</span> <span class="o">==</span> <span class="o">?</span><span class="nv">END_BITS</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="c">% !!! naive version</span>
</span><span class='line'>    <span class="n">extend</span><span class="p">(</span><span class="nv">Bits</span><span class="p">,</span> <span class="n">retract</span><span class="p">(</span><span class="nv">Depth</span><span class="p">,</span> <span class="nv">Finger</span><span class="p">)).</span>
</span></code></pre></td></tr></table></div></figure>


<p>We could make this more efficient by retracting only until the path to the finger is a prefix of <em>Bits</em>, rather than all the way to the root. For now I don&#8217;t expect performance of the bit_tree to be an issue.</p>
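<p>Here is a minimal sketch of how far the finger would need to retract. It assumes, as in <em>extend</em> above, that each zipper entry is <em>{BitOfMissedBranch, MissedBranch}</em> with the innermost entry first. It also assumes the depth equals the zipper length. The helper names are hypothetical and not part of bit_tree:</p>

```erlang
-module(retract_sketch).
-export([path_bits/1, common_prefix_length/2, steps_to_retract/2]).

%% Read the path from the root to the current bucket back out of the zipper.
%% Each entry records the bit of the branch we did *not* take, innermost
%% first, so the bit we did take is its negation.
path_bits(Zipper) ->
    lists:reverse([not Missed || {Missed, _Branch} <- Zipper]).

%% Number of leading bits two bit lists have in common.
common_prefix_length([B | As], [B | Bs]) ->
    1 + common_prefix_length(As, Bs);
common_prefix_length(_, _) ->
    0.

%% How many levels to retract before extending towards Bits,
%% instead of retracting all the way to the root.
steps_to_retract(Bits, Zipper) ->
    length(Zipper) - common_prefix_length(path_bits(Zipper), Bits).
```

<p>Whether <em>extend</em> could then be reused unchanged depends on whether it expects all of <em>Bits</em> or only the bits below the current depth.</p>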

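<p>Now that we can find buckets we can modify them. Deciding when to split buckets is not the concern of the bit_tree, so we delegate it to the caller. The update function returns either <em>{ok, Bucket}</em> or <em>{split, SplitF, SplitT}</em>, where each half is again one of these two forms. <em>bucket_update_to_tree</em> then rebuilds the subtree and its sizes from the result.</p>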

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">update</span><span class="p">(</span><span class="nv">Fun</span><span class="p">,</span>
</span><span class='line'>       <span class="nl">#finger</span><span class="p">{</span>
</span><span class='line'>   <span class="n">sizer</span><span class="o">=</span><span class="nv">Sizer</span><span class="p">,</span>
</span><span class='line'>   <span class="n">tree</span><span class="o">=</span><span class="nl">#leaf</span><span class="p">{</span><span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">}</span>
</span><span class='line'>  <span class="p">}</span><span class="o">=</span><span class="nv">Finger</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Tree</span> <span class="o">=</span> <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Sizer</span><span class="p">,</span> <span class="nv">Fun</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)),</span>
</span><span class='line'>    <span class="nv">Finger</span><span class="nl">#finger</span><span class="p">{</span><span class="n">tree</span><span class="o">=</span><span class="nv">Tree</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Sizer</span><span class="p">,</span> <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nl">#leaf</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="nv">Sizer</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">),</span> <span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">};</span>
</span><span class='line'><span class="nf">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Sizer</span><span class="p">,</span> <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="nv">SplitF</span><span class="p">,</span> <span class="nv">SplitT</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">ChildF</span> <span class="o">=</span> <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Sizer</span><span class="p">,</span> <span class="nv">SplitF</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">ChildT</span> <span class="o">=</span> <span class="n">bucket_update_to_tree</span><span class="p">(</span><span class="nv">Sizer</span><span class="p">,</span> <span class="nv">SplitT</span><span class="p">),</span>
</span><span class='line'>    <span class="nl">#branch</span><span class="p">{</span><span class="nb">size</span><span class="o">=</span><span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">)</span><span class="o">+</span><span class="n">tree_size</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">),</span> <span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>
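<p>The contract between the caller and <em>update</em> can be exercised on its own. This sketch mirrors <em>bucket_update_to_tree</em> using stand-in records, whose field order is an assumption. It also adds a hypothetical <em>sizes_consistent/2</em> check for the invariant that the cached sizes are meant to keep:</p>

```erlang
-module(update_sketch).
-export([to_tree/2, tree_size/1, sizes_consistent/2]).

%% Stand-in records: only the fields used in the post are included.
-record(leaf, {size, bucket}).
-record(branch, {size, childF, childT}).

tree_size(#leaf{size = Size}) -> Size;
tree_size(#branch{size = Size}) -> Size.

%% Same shape as bucket_update_to_tree/2: an update result is either
%% {ok, Bucket} or {split, ResultF, ResultT}, where the two halves are
%% themselves update results, so one update can split a bucket repeatedly.
to_tree(Sizer, {ok, Bucket}) ->
    #leaf{size = Sizer(Bucket), bucket = Bucket};
to_tree(Sizer, {split, ResultF, ResultT}) ->
    ChildF = to_tree(Sizer, ResultF),
    ChildT = to_tree(Sizer, ResultT),
    #branch{size = tree_size(ChildF) + tree_size(ChildT),
            childF = ChildF, childT = ChildT}.

%% Invariant: every leaf size is Sizer(Bucket) and every branch size
%% is the sum of its children.
sizes_consistent(Sizer, #leaf{size = Size, bucket = Bucket}) ->
    Size =:= Sizer(Bucket);
sizes_consistent(Sizer, #branch{size = Size, childF = F, childT = T}) ->
    sizes_consistent(Sizer, F) andalso sizes_consistent(Sizer, T)
        andalso Size =:= tree_size(F) + tree_size(T).
```

<p>A single update can therefore turn one leaf into a whole subtree, and the root size stays correct without rescanning the untouched buckets.</p>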


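<p>To handle <em>.see</em> commands, the <em>iter</em> function returns buckets in ascending order of XOR distance from the specified end. This relies on the aforementioned properties of the bit_tree. Under XOR distance, every bucket in the subtree that agrees with the target on the next bit is closer than every bucket in the sibling subtree. So visiting the matching child first, and then the branches missed on the way down from the innermost outwards, yields the buckets in order without any sorting.</p>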

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% iterate through buckets in ascending order of xor distance to (current position ++ Suffix)</span>
</span><span class='line'><span class="nf">iter</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nl">#finger</span><span class="p">{</span><span class="n">tree</span><span class="o">=</span><span class="nv">Tree</span><span class="p">,</span> <span class="n">zipper</span><span class="o">=</span><span class="nv">Zipper</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">Tree</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="n">iter_zipper</span><span class="p">(</span><span class="nv">Zipper</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">)).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% iterate through the buckets in the branches missed on the way down, innermost first, i.e. in ascending order of xor distance to (current position ++ Suffix)</span>
</span><span class='line'><span class="nf">iter_zipper</span><span class="p">([],</span> <span class="p">_</span><span class="nv">Suffix</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">fun</span> <span class="p">()</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">done</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span><span class='line'><span class="nf">iter_zipper</span><span class="p">([{</span><span class="nv">Bit</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">}</span> <span class="p">|</span> <span class="nv">Zipper</span><span class="p">],</span> <span class="nv">Suffix</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">Tree</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="n">iter_zipper</span><span class="p">(</span><span class="nv">Zipper</span><span class="p">,</span> <span class="p">[</span><span class="ow">not</span><span class="p">(</span><span class="nv">Bit</span><span class="p">)|</span><span class="nv">Suffix</span><span class="p">])).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% iterate through buckets in ascending order of xor distance to Bits, then hand over to Iter</span>
</span><span class='line'><span class="nf">iter_buckets</span><span class="p">(</span><span class="nl">#leaf</span><span class="p">{</span><span class="n">bucket</span><span class="o">=</span><span class="nv">Bucket</span><span class="p">},</span> <span class="p">_</span><span class="nv">Bits</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">fun</span> <span class="p">()</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="p">{</span><span class="nv">Bucket</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span><span class='line'><span class="nf">iter_buckets</span><span class="p">(</span><span class="nl">#branch</span><span class="p">{</span><span class="n">childF</span><span class="o">=</span><span class="nv">ChildF</span><span class="p">,</span> <span class="n">childT</span><span class="o">=</span><span class="nv">ChildT</span><span class="p">},</span> <span class="p">[</span><span class="nv">Bit</span><span class="p">|</span><span class="nv">Bits</span><span class="p">],</span> <span class="nv">Iter</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nv">Bit</span> <span class="k">of</span>
</span><span class='line'>  <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">));</span>
</span><span class='line'>  <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">ChildF</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="n">iter_buckets</span><span class="p">(</span><span class="nv">ChildT</span><span class="p">,</span> <span class="nv">Bits</span><span class="p">,</span> <span class="nv">Iter</span><span class="p">))</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>


<p>It will typically be called like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">{</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">}</span> <span class="o">=</span> <span class="nn">bit_tree</span><span class="p">:</span><span class="n">move_to</span><span class="p">(</span><span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="nv">End</span><span class="p">),</span> <span class="nv">Tree</span><span class="p">),</span>
</span><span class='line'><span class="nn">bit_tree</span><span class="p">:</span><span class="n">iter</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
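<p>The iterators built by <em>iter</em> are zero-argument funs that return either <em>done</em> or <em>{Bucket, NextIter}</em>. The sketch below shows how to consume them. <em>to_list/1</em> is presumably close to what <em>util:iter_to_list</em> does. <em>take/2</em> and <em>from_list/1</em> are hypothetical helpers; <em>take/2</em> is for when only the nearest few buckets are wanted:</p>

```erlang
-module(iter_sketch).
-export([to_list/1, take/2, from_list/1]).

%% Iterators are zero-argument funs returning done or {Item, NextIter}.

%% Drain an iterator into a list.
to_list(Iter) ->
    case Iter() of
        done -> [];
        {Item, Next} -> [Item | to_list(Next)]
    end.

%% Take at most N items, without calling the funs for the rest.
take(0, _Iter) ->
    [];
take(N, Iter) when N > 0 ->
    case Iter() of
        done -> [];
        {Item, Next} -> [Item | take(N - 1, Next)]
    end.

%% Build an iterator over a plain list, handy for testing consumers.
from_list([]) ->
    fun () -> done end;
from_list([Item | Rest]) ->
    fun () -> {Item, from_list(Rest)} end.
```

<p>Because the iterators are plain funs with no hidden state, the same iterator can be consumed several times.</p>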


<p>Splitting the routing table into separate structures like this makes testing easier. The bit_tree can be tested independently using very simple buckets whose elements are just integers and which split once they hold more than three elements.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% simple buckets used for testing bit_tree</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">module</span><span class="p">(</span><span class="n">test_bucket</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">include</span><span class="p">(</span><span class="s">&quot;conf.hrl&quot;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">bits</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">add</span><span class="o">/</span><span class="mi">3</span><span class="p">,</span> <span class="n">split</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">move_to</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">add_to_tree</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">make_tree</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">distance</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">move_list_from</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">list_from</span><span class="o">/</span><span class="mi">3</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">MAX_SIZE</span><span class="p">,</span> <span class="mi">3</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">BITS</span><span class="p">,</span> <span class="o">?</span><span class="nv">END_BITS</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">util</span><span class="p">:</span><span class="n">to_bits</span><span class="p">(</span><span class="o">&lt;&lt;</span><span class="nv">Int</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">split</span><span class="p">([{</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">}</span> <span class="p">|</span> <span class="nv">Bucket</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">split</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">if</span>
</span><span class='line'>        <span class="nb">length</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">&gt;</span> <span class="o">?</span><span class="nv">MAX_SIZE</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="nv">BucketF</span> <span class="o">=</span> <span class="p">[{</span><span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Int2</span><span class="p">}</span> <span class="p">||</span> <span class="p">{[</span><span class="n">false</span> <span class="p">|</span> <span class="nv">Suffix2</span><span class="p">],</span> <span class="nv">Int2</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">],</span>
</span><span class='line'>            <span class="nv">BucketT</span> <span class="o">=</span> <span class="p">[{</span><span class="nv">Suffix2</span><span class="p">,</span> <span class="nv">Int2</span><span class="p">}</span> <span class="p">||</span> <span class="p">{[</span><span class="n">true</span> <span class="p">|</span> <span class="nv">Suffix2</span><span class="p">],</span> <span class="nv">Int2</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">],</span>
</span><span class='line'>            <span class="p">{</span><span class="n">split</span><span class="p">,</span> <span class="n">split</span><span class="p">(</span><span class="nv">BucketF</span><span class="p">),</span> <span class="n">split</span><span class="p">(</span><span class="nv">BucketT</span><span class="p">)};</span>
</span><span class='line'>        <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">move_to</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">move_to</span><span class="p">(</span><span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span> <span class="nv">Tree</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add_to_tree</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">}</span> <span class="o">=</span> <span class="n">move_to</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">),</span>
</span><span class='line'>    <span class="nn">bit_tree</span><span class="p">:</span><span class="n">update</span><span class="p">(</span><span class="k">fun</span> <span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">add</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Int</span><span class="p">,</span> <span class="nv">Bucket</span><span class="p">)</span> <span class="k">end</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">make_tree</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Ints</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Tree</span> <span class="o">=</span> <span class="nn">bit_tree</span><span class="p">:</span><span class="n">empty</span><span class="p">(</span><span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span> <span class="p">[],</span> <span class="k">fun</span> <span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">length</span><span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="k">end</span><span class="p">),</span>
</span><span class='line'>    <span class="nn">lists</span><span class="p">:</span><span class="n">foldl</span><span class="p">(</span><span class="k">fun</span> <span class="n">add_to_tree</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">,</span> <span class="nv">Ints</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">distance</span><span class="p">(</span><span class="nv">IntA</span><span class="p">,</span> <span class="nv">IntB</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">({</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="o">&lt;&lt;</span><span class="nv">IntA</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">},</span> <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="o">&lt;&lt;</span><span class="nv">IntB</span><span class="p">:</span><span class="o">?</span><span class="nv">BITS</span><span class="o">&gt;&gt;</span><span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% output *should* be in ascending order</span>
</span><span class='line'><span class="nf">move_list_from</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">}</span> <span class="o">=</span> <span class="nn">bit_tree</span><span class="p">:</span><span class="n">move_to</span><span class="p">(</span><span class="n">bits</span><span class="p">(</span><span class="nv">Int</span><span class="p">),</span> <span class="nv">Tree</span><span class="p">),</span>
</span><span class='line'>    <span class="n">list_from</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree2</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">list_from</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">List</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">iter_to_list</span><span class="p">(</span><span class="nn">bit_tree</span><span class="p">:</span><span class="n">iter</span><span class="p">(</span><span class="nv">Suffix</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">)),</span>
</span><span class='line'>    <span class="nn">lists</span><span class="p">:</span><span class="n">map</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">(</span><span class="nv">Bucket</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>        <span class="nn">lists</span><span class="p">:</span><span class="n">sort</span><span class="p">([{</span><span class="n">distance</span><span class="p">(</span><span class="nv">Int</span><span class="p">,</span> <span class="nv">Elem</span><span class="p">),</span> <span class="nv">Elem</span><span class="p">}</span> <span class="p">||</span> <span class="p">{_,</span><span class="nv">Elem</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bucket</span><span class="p">])</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">List</span><span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>


<p>We can play around with the test buckets a bit:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="mi">25</span><span class="o">&gt;</span> <span class="nv">Tree</span> <span class="o">=</span> <span class="nn">test_bucket</span><span class="p">:</span><span class="n">make_tree</span><span class="p">(</span><span class="mi">47</span><span class="p">,</span> <span class="nn">lists</span><span class="p">:</span><span class="n">seq</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000</span><span class="p">)).</span>
</span><span class='line'><span class="p">{</span><span class="n">finger</span><span class="p">,</span><span class="err">#</span><span class="nv">Fun</span><span class="o">&lt;</span><span class="n">test_bucket</span><span class="p">.</span><span class="mi">1</span><span class="p">.</span><span class="mi">121651971</span><span class="o">&gt;</span><span class="p">,</span>
</span><span class='line'>        <span class="p">{</span><span class="n">leaf</span><span class="p">,</span><span class="mi">1</span><span class="p">,[{[</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">],</span><span class="mi">1000</span><span class="p">}]},</span>
</span><span class='line'>        <span class="p">{</span><span class="n">up</span><span class="p">,[</span><span class="n">false</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">false</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span>
</span><span class='line'>             <span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span>
</span><span class='line'>             <span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">|...],</span>
</span><span class='line'>            <span class="p">[</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span>
</span><span class='line'>             <span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">,</span><span class="n">true</span><span class="p">|...],</span>
</span><span class='line'>            <span class="mi">0</span><span class="p">},</span>
</span><span class='line'>        <span class="mi">157</span><span class="p">,</span>
</span><span class='line'>        <span class="p">[{</span><span class="n">false</span><span class="p">,{</span><span class="n">branch</span><span class="p">,</span><span class="mi">8</span><span class="p">,</span>
</span><span class='line'>                        <span class="p">{</span><span class="n">branch</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span>
</span><span class='line'>                                <span class="p">{</span><span class="n">leaf</span><span class="p">,</span><span class="mi">2</span><span class="p">,[{[</span><span class="n">true</span><span class="p">],</span><span class="mi">993</span><span class="p">},{[</span><span class="n">false</span><span class="p">],</span><span class="mi">992</span><span class="p">}]},</span>
</span><span class='line'>                                <span class="p">{</span><span class="n">leaf</span><span class="p">,</span><span class="mi">2</span><span class="p">,[{[</span><span class="n">true</span><span class="p">],</span><span class="mi">995</span><span class="p">},{[</span><span class="n">false</span><span class="p">],</span><span class="mi">994</span><span class="p">}]}},</span>
</span><span class='line'>                        <span class="p">{</span><span class="n">branch</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span>
</span><span class='line'>                                <span class="p">{</span><span class="n">leaf</span><span class="p">,</span><span class="mi">2</span><span class="p">,[{[</span><span class="n">true</span><span class="p">],</span><span class="mi">997</span><span class="p">},{[</span><span class="n">false</span><span class="p">],</span><span class="mi">996</span><span class="p">}]},</span>
</span><span class='line'>                                <span class="p">{</span><span class="n">leaf</span><span class="p">,</span><span class="mi">2</span><span class="p">,[{[</span><span class="n">true</span><span class="p">],</span><span class="mi">999</span><span class="p">},{[</span><span class="n">false</span><span class="p">],</span><span class="mi">998</span><span class="p">}]}}}},</span>
</span><span class='line'>         <span class="p">{...}|...]}</span>
</span><span class='line'><span class="mi">26</span><span class="o">&gt;</span> <span class="nv">List</span> <span class="o">=</span> <span class="nn">test_bucket</span><span class="p">:</span><span class="n">move_list_from</span><span class="p">(</span><span class="mi">657</span><span class="p">,</span> <span class="nv">Tree</span><span class="p">).</span>
</span><span class='line'><span class="p">[[{</span><span class="mi">0</span><span class="p">,</span><span class="mi">657</span><span class="p">},{</span><span class="mi">1</span><span class="p">,</span><span class="mi">656</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">2</span><span class="p">,</span><span class="mi">659</span><span class="p">},{</span><span class="mi">3</span><span class="p">,</span><span class="mi">658</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">4</span><span class="p">,</span><span class="mi">661</span><span class="p">},{</span><span class="mi">5</span><span class="p">,</span><span class="mi">660</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">6</span><span class="p">,</span><span class="mi">663</span><span class="p">},{</span><span class="mi">7</span><span class="p">,</span><span class="mi">662</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">8</span><span class="p">,</span><span class="mi">665</span><span class="p">},{</span><span class="mi">9</span><span class="p">,</span><span class="mi">664</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">10</span><span class="p">,</span><span class="mi">667</span><span class="p">},{</span><span class="mi">11</span><span class="p">,</span><span class="mi">666</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">12</span><span class="p">,</span><span class="mi">669</span><span class="p">},{</span><span class="mi">13</span><span class="p">,</span><span class="mi">668</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">14</span><span class="p">,</span><span class="mi">671</span><span class="p">},{</span><span class="mi">15</span><span class="p">,</span><span class="mi">670</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">16</span><span class="p">,</span><span class="mi">641</span><span class="p">},{</span><span class="mi">17</span><span class="p">,</span><span class="mi">640</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">18</span><span class="p">,</span><span class="mi">643</span><span class="p">},{</span><span class="mi">19</span><span class="p">,</span><span class="mi">642</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">20</span><span class="p">,</span><span class="mi">645</span><span class="p">},{</span><span class="mi">21</span><span class="p">,</span><span class="mi">644</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">22</span><span class="p">,</span><span class="mi">647</span><span class="p">},{</span><span class="mi">23</span><span class="p">,</span><span class="mi">646</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">24</span><span class="p">,</span><span class="mi">649</span><span class="p">},{</span><span class="mi">25</span><span class="p">,</span><span class="mi">648</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">26</span><span class="p">,</span><span class="mi">651</span><span class="p">},{</span><span class="mi">27</span><span class="p">,</span><span class="mi">650</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">28</span><span class="p">,</span><span class="mi">653</span><span class="p">},{</span><span class="mi">29</span><span class="p">,</span><span class="mi">652</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">30</span><span class="p">,</span><span class="mi">655</span><span class="p">},{</span><span class="mi">31</span><span class="p">,</span><span class="mi">654</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">32</span><span class="p">,</span><span class="mi">689</span><span class="p">},{</span><span class="mi">33</span><span class="p">,</span><span class="mi">688</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">34</span><span class="p">,</span><span class="mi">691</span><span class="p">},{</span><span class="mi">35</span><span class="p">,</span><span class="mi">690</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">36</span><span class="p">,</span><span class="mi">693</span><span class="p">},{</span><span class="mi">37</span><span class="p">,</span><span class="mi">692</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">38</span><span class="p">,</span><span class="mi">695</span><span class="p">},{</span><span class="mi">39</span><span class="p">,</span><span class="mi">694</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">40</span><span class="p">,</span><span class="mi">697</span><span class="p">},{</span><span class="mi">41</span><span class="p">,</span><span class="mi">696</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">42</span><span class="p">,</span><span class="mi">699</span><span class="p">},{</span><span class="mi">43</span><span class="p">,</span><span class="mi">698</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">44</span><span class="p">,</span><span class="mi">701</span><span class="p">},{</span><span class="mi">45</span><span class="p">,</span><span class="mi">700</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">46</span><span class="p">,</span><span class="mi">703</span><span class="p">},{</span><span class="mi">47</span><span class="p">,</span><span class="mi">702</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">48</span><span class="p">,</span><span class="mi">673</span><span class="p">},{</span><span class="mi">49</span><span class="p">,</span><span class="mi">672</span><span class="p">}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">50</span><span class="p">,</span><span class="mi">675</span><span class="p">},{</span><span class="mi">51</span><span class="p">,...}],</span>
</span><span class='line'> <span class="p">[{</span><span class="mi">52</span><span class="p">,...},{...}],</span>
</span><span class='line'> <span class="p">[{...}|...],</span>
</span><span class='line'> <span class="p">[...]|...]</span>
</span><span class='line'><span class="mi">27</span><span class="o">&gt;</span> <span class="nn">lists</span><span class="p">:</span><span class="n">flatten</span><span class="p">(</span><span class="nv">List</span><span class="p">)</span> <span class="o">==</span> <span class="nn">lists</span><span class="p">:</span><span class="n">sort</span><span class="p">(</span><span class="nn">lists</span><span class="p">:</span><span class="n">flatten</span><span class="p">(</span><span class="nv">List</span><span class="p">)).</span>
</span><span class='line'><span class="n">true</span>
</span></code></pre></td></tr></table></div></figure>


<p>As usual, the full code is in the <a href="http://github.com/jamii/erl-telehash">repo</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: dialing]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/21/telehash-dialing/"/>
    <updated>2011-03-21T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/21/telehash-dialing</id>
    <content type="html"><![CDATA[<p>The next step in building a TeleHash switch is teaching it to dial.</p>

<!--more-->


<p>First, a disclaimer: this post reflects my current understanding of TeleHash and Kademlia and is highly likely to be wrong. The code has received only minimal testing, and I&#8217;m not yet entirely sure how to test a p2p network properly. Expect to see more on that in later posts.</p>

<p>Each TeleHash node and each key in the DHT is identified by a 160-bit SHA-1 hash, called an end. In the original Kademlia paper node IDs are chosen at random, but in TeleHash a node&#8217;s ID is the hash of its address (IP:port). This means that malicious nodes don&#8217;t get to choose where they are inserted in the DHT.</p>
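
<p>Here is a minimal sketch of how an end could be derived from an address. It assumes the address is formatted as the string <code>IP:port</code> and hashed with SHA-1 into a raw 20-byte binary. The module and function names are hypothetical, and the exact bytes TeleHash hashes may differ. It relies on <code>crypto:hash/2</code> from OTP&#8217;s crypto application and <code>inet:ntoa/1</code>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(end_sketch).
-export([address_to_end/1]).

%% Hypothetical helper: hash the string "IP:port" with SHA-1 to get a
%% 160-bit end, tagged as {'end', Binary} to match the pattern used by distance/2.
address_to_end({IP, Port}) -&gt;
    AddressString = inet:ntoa(IP) ++ ":" ++ integer_to_list(Port),
    {'end', crypto:hash(sha, AddressString)}.
</code></pre></div></figure>

<p>For example, <code>end_sketch:address_to_end({{127,0,0,1}, 42424})</code> returns an <code>{'end', Binary}</code> tuple whose binary is 20 bytes long. Changing either the IP or the port gives a completely different end.</p>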

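<p>Kademlia routing is based on the XOR distance between ends. XOR distance is a metric on the set of ends: the distance from an end to itself is zero and to any other end is positive, it is symmetric, and it satisfies the triangle inequality.</p>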

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">distance</span><span class="p">(</span><span class="nv">A</span><span class="p">,</span> <span class="nv">B</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="nv">EndA</span><span class="p">}</span> <span class="o">=</span> <span class="n">to_end</span><span class="p">(</span><span class="nv">A</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">&#39;end&#39;</span><span class="p">,</span> <span class="nv">EndB</span><span class="p">}</span> <span class="o">=</span> <span class="n">to_end</span><span class="p">(</span><span class="nv">B</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Bytes</span> <span class="o">=</span> <span class="nn">lists</span><span class="p">:</span><span class="n">zip</span><span class="p">(</span><span class="nb">binary_to_list</span><span class="p">(</span><span class="nv">EndA</span><span class="p">),</span> <span class="nb">binary_to_list</span><span class="p">(</span><span class="nv">EndB</span><span class="p">)),</span>
</span><span class='line'>    <span class="nv">Xor</span> <span class="o">=</span> <span class="nb">list_to_binary</span><span class="p">([</span><span class="nv">ByteA</span> <span class="ow">bxor</span> <span class="nv">ByteB</span> <span class="p">||</span> <span class="p">{</span><span class="nv">ByteA</span><span class="p">,</span> <span class="nv">ByteB</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Bytes</span><span class="p">]),</span>
</span><span class='line'>    <span class="o">&lt;&lt;</span><span class="nv">Dist</span><span class="p">:</span><span class="o">?</span><span class="nv">END_BITS</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="nv">Xor</span><span class="p">,</span>
</span><span class='line'>    <span class="nv">Dist</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
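
<p>Because each end is a fixed-width 160-bit binary, the byte-by-byte loop above is equivalent to reading both ends as unsigned integers and XORing them once. Below is a minimal sketch of that formulation. It assumes the arguments are already <code>{'end', Binary}</code> tuples, so unlike the original it skips the <code>to_end/1</code> conversion. The module name is hypothetical.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(xor_sketch).
-export([distance/2]).

-define(END_BITS, 160).

%% Read each 160-bit end as one big-endian unsigned integer and XOR them.
distance({'end', EndA}, {'end', EndB}) -&gt;
    &lt;&lt;IntA:?END_BITS&gt;&gt; = EndA,
    &lt;&lt;IntB:?END_BITS&gt;&gt; = EndB,
    IntA bxor IntB.
</code></pre></div></figure>

<p>The metric properties are easy to spot-check in the shell. <code>xor_sketch:distance(E, E)</code> is always 0, and swapping the arguments gives the same result. <code>xor_sketch:distance({'end', &lt;&lt;1:160&gt;&gt;}, {'end', &lt;&lt;3:160&gt;&gt;})</code> returns 2.</p>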


<p>The Kademlia paper defines two constants, K and A (k and &#945; in the paper&#8217;s notation). K controls the amount of redundant storage in the DHT and A controls the number of parallel requests issued by each node. To insert a key into the DHT, a node must be able to locate the K nodes whose IDs are closest to the key. In TeleHash this lookup process is called dialing.</p>

<p>Dialing works roughly as follows. Each node keeps track of all the other nodes it has seen. Upon receiving a +end signal, a node replies with a .see command listing the K nodes it knows of that are closest to the specified end. To dial an end, we send a +end signal to each of the K closest nodes we know of. We then send +end signals to every new node contained in the .see replies, and so on until we run out of nodes to contact.</p>
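
<p>Here is a sketch that simulates this basic procedure without any networking. Everything in it is hypothetical. Ends are plain integers, and the <code>Network</code> map stands in for the real nodes by mapping each node to the list of nodes it knows about. Looking a node up plays the role of sending it a +end signal and reading its .see reply, and every contacted node is assumed to reply. It uses Erlang maps, so it needs OTP 17 or later.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(naive_dial).
-export([dial/4]).

%% The K nodes from Nodes that are closest to Target by XOR distance.
closest(K, Target, Nodes) -&gt;
    Sorted = lists:usort([{Node bxor Target, Node} || Node &lt;- Nodes]),
    [Node || {_Distance, Node} &lt;- lists:sublist(Sorted, K)].

%% Simulated .see reply from Node to a +end signal for Target.
see(Network, Node, Target, K) -&gt;
    closest(K, Target, maps:get(Node, Network, [])).

%% Contact the K closest known nodes, then every new node they mention,
%% until nobody is left; return the K closest nodes that replied.
dial(Network, Known, Target, K) -&gt;
    Start = closest(K, Target, Known),
    loop(Network, Target, K, Start, sets:from_list(Start), []).

loop(_Network, Target, K, [], _Seen, Replied) -&gt;
    closest(K, Target, Replied);
loop(Network, Target, K, [Node | Queue], Seen, Replied) -&gt;
    New = [N || N &lt;- see(Network, Node, Target, K), not sets:is_element(N, Seen)],
    Seen2 = lists:foldl(fun sets:add_element/2, Seen, New),
    loop(Network, Target, K, Queue ++ New, Seen2, [Node | Replied]).
</code></pre></div></figure>

<p>For instance, take <code>Network = #{1 =&gt; [2,3], 2 =&gt; [4], 3 =&gt; [5], 4 =&gt; [], 5 =&gt; []}</code>. Calling <code>naive_dial:dial(Network, [1], 4, 2)</code> contacts all five nodes and returns <code>[4,5]</code>. Contacting every reachable node like this is what causes the load problem.</p>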

<p>This is nice and simple and will work, but it generates a huge amount of load on the network. To reduce the load, Kademlia introduces two additional rules. First, we keep at most A signals outstanding at a time, and only send a new signal once an earlier one has either received a reply or timed out. Second, we finish early if at any point we have received replies from K nodes which are all closer to the end than every node we are still waiting to contact. The Kademlia paper proves that, under reasonable assumptions about what each node knows, this still returns the correct results with very high probability.</p>
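
<p>The second rule boils down to a simple check over distances. Below is a hedged sketch that works on plain lists of <code>{Distance, Node}</code> pairs rather than the priority queues used by the real dialer, and the module name is hypothetical. It takes "the nodes we are still waiting to contact" to mean every node that has not been contacted yet or is still awaiting a reply, passed in as <code>Pending</code>. It assumes K is at least 1.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(early_exit_sketch).
-export([done_early/3]).

%% Ponged:  {Distance, Node} pairs for nodes that have replied.
%% Pending: {Distance, Node} pairs for nodes not yet contacted or still
%%          awaiting a reply.
%% True once K replied nodes are all closer than every pending node.
done_early(K, Ponged, Pending) when length(Ponged) &gt;= K -&gt;
    {KthDistance, _} = lists:nth(K, lists:sort(Ponged)),
    lists:all(fun({Distance, _}) -&gt; KthDistance &lt; Distance end, Pending);
done_early(_K, _Ponged, _Pending) -&gt;
    false.
</code></pre></div></figure>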

<p>The dialer process is an event handler which has two important data structures. The first stores the dialer configuration:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">conf</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>          <span class="n">target</span><span class="p">,</span> <span class="c">% the end to dial</span>
</span><span class='line'>          <span class="n">timeout</span><span class="p">,</span> <span class="c">% the timeout for the entire dialing process</span>
</span><span class='line'>          <span class="n">ref</span><span class="p">,</span> <span class="n">caller</span> <span class="c">% reply details</span>
</span><span class='line'>         <span class="p">}).</span>
</span></code></pre></td></tr></table></div></figure>


<p>The second record stores the state of the dialing process. The dialer is designed around one principle: the state record is a reflection of the outside world, and the dialer&#8217;s sole job is to keep that record up to date while maintaining the invariants given in the comments. I write a lot of code this way, and I feel it deserves its own post once I can articulate it properly. It is certainly heavily informed both by the designs in <a href="http://books.google.co.uk/books?id=SxPzSTcTalAC&amp;printsec=frontcover&amp;dq=okasaki+purely+functional&amp;hl=en&amp;ei=6GiHTdTcKY_RcfjR_ZkD&amp;sa=X&amp;oi=book_result&amp;ct=result&amp;resnum=1&amp;ved=0CC4Q6AEwAA#v=onepage&amp;q&amp;f=false">Okasaki&#8217;s Purely Functional Data Structures</a> and by Conal Elliott&#8217;s ideas about <a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/">denotational semantics and type class morphisms</a>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">state</span><span class="p">,</span> <span class="p">{</span>
</span><span class='line'>          <span class="n">fresh</span><span class="p">,</span> <span class="c">% nodes which have not yet been contacted</span>
</span><span class='line'>          <span class="n">pinged</span><span class="p">,</span> <span class="c">% nodes which have been contacted and have not replied</span>
</span><span class='line'>          <span class="n">waiting</span><span class="p">,</span> <span class="c">% nodes in pinged which were contacted less than ?DIAL_TIMEOUT ago</span>
</span><span class='line'>          <span class="n">ponged</span><span class="p">,</span> <span class="c">% nodes which have been contacted and have replied</span>
</span><span class='line'>          <span class="n">seen</span> <span class="c">% all nodes which have been seen</span>
</span><span class='line'>         <span class="p">}).</span> <span class="c">% invariant: pq:length(waiting) = ?A or pq:is_empty(fresh)</span>
</span></code></pre></td></tr></table></div></figure>
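
<p>One way to read the invariant: whenever fewer than A nodes are waiting, the dialer should immediately ping the closest fresh nodes until either A nodes are waiting or no fresh nodes remain. The sketch below expresses that with plain sorted lists of <code>{Distance, Node}</code> pairs standing in for the <code>pq</code> module, which this post does not show. All names in it are hypothetical.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(invariant_sketch).
-export([holds/3, restore/3]).

%% The invariant from the #state{} comment, for list-based Fresh and Waiting.
holds(A, Fresh, Waiting) -&gt;
    length(Waiting) =:= A orelse Fresh =:= [].

%% Move the closest fresh nodes into Waiting until the invariant holds
%% (assuming at most A nodes are already waiting). Fresh must be sorted by
%% distance. Returns {ToPing, Fresh2, Waiting2}, where ToPing lists the
%% nodes that need a +end signal sent now.
restore(A, Fresh, Waiting) -&gt;
    Free = max(0, A - length(Waiting)),
    {ToPing, Fresh2} = lists:split(min(Free, length(Fresh)), Fresh),
    {ToPing, Fresh2, lists:sort(ToPing ++ Waiting)}.
</code></pre></div></figure>

<p>A node drops out of <code>waiting</code> each time it replies or its ?DIAL_TIMEOUT expires. A handler built this way would therefore call something like <code>restore/3</code> after every such event.</p>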


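<p>The dialer module exports the <code>dial/3</code> function. It creates both records, registers the dialer as an event handler with the switch, and returns a reference the caller can use to identify the result.</p>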

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">dial</span><span class="p">(</span><span class="nv">To</span><span class="p">,</span> <span class="nv">From</span><span class="p">,</span> <span class="nv">Timeout</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">dialing</span><span class="p">,</span> <span class="nv">To</span><span class="p">,</span> <span class="nv">From</span><span class="p">,</span> <span class="nv">Timeout</span><span class="p">]),</span>
</span><span class='line'>    <span class="nv">Ref</span> <span class="o">=</span> <span class="nn">erlang</span><span class="p">:</span><span class="n">make_ref</span><span class="p">(),</span>
</span><span class='line'>    <span class="nv">Target</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">to_end</span><span class="p">(</span><span class="nv">To</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Conf</span> <span class="o">=</span> <span class="nl">#conf</span><span class="p">{</span>
</span><span class='line'>      <span class="n">target</span> <span class="o">=</span> <span class="nv">Target</span><span class="p">,</span>
</span><span class='line'>      <span class="n">timeout</span> <span class="o">=</span> <span class="nv">Timeout</span><span class="p">,</span>
</span><span class='line'>      <span class="n">ref</span> <span class="o">=</span> <span class="nv">Ref</span><span class="p">,</span>
</span><span class='line'>      <span class="n">caller</span> <span class="o">=</span> <span class="n">self</span><span class="p">()</span>
</span><span class='line'>     <span class="p">},</span>
</span><span class='line'>    <span class="nv">Nodes</span> <span class="o">=</span> <span class="p">[{</span><span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Target</span><span class="p">),</span> <span class="nv">Address</span><span class="p">}</span>
</span><span class='line'>             <span class="p">||</span> <span class="nv">Address</span> <span class="o">&lt;-</span> <span class="nv">From</span><span class="p">],</span>
</span><span class='line'>    <span class="nv">State</span> <span class="o">=</span> <span class="nl">#state</span><span class="p">{</span>
</span><span class='line'>      <span class="n">fresh</span><span class="o">=</span><span class="nn">pq</span><span class="p">:</span><span class="n">from_list</span><span class="p">(</span><span class="nv">Nodes</span><span class="p">),</span>
</span><span class='line'>      <span class="n">pinged</span><span class="o">=</span><span class="nn">sets</span><span class="p">:</span><span class="n">new</span><span class="p">(),</span>
</span><span class='line'>      <span class="n">waiting</span><span class="o">=</span><span class="nn">pq</span><span class="p">:</span><span class="n">empty</span><span class="p">(),</span>
</span><span class='line'>      <span class="n">ponged</span><span class="o">=</span><span class="nn">pq</span><span class="p">:</span><span class="n">empty</span><span class="p">(),</span>
</span><span class='line'>      <span class="n">seen</span><span class="o">=</span><span class="nn">sets</span><span class="p">:</span><span class="n">new</span><span class="p">()</span>
</span><span class='line'>     <span class="p">},</span>
</span><span class='line'>    <span class="n">ok</span> <span class="o">=</span> <span class="nn">switch</span><span class="p">:</span><span class="n">add_handler</span><span class="p">(</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">}),</span>
</span><span class='line'>    <span class="nv">Ref</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>


<p>The aim is to handle events and maintain the state invariants until we are finished. How do we define finished?</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% is the dialing finished yet?</span>
</span><span class='line'><span class="nf">finished</span><span class="p">(</span><span class="nl">#state</span><span class="p">{</span><span class="n">fresh</span><span class="o">=</span><span class="nv">Fresh</span><span class="p">,</span> <span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting</span><span class="p">,</span> <span class="n">ponged</span><span class="o">=</span><span class="nv">Ponged</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">(</span><span class="nn">pq</span><span class="p">:</span><span class="n">is_empty</span><span class="p">(</span><span class="nv">Fresh</span><span class="p">)</span> <span class="ow">andalso</span> <span class="nn">pq</span><span class="p">:</span><span class="n">is_empty</span><span class="p">(</span><span class="nv">Waiting</span><span class="p">))</span> <span class="c">% no way to continue</span>
</span><span class='line'>    <span class="ow">orelse</span>
</span><span class='line'>    <span class="p">(</span><span class="k">case</span> <span class="nn">pq</span><span class="p">:</span><span class="nb">length</span><span class="p">(</span><span class="nv">Ponged</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="o">?</span><span class="nv">K</span> <span class="k">of</span>
</span><span class='line'>         <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>             <span class="n">false</span><span class="p">;</span> <span class="c">% don&#39;t yet have K nodes</span>
</span><span class='line'>         <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>             <span class="c">% finish if the K closest nodes we know are closer than all the nodes we haven&#39;t checked yet</span>
</span><span class='line'>             <span class="p">{</span><span class="nv">Nodes</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">pop</span><span class="p">(</span><span class="nv">Ponged</span><span class="p">,</span> <span class="o">?</span><span class="nv">K</span><span class="p">),</span>
</span><span class='line'>             <span class="p">{</span><span class="nv">Dist_ponged</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span> <span class="nn">lists</span><span class="p">:</span><span class="n">last</span><span class="p">(</span><span class="nv">Nodes</span><span class="p">),</span>
</span><span class='line'>             <span class="c">% an empty queue holds no unchecked node that could be closer</span>
</span><span class='line'>             <span class="nv">Closer</span> <span class="o">=</span> <span class="k">fun</span> <span class="p">(</span><span class="nv">Queue</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nn">pq</span><span class="p">:</span><span class="n">is_empty</span><span class="p">(</span><span class="nv">Queue</span><span class="p">)</span> <span class="ow">orelse</span> <span class="p">(</span><span class="nv">Dist_ponged</span> <span class="o">&lt;</span> <span class="nb">element</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nn">pq</span><span class="p">:</span><span class="n">peek</span><span class="p">(</span><span class="nv">Queue</span><span class="p">)))</span> <span class="k">end</span><span class="p">,</span>
</span><span class='line'>             <span class="nv">Closer</span><span class="p">(</span><span class="nv">Fresh</span><span class="p">)</span> <span class="ow">andalso</span> <span class="nv">Closer</span><span class="p">(</span><span class="nv">Waiting</span><span class="p">)</span>
</span><span class='line'>     <span class="k">end</span><span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>
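
<p>The same termination rule can be tried in isolation. The sketch below is an illustration rather than the post&#8217;s code: it assumes plain lists of {Dist, Address} tuples sorted by ascending distance in place of the pq queues, and the module name finished_sketch is hypothetical. It treats an empty queue as holding no unchecked node that could be closer, so it never has to peek at an empty queue.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(finished_sketch).
-export([finished/4]).

%% Fresh, Waiting and Ponged are lists of {Dist, Address} sorted by
%% ascending Dist; they stand in for the pq queues used in the post.
finished(Fresh, Waiting, Ponged, K) -&gt;
    (Fresh =:= [] andalso Waiting =:= [])   % no way to continue
    orelse
    (length(Ponged) &gt;= K andalso
     begin
         %% distance of the K-th closest node that has replied
         {Dist_k, _} = lists:nth(K, Ponged),
         closer(Dist_k, Fresh) andalso closer(Dist_k, Waiting)
     end).

%% true if Dist beats the head of the queue, or if the queue is empty
closer(_Dist, []) -&gt; true;
closer(Dist, [{Dist_head, _} | _]) -&gt; Dist &lt; Dist_head.
</code></pre></div></figure>

<p>For example, finished([{5,e}], [{4,d}], [{1,a},{2,b},{3,c}], 3) is true: the third-closest reply (distance 3) is closer than the head of either queue.</p>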


<p>One of the invariants we maintain is that either the fresh queue is empty or the waiting queue holds exactly A nodes. This keeps as many +end signals in flight as we allow whenever there are nodes left to contact. The invariant is maintained by calling the ping_nodes function at startup and after every event that changes the state.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% contact nodes from fresh until the waiting list is full</span>
</span><span class='line'><span class="nf">ping_nodes</span><span class="p">(</span><span class="nl">#conf</span><span class="p">{</span><span class="n">target</span><span class="o">=</span><span class="nv">Target</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">fresh</span><span class="o">=</span><span class="nv">Fresh</span><span class="p">,</span> <span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Num</span> <span class="o">=</span> <span class="o">?</span><span class="nv">A</span> <span class="o">-</span> <span class="nn">pq</span><span class="p">:</span><span class="nb">length</span><span class="p">(</span><span class="nv">Waiting</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Nodes</span><span class="p">,</span> <span class="nv">Fresh2</span><span class="p">}</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">pop</span><span class="p">(</span><span class="nv">Fresh</span><span class="p">,</span> <span class="nv">Num</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Telex</span> <span class="o">=</span> <span class="p">{</span><span class="n">struct</span><span class="p">,</span> <span class="p">[{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span> <span class="nn">util</span><span class="p">:</span><span class="n">end_to_hex</span><span class="p">(</span><span class="nv">Target</span><span class="p">)}]},</span>
</span><span class='line'>    <span class="nn">lists</span><span class="p">:</span><span class="n">foreach</span><span class="p">(</span>
</span><span class='line'>      <span class="k">fun</span> <span class="p">({</span><span class="nv">Dist</span><span class="p">,</span> <span class="nv">Address</span><span class="p">}</span><span class="o">=</span><span class="nv">Node</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>              <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">ping</span><span class="p">,</span> <span class="nv">Dist</span><span class="p">,</span> <span class="nv">Address</span><span class="p">]),</span>
</span><span class='line'>              <span class="nn">switch</span><span class="p">:</span><span class="nb">send</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">),</span>
</span><span class='line'>              <span class="nn">erlang</span><span class="p">:</span><span class="nb">send_after</span><span class="p">(</span><span class="o">?</span><span class="nv">DIAL_TIMEOUT</span><span class="p">,</span> <span class="n">self</span><span class="p">(),</span> <span class="p">{</span><span class="n">timeout</span><span class="p">,</span> <span class="nv">Node</span><span class="p">})</span>
</span><span class='line'>      <span class="k">end</span><span class="p">,</span>
</span><span class='line'>      <span class="nv">Nodes</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Waiting2</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">push</span><span class="p">(</span><span class="nv">Nodes</span><span class="p">,</span> <span class="nv">Waiting</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Pinged2</span> <span class="o">=</span> <span class="nn">sets</span><span class="p">:</span><span class="n">union</span><span class="p">(</span><span class="nv">Pinged</span><span class="p">,</span> <span class="nn">sets</span><span class="p">:</span><span class="n">from_list</span><span class="p">(</span><span class="nv">Nodes</span><span class="p">)),</span>
</span><span class='line'>    <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">fresh</span><span class="o">=</span><span class="nv">Fresh2</span><span class="p">,</span> <span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting2</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged2</span><span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>
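
<p>The top-up arithmetic in ping_nodes can be sketched the same way, again assuming sorted lists instead of the pq module. The hypothetical top_up/3 only computes which nodes to contact; sending the telexes and starting the per-node timers is left to the caller.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(ping_sketch).
-export([top_up/3]).

%% Move up to A - length(Waiting) of the closest fresh nodes into Waiting.
%% Fresh and Waiting are lists of {Dist, Address} sorted by ascending Dist.
%% Returns {ToPing, Fresh2, Waiting2}.
top_up(A, Fresh, Waiting) -&gt;
    Num = min(max(A - length(Waiting), 0), length(Fresh)),
    {ToPing, Fresh2} = lists:split(Num, Fresh),
    %% both lists are sorted, so merging keeps Waiting2 sorted
    Waiting2 = lists:merge(ToPing, Waiting),
    {ToPing, Fresh2, Waiting2}.
</code></pre></div></figure>

<p>As long as Waiting never holds more than A nodes, the result satisfies the invariant: either Fresh2 is empty or Waiting2 holds exactly A nodes.</p>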


<p>We handle a reply by moving the replying node from the waiting queue to the ponged queue and inserting the nodes from its .see list into the fresh queue. To avoid duplicates, only nodes that are not already in the seen set are inserted, and the seen set is then updated. The pinged set is checked in handle_event to ensure that we only accept replies from nodes we have contacted; because ponged also removes the node from that set, we accept at most one reply per node.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% handle a reply from a node</span>
</span><span class='line'><span class="nf">ponged</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">See</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">fresh</span><span class="o">=</span><span class="nv">Fresh</span><span class="p">,</span> <span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">,</span> <span class="n">ponged</span><span class="o">=</span><span class="nv">Ponged</span><span class="p">,</span> <span class="n">seen</span><span class="o">=</span><span class="nv">Seen</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Waiting2</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">delete</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Waiting</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Pinged2</span> <span class="o">=</span> <span class="nn">sets</span><span class="p">:</span><span class="n">del_element</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Ponged2</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">push_one</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Ponged</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">New_nodes</span> <span class="o">=</span> <span class="nn">lists</span><span class="p">:</span><span class="n">filter</span><span class="p">(</span><span class="k">fun</span> <span class="p">(</span><span class="nv">See_node</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="ow">not</span><span class="p">(</span><span class="nn">sets</span><span class="p">:</span><span class="n">is_element</span><span class="p">(</span><span class="nv">See_node</span><span class="p">,</span> <span class="nv">Seen</span><span class="p">))</span> <span class="k">end</span><span class="p">,</span> <span class="nn">lists</span><span class="p">:</span><span class="n">usort</span><span class="p">(</span><span class="nv">See</span><span class="p">)),</span>
</span><span class='line'>    <span class="nv">Fresh2</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">push</span><span class="p">(</span><span class="nv">New_nodes</span><span class="p">,</span> <span class="nv">Fresh</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">Seen2</span> <span class="o">=</span> <span class="nn">sets</span><span class="p">:</span><span class="n">union</span><span class="p">(</span><span class="nv">Seen</span><span class="p">,</span> <span class="nn">sets</span><span class="p">:</span><span class="n">from_list</span><span class="p">(</span><span class="nv">See</span><span class="p">)),</span>
</span><span class='line'>    <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">fresh</span><span class="o">=</span><span class="nv">Fresh2</span><span class="p">,</span> <span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting2</span><span class="p">,</span> <span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged2</span><span class="p">,</span> <span class="n">ponged</span><span class="o">=</span><span class="nv">Ponged2</span><span class="p">,</span> <span class="n">seen</span><span class="o">=</span><span class="nv">Seen2</span><span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>
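
<p>A list-based sketch of the same bookkeeping, assuming sorted lists for the fresh queue and the standard sets module for the seen set. The hypothetical merge_see/3 also runs the reply through lists:usort/1, so a peer that lists the same node twice cannot get it queued twice.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(see_sketch).
-export([merge_see/3]).

%% See is the list of {Dist, Address} nodes from one reply, Seen is a
%% sets:set() of every node already queued, and Fresh is a list sorted by
%% ascending Dist. Returns {Fresh2, Seen2} with each unseen node added once.
merge_see(See, Seen, Fresh) -&gt;
    New_nodes = [Node || Node &lt;- lists:usort(See),
                         not sets:is_element(Node, Seen)],
    Fresh2 = lists:merge(New_nodes, Fresh),
    Seen2 = sets:union(Seen, sets:from_list(New_nodes)),
    {Fresh2, Seen2}.
</code></pre></div></figure>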


<p>Once we are finished we need to return the results to the caller.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% return results to the caller</span>
</span><span class='line'><span class="nf">return</span><span class="p">(</span><span class="nl">#conf</span><span class="p">{</span><span class="n">ref</span><span class="o">=</span><span class="nv">Ref</span><span class="p">,</span> <span class="n">caller</span><span class="o">=</span><span class="nv">Caller</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">ponged</span><span class="o">=</span><span class="nv">Ponged</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="nv">Nodes</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span> <span class="nn">pq</span><span class="p">:</span><span class="n">pop</span><span class="p">(</span><span class="nv">Ponged</span><span class="p">,</span> <span class="o">?</span><span class="nv">K</span><span class="p">),</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">returning</span><span class="p">,</span> <span class="nv">Nodes</span><span class="p">]),</span>
</span><span class='line'>    <span class="nv">Result</span> <span class="o">=</span> <span class="p">[</span><span class="nv">Address</span> <span class="p">||</span> <span class="p">{_</span><span class="nv">Dist</span><span class="p">,</span> <span class="nv">Address</span><span class="p">}</span> <span class="o">&lt;-</span> <span class="nv">Nodes</span><span class="p">],</span>
</span><span class='line'>    <span class="nv">Caller</span> <span class="o">!</span> <span class="p">{</span><span class="n">dialed</span><span class="p">,</span> <span class="nv">Ref</span><span class="p">,</span> <span class="nv">Result</span><span class="p">}.</span>
</span></code></pre></td></tr></table></div></figure>
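
<p>On the calling side the result arrives as an ordinary message tagged with Ref, so a selective receive can pick out the right reply. The handle_info clause for giveup removes the handler without messaging the caller, so a caller probably wants a timeout of its own. A minimal sketch; wait_for_dial/2 is a hypothetical helper, not part of erl-telehash.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(dial_wait_sketch).
-export([wait_for_dial/2]).

%% Block until the dialer reports back for this Ref, or give up after
%% Timeout milliseconds. Replies for other dials stay in the mailbox.
wait_for_dial(Ref, Timeout) -&gt;
    receive
        {dialed, Ref, Addresses} -&gt;
            {ok, Addresses}
    after Timeout -&gt;
        {error, timeout}
    end.
</code></pre></div></figure>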


<p>Finally, after each event we call continue to decide whether to finish and return results or to carry on sending signals.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="c">% either continue to dial or return results</span>
</span><span class='line'><span class="c">% meant for use at the end of a gen_event callback</span>
</span><span class='line'><span class="nf">continue</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="n">finished</span><span class="p">(</span><span class="nv">State</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>        <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="n">return</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">),</span>
</span><span class='line'>            <span class="n">remove_handler</span><span class="p">;</span>
</span><span class='line'>        <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="nv">State2</span> <span class="o">=</span> <span class="n">ping_nodes</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">),</span>
</span><span class='line'>            <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State2</span><span class="p">}}</span>
</span><span class='line'>    <span class="k">end</span><span class="p">.</span>
</span></code></pre></td></tr></table></div></figure>
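
<p>To see how continue drives the whole process, here is a synchronous toy simulation of the loop. It is not the post&#8217;s implementation and rests on several assumptions: addresses are integers, distance is abs(Address - Target) rather than util:distance, Network is a proplist from each address to the addresses it would list in .see, and every contacted node replies instantly, one at a time, so the waiting queue and the timeouts disappear. The module name dial_sim is hypothetical.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(dial_sim).
-export([dial/4]).

%% Toy, synchronous version of the dial loop (see assumptions above).
dial(Network, Target, From, K) -&gt;
    Nodes = lists:usort([{abs(A - Target), A} || A &lt;- From]),
    loop(Network, Target, K, Nodes, Nodes, []).

%% Fresh: nodes not yet contacted; Seen: every node ever queued;
%% Ponged: nodes that have replied. All sorted by ascending distance.
loop(Network, Target, K, Fresh, Seen, Ponged) -&gt;
    case finished(Fresh, Ponged, K) of
        true -&gt;
            [Addr || {_, Addr} &lt;- lists:sublist(Ponged, K)];
        false -&gt;
            %% contact the closest fresh node and queue its unseen .see nodes
            [{_, A} = Node | Fresh2] = Fresh,
            See = [{abs(B - Target), B} || B &lt;- proplists:get_value(A, Network, [])],
            New = [N || N &lt;- lists:usort(See), not lists:member(N, Seen)],
            loop(Network, Target, K, lists:merge(New, Fresh2),
                 lists:merge(New, Seen), lists:merge([Node], Ponged))
    end.

%% With instant replies nothing is ever waiting, so only the head of Fresh
%% is compared with the K-th closest node that has replied.
finished([], _Ponged, _K) -&gt;
    true;
finished([{Dist_fresh, _} | _], Ponged, K) -&gt;
    length(Ponged) &gt;= K andalso
        element(1, lists:nth(K, Ponged)) &lt; Dist_fresh.
</code></pre></div></figure>

<p>For example, dial_sim:dial([{10,[20,30]},{20,[25,40]},{30,[31]},{25,[26]}], 24, [10], 2) contacts 10, 20, 25 and 26, then stops because the second-closest reply (26, at distance 2) is closer than the best unchecked node (30, at distance 6), and returns [25,26].</p>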


<p>The functions above are glued together by a gen_event handler. The handler is attached to the switch&#8217;s gen_event manager and receives an event for each telex arriving at the switch.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">behaviour</span><span class="p">(</span><span class="n">gen_event</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">init</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">handle_event</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">handle_call</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">handle_info</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">terminate</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">code_change</span><span class="o">/</span><span class="mi">3</span><span class="p">]).</span>
</span></code></pre></td></tr></table></div></figure>
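
<p>The export list also names handle_call/2, terminate/2 and code_change/3, which are not shown here. A minimal sketch of trivial versions, using a hypothetical quiet_handler module. The catch-all handle_event/2 and handle_info/2 clauses are assumptions too; in the dialer they would have to come after the specific clauses, so that unrelated telexes and messages leave the {Conf, State} pair untouched.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(quiet_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% the dialer would keep its {Conf, State} pair here
init(Args) -&gt;
    {ok, Args}.

%% events this handler does not care about leave the state untouched
handle_event(_Event, State) -&gt;
    {ok, State}.

%% no synchronous API: answer every call with an error
handle_call(_Request, State) -&gt;
    {ok, {error, not_supported}, State}.

%% stray messages are ignored
handle_info(_Info, State) -&gt;
    {ok, State}.

terminate(_Reason, _State) -&gt;
    ok.

code_change(_OldVsn, State, _Extra) -&gt;
    {ok, State}.
</code></pre></div></figure>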


<p>The init function is called when the handler is started. It sends out the first +end signals and sets a timer that tells the handler when to give up dialling.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">init</span><span class="p">({</span><span class="nl">#conf</span><span class="p">{</span><span class="n">timeout</span><span class="o">=</span><span class="nv">Timeout</span><span class="p">}</span><span class="o">=</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">erlang</span><span class="p">:</span><span class="nb">send_after</span><span class="p">(</span><span class="nv">Timeout</span><span class="p">,</span> <span class="n">self</span><span class="p">(),</span> <span class="n">giveup</span><span class="p">),</span>
</span><span class='line'>    <span class="nv">State2</span> <span class="o">=</span> <span class="n">ping_nodes</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State2</span><span class="p">}}.</span>
</span></code></pre></td></tr></table></div></figure>


<p>The giveup timeout is simple to deal with.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_info</span><span class="p">(</span><span class="n">giveup</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">giveup</span><span class="p">,</span> <span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">]),</span>
</span><span class='line'>    <span class="n">remove_handler</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>


<p>As are the timeouts from individual signals.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">timeout</span><span class="p">,</span> <span class="nv">Node</span><span class="p">},</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">waiting</span><span class="o">=</span><span class="nv">Waiting</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">timeout</span><span class="p">,</span> <span class="nv">Node</span><span class="p">]),</span>
</span><span class='line'>    <span class="nv">State2</span> <span class="o">=</span> <span class="nv">State</span><span class="nl">#state</span><span class="p">{</span><span class="n">waiting</span><span class="o">=</span><span class="nn">pq</span><span class="p">:</span><span class="n">delete</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Waiting</span><span class="p">)},</span>
</span><span class='line'>    <span class="n">continue</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State2</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>
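
<p>One thing worth checking follows from how gen_event works rather than from anything in the dialer itself: inside a handler, self() is the event manager process, and every non-event message that process receives is passed to handle_info/2 of every installed handler. If several dials run on the same switch at once, a giveup or {timeout, Node} timer set by one dialer would also reach the others. One possible fix, sketched under the assumption that the dial&#8217;s Ref (kept in #conf{}) is available: tag each timer message with Ref and ignore messages that carry a different one. The tagged_handler module is hypothetical and cut down to the giveup logic; in the dialer the timer would become erlang:send_after(Timeout, self(), {giveup, Ref}).</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='erlang'>-module(tagged_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% State is {Ref, Owner}: the dial&#39;s unique reference and a process to
%% notify, so the demo can observe which handler acted on which message.
init({Ref, Owner}) -&gt;
    {ok, {Ref, Owner}}.

handle_event(_Event, State) -&gt;
    {ok, State}.

handle_call(_Request, State) -&gt;
    {ok, ok, State}.

%% only react to a giveup carrying our own Ref
handle_info({giveup, Ref}, {Ref, Owner}) -&gt;
    Owner ! {gave_up, Ref},
    remove_handler;
%% a timer belonging to another dial on the same manager: ignore it
handle_info({giveup, _Other}, State) -&gt;
    {ok, State};
handle_info(_Info, State) -&gt;
    {ok, State}.

terminate(_Reason, _State) -&gt;
    ok.

code_change(_OldVsn, State, _Extra) -&gt;
    {ok, State}.
</code></pre></div></figure>

<p>Registering each handler as {tagged_handler, Ref} also gives every dial a distinct handler id on the manager.</p>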


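<p>The last callback is the messiest. It essentially just calls ponged and continue, but first it has to sanity-check the incoming message: telexes without a .see field and replies from nodes we have not pinged are ignored, and a .see list that fails to parse is logged and dropped.</p>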

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">handle_event</span><span class="p">({</span><span class="n">recv</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">},</span> <span class="p">{</span><span class="nl">#conf</span><span class="p">{</span><span class="n">target</span><span class="o">=</span><span class="nv">Target</span><span class="p">}</span><span class="o">=</span><span class="nv">Conf</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">pinged</span><span class="o">=</span><span class="nv">Pinged</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="k">case</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">Telex</span><span class="p">,</span> <span class="n">&#39;.see&#39;</span><span class="p">)</span> <span class="k">of</span>
</span><span class='line'>        <span class="p">{</span><span class="n">error</span><span class="p">,</span> <span class="n">not_found</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">}};</span>
</span><span class='line'>        <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Address_binaries</span><span class="p">}</span> <span class="o">-&gt;</span>
</span><span class='line'>            <span class="nv">Dist</span> <span class="o">=</span> <span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">(</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Target</span><span class="p">),</span>
</span><span class='line'>            <span class="nv">Node</span> <span class="o">=</span> <span class="p">{</span><span class="nv">Dist</span><span class="p">,</span> <span class="nv">Address</span><span class="p">},</span>
</span><span class='line'>            <span class="k">case</span> <span class="nn">sets</span><span class="p">:</span><span class="n">is_element</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Pinged</span><span class="p">)</span> <span class="k">of</span> <span class="c">% !!! command ids would make a better check</span>
</span><span class='line'>                <span class="n">false</span> <span class="o">-&gt;</span>
</span><span class='line'>                    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">}};</span>
</span><span class='line'>                <span class="n">true</span> <span class="o">-&gt;</span>
</span><span class='line'>                    <span class="k">try</span> <span class="p">[{</span><span class="nn">util</span><span class="p">:</span><span class="n">distance</span><span class="p">(</span><span class="nv">Target</span><span class="p">,</span> <span class="nv">Bin</span><span class="p">),</span> <span class="nn">util</span><span class="p">:</span><span class="n">binary_to_address</span><span class="p">(</span><span class="nv">Bin</span><span class="p">)}</span> <span class="p">||</span> <span class="nv">Bin</span> <span class="o">&lt;-</span> <span class="nv">Address_binaries</span><span class="p">]</span> <span class="k">of</span>
</span><span class='line'>                        <span class="nv">Nodes</span> <span class="o">-&gt;</span>
</span><span class='line'>                            <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">pong</span><span class="p">,</span> <span class="nv">Node</span><span class="p">,</span> <span class="nv">Nodes</span><span class="p">]),</span>
</span><span class='line'>                            <span class="nv">State2</span> <span class="o">=</span> <span class="n">ponged</span><span class="p">(</span><span class="nv">Node</span><span class="p">,</span> <span class="nv">Nodes</span><span class="p">,</span> <span class="nv">State</span><span class="p">),</span>
</span><span class='line'>                            <span class="n">continue</span><span class="p">(</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State2</span><span class="p">)</span>
</span><span class='line'>                    <span class="k">catch</span>
</span><span class='line'>                        <span class="p">_:</span><span class="nv">Error</span> <span class="o">-&gt;</span>
</span><span class='line'>                            <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">bad_see</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">,</span> <span class="nv">Error</span><span class="p">,</span> <span class="nn">erlang</span><span class="p">:</span><span class="n">get_stacktrace</span><span class="p">()]),</span>
</span><span class='line'>                            <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="p">{</span><span class="nv">Conf</span><span class="p">,</span> <span class="nv">State</span><span class="p">}}</span>
</span><span class='line'>                    <span class="k">end</span>
</span><span class='line'>            <span class="k">end</span>
</span><span class='line'>    <span class="k">end</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>
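<p>The fragment above relies on <code>util:distance/2</code> and <code>util:binary_to_address/1</code>, which aren't shown here. What follows is a guess at what such helpers might look like, not the code from the repo. It assumes that a node's position is the SHA-1 hash of its <code>ip:port</code> string and that, as in Kademlia, the distance between two nodes is the XOR of their positions read as an unsigned integer. The module name <code>distance_sketch</code> and the helper <code>see_to_nodes/2</code> are made up for illustration.</p>

```erlang
%% Hypothetical helpers; NOT the util module from erl-telehash.
%% Assumes a node's ring position is SHA-1("ip:port") and that distance
%% is the XOR of two positions, read as an unsigned 160-bit integer.
-module(distance_sketch).
-export([binary_to_address/1, address_to_binary/1, distance/2,
         see_to_nodes/2]).

%% Parse a .see entry such as <<"204.232.205.180:42424">> into an
%% {address, Ip, Port} tuple. Crashes on malformed input.
binary_to_address(Bin) when is_binary(Bin) ->
    [Ip, Port] = binary:split(Bin, <<":">>),
    {address, binary_to_list(Ip), binary_to_integer(Port)}.

%% Inverse of binary_to_address/1.
address_to_binary({address, Ip, Port}) ->
    list_to_binary([Ip, $:, integer_to_list(Port)]).

%% XOR distance. Either argument may be an address tuple or an
%% "ip:port" binary, so it can be called like util:distance(Target, Bin).
distance(A, B) ->
    position(A) bxor position(B).

%% Build {Distance, Address} pairs the same way the fragment does.
see_to_nodes(Target, Bins) ->
    [{distance(Target, Bin), binary_to_address(Bin)} || Bin <- Bins].

%% Assumed 160-bit ring position of a node.
position({address, _, _} = Address) ->
    position(address_to_binary(Address));
position(Bin) when is_binary(Bin) ->
    binary:decode_unsigned(crypto:hash(sha, Bin)).
```

<p>A malformed <code>.see</code> entry makes <code>binary_to_address/1</code> crash with a <code>badmatch</code> (no colon) or <code>badarg</code> (non-numeric port), which is the kind of error the <code>try ... catch</code> in the fragment turns into a <code>bad_see</code> log line.</p>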


<p>That&#8217;s pretty much it: we now (probably) have a working dialer. It took me a good few hours to tease this apart, but hopefully the end result is easy to follow. As always, the full code is in the <a href="http://github.com/jamii/erl-telehash">repo</a>. Here is a sample run that dials the root node, using it as the only seed:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
<span class='line-number'>54</span>
<span class='line-number'>55</span>
<span class='line-number'>56</span>
<span class='line-number'>57</span>
<span class='line-number'>58</span>
<span class='line-number'>59</span>
<span class='line-number'>60</span>
<span class='line-number'>61</span>
<span class='line-number'>62</span>
<span class='line-number'>63</span>
<span class='line-number'>64</span>
<span class='line-number'>65</span>
<span class='line-number'>66</span>
<span class='line-number'>67</span>
<span class='line-number'>68</span>
<span class='line-number'>69</span>
<span class='line-number'>70</span>
<span class='line-number'>71</span>
<span class='line-number'>72</span>
<span class='line-number'>73</span>
<span class='line-number'>74</span>
<span class='line-number'>75</span>
<span class='line-number'>76</span>
<span class='line-number'>77</span>
<span class='line-number'>78</span>
<span class='line-number'>79</span>
<span class='line-number'>80</span>
<span class='line-number'>81</span>
<span class='line-number'>82</span>
<span class='line-number'>83</span>
<span class='line-number'>84</span>
<span class='line-number'>85</span>
<span class='line-number'>86</span>
<span class='line-number'>87</span>
<span class='line-number'>88</span>
<span class='line-number'>89</span>
<span class='line-number'>90</span>
<span class='line-number'>91</span>
<span class='line-number'>92</span>
<span class='line-number'>93</span>
<span class='line-number'>94</span>
<span class='line-number'>95</span>
<span class='line-number'>96</span>
<span class='line-number'>97</span>
<span class='line-number'>98</span>
<span class='line-number'>99</span>
<span class='line-number'>100</span>
<span class='line-number'>101</span>
<span class='line-number'>102</span>
<span class='line-number'>103</span>
<span class='line-number'>104</span>
<span class='line-number'>105</span>
<span class='line-number'>106</span>
<span class='line-number'>107</span>
<span class='line-number'>108</span>
<span class='line-number'>109</span>
<span class='line-number'>110</span>
<span class='line-number'>111</span>
<span class='line-number'>112</span>
<span class='line-number'>113</span>
<span class='line-number'>114</span>
<span class='line-number'>115</span>
<span class='line-number'>116</span>
<span class='line-number'>117</span>
<span class='line-number'>118</span>
<span class='line-number'>119</span>
<span class='line-number'>120</span>
<span class='line-number'>121</span>
<span class='line-number'>122</span>
<span class='line-number'>123</span>
<span class='line-number'>124</span>
<span class='line-number'>125</span>
<span class='line-number'>126</span>
<span class='line-number'>127</span>
<span class='line-number'>128</span>
<span class='line-number'>129</span>
<span class='line-number'>130</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="mi">4</span><span class="o">&gt;</span> <span class="nn">switch</span><span class="p">:</span><span class="n">start_link</span><span class="p">().</span>
</span><span class='line'><span class="p">{</span><span class="n">ok</span><span class="p">,</span><span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">,</span><span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">}</span>
</span><span class='line'><span class="mi">5</span><span class="o">&gt;</span> <span class="nv">Root</span> <span class="o">=</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span> <span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span> <span class="mi">42424</span><span class="p">}.</span>
</span><span class='line'><span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'><span class="mi">6</span><span class="o">&gt;</span> <span class="nn">dialer</span><span class="p">:</span><span class="n">dial_sync</span><span class="p">(</span><span class="nv">Root</span><span class="p">,</span> <span class="p">[</span><span class="nv">Root</span><span class="p">],</span> <span class="mi">10000</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">06</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">35</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">dialing</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="p">[{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}]</span>
</span><span class='line'>    <span class="mi">10000</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">06</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">ping</span>
</span><span class='line'>    <span class="mi">0</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">06</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="nb">send</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;208.68.163.247:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;38666817e1b38470644e004b9356c1622368fa57&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="n">recv</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_ring&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">18115</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;.see&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span>
</span><span class='line'>              <span class="p">[</span><span class="o">&lt;&lt;</span><span class="s">&quot;204.232.205.180:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;208.68.163.247:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">]},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_br&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">240</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;203.218.138.245:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">pong</span>
</span><span class='line'>    <span class="mi">0</span><span class="p">:</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="p">[{</span><span class="mi">535375931004298447338698443374311161987273280591</span><span class="p">,</span>
</span><span class='line'>      <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}},</span>
</span><span class='line'>     <span class="p">{</span><span class="mi">0</span><span class="p">,{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">ping</span>
</span><span class='line'>    <span class="mi">0</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">ping</span>
</span><span class='line'>    <span class="mi">535375931004298447338698443374311161987273280591</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="nb">send</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;208.68.163.247:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;38666817e1b38470644e004b9356c1622368fa57&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="nb">send</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;204.232.205.180:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;38666817e1b38470644e004b9356c1622368fa57&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="n">recv</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_ring&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">16506</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;.see&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span>
</span><span class='line'>              <span class="p">[</span><span class="o">&lt;&lt;</span><span class="s">&quot;204.232.205.180:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;208.68.163.247:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">]},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_br&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">162</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;203.218.138.245:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">pong</span>
</span><span class='line'>    <span class="mi">535375931004298447338698443374311161987273280591</span><span class="p">:</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span>
</span><span class='line'>                                                       <span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span>
</span><span class='line'>                                                       <span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="p">[{</span><span class="mi">535375931004298447338698443374311161987273280591</span><span class="p">,</span>
</span><span class='line'>      <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}},</span>
</span><span class='line'>     <span class="p">{</span><span class="mi">0</span><span class="p">,{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">switch_event</span>
</span><span class='line'>    <span class="n">recv</span>
</span><span class='line'>    <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="nn">struct</span><span class="p">:</span> <span class="p">[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_ring&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">18115</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;.see&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span>
</span><span class='line'>              <span class="p">[</span><span class="o">&lt;&lt;</span><span class="s">&quot;204.232.205.180:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;208.68.163.247:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">]},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_br&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">320</span><span class="p">},</span>
</span><span class='line'>             <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;203.218.138.245:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">pong</span>
</span><span class='line'>    <span class="mi">0</span><span class="p">:</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}</span>
</span><span class='line'>    <span class="p">[{</span><span class="mi">535375931004298447338698443374311161987273280591</span><span class="p">,</span>
</span><span class='line'>      <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}},</span>
</span><span class='line'>     <span class="p">{</span><span class="mi">0</span><span class="p">,{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}}]</span>
</span><span class='line'>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">21</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">14</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mi">07</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">79</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="n">dialer</span>
</span><span class='line'>    <span class="n">returning</span>
</span><span class='line'>    <span class="p">[{</span><span class="mi">0</span><span class="p">,{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}},</span>
</span><span class='line'>     <span class="p">{</span><span class="mi">535375931004298447338698443374311161987273280591</span><span class="p">,</span>
</span><span class='line'>      <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}}]</span>
</span><span class='line'><span class="p">{</span><span class="n">ok</span><span class="p">,[{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;208.68.163.247&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">},</span>
</span><span class='line'>     <span class="p">{</span><span class="n">address</span><span class="p">,</span><span class="s">&quot;204.232.205.180&quot;</span><span class="p">,</span><span class="mi">42424</span><span class="p">}]}</span>
</span></code></pre></td></tr></table></div></figure>


<p>One last note: after I finished writing this, I started wondering what would happen if I ran more than one dialer in parallel. Unlike Kademlia, TeleHash does not currently use command IDs, so a dialer cannot tell whether a response came in reply to its own command or to a command sent by another dialer on the same node. It&#8217;s the kind of bug that would be very rare in normal use but could be deliberately exploited by a malicious node. Finding these kinds of bugs is going to be really hard.</p>
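<p>A minimal sketch of a fix, assuming each command carries an extra ID that the responding node echoes back. TeleHash has no such field today, so where the ID lives in the telex is left open. The hypothetical <em>pending_commands</em> module tracks the IDs one dialer has issued and accepts a response only if it echoes one of them, much like the random RPC IDs in Kademlia. IDs come from <em>crypto:strong_rand_bytes/1</em> rather than a counter so that a malicious node cannot easily guess them.</p>

```erlang
%% Hypothetical sketch: per-dialer tracking of outstanding command IDs.
%% Assumes responses echo the ID of the command they answer.
-module(pending_commands).
-export([new/0, issue/1, accept/2]).

%% An empty set of outstanding IDs for one dialer.
new() ->
    sets:new().

%% Create a fresh ID for an outgoing command and remember it. Random
%% 64-bit IDs (rather than a counter) are hard for another node to guess.
issue(Pending) ->
    Id = binary:decode_unsigned(crypto:strong_rand_bytes(8)),
    {Id, sets:add_element(Id, Pending)}.

%% Accept a response only if it echoes an ID this dialer issued. The ID
%% is removed on first use, so a duplicate or replayed response is ignored.
accept(Id, Pending) ->
    case sets:is_element(Id, Pending) of
        true -> {ok, sets:del_element(Id, Pending)};
        false -> {ignore, Pending}
    end.
```

<p>Each dialer would keep its own set, so a response to another dialer&#8217;s command is ignored instead of being mistaken for its own.</p>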
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Telehash: basics]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/17/telehash-basics/"/>
    <updated>2011-03-17T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/17/telehash-basics</id>
    <content type="html"><![CDATA[<p><a href="http://telehash.org">TeleHash</a> is a p2p network based on the <a href="http://en.wikipedia.org/wiki/Kademlia">Kademlia DHT</a> that provides addressing and NAT traversal. Every p2p app has to solve these problems, including my <a href="https://github.com/jamii/dissertation">poppi</a>. Unfortunately there is no Erlang implementation yet, so I have to roll my own. The code so far lives <a href="http://github.com/jamii/erl-telehash">here</a>. In this first post I&#8217;ll cover just the absolute basics: sending, receiving, encoding and decoding messages.</p>

<!--more-->


<p>TeleHash messages (telexes) are UTF-8 encoded JSON packets sent over UDP. Luckily, mochijson2 uses UTF-8 by default, so encoding and decoding are trivial.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='erlang'><span class='line'><span class="nf">encode</span><span class="p">(</span><span class="nv">Telex</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">mochijson2</span><span class="p">:</span><span class="n">encode</span><span class="p">(</span><span class="nv">Telex</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">decode</span><span class="p">(</span><span class="nv">Json</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">mochijson2</span><span class="p">:</span><span class="n">decode</span><span class="p">(</span><span class="nv">Json</span><span class="p">).</span>
</span></code></pre></td></tr></table></div></figure>


<p>The <em>telex</em> module also defines some convenience functions for working with JSON terms (<em>get/2</em>, <em>set/3</em> and <em>update/4</em>), which are used like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='erlang'><span class='line'><span class="mi">2</span><span class="o">&gt;</span> <span class="nv">T</span> <span class="o">=</span> <span class="nn">telex</span><span class="p">:</span><span class="n">decode</span><span class="p">(</span><span class="s">&quot;{</span><span class="se">\&quot;</span><span class="s">foo</span><span class="se">\&quot;</span><span class="s">:[</span><span class="se">\&quot;</span><span class="s">bar</span><span class="se">\&quot;</span><span class="s">, {</span><span class="se">\&quot;</span><span class="s">baz</span><span class="se">\&quot;</span><span class="s">:0}]}&quot;</span><span class="p">).</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;foo&quot;</span><span class="o">&gt;&gt;</span><span class="p">,[</span><span class="o">&lt;&lt;</span><span class="s">&quot;bar&quot;</span><span class="o">&gt;&gt;</span><span class="p">,{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">0</span><span class="p">}]}]}]}</span>
</span><span class='line'><span class="mi">3</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="n">foo</span><span class="p">).</span>
</span><span class='line'><span class="p">[</span><span class="o">&lt;&lt;</span><span class="s">&quot;bar&quot;</span><span class="o">&gt;&gt;</span><span class="p">,{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">0</span><span class="p">}]}]</span>
</span><span class='line'><span class="mi">4</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="p">{</span><span class="n">foo</span><span class="p">,</span><span class="mi">2</span><span class="p">}).</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">0</span><span class="p">}]}</span>
</span><span class='line'><span class="mi">5</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="nb">get</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="p">{</span><span class="n">foo</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="n">baz</span><span class="p">}).</span>
</span><span class='line'><span class="mi">0</span>
</span><span class='line'><span class="mi">6</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="n">set</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="p">{</span><span class="n">foo</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="n">baz</span><span class="p">},</span> <span class="mi">1</span><span class="p">).</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;foo&quot;</span><span class="o">&gt;&gt;</span><span class="p">,[</span><span class="o">&lt;&lt;</span><span class="s">&quot;bar&quot;</span><span class="o">&gt;&gt;</span><span class="p">,{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">1</span><span class="p">}]}]}]}</span>
</span><span class='line'><span class="mi">7</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="n">set</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="n">bigger</span><span class="p">,</span> <span class="n">true</span><span class="p">).</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;bigger&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="n">true</span><span class="p">},</span>
</span><span class='line'>         <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;foo&quot;</span><span class="o">&gt;&gt;</span><span class="p">,[</span><span class="o">&lt;&lt;</span><span class="s">&quot;bar&quot;</span><span class="o">&gt;&gt;</span><span class="p">,{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">0</span><span class="p">}]}]}]}</span>
</span><span class='line'><span class="mi">8</span><span class="o">&gt;</span> <span class="nn">telex</span><span class="p">:</span><span class="n">update</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="p">{</span><span class="n">foo</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="n">baz</span><span class="p">},</span> <span class="k">fun</span> <span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nv">X</span> <span class="o">+</span> <span class="mi">10</span> <span class="k">end</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">).</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;foo&quot;</span><span class="o">&gt;&gt;</span><span class="p">,[</span><span class="o">&lt;&lt;</span><span class="s">&quot;bar&quot;</span><span class="o">&gt;&gt;</span><span class="p">,{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;baz&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">10</span><span class="p">}]}]}]}</span>
</span></code></pre></td></tr></table></div></figure>
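<p>For readers curious how path lookups like <em>{foo,2,baz}</em> can work over mochijson2&#8217;s <em>{struct, Proplist}</em> terms, here is a hypothetical sketch. The module name <em>json_path</em> is made up, and this is not necessarily how the <em>telex</em> module does it. Atom keys index into objects, integer keys index into arrays (1-based, like <em>lists:nth/2</em>), and <em>update/4</em> is assumed to insert <em>Default</em> when the final key is missing from an object; a missing intermediate key simply crashes.</p>

```erlang
%% Hypothetical sketch of path-based access to mochijson2 terms;
%% not the actual telex module.
-module(json_path).
-export([get/2, set/3, update/4]).

%% A path is a single key or a tuple of keys, e.g. foo or {foo, 2, baz}.
to_keys(Path) when is_tuple(Path) -> tuple_to_list(Path);
to_keys(Key) -> [Key].

%% Follow each key in turn, starting from the whole JSON term.
get(Json, Path) ->
    lists:foldl(fun (Key, Acc) -> get_one(Acc, Key) end, Json, to_keys(Path)).

%% Atom keys look up object fields (stored as binaries by mochijson2);
%% integer keys pick a 1-based array element.
get_one({struct, Props}, Key) when is_atom(Key) ->
    {_, Value} = lists:keyfind(atom_to_binary(Key, utf8), 1, Props),
    Value;
get_one(List, N) when is_list(List), is_integer(N) ->
    lists:nth(N, List).

%% set/3 is update/4 with a function that ignores the old value.
set(Json, Path, Value) ->
    update(Json, Path, fun (_Old) -> Value end, Value).

%% Apply Fun to the value at Path; if the last key is missing from an
%% object, insert Default instead (assumed semantics).
update(Json, Path, Fun, Default) ->
    update_keys(Json, to_keys(Path), Fun, Default).

update_keys(Value, [], Fun, _Default) ->
    Fun(Value);
update_keys({struct, Props}, [Key | Rest], Fun, Default) when is_atom(Key) ->
    BinKey = atom_to_binary(Key, utf8),
    case lists:keyfind(BinKey, 1, Props) of
        {BinKey, Child} ->
            NewChild = update_keys(Child, Rest, Fun, Default),
            {struct, lists:keyreplace(BinKey, 1, Props, {BinKey, NewChild})};
        false when Rest =:= [] ->
            %% New fields go at the front, as in the shell session above.
            {struct, [{BinKey, Default} | Props]}
    end;
update_keys(List, [N | Rest], Fun, Default) when is_list(List), is_integer(N) ->
    %% Rebuild the list with only the N-th element replaced.
    {Before, [Child | After]} = lists:split(N - 1, List),
    Before ++ [update_keys(Child, Rest, Fun, Default) | After].
```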


<p>The next step is sending and receiving messages. The <em>switch</em> module runs a gen_server that owns the UDP socket and a gen_event manager that lets other processes subscribe to incoming messages by installing event handlers.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">module</span><span class="p">(</span><span class="n">switch</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">include</span><span class="p">(</span><span class="s">&quot;conf.hrl&quot;</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">start_link</span><span class="o">/</span><span class="mi">0</span><span class="p">,</span> <span class="n">add_handler</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">add_sup_handler</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="nb">send</span><span class="o">/</span><span class="mi">2</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">behaviour</span><span class="p">(</span><span class="n">gen_server</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">init</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">handle_call</span><span class="o">/</span><span class="mi">3</span><span class="p">,</span> <span class="n">handle_cast</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">handle_info</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">terminate</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">code_change</span><span class="o">/</span><span class="mi">3</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">record</span><span class="p">(</span><span class="nl">state</span><span class="p">,</span> <span class="p">{</span><span class="n">socket</span><span class="p">}).</span>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">EVENT</span><span class="p">,</span> <span class="n">switch_event</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">define</span><span class="p">(</span><span class="no">SERVER</span><span class="p">,</span> <span class="n">switch_server</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- api ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">start_link</span><span class="p">()</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Gen_event</span><span class="p">}</span> <span class="o">=</span> <span class="nn">gen_event</span><span class="p">:</span><span class="n">start_link</span><span class="p">({</span><span class="n">local</span><span class="p">,</span> <span class="o">?</span><span class="nv">EVENT</span><span class="p">}),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Gen_server</span><span class="p">}</span> <span class="o">=</span> <span class="nn">gen_server</span><span class="p">:</span><span class="n">start_link</span><span class="p">({</span><span class="n">local</span><span class="p">,</span> <span class="o">?</span><span class="nv">SERVER</span><span class="p">},</span> <span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="p">[],</span> <span class="p">[]),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Gen_event</span><span class="p">,</span> <span class="nv">Gen_server</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add_handler</span><span class="p">(</span><span class="nv">Module</span><span class="p">,</span> <span class="nv">Args</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">gen_event</span><span class="p">:</span><span class="n">add_handler</span><span class="p">(</span><span class="o">?</span><span class="nv">EVENT</span><span class="p">,</span> <span class="nv">Module</span><span class="p">,</span> <span class="nv">Args</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">add_sup_handler</span><span class="p">(</span><span class="nv">Module</span><span class="p">,</span> <span class="nv">Args</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">gen_event</span><span class="p">:</span><span class="n">add_sup_handler</span><span class="p">(</span><span class="o">?</span><span class="nv">EVENT</span><span class="p">,</span> <span class="nv">Module</span><span class="p">,</span> <span class="nv">Args</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nb">send</span><span class="p">({</span><span class="n">address</span><span class="p">,</span> <span class="p">_</span><span class="nv">Host</span><span class="p">,</span> <span class="p">_</span><span class="nv">Port</span><span class="p">}</span><span class="o">=</span><span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">gen_server</span><span class="p">:</span><span class="n">cast</span><span class="p">(</span><span class="o">?</span><span class="nv">SERVER</span><span class="p">,</span> <span class="p">{</span><span class="n">telex</span><span class="p">,</span> <span class="nv">Address</span><span class="p">,</span> <span class="nv">Telex</span><span class="p">}).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- gen_server callbacks ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">init</span><span class="p">([])</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">Socket</span><span class="p">}</span> <span class="o">=</span> <span class="nn">gen_udp</span><span class="p">:</span><span class="n">open</span><span class="p">(</span><span class="o">?</span><span class="nv">PORT</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">socket</span><span class="o">=</span><span class="nv">Socket</span><span class="p">}}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_call</span><span class="p">(_</span><span class="nv">Request</span><span class="p">,</span> <span class="p">_</span><span class="nv">From</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">reply</span><span class="p">,</span> <span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_cast</span><span class="p">({</span><span class="n">telex</span><span class="p">,</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span> <span class="nv">Host</span><span class="p">,</span> <span class="nv">Port</span><span class="p">},</span> <span class="nv">Telex</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">socket</span><span class="o">=</span><span class="nv">Socket</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">gen_udp</span><span class="p">:</span><span class="nb">send</span><span class="p">(</span><span class="nv">Socket</span><span class="p">,</span> <span class="nv">Host</span><span class="p">,</span> <span class="nv">Port</span><span class="p">,</span> <span class="nn">telex</span><span class="p">:</span><span class="n">encode</span><span class="p">(</span><span class="nv">Telex</span><span class="p">)),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">};</span>
</span><span class='line'><span class="nf">handle_cast</span><span class="p">(_</span><span class="nv">Msg</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_info</span><span class="p">({</span><span class="n">udp</span><span class="p">,</span> <span class="nv">Socket</span><span class="p">,</span> <span class="nv">Host</span><span class="p">,</span> <span class="nv">Port</span><span class="p">,</span> <span class="nv">Msg</span><span class="p">},</span> <span class="nl">#state</span><span class="p">{</span><span class="n">socket</span><span class="o">=</span><span class="nv">Socket</span><span class="p">}</span><span class="o">=</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nv">Event</span> <span class="o">=</span> <span class="p">{</span><span class="n">telex</span><span class="p">,</span> <span class="p">{</span><span class="n">address</span><span class="p">,</span> <span class="nv">Host</span><span class="p">,</span> <span class="nv">Port</span><span class="p">},</span> <span class="nn">telex</span><span class="p">:</span><span class="n">decode</span><span class="p">(</span><span class="nv">Msg</span><span class="p">)},</span>
</span><span class='line'>    <span class="nn">gen_event</span><span class="p">:</span><span class="n">notify</span><span class="p">(</span><span class="o">?</span><span class="nv">EVENT</span><span class="p">,</span> <span class="nv">Event</span><span class="p">),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">};</span>
</span><span class='line'><span class="nf">handle_info</span><span class="p">(_</span><span class="nv">Info</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">noreply</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">terminate</span><span class="p">(_</span><span class="nv">Reason</span><span class="p">,</span> <span class="nl">#state</span><span class="p">{</span><span class="n">socket</span><span class="o">=</span><span class="nv">Socket</span><span class="p">})</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">gen_udp</span><span class="p">:</span><span class="n">close</span><span class="p">(</span><span class="nv">Socket</span><span class="p">),</span>
</span><span class='line'>    <span class="n">ok</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">code_change</span><span class="p">(_</span><span class="nv">OldVsn</span><span class="p">,</span> <span class="nv">State</span><span class="p">,</span> <span class="p">_</span><span class="nv">Extra</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- end ---</span>
</span></code></pre></td></tr></table></div></figure>
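<p>Subscribers plug into the switch by installing a gen_event handler. As a hedged sketch (the <em>forwarder</em> module is hypothetical, not part of erl-telehash), a handler can forward each <em>{telex, Address, Telex}</em> event to the process that installed it. Using <em>{?MODULE, self()}</em> as the handler id lets several processes subscribe at once, and <em>gen_event:add_sup_handler/3</em> ties each subscription to the subscriber&#8217;s lifetime.</p>

```erlang
%% Hypothetical sketch: a gen_event handler that forwards every
%% {telex, Address, Telex} event to the process that subscribed.
-module(forwarder).
-behaviour(gen_event).
-export([subscribe/1]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Install a forwarder for the calling process. The {Module, Id} form
%% allows one handler per subscriber; add_sup_handler links the handler
%% to the caller so it is removed if the caller dies.
subscribe(EventMgr) ->
    gen_event:add_sup_handler(EventMgr, {?MODULE, self()}, self()).

%% The handler state is just the subscriber's pid.
init(Pid) ->
    {ok, Pid}.

%% Forward telexes; silently drop any other events.
handle_event({telex, _Address, _Telex} = Event, Pid) ->
    Pid ! Event,
    {ok, Pid};
handle_event(_Other, Pid) ->
    {ok, Pid}.

handle_call(_Request, Pid) ->
    {ok, ok, Pid}.

handle_info(_Info, Pid) ->
    {ok, Pid}.

terminate(_Reason, _Pid) ->
    ok.

code_change(_OldVsn, Pid, _Extra) ->
    {ok, Pid}.
```

<p>With the switch running, a process would call <em>forwarder:subscribe(switch_event)</em> and then receive <em>{telex, Address, Telex}</em> messages in its own mailbox.</p>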


<p>To demonstrate the switch, let&#8217;s write the simplest possible event handler:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='erlang'><span class='line'><span class="p">-</span><span class="ni">module</span><span class="p">(</span><span class="n">log</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">start</span><span class="o">/</span><span class="mi">0</span><span class="p">]).</span>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">info</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">warn</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">error</span><span class="o">/</span><span class="mi">1</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="p">-</span><span class="ni">behaviour</span><span class="p">(</span><span class="n">gen_event</span><span class="p">).</span>
</span><span class='line'><span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">init</span><span class="o">/</span><span class="mi">1</span><span class="p">,</span> <span class="n">handle_event</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">handle_call</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">handle_info</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">terminate</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">code_change</span><span class="o">/</span><span class="mi">3</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- api ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">start</span><span class="p">()</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">switch</span><span class="p">:</span><span class="n">add_sup_handler</span><span class="p">(</span><span class="o">?</span><span class="nv">MODULE</span><span class="p">,</span> <span class="n">none</span><span class="p">).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">info</span><span class="p">(</span><span class="nv">Info</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">error_logger</span><span class="p">:</span><span class="n">info_report</span><span class="p">([{</span><span class="n">pid</span><span class="p">,</span> <span class="n">self</span><span class="p">()}</span> <span class="p">|</span> <span class="nv">Info</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">warn</span><span class="p">(</span><span class="nv">Warn</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">error_logger</span><span class="p">:</span><span class="n">warning_report</span><span class="p">([{</span><span class="n">pid</span><span class="p">,</span> <span class="n">self</span><span class="p">()}</span> <span class="p">|</span> <span class="nv">Warn</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="nf">error</span><span class="p">(</span><span class="nv">Error</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">error_logger</span><span class="p">:</span><span class="n">error_report</span><span class="p">([{</span><span class="n">pid</span><span class="p">,</span> <span class="n">self</span><span class="p">()}</span> <span class="p">|</span> <span class="nv">Error</span><span class="p">]).</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- gen_event callbacks ---</span>
</span><span class='line'>
</span><span class='line'><span class="nf">init</span><span class="p">(</span><span class="n">none</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="n">none</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_event</span><span class="p">(</span><span class="nv">Event</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="nn">log</span><span class="p">:</span><span class="n">info</span><span class="p">([</span><span class="nv">Event</span><span class="p">]),</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_call</span><span class="p">(_</span><span class="nv">Request</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">handle_info</span><span class="p">(_</span><span class="nv">Info</span><span class="p">,</span> <span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">terminate</span><span class="p">(_</span><span class="nv">Reason</span><span class="p">,</span> <span class="p">_</span><span class="nv">State</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="n">ok</span><span class="p">.</span>
</span><span class='line'>
</span><span class='line'><span class="nf">code_change</span><span class="p">(_</span><span class="nv">OldVsn</span><span class="p">,</span> <span class="nv">State</span><span class="p">,</span> <span class="p">_</span><span class="nv">Extra</span><span class="p">)</span> <span class="o">-&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">ok</span><span class="p">,</span> <span class="nv">State</span><span class="p">}.</span>
</span><span class='line'>
</span><span class='line'><span class="c">% --- end ---</span>
</span></code></pre></td></tr></table></div></figure>


<p>Here we have some wrappers around the standard error logger and an event handler which (after masses of gen_event boilerplate) simply logs every event.</p>

<p>We now have enough functionality to start talking to a TeleHash node:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='erlang'><span class='line'><span class="mi">1</span><span class="o">&gt;</span> <span class="n">c</span><span class="p">(</span><span class="n">util</span><span class="p">),</span> <span class="n">c</span><span class="p">(</span><span class="n">telex</span><span class="p">),</span> <span class="n">c</span><span class="p">(</span><span class="n">switch</span><span class="p">),</span> <span class="n">c</span><span class="p">(</span><span class="n">log</span><span class="p">).</span>
</span><span class='line'><span class="p">{</span><span class="n">ok</span><span class="p">,</span><span class="n">log</span><span class="p">}</span>
</span><span class='line'><span class="mi">2</span><span class="o">&gt;</span> <span class="nn">switch</span><span class="p">:</span><span class="n">start_link</span><span class="p">().</span>
</span><span class='line'><span class="p">{</span><span class="n">ok</span><span class="p">,</span><span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">55</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">,</span><span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">56</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">}</span>
</span><span class='line'><span class="mi">3</span><span class="o">&gt;</span> <span class="nn">log</span><span class="p">:</span><span class="n">start</span><span class="p">().</span>
</span><span class='line'><span class="n">ok</span>
</span><span class='line'><span class="mi">4</span><span class="o">&gt;</span> <span class="nv">T</span> <span class="o">=</span> <span class="p">{</span><span class="n">struct</span><span class="p">,</span> <span class="p">[{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span> <span class="n">&#39;a9993e364706816aba3e25717850c26c9cd0d89d&#39;</span><span class="p">}]}.</span>
</span><span class='line'><span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="n">&#39;+end&#39;</span><span class="p">,</span><span class="n">a9993e364706816aba3e25717850c26c9cd0d89d</span><span class="p">}]}</span>
</span><span class='line'><span class="mi">5</span><span class="o">&gt;</span> <span class="nn">switch</span><span class="p">:</span><span class="nb">send</span><span class="p">({</span><span class="n">address</span><span class="p">,</span> <span class="s">&quot;127.0.0.1&quot;</span><span class="p">,</span> <span class="mi">55555</span><span class="p">},</span> <span class="nv">T</span><span class="p">).</span>
</span><span class='line'><span class="n">ok</span>
</span><span class='line'><span class="mi">6</span><span class="o">&gt;</span>
</span><span class='line'><span class="o">=</span><span class="nv">INFO</span> <span class="nv">REPORT</span><span class="o">====</span> <span class="mi">17</span><span class="o">-</span><span class="nv">Mar</span><span class="o">-</span><span class="mi">2011</span><span class="p">::</span><span class="mi">12</span><span class="p">:</span><span class="mi">21</span><span class="p">:</span><span class="mi">13</span> <span class="o">===</span>
</span><span class='line'>    <span class="nn">pid</span><span class="p">:</span> <span class="o">&lt;</span><span class="mi">0</span><span class="p">.</span><span class="mi">55</span><span class="p">.</span><span class="mi">0</span><span class="o">&gt;</span>
</span><span class='line'>    <span class="p">{</span><span class="n">telex</span><span class="p">,{</span><span class="n">address</span><span class="p">,{</span><span class="mi">127</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">},</span><span class="mi">55555</span><span class="p">},</span>
</span><span class='line'>           <span class="p">{</span><span class="n">struct</span><span class="p">,[{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_ring&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">5932</span><span class="p">},</span>
</span><span class='line'>                    <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;.see&quot;</span><span class="o">&gt;&gt;</span><span class="p">,[]},</span>
</span><span class='line'>                    <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_br&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="mi">51</span><span class="p">},</span>
</span><span class='line'>                    <span class="p">{</span><span class="o">&lt;&lt;</span><span class="s">&quot;_to&quot;</span><span class="o">&gt;&gt;</span><span class="p">,</span><span class="o">&lt;&lt;</span><span class="s">&quot;127.0.0.1:42424&quot;</span><span class="o">&gt;&gt;</span><span class="p">}]}}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Here we ask localhost:55555 for the nodes it knows that are nearest to the end &#8216;a99&#8230;89d&#8217;. The reply is contained in the <em>.see</em> field (which is empty because localhost:55555 hasn&#8217;t seeded itself yet and so doesn&#8217;t know any nodes at all).</p>

<p>The next post will deal with dialing, at which point we will have a working announcer.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Transactional mealy machines]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/03/16/transactional-mealy-machines/"/>
    <updated>2011-03-16T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/03/16/transactional-mealy-machines</id>
    <content type="html"><![CDATA[<p>This is a hugely overdue post about an interesting system I worked on almost a year ago whilst at <a href="http://smarkets.com">Smarkets</a> and never got around to writing about. Unfortunately I don&#8217;t have the code in front of me but the overall idea is simple enough to explain without examples.</p>

<!--more-->


<p>Smarkets is a betting exchange (effectively a small stock exchange for buying and selling bets). The exchange system, which handles all the money and manages the markets, has quite stringent requirements. We want event processing to be serializable (because ordering is very important in a fast-moving market), low latency and ideally distributed across more than one machine. However, the exchange also has to handle a large number of bursty updates focused on a small number of records (popular markets, power users). I&#8217;m told that the early prototypes using postgres simply couldn&#8217;t handle the high contention, so a move to a more loosely coupled system was necessary.</p>

<p>The architecture in place when I arrived at Smarkets was based on <a href="http://www.cidrdb.org/cidr2007/papers/cidr07p15.pdf">this paper</a>, which I highly recommend reading. The main idea is that serializability across machines is difficult verging on impossible and that systems which try to paper over this (eg fully ACID distributed transactions) tend to be fragile at scale. The proposed solution is to identify specific sets of actions which must be serializable and handle each set with a single actor on a single machine. These actors then communicate with each other via asynchronous messages. In Smarkets&#8217; case the actors are individual markets, users, accounts and orders. These can be modeled nicely as <a href="http://en.wikipedia.org/wiki/Mealy_machine">Mealy machines</a> whose output on each transition is a list of messages, hence the title.</p>

<p>This idea was very effective, but the implementation at Smarkets was some of the scariest code in the repository (thanks mostly to being the oldest code). Each actor was implemented as a single erlang process which archived messages (using couchdb) after reading them. There was a lot of repetitive boilerplate code, it was hard to test (because the actors messaged each other directly) and, worst of all, there were ways to lose messages before they were archived (eg a process&#8217;s inbox is lost if the process dies, and messages between machines can be dropped silently).</p>

<p>I wrote a new system to handle the actor implementation whilst keeping the domain-specific logic of each actor mostly unchanged. Each actor is defined by a pair of callback functions (a behaviour, in erlang-speak). The <em>init</em> function sets the initial state of the actor. The <em>transition</em> function takes the current state and an incoming message and returns the new state and possibly some outgoing messages. Everything else is handled by a generic module which takes this behaviour and turns it into a running actor. Each actor consists of an inbox, an outbox and a current state, all of which are persisted using mnesia. Each actor also has a unique id used for addressing messages. The transition process (pop a message off the inbox queue, run the transition function, store the new state, push outgoing messages to the outbox) is implemented as a single ACID transaction using mnesia. For actors on the same machine, messages are moved directly from one actor&#8217;s outbox to another&#8217;s inbox using mnesia transactions. For actors on different machines, the outbox sends messages as ordinary erlang messages and resends them repeatedly (with exponential backoff) until the receiver confirms receipt. The outbox attaches an auto-incrementing message id to each message which, together with the actor id of the sender, allows the receiver to ignore duplicate messages.</p>
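
<p>The split between behaviour and generic actor can be sketched as follows. This is written in OCaml rather than erlang purely for illustration, since the original code isn&#8217;t available; all names (<code>BEHAVIOUR</code>, <code>Actor</code>, <code>deliver</code>, <code>step</code> and the <code>Account</code> example) are hypothetical. Persistence, mnesia transactions and network resends are deliberately left out: state lives in ordinary mutable fields, and only the Mealy-machine shape and the duplicate filtering on (sender, message id) are shown.</p>

```ocaml
(* Hypothetical sketch: an actor as a Mealy machine whose output is a list of
   outgoing messages. No persistence: in the system described above, each
   [step] would run as a single mnesia transaction. *)

type actor_id = string

(* Every message carries its sender and a per-sender auto-incrementing id. *)
type 'msg envelope = { sender : actor_id; msg_id : int; body : 'msg }

(* The domain-specific part: the two callbacks of the behaviour. *)
module type BEHAVIOUR = sig
  type state
  type msg
  val init : actor_id -> state
  (* Returns the new state and (recipient, message) pairs for the outbox. *)
  val transition : state -> msg -> state * (actor_id * msg) list
end

(* The generic part: inbox, outbox, current state and duplicate filtering. *)
module Actor (B : BEHAVIOUR) = struct
  type t = {
    id : actor_id;
    mutable state : B.state;
    inbox : B.msg envelope Queue.t;
    mutable outbox : (actor_id * B.msg envelope) list;  (* newest first *)
    mutable next_msg_id : int;
    seen : (actor_id * int, unit) Hashtbl.t;  (* (sender, msg_id) pairs *)
  }

  let create id =
    { id; state = B.init id; inbox = Queue.create (); outbox = [];
      next_msg_id = 0; seen = Hashtbl.create 16 }

  (* Accept a delivery; a resent message with a known (sender, msg_id) is dropped. *)
  let deliver t env =
    let key = (env.sender, env.msg_id) in
    if not (Hashtbl.mem t.seen key) then begin
      Hashtbl.replace t.seen key ();
      Queue.push env t.inbox
    end

  (* Pop one message, run the transition, store the new state and push the
     outgoing messages to the outbox. Returns false if the inbox was empty. *)
  let step t =
    match Queue.take_opt t.inbox with
    | None -> false
    | Some env ->
      let state, outputs = B.transition t.state env.body in
      t.state <- state;
      List.iter
        (fun (recipient, body) ->
          let out = { sender = t.id; msg_id = t.next_msg_id; body } in
          t.next_msg_id <- t.next_msg_id + 1;
          t.outbox <- (recipient, out) :: t.outbox)
        outputs;
      true
end

(* Example behaviour: an account that rejects overdrafts. *)
module Account = struct
  type state = int  (* balance *)
  type msg = Deposit of int | Withdraw of int | Rejected of int
  let init _ = 0
  let transition balance = function
    | Deposit n -> (balance + n, [])
    | Withdraw n when n <= balance -> (balance - n, [])
    | Withdraw n -> (balance, [ ("notifier", Rejected n) ])
    | Rejected _ -> (balance, [])
end

module Account_actor = Actor (Account)
```

<p>Delivering the same <code>(sender, msg_id)</code> pair twice only enqueues it once, which is what makes it safe for a remote outbox to keep resending until receipt is confirmed. Because <code>Account.transition</code> is a pure function, it can be tested or replayed without any of the actor machinery.</p>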

<p>In this way the domain-specific logic is separated from message handling and storage. This led to much less repetition and a more maintainable system. It also made it easy to set up tests or replay past events without recreating the whole system. Last, but certainly not least, it can only lose messages if the database or disk fails, and even then it is easier to restore from backup than the previous system.</p>

<p>Note that this explanation is somewhat simplified. I have glossed over some fiddly implementation details like error handling (if an actor fails to handle a message the sender needs to be notified in many cases) and also left out extra features like subscribing to state changes (eg notify me when this order is filled). There is also a knack to designing actors which must cooperate without <a href="http://en.wikipedia.org/wiki/Common_knowledge_%28logic%29">common knowledge</a>. Hopefully the <a href="https://smarkets.com/about/contact/">Smarkets team</a> will find some time to open-source the actual code.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Tunneling in china]]></title>
    <link href="http://scattered-thoughts.net/blog/2011/02/07/tunneling-in-china/"/>
    <updated>2011-02-07T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2011/02/07/tunneling-in-china</id>
    <content type="html"><![CDATA[<p>In Shanghai I found that ssh was blocked at the protocol level, so even running sshd on port 80 didn&#8217;t work. I don&#8217;t know whether this is widespread or whether it was our hotel in particular that was blocking it. Regardless, I found a workaround using httptunnel.</p>

<!--more-->


<p>On a server outside China:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>sudo apt-get install httptunnel
</span><span class='line'>sudo hts -F localhost:22 80
</span></code></pre></td></tr></table></div></figure>


<p>On your client machine:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>sudo apt-get install httptunnel
</span><span class='line'>sudo htc -F 22 my.server.com:80
</span></code></pre></td></tr></table></div></figure>


<p>Now you can use ssh to your heart&#8217;s content:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>ssh user@localhost
</span><span class='line'>scp /some/file user@localhost:/some/file
</span><span class='line'>darcs push user@localhost:/some/repo
</span></code></pre></td></tr></table></div></figure>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Optimising texsearch: memory usage]]></title>
    <link href="http://scattered-thoughts.net/blog/2010/12/19/optimising-texsearch-memory-usage/"/>
    <updated>2010-12-19T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2010/12/19/optimising-texsearch-memory-usage</id>
    <content type="html"><![CDATA[<p>In my last post I discussed the new search algorithm behind texsearch. It is significantly faster than previous versions, but it now consumes a ridiculous amount of memory. The instance running <a href="http://latexsearch.com">latexsearch.com</a> hovers around 4.7 GB during normal operation and reaches 7-8 GB when updating the index. This pushes other services out of main memory and everything is horribly slow until they swap back in.</p>

<!--more-->


<p>The main data structure looks like this:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">{</span> <span class="n">latexs</span> <span class="o">:</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">opaques</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">deleted</span> <span class="o">:</span> <span class="kt">bool</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">next_id</span> <span class="o">:</span> <span class="n">id</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="kt">array</span> <span class="o">:</span> <span class="o">(</span><span class="n">id</span> <span class="o">*</span> <span class="n">pos</span><span class="o">)</span> <span class="kt">array</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">unsorted</span> <span class="o">:</span> <span class="o">(</span><span class="k">&#39;</span><span class="n">a</span> <span class="o">*</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span><span class="o">)</span> <span class="kt">list</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>The array field is responsible for the vast majority of the memory usage. Each cell in the array contains a pointer to a tuple containing two integers for a total of 4 words per suffix. The types id and pos are both small integers so if we pack them into a single unboxed integer we can reduce this to 1 word per suffix. We have a new module suffix.ml with some simple bit-munging:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="n">id</span> <span class="o">=</span> <span class="kt">int</span>
</span><span class='line'><span class="k">type</span> <span class="n">pos</span> <span class="o">=</span> <span class="kt">int</span>
</span><span class='line'>
</span><span class='line'><span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">int</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">pack_size</span> <span class="o">=</span> <span class="o">(</span><span class="nn">Sys</span><span class="p">.</span><span class="n">word_size</span> <span class="o">/</span> <span class="mi">2</span><span class="o">)</span> <span class="o">-</span> <span class="mi">1</span>
</span><span class='line'><span class="k">let</span> <span class="n">max_size</span> <span class="o">=</span> <span class="mi">1</span> <span class="ow">lsl</span> <span class="n">pack_size</span>
</span><span class='line'>
</span><span class='line'><span class="k">exception</span> <span class="nc">Invalid_suffix</span> <span class="k">of</span> <span class="n">id</span> <span class="o">*</span> <span class="n">pos</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">pack</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">pos</span><span class="o">)</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">if</span> <span class="o">(</span><span class="n">id</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">||</span> <span class="o">(</span><span class="n">id</span> <span class="o">&gt;=</span> <span class="n">max_size</span><span class="o">)</span>
</span><span class='line'>  <span class="o">||</span> <span class="o">(</span><span class="n">pos</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">||</span> <span class="o">(</span><span class="n">pos</span> <span class="o">&gt;=</span> <span class="n">max_size</span><span class="o">)</span>
</span><span class='line'>  <span class="k">then</span> <span class="k">raise</span> <span class="o">(</span><span class="nc">Invalid_suffix</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">pos</span><span class="o">))</span>
</span><span class='line'>  <span class="k">else</span> <span class="n">pos</span> <span class="ow">lor</span> <span class="o">(</span><span class="n">id</span> <span class="ow">lsl</span> <span class="n">pack_size</span><span class="o">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">unpack</span> <span class="n">suffix</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">id</span> <span class="o">=</span> <span class="n">suffix</span> <span class="n">lsr</span> <span class="n">pack_size</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">suffix</span> <span class="ow">land</span> <span class="o">(</span><span class="n">max_size</span> <span class="o">-</span> <span class="mi">1</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>  <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">pos</span><span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>Notice how hard ocaml&#8217;s word-like infix operators (<code>lsl</code>, <code>lsr</code>, <code>lor</code>, <code>land</code>) are to read.</p>

<p>The suffix array type becomes:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">{</span> <span class="n">latexs</span> <span class="o">:</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">opaques</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">deleted</span> <span class="o">:</span> <span class="kt">bool</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">next_id</span> <span class="o">:</span> <span class="n">id</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="kt">array</span> <span class="o">:</span> <span class="nn">Suffix</span><span class="p">.</span><span class="n">t</span> <span class="kt">array</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">unsorted</span> <span class="o">:</span> <span class="o">(</span><span class="k">&#39;</span><span class="n">a</span> <span class="o">*</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span><span class="o">)</span> <span class="kt">list</span> <span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>With this change the memory usage drops to 1.4 GB. The mean search time also improves: it seems that having fewer cache misses more than makes up for the extra computation involved in unpacking the suffixes.</p>
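
<p>The per-suffix arithmetic (4 words for a pointer to a boxed pair versus 1 word for a packed int) can be checked directly with <code>Obj.reachable_words</code> (OCaml 4.04 or later), which counts every heap word reachable from a value, headers included. A small sketch, assuming a 64-bit platform so that the packing matches <code>pack_size = 31</code>; it measures only the array, not the rest of the index:</p>

```ocaml
(* Measure heap words per suffix for the old and new layouts.
   Assumes a 64-bit platform (pack_size = 31 in the Suffix module). *)
let n = 100_000

(* Old layout: an array of pointers to boxed (id, pos) tuples. *)
let boxed : (int * int) array = Array.init n (fun i -> (i, i))

(* New layout: each (id, pos) packed into one unboxed int. *)
let packed : int array = Array.init n (fun i -> (i lsl 31) lor i)

(* Obj.reachable_words counts all reachable heap words, headers included. *)
let words_per_suffix a =
  float_of_int (Obj.reachable_words (Obj.repr a)) /. float_of_int n

let () =
  Printf.printf "boxed:  %.2f words per suffix\n" (words_per_suffix boxed);
  Printf.printf "packed: %.2f words per suffix\n" (words_per_suffix packed)
```

<p>On a 64-bit build this prints 4.00 and 1.00 words per suffix: the array header plus one pointer and a three-word tuple per element, versus the array header plus one immediate int per element.</p>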

<p>Now that the array field is a single block it is easy to move it out of the heap entirely so the gc never has to scan it.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="n">ancientify</span> <span class="n">sa</span> <span class="o">=</span>
</span><span class='line'>  <span class="n">sa</span><span class="o">.</span><span class="kt">array</span> <span class="o">&lt;-</span> <span class="nn">Ancient</span><span class="p">.</span><span class="n">follow</span> <span class="o">(</span><span class="nn">Ancient</span><span class="p">.</span><span class="n">mark</span> <span class="n">sa</span><span class="o">.</span><span class="kt">array</span><span class="o">);</span>
</span><span class='line'>  <span class="nn">Gc</span><span class="p">.</span><span class="n">full_major</span> <span class="bp">()</span>
</span></code></pre></td></tr></table></div></figure>


<p>This eliminates the annoyingly noticeable gc pauses.</p>
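
<p>If pulling in the Ancient library is not an option, a flat int array can get a similar effect from the standard library&#8217;s <code>Bigarray</code> (in <code>Stdlib</code> since OCaml 4.07). This is a swapped-in alternative, not what texsearch does: the element data is allocated outside the OCaml heap, so the GC only sees a small custom block instead of a million-field array it would otherwise have to walk during marking. <code>suffixes</code> and <code>move_off_heap</code> are hypothetical names.</p>

```ocaml
(* Hypothetical alternative to Ancient: keep the packed suffixes in a
   Bigarray, whose element data lives outside the OCaml heap. *)
open Bigarray

type suffixes = (int, int_elt, c_layout) Array1.t

(* Copy an ordinary int array of packed suffixes out of the heap. *)
let move_off_heap (a : int array) : suffixes = Array1.of_array int c_layout a

let () =
  let heap = Array.init 1_000_000 (fun i -> i) in
  let off_heap = move_off_heap heap in
  (* Elements are read with the .{ } syntax, much like an ordinary array. *)
  assert (off_heap.{42} = heap.(42));
  Printf.printf "int array: %d heap words\n" (Obj.reachable_words (Obj.repr heap));
  Printf.printf "Bigarray:  %d heap words\n" (Obj.reachable_words (Obj.repr off_heap))
```

<p>Unlike Ancient, this only works for flat arrays of unboxed numbers, which is exactly what the packed suffix array has become. The costs are one copy when the array is built and a bounds-checked accessor on every read.</p>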
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Optimising texsearch]]></title>
    <link href="http://scattered-thoughts.net/blog/2010/12/08/optimising-texsearch/"/>
    <updated>2010-12-08T06:16:00+00:00</updated>
    <id>http://scattered-thoughts.net/blog/2010/12/08/optimising-texsearch</id>
    <content type="html"><![CDATA[<p><a href="https://github.com/jamii/texsearch">Texsearch</a> is a search engine for LaTeX formulae. It forms part of the backend for <a href="http://latexsearch.com">latexsearch.com</a>, which indexes the entire Springer corpus. It was also crazy slow, until today.</p>

<!--more-->


<p>Intuitively, when searching within LaTeX content we want results that represent the same formulae as the search term. Unfortunately LaTeX presents plenty of opportunities for obfuscating content with macros, presentation commands and just plain weird lexing.</p>

<p>Texsearch uses <a href="http://plastex.sourceforge.net/">PlasTeX</a> to parse LaTeX formulae and expand macros. The preprocessor then discards any LaTeX elements which relate to presentation rather than content (font, weight, colouring etc). Each remaining LaTeX element is hashed into a 63-bit integer. This massively reduces memory consumption, allowing the entire corpus and search index to be held in RAM. Collisions should be rare, since the number of distinct elements is far smaller than the 2<sup>63</sup> possible hash values.</p>
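
<p>A minimal sketch of that hashing step, assuming each preprocessed element can be represented by its string form. The choice of 64-bit FNV-1a is an assumption (the post does not say which hash texsearch uses). The result is truncated to ocaml&#8217;s native 63-bit <code>int</code> on a 64-bit platform, so each element of a formula occupies one unboxed word. <code>hash_element</code> and <code>hash_formula</code> are hypothetical names.</p>

```ocaml
(* Hypothetical sketch: hash each preprocessed LaTeX element (here just its
   string form) to a 63-bit OCaml int using 64-bit FNV-1a. *)
let fnv_offset = 0xcbf29ce484222325L
let fnv_prime = 0x100000001b3L

(* FNV-1a over the bytes of [s]; Int64.to_int then drops the high-order bit
   on 64-bit platforms, leaving a 63-bit native int. *)
let hash_element (s : string) : int =
  let h = ref fnv_offset in
  String.iter
    (fun c ->
      h := Int64.logxor !h (Int64.of_int (Char.code c));
      h := Int64.mul !h fnv_prime)
    s;
  Int64.to_int !h

(* A formula becomes a compact int array: one word per element. *)
let hash_formula (elements : string list) : int array =
  Array.of_list (List.map hash_element elements)
```

<p>For example, <code>hash_formula ["\\frac"; "{"; "a"; "}"]</code> yields a four-element <code>int array</code> that the search code can compare element by element with <code>=</code>.</p>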

<p>At the core of texsearch is a search algorithm which performs approximate searches over the search corpus. Specifically, given a search term S and a search radius R we want to return all corpus terms T such that the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a> between S and some substring of T is less than R. This is a common problem in bioinformatics and NLP and there is a <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.7225&amp;rep=rep1&amp;type=pdf">substantial amount of research</a> on how to solve this efficiently. I have been through a range of different algorithms in previous iterations of texsearch and have only recently achieved reasonable performance (mean search time is now ~300ms for a corpus of 1.5m documents). The code is available <a href="https://github.com/jamii/texsearch">here</a>.</p>
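
<p>For reference, the textbook dynamic programme for this &#8220;best substring&#8221; variant of Levenshtein distance (Sellers&#8217; algorithm) differs from ordinary edit distance only in letting a match start and end anywhere in T. The sketch below is that generic baseline over arrays of hashed elements, not texsearch&#8217;s own algorithm; <code>substring_distance</code> and <code>within_radius</code> are hypothetical names.</p>

```ocaml
(* Minimum Levenshtein distance between [s] and any substring of [t]
   (Sellers' approximate-matching recurrence), over hashed elements. *)
let substring_distance (s : int array) (t : int array) : int =
  let m = Array.length s and n = Array.length t in
  (* Row 0 is all zeros: an empty prefix of s matches for free at any
     position of t, which is what lets a match start anywhere. *)
  let prev = Array.make (n + 1) 0 in
  let cur = Array.make (n + 1) 0 in
  for i = 1 to m do
    (* Against an empty substring, the first i elements of s cost i deletions. *)
    cur.(0) <- i;
    for j = 1 to n do
      let cost = if s.(i - 1) = t.(j - 1) then 0 else 1 in
      cur.(j) <-
        min (prev.(j - 1) + cost)      (* match or substitute *)
          (min (prev.(j) + 1)          (* delete s.(i-1) *)
             (cur.(j - 1) + 1))        (* insert t.(j-1) *)
    done;
    Array.blit cur 0 prev 0 (n + 1)
  done;
  (* The minimum over the last row lets a match end anywhere in t. *)
  Array.fold_left min max_int prev

(* The search predicate: distance from S less than the radius R. *)
let within_radius ~radius s t = substring_distance s t < radius
```

<p>Run naively, this costs O(|S|&#183;|T|) time per corpus term, so on its own it is a correctness baseline rather than an efficient search over 1.5m documents.</p>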

<p>We define the distance from latexL to latexR as the minimum Levenshtein distance between latexL and any substring of latexR. With this definition we can specify the results of the search algorithm more simply: return all corpus terms whose distance from S is less than R.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="n">distance</span> <span class="n">latexL</span> <span class="n">latexR</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">maxl</span><span class="o">,</span> <span class="n">maxr</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">length</span> <span class="n">latexL</span><span class="o">,</span> <span class="nn">Array</span><span class="p">.</span><span class="n">length</span> <span class="n">latexR</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">if</span> <span class="n">maxl</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span> <span class="mi">0</span> <span class="k">else</span>
</span><span class='line'>  <span class="k">if</span> <span class="n">maxr</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span> <span class="n">maxl</span> <span class="k">else</span>
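</span><span class='line'>  <span class="c">(* minimum of three values, used by both recurrences below *)</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">minimum</span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="o">=</span> <span class="n">min</span> <span class="n">a</span> <span class="o">(</span><span class="n">min</span> <span class="n">b</span> <span class="n">c</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>  <span class="c">(* cache.(l).(r) is the distance between latexL[l to maxl] and latexR[r to maxr] *)</span>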
</span><span class='line'>  <span class="k">let</span> <span class="n">cache</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">make_matrix</span> <span class="o">(</span><span class="n">maxl</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)</span> <span class="o">(</span><span class="n">maxr</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)</span> <span class="mi">0</span> <span class="k">in</span>
</span><span class='line'>  <span class="c">(* Must match everything on the left *)</span>
</span><span class='line'>  <span class="k">for</span> <span class="n">l</span> <span class="o">=</span> <span class="n">maxl</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">downto</span> <span class="mi">0</span> <span class="k">do</span>
</span><span class='line'>    <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">).(</span><span class="n">maxr</span><span class="o">)</span> <span class="o">&lt;-</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">+</span><span class="mi">1</span><span class="o">).(</span><span class="n">maxr</span><span class="o">)</span>
</span><span class='line'>  <span class="k">done</span><span class="o">;</span>
</span><span class='line'>  <span class="c">(* General matching *)</span>
</span><span class='line'>  <span class="k">for</span> <span class="n">l</span> <span class="o">=</span> <span class="n">maxl</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">downto</span> <span class="mi">1</span> <span class="k">do</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">r</span> <span class="o">=</span> <span class="n">maxr</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">downto</span> <span class="mi">0</span> <span class="k">do</span>
</span><span class='line'>      <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">).(</span><span class="n">r</span><span class="o">)</span> <span class="o">&lt;-</span>
</span><span class='line'>          <span class="n">minimum</span>
</span><span class='line'>            <span class="o">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">).(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="o">))</span>
</span><span class='line'>            <span class="o">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">+</span><span class="mi">1</span><span class="o">).(</span><span class="n">r</span><span class="o">))</span>
</span><span class='line'>            <span class="o">((</span><span class="k">if</span> <span class="n">latexL</span><span class="o">.(</span><span class="n">l</span><span class="o">)</span> <span class="o">=</span> <span class="n">latexR</span><span class="o">.(</span><span class="n">r</span><span class="o">)</span> <span class="k">then</span> <span class="mi">0</span> <span class="k">else</span> <span class="mi">1</span><span class="o">)</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="n">l</span><span class="o">+</span><span class="mi">1</span><span class="o">).(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="o">))</span>
</span><span class='line'>  <span class="k">done</span> <span class="k">done</span><span class="o">;</span>
</span><span class='line'>  <span class="c">(* Non-matches on the right don&#39;t count until left starts matching *)</span>
</span><span class='line'>  <span class="k">for</span> <span class="n">r</span> <span class="o">=</span> <span class="n">maxr</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">downto</span> <span class="mi">0</span> <span class="k">do</span>
</span><span class='line'>    <span class="n">cache</span><span class="o">.(</span><span class="mi">0</span><span class="o">).(</span><span class="n">r</span><span class="o">)</span> <span class="o">&lt;-</span>
</span><span class='line'>        <span class="n">minimum</span>
</span><span class='line'>          <span class="o">(</span><span class="n">cache</span><span class="o">.(</span><span class="mi">0</span><span class="o">).(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="o">))</span>
</span><span class='line'>          <span class="o">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="mi">1</span><span class="o">).(</span><span class="n">r</span><span class="o">))</span>
</span><span class='line'>          <span class="o">((</span><span class="k">if</span> <span class="n">latexL</span><span class="o">.(</span><span class="mi">0</span><span class="o">)</span> <span class="o">=</span> <span class="n">latexR</span><span class="o">.(</span><span class="n">r</span><span class="o">)</span> <span class="k">then</span> <span class="mi">0</span> <span class="k">else</span> <span class="mi">1</span><span class="o">)</span> <span class="o">+</span> <span class="n">cache</span><span class="o">.(</span><span class="mi">1</span><span class="o">).(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="o">))</span>
</span><span class='line'>  <span class="k">done</span><span class="o">;</span>
</span><span class='line'>  <span class="n">cache</span><span class="o">.(</span><span class="mi">0</span><span class="o">).(</span><span class="mi">0</span><span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>
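<p>A cheap way to gain confidence in the dynamic program above is to compare it with a brute-force reading of the definition: the ordinary Levenshtein distance from latexL to every substring of latexR, minimised. The sketch below is illustrative only and is not part of texsearch. It assumes terms are int arrays, as the hashed elements suggest. It takes roughly O(|l| * |r|<sup>3</sup>) time, so it is only useful on tiny inputs.</p>

```ocaml
(* Standard Levenshtein distance between two int arrays (two-row DP). *)
let levenshtein (a : int array) (b : int array) : int =
  let n = Array.length a and m = Array.length b in
  let prev = Array.init (m + 1) (fun j -> j) in
  let curr = Array.make (m + 1) 0 in
  for i = 1 to n do
    curr.(0) <- i;
    for j = 1 to m do
      let cost = if a.(i - 1) = b.(j - 1) then 0 else 1 in
      curr.(j) <- min (min (prev.(j) + 1) (curr.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit curr 0 prev 0 (m + 1)
  done;
  prev.(m)

(* Brute force: minimum Levenshtein distance from [l] to any substring of [r],
   including the empty substring. Only suitable for testing on small inputs. *)
let brute_distance (l : int array) (r : int array) : int =
  let m = Array.length r in
  let best = ref (Array.length l) in (* empty substring: delete all of l *)
  for i = 0 to m do
    for j = i to m do
      best := min !best (levenshtein l (Array.sub r i (j - i)))
    done
  done;
  !best

let () =
  Printf.printf "%d %d %d\n"
    (brute_distance [| 1; 2 |] [| 9; 1; 2; 9 |])    (* exact substring *)
    (brute_distance [| 1; 2; 3 |] [| 9; 1; 3; 9 |]) (* one deletion *)
    (brute_distance [| 1; 2 |] [||])                (* nothing to match *)
```

<p>It prints "0 1 2". Checking distance against brute_distance on many small random arrays would be one way to catch mistakes in the boundary rows of the cache.</p>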


<p>The search algorithm is built around a <a href="http://en.wikipedia.org/wiki/Suffix_array">suffix array</a> presenting the following interface:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span>
</span><span class='line'>
</span><span class='line'><span class="k">val</span> <span class="n">create</span> <span class="o">:</span> <span class="kt">unit</span> <span class="o">-&gt;</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span>
</span><span class='line'><span class="k">val</span> <span class="n">add</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="k">&#39;</span><span class="n">a</span> <span class="o">*</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span><span class="o">)</span> <span class="kt">list</span> <span class="o">-&gt;</span> <span class="kt">unit</span>
</span><span class='line'><span class="k">val</span> <span class="n">prepare</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="kt">unit</span>
</span><span class='line'>
</span><span class='line'><span class="k">val</span> <span class="n">delete</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="k">&#39;</span><span class="n">a</span> <span class="o">-&gt;</span> <span class="kt">bool</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="kt">unit</span>
</span><span class='line'>
</span><span class='line'><span class="k">val</span> <span class="n">find_exact</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="kt">int</span> <span class="o">*</span> <span class="k">&#39;</span><span class="n">a</span><span class="o">)</span> <span class="kt">list</span>
</span><span class='line'><span class="k">val</span> <span class="n">find_approx</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="kt">float</span> <span class="o">-&gt;</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="kt">int</span> <span class="o">*</span> <span class="k">&#39;</span><span class="n">a</span><span class="o">)</span> <span class="kt">list</span>
</span><span class='line'><span class="k">val</span> <span class="n">find_query</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="kt">float</span> <span class="o">-&gt;</span> <span class="nn">Query</span><span class="p">.</span><span class="n">t</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="kt">int</span> <span class="o">*</span> <span class="k">&#39;</span><span class="n">a</span><span class="o">)</span> <span class="kt">list</span>
</span></code></pre></td></tr></table></div></figure>


<p>The data structure is pretty straightforward. We store the LaTeX terms in a DynArray and represent each suffix by a pair of indices (id, pos): the first into the DynArray and the second into the LaTeX term itself. Each LaTeX term is paired with an opaque value, which the consumer of this module uses to identify the terms.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">type</span> <span class="n">id</span> <span class="o">=</span> <span class="kt">int</span>
</span><span class='line'><span class="k">type</span> <span class="n">pos</span> <span class="o">=</span> <span class="kt">int</span>
</span><span class='line'>
</span><span class='line'><span class="k">type</span> <span class="k">&#39;</span><span class="n">a</span> <span class="n">t</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">{</span> <span class="n">latexs</span> <span class="o">:</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">opaques</span> <span class="o">:</span> <span class="k">&#39;</span><span class="n">a</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">t</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">next_id</span> <span class="o">:</span> <span class="n">id</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="kt">array</span> <span class="o">:</span> <span class="o">(</span><span class="n">id</span> <span class="o">*</span> <span class="n">pos</span><span class="o">)</span> <span class="kt">array</span>
</span><span class='line'>  <span class="o">;</span> <span class="k">mutable</span> <span class="n">unsorted</span> <span class="o">:</span> <span class="o">(</span><span class="k">&#39;</span><span class="n">a</span> <span class="o">*</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">t</span><span class="o">)</span> <span class="kt">list</span> <span class="o">}</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">create</span> <span class="bp">()</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">{</span> <span class="n">latexs</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">create</span> <span class="bp">()</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">opaques</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">create</span> <span class="bp">()</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">next_id</span> <span class="o">=</span> <span class="mi">0</span>
</span><span class='line'>  <span class="o">;</span> <span class="kt">array</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">make</span> <span class="mi">0</span> <span class="o">(</span><span class="mi">0</span><span class="o">,</span><span class="mi">0</span><span class="o">)</span>
</span><span class='line'>  <span class="o">;</span> <span class="n">unsorted</span> <span class="o">=</span> <span class="bp">[]</span><span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>The suffix array is built in a completely naive way: we just throw all the suffixes into a list and sort it. Much more efficient construction algorithms are known, but this is fast enough, especially since we do updates offline. Building is split into two functions to make incremental updates easier: add just queues new terms, while prepare sorts the suffixes of the queued terms and merges them into the existing array.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="n">add</span> <span class="n">sa</span> <span class="n">latexs</span> <span class="o">=</span>
</span><span class='line'>  <span class="n">sa</span><span class="o">.</span><span class="n">unsorted</span> <span class="o">&lt;-</span> <span class="n">latexs</span> <span class="o">@</span> <span class="n">sa</span><span class="o">.</span><span class="n">unsorted</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">insert</span> <span class="n">sa</span> <span class="o">(</span><span class="n">opaque</span><span class="o">,</span> <span class="n">latex</span><span class="o">)</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">id</span> <span class="o">=</span> <span class="n">sa</span><span class="o">.</span><span class="n">next_id</span> <span class="k">in</span>
</span><span class='line'>  <span class="n">sa</span><span class="o">.</span><span class="n">next_id</span> <span class="o">&lt;-</span> <span class="n">id</span> <span class="o">+</span> <span class="mi">1</span><span class="o">;</span>
</span><span class='line'>  <span class="nn">DynArray</span><span class="p">.</span><span class="n">add</span> <span class="n">sa</span><span class="o">.</span><span class="n">opaques</span> <span class="n">opaque</span><span class="o">;</span>
</span><span class='line'>  <span class="nn">DynArray</span><span class="p">.</span><span class="n">add</span> <span class="n">sa</span><span class="o">.</span><span class="n">latexs</span> <span class="n">latex</span><span class="o">;</span>
</span><span class='line'>  <span class="n">id</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">prepare</span> <span class="n">sa</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">ids</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="o">(</span><span class="n">insert</span> <span class="n">sa</span><span class="o">)</span> <span class="n">sa</span><span class="o">.</span><span class="n">unsorted</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">new_suffixes</span> <span class="o">=</span> <span class="nn">Util</span><span class="p">.</span><span class="n">concat_map</span> <span class="o">(</span><span class="n">suffixes</span> <span class="n">sa</span><span class="o">)</span> <span class="n">ids</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">cmp</span> <span class="o">=</span> <span class="n">compare_suffix</span> <span class="n">sa</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="kt">array</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="o">(</span><span class="nn">List</span><span class="p">.</span><span class="n">merge</span> <span class="n">cmp</span> <span class="o">(</span><span class="nn">List</span><span class="p">.</span><span class="n">fast_sort</span> <span class="n">cmp</span> <span class="n">new_suffixes</span><span class="o">)</span> <span class="o">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">to_list</span> <span class="n">sa</span><span class="o">.</span><span class="kt">array</span><span class="o">))</span> <span class="k">in</span>
</span><span class='line'>  <span class="n">sa</span><span class="o">.</span><span class="n">unsorted</span> <span class="o">&lt;-</span> <span class="bp">[]</span><span class="o">;</span>
</span><span class='line'>  <span class="n">sa</span><span class="o">.</span><span class="kt">array</span> <span class="o">&lt;-</span> <span class="kt">array</span>
</span></code></pre></td></tr></table></div></figure>
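<p>prepare relies on two helpers, suffixes and compare_suffix, which are not shown above. The sketch below is a guess at their behaviour rather than the texsearch code. It assumes a Latex.t is an int array and replaces the DynArray with a plain array of terms. It builds the suffix array in the same naive way, by enumerating every (id, pos) pair and sorting. List.concat_map needs OCaml 4.10 or later.</p>

```ocaml
(* A suffix is a term id plus a starting position inside that term. *)
type suffix = int * int

(* All suffixes of term [id]: (id, 0), (id, 1), ..., (id, len - 1). *)
let suffixes (terms : int array array) (id : int) : suffix list =
  List.init (Array.length terms.(id)) (fun pos -> (id, pos))

(* Lexicographic comparison of the element runs the two suffixes denote. *)
let compare_suffix (terms : int array array) ((id1, pos1) : suffix) ((id2, pos2) : suffix) : int =
  let t1 = terms.(id1) and t2 = terms.(id2) in
  let n1 = Array.length t1 and n2 = Array.length t2 in
  let rec go i j =
    if i >= n1 && j >= n2 then 0
    else if i >= n1 then -1 (* a proper prefix sorts first *)
    else if j >= n2 then 1
    else
      let c = compare t1.(i) t2.(j) in
      if c <> 0 then c else go (i + 1) (j + 1)
  in
  go pos1 pos2

(* Naive construction: enumerate every suffix of every term and sort. *)
let build (terms : int array array) : suffix array =
  let all = List.concat_map (suffixes terms) (List.init (Array.length terms) Fun.id) in
  Array.of_list (List.stable_sort (compare_suffix terms) all)

let () =
  let terms = [| [| 2; 1 |]; [| 1 |] |] in
  Array.iter (fun (id, pos) -> Printf.printf "(%d,%d) " id pos) (build terms);
  print_newline ()
```

<p>It prints "(0,1) (1,0) (0,0)": the two one-element suffixes [1] come first, in an order fixed by the stable sort, followed by [2; 1]. The real prepare additionally merges the newly sorted suffixes into the existing array with List.merge, so an update only has to sort the new suffixes.</p>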


<p>So now we have a sorted array of the suffixes of all our corpus terms. To find all exact matches for a given search term, we do a binary search for the first suffix that starts with the search term and then scan forward until the last such suffix. Because the array is sorted, the matching suffixes form one contiguous block. For reasons that will make more sense later, we divide this into two stages. Most of the work is done in gather_exact (better name, anyone?), which performs the search and dumps the resulting LaTeX term ids into a HashSet, so that a term containing the search term at several positions is only reported once. Then find_exact runs through the HashSet and looks up the matching opaques.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="c">(* binary search *)</span>
</span><span class='line'><span class="k">let</span> <span class="n">gather_exact</span> <span class="n">ids</span> <span class="n">sa</span> <span class="n">latex</span> <span class="o">=</span>
</span><span class='line'>  <span class="c">(* find beginning of region *)</span>
</span><span class='line'>  <span class="c">(* lo &lt; latex *)</span>
</span><span class='line'>  <span class="c">(* hi &gt;= latex *)</span>
</span><span class='line'>  <span class="k">let</span> <span class="k">rec</span> <span class="n">narrow</span> <span class="n">lo</span> <span class="n">hi</span> <span class="o">=</span>
</span><span class='line'>    <span class="k">let</span> <span class="n">mid</span> <span class="o">=</span> <span class="n">lo</span> <span class="o">+</span> <span class="o">((</span><span class="n">hi</span><span class="o">-</span><span class="n">lo</span><span class="o">)</span> <span class="o">/</span> <span class="mi">2</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>    <span class="k">if</span> <span class="n">lo</span> <span class="o">=</span> <span class="n">mid</span> <span class="k">then</span> <span class="n">hi</span> <span class="k">else</span>
</span><span class='line'>    <span class="k">if</span> <span class="n">leq</span> <span class="n">sa</span> <span class="n">latex</span> <span class="n">sa</span><span class="o">.</span><span class="kt">array</span><span class="o">.(</span><span class="n">mid</span><span class="o">)</span>
</span><span class='line'>    <span class="k">then</span> <span class="n">narrow</span> <span class="n">lo</span> <span class="n">mid</span>
</span><span class='line'>    <span class="k">else</span> <span class="n">narrow</span> <span class="n">mid</span> <span class="n">hi</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">n</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">length</span> <span class="n">sa</span><span class="o">.</span><span class="kt">array</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="k">rec</span> <span class="n">traverse</span> <span class="n">index</span> <span class="o">=</span>
</span><span class='line'>    <span class="k">if</span> <span class="n">index</span> <span class="o">&gt;=</span> <span class="n">n</span> <span class="k">then</span> <span class="bp">()</span> <span class="k">else</span>
</span><span class='line'>    <span class="k">let</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">pos</span><span class="o">)</span> <span class="o">=</span> <span class="n">sa</span><span class="o">.</span><span class="kt">array</span><span class="o">.(</span><span class="n">index</span><span class="o">)</span> <span class="k">in</span>
</span><span class='line'>    <span class="k">if</span> <span class="n">is_prefix</span> <span class="n">sa</span> <span class="n">latex</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">pos</span><span class="o">)</span>
</span><span class='line'>    <span class="k">then</span>
</span><span class='line'>      <span class="k">begin</span>
</span><span class='line'>        <span class="nn">Hashset</span><span class="p">.</span><span class="n">add</span> <span class="n">ids</span> <span class="n">id</span><span class="o">;</span>
</span><span class='line'>        <span class="n">traverse</span> <span class="o">(</span><span class="n">index</span><span class="o">+</span><span class="mi">1</span><span class="o">)</span>
</span><span class='line'>      <span class="k">end</span>
</span><span class='line'>    <span class="k">else</span> <span class="bp">()</span> <span class="k">in</span>
</span><span class='line'>  <span class="c">(* hi starts at n, a sentinel just past the end, so an empty array is safe *)</span>
</span><span class='line'>  <span class="n">traverse</span> <span class="o">(</span><span class="n">narrow</span> <span class="o">(-</span><span class="mi">1</span><span class="o">)</span> <span class="n">n</span><span class="o">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">exact_match</span> <span class="n">sa</span> <span class="n">id</span> <span class="o">=</span>
</span><span class='line'>  <span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">get</span> <span class="n">sa</span><span class="o">.</span><span class="n">opaques</span> <span class="n">id</span><span class="o">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">find_exact</span> <span class="n">sa</span> <span class="n">latex</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">ids</span> <span class="o">=</span> <span class="nn">Hashset</span><span class="p">.</span><span class="n">create</span> <span class="mi">0</span> <span class="k">in</span>
</span><span class='line'>  <span class="n">gather_exact</span> <span class="n">ids</span> <span class="n">sa</span> <span class="n">latex</span><span class="o">;</span>
</span><span class='line'>  <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="o">(</span><span class="n">exact_match</span> <span class="n">sa</span><span class="o">)</span> <span class="o">(</span><span class="nn">Hashset</span><span class="p">.</span><span class="n">to_list</span> <span class="n">ids</span><span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>
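<p>gather_exact in turn depends on leq and is_prefix, which are not shown either. The sketch below is a self-contained version of the lower-bound search plus forward scan. It makes the same assumptions as before (terms as int arrays, a plain sorted array of (id, pos) pairs) and uses the standard Hashtbl in place of Hashset. The names exact_ids, elems and build are hypothetical.</p>

```ocaml
(* Elements of the suffix (id, pos) as a list. Polymorphic compare on int
   lists is lexicographic, and a proper prefix sorts first. *)
let elems (terms : int array array) ((id, pos) : int * int) : int list =
  let t = terms.(id) in
  Array.to_list (Array.sub t pos (Array.length t - pos))

(* Naive suffix array, included only so that this example runs on its own. *)
let build (terms : int array array) : (int * int) array =
  let all =
    List.concat
      (List.mapi (fun id t -> List.init (Array.length t) (fun pos -> (id, pos)))
         (Array.to_list terms))
  in
  Array.of_list (List.sort (fun a b -> compare (elems terms a) (elems terms b)) all)

(* Does the suffix start with [pat]? *)
let is_prefix terms (pat : int list) suffix =
  let rec go p s =
    match p, s with
    | [], _ -> true
    | _, [] -> false
    | x :: p', y :: s' -> x = y && go p' s'
  in
  go pat (elems terms suffix)

(* Does [pat] sort at or before the suffix? *)
let leq terms (pat : int list) suffix = compare pat (elems terms suffix) <= 0

(* Ids of all terms containing [pat]: binary search for the first suffix >= pat,
   then scan forward while suffixes still start with [pat]. *)
let exact_ids terms (sa : (int * int) array) (pat : int list) : int list =
  let n = Array.length sa in
  (* invariant: suffix at lo < pat (lo = -1 is before the start),
     suffix at hi >= pat (hi = n is past the end) *)
  let rec narrow lo hi =
    if hi - lo <= 1 then hi
    else
      let mid = lo + ((hi - lo) / 2) in
      if leq terms pat sa.(mid) then narrow lo mid else narrow mid hi
  in
  (* A Hashtbl used as a set, so a term matching at several positions appears once. *)
  let ids = Hashtbl.create 16 in
  let rec scan i =
    if i < n && is_prefix terms pat sa.(i) then begin
      Hashtbl.replace ids (fst sa.(i)) ();
      scan (i + 1)
    end
  in
  scan (narrow (-1) n);
  List.sort compare (Hashtbl.fold (fun id () acc -> id :: acc) ids [])

let () =
  let terms = [| [| 1; 2; 3 |]; [| 2; 3; 2; 3 |]; [| 4 |] |] in
  let sa = build terms in
  List.iter (Printf.printf "%d ") (exact_ids terms sa [ 2; 3 ]);
  print_newline ()
```

<p>Searching for [2; 3] prints "0 1": both the first and the second term contain that run, and the second term is reported once even though it matches at two positions. Starting the search with hi = n rather than n - 1 means an empty suffix array needs no special case.</p>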


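<p>Now for the clever part: approximate search. First, convince yourself of the following. Suppose the distance from our search term S to some corpus term T is strictly less than the search radius R. If we split S into R non-overlapping pieces, at least one of those pieces must match a substring of T exactly: each edit affects at most one piece, so fewer than R edits must leave at least one piece untouched. So our approximate search algorithm performs an exact search for each of the R pieces and then calculates the distance to each of the resulting candidates, discarding those that are too far away. (In the code the number of pieces, k, is derived from the requested precision by Latex.cutoff.) Notice the similarity in structure to the previous algorithm. You can also see now why the exact search is split into two functions: gather_approx reuses gather_exact, collecting the ids found for every piece into a single HashSet.</p>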
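<p>The whole filter-then-verify scheme fits in a small self-contained sketch. Everything in it is a stand-in chosen for illustration:</p>
<ul>
<li><p>fragments is a hypothetical splitter into k nearly equal pieces (the real Latex.fragments is not shown and may differ).</p></li>
<li><p>occurs is a naive substring test in place of the suffix array.</p></li>
<li><p>substring_distance is Sellers' textbook forward formulation of the same distance rather than the backwards table above.</p></li>
<li><p>k serves both as the number of pieces and as the radius, just as R does in the argument.</p></li>
</ul>

```ocaml
(* Split [a] into [k] contiguous, non-overlapping pieces whose lengths differ
   by at most one (requires k >= 1). If k exceeds the length of [a], some
   pieces are empty; an empty piece occurs in every term, which is what we
   want, because then every term is within distance |a| < k. *)
let fragments (a : int array) (k : int) : int array list =
  let n = Array.length a in
  List.init k (fun i ->
      let start = i * n / k and stop = (i + 1) * n / k in
      Array.sub a start (stop - start))

(* Naive stand-in for the suffix array: does [p] occur as a contiguous run in [t]? *)
let occurs (p : int array) (t : int array) : bool =
  let np = Array.length p and nt = Array.length t in
  let rec at i j = j >= np || (p.(j) = t.(i + j) && at i (j + 1)) in
  let rec from i = i + np <= nt && (at i 0 || from (i + 1)) in
  from 0

(* Minimum Levenshtein distance from [s] to any substring of [t] (Sellers'
   dynamic program): row 0 is all zeros, so the match may start anywhere, and
   taking the minimum of the last row lets it end anywhere. *)
let substring_distance (s : int array) (t : int array) : int =
  let n = Array.length s and m = Array.length t in
  let prev = Array.make (m + 1) 0 and curr = Array.make (m + 1) 0 in
  for i = 1 to n do
    curr.(0) <- i;
    for j = 1 to m do
      let cost = if s.(i - 1) = t.(j - 1) then 0 else 1 in
      curr.(j) <- min (min (prev.(j) + 1) (curr.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit curr 0 prev 0 (m + 1)
  done;
  Array.fold_left min max_int prev

(* Filter, then verify: keep terms containing some piece exactly, then keep
   those whose full distance is below the radius [k]. Returns (distance, id). *)
let approx_search (corpus : int array list) (s : int array) (k : int) : (int * int) list =
  let pieces = fragments s k in
  corpus
  |> List.mapi (fun id t -> (id, t))
  |> List.filter (fun (_, t) -> List.exists (fun p -> occurs p t) pieces)
  |> List.filter_map (fun (id, t) ->
         let d = substring_distance s t in
         if d < k then Some (d, id) else None)

let () =
  let corpus = [ [| 9; 1; 2; 7; 4; 5; 6; 9 |]; [| 1; 2; 3; 4; 5; 6 |]; [| 8; 8; 8 |] ] in
  approx_search corpus [| 1; 2; 3; 4; 5; 6 |] 2
  |> List.iter (fun (d, id) -> Printf.printf "term %d at distance %d\n" id d)
```

<p>It prints "term 0 at distance 1" and "term 1 at distance 0". Term 0 differs from S by one substitution, which spoils only the first piece, so the second piece still occurs exactly and the term becomes a candidate. Term 2 shares no piece with S and is never verified at all.</p>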

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="n">gather_approx</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latex</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">k</span> <span class="o">=</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">cutoff</span> <span class="n">precision</span> <span class="n">latex</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">ids</span> <span class="o">=</span> <span class="nn">Hashset</span><span class="p">.</span><span class="n">create</span> <span class="mi">0</span> <span class="k">in</span>
</span><span class='line'>  <span class="nn">List</span><span class="p">.</span><span class="n">iter</span> <span class="o">(</span><span class="n">gather_exact</span> <span class="n">ids</span> <span class="n">sa</span><span class="o">)</span> <span class="o">(</span><span class="nn">Latex</span><span class="p">.</span><span class="n">fragments</span> <span class="n">latex</span> <span class="n">k</span><span class="o">);</span>
</span><span class='line'>  <span class="n">ids</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">approx_match</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latexL</span> <span class="n">id</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">latexR</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">get</span> <span class="n">sa</span><span class="o">.</span><span class="n">latexs</span> <span class="n">id</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">match</span> <span class="nn">Latex</span><span class="p">.</span><span class="n">similar</span> <span class="n">precision</span> <span class="n">latexL</span> <span class="n">latexR</span> <span class="k">with</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Some</span> <span class="n">dist</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">let</span> <span class="n">opaque</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">get</span> <span class="n">sa</span><span class="o">.</span><span class="n">opaques</span> <span class="n">id</span> <span class="k">in</span>
</span><span class='line'>      <span class="nc">Some</span> <span class="o">(</span><span class="n">dist</span><span class="o">,</span> <span class="n">opaque</span><span class="o">)</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">None</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nc">None</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">find_approx</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latex</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">ids</span> <span class="o">=</span> <span class="n">gather_approx</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latex</span> <span class="k">in</span>
</span><span class='line'>  <span class="nn">Util</span><span class="p">.</span><span class="n">filter_map</span> <span class="o">(</span><span class="n">approx_match</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latex</span><span class="o">)</span> <span class="o">(</span><span class="nn">Hashset</span><span class="p">.</span><span class="n">to_list</span> <span class="n">ids</span><span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>
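<p>The candidate step in <code>gather_approx</code> is cheap because it only does exact lookups. The following reading is a guess from the names: <code>Latex.cutoff</code> turns <code>precision</code> into an error budget <em>k</em>, and <code>Latex.fragments</code> cuts the query into pieces so that any close enough match must contain at least one piece verbatim. That is the standard pigeonhole filter from approximate string matching. Below is a minimal, self-contained sketch of that idea. All names are hypothetical. Plain string scanning stands in for the suffix array, and whole-string Levenshtein distance stands in for <code>Latex.similar</code>.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'>(* Hypothetical sketch of a pigeonhole filter, not texsearch's code.
   Cut the query into k+1 pieces. Each edit touches at most one piece,
   so a target within edit distance k of the query still contains at
   least one piece unchanged. *)

(* Split [s] into [n] contiguous pieces of near-equal length. *)
let fragments (s : string) (n : int) : string list =
  let len = String.length s in
  List.init n (fun i -&gt;
    let start = i * len / n and stop = (i + 1) * len / n in
    String.sub s start (stop - start))

(* Naive exact substring test; a suffix array would replace this. *)
let contains (hay : string) (needle : string) : bool =
  let n = String.length hay and m = String.length needle in
  let rec go i = i + m &lt;= n &amp;&amp; (String.sub hay i m = needle || go (i + 1)) in
  go 0

(* Levenshtein distance, keeping only two rows of the DP table. *)
let edit_distance (a : string) (b : string) : int =
  let la = String.length a and lb = String.length b in
  let prev = Array.init (lb + 1) (fun j -&gt; j) in
  let cur = Array.make (lb + 1) 0 in
  for i = 1 to la do
    cur.(0) &lt;- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      cur.(j) &lt;- min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (lb + 1)
  done;
  prev.(lb)

(* Gather candidates with exact fragment lookups, then verify each one. *)
let find_approx (docs : string list) (query : string) (k : int) : string list =
  let pieces = fragments query (k + 1) in
  docs
  |&gt; List.filter (fun doc -&gt; List.exists (contains doc) pieces)
  |&gt; List.filter (fun doc -&gt; edit_distance query doc &lt;= k)</code></pre></td></tr></table></div></figure>

<p>For example, <code>find_approx ["x^2+y^2"; "x^2+y^3"; "a/b"] "x^2+y^2" 1</code> keeps the first two formulae. <code>"a/b"</code> contains neither piece, <code>"x^2"</code> or <code>"+y^2"</code>, so it never reaches the more expensive edit distance check.</p>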


<p>We can also extend this to allow boolean queries.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers">
</pre></td><td class='code'><pre><code class='ocaml'><span class='line'><span class="k">let</span> <span class="k">rec</span> <span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">match</span> <span class="n">query</span> <span class="k">with</span>
</span><span class='line'>  <span class="o">|</span> <span class="nn">Query</span><span class="p">.</span><span class="nc">Latex</span> <span class="o">(</span><span class="n">latex</span><span class="o">,</span> <span class="o">_)</span> <span class="o">-&gt;</span> <span class="n">gather_approx</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">latex</span>
</span><span class='line'>  <span class="o">|</span> <span class="nn">Query</span><span class="p">.</span><span class="nc">And</span> <span class="o">(</span><span class="n">query1</span><span class="o">,</span> <span class="n">query2</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="nn">Hashset</span><span class="p">.</span><span class="n">inter</span> <span class="o">(</span><span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query1</span><span class="o">)</span> <span class="o">(</span><span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query2</span><span class="o">)</span>
</span><span class='line'>  <span class="o">|</span> <span class="nn">Query</span><span class="p">.</span><span class="nc">Or</span> <span class="o">(</span><span class="n">query1</span><span class="o">,</span> <span class="n">query2</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="nn">Hashset</span><span class="p">.</span><span class="n">union</span> <span class="o">(</span><span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query1</span><span class="o">)</span> <span class="o">(</span><span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query2</span><span class="o">)</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">query_match</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query</span> <span class="n">id</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">latexR</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">get</span> <span class="n">sa</span><span class="o">.</span><span class="n">latexs</span> <span class="n">id</span> <span class="k">in</span>
</span><span class='line'>  <span class="k">match</span> <span class="nn">Query</span><span class="p">.</span><span class="n">similar</span> <span class="n">precision</span> <span class="n">query</span> <span class="n">latexR</span> <span class="k">with</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">Some</span> <span class="n">dist</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="k">let</span> <span class="n">opaque</span> <span class="o">=</span> <span class="nn">DynArray</span><span class="p">.</span><span class="n">get</span> <span class="n">sa</span><span class="o">.</span><span class="n">opaques</span> <span class="n">id</span> <span class="k">in</span>
</span><span class='line'>      <span class="nc">Some</span> <span class="o">(</span><span class="n">dist</span><span class="o">,</span> <span class="n">opaque</span><span class="o">)</span>
</span><span class='line'>  <span class="o">|</span> <span class="nc">None</span> <span class="o">-&gt;</span>
</span><span class='line'>      <span class="nc">None</span>
</span><span class='line'>
</span><span class='line'><span class="k">let</span> <span class="n">find_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query</span> <span class="o">=</span>
</span><span class='line'>  <span class="k">let</span> <span class="n">ids</span> <span class="o">=</span> <span class="n">gather_query</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query</span> <span class="k">in</span>
</span><span class='line'>  <span class="nn">Util</span><span class="p">.</span><span class="n">filter_map</span> <span class="o">(</span><span class="n">query_match</span> <span class="n">sa</span> <span class="n">precision</span> <span class="n">query</span><span class="o">)</span> <span class="o">(</span><span class="nn">Hashset</span><span class="p">.</span><span class="n">to_list</span> <span class="n">ids</span><span class="o">)</span>
</span></code></pre></td></tr></table></div></figure>
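<p>The code above relies on a <code>Query</code> type and a <code>Hashset</code> module whose definitions aren't given. The following is therefore a hypothetical, self-contained sketch of the same two-phase shape. Leaves produce candidate id sets, <code>And</code> and <code>Or</code> become intersection and union, and the survivors are then checked against the whole query. If each leaf lookup returns a superset of its true matches, the intersection and union are supersets too. The second pass therefore only has to discard false positives. Stdlib <code>Set</code> stands in for <code>Hashset</code>, and a toy corpus of token lists stands in for the LaTeX index.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='ocaml'>(* Hypothetical sketch: names and the toy corpus are illustrative only. *)
module IntSet = Set.Make (struct type t = int let compare = compare end)

type query =
  | Leaf of string  (* one search term *)
  | And of query * query
  | Or of query * query

(* Phase 1: combine per-leaf candidate sets. *)
let rec gather (lookup : string -&gt; IntSet.t) (q : query) : IntSet.t =
  match q with
  | Leaf s -&gt; lookup s
  | And (q1, q2) -&gt; IntSet.inter (gather lookup q1) (gather lookup q2)
  | Or (q1, q2) -&gt; IntSet.union (gather lookup q1) (gather lookup q2)

(* Phase 2: evaluate the whole query against a single candidate. *)
let rec satisfies (matches : string -&gt; int -&gt; bool) (q : query) (id : int) : bool =
  match q with
  | Leaf s -&gt; matches s id
  | And (q1, q2) -&gt; satisfies matches q1 id &amp;&amp; satisfies matches q2 id
  | Or (q1, q2) -&gt; satisfies matches q1 id || satisfies matches q2 id

(* Candidates from phase 1, filtered by phase 2, as a sorted id list. *)
let find lookup matches q =
  IntSet.elements (IntSet.filter (satisfies matches q) (gather lookup q))

(* Toy corpus: each formula reduced to a list of tokens. *)
let docs = [| ["sqrt"; "x"]; ["pi"; "r"]; ["x"; "pi"] |]
let matches s id = List.mem s docs.(id)
let lookup s =
  IntSet.of_list (List.filter (matches s) (List.init (Array.length docs) (fun i -&gt; i)))</code></pre></td></tr></table></div></figure>

<p>For instance, <code>find lookup matches (And (Leaf "x", Leaf "pi"))</code> returns <code>[2]</code>. In this toy the leaf lookup is exact, so the second phase never rejects anything. With an approximate index the leaf lookups would return supersets, and the final check would do real work.</p>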


<p>This is a lot simpler than my previous approach, which required some uncomfortable reasoning about overlapping regions in quasi-metric spaces.</p>

<p>It is instructive to compare texsearch with other math search engines. Texsearch is effectively a brute-force solution that gave us an OK search engine with minimal risk. It has minimal understanding of LaTeX and no understanding of the structure of the formulae it searches. <a href="http://uniquation.com/en/">Uniquation</a> accepts only a small (but widely used) subset of LaTeX, but it understands the structure of the equation itself and so can infer scope and perform variable substitution when searching. I am not yet sure how much content they index or how well they handle searching within full LaTeX content, but hopefully their approach can scale up to large corpora. <a href="http://haskell.org/hoogle/">Hoogle</a> is a search engine for Haskell types which, thanks to its specialised domain, can handle even more sophisticated equivalences than Uniquation. <a href="https://trac.kwarc.info/arXMLiv/">ArXMLiv</a> is developing tools for inferring semantic information from LaTeX content in order to convert it to Semantic MathML, which is much easier for search engines to handle.</p>

<p>So, in summary, LaTeX is a pain in the ass.</p>
]]></content>
  </entry>
  
</feed>
