I just released strucjure, a Clojure library and DSL for parsing and pattern matching based on OMeta.
The readme on GitHub has detailed descriptions of the syntax, which I won’t repeat here. What I do want to do is run through a realistic example.
The readme has a large number of examples and I want to be sure that these are all correct and up to date. As part of the test-suite for strucjure I parse the readme source, pull out all the examples and make sure that they all run correctly and return the expected output.
```
jamie@alien:~/strucjure$ lein test strucjure.test
WARNING: newline already refers to: #'clojure.core/newline in namespace: strucjure.test, being replaced by: #'strucjure.test/newline

lein test strucjure.test

Ran 1 tests containing 166 assertions.
0 failures, 0 errors.
```
The readme parser is pretty simple. Since I control both the parser and the readme source, it doesn’t need to be bullet-proof, just the simplest thing that will get the job done. Strucjure is very bare-bones at the moment, though, so we have to create a lot of simple views that really belong in a library somewhere.
```clojure
(defview space
  \space %)

(defview newline
  \newline %)

(defview not-newline
  (not \newline) %)

(defview line
  (and (not []) ; have to consume at least one char
       (prefix & ((zero-or-more not-newline) ?line)
               & ((optional newline) ?end)))
  line)

(defview indented-line
  (prefix & ((one-or-more space) _) & (line ?line))
  line)
```
We want a tokeniser for various parts of the readme. We could write it like this:
```clojure
(defnview tokenise [sep]
  ;; empty input
  [] '(())
  ;; throw away separator, start a new token
  [& (sep _) & ((tokenise sep) ?results)] (cons () results)
  ;; add the current char to the first token
  [?char & ((tokenise sep) [?result & ?results])] (cons (cons char result) results))
```
Unfortunately in the current implementation of strucjure that recursive call goes on the stack, so this view will blow up on large inputs. For now we just have to implement this view by hand to get access to recur.
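To give a feel for what the hand-written version involves, here is a rough sketch in plain Clojure. It is not the code strucjure uses: `tokenise-by` is a hypothetical name, and the separator is assumed to be a fixed string rather than an arbitrary view. The point is only that moving the accumulated tokens into `loop` bindings lets the recursion use `recur` and run in constant stack space.

```clojure
(defn tokenise-by
  "Splits `input` into tokens (vectors of chars) at each occurrence of the
  fixed separator `sep`. Empty input yields a single empty token, mirroring
  the '(()) base case of the recursive view."
  [sep input]
  (let [sep (seq sep)
        n   (count sep)]
    (loop [chars  (seq input)
           token  []    ; the token currently being built
           tokens []]   ; tokens completed so far
      (cond
        ;; end of input: emit the current token
        (empty? chars)
        (conj tokens token)

        ;; separator found: throw it away and start a new token
        (= sep (take n chars))
        (recur (drop n chars) [] (conj tokens token))

        ;; otherwise add the current char to the current token
        :else
        (recur (rest chars) (conj token (first chars)) tokens)))))
```

Because every recursive step is a `recur` in tail position, input length no longer affects stack depth.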
The rest of the parser makes more sense reading in reverse order. We start by splitting up the readme by code delimiters (triple backticks). This gives us chunks of alternating text and code, so we parse every other chunk as a block of code.
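The splitting step can be sketched without strucjure at all. The following is an illustration in plain Clojure, not the view from the test-suite; `fence`, `fence-pattern` and `code-chunks` are names made up for this example, and it assumes the fences in the readme are balanced.

```clojure
(require '[clojure.string :as str])

;; Three backticks, built from a character literal.
(def fence (apply str (repeat 3 \`)))

;; Quote the fence so it is matched literally.
(def fence-pattern
  (re-pattern (java.util.regex.Pattern/quote fence)))

(defn code-chunks
  "Returns the chunks of `readme` that sit between pairs of fences.
  Splitting yields text, code, text, code, ... so the code is every
  second chunk, starting from the second one."
  [readme]
  (->> (str/split readme fence-pattern)
       rest
       (take-nth 2)))

(code-chunks (str "intro " fence "(+ 1 1)" fence " outro"))
;; => ("(+ 1 1)")
```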
A few of the code blocks don’t contain examples - we can detect these because they don’t start with a “user> ” prompt. All the other blocks contain a list of examples separated by prompts.
```clojure
(defview prompt
  (prefix \u \s \e \r \> \space) :prompt)

(defview code-block-inner
  (and (prompt _)
       ((tokenise prompt) ?chunks))
  (map (partial run example) (filter #(not (empty? %)) chunks))

  _ ;; not a block of examples
  nil)
```
An example consists of an input, which may be on multiple lines, zero or more lines of printed output and finally a result.
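As a rough plain-Clojure illustration of that shape (not the strucjure view itself), the sketch below splits one example into its three parts. It assumes that continuation lines of the input are indented, which is what the `indented-line` view above suggests, and that the final non-blank line is the result. `parse-example` and its map keys are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn parse-example
  "Splits the text of one example (prompt already removed) into its input,
  printed output lines and result."
  [chunk]
  (let [[first-line & more]  (str/split-lines chunk)
        ;; indented lines directly after the first line continue the input
        [continuation after] (split-with #(str/starts-with? % " ") more)
        after                (remove str/blank? after)]
    {:input  (str/join "\n" (cons first-line continuation))
     :prints (vec (butlast after))   ; zero or more lines of printed output
     :result (last after)}))         ; the final line is the result
```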
Now we just have to turn the results into unit tests. We have to be careful about comparing the results of the examples because they might contain closures, which look different every time.
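One way to sidestep the closure problem, shown here only as a sketch and not necessarily what the test-suite does, is to compare printed representations after masking anything that prints as an opaque object. The regex covers the two print formats I know Clojure has used for such objects, `#<...>` and `#object[...]`; `mask-opaque` and `same-printed?` are hypothetical helpers.

```clojure
(require '[clojure.string :as str])

(defn mask-opaque
  "Replaces printed opaque objects (closures and the like) with a fixed
  placeholder so that identity hashes and generated names are ignored."
  [printed]
  (str/replace printed #"#<[^>]*>|#object\[[^\]]*\]" "#<opaque>"))

(defn same-printed?
  "True when two printed results agree once opaque objects are masked."
  [expected actual]
  (= (mask-opaque expected) (mask-opaque actual)))
```

The trade-off is that any two closures compare as equal, so a test built this way checks only the data around them.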
Running the examples is a little tricky because some of them create bindings or classes that are used by later examples. We end up needing to eval the code at runtime.
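A minimal sketch of that idea: evaluate every example in one shared namespace so that a `def` in one example is visible to the next, and capture whatever gets printed along the way. The namespace name `readme-examples` and the function `run-input` are made up for illustration, and the sketch assumes each input holds a single form.

```clojure
;; A scratch namespace shared by all examples (hypothetical name).
(def example-ns (create-ns 'readme-examples))

;; Make clojure.core available inside the scratch namespace.
(binding [*ns* example-ns]
  (refer-clojure))

(defn run-input
  "Reads and evaluates the single form in `input` inside the shared
  namespace, returning the printed output and the result."
  [input]
  (binding [*ns* example-ns]
    (let [out    (java.io.StringWriter.)
          result (binding [*out* out]
                   (eval (read-string input)))]
      {:prints (str out)
       :result result})))

;; Definitions persist from one example to the next:
(run-input "(def x 2)")
(run-input "(* x 3)")
;; => {:prints "", :result 6}
```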
This is fun. Not only does strucjure parse its own syntax, it reads its own documentation!
Parts of this were a little painful. The next version of strucjure will definitely have improved string matching. I’m also looking at optimising/compiling views, as well as memoisation. Previous versions of strucjure supported both but were hard to maintain. For now I’m going to be moving on to using strucjure to build other useful DSLs.