I just released strucjure, a Clojure library and DSL for parsing and pattern matching based on OMeta.
The readme on GitHub has detailed descriptions of the syntax and semantics, which I won’t repeat here. What I do want to do is run through a realistic example.
The readme has a large number of examples and I want to be sure that these are all correct and up to date. As part of the test-suite for strucjure I parse the readme source, pull out all the examples and make sure that they all run correctly and return the expected output.
jamie@alien:~/strucjure$ lein test strucjure.test
WARNING: newline already refers to: #'clojure.core/newline in namespace: strucjure.test, being replaced by: #'strucjure.test/newline
lein test strucjure.test
Ran 1 tests containing 166 assertions.
0 failures, 0 errors.
The readme parser is pretty simple. Since I control both the parser and the readme source, it doesn’t need to be bullet-proof; it just needs to be the simplest thing that will get the job done. Strucjure is very bare-bones at the moment, though, so we have to create a lot of simple views that really belong in a library somewhere.
(defview space
  \space %)

(defview newline
  \newline %)

(defview not-newline
  (not \newline) %)

(defview line
  (and (not []) ; have to consume at least one char
       (prefix & ((zero-or-more not-newline) ?line)
               & ((optional newline) ?end)))
  line)

(defview indented-line
  (prefix & ((one-or-more space) _)
          & (line ?line))
  line)
We want a tokeniser for various parts of the readme. We could write it like this:
(defnview tokenise [sep]
  ;; empty input
  [] '(())
  ;; throw away separator, start a new token
  [& (sep _) & ((tokenise sep) ?results)] (cons () results)
  ;; add the current char to the first token
  [?char & ((tokenise sep) [?result & ?results])] (cons (cons char result) results))
Unfortunately in the current implementation of strucjure that recursive call goes on the stack, so this view will blow up on large inputs. For now we just have to implement this view by hand to get access to recur.
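To give a feel for what a hand-written version looks like, here is a minimal sketch in plain Clojure using loop and recur. It is not the code from the strucjure test-suite: the name `tokenise-by` is made up for illustration, and it assumes the separator is a predicate on a single element rather than a view that can consume several characters.

```clojure
;; Hypothetical stand-in for the tokenise view, written with loop/recur
;; so that it runs in constant stack space.
(defn tokenise-by
  "Split the sequence `input` into tokens, starting a new token at every
  element for which `sep?` returns a truthy value. Separators are discarded."
  [sep? input]
  (loop [remaining (seq input)
         current   []   ; elements of the token being built
         tokens    []]  ; completed tokens
    (cond
      ;; end of input: emit the last (possibly empty) token
      (nil? remaining)
      (conj tokens current)

      ;; separator: close the current token and start a new one
      (sep? (first remaining))
      (recur (next remaining) [] (conj tokens current))

      ;; ordinary element: append it to the current token
      :else
      (recur (next remaining) (conj current (first remaining)) tokens))))
```

Calling `(tokenise-by #{\,} "a,b")` yields two tokens, one per side of the comma, and empty input yields a single empty token, just like the first clause of the view above. Because the recursive step is in tail position, the stack depth stays constant however long the input is.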
The rest of the parser makes more sense reading in reverse order. We start by splitting up the readme by code delimiters (triple backticks). This gives us chunks of alternating text and code, so we parse every other chunk as a block of code.
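The "every other chunk" step can be sketched in plain Clojure. This is only an illustration under my own assumptions: it uses `clojure.string/split` rather than strucjure views, `code-chunks` is a hypothetical name, and it assumes the delimiters are balanced and that the readme starts with text rather than code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of strucjure.
(defn code-chunks
  "Return the chunks of `readme` that sit between triple-backtick delimiters."
  [readme]
  (->> (str/split readme #"`{3}" -1) ; limit of -1 keeps trailing empty chunks
       rest                          ; drop the leading text chunk
       (take-nth 2)))                ; keep every other chunk from there on
```

Splitting on the delimiter leaves text at the even positions and code at the odd positions, so dropping the first chunk and taking every second one gives exactly the code blocks. If an opening delimiter carries a language tag, that tag stays at the start of the chunk and would need to be stripped separately.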
This is fun. Not only does strucjure parse its own syntax, it reads its own documentation!
Parts of this were a little painful. The next version of strucjure will definitely have improved string matching. I’m also looking at optimising/compiling views, as well as memoisation. Previous versions of strucjure supported both but were hard to maintain. For now I’m going to be moving on to using strucjure to build other useful DSLs.