In thisIn this series of posts, we will create a very simple language that can be used to specify a state machine. Then we will write a compiler that will translate the description to a code that can simulate this state machine.
Here is the agenda:
Part 1: What is a state machine? A (Rust) program to simulate state machine.
Part 2: Writing (Rust) macros to reduce repetition
Part 3: A small language and convert a description into a high level language(Rust) implementation
Proposed Part 4: Translating the description into a lower level
assembly like language(LLVM IR)
sembly like language(LLVM IR)
This is part 2.
Background
In the previous post, we ended up with a rust implementation of state machine simulation. We also concluded that in order to test different state machine, we would like a better way, as writing full code involves a lot of boilerplate.
Rust Macros allow us to generate rust code by parsing patterns and using them at appropriate places.
Structure
Our code can be divided into two parts:
-
Declarations a. States b. Symbols c. Tape machine specification
-
Simulation a. Given the tape machine state and tape head content, we perform the associated actions and transition to next state. We perform this for required number of steps.
Let’s take a looks at declarations first.
Declarations
Following is the declaration of states:
If we had to write the same code for a different state machine, say with states, a, b, c, d, then we will write:
Since only the inner contents of the enum are changing, it would be great if we could have something like this:
Rust macros does exactly this, with a little different syntax:
Let me present the macro definition and then i can explain how it works.
Lets break it down:
macro_rules!
is a macro definition. It takes a name and a body.states
is the name of the macro.( $( $x:ident ),* )
is the pattern. This is what macro will consume. Lets simplify it: a. If remove$
andident
etc., we will end up with(x),*
. This will expand tox
,x
,x
and so on. Rust will try to match the pattern with the input, and store the matched values in the$x
variables.ident
means the values should be identifiers. We can ignore it for now.#[derive(Debug, PartialEq, Eq)]
is the body of the macro. It will be output as it is.- Similarly,
enum TapeMachineState {
- Now we end up with:
$(
$x,
)*
This means for all the values stored in x
, concatenate them
with ,
and then output them.
}
is the end of the body.
So for our example:
x
will bea
,b
,c
,d
- On expanding the body
$($x),*
, we will get:a, b, c, d
- The whole body will be:
Exactly what we wanted.
Let’s look at symbol declarations:
We will want a macro like this:
Following similar logic, we can define the macro as:
[Clarify the B
, E
, A
symbols. We dont need them actually.]
However in our initial implementation, we also wanted support for printing the symbols. Here is the code that we used:
We would like to generate this as well. That means we will need to take the value to be printed with each symbol. A macro like following will do the job:
Here is the modified macro:
The only different is following line:
( $(($x: ident, $y: literal)),* )
This pattern will match the input so that x
and y
contain the
symbol name and the value to be printed in pair.
For example, following line will specify the different values to be printed for different symbols:
$(
TapeMachineSymbol::$x => $y,
)*
What remains in the declaration is tape machine declaration, i.e.
Notice how this specification doesnt vary with the symbols and state machine used. We can have a very simple macro like this:
[TODO - Highlight that this macro can be used only after the declaration of the enums and discuss if there is a better way to do it]
Invoking the macro will simply generate the code for the tape machine implementation:
To summarize, If we have following tape machine:
- States: a, b, c, d
- Symbols: 0, 1, 2, 3, 4
We can generate the declarations using the following code:
Let’s move on to the simulation part.
Simulation
Here is the crux of the simulation:
Let’s break down the structure:
for i in 0..steps
is the loop that will run for requiredprintln!
is the print statement- Match statement is simply conditioning on the current state and the current symbol.
- We perform the actions depending on the current state and symbol.
- We update the state and print the final state.
- We loop again.
If we can generate the code for one branch like this:
then we can just do $(code generation logic for one rule)*
in our
macro.
Let’s focus on a single rule for now. Here are the things we need:
- The state
- The symbol
- The actions in order. If its a print statement then value to be printed is the symbol.
- The next state
I propose the following macro:
Here is how this will look like in actual implementation:
([$state: ident], [$($condition: ident)|+], [$($action: expr),*], [$final_state: ident])
Let’s break it down:
- States:
[$state: ident]
- Symbols:
[$($condition: ident)|+]
: One or more coditions - Actions:
[$($action: expr),*]
: One or more actions - Final state:
[$final_state: ident]
All of them comma separated.
Now because there are going to be multiple rules, we will just do
$(rule)*
to match all the rules.
i.e.
$([$state: ident], [$($condition: ident)|+], [$($action: expr),*], [$final_state: ident])*
However there is a slight problem with this. There is no way to know
when a rule ends and when another rule starts. [ Clarify why we can’t just use ;
].
Let’s settle on specifying multiple rules like this:
i.e. each rule is wrapped up in curly braces, separated by comma.
So our new macro will be something like this:
$({ handle_rule },*)
Here is how it will look like in actual implementation:
$({ [$state: ident], [$($condition: ident)|+], [$($action: expr),*], [$final_state: ident] } ),*
// Break it down like this:
// handle_rule = [$state: ident], [$($condition: ident)|+], [$($action: expr),*], [$final_state: ident]
// Full rule = $({ handle_rule }),*
Since the patterns are hard to read, visualize what the pattern will allow us to do: It will parse the input and give us a list of rules. Each rule will have its state, symbol, actions and next state stored in the corresponding variables.
Now the only different with the macros is that you cannot access the list content by index. You will have to specify another pattern which takes the elements of the list one by one and generates the output sequence.
So to summarise:
Raw text -> Pattern to Match -> Parsed Sequence -> Pattern to Generate -> Generated code
We can also work with lists of lists, with same restriction: No indexing, only output pattern. [ Clarify how index can be passed, but again with complications]
With this in mind, let me present the final macro:
Notice the two levels of repetition:
$( // First level repetition starts
(TapeMachineState::$state,
$(
process_action!($condition)
)|* // Second level repetition 1
) => {
$(
$action;
)* // Second level repetition 2
$tape_machine.state = &TapeMachineState::$final_state;
println!("Final State: {:?}", TapeMachineState::$final_state);
}
)* // First level repetition ends
- First level repetitions are corresponding to different rules.
- Second level repetition 1 is for a particular rule, writing
multiple conditions. e.g.
Symbol1 | Symbol0
. Notice that we have used|
to separate the conditions and used another macroprocess_action!
to process the actions. This macro is defined below:
In summary all this trouble just to use *
inside the match statement.
- Second level repetition 2 is for a particular rule, writing
multiple actions. e.g.
R, P(x), L, L, L
will translate toR, P(x), L, L, L
.
But that’s not what we wanted, right? We wanted something like this:
How about we define few more macros:
These macros can be defined as follows:
These macros will perform following actions:
Where these functions have been defined in the declaration section.
A sidenote:
We need to pass tape_machine
and steps
in the transition_rules!
macros. Why? Because we cannot just use
arbitrary variables in macros. Macros can use the variables that are
defined inside the body or the ones that are explicitly passed
to the macro. Read more about this
Alright, now we have all the macros defined, let’s look at the final program:
Isn’t this pretty concise? For a different state machine:
Current State | Required Current Tape Head Content | Action | Next State |
---|---|---|---|
a | None | P(0) | b |
b | 0 | R-P(1) | b |
b | 1 | R | a |
The program will be:
Well, now we are in a much better state. Instead of having to write the same code for a different state machine, we can just specify the states, symbols and rules and we are done.
We are done for practical purposes, however the syntax is a little bit verbose. Can we specify the states, symbols and rules in a more concise way? How about this:
STATES: [b], o, q, p, f
SYMBOLS: 0, 1, e, x
TRANSITIONS:
b, *, P(e)-R-P(e)-R-P(0)-R-R-P(0)-L-L, o
o, 1, R-P(x)-L-L-L, o
o, 0, X, q
q, 0 | 1, R-R, q
q, X, P(1)-L, p
p, x, P(X)-R, q
p, e, R, f
p, X, L-L, p
f, *, R-R, f
f, X, P(0)-L-L, o
This is the most concise we will every get (may replacing L-L-L with 3L, but ignore that for now).
Can we parse this and generate the code from scratch? That is the topic of the next post.