In this series of posts, we will create a very simple language that can be used to specify a state machine. Then we will write a compiler that translates the description into code that can simulate this state machine.
Here is the agenda:
Part 1: What is a state machine? A (Rust) program to simulate a state machine.
Part 2: Writing (Rust) macros to reduce repetition
Part 3: A small language, and converting a description into a high-level language (Rust) implementation
Proposed Part 4: Translating the description into a lower-level, assembly-like language (LLVM IR)
This is part 3.
Language
In the previous post, we decided to work with a much more concise specification of the state machine.
Here is a specification for the state machine which will generate the sequence 001011011101111011111..:
STATES: [b], o, q, p, f
SYMBOLS: 0, 1, e, x
TRANSITIONS:
b, *, P(e)-R-P(e)-R-P(0)-R-R-P(0)-L-L, o
o, 1, R-P(x)-L-L-L, o
o, 0, X, q
q, 0 | 1, R-R, q
q, X, P(1)-L, p
p, x, P(X)-R, q
p, e, R, f
p, X, L-L, p
f, *, R-R, f
f, X, P(0)-L-L, o
Since this particular state machine is a little complex, we will be using a much simpler state machine as our example, one that generates the sequence 010101..:
STATES: [a], b
SYMBOLS: 0, 1
TRANSITIONS:
a, a, P(0), b
b, 0, R, a
b, 1, R, a
So the expectation is that we will be able to generate the Rust code for the above state machine:
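As a rough sketch of the kind of code we expect to generate (heavily abridged; the real generator, shown later in this post, reads the number of steps and the tape length from stdin, emits one match arm per transition, and prints the tape at the end):

// A sketch of the generated simulator for the 010101.. machine (abridged).
#[derive(Debug, PartialEq, Eq)]
enum TapeMachineState { a, b }

#[derive(Debug, PartialEq, Eq, Clone)]
enum TapeMachineSymbol { Symbol0, Symbol1, SymbolX }

struct TapeMachine<'a> {
    state: &'a TapeMachineState,
    result: &'a mut Vec<TapeMachineSymbol>,
    index: usize,
}

impl<'a> TapeMachine<'a> {
    fn new(state: &'a TapeMachineState, result: &'a mut Vec<TapeMachineSymbol>) -> Self {
        Self { state, result, index: 0 }
    }
    fn p(&mut self, symbol: TapeMachineSymbol) { self.result[self.index] = symbol; } // print
    fn r(&mut self) { self.index += 1; } // move right
    fn l(&mut self) { self.index -= 1; } // move left
}

fn main() {
    let steps = 20;   // the generated code reads this from stdin
    let max_len = 40; // and this too
    let mut result = vec![TapeMachineSymbol::SymbolX; max_len];
    let mut tape_machine = TapeMachine::new(&TapeMachineState::a, &mut result);
    for _ in 0..steps {
        match (tape_machine.state, &tape_machine.result[tape_machine.index]) {
            // ... one arm per transition in the specification, for example:
            (TapeMachineState::b, TapeMachineSymbol::Symbol0 | TapeMachineSymbol::Symbol1) => {
                tape_machine.r();
                tape_machine.state = &TapeMachineState::a;
            }
            (_, _) => panic!("Invalid state reached"),
        }
    }
    // ... finally, print the tape, as in the full template ...
}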
In order to generate this code directly from the specification, we will need to parse the specification into a data structure.
Here are a few natural data structures to represent the specification:
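One natural choice, and the one we will build up in the parser, is a small tree of structs and enums mirroring the three sections of the specification. These are the same ParseTree, Transition, Condition, and TransitionStep types that appear in the full parser code later (derive attributes omitted here):

// The parsed specification, mirroring the STATES / SYMBOLS / TRANSITIONS sections.
pub enum Condition {
    OR(Vec<String>), // match any of the listed symbols, e.g. `0 | 1`
    Star,            // `*`: match any symbol
}

pub enum TransitionStep {
    R,         // move right
    L,         // move left
    X,         // do nothing
    P(String), // print the given symbol
}

pub struct Transition {
    pub initial_state: String,
    pub condition: Condition,
    pub steps: Vec<TransitionStep>,
    pub final_state: String,
}

pub struct ParseTree {
    pub states: Vec<String>,
    pub initial_state: String,
    pub symbols: Vec<String>,
    pub transitions: Vec<Transition>,
}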
Once we have parsed the specification, generating the Rust code is just a matter of substituting the right values in the template.
Parsing text is typically broken into two steps:
- Lexing: Breaking the text into tokens
- Parsing: Grouping tokens into recognizable structures
We will be following this tutorial closely.
Lexing
A token is a value with a type. In our case, we can expect the following types of tokens:
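Concretely, the parser below works with tokens of the following kinds (a sketch; the exact definitions live in the full lexer code):

// A token is a piece of text tagged with its kind.
#[derive(Debug, PartialEq, Clone)]
pub struct Token {
    pub text: String,
    pub kind: TokenType,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TokenType {
    EOF,
    NEWLINE,
    IDENT,        // state and symbol names
    // Section keywords
    STATES,
    SYMBOLS,
    TRANSITIONS,
    // Punctuation
    COLON,
    COMMA,
    DASH,         // '-' separating transition steps
    OR,           // '|' separating condition symbols
    STAR,         // '*' wildcard condition
    LeftBracket,  // '[' marking the initial state
    RightBracket, // ']'
    LeftParen,    // '(' after P
    RightParen,   // ')'
    // Transition step keywords
    R,
    L,
    X,
    P,
}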
The lexer will simply go over the text, split by whitespace, and return a token, or throw an error in case of an invalid value.
As the tutorial suggests, let’s first outline the interfaces of the lexer:
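A minimal sketch of that interface, assuming the lexer is constructed from the specification text and hands out one token at a time (source and position are assumed field names; the real implementation is in the full lexer code below):

#[derive(Debug, PartialEq, Clone)]
pub struct Lexer {
    source: String,  // the specification text (assumed field name)
    position: usize, // current offset into `source` (assumed field name)
}

impl Lexer {
    pub fn new(source: String) -> Self {
        Lexer { source, position: 0 }
    }

    // Return the next token, or None once the input is exhausted.
    // An unrecognised piece of text is an error.
    pub fn get_token(&mut self) -> Option<Token> {
        // 1. Skip spaces (but not newlines, which are tokens of their own).
        // 2. Read the next chunk of text.
        // 3. Map it to a TokenType: keyword, punctuation, or IDENT; otherwise abort.
        unimplemented!("see the full lexer code")
    }
}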
I’ve only outlined the interfaces; since the implementation is straightforward, we will not be going through it line by line.
Full Lexer Code
To test the lexer, take a look at the following code:
A Sample Lexer Test Code
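Something along these lines (a hypothetical test harness; the one in the repository may differ):

fn main() {
    let source = "STATES: [a], b\nSYMBOLS: 0, 1\n".to_string();
    let mut lexer = Lexer::new(source);
    // Print each token until the input runs out; an invalid token aborts.
    while let Some(token) = lexer.get_token() {
        println!("{:?}: {:?}", token.kind, token.text);
    }
}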
To move on to the next step, we will need to make sense of this series of tokens.
Parsing
In order to parse, we will follow this general approach:
- Take a look at the current token
- Based on the context, decide whether we expect it to be a certain type of token:
  a. The token type is required:
     i. Is this token present? Consume it and update the parse tree.
     ii. Otherwise, throw an error.
  b. The token type is optional:
     i. Try to consume it.
     ii. If it’s not present, then move on.
How do we know which token types to expect at any point? A grammar for our language will help:
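Here it is, reconstructed from the comments on the parsing functions shown later in this post (each rule corresponds to one method of the parser):

PROGRAM                   ::= NEWLINE* STATES_DECLARATION SYMBOLS_DECLARATION TRANSITIONS_DECLARATION NEWLINE* EOF
STATES_DECLARATION        ::= 'STATES' ':' STATE_IDENTIFIER_LIST NEWLINE
STATE_IDENTIFIER_LIST     ::= (IDENT | '[' IDENT ']') (',' (IDENT | '[' IDENT ']'))*
SYMBOLS_DECLARATION       ::= 'SYMBOLS' ':' SYMBOL_IDENTIFIERS NEWLINE
SYMBOL_IDENTIFIERS        ::= IDENT (',' IDENT)*
TRANSITIONS_DECLARATION   ::= 'TRANSITIONS' ':' (NEWLINE TRANSITION_DECLARATION)*
TRANSITION_DECLARATION    ::= IDENT ',' TRANSITION_CONDITIONS ',' TRANSITION_STEPS ',' IDENT
TRANSITION_CONDITIONS     ::= '*' | TRANSITION_CONDITION_LIST
TRANSITION_CONDITION_LIST ::= (IDENT | 'X') ('|' (IDENT | 'X'))*
TRANSITION_STEPS          ::= TRANSITION_STEP ('-' TRANSITION_STEP)*
TRANSITION_STEP           ::= 'R' | 'L' | 'X' | 'P' '(' (IDENT | 'X') ')'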
The above grammar recursively defines what a valid program is. It starts with a PROGRAM, which is a series of declarations, and then goes on to define what the different declarations are. This grammar says, for instance, that we expect SYMBOLS_DECLARATION to come after STATES_DECLARATION.
In our parsing logic, at any stage, we will be expecting a series of tokens as specified by the grammar. For example, the top-level logic will be the following:
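This program method, like the rest of the parser snippets in this section, is excerpted from the full parser code at the end of the post.

// Parse the entire program:
// NEWLINE? states_declaration symbols_declaration transitions_declaration NEWLINE? EOF
pub fn program(&mut self) {
    // Consume leading newlines
    while self.try_consume(TokenType::NEWLINE, None::<fn(&Token)>) {}
    self.states_declaration();
    self.symbols_declaration();
    self.transitions_declaration();
    // Consume trailing newlines
    while self.try_consume(TokenType::NEWLINE, None::<fn(&Token)>) {}
    self.consume(TokenType::EOF, None::<fn(&Token)>);
    debug!("PROGRAM");
}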
The states_declaration will be defined as follows; it consumes the STATES keyword, a colon, the list of state identifiers, and the trailing newline:
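// Parse a states declaration: STATES ':' state_identifier_list NEWLINE
fn states_declaration(&mut self) {
    self.consume(TokenType::STATES, None::<fn(&Token)>);
    self.consume(TokenType::COLON, None::<fn(&Token)>);
    self.state_identifier_list();
    self.consume(TokenType::NEWLINE, None::<fn(&Token)>);
    debug!("STATES_DECLARATION");
}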
Let’s focus on the consume function. It takes the expected token type and an optional action (a closure that receives the matched token), and aborts if the current token does not match:
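// Consume the current token if it matches the expected token type.
// If not, abort with an error message.
// Execute the optional action if provided.
fn consume<F>(&mut self, expected: TokenType, action: Option<F>)
where
    F: FnMut(&Token),
{
    if !self.try_consume(expected, action) {
        self.abort(&format!(
            "Expected {:?}, got {:?}",
            expected, self.cur_token.kind
        ));
    }
}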
There is another variant of consume, try_consume, which returns false instead of aborting when the expected token is not present:
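// Try to consume the current token if it matches the expected token type.
// If successful, log the token and execute the optional action.
// Return true if the token was consumed, false otherwise.
fn try_consume<F>(&mut self, kind: TokenType, action: Option<F>) -> bool
where
    F: FnMut(&Token),
{
    if self.check_token(kind) {
        match kind {
            TokenType::IDENT => debug!("{:?}: {}", kind, self.cur_token.text),
            _ => debug!("{:?}", kind),
        }
        if let Some(mut action) = action {
            action(&self.cur_token);
        }
        self.next_token();
        true
    } else {
        false
    }
}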
Let’s look at what an action might look like, in case we happen to be matching an initial state identifier. Here the action is a closure that copies the matched identifier into initial_state, which is then recorded on the parse tree:
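// Parse an initial state identifier: [IDENT]
fn initial_state_identifier(&mut self) {
    self.consume(TokenType::LeftBracket, None::<fn(&Token)>);
    let mut initial_state = String::new();
    self.consume(
        TokenType::IDENT,
        Some(|token: &Token| {
            initial_state.push_str(&token.text);
        }),
    );
    if self.tree.initial_state.is_empty() {
        self.tree.initial_state = initial_state.clone();
        self.tree.states.push(initial_state);
    } else {
        self.abort("Initial state already defined.");
    }
    self.consume(TokenType::RightBracket, None::<fn(&Token)>);
    debug!("INITIAL_STATE_IDENTIFIER");
}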
An example where we are using try_consume: the symbol_identifiers rule uses it to keep consuming comma-separated identifiers for as long as a comma is present (abridged):
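// Parse a list of symbol identifiers: IDENT (',' IDENT)*
fn symbol_identifiers(&mut self) {
    let mut symbol_identifiers = Vec::new();
    self.consume(
        TokenType::IDENT,
        Some(|token: &Token| {
            symbol_identifiers.push(token.text.clone());
        }),
    );
    // The comma is optional: keep consuming `, IDENT` pairs while one is present.
    while self.try_consume(TokenType::COMMA, None::<fn(&Token)>) {
        self.consume(
            TokenType::IDENT,
            Some(|token: &Token| {
                symbol_identifiers.push(token.text.clone());
            }),
        );
    }
    // ... record the symbols on the parse tree (see the full parser code) ...
}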
To summarise, let’s write down the interfaces of the parser:
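(Signatures only; the bodies are in the full parser code at the end of the post.)

pub fn new(lexer: Lexer) -> Parser
pub fn program(&mut self)

fn states_declaration(&mut self)
fn state_identifier_list(&mut self)
fn initial_state_identifier(&mut self)

fn symbols_declaration(&mut self)
fn symbol_identifiers(&mut self)

fn transitions_declaration(&mut self)
fn transition_declaration(&mut self)
fn transition_conditions(&mut self)
fn transition_condition_list(&mut self)
fn transition_steps(&mut self)
fn transition_step(&mut self)

// Token-handling helpers
fn next_token(&mut self)
fn check_token(&self, kind: TokenType) -> bool
fn check_peek(&self, kind: TokenType) -> bool
fn consume<F: FnMut(&Token)>(&mut self, expected: TokenType, action: Option<F>)
fn try_consume<F: FnMut(&Token)>(&mut self, kind: TokenType, action: Option<F>) -> bool
fn abort(&self, message: &str)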
Each of these functions tries to parse a certain rule; when successful, it will update the ParseTree with the parsed data, and otherwise throw an error.
Full Parser Code
Generating Rust code
Once we have parsed the specification, generating the Rust code is just a matter of substituting the right values in the template.
Let’s take a look at the template:
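Roughly, the generated program has the following fixed skeleton, with the commented holes filled in from the ParseTree (this mirrors what to_rust_code below emits):

use std::io;

#[derive(Debug, PartialEq, Eq)]
enum TapeMachineState {
    // <one variant per state>
}

#[derive(Debug, PartialEq, Eq, Clone)]
enum TapeMachineSymbol {
    // <one Symbol<name> variant per symbol>
}

struct TapeMachine<'a> {
    state: &'a TapeMachineState,
    result: &'a mut Vec<TapeMachineSymbol>,
    index: usize,
}

impl<'a> TapeMachine<'a> {
    // new(), plus p() to print a symbol, r() to move right, l() to move left
}

fn main() {
    // Read `steps` and the tape length `max_len` from stdin.
    // Initialise the tape with SymbolX and start in the <initial state>.
    // for _ in 0..steps {
    //     match (tape_machine.state, &tape_machine.result[tape_machine.index]) {
    //         // <one arm per transition: run its steps, then move to the final state>
    //         (_, _) => panic!("Invalid state reached"),
    //     }
    // }
    // Print the tape.
}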
The code to do this is pretty straightforward; take a look below:
Full Rust Code Generator
This completes our implementation end to end. Take a look at the full code for a few extra bells and whistles, like generating the DOT file for the state machine.
Full Rust Code
use crate::lexer::{Lexer, Token, TokenType};
use log::{debug, error, info};
#[derive(Debug, PartialEq, Clone)]
pub enum Condition {
OR(Vec<String>),
Star,
}
#[derive(Debug, PartialEq, Clone)]
pub enum TransitionStep {
R,
L,
X,
P(String), // A function call
}
trait FromTokenAndValue {
fn from_token_and_value(token: &Token, value: Option<String>) -> Self;
}
impl FromTokenAndValue for TransitionStep {
fn from_token_and_value(token: &Token, value: Option<String>) -> Self {
match token.kind {
TokenType::R => TransitionStep::R,
TokenType::L => TransitionStep::L,
TokenType::X => TransitionStep::X,
TokenType::P => TransitionStep::P(value.unwrap()),
_ => panic!("Invalid token type for TransitionStep"),
}
}
}
#[derive(Debug, PartialEq, Clone)]
pub struct Transition {
pub initial_state: String,
pub condition: Condition,
pub steps: Vec<TransitionStep>,
pub final_state: String,
}
impl Transition {
pub fn new() -> Self {
Transition {
initial_state: String::new(),
condition: Condition::OR(Vec::new()),
steps: Vec::new(),
final_state: String::new(),
}
}
}
#[derive(Debug, PartialEq, Clone)]
pub struct ParseTree {
pub states: Vec<String>,
pub initial_state: String,
pub symbols: Vec<String>,
pub transitions: Vec<Transition>,
}
impl ParseTree {
pub fn to_rust_code(&self) -> String {
let mut code = String::new();
// Generate the TapeMachineState enum
code.push_str(
"use std::fmt;\nuse std::io;\n\n#[derive(Debug, PartialEq, Eq)]\nenum TapeMachineState {\n",
);
for state in &self.states {
code.push_str(&format!(" {},\n", state));
}
code.push_str("}\n\n");
// Generate the TapeMachineSymbol enum
code.push_str("#[derive(Debug, PartialEq, Eq, Clone)]\nenum TapeMachineSymbol {\n");
for symbol in &self.symbols {
code.push_str(&format!(" Symbol{},\n", symbol));
}
code.push_str("}\n\n");
// Generate the TapeMachineSymbol implementation
code.push_str("impl TapeMachineSymbol {\n");
code.push_str(" fn as_str(&self) -> &'static str {\n");
code.push_str(" match self {\n");
code.push_str(
&self
.symbols
.iter()
.map(|symbol| {
format!(
" TapeMachineSymbol::Symbol{} => \"{}\"",
symbol, symbol
)
})
.collect::<Vec<String>>()
.join(",\n"),
);
code.push_str("\n }\n");
code.push_str(" }\n");
code.push_str("}\n\n");
// Generate the TapeMachine struct
code.push_str("struct TapeMachine<'a> {\n state: &'a TapeMachineState,\n result: &'a mut Vec<TapeMachineSymbol>,\n index: usize,\n}\n\n");
// Generate the TapeMachine implementation
code.push_str("impl<'a> TapeMachine<'a> {\n");
code.push_str(" pub fn new(state: &'a TapeMachineState, result: &'a mut Vec<TapeMachineSymbol>) -> Self {\n");
code.push_str(" Self {\n");
code.push_str(" state,\n");
code.push_str(" result,\n");
code.push_str(" index: 0,\n");
code.push_str(" }\n");
code.push_str(" }\n\n");
code.push_str(" fn p(&mut self, symbol: TapeMachineSymbol) {\n");
code.push_str(" self.result[self.index] = symbol;\n");
code.push_str(" }\n\n");
code.push_str(" fn r(&mut self) {\n");
code.push_str(" self.index += 1;\n");
code.push_str(" }\n\n");
code.push_str(" fn l(&mut self) {\n");
code.push_str(" self.index -= 1;\n");
code.push_str(" }\n");
code.push_str("}\n\n");
// Generate the main function
code.push_str("fn main() {\n");
code.push_str(" println!(\"Enter the number of steps:\");\n");
code.push_str(" let mut steps_input = String::new();\n");
code.push_str(" io::stdin().read_line(&mut steps_input).unwrap();\n");
code.push_str(" let steps: usize = steps_input.trim().parse().unwrap();\n\n");
code.push_str(" println!(\"Enter the total tape length:\");\n");
code.push_str(" let mut tape_length_input = String::new();\n");
code.push_str(" io::stdin().read_line(&mut tape_length_input).unwrap();\n");
code.push_str(" let max_len: usize = tape_length_input.trim().parse().unwrap();\n\n");
code.push_str(" let mut result = vec![TapeMachineSymbol::SymbolX; max_len];\n");
code.push_str(&format!(
" let mut tape_machine = TapeMachine::new(&TapeMachineState::{}, &mut result);\n\n",
self.initial_state
));
code.push_str(" for i in 0..steps {\n");
code.push_str(" println!(\"Step: {} State: {:?} Symbol: {:?}\",\n");
code.push_str(
" i, tape_machine.state, tape_machine.result[tape_machine.index]);\n\n",
);
code.push_str(
" match (tape_machine.state, &tape_machine.result[tape_machine.index]) {\n",
);
// Sort transitions so that the ones with star condition are executed last
// This is to ensure compatibility with switch statements
let mut sorted_transitions = self.transitions.clone();
sorted_transitions.sort_by(|a, b| {
if a.condition == Condition::Star {
std::cmp::Ordering::Greater
} else if b.condition == Condition::Star {
std::cmp::Ordering::Less
} else {
std::cmp::Ordering::Equal
}
});
for transition in sorted_transitions {
let condition = match &transition.condition {
Condition::OR(symbols) => {
let mut condition_str = String::new();
for (i, symbol) in symbols.iter().enumerate() {
condition_str.push_str(&format!("TapeMachineSymbol::Symbol{}", symbol));
if i < symbols.len() - 1 {
condition_str.push_str(" | ");
}
}
condition_str
}
Condition::Star => "_".to_string(),
};
code.push_str(&format!(
" (TapeMachineState::{}, {}) => ",
transition.initial_state, condition
));
code.push_str("{\n");
for step in &transition.steps {
match step {
TransitionStep::R => code.push_str(" tape_machine.r();\n"),
TransitionStep::L => code.push_str(" tape_machine.l();\n"),
TransitionStep::X => {
code.push_str(" // X means do nothing\n");
}
TransitionStep::P(symbol) => {
code.push_str(&format!(
" tape_machine.p(TapeMachineSymbol::Symbol{});\n",
symbol
));
}
}
}
code.push_str(&format!(
" tape_machine.state = &TapeMachineState::{};\n",
transition.final_state
));
code.push_str(&format!(
" println!(\"Final State: \", TapeMachineState::{});\n",
transition.final_state
));
code.push_str(" }\n");
}
code.push_str(" (_, _) => {\n");
code.push_str(" println!(\"State: {:?} Index: {:?} Symbol: {:?}\", tape_machine.state, tape_machine.index, tape_machine.result[tape_machine.index]);\n");
code.push_str(" let binary_result: String = tape_machine.result.iter().map(|x| x.as_str()).collect();\n");
code.push_str(" println!(\"{}\", binary_result);\n");
code.push_str(" panic!(\"Invalid state reached\");\n");
code.push_str(" }\n");
code.push_str(" }\n");
code.push_str(" }\n\n");
code.push_str(" let binary_result: String = tape_machine.result.iter().map(|x| x.as_str()).collect();\n");
code.push_str(" println!(\"{}\", binary_result);\n");
code.push_str(" let clean_result: String = tape_machine.result.iter().filter( |&x| x != &TapeMachineSymbol::SymbolX).map(|x| x.as_str()).collect();\n");
code.push_str(" println!(\"=========\\n\");\n");
code.push_str(" println!(\"{}\", clean_result);\n");
code.push_str("}\n");
code
}
}
pub trait ToDot {
fn to_dot(&self) -> String;
}
impl ToDot for ParseTree {
fn to_dot(&self) -> String {
let mut dot = String::from(
"digraph {
rankdir=LR;
labelloc=\"t\";
node [shape=circle, style=filled, fillcolor=lightblue, fontname=\"Arial\"];
edge [fontcolor=blue, fontname=\"Arial\"];
",
);
// Define states
for state in &self.states {
let shape = if state == &self.initial_state {
"doublecircle"
} else {
"circle"
};
let fillcolor = if state == &self.initial_state {
"lightgreen"
} else {
"lightblue"
};
let width = if state == &self.initial_state {
"1.5"
} else {
"1.2"
};
let height = if state == &self.initial_state {
"1.5"
} else {
"1.2"
};
dot.push_str(&format!(
" \"{}\" [shape={}, fillcolor={}, width={}, height={}]; ",
state, shape, fillcolor, width, height
));
}
// Define transitions
for transition in &self.transitions {
let condition = match &transition.condition {
Condition::OR(symbols) => format!("[{}]", symbols.join(",")),
Condition::Star => "*".to_string(),
};
let steps: Vec<String> = transition
.steps
.iter()
.map(|step| match step {
TransitionStep::R => "R".to_string(),
TransitionStep::L => "L".to_string(),
TransitionStep::X => "X".to_string(),
TransitionStep::P(func) => format!("P({})", func),
})
.collect();
let label = format!("{} / {}", condition, steps.join("-"));
let color = "black";
dot.push_str(&format!(
" \"{}\" -> \"{}\" [label=\"{}\", color={}];
",
transition.initial_state, transition.final_state, label, color
));
}
dot.push_str(" } ");
dot
}
}
#[derive(Debug, PartialEq, Clone)]
pub struct Parser {
lexer: Lexer,
cur_token: Token,
peek_token: Token,
pub tree: ParseTree,
}
impl Parser {
pub fn new(lexer: Lexer) -> Self {
info!("Initializing Parser");
let mut parser = Parser {
lexer,
cur_token: Token {
text: "\0".to_string(),
kind: TokenType::EOF,
},
peek_token: Token {
text: "\0".to_string(),
kind: TokenType::EOF,
},
tree: ParseTree {
states: Vec::new(),
initial_state: "".to_string(),
symbols: Vec::new(),
transitions: Vec::new(),
},
};
parser.next_token(); // Initialize peek_token
parser.next_token(); // Initialize cur_token
parser
}
// Check if the current token matches the expected token type
fn check_token(&self, kind: TokenType) -> bool {
self.cur_token.kind == kind
}
// Check if the next token has the expected token type
fn check_peek(&self, kind: TokenType) -> bool {
self.peek_token.kind == kind
}
// Advance to the next token
fn next_token(&mut self) {
self.cur_token = self.peek_token.clone();
self.peek_token = self.lexer.get_token().unwrap_or(Token {
text: "\0".to_string(),
kind: TokenType::EOF,
});
// If both current and peek token are newline, skip the newline
if self.check_token(TokenType::NEWLINE) && self.check_peek(TokenType::NEWLINE) {
self.next_token();
}
}
// Abort the parsing process with an error message
fn abort(&self, message: &str) {
error!("Parsing error: {}", message);
panic!("Parsing error: {}", message);
}
// Try to consume the current token if it matches the expected token type
// If successful, print the token type and text (if available) and execute the optional action
// Return true if the token was consumed, false otherwise
fn try_consume<F>(&mut self, kind: TokenType, action: Option<F>) -> bool
where
F: FnMut(&Token),
{
if self.check_token(kind) {
match kind {
TokenType::IDENT => debug!("{:?}: {}", kind, self.cur_token.text),
_ => debug!("{:?}", kind),
}
if let Some(mut action) = action {
action(&self.cur_token);
}
self.next_token();
true
} else {
false
}
}
// Consume the current token if it matches the expected token type
// If not, abort with an error message
// Execute the optional action if provided
fn consume<F>(&mut self, expected: TokenType, action: Option<F>)
where
F: FnMut(&Token),
{
if !self.try_consume(expected, action) {
self.abort(&format!(
"Expected {:?}, got {:?}",
expected, self.cur_token.kind
));
}
}
// Parse an initial state identifier: [IDENT]
fn initial_state_identifier(&mut self) {
self.consume(TokenType::LeftBracket, None::<fn(&Token)>);
let mut initial_state = String::new();
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
initial_state.push_str(&token.text);
}),
);
if self.tree.initial_state.is_empty() {
self.tree.initial_state = initial_state.clone();
self.tree.states.push(initial_state);
} else {
self.abort("Initial state already defined.");
}
self.consume(TokenType::RightBracket, None::<fn(&Token)>);
debug!("INITIAL_STATE_IDENTIFIER");
}
// Parse a list of state identifiers: IDENT (',' IDENT)*
fn state_identifier_list(&mut self) {
let mut state_identifiers = Vec::new();
// Consume all tokens
while self.check_token(TokenType::IDENT) || self.check_token(TokenType::LeftBracket) {
if self.check_token(TokenType::LeftBracket) {
self.initial_state_identifier();
} else if !self.try_consume(
TokenType::IDENT,
Some(|token: &Token| {
state_identifiers.push(token.text.clone());
}),
) {
break;
}
if !self.try_consume(TokenType::COMMA, None::<fn(&Token)>) {
debug!("STATE_IDENTIFIER_LIST");
break;
}
}
if self.tree.initial_state.is_empty() {
self.abort("Initial state not defined.");
}
// If state identifiers have duplicates, abort with an error message
state_identifiers.iter().for_each(|state_identifier| {
if self.tree.states.contains(state_identifier) {
self.abort(&format!("State {} already defined.", state_identifier));
} else {
self.tree.states.push(state_identifier.clone());
}
});
}
// Parse a states declaration: STATES ':' state_identifier_list NEWLINE
fn states_declaration(&mut self) {
self.consume(TokenType::STATES, None::<fn(&Token)>);
self.consume(TokenType::COLON, None::<fn(&Token)>);
self.state_identifier_list();
self.consume(TokenType::NEWLINE, None::<fn(&Token)>);
debug!("STATES_DECLARATION");
}
// Parse a list of symbol identifiers: IDENT (',' IDENT)*
fn symbol_identifiers(&mut self) {
let mut symbol_identifiers = Vec::new();
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
symbol_identifiers.push(token.text.clone());
}),
);
while self.try_consume(TokenType::COMMA, None::<fn(&Token)>) {
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
symbol_identifiers.push(token.text.clone());
}),
);
}
symbol_identifiers.iter().for_each(|symbol_identifier| {
if self.tree.symbols.contains(symbol_identifier) {
self.abort(&format!("Symbol {} already defined.", symbol_identifier));
} else {
self.tree.symbols.push(symbol_identifier.clone());
}
});
// X is a special symbol
self.tree.symbols.push("X".to_string());
debug!("SYMBOL_IDENTIFIERS");
}
// Parse a symbols declaration: SYMBOLS ':' symbol_identifiers NEWLINE
fn symbols_declaration(&mut self) {
self.consume(TokenType::SYMBOLS, None::<fn(&Token)>);
self.consume(TokenType::COLON, None::<fn(&Token)>);
self.symbol_identifiers();
self.consume(TokenType::NEWLINE, None::<fn(&Token)>);
debug!("SYMBOLS_DECLARATION");
}
// Parse a transition step: R | L | P '(' IDENT ')' | X
fn transition_step(&mut self) {
// By default, do nothing
let mut step: TransitionStep = TransitionStep::X;
if self.try_consume(
TokenType::R,
Some(|token: &Token| {
step = FromTokenAndValue::from_token_and_value(&token.clone(), None);
}),
) {
} else if self.try_consume(
TokenType::L,
Some(|token: &Token| {
step = FromTokenAndValue::from_token_and_value(&token.clone(), None);
}),
) {
} else if self.try_consume(
TokenType::X,
Some(|token: &Token| {
step = FromTokenAndValue::from_token_and_value(&token.clone(), None);
}),
) {
} else if self.try_consume(TokenType::P, None::<fn(&Token)>) {
self.consume(TokenType::LeftParen, None::<fn(&Token)>);
let mut print_string = String::new();
// Either X or a symbol identifier
if self.try_consume(
TokenType::X,
Some(|token: &Token| {
print_string.push_str(&token.text);
}),
) {
} else {
self.consume(
TokenType::IDENT,
Some(|step: &Token| {
print_string.push_str(&step.text);
}),
)
};
if !self.tree.symbols.contains(&print_string) {
self.abort(&format!(
"Symbol {} not defined, So cannot be printed.",
print_string
));
}
step = FromTokenAndValue::from_token_and_value(
&Token {
text: "P".to_string(),
kind: TokenType::P,
},
Some(print_string),
);
self.consume(TokenType::RightParen, None::<fn(&Token)>);
} else {
self.abort(&format!(
"Expected {:?} or {:?} or {:?} or {:?} as an action step, got {:?}: {:?}",
TokenType::R,
TokenType::L,
TokenType::P,
TokenType::X,
self.cur_token.kind,
self.cur_token.text
));
}
self.tree.transitions.last_mut().unwrap().steps.push(step);
}
fn transition_steps(&mut self) {
self.transition_step();
while self.try_consume(TokenType::DASH, None::<fn(&Token)>) {
self.transition_step();
}
debug!("TRANSITION_STEPS");
}
// Parse a list of transition conditions: IDENT ('|' IDENT)*
fn transition_condition_list(&mut self) {
let mut conditions: Vec<String> = Vec::new();
// Consume X as well
if self.try_consume(
TokenType::X,
Some(|token: &Token| {
conditions.push(token.text.clone());
}),
) {
} else {
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
conditions.push(token.text.clone());
}),
);
}
while self.try_consume(TokenType::OR, None::<fn(&Token)>) {
// Consume X as well
if self.try_consume(
TokenType::X,
Some(|token: &Token| {
conditions.push(token.text.clone());
}),
) {
} else {
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
conditions.push(token.text.clone());
}),
);
}
}
self.tree.transitions.last_mut().unwrap().condition = Condition::OR(conditions);
debug!("TRANSITION_CONDITION_LIST");
}
// Parse transition conditions: '*' | transition_condition_list
fn transition_conditions(&mut self) {
let mut star_condition = false;
if !self.try_consume(
TokenType::STAR,
Some(|_token: &Token| {
star_condition = true;
}),
) {
self.transition_condition_list();
}
// Override all other conditions with the star condition
if star_condition {
self.tree.transitions.last_mut().unwrap().condition = Condition::Star;
}
debug!("TRANSITION_CONDITIONS");
}
// Parse a transition declaration:
// IDENT ',' transition_conditions ',' transition_steps ',' IDENT
fn transition_declaration(&mut self) {
// Initialize a new transition
self.tree.transitions.push(Transition::new());
// Initial state
let mut initial_state = String::new();
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
initial_state.push_str(&token.text);
}),
);
self.tree.transitions.last_mut().unwrap().initial_state = initial_state;
debug!("INITIAL_STATE_IDENTIFIER");
self.consume(TokenType::COMMA, None::<fn(&Token)>);
// Conditions
self.transition_conditions();
self.consume(TokenType::COMMA, None::<fn(&Token)>);
// Actions
self.transition_steps();
self.consume(TokenType::COMMA, None::<fn(&Token)>);
// Final state
let mut final_state = String::new();
self.consume(
TokenType::IDENT,
Some(|token: &Token| {
final_state.push_str(&token.text);
}),
);
self.tree.transitions.last_mut().unwrap().final_state = final_state;
debug!("FINAL_STATE_IDENTIFIER");
debug!("TRANSITION_DECLARATION");
}
// Parse transitions declarations:
// TRANSITIONS ':' (NEWLINE transition_declaration)*
fn transitions_declaration(&mut self) {
self.consume(TokenType::TRANSITIONS, None::<fn(&Token)>);
self.consume(TokenType::COLON, None::<fn(&Token)>);
while self.try_consume(TokenType::NEWLINE, None::<fn(&Token)>) {
if self.check_token(TokenType::EOF) {
break;
}
self.transition_declaration();
}
debug!("TRANSITION_DECLARATIONS");
}
// Parse the entire program:
// NEWLINE? states_declaration symbols_declaration transitions_declaration NEWLINE? EOF
pub fn program(&mut self) {
// Consume newlines
while self.try_consume(TokenType::NEWLINE, None::<fn(&Token)>) {}
self.states_declaration();
self.symbols_declaration();
self.transitions_declaration();
// Consume newlines
while self.try_consume(TokenType::NEWLINE, None::<fn(&Token)>) {}
self.consume(TokenType::EOF, None::<fn(&Token)>);
debug!("PROGRAM");
}
}
Conclusion
We were able to generate the Rust code for the state machine from a very minimal specification. You can find the full code and instructions to run it here.
One thing still bugs me: at the end of the day, we just substituted the parsed values into a program template. Can we actually construct the program, as in:
- Creating a module
- Defining functions inside it
- Writing the logic using these functions.
We can go a level deeper and work with LLVM IR. We won’t try to have all the bells and whistles, just simulate the state machine and print the result.
That is the content of the next post [Proposed].