Compiling Grammars and Parsing
First we will implement compilation of pest grammars, and parsing text with a compiled grammar.
A pest grammar contains named rules that describe how to parse something.
For example, `number = { ASCII_DIGIT+ }` means that a `number` is parsed by parsing one or more `ASCII_DIGIT`s, with `ASCII_DIGIT` being a builtin rule that parses the ASCII digits 0 to 9.
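To make the meaning of `ASCII_DIGIT+` concrete, here is a hand-written Rust function that accepts the same input as `number = { ASCII_DIGIT+ }`. It is a minimal sketch for illustration only: `match_number` is a hypothetical name, and pest neither generates nor uses this code.

```rust
/// Hypothetical hand-written matcher for the pest rule `number = { ASCII_DIGIT+ }`.
/// Returns how many bytes at the start of `input` the rule would consume,
/// or `None` if not even one digit is present.
fn match_number(input: &str) -> Option<usize> {
    // `ASCII_DIGIT` matches a single character in '0'..='9'; `+` repeats it
    // greedily and requires at least one match.
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len > 0 {
        Some(len)
    } else {
        None
    }
}
```

With this sketch, `match_number("123abc")` returns `Some(3)` because matching stops at the first non-digit, and `match_number("abc")` returns `None`.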
Add the following dev-dependencies to `pie/Cargo.toml`:
- pest is the library for parsing with pest grammars.
- pest_meta validates, optimises, and compiles pest grammars.
- pest_vm provides parsing with a compiled pest grammar, without having to generate Rust code for grammars, enabling interactive use.
Create the `pie/examples/parser_dev/main.rs` file and add an empty `main` function to it:

```rust
fn main() {
}
```
Confirm the example can be run with `cargo run --example parser_dev`.
Let’s implement the pest grammar compiler and parser.
Add `parse` as a public module to `pie/examples/parser_dev/main.rs` by adding `pub mod parse;` at the top of the file.
From now on, we will add larger chunks of code at once than in the rest of the tutorial, to keep things moving.
Create the `pie/examples/parser_dev/parse.rs` file and add to it:

```rust
use std::collections::HashSet;
use std::fmt::Write;

/// Parse programs with a compiled pest grammar.
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct CompiledGrammar {
    rules: Vec<pest_meta::optimizer::OptimizedRule>,
    rule_names: HashSet<String>,
}

impl CompiledGrammar {
    /// Compile the pest grammar from `grammar_text`, using `path` to annotate errors. Returns a [`Self`] instance.
    ///
    /// # Errors
    ///
    /// Returns `Err(error_string)` when compiling the grammar fails.
    pub fn new(grammar_text: &str, path: Option<&str>) -> Result<Self, String> {
        match pest_meta::parse_and_optimize(grammar_text) {
            Ok((builtin_rules, rules)) => {
                let mut rule_names = HashSet::with_capacity(builtin_rules.len() + rules.len());
                rule_names.extend(builtin_rules.iter().map(|s| s.to_string()));
                rule_names.extend(rules.iter().map(|s| s.name.clone()));
                Ok(Self { rules, rule_names })
            },
            Err(errors) => {
                let mut error_string = String::new();
                for mut error in errors {
                    if let Some(path) = path.as_ref() {
                        error = error.with_path(path);
                    }
                    error = error.renamed_rules(pest_meta::parser::rename_meta_rule);
                    let _ = writeln!(error_string, "{}", error); // Ignore error: writing to String cannot fail.
                }
                Err(error_string)
            }
        }
    }
}
```
The `CompiledGrammar` struct contains a parsed pest grammar, consisting of a `Vec` of optimised parsing rules and a hash set of rule names.
We will use this struct as the output of a task in the future, so we derive `Clone`, `Eq` (which requires `PartialEq`), and `Debug`.
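The derives matter because a build system typically keeps task outputs around (`Clone`), compares an old output with a new one to decide whether anything changed (`Eq`), and prints them when debugging (`Debug`); exactly how these traits get used later is an assumption here. The sketch below uses a hypothetical `GrammarOutput` stand-in with plain `String` rules instead of `pest_meta::optimizer::OptimizedRule`, so it compiles with `std` only, and a hypothetical `output_changed` function to show the comparison.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `CompiledGrammar`: rules are plain `String`s here
/// instead of `pest_meta::optimizer::OptimizedRule`.
#[derive(Clone, Eq, PartialEq, Debug)]
struct GrammarOutput {
    rules: Vec<String>,
    rule_names: HashSet<String>,
}

impl GrammarOutput {
    fn new(rules: &[&str]) -> Self {
        let rules: Vec<String> = rules.iter().map(|r| r.to_string()).collect();
        let rule_names = rules.iter().cloned().collect();
        Self { rules, rule_names }
    }
}

/// Compares a previously stored output with a fresh one. The `!=` operator
/// comes from the derived `PartialEq`, which `Eq` builds on.
fn output_changed(previous: &GrammarOutput, current: &GrammarOutput) -> bool {
    previous != current
}
```

Deriving works here because both `Vec<String>` and `HashSet<String>` implement `Clone`, `Eq`, and `Debug` themselves.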
The `new` function takes the text of a pest grammar and an optional file path for error reporting, and creates either a `CompiledGrammar` or an error in the form of a `String`.
We use `String`s as errors in this example for simplicity.

We compile the grammar with `pest_meta::parse_and_optimize`.
If that succeeds, we gather the builtin and user-defined rule names into a hash set and return a `CompiledGrammar`.
If it fails, it returns multiple errors, which we first preprocess with `with_path` and `renamed_rules`, and then write into a single `String` with `writeln!`, which is returned as the error.
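The pattern of folding several errors into one `String` can be tried without pest. The sketch below is a hypothetical `collect_errors` helper that takes plain `&str` messages; unlike pest's `with_path`, which adds the path to the error's location display, it simply prefixes each message with `path: `.

```rust
use std::fmt::Write;

/// Hypothetical helper mirroring the error handling in `CompiledGrammar::new`:
/// each error message is optionally prefixed with `path` and written on its own line.
fn collect_errors(errors: &[&str], path: Option<&str>) -> String {
    let mut error_string = String::new();
    for error in errors {
        // `writeln!` on a `String` returns `fmt::Result`, but writing to a `String`
        // cannot fail, so ignoring the result is safe.
        let _ = match path {
            Some(path) => writeln!(error_string, "{}: {}", path, error),
            None => writeln!(error_string, "{}", error),
        };
    }
    error_string
}
```

Note that `use std::fmt::Write;` is required: `writeln!` into a `String` calls the `fmt::Write` trait's methods.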
Now we implement parsing using a `CompiledGrammar`.
Add the `parse` method to `pie/examples/parser_dev/parse.rs`:
`parse` takes the text of the program to parse, the name of the rule to start parsing with, and an optional file path for error reporting.
We first check whether `rule_name` exists by looking it up in `self.rule_names`, and return an error if it does not.
We have to do this because `pest_vm` panics when the rule name does not exist, which would crash the entire program.
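The guard itself only needs the standard library. Here is a minimal sketch of it as a hypothetical free function `check_rule_name`; the error message wording is an assumption, not necessarily what the tutorial's `parse` returns.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of the guard at the start of `parse`: look up `rule_name`
/// in the known rule names and return an `Err` instead of letting a later step panic.
fn check_rule_name(rule_names: &HashSet<String>, rule_name: &str) -> Result<(), String> {
    // `HashSet<String>::contains` accepts a `&str` because `String: Borrow<str>`.
    if rule_names.contains(rule_name) {
        Ok(())
    } else {
        Err(format!("rule '{}' was not found", rule_name))
    }
}
```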
If the rule name is valid, we create a `pest_vm::Vm` and call its `parse` method.
If parsing succeeds, we get a `Pairs` iterator describing how the program was parsed; such pairs are typically used to build an Abstract Syntax Tree (AST) in Rust code.
For simplicity, however, we just format the pairs as a `String` and return that.
If parsing fails, we handle the error as in `new`, except that there is a single error instead of multiple.
Unfortunately, we cannot store a `pest_vm::Vm` in `CompiledGrammar`, because `Vm` implements neither `Clone` nor `Eq`.
Therefore, we create a new `Vm` every time we parse, which has a small performance overhead, but that is fine for this example.
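The design choice can be sketched without pest: keep only `Clone + Eq` data in the stored struct, and build the non-cloneable engine on demand. In this minimal sketch, `Engine` and `Grammar` are hypothetical types; `Engine` merely stands in for `pest_vm::Vm`, and `has_rule` stands in for actual parsing.

```rust
/// Hypothetical engine that, like `pest_vm::Vm`, is neither `Clone` nor `Eq`.
struct Engine {
    rules: Vec<String>,
}

impl Engine {
    fn new(rules: Vec<String>) -> Self {
        Self { rules }
    }

    fn has_rule(&self, rule_name: &str) -> bool {
        self.rules.iter().any(|r| r == rule_name)
    }
}

/// Stores only data that is `Clone + Eq`, and builds an `Engine` when needed.
#[derive(Clone, Eq, PartialEq, Debug)]
struct Grammar {
    rules: Vec<String>,
}

impl Grammar {
    fn has_rule(&self, rule_name: &str) -> bool {
        // A fresh engine per call costs a clone of `rules`, but keeps `Grammar`
        // cloneable and comparable.
        let engine = Engine::new(self.rules.clone());
        engine.has_rule(rule_name)
    }
}
```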
To check whether this code does what we want, we'll write a test for it (yes, you can add tests to examples in Rust!).
Add to `pie/examples/parser_dev/parse.rs`:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_compile_parse() -> Result<(), String> {
        // Grammar compilation failure.
        let result = CompiledGrammar::new("asd = { fgh } qwe = { rty }", None);
        assert!(result.is_err());
        println!("{}", result.unwrap_err());

        // Grammar that parses numbers.
        let compiled_grammar = CompiledGrammar::new("num = { ASCII_DIGIT+ }", None)?;
        println!("{:?}", compiled_grammar);

        // Parse failure.
        let result = compiled_grammar.parse("a", "num", None);
        assert!(result.is_err());
        println!("{}", result.unwrap_err());

        // Parse failure due to non-existent rule.
        let result = compiled_grammar.parse("1", "asd", None);
        assert!(result.is_err());
        println!("{}", result.unwrap_err());

        // Parse success.
        let result = compiled_grammar.parse("1", "num", None);
        assert!(result.is_ok());
        println!("{}", result.unwrap());

        Ok(())
    }
}
```
We test grammar compilation failure and success, parse failure (including parsing with a non-existent rule), and parse success.
Run this test with `cargo test --example parser_dev -- --show-output`, which also shows what the returned `String`s look like.
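The test can use `?` on `CompiledGrammar::new` because Rust test functions may return `Result<(), E>` where `E: Debug`; returning an `Err` fails the test. Here is a small `std`-only sketch of that pattern, using a hypothetical `parse_digits` helper in place of the grammar code.

```rust
/// Hypothetical fallible helper, used to show `?` inside a test.
fn parse_digits(text: &str) -> Result<u32, String> {
    text.parse::<u32>()
        .map_err(|e| format!("'{}' is not a number: {}", text, e))
}

#[cfg(test)]
mod example_tests {
    use super::*;

    // Returning `Err` from this function fails the test, so `?` can be used
    // to bail out on the first error.
    #[test]
    fn parses_digits() -> Result<(), String> {
        let number = parse_digits("42")?;
        assert_eq!(number, 42);
        Ok(())
    }
}
```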