Compiling Grammars and Parsing

First we will implement compilation of pest grammars, and parsing of text with a compiled grammar. A pest grammar contains named rules that describe how to parse something. For example, number = { ASCII_DIGIT+ } means that a number is parsed by parsing one or more ASCII_DIGIT, with ASCII_DIGIT being a builtin rule that parses the ASCII digits 0-9.
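
To make the meaning of ASCII_DIGIT+ concrete, here is a minimal hand-written sketch of what that rule matches, using only the standard library. The parse_number function is a hypothetical name chosen for illustration; it is not part of pest, and pest does not generate code like this.

```rust
/// Hypothetical hand-written equivalent of the rule `number = { ASCII_DIGIT+ }`.
/// Returns the matched prefix, or `None` when the input does not start with a digit.
fn parse_number(input: &str) -> Option<&str> {
  // Count the leading ASCII digits. Each digit is exactly one byte,
  // so the count is also a valid byte offset into `input`.
  let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
  if len == 0 {
    None // `+` requires at least one digit.
  } else {
    Some(&input[..len])
  }
}

fn main() {
  println!("{:?}", parse_number("123abc"));
  println!("{:?}", parse_number("abc"));
}
```

The point of a grammar is that we only write the one-line rule; the library takes care of the matching logic.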

Add the following dev-dependencies to pie/Cargo.toml:

  • pest is the library for parsing with pest grammars.
  • pest_meta validates, optimises, and compiles pest grammars.
  • pest_vm provides parsing with a compiled pest grammar, without having to generate Rust code for grammars, enabling interactive use.

Create the pie/examples/parser_dev/main.rs file and add an empty main function to it:

fn main() {

}

Confirm the example can be run with cargo run --example parser_dev.

Let’s implement the pest grammar compiler and parser. Add parse as a public module to pie/examples/parser_dev/main.rs:

pub mod parse;

We will add larger chunks of code from now on, compared to the rest of the tutorial, to keep things going. Create the pie/examples/parser_dev/parse.rs file and add to it:

use std::collections::HashSet;
use std::fmt::Write;

/// Parse programs with a compiled pest grammar.
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct CompiledGrammar {
  rules: Vec<pest_meta::optimizer::OptimizedRule>,
  rule_names: HashSet<String>,
}

impl CompiledGrammar {
  /// Compile the pest grammar from `grammar_text`, using `path` to annotate errors. Returns a [`Self`] instance.
  ///
  /// # Errors
  ///
  /// Returns `Err(error_string)` when compiling the grammar fails.
  pub fn new(grammar_text: &str, path: Option<&str>) -> Result<Self, String> {
    match pest_meta::parse_and_optimize(grammar_text) {
      Ok((builtin_rules, rules)) => {
        let mut rule_names = HashSet::with_capacity(builtin_rules.len() + rules.len());
        rule_names.extend(builtin_rules.iter().map(|s| s.to_string()));
        rule_names.extend(rules.iter().map(|s| s.name.clone()));
        Ok(Self { rules, rule_names })
      },
      Err(errors) => {
        let mut error_string = String::new();
        for mut error in errors {
          if let Some(path) = path.as_ref() {
            error = error.with_path(path);
          }
          error = error.renamed_rules(pest_meta::parser::rename_meta_rule);
          let _ = writeln!(error_string, "{}", error); // Ignore error: writing to String cannot fail.
        }
        Err(error_string)
      }
    }
  }
}

The CompiledGrammar struct contains a parsed pest grammar, consisting of a Vec of optimised parsing rules, and a hash set of rule names. We will use this struct as an output of a task in the future, so we derive Clone, Eq, PartialEq, and Debug.

The new function takes the text of a pest grammar, and an optional file path for error reporting, and creates a CompiledGrammar or an error in the form of a String. We’re using Strings as errors in this example for simplicity.

We compile the grammar with pest_meta::parse_and_optimize. If successful, we gather the rule names into a hash set and return a CompiledGrammar. If not, multiple errors are returned, which are first preprocessed with with_path and renamed_rules, and then written to a single String with writeln!, which is returned as the error.
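
The error-joining step can be tried in isolation. The following is a minimal sketch using only the standard library: join_errors is a hypothetical helper, the messages are made up, and the "path: message" prefix is only a stand-in for whatever with_path actually produces in pest.

```rust
use std::fmt::Write; // Needed so that `writeln!` can target a `String`.

/// Join several error messages into one string, one message per line.
/// Hypothetical helper for illustration; not part of pest.
fn join_errors(errors: &[&str], path: Option<&str>) -> String {
  let mut error_string = String::new();
  for error in errors {
    // Writing to a `String` cannot fail, so the `fmt::Result` is ignored.
    let _ = match path {
      // Stand-in for annotating the error with a path.
      Some(path) => writeln!(error_string, "{}: {}", path, error),
      None => writeln!(error_string, "{}", error),
    };
  }
  error_string
}

fn main() {
  let joined = join_errors(&["first problem", "second problem"], Some("grammar.pest"));
  print!("{}", joined);
}
```

Returning one String keeps the function signature simple, at the cost of callers no longer being able to inspect individual errors.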

Now we implement parsing using a CompiledGrammar. Add the parse method to pie/examples/parser_dev/parse.rs:

parse takes the text of the program to parse, the rule name to start parsing with, and an optional file path for error reporting.

We first check whether rule_name exists by looking for it in self.rule_names, and return an error if it does not exist. We have to do this because pest_vm panics when the rule name does not exist, which would kill the entire program.

If the rule name is valid, we create a pest_vm::Vm and parse. If successful, we get a pairs iterator that describes how the program was parsed. Pairs are typically used to create an Abstract Syntax Tree (AST) in Rust code. However, for simplicity we just format the pairs as a String and return that. If not successful, we process the error in the same way as in new, but for a single error instead of multiple.

Unfortunately we cannot store pest_vm::Vm in CompiledGrammar, because Vm implements neither Clone nor Eq. Therefore, we have to create a new Vm every time we parse, which has a small performance overhead, but that is fine for this example.
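
The control flow described above (check the rule name first, then build a fresh engine for each parse) can be sketched without pest. Everything in this sketch is a hypothetical stand-in: Engine plays the role of pest_vm::Vm, Grammar plays the role of CompiledGrammar, and the "parsing" only accepts ASCII digits. It is meant to show the shape of the solution, not the real parse method.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the parsing engine: it is neither `Clone` nor `Eq`,
/// and it panics when asked to run a rule it does not know.
struct Engine {
  rules: Vec<String>,
}

impl Engine {
  fn new(rules: Vec<String>) -> Self {
    Self { rules }
  }

  /// Pretend to parse: succeeds when `input` is non-empty and consists of ASCII digits.
  fn run(&self, rule_name: &str, input: &str) -> Result<String, String> {
    assert!(self.rules.iter().any(|r| r == rule_name), "unknown rule");
    if !input.is_empty() && input.bytes().all(|b| b.is_ascii_digit()) {
      Ok(format!("{}({})", rule_name, input))
    } else {
      Err(format!("expected {}", rule_name))
    }
  }
}

/// Stores only cloneable, comparable data, so all derives are available.
#[derive(Clone, Eq, PartialEq, Debug)]
struct Grammar {
  rules: Vec<String>,
  rule_names: HashSet<String>,
}

impl Grammar {
  fn new(rules: &[&str]) -> Self {
    let rules: Vec<String> = rules.iter().map(|r| r.to_string()).collect();
    let rule_names = rules.iter().cloned().collect();
    Self { rules, rule_names }
  }

  fn parse(&self, input: &str, rule_name: &str) -> Result<String, String> {
    // Guard first: turn what would be a panic in the engine into an ordinary error.
    if !self.rule_names.contains(rule_name) {
      return Err(format!("rule '{}' was not found", rule_name));
    }
    // Build a fresh engine on every call, cloning the stored rules.
    let engine = Engine::new(self.rules.clone());
    engine.run(rule_name, input)
  }
}

fn main() {
  let grammar = Grammar::new(&["num"]);
  println!("{:?}", grammar.parse("1", "num"));
  println!("{:?}", grammar.parse("1", "asd"));
}
```

Cloning the rules on every call is the price paid for keeping the struct Clone and Eq. If that overhead ever mattered, the engine could be cached outside the struct instead.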

To check whether this code does what we want, we’ll write a test for it (yes, you can add tests to examples in Rust!). Add to pie/examples/parser_dev/parse.rs:

#[cfg(test)]
mod tests {
  use super::*;

  #[test]
  fn test_compile_parse() -> Result<(), String> {
    // Grammar compilation failure.
    let result = CompiledGrammar::new("asd = { fgh } qwe = { rty }", None);
    assert!(result.is_err());
    println!("{}", result.unwrap_err());

    // Grammar that parses numbers.
    let compiled_grammar = CompiledGrammar::new("num = { ASCII_DIGIT+ }", None)?;
    println!("{:?}", compiled_grammar);

    // Parse failure
    let result = compiled_grammar.parse("a", "num", None);
    assert!(result.is_err());
    println!("{}", result.unwrap_err());
    // Parse failure due to non-existent rule.
    let result = compiled_grammar.parse("1", "asd", None);
    assert!(result.is_err());
    println!("{}", result.unwrap_err());
    // Parse success
    let result = compiled_grammar.parse("1", "num", None);
    assert!(result.is_ok());
    println!("{}", result.unwrap());

    Ok(())
  }
}

We test grammar compilation failure and success, and parse failure and success. Run this test with cargo test --example parser_dev -- --show-output, which also shows what the returned Strings look like.