Build your own Programmatic Incremental Build System
This is a programming tutorial where you will build your own programmatic incremental build system, which is a mix between an incremental build system and an incremental computation system. Programmatic incremental build systems enable programmers to write expressive build scripts and interactive programs in a regular programming language, with the system taking care of correct incrementality once and for all, freeing programmers from having to manually implement complicated and error-prone incrementality every time.
The primary goal of this tutorial is to provide understanding of programmatic incremental build systems through implementation and experimentation.
In this programming tutorial you will write Rust code, but you don’t need to be a Rust expert to follow it. A secondary goal of this tutorial is to teach more about Rust through implementation and experimentation, given that you already have some programming experience (in another language) and are willing to learn. Therefore, all Rust code is available, and I try to explain and link to the relevant Rust book chapters as much as possible.
This is of course not a full tutorial or book on Rust. For that, I can recommend the excellent The Rust Programming Language book. However, if you like to learn through examples and experimentation, or already know Rust basics and want to practice, this might be a fun programming tutorial for you!
We will first motivate programmatic incremental build systems in more detail.
Motivation
A programmatic incremental build system is a mix between an incremental build system and an incremental computation system, with the following key properties:
- Programmatic: Build scripts are regular programs written in a programming language, where parts of the program implement an API from the build system. This enables programmers to write incremental build scripts and interactive programs with the full expressiveness of the programming language.
- Incremental: Builds are truly incremental – only the parts of a build that are affected by changes are executed.
- Correct: Builds are fully correct – all parts of the build that are affected by changes are executed. Builds are free of glitches: only up-to-date (consistent) data is observed.
- Automatic: The system takes care of incrementality and correctness. Programmers do not have to manually implement incrementality. Instead, they only have to explicitly declare dependencies.
To show the benefits of a build system with these key properties, below is a simplified version of the build script for compiling a formal grammar and parsing text with that compiled grammar, which is the build script you will implement in the final project chapter. This simplified version removes details that are not important for understanding programmatic incremental build systems at this moment.
Don’t worry if you do not (fully) understand this code yet; the tutorial will guide you through programming and understanding this kind of code. This example is primarily here to motivate programmatic incremental build systems, as it is hard to do so without it.
pub enum ParseTasks {
CompileGrammar { grammar_file: PathBuf },
Parse { compile_grammar_task: Box<ParseTasks>, text_file: PathBuf, rule_name: String }
}
pub enum Outputs {
CompiledGrammar(CompiledGrammar),
Parsed(String)
}
impl Task for ParseTasks {
fn execute<C: Context>(&self, context: &mut C) -> Result<Outputs, Error> {
match self {
ParseTasks::CompileGrammar { grammar_file } => {
let grammar_text = context.require_file(grammar_file)?;
let compiled_grammar = CompiledGrammar::new(&grammar_text)?;
Ok(Outputs::CompiledGrammar(compiled_grammar))
}
ParseTasks::Parse { compile_grammar_task, text_file, rule_name } => {
let compiled_grammar = context.require_task(compile_grammar_task)?;
let text = context.require_file(text_file)?;
let output = compiled_grammar.parse(&text, rule_name)?;
Ok(Outputs::Parsed(output))
}
}
}
}
fn main() {
let compile_grammar_task = Box::new(ParseTasks::CompileGrammar {
grammar_file: PathBuf::from("grammar.pest")
});
let parse_1_task = ParseTasks::Parse {
compile_grammar_task: compile_grammar_task.clone(),
text_file: PathBuf::from("test_1.txt"),
rule_name: "main"
};
let parse_2_task = ParseTasks::Parse {
compile_grammar_task: compile_grammar_task.clone(),
text_file: PathBuf::from("test_2.txt"),
rule_name: "main"
};
let mut context = IncrementalBuildContext::default();
let output_1 = context.require_task(&parse_1_task).unwrap();
println!("{output_1:?}");
let output_2 = context.require_task(&parse_2_task).unwrap();
println!("{output_2:?}");
}
This is in essence just a normal (pure) Rust program: it has enums, a trait implementation for one of those enums, and a main
function.
However, this program is also a build script because ParseTasks
implements the Task
trait, which is the core trait defining the unit of computation in a programmatic incremental build system.
Because ParseTasks
is an enum, there are two kinds of tasks: a CompileGrammar
task that compiles a grammar, and a Parse
task that parses a text file using the compiled grammar.
Tasks
A task is kind of like a closure: a function along with its inputs that can be executed.
For example, CompileGrammar
carries grammar_file
which is the file path of the grammar that it will compile.
When we execute
a CompileGrammar
task, it reads the text of the grammar from the file, compiles that text into a grammar, and returns a compiled grammar.
Tasks differ from closures however, in that tasks are incremental.
Incremental File Dependencies
We want the CompileGrammar
task to be incremental, such that this task is only re-executed when the contents of the grammar_file
file change.
Therefore, execute
has a context
parameter which is an incremental build context that tasks use to tell the build system about dependencies.
For example, CompileGrammar
tells the build system that it requires the grammar_file
file with context.require_file(grammar_file)
, creating a file read dependency to that file.
It is then the responsibility of the incremental build system to only execute this task if the file contents have changed.
Dynamic Dependencies
Note that this file dependency is created while the task is executing! We call these dynamic dependencies, as opposed to static dependencies. Dynamic dependencies enable the programmatic part of programmatic incremental build systems, because dependencies are made while your program is running, and can thus depend on values computed earlier in your program.
Incremental Task Dependencies
Dynamic dependencies are also created between tasks.
For example, Parse
carries compile_grammar_task
which is an instance of the CompileGrammar
task to compile a grammar.
When we execute
a Parse
task, it tells the build system that it depends on the compile grammar task with context.require_task(compile_grammar_task)
.
This also asks the build system to return the most up-to-date (consistent) output of that task. It is then the responsibility of the incremental build system to check whether the task is consistent, and to re-execute it only if it is inconsistent. In essence, the build system will take these steps:
- If compile_grammar_task was never executed before, the build system executes it, caches the compiled grammar, and returns the compiled grammar.
- Otherwise, to check if the compile grammar task is consistent, we need to check its dependencies: the file dependency to grammar_file.
  - If the contents of the grammar_file file have changed, the task is inconsistent and the build system re-executes it, caches the new compiled grammar, and returns it.
  - Otherwise, the task is consistent and the build system simply returns the cached compiled grammar.
The Parse
task then has access to the compiled_grammar
, reads the text file to parse with require_file
, and finally parses the text
with compiled_grammar.parse
.
Using Tasks
Because this is just a regular Rust program, we can use the tasks in the same program with a main
function.
The main
function creates instances of these tasks, creates an IncrementalBuildContext
, and asks the build system to return the up-to-date outputs for two tasks with context.require_task
.
This main
function is performing an incremental batch build.
However, you can also use these same tasks to build an interactive application.
That is too much code to discuss here in this introduction, but the final project chapter shows a video of an interactive application that you can build using these tasks.
Conclusion
This is the essence of programmatic incremental build systems.
In this tutorial, we will define the Task
trait and implement the IncrementalBuildContext
over the course of several chapters.
However, before we start doing that, I want to first zoom back out and discuss the benefits (and drawbacks) of programmatic incremental build systems. If you already feel motivated enough, you can skip ahead.
Benefits
The primary motivation for programmatic incremental build systems is that you can program your incremental builds and interactive applications in a regular programming language, instead of having to write them in a separate (declarative) build script language, and this has several benefits:
- You can re-use your knowledge of the programming language, instead of having to learn a new build script language.
- You can use tools of the programming language, such as the compiler that provides (good) error messages, an IDE that helps you read and write code, a debugger for understanding the program, unit and integration testing for improving code reliability, benchmarking for improving performance, etc.
- You can modularize your build script using facilities of the programming language, enabling you to reuse your build script as a library or to use modules created by others in your build script. You can also use regular modules of the programming language and integrate them into build scripts, and vice versa.
The other important benefit is that incrementality and correctness are taken care of by the build system. Therefore, you don’t have to manually implement incrementality in a correct way, which is complicated and error-prone to implement.
You do have to specify the exact dependencies of tasks to files and other tasks, as seen in the example, but this is easier than implementing incrementality.
Due to the dependencies being dynamic, you can use regular programming language constructs like calling a function to figure out what file to depend on, if
to create conditional dependencies, while
to create multiple dependencies, and so forth.
Exactly specifying the dependencies in this way has another important benefit: the dynamic dependencies of a task perfectly describe when the task should be re-executed, enabling the build system to be fully incremental and correct. This is in contrast to build systems with static dependencies – dependencies that cannot use runtime values, typically using literal file names or patterns – where dependencies often have to be over-approximated (not fully incremental) or under-approximated (not correct) due to not being able to exactly specify dependencies.
Some build systems use multiple stages to emulate a limited form of dynamic dependencies. For example, dynamic dependencies in Make require staging: first dynamically generate new makefiles with correct dependencies, and then recursively execute them. Gradle has a two-staged build process: first configure the task graph, then incrementally execute it, but no new dependencies nor tasks can be created during execution. This is an improvement over static dependencies, but requires you to think about what to do in each stage, requires maintenance of each stage, and limits what you can do in each stage.
A final benefit of dynamic dependencies is that they do away with staging because there is only a single stage: the execution of your build script, and you can create dynamic dependencies in this single stage. This increases expressiveness, makes build scripts easier to read and write, and reduces maintenance overhead.
Drawbacks
Of course, programmatic incremental build systems also have some drawbacks. These drawbacks become more clear during the tutorial, but I want to list them here to be up-front about it:
- The build system is more complicated, but hopefully this tutorial can help mitigate some of that by understanding the key ideas through implementation and experimentation.
- Some correctness properties are checked while building. Therefore, you need to test your builds to try to catch these issues before they reach users. However, I think that testing builds is something you should do regardless of the build system, to be more confident about the correctness of your build.
- More tracking is required at runtime compared to non-programmatic build systems. However, in our experience, the overhead is not excessive unless you try to do very fine-grained incrementalization. For fine-grained incrementalization, incremental computing approaches are more well suited.
PIE: a Programmatic Incremental Build System in Rust
We have developed PIE, a Rust library implementing a programmatic incremental build system adhering to the key properties listed above. It is still under development, and has not been published to crates.io yet, but it is already usable. If you are interested in experimenting with a programmatic incremental build system, do check it out!
In this tutorial we will implement a subset of PIE. We simplify the internals in order to minimize distractions as much as possible, but still go over all the key ideas and concepts that make programmatic incremental build systems tick.
However, the idea of programmatic incremental build systems is not limited to PIE or the Rust language. You can implement a programmatic incremental build system in any general-purpose programming language, or adapt the idea to better fit your preferences and/or requirements. In fact, we first implemented PIE in Java, with PIE in Rust being the second iteration, mostly simplifying internals to make it easier to explain.
For a more thorough discussion on PIE, see the PIE Implementations & Publications appendix chapter, and the Related Work appendix chapter.
Feedback & Contributing
This tutorial is open source, hosted at https://github.com/Gohla/pibs. If you find an error in the code or text of this tutorial, or want to report other kinds of problems, you can report it on the issue tracker. Small fixes can be sent as a pull request by pressing the edit button in the top-right corner.
Let’s continue with the tutorial. The next section covers installing Rust and setting up a fresh Rust project.
Setup
Rust
We will be using Rust to implement a programmatic incremental build system. Therefore, first make sure Rust is installed.
If you already have Rust installed, make sure to update to at least Rust 1.65.0 (Nov 2022), as we’re using features only available from that release.
Once Rust is installed, you should have access to cargo. Cargo is Rust’s package manager but also the command-line tool that ties everything together. As a consequence, Rust and Cargo are often used interchangeably when talking about developing Rust code.
Verify your Rust installation by running cargo
from the command-line, which will show the help page for the cargo command.
Rust Editor / IDE
This tutorial does not require a specific Rust editor or IDE.
All you need is some way to edit files, and some way to run cargo
.
If you like to work in a terminal, rust-analyzer, the primary Language Server Protocol (LSP) implementation of Rust, can be used in Emacs and Vim/Neovim.
If you prefer a graphical editor, a popular choice is Visual Studio Code with the rust-analyzer plugin.
Personally, I am using the RustRover IDE, as it provides (in my opinion) the most polished and feature-full Rust editor/IDE. At the time of writing, RustRover is still in early access preview, meaning that it is free to use, but is still technically beta software. However, it is very stable despite being in preview. Once RustRover comes out of preview, you will need to pay for it though (or apply for a free educational or open source license).
Creating a new Rust project
In this tutorial, we will create a subset of the PIE in Rust library, so we want to create a Rust package called pie
.
However, later on in the tutorial we will also create an additional package (for unit testing utilities), so we need to set up a Rust workspace that supports multiple packages.
Therefore, first create a pibs
directory, which will serve as the workspace directory of the project.
This does not have to be called pibs
, you can use a different name, but this tutorial will use pibs
.
Then create the pibs/Cargo.toml
file with the following contents:
[workspace]
members = [
"pie",
]
default-members = [
"pie",
]
resolver = "2"
This Cargo.toml
file marks the pibs
directory as a Cargo workspace, with one (default) member package called pie
.
This package does not exist yet, but we will create it shortly.
Because this pibs
directory is now marked as a workspace, we can run cargo
commands in this directory, and they will be automatically forwarded to the default members: currently only the pie
package.
Now let’s set up the pie
package.
Create the pibs/pie
directory, and then create the pibs/pie/Cargo.toml
file with the following contents:
[package]
name = "pie"
version = "0.1.0"
edition = "2021"
Then create the pibs/pie/src
directory and create the pibs/pie/src/lib.rs
file, which will be left empty for now.
This marks pie
as a Cargo package, with version “0.1.0” and using Rust edition 2021.
The lib.rs
file is the main library file of the pie
package.
The directory structure should look as follows:
pibs
├── pie
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
└── Cargo.toml
Now we can build the workspace to see if it was set up correctly.
Open up a terminal, go back into the pibs
workspace directory, and run cargo build
.
If all is well, the output should look something like:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.06s
If you’re using a Rust editor or IDE, it probably also has a mechanism for running cargo on your project. You can of course use that in place of running cargo from a terminal.
In the rest of the tutorial, we assume that you are in your pibs
workspace directory.
So if you are instructed to create files or directories, they are always relative to your pibs
workspace directory!
Also, if you are instructed to run cargo
commands, always run them inside the pibs
workspace directory!
Cargo Workspaces, Packages, and Crates; Rust Editions
Cargo workspaces enable development of Rust projects with multiple libraries (which are called crates in Rust). The reference page on workspaces has more info regarding workspaces.
Cargo packages can have up to one library crate, and many binary crates. A crate is the smallest amount of code that the Rust compiler considers at a time. However, the word crate is often used interchangeably with library.
Rust editions enable new features as an opt-in, without breaking existing code. We use Rust edition 2021, which is the latest edition at the time of writing.
Source control (optional but recommended)
I recommend storing your code in a source control system such as Git, and uploading it to a source code hub such as GitHub. A source control system allows you to look at changes and to go back to older versions, and uploading to a source code hub then provides a convenient backup.
If you use Git, create the .gitignore
file with:
/target
This ignores the target
directory that Cargo uses to store intermediate and binary files.
Continue to the next chapter where we will start implementing the “programmatic” part of programmatic incremental build systems.
Download source code
Programmability
In the introduction, we saw an example with tasks being the unit of incremental computation, and the context providing a way to create dynamic dependencies that enable incrementality.
Those two concepts, Task
and Context
, are the core of a programmatic incremental build system which we will implement in this chapter.
We will continue with the following steps in the next two sections:
- Create the Task and Context API, forming the core of the build system.
- Create a non-incremental Context implementation and test it against Task implementations, to get a feeling for the API without the complexities of incrementality.
Programmable Build System API
In this section, we will program the core API of the programmatic incremental build system. Although we are primarily concerned with programmability in this chapter, we must design the API to support incrementality!
The unit of computation in a programmatic build system is a task. A task is kind of like a closure: a value that can be executed to produce its output, but incremental. To provide incrementality, we also need to keep track of the dynamic dependencies that tasks make while they are executing. Therefore, tasks are executed under an incremental build context, enabling them to create these dynamic dependencies.
Tasks require files through the build context, creating a dynamic file dependency, ensuring the task gets re-executed when that file changes. Tasks also require other tasks through the build context, asking the build context to provide the consistent (most up-to-date) output of that task, and creating a dynamic task dependency to it.
It is then up to the build context to check if it actually needs to execute that required task. If the required task is already consistent, the build context can just return the cached output of that task. Otherwise, the build context executes the required task, caches its output, and returns the output to the requiring task. A non-incremental context can naively execute tasks without checking.
Because tasks require other tasks through the context, and the context selectively executes tasks, the definition of task and context is mutually recursive.
In this tutorial, we will be using the words context, build context, incremental build context, and build system interchangeably, typically using just context as it is concise.
Let’s make tasks and contexts more concrete by defining them in code.
API Implementation
Since we want users of the build system to implement their own tasks, we will define Task
as a trait.
Likewise, we will also be implementing multiple contexts in this tutorial, so we will also define Context
as a trait.
Add the following code to your pie/src/lib.rs
file:
use std::fmt::Debug;
use std::hash::Hash;
/// A unit of computation in a programmatic incremental build system.
pub trait Task: Clone + Eq + Hash + Debug {
/// Type of output this task returns when executed.
type Output: Clone + Eq + Debug;
/// Execute the task, using `context` to specify dynamic dependencies, returning `Self::Output`.
fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output;
}
/// Programmatic incremental build context, enabling tasks to create dynamic dependencies that context implementations
/// use for incremental execution.
pub trait Context<T: Task> {
/// Requires given `task`, recording a dependency and selectively executing it. Returns its up-to-date output.
fn require_task(&mut self, task: &T) -> T::Output;
}
If this seems overwhelming to you, don’t worry. We will go through the API and explain things. But more importantly, the API should become more clear once we implement it in the next section and subsequent chapters.
Furthermore, if you’re new to Rust and/or need help understanding certain concepts, I will try to explain them in Rust Help blocks. They are collapsed by default to reduce distraction, clicking the header opens them. See the first Rust Help block at the end of this section.
The Task
trait has several supertraits that we will need later in the tutorial to implement incrementality:
- Eq and Hash: to check whether a task is equal to another one, and to create a hash of it, so we can use a HashMap to get the output of a task if it is up-to-date.
- Clone: to create a clone of the task so that we can store it in the HashMap without having ownership of it.
- Debug: to format the task for debugging purposes.
A Task
has a single method execute
, which takes a reference to itself (&self
), and a mutable reference to a context (context: &mut C
), and produces a value of type Self::Output
.
Because Context
is a trait, we use generics (<C: Context<Self>>
) to have execute
work for any Context
implementation (ignoring the Self
part for now).
The execute
method takes self by reference such that a task can access its data, but not mutate it, as that could throw off incrementality by changing the hash/equality of the task.
Finally, the type of output of a task is defined by the Output
associated type, and this type must implement Clone
, Eq
, and Debug
for the same reason as Task
.
The Context
trait is generic over Task
, allowing it to work with any task implementation.
It has a single method require_task
for creating a dependency to a task and returning its consistent (up-to-date) result.
It takes a mutable reference to itself, enabling dynamic dependency tracking and caching, which require mutation.
Because of this, the context reference passed to Task::execute
is also mutable.
This Task
and Context
API mirrors the mutually recursive definition of task and context we discussed earlier, and forms the basis for the entire build system.
Build the project by running cargo build
.
The output should look something like:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.03s
In the next section, we will implement a non-incremental Context
and test it against Task
implementations.
Rust Help: Modules, Imports, Ownership, Traits, Methods, Supertraits, Associated Types, Visibility
The Rust Programming Language is an introductory book about Rust. I will try to provide links to the book where possible.
Rust has a module system for project organization. The lib.rs
file is the “main file” of a library. Later on, we will be creating more modules in different files.
Things are imported into the current scope with
use
statements. We import the Debug
and Hash
traits from the standard library with two use
statements. Use statements use paths to refer to nested things. We use ::
for nesting, similar to namespaces in C++.
Rust models the concept of ownership to enable memory safety without a garbage collector.
The execute
method accepts a reference to the current type, indicated with &
: &self
. This reference is immutable, meaning that we can read data from it, but not mutate it. In Rust, things are immutable by default.
On the other hand, execute
accepts a mutable reference to the context, indicated with &mut
: context: &mut C
, which does allow mutation.
Traits are the main mechanism for open extensibility in Rust. They are comparable to interfaces in class-oriented languages. We will implement a context and tasks in the next section.
Supertraits are a kind of inheritance. The : Clone + Eq + Hash + Debug
part of the Task
trait means that every Task
implementation must also implement the Clone
, Eq
, Hash
, and Debug
traits. These traits are part of the standard library:
- Clone for duplicating values.
- Eq for equality comparisons, along with PartialEq.
- Hash for turning a value into a hash.
- Debug for formatting values in a programmer-facing debugging context.
Clone
and Eq
are so common that they are part of the Rust Prelude, so we don’t have to import those with use
statements.
Methods are functions that take a form of self
as the first argument. This enables convenient object-like calling syntax: context.require_task(&task);
.
Associated types are a kind of placeholder type in a trait such that methods of traits can use that type. In Task
this allows us to talk about the Output
type of a task. In Context
this allows us to refer to both the Task
type T
and its output type T::Output
. The ::
syntax here is used to access associated types of traits.
The Self
type in a trait is a built-in associated type that is a placeholder for the type that is implementing the trait.
The Task
trait is defined with pub
(public) visibility, such that users of the library can implement it. Because Task
uses Context
in its public API, Context
must also be public, even though we don’t intend for users to implement their own Context
.
Download source code
Non-Incremental Context
We set up the Task
and Context
API in such a way that we can implement incrementality.
However, incrementality is hard, so let’s start with an extremely simple non-incremental Context
implementation to get a feeling for the API.
We will implement file dependencies in the next chapter, as file dependencies only become important with incrementality.
Context module
Since we will be implementing three different contexts in this tutorial, we will separate them in different modules.
Create the context
module by adding a module to pie/src/lib.rs
:
This is a diff over pie/src/lib.rs
where lines with a green background are additions, lines with a red background are removals, lines without a special background are context on where to add/remove lines, and lines starting with @@
denote changed lines (in unified diff style). This is similar to diffs on source code hubs like GitHub.
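The diff itself is not reproduced in this text. Since both modules will be public (as noted below), the addition to pie/src/lib.rs plausibly amounts to a single module declaration:
pub mod context;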
Create the pie/src/context
directory, and in it, create the pie/src/context/mod.rs
file with the following contents:
pub mod non_incremental;
Both modules are public so that users of our library can access context implementations.
Create the pie/src/context/non_incremental.rs
file, it will be empty for now.
Your project structure should now look like:
pibs
├── pie
│ ├── Cargo.toml
│ └── src
│ ├── context
│ │ ├── non_incremental.rs
│ │ └── mod.rs
│ └── lib.rs
└── Cargo.toml
Confirm your module structure is correct by building with cargo build
.
Rust Help: Modules, Visibility
Modules are typically separated into different files.
Modules are declared with mod context
.
Then, the contents of a module are defined either by creating a sibling file with the same name: context.rs
, or by creating a sibling directory with the same name, with a mod.rs
file in it: context/mod.rs
.
Use the latter if you intend to nest modules, otherwise use the former.
Like traits, modules also have visibility.
Implementation
Implement the non-incremental context in pie/src/context/non_incremental.rs
by adding:
use crate::{Context, Task};
pub struct NonIncrementalContext;
impl<T: Task> Context<T> for NonIncrementalContext {
fn require_task(&mut self, task: &T) -> T::Output {
task.execute(self)
}
}
This NonIncrementalContext
is extremely simple: in require_task
we unconditionally execute the task, and pass self
along so the task we’re calling can require additional tasks.
Let’s write some tests to see if this does what we expect.
Rust Help: Crates (Libraries), Structs, Trait Implementations, Last Expression
In Rust, libraries are called crates.
We import the Context
and Task
traits from the root of your crate (i.e., the src/lib.rs
file) using crate::
as a prefix.
Structs are concrete types that can contain data through fields and implement traits, similar to classes in class-oriented languages.
Since we don’t need any data in NonIncrementalContext
, we define it as a unit-like struct.
Traits are implemented for a type with impl Context for NonIncrementalContext { ... }
, where we then have to implement all methods and associated types of the trait.
The Context
trait is generic over Task
, so in the impl
block we introduce a type parameter T
with impl<T>
, and use trait bounds as impl<T: Task>
to declare that T
must implement Task
.
The last expression of a function – in this case task.execute(self)
in require_task
which is an expression because it does not end with ;
– is used as the return value.
We could also write that as return task.execute(self);
, but that is more verbose.
Simple Test
Add the following test to pie/src/context/non_incremental.rs
:
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_require_task_direct() {
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct ReturnHelloWorld;
impl Task for ReturnHelloWorld {
type Output = String;
fn execute<C: Context<Self>>(&self, _context: &mut C) -> Self::Output {
"Hello World!".to_string()
}
}
let mut context = NonIncrementalContext;
assert_eq!("Hello World!", context.require_task(&ReturnHelloWorld));
}
}
In this test, we create a struct ReturnHelloWorld
which is the “hello world” of the build system.
We implement Task
for it, set its Output
associated type to be String
, and implement the execute
method to just return "Hello World!"
.
We derive the Clone
, Eq
, Hash
, and Debug
traits for ReturnHelloWorld
as they are required for all Task
implementations.
We require the task with our context by creating a NonIncrementalContext
, calling its require_task
method, passing
in a reference to the task.
It returns the output of the task, which we test with assert_eq!
.
Run the test by running cargo test
.
The output should look something like:
Compiling pie v0.1.0 (/pie)
Finished test [unoptimized + debuginfo] target(s) in 0.23s
Running unittests src/lib.rs (target/debug/deps/pie-4dd489880a9416ea)
running 1 test
test context::non_incremental::test::test_require_task_direct ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests pie
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Which indicates that the test indeed succeeds!
You can experiment by returning a different string from ReturnHelloWorld::execute
to see what a failed test looks like.
Rust Help: Unit Testing, Nested Items, Unused Parameters, Assertion Macros
Unit tests for a module
are typically defined by creating a nested module named test
with the #[cfg(test)]
attribute applied to it. In that
test
module, you apply #[test]
to testing functions, which then get executed when you run cargo test
.
The #[cfg(...)]
attribute provides conditional compilation for the item it is applied to. In this case, #[cfg(test)]
ensures that the module is only compiled when we run cargo test
.
We import all definitions from the parent module (i.e., the non_incremental
module) into the test
module with use super::*;
.
In Rust, items — that is, functions, structs, implementations, etc. —
can be nested inside functions. We use that in test_require_task_direct
to scope ReturnHelloWorld
and its implementation
to the test function, so it can’t clash with other test functions.
In execute
, we use _context
as the parameter name for the context, as the parameter is unused.
Unused parameters give a warning in Rust, unless it is prefixed by a _
.
assert_eq! is a macro that checks if its two expressions are equal. If not, it panics. This macro is typically used in tests for assertions, as a panic marks a test as failed.
Test with Multiple Tasks
Our first test only tests a single task that does not use the context, so let’s write a test with two tasks where one requires the other to increase our test coverage. Add the following test:
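The test code is not reproduced in this text. Based on the description below and the compiler errors that follow (which mention a test named test_require_task_problematic), it plausibly looks like this sketch; note that it intentionally fails to compile:
#[test]
fn test_require_task_problematic() {
  #[derive(Clone, PartialEq, Eq, Hash, Debug)]
  struct ReturnHelloWorld;
  impl Task for ReturnHelloWorld {
    type Output = String;
    fn execute<C: Context<Self>>(&self, _context: &mut C) -> Self::Output {
      "Hello World!".to_string()
    }
  }
  #[derive(Clone, PartialEq, Eq, Hash, Debug)]
  struct ToLowerCase;
  impl Task for ToLowerCase {
    type Output = String;
    fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output {
      // Requiring a task of a different type than Self does not type-check; see the errors below.
      context.require_task(&ReturnHelloWorld).to_lowercase()
    }
  }
  let mut context = NonIncrementalContext;
  assert_eq!("hello world!", context.require_task(&ToLowerCase));
}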
We use the same ReturnHelloWorld
task as before, but now also have a ToLowerCase
task which requires ReturnHelloWorld
and then turns its string lowercase.
However, due to the way we’ve set up the types between Task
and Context
, we will run into a problem.
Running cargo test
, you should get these errors:
Compiling pie v0.1.0 (/pie)
error[E0308]: mismatched types
--> pie/src/context/non_incremental.rs:47:30
|
47 | context.require_task(&ReturnHelloWorld).to_lowercase()
| ------------ ^^^^^^^^^^^^^^^^^ expected `&ToLowerCase`, found `&ReturnHelloWorld`
| |
| arguments to this method are incorrect
|
= note: expected reference `&ToLowerCase`
found reference `&non_incremental::test::test_require_task_problematic::ReturnHelloWorld`
note: method defined here
--> pie/src/lib.rs:18:6
|
18 | fn require_task(&mut self, task: &T) -> T::Output;
| ^^^^^^^^^^^^
For more information about this error, try `rustc --explain E0308`.
error: could not compile `pie` (lib test) due to previous error
The problem is that execute
of ToLowerCase
takes a Context<Self>
, so in impl Task for ToLowerCase
it takes a Context<ToLowerCase>
, while we’re trying to require &ReturnHelloWorld
through the context.
This doesn’t work as Context<ToLowerCase>::require_task
only takes a &ToLowerCase
as input.
We could change execute
of ToLowerCase
to take Context<ReturnHelloWorld>
:
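That code is not reproduced in this text, but judging from the compiler error that follows, the changed execute method of ToLowerCase would look like:
fn execute<C: Context<ReturnHelloWorld>>(&self, context: &mut C) -> Self::Output {
  context.require_task(&ReturnHelloWorld).to_lowercase()
}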
But that is not allowed:
Compiling pie v0.1.0 (/pie)
error[E0276]: impl has stricter requirements than trait
--> pie/src/context/non_incremental.rs:46:21
|
46 | fn execute<C: Context<ReturnHelloWorld>>(&self, context: &mut C) -> Self::Output {
| ^^^^^^^^^^^^^^^^^^^^^^^^^ impl has extra requirement `C: Context<non_incremental::test::test_require_task_problematic::ReturnHelloWorld>`
|
::: pie/src/lib.rs:11:3
|
11 | fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output;
| --------------------------------------------------------------------- definition of `execute` from trait
For more information about this error, try `rustc --explain E0276`.
error: could not compile `pie` (lib test) due to previous error
This is because the Task
trait defines execute
to take a Context<Self>
, thus every implementation of Task
must adhere to this, so we can’t solve it this way.
Effectively, due to the way we defined Task
and Context
, we can only use a single task implementation.
This is to simplify the implementation in this tutorial, as supporting multiple task types complicates matters a lot.
Why only a Single Task Type?
Currently, our context is parameterized by the type of tasks: Context<T>
.
Again, this is for simplicity.
An incremental context wants to build a single dependency graph and cache task outputs, so that we can figure out from the graph whether a task is affected by a change, and just return its output if it is not affected.
Therefore, a context implementation will maintain a Store<T>
.
Consider the case with two different task types:
A Context<ReturnHelloWorld>
and Context<ToLowerCase>
would then have a Store<ReturnHelloWorld>
and Store<ToLowerCase>
respectively.
These two stores would then maintain two different dependency graphs, one where the nodes in the graph are ReturnHelloWorld
and one where the nodes are ToLowerCase
.
But that won’t work, as we need a single dependency graph over all tasks to figure out what is affected.
Therefore, we are restricted to a single task type in this tutorial.
To solve this, we would need to remove the T
generic parameter from Context
, and instead use trait objects.
However, this introduces a whole slew of problems because many traits that we use are not inherently trait-object safe.
Clone
is not object safe because it requires Sized
.
Eq
is not object safe because it uses Self
.
Serializing trait objects is problematic.
There are workarounds for all these things, but it is not pretty and very complicated.
The actual PIE library supports arbitrary task types through trait objects. We very carefully control where generic types are introduced, and which traits need to be object-safe. Check out the PIE library if you want to know more!
For now, we will solve this by just using a single task type which is an enumeration of the different possible tasks. First remove the problematic test:
Then add the following test:
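The test code is not reproduced in this text; based on the description below, it plausibly looks like the following sketch (the test name test_require_task is an assumption):
#[test]
fn test_require_task() {
  // A single task type: an enum with one variant per kind of task.
  #[derive(Clone, PartialEq, Eq, Hash, Debug)]
  enum Test {
    ReturnHelloWorld,
    ToLowerCase,
  }
  impl Task for Test {
    type Output = String;
    fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output {
      match self {
        Self::ReturnHelloWorld => "Hello World!".to_string(),
        // Requiring another variant of the same task type is now valid.
        Self::ToLowerCase => context.require_task(&Self::ReturnHelloWorld).to_lowercase(),
      }
    }
  }
  let mut context = NonIncrementalContext;
  assert_eq!("Hello World!", context.require_task(&Test::ReturnHelloWorld));
  assert_eq!("hello world!", context.require_task(&Test::ToLowerCase));
}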
Here, we instead define a single task Test
which is an enum
with two variants.
In its Task
implementation, we match ourselves and return "Hello World!"
when the variant is ReturnHelloWorld
.
When the variant is ToLowerCase
, we require &Self::ReturnHelloWorld
through the context, which is now valid because it is an instance of Test
, and turn its string lowercase and return that.
This now works due to only having a single task type.
Run the test with cargo test
to confirm it is working.
Rust Help: Enum
Enums define a type by a set of variants, similar to enums in other languages; such types are sometimes called tagged unions.
The match
expression matches the variant and dispatches based on that, similar to switch statements in other languages.
We have defined the API for the build system and implemented a non-incremental version of it. We’re now ready to start implementing an incremental context in the next chapter.
Download source code
Incrementality
In this chapter, we will implement an incremental build context. An incremental context selectively executes tasks — only those that are affected by a change. In other words, an incremental context executes the minimum number of tasks required to make all tasks up-to-date.
However, due to dynamic dependencies, this is not trivial. We cannot first gather all tasks into a dependency tree and then topologically sort that, as dependencies are added and removed while tasks are executing. To do incremental builds in the presence of dynamic dependencies, we need to check and execute affected tasks one at a time, updating the dependency graph, while tasks are executing. To achieve this, we will employ a technique called top-down incremental building, where we start checking if a top (root) task needs to be executed, and recursively check whether dependent tasks should be executed until we reach the bottom (leaf) task(s), akin to a depth-first search.
Furthermore, build systems almost always interact with the file system in some way. For example, tasks read configuration and source files, or write intermediate and binary files. Thus, a change in a file can affect a task that reads it, and executing a task can result in writing to new or existing files. Therefore, we will also keep track of file dependencies. Like task dependencies, file dependencies are also tracked dynamically while tasks are executing.
There are several ways to check if a file dependency is consistent (i.e., has not changed), such as checking the last modification date, or comparing a hash. To make this configurable on a per-dependency basis, we will implement stamps. A file stamp is just a value that is produced from a file, such as the modification date or hash, that is stored with the file dependency. To check if a file dependency is consistent, we just stamp the file again and compare it with the stored stamp.
Similarly, we can employ stamps for task dependencies as well by stamping the output of a task.
To achieve incrementality, we will continue with these steps in the following sections:
- Extend Context with a method to require a file, enabling tasks to specify dynamic dependencies to files.
- Implement file stamps and task output stamps, and extend Context with methods to select stampers, enabling tasks to specify when a dependency is consistent.
- Implement dynamic dependencies and their consistency checking.
- Implement a dependency graph store with methods to query and mutate the dependency graph.
- Implement the top-down incremental context that incrementally executes tasks.
Requiring Files
Since build systems frequently interact with files, and changes to files can affect tasks, we need to keep track of file dependencies.
Therefore, we will extend the Context
API with methods to require files, enabling tasks to specify dynamic dependencies to files.
Add a method to the Context
trait in pie/src/lib.rs
:
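The code for this addition is not reproduced in this text. Based on the description below, the method plausibly looks like this sketch (the exact signature is an assumption):
// Imports needed at the top of pie/src/lib.rs:
use std::fs::File;
use std::io;
use std::path::Path;
// Added to the Context trait:
/// Requires file at given `path`, recording a read dependency to it. Returns the opened file if a
/// file exists at given path, or `None` otherwise.
fn require_file<P: AsRef<Path>>(&mut self, path: &P) -> Result<Option<File>, io::Error>;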
require_file
is similar to requiring a task, but instead takes a path
to a file or directory on the filesystem as input.
We use AsRef<Path>
as the type for the path, so that we can pass anything in that can dereference to a path.
For example, str
has an AsRef<Path>
implementation, so we can just use "test.txt"
as a path.
As an output, we return Result<Option<File>, io::Error>
, with File
being a handle to an open file.
The reason for this complicated type is:
- An incremental context will want to read the metadata (such as the last modified date) of the file, or create a hash over the file, to be able to detect changes. Because getting metadata or reading the file can fail, and we want to propagate this error, we return a Result with io::Error as the error type.
- Tasks can create a dependency to a file that does not exist, and the existence of that file affects the task. For example, a task that prints true or false based on whether a file exists. If the file does not exist (or it is a directory), we cannot open it, so we cannot return a File, hence we use Option<File> to return None.
- Otherwise, we return Ok(Some(file)) so that the task can read the opened file.
Rust Help: Error Handling with Result, Optional, AsRef Conversion
Recoverable error handling in Rust is done with the Result<T, E>
type, which can either be Ok(t)
or Err(e)
.
In contrast to many languages which use exceptions, throwing, and exception handling; Rust treats recoverable errors just as regular values.
Similarly, optional values in Rust are defined using the Option<T>
type, which can either be Some(t)
or None
.
Rust has many traits for converting values or references into others, which provides a lot of convenience in what would otherwise require a lot of explicit conversions.
AsRef<T>
is such a conversion trait, that can convert itself into &T
.
Here, we use AsRef<Path>
as a generic with a trait bound to support many different kinds of values to the path
argument in require_file
.
For example, we can call context.require_file("test.txt")
because str
, which is the type of string constants, implements AsRef<Path>
.
You can also see this as a kind of method overloading, without having to provide concrete overloads for all supported types.
Now we need to implement this method for NonIncrementalContext
.
However, because we will be performing similar file system operations in the incremental context as well, we will create some utility functions for this first.
Filesystem utilities
Add the fs
module to pie/src/lib.rs
:
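The diff is not reproduced in this text; it adds a module declaration to pie/src/lib.rs along these lines (the visibility is an assumption):
mod fs;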
Create file pie/src/fs.rs
with:
use std::{fs, io};
use std::fs::{File, Metadata};
use std::path::Path;
/// Gets the metadata for given `path`, returning:
/// - `Ok(Some(metadata))` if a file or directory exists at given path,
/// - `Ok(None)` if no file or directory exists at given path,
/// - `Err(e)` if there was an error getting the metadata for given path.
pub fn metadata(path: impl AsRef<Path>) -> Result<Option<Metadata>, io::Error> {
match fs::metadata(path) {
Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
Err(e) => Err(e),
Ok(m) => Ok(Some(m))
}
}
/// Attempt to open file at given `path`, returning:
/// - `Ok(Some(file))` if the file exists at given path,
/// - `Ok(None)` if no file exists at given path (but a directory could exist at given path),
/// - `Err(e)` if there was an error getting the metadata for given path, or if there was an error opening the file.
///
/// This function is necessary due to Windows returning an error when attempting to open a directory.
pub fn open_if_file(path: impl AsRef<Path>) -> Result<Option<File>, io::Error> {
let file = match metadata(&path)? {
Some(metadata) if metadata.is_file() => Some(File::open(&path)?),
_ => None,
};
Ok(file)
}
The metadata
function gets the filesystem metadata given a path, and open_if_file
opens the file for given path.
The reason for these functions is that the standard library function std::fs::metadata
treats non-existent files as an error, whereas we don’t want to treat it as an error and just return None
.
Furthermore, open_if_file
works around an issue where opening a directory on Windows (and possibly other operating systems) is an error, where we want to treat it as None
again.
The documentation comments explain the exact behaviour.
Rust Help: Error Propagation, Documentation Comments
The ?
operator makes it easy to propagate errors. Because errors are just values in Rust, to propagate an error you’d normally have to match each result and manually return the error. The r? operator applied to a Result r does this for you; it basically desugars to something like match r { Err(e) => return Err(e), _ => {} }.
Comments with three forward slashes ///
are documentation comments that document the function/struct/enum/trait/etc. they are applied to.
We will write some tests to confirm the behaviour, but for that we need utilities to create temporary files and directories.
Furthermore, we will be writing more unit tests – but also integration tests – in this tutorial, so we will set up these utilities in such a way that they are reachable by both unit and integration tests.
The only way to do that right now in Rust, is to create a separate package and have the pie
package depend on it.
And yes, we went from adding file dependencies, to creating file system utilities, to testing those file system utilities, to creating testing utilities, and now to making a crate for those testing utilities. Sorry about that 😅, we will start unwinding this stack soon!
Create the dev_shared
package
Next to the pie
directory, create a directory named dev_shared
(i.e., the pibs/dev_shared
directory where pibs
is your workspace directory).
Create the dev_shared/Cargo.toml
file with the following contents:
[package]
name = "dev_shared"
version = "0.1.0"
edition = "2021"
[dependencies]
tempfile = "3"
We’ve added the tempfile
dependency here already, which is a library that creates and automatically cleans up temporary files and directories.
Rust Help: Dependencies
We use other libraries (crates) by specifying dependencies.
Because basically every Rust library adheres to semantic versioning, we can use "3"
as a version requirement which indicates that we will use the most up-to-date 3.x.x
version.
Create the main library file dev_shared/src/lib.rs
, with functions for creating temporary files and directories:
use std::io;
use tempfile::{NamedTempFile, TempDir};
/// Creates a new temporary file that gets cleaned up when dropped.
pub fn create_temp_file() -> Result<NamedTempFile, io::Error> { NamedTempFile::new() }
/// Creates a new temporary directory that gets cleaned up when dropped.
pub fn create_temp_dir() -> Result<TempDir, io::Error> { TempDir::new() }
Now add the dev_shared
package to the members of the workspace in Cargo.toml
(i.e., the pibs/Cargo.toml
file where pibs
is your workspace directory):
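The diff is not reproduced in this text; the workspace Cargo.toml plausibly becomes:
[workspace]
members = [
  "pie",
  "dev_shared",
]
default-members = [
  "pie",
]
resolver = "2"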
Your directory structure should now look like this:
pibs
├── pie
│ ├── Cargo.toml
│ └── src
│ ├── context
│ │ ├── non_incremental.rs
│ │ └── mod.rs
│ ├── lib.rs
│ └── fs.rs
├── Cargo.toml
└── dev_shared
├── Cargo.toml
└── src
└── lib.rs
To access these utility functions in the pie
crate, add a dependency to dev_shared
in pie/Cargo.toml
along with another crate that will help with testing:
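The diff is not reproduced in this text; the addition to pie/Cargo.toml plausibly looks like this (the version requirement for assert_matches is an assumption):
[dev-dependencies]
dev_shared = { path = "../dev_shared" }
assert_matches = "1"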
We’re using a path dependency to have pie
depend on the dev_shared
package at "../dev_shared"
in the workspace.
We’ve also added a dependency to assert_matches, which is a handy library for asserting that a value matches a pattern.
Note that these dependencies are added under dev-dependencies
, indicating that these dependencies are only available when running tests, benchmarks, and examples.
Therefore, users of our library will not depend on these libraries, which is good, because temporary file management and assertions are not necessary to users of the library.
Testing filesystem utilities
Back to testing our filesystem utilities.
Add the following tests to pie/src/fs.rs
:
#[cfg(test)]
mod test {
use std::fs::remove_file;
use std::io;
use assert_matches::assert_matches;
use dev_shared::{create_temp_dir, create_temp_file};
use super::*;
#[test]
fn test_metadata_ok() -> Result<(), io::Error> {
let temp_file = create_temp_file()?;
let metadata = metadata(temp_file)?;
assert_matches!(metadata, Some(metadata) => {
assert!(metadata.is_file());
});
Ok(())
}
#[test]
fn test_metadata_none() -> Result<(), io::Error> {
let temp_file = create_temp_file()?;
remove_file(&temp_file)?;
let metadata = metadata(&temp_file)?;
assert!(metadata.is_none());
Ok(())
}
#[test]
fn test_open_if_file() -> Result<(), io::Error> {
let temp_file = create_temp_file()?;
let file = open_if_file(&temp_file)?;
assert!(file.is_some());
Ok(())
}
#[test]
fn test_open_if_file_non_existent() -> Result<(), io::Error> {
let temp_file = create_temp_file()?;
remove_file(&temp_file)?;
let file = open_if_file(&temp_file)?;
assert!(file.is_none());
Ok(())
}
#[test]
fn test_open_if_file_on_directory() -> Result<(), io::Error> {
let temp_dir = create_temp_dir()?;
let file = open_if_file(temp_dir)?;
assert!(file.is_none());
Ok(())
}
}
We test whether the functions conform to the specified behaviour.
Unfortunately, we can’t easily test when metadata
and open_if_file
should return an error, because we cannot disable read permissions on files via the Rust standard library.
We use our create_temp_file
and create_temp_dir
utility functions to create temporary files and directories.
The tempfile
library takes care of deleting temporary files when they go out of scope (at the end of the test).
We use assert_matches!
to assert that metadata
is Some(metadata)
, binding metadata
in the => { ... }
block in which we assert that the metadata describes a file.
We will use this macro more in future integration tests.
Rust Help: Result in Tests
Tests can return Result
.
When a test returns an Err
, the test fails.
This allows us to write more concise tests using error propagation.
Implement require_file
Now we are done unwinding our stack and have filesystem and testing utilities.
Make the non-incremental context compatible by changing pie/src/context/non_incremental.rs
:
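The diff is not reproduced in this text; a sketch of the change, given the contract described below:
// Additional imports in pie/src/context/non_incremental.rs:
use std::fs::File;
use std::io;
use std::path::Path;
use crate::fs::open_if_file;
// Added to the impl<T: Task> Context<T> for NonIncrementalContext block:
fn require_file<P: AsRef<Path>>(&mut self, path: &P) -> Result<Option<File>, io::Error> {
  open_if_file(path)
}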
Since the non-incremental context does not track anything, we only try to open the file and return it, matching the contract in the documentation comment of the Context::require_file
trait method.
Confirm everything works with cargo test
.
Download source code
Stamps
To check whether we need to execute a task, we need to check the dependencies of that task to see if any of them are inconsistent.
To make this consistency checking configurable, we will use stamps.
A dependency is inconsistent if after stamping, the new stamp is different from the old stamp.
Therefore, we will implement a FileStamper
that stamps files and produces a FileStamp
, and an OutputStamper
that stamps task outputs and produces an OutputStamp
.
Add the stamp
module to pie/src/lib.rs
:
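The diff is not reproduced in this text; since the module is public (as noted below), the addition to pie/src/lib.rs is plausibly:
pub mod stamp;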
This module is public as users of the library will construct stampers.
File stamps
Create the pie/src/stamp.rs
file and add:
use std::fmt::Debug;
use std::io;
use std::path::Path;
use std::time::SystemTime;
use crate::fs::metadata;
#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug)]
pub enum FileStamper {
Exists,
Modified,
}
#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug)]
pub enum FileStamp {
Exists(bool),
Modified(Option<SystemTime>),
}
impl FileStamper {
pub fn stamp(&self, path: impl AsRef<Path>) -> Result<FileStamp, io::Error> {
match self {
FileStamper::Exists => {
Ok(FileStamp::Exists(path.as_ref().try_exists()?))
}
FileStamper::Modified => {
let Some(metadata) = metadata(path)? else {
return Ok(FileStamp::Modified(None));
};
Ok(FileStamp::Modified(Some(metadata.modified()?)))
}
}
}
}
We’re implementing FileStamper
as an enum for simplicity.
A FileStamper
has a single method stamp
which takes something that can be dereferenced to a path, and produces a FileStamp
or an error if creating the stamp failed.
For now, we implement only two kinds of file stampers: Exists
and Modified
.
The Exists
stamper just returns a boolean indicating whether a file exists.
It can be used to create a file dependency where a task behaves differently based on whether a file exists or not.
The Modified
stamper returns the last modification date if the file exists, or None
if the file does not exist.
We derive Eq
for stamps so that we can compare them.
Equal (same) stamps indicate a consistent dependency; unequal (different) stamps indicate an inconsistent one.
We also derive Eq
for stampers, because the stamper of a dependency could change, making the dependency inconsistent.
Task output stamps
We implement task output stampers in a similar way.
Add to pie/src/stamp.rs
:
#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug)]
pub enum OutputStamper {
Inconsequential,
Equals,
}
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum OutputStamp<O> {
Inconsequential,
Equals(O),
}
impl OutputStamper {
pub fn stamp<O>(&self, output: O) -> OutputStamp<O> {
match self {
OutputStamper::Inconsequential => OutputStamp::Inconsequential,
OutputStamper::Equals => OutputStamp::Equals(output),
}
}
}
The Inconsequential
stamper simply ignores the output and always returns the same stamp (thus is always equal).
It can be used to create a task dependency where we are interested in some side effect of a task, but don’t care about its output.
The Equals
stamper simply wraps the output of a task, so the stamp is equal when the output is equal.
Output stamps are generic over the task output type O
.
Trait Bounds and Derive Macros
Because O
is used in the enum, the derive
attributes on OutputStamp
create bounds over O
.
Thus, OutputStamp
is only Clone
when O
is Clone
, OutputStamp
is only Eq
when O
is Eq
, and so forth.
Because we declared Task::Output
with bound Clone + Eq + Debug
, we can be sure that OutputStamp
is always Clone
, Eq
, and Debug
.
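To make this concrete, here is roughly what #[derive(Clone)] generates for OutputStamp; note the O: Clone bound on the generated impl:
// Roughly what #[derive(Clone)] expands to for OutputStamp<O>:
impl<O: Clone> Clone for OutputStamp<O> {
  fn clone(&self) -> Self {
    match self {
      OutputStamp::Inconsequential => OutputStamp::Inconsequential,
      // Cloning the `Equals` variant requires cloning the output, hence the `O: Clone` bound.
      OutputStamp::Equals(output) => OutputStamp::Equals(output.clone()),
    }
  }
}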
User-Defined Stamps?
FileStamper
and OutputStamper
could also be a trait which would allow users of the library to implement their own stampers.
For simplicity, we do not explore that option in this tutorial.
In the actual PIE library, stampers (called checkers) can be implemented by users of the library!
Tests
Finally, we write some tests.
Add to pie/src/stamp.rs
:
#[cfg(test)]
mod test {
use std::fs::{remove_file, write};
use std::io;
use dev_shared::create_temp_file;
use super::*;
#[test]
fn test_exists_file_stamper() -> Result<(), io::Error> {
let stamper = FileStamper::Exists;
let temp_file = create_temp_file()?;
let stamp = stamper.stamp(&temp_file)?;
assert_eq!(stamp, stamper.stamp(&temp_file)?);
remove_file(&temp_file)?;
assert_ne!(stamp, stamper.stamp(&temp_file)?);
Ok(())
}
#[test]
fn test_modified_file_stamper() -> Result<(), io::Error> {
let stamper = FileStamper::Modified;
let temp_file = create_temp_file()?;
let stamp = stamper.stamp(&temp_file)?;
assert_eq!(stamp, stamper.stamp(&temp_file)?);
write(&temp_file, format!("{:?}", stamp))?;
let new_stamp = stamper.stamp(&temp_file)?;
assert_ne!(stamp, new_stamp);
let stamp = new_stamp;
remove_file(&temp_file)?;
assert_ne!(stamp, stamper.stamp(&temp_file)?);
Ok(())
}
#[test]
fn test_inconsequential_output_stamper() {
let stamper = OutputStamper::Inconsequential;
let stamp = stamper.stamp(&1);
assert_eq!(stamp, stamper.stamp(&1));
assert_eq!(stamp, stamper.stamp(&2));
}
#[test]
fn test_equals_output_stamper() {
let stamper = OutputStamper::Equals;
let stamp = stamper.stamp(&1);
assert_eq!(stamp, stamper.stamp(&1));
assert_ne!(stamp, stamper.stamp(&2));
}
}
We test file stamps by creating a stamp, changing the file, creating a new stamp, and then comparing the stamps.
We test task output stamps by passing a different output value to the stamp
function, and then comparing the stamps.
Run cargo test
to test the stamp implementation.
Test test_modified_file_stamper
will likely fail on some operating systems. Do continue to the next subsection, because we’re going to fix it!
Testing with file modified time, correctly
Unfortunately, these tests may fail on some operating systems (Linux and Windows in my testing), due to an imprecise file last modified timer.
What can happen is that we write to a file, making the OS update its modified time to 1000
(as an example, not a real timestamp), then very quickly write to the file again, making the OS update its modified time to 1000
again.
Then, our test will fail because the stamp didn’t change even though we expect it to change.
This can happen with an imprecise timer that only increases once every millisecond (again, an example, not a real number) when we perform writes in between that millisecond. Even worse, our test can be flaky, sometimes succeeding if we write in between those milliseconds, sometimes failing if we write within a millisecond.
To solve this, add a function to the filesystem testing utility crate.
Change dev_shared/src/lib.rs
:
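A minimal sketch of such a function, assuming it lives next to create_temp_file in dev_shared and that the file at given path already exists:
use std::fs::{metadata, write};
use std::io;
use std::path::Path;
/// Writes `contents` to the file at `path`, retrying the write until the file's modified time
/// actually changes, to work around imprecise filesystem timers.
pub fn write_until_modified(path: impl AsRef<Path>, contents: impl AsRef<[u8]>) -> Result<(), io::Error> {
  let path = path.as_ref();
  let contents = contents.as_ref();
  let modified = metadata(path)?.modified()?;
  loop {
    write(path, contents)?;
    if metadata(path)?.modified()? != modified {
      return Ok(());
    }
  }
}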
The write_until_modified
function writes to the file, but ensures its modified time will change.
Now change the tests in pie/src/stamp.rs
to use this function:
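The change boils down to replacing the plain write calls; for example, in test_modified_file_stamper:
use dev_shared::{create_temp_file, write_until_modified};
// ...
// Previously: write(&temp_file, format!("{:?}", stamp))?;
write_until_modified(&temp_file, format!("{:?}", stamp))?;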
Now we use write_until_modified
to write to the file, ensuring its modified time changes, so that the stamp changes when it should.
Run cargo test
to confirm the stamp implementation, which should succeed now.
Stamps in Context
We now have a module dedicated to stamps.
However, stampers are constructed by users of the library that author tasks, and they need to pass in these stampers when creating dependencies.
Therefore, we need to update the Context
trait to allow passing in these stampers.
Change Context
in pie/src/lib.rs
:
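A sketch of the changed trait; treating FileStamper::Modified as the default stamper is an assumption here:
// (With File, Path, and FileStamper imported at the top of pie/src/lib.rs.)
pub trait Context<T: Task> {
  fn require_task(&mut self, task: &T) -> T::Output;
  /// Requires `path`, creating a file dependency with the default stamper.
  fn require_file<P: AsRef<Path>>(&mut self, path: P) -> Result<Option<File>, io::Error> {
    self.require_file_with_stamper(path, self.default_require_file_stamper())
  }
  /// Requires `path`, creating a file dependency with the given `stamper`.
  fn require_file_with_stamper<P: AsRef<Path>>(&mut self, path: P, stamper: FileStamper) -> Result<Option<File>, io::Error>;
  /// Returns the default file stamper; context implementations may override this.
  fn default_require_file_stamper(&self) -> FileStamper { FileStamper::Modified }
}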
We add the require_file_with_stamper
method which allow passing in a stamper.
We add a default implementation for require_file
that passes in a default stamper.
The default is provided by default_require_file_stamper
which can be overridden by context implementations.
Now apply the same to tasks, changing Context
again in pie/src/lib.rs
:
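Similarly, a sketch of the task methods, where OutputStamper::Equals as the default is an assumption:
// require_task gains a default implementation that delegates to require_task_with_stamper:
fn require_task(&mut self, task: &T) -> T::Output {
  self.require_task_with_stamper(task, self.default_require_task_stamper())
}
fn require_task_with_stamper(&mut self, task: &T, stamper: OutputStamper) -> T::Output;
fn default_require_task_stamper(&self) -> OutputStamper { OutputStamper::Equals }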
Update NonIncrementalContext
in pie/src/context/non_incremental.rs
to implement the new methods:
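A sketch of the updated implementation:
impl<T: Task> Context<T> for NonIncrementalContext {
  fn require_task_with_stamper(&mut self, task: &T, _stamper: OutputStamper) -> T::Output {
    task.execute(self) // Stamper is ignored: no incrementality to check.
  }
  fn require_file_with_stamper<P: AsRef<Path>>(&mut self, path: P, _stamper: FileStamper) -> Result<Option<File>, io::Error> {
    open_if_file(path) // Stamper is ignored: just open the file.
  }
}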
We just ignore the stampers in NonIncrementalContext
, as they are only needed for incrementality.
Run cargo test
to confirm everything still works.
Download source code
Dynamic Dependencies
Now that we’ve implemented stamps, we can implement dynamic dependencies and their consistency checking.
A dependency is inconsistent if after stamping, the new stamp is different from the old stamp.
Therefore, dependencies need to keep track of their stamper and their previous stamp.
To that end, we will implement the FileDependency
and TaskDependency
types with methods for consistency checking.
We will also implement a Dependency
type that abstracts over FileDependency
and TaskDependency
, which we will need for the dependency graph implementation in the next chapter.
Add the dependency
module to pie/src/lib.rs
:
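Again a single line; a sketch:
pub mod dependency;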
Users of the library will not construct dependencies directly.
They will create dependencies (and choose stampers) via Context
methods.
However, dependencies will be used in the public API for debug logging later, so we make the module public.
File dependencies
Create the pie/src/dependency.rs
file and add:
use std::fmt::Debug;
use std::fs::File;
use std::io;
use std::path::PathBuf;
use crate::{Context, Task};
use crate::fs::open_if_file;
use crate::stamp::{FileStamp, FileStamper, OutputStamp, OutputStamper};
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct FileDependency {
path: PathBuf,
stamper: FileStamper,
stamp: FileStamp,
}
impl FileDependency {
/// Creates a new file dependency with `path` and `stamper`, returning:
/// - `Ok(file_dependency)` normally,
/// - `Err(e)` if stamping failed.
#[allow(dead_code)]
pub fn new(path: impl Into<PathBuf>, stamper: FileStamper) -> Result<Self, io::Error> {
let path = path.into();
let stamp = stamper.stamp(&path)?;
let dependency = FileDependency { path, stamper, stamp };
Ok(dependency)
}
/// Creates a new file dependency with `path` and `stamper`, returning:
/// - `Ok((file_dependency, Some(file)))` if a file exists at given path,
/// - `Ok((file_dependency, None))` if no file exists at given path (but a directory could exist at given path),
/// - `Err(e)` if stamping or opening the file failed.
pub fn new_with_file(path: impl Into<PathBuf>, stamper: FileStamper) -> Result<(Self, Option<File>), io::Error> {
let path = path.into();
let stamp = stamper.stamp(&path)?;
let file = open_if_file(&path)?;
let dependency = FileDependency { path, stamper, stamp };
Ok((dependency, file))
}
/// Returns the path of this dependency.
#[allow(dead_code)]
pub fn path(&self) -> &PathBuf { &self.path }
/// Returns the stamper of this dependency.
#[allow(dead_code)]
pub fn stamper(&self) -> &FileStamper { &self.stamper }
/// Returns the stamp of this dependency.
#[allow(dead_code)]
pub fn stamp(&self) -> &FileStamp { &self.stamp }
/// Checks whether this file dependency is inconsistent, returning:
/// - `Ok(Some(stamp))` if this dependency is inconsistent (with `stamp` being the new stamp of the dependency),
/// - `Ok(None)` if this dependency is consistent,
/// - `Err(e)` if there was an error checking this dependency for consistency.
pub fn is_inconsistent(&self) -> Result<Option<FileStamp>, io::Error> {
let new_stamp = self.stamper.stamp(&self.path)?;
if new_stamp == self.stamp {
Ok(None)
} else {
Ok(Some(new_stamp))
}
}
}
A FileDependency
stores the path
the dependency is about, the stamper
used to create a stamp for this dependency, and the stamp
that was created at the time the file dependency was made.
The FileDependency::new_with_file
function also returns the opened file if it exists, so that users of this function can read from the file without having to open it again.
We add getter methods to get parts of the file dependency without allowing mutation.
Since we will only use these getter methods later in the tutorial, we annotate them with #[allow(dead_code)]
to suppress unused-code warnings for now.
A file dependency is inconsistent when the stored stamp is not equal to a stamp that we create at the time of checking, implemented in FileDependency::is_inconsistent
.
For example, if we created a file dependency (with modified stamper) for a file that was modified yesterday, then modify the file, and then call is_inconsistent
on the file dependency, it would return Some(new_stamp)
indicating that the dependency is inconsistent.
We implement an is_inconsistent
method here instead of an is_consistent
method, so that we can return the changed stamp when the dependency is inconsistent, which we will use for debug logging purposes later.
Creating and checking a file dependency can fail due to file operations failing (for example, cannot access the file), so we propagate those errors.
Task dependencies
Task dependencies are implemented in a similar way.
Add to pie/src/dependency.rs
:
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct TaskDependency<T, O> {
task: T,
stamper: OutputStamper,
stamp: OutputStamp<O>,
}
impl<T: Task> TaskDependency<T, T::Output> {
/// Creates a new `task` dependency with `stamper` and `output`.
pub fn new(task: T, stamper: OutputStamper, output: T::Output) -> Self {
let stamp = stamper.stamp(output);
Self { task, stamper, stamp }
}
/// Returns the task of this dependency.
#[allow(dead_code)]
pub fn task(&self) -> &T { &self.task }
/// Returns the stamper of this dependency.
#[allow(dead_code)]
pub fn stamper(&self) -> &OutputStamper { &self.stamper }
/// Returns the stamp of this dependency.
#[allow(dead_code)]
pub fn stamp(&self) -> &OutputStamp<T::Output> { &self.stamp }
/// Checks whether this task dependency is inconsistent, returning:
/// - `Some(stamp)` if this dependency is inconsistent (with `stamp` being the new stamp of the dependency),
/// - `None` if this dependency is consistent.
pub fn is_inconsistent<C: Context<T>>(&self, context: &mut C) -> Option<OutputStamp<T::Output>> {
let output = context.require_task(&self.task);
let new_stamp = self.stamper.stamp(output);
if new_stamp == self.stamp {
None
} else {
Some(new_stamp)
}
}
}
A TaskDependency
stores the task
the dependency is about, along with its stamper
and stamp
that is created when the dependency is created.
Task dependencies are generic over the type of tasks T
, and their type of outputs O
.
We also add immutable getters here.
Why not a Trait Bound on TaskDependency?
We chose not to put a Task
trait bound on TaskDependency
, and instead put the bound on the impl.
There are several upsides and downsides to consider when making such a decision.
The main upside for putting the Task
bound on the TaskDependency
struct, is that we can leave out O
and use OutputStamp<T::Output>
as the type of the stamp
field.
This cuts down a generic parameter, which reduces boilerplate.
The downside is that we need to then put the Task
bound on every struct that uses TaskDependency
, which increases boilerplate.
In this case, we chose not to put the trait bound on the struct to prevent that trait bound from bubbling up into other structs that use TaskDependency
, as it would need to appear in almost every struct in the library.
A task dependency is inconsistent if, after recursively checking it, its stamp has changed, implemented in TaskDependency::is_inconsistent
.
Usually, this will be using the Equals
task output stamper, so a task dependency is usually inconsistent when the output of the task changes.
Because we need to recursively check the task, TaskDependency::is_inconsistent
requires a context to be passed in.
Again, there is more mutual recursion here.
This recursive consistency checking is one of the core ideas that make programmatic incremental build systems possible. But why is this so important? Why do we need recursive checking? Well, we want our build system to be sound, meaning that we must execute all tasks that are affected by a change. When we do not execute a task that is affected by a change, we are unsound, and introduce an incrementality bug!
Because of dynamic dependencies, a change in a leaf in the dependency tree may affect a task at the root. For example, a compilation task depends on a task that reads a configuration file, which depends on the configuration file. A change to a configuration file (leaf) affects a task that reads the configuration file, which in turn affects the compilation task (root). Therefore, we need to recursively check the dependency tree in order to execute all tasks affected by changes.
A different way to think about this, is to think about the invariant of the dependency consistency checking. The invariant is that a dependency is consistent if and only if the subtree of that dependency is consistent, and the dependency itself is consistent. The easiest way to adhere to this invariant, is recursive checking.
A final note about recursive checking is that tasks can be executed during it, and executing tasks can lead to new dynamic dependencies.
However, recursive checking handles this without problems because these dependencies are created through the Context
, which in turn will call is_inconsistent
when needed.
Dependency enum
Finally, we create a Dependency
enum that abstracts over these two kinds of dependencies.
Add to pie/src/dependency.rs
:
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum Dependency<T, O> {
RequireFile(FileDependency),
RequireTask(TaskDependency<T, O>),
}
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum Inconsistency<O> {
File(FileStamp),
Task(OutputStamp<O>),
}
impl<T: Task> Dependency<T, T::Output> {
/// Checks whether this dependency is inconsistent, returning:
/// - `Ok(Some(stamp))` if the dependency is inconsistent (with `stamp` being the new stamp of the dependency),
/// - `Ok(None)` if the dependency is consistent,
/// - `Err(e)` if there was an error checking the dependency for consistency.
pub fn is_inconsistent<C: Context<T>>(&self, context: &mut C) -> Result<Option<Inconsistency<T::Output>>, io::Error> {
let option = match self {
Dependency::RequireFile(d) => d.is_inconsistent()?
.map(|s| Inconsistency::File(s)),
Dependency::RequireTask(d) => d.is_inconsistent(context)
.map(|s| Inconsistency::Task(s)),
};
Ok(option)
}
}
Dependency
just merges the two kinds of dependencies and provides an is_inconsistent
method that calls the corresponding method.
We return the changed stamp here as well for debug logging later.
We wrap the changed stamp in an Inconsistency
enum, and map to the correct variant if there is an inconsistency.
Because Dependency
can store a TaskDependency
, we need to propagate the T
and O
generics.
Likewise, Inconsistency
propagates the O
generic for OutputStamp
.
User-Defined Dependencies?
Like with stampers, Dependency
could also be a trait to allow users of the library to define their own dependencies.
However, there are two requirements that make it hard to define such a trait:
- We can implement different Contexts which treat some dependencies differently. For example, in the actual PIE library, we have a bottom-up context that schedules tasks from the bottom-up. This bottom-up context treats file and task dependencies in a completely different way compared to the top-down context.
- Dynamic dependencies also require validation to ensure correctness, which we will do later on in the tutorial.
It is currently unclear to me how to create a Dependency
trait with these requirements in mind.
Tests
As usual, we write some tests to confirm the behaviour.
Add tests to pie/src/dependency.rs
:
#[cfg(test)]
mod test {
use std::fs::write;
use std::io::{self, Read};
use dev_shared::{create_temp_file, write_until_modified};
use crate::context::non_incremental::NonIncrementalContext;
use super::*;
/// Task that reads file at given path and returns it contents as a string.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct ReadStringFromFile(PathBuf);
impl Task for ReadStringFromFile {
type Output = String;
fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output {
let mut string = String::new();
let file = context.require_file(&self.0).expect("failed to require file");
if let Some(mut file) = file {
file.read_to_string(&mut string).expect("failed to read from file");
};
string
}
}
#[test]
fn test_file_dependency_consistency() -> Result<(), io::Error> {
let mut context = NonIncrementalContext;
let temp_file = create_temp_file()?;
write(&temp_file, "test1")?;
let file_dependency = FileDependency::new(temp_file.path(), FileStamper::Modified)?;
let dependency: Dependency<ReadStringFromFile, String> = Dependency::RequireFile(file_dependency.clone());
assert!(file_dependency.is_inconsistent()?.is_none());
assert!(dependency.is_inconsistent(&mut context)?.is_none());
// Change the file, changing the stamp the stamper will create next time, making the file dependency inconsistent.
write_until_modified(&temp_file, "test2")?;
assert!(file_dependency.is_inconsistent()?.is_some());
assert!(dependency.is_inconsistent(&mut context)?.is_some());
Ok(())
}
#[test]
fn test_task_dependency_consistency() -> Result<(), io::Error> {
let mut context = NonIncrementalContext;
let temp_file = create_temp_file()?;
write(&temp_file, "test1")?;
let task = ReadStringFromFile(temp_file.path().to_path_buf());
let output = context.require_task(&task);
let task_dependency = TaskDependency::new(task.clone(), OutputStamper::Equals, output);
let dependency = Dependency::RequireTask(task_dependency.clone());
assert!(task_dependency.is_inconsistent(&mut context).is_none());
assert!(dependency.is_inconsistent(&mut context)?.is_none());
// Change the file, causing the task to return a different output, changing the stamp the stamper will create next
// time, making the task dependency inconsistent.
write_until_modified(&temp_file, "test2")?;
assert!(task_dependency.is_inconsistent(&mut context).is_some());
assert!(dependency.is_inconsistent(&mut context)?.is_some());
Ok(())
}
}
We test a file dependency by asserting that is_inconsistent
returns Some
after changing the file.
Testing task dependencies requires a bit more work.
We create task ReadStringFromFile
that reads a string from a file, and then returns that string as output.
We require the task to get its output ("test1"
), and create a task dependency with it.
Then, we change the file and check consistency of the task dependency.
That recursively requires the task, the context will execute the task, and the task now returns "test2".
Since we use the Equals
output stamper, and "test1"
does not equal "test2"
, the dependency is inconsistent and returns a stamp containing "test2"
.
Note that we are both testing the specific dependencies (FileDependency
and TaskDependency
), and the general Dependency
.
Normally, a task such as ReadStringFromFile
should return a Result<String, io::Error>
, but for testing purposes we are just using panics with expect
.
In the file dependency case, using Dependency
requires an explicit type annotation because there is no task to infer the type from.
We just use Dependency<ReadStringFromFile, String>
as the type, and this is fine even though we don’t use ReadStringFromFile
in that test, because the Dependency::RequireFile
variant does not use those types.
Run cargo test
to confirm everything still works.
You will get some warnings about unused things, but that is ok as we will use them in the next section.
Download source code
Dependency Graph Store
To do incremental building, we need to keep track of all files, tasks, their dependencies, and task outputs, in a dependency graph.
This will be the responsibility of the Store
data structure.
Context implementations will use methods on Store
to query and mutate the dependency graph.
In other words, Store
encapsulates the dependency graph.
However, writing a dependency graph data structure is outside of the scope of this tutorial, so we will be using the pie_graph
library which we prepared exactly for this use case.
The graph from this library is a directed acyclic graph (DAG), meaning that edges are directed and cycles are not allowed, as cycles would make a topological ordering impossible.
Graph Library
The pie_graph
library is a modified version of the great
incremental-topo
library which implements incremental topological ordering: it keeps the topological ordering up-to-date incrementally while nodes and edges are added and removed.
That is exactly what we need, as dynamic dependencies prevent us from calculating the topological ordering in one go, and calculating the topological ordering after every task execution is prohibitively expensive.
The implementation in the incremental-topo
library is based on a paper by D. J. Pearce and P. H. J. Kelly that describes several dynamic topological sort algorithms for directed acyclic graphs.
Add the pie_graph
dependency to pie/Cargo.toml
:
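A sketch of the change; the version number here is illustrative:
[dependencies]
pie_graph = "0.0.1" # illustrative version; use the version this tutorial provides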
Store basics
Add the store
module to pie/src/lib.rs
:
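A sketch; note the absence of pub:
mod store;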
This module is private, as users of the library should not interact with the store.
Only Context
implementations will use the store.
Create the pie/src/store.rs
file and add the following to get started:
use std::path::{Path, PathBuf};
use pie_graph::{DAG, Node};
use crate::dependency::{Dependency, FileDependency, TaskDependency};
use crate::Task;
/// Stores files and tasks, and their dependencies, in a DAG (directed acyclic graph). Provides operations to mutate
/// and query this graph.
pub struct Store<T, O> {
graph: DAG<NodeData<T, O>, Dependency<T, O>>,
}
#[derive(Debug)]
enum NodeData<T, O> {
File(PathBuf),
Task {
task: T,
output: Option<O>,
},
}
impl<T: Task> Default for Store<T, T::Output> {
fn default() -> Self {
Self {
graph: DAG::default(),
}
}
}
The Store
is generic over tasks T
and their outputs O
, like we have done before with Dependency
.
The DAG
type from pie_graph
represents a DAG with nodes and edges, and data attached to those nodes and edges.
The nodes in our graph are either files or tasks, and the edges are dependencies.
The first generic argument to DAG
is the type of data to attach to nodes, which is NodeData<T, O>
in our case.
Because nodes can be files or tasks, NodeData<T, O>
enumerates these, storing the path for files, and the task along with its output for tasks.
We store file paths as PathBuf
, which is the owned version of Path
(similar to String
being the owned version of str
).
The task output is stored as Option<O>
because we can add a task to the graph without having executed it, so we don’t have its output yet.
The second argument is the type of data to attach to edges, which is Dependency<T, O>
, using the Dependency
enum we defined earlier.
We implement Default
for the store to initialize it.
Why not Derive Default?
We cannot derive this Default
implementation even though it seems we should be able to, because the derived implementation will require T
and O
to be Default
, and this is not always the case.
This is because the Default
derive macro is conservative and adds a : Default
bound to every generic argument in the Default
trait implementation, and there is no way to disable this behaviour.
Therefore, we implement Default
ourselves.
There are several crates that have more configurable derive macros for these things, but adding an extra dependency to generate a few lines of code is not worth the extra compilation time, so we just implement it manually here.
Graph nodes
A node in DAG
is represented by a Node
, which is a transparent identifier (sometimes called a handle) that points to the node and its data.
We can create nodes in the graph, and then query attached data (NodeData
) given a node.
So DAG
allows us to go from Node
to a PathBuf
and task T
through attached NodeData
.
However, we want each unique file and task to be represented by a single unique node in the graph. We need this for incrementality so that if the build system encounters the same task twice, we can find the corresponding task node in the graph the second time, check if it is consistent, and return its output if it is.
To ensure unique nodes, we need to maintain the reverse mapping from PathBuf
and T
to Node
ourselves, which we will do with HashMap
s.
This is also the reason for the Eq
and Hash
trait bounds on the Task
trait, so we can use them as keys in HashMap
s.
Change pie/src/store.rs
to add hash maps to map between these things:
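A sketch of the changed struct, with the FileNode and TaskNode types defined next, and the Default implementation extended accordingly:
use std::collections::HashMap;
pub struct Store<T, O> {
  graph: DAG<NodeData<T, O>, Dependency<T, O>>,
  file_to_node: HashMap<PathBuf, FileNode>, // reverse mapping: file path -> node
  task_to_node: HashMap<T, TaskNode>,       // reverse mapping: task -> node
}
impl<T: Task> Default for Store<T, T::Output> {
  fn default() -> Self {
    Self {
      graph: DAG::default(),
      file_to_node: HashMap::default(),
      task_to_node: HashMap::default(),
    }
  }
}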
To prevent accidentally using a file node as a task node, and vice versa, change pie/src/store.rs
to add specific types of nodes:
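A sketch of these newtypes with their Borrow implementations:
use std::borrow::Borrow;
/// Node in the dependency graph that is known to represent a file.
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub struct FileNode(Node);
impl Borrow<Node> for FileNode {
  fn borrow(&self) -> &Node { &self.0 }
}
/// Node in the dependency graph that is known to represent a task.
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub struct TaskNode(Node);
impl Borrow<Node> for TaskNode {
  fn borrow(&self) -> &Node { &self.0 }
}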
The FileNode
and TaskNode
types are newtypes that wrap a Node
into a specific type of node.
The Borrow
implementations will make subsequent code a bit more concise by automatically converting &FileNode and &TaskNode into &Node.
Do these Newtypes Improve Type-Safety?
Because the Node
s inside the newtypes are not public, it is not possible to construct a FileNode
or TaskNode
outside of this module.
Therefore, if we only accept and create FileNode
and TaskNode
in the Store
API, it is not possible to use the wrong kind of node, increasing type-safety.
The Borrow
implementation does leak outside of this module, but not outside of this crate (library).
This is because the visibility of a trait implementation is the intersection of the visibilities of the trait and type it is implemented on.
Borrow
is public, but FileNode
and TaskNode
are only public within this crate.
Therefore, modules of this crate can extract the Node
out of FileNode
and TaskNode
.
However, that Node
cannot be used to construct a FileNode
or TaskNode
, so it is not a problem.
Now we will add methods to create nodes and to query their attached data.
Add the following code to pie/src/store.rs
:
impl<T: Task> Store<T, T::Output> {
/// Gets the file node for `path`, or creates a file node by adding it to the dependency graph.
pub fn get_or_create_file_node(&mut self, path: impl AsRef<Path>) -> FileNode {
let path = path.as_ref();
if let Some(file_node) = self.file_to_node.get(path) {
*file_node
} else {
let node = self.graph.add_node(NodeData::File(path.to_path_buf()));
let node = FileNode(node);
self.file_to_node.insert(path.to_path_buf(), node);
node
}
}
/// Gets the path for `node`.
///
/// # Panics
///
/// Panics if `node` was not found in the dependency graph.
#[allow(dead_code)]
pub fn get_file_path(&self, node: &FileNode) -> &PathBuf {
let Some(NodeData::File(path)) = self.graph.get_node_data(node) else {
panic!("BUG: node {:?} was not found in the dependency graph", node);
};
path
}
/// Gets the task node for `task`, or creates a task node by adding it to the dependency graph.
pub fn get_or_create_task_node(&mut self, task: &T) -> TaskNode {
if let Some(node) = self.task_to_node.get(task) {
*node
} else {
let node = self.graph.add_node(NodeData::Task {
task: task.clone(),
output: None,
});
let node = TaskNode(node);
self.task_to_node.insert(task.clone(), node);
node
}
}
/// Gets the task for `node`.
///
/// # Panics
///
/// Panics if `node` was not found in the dependency graph.
pub fn get_task(&self, node: &TaskNode) -> &T {
let Some(NodeData::Task { task, .. }) = self.graph.get_node_data(node) else {
panic!("BUG: node {:?} was not found in the dependency graph", node);
};
task
}
}
The get_or_create_file_node
method creates file nodes.
When we want to go from a file path (using impl AsRef<Path>
) to a FileNode
, either we have already added this file path to the graph and want to get the FileNode
for it, or we have not yet added it to the graph yet and should add it.
The former is handled by the if branch in get_or_create_file_node
, where we just retrieve the FileNode
from the file_to_node
hash map.
The latter is handled by the else branch where we add the node to the graph with graph.add_node
which attaches the NodeData::File
data to the node, and then returns a FileNode
which we insert into the file_to_node
map.
The get_file_path
method does the inverse.
We get the attached data given a node, and extract the file path from it.
Note that we are using panic!
here to indicate that invalid usage of this method is an unrecoverable programming error that should not occur.
Returning an Option<&PathBuf>
makes no sense here, as the caller of this method has no way to recover from this.
Because this is not an end-user-facing API (store
module is private), we control all the calls to this method, and thus we are responsible for using these methods in a valid way.
Therefore, when we call these methods, we should document why it is valid (if this is not immediately obvious), and we need to test whether we really use it in a valid way.
We’re also documenting the panics in a # Panics
section in the documentation comment, as is common practice in Rust.
How to Trigger these Panics?
Because only Store
can create FileNode
s and TaskNode
s, and all methods only take these values as inputs, these panics will not happen under normal usage.
The only way to trigger these panics (in safe Rust) would be to create two stores, and use the nodes from one store in another.
However, since this is a private module, we just need to make sure that we don’t do that.
There are some tricks to prevent even this kind of invalid usage. For example, the generativity crate generates unique identifiers based on lifetimes. However, that is a bit overkill, especially for an internal API, so we won’t be using that.
We implement similar methods for task nodes in get_or_create_task_node
and get_task
.
Task outputs
When we do not need to execute a task because it is consistent, we still need to return its output.
Therefore, we store the task output in NodeData::Task
and add methods to query and manipulate task outputs.
Add the following code to pie/src/store.rs
:
impl<T: Task> Store<T, T::Output> {
/// Checks whether task `node` has an output. Returns `false` if `node` does not have an output.
///
/// # Panics
///
/// Panics if `node` was not found in the dependency graph.
pub fn task_has_output(&self, node: &TaskNode) -> bool {
let Some(NodeData::Task { output, .. }) = self.graph.get_node_data(node) else {
panic!("BUG: node {:?} was not found in the dependency graph", node);
};
output.is_some()
}
/// Gets the output for task `node`.
///
/// # Panics
///
/// Panics if `node` was not found in the dependency graph, or if the task has no output.
pub fn get_task_output(&self, node: &TaskNode) -> &T::Output {
let Some(NodeData::Task { output: Some(output), .. }) = self.graph.get_node_data(node) else {
panic!("BUG: node {:?} was not found in the dependency graph, or does not have an output", node);
};
output
}
/// Sets the output for task `node` to `new_output`.
///
/// # Panics
///
/// Panics if task `node` was not found in the dependency graph.
pub fn set_task_output(&mut self, node: &TaskNode, new_output: T::Output) {
let Some(NodeData::Task { output, .. }) = self.graph.get_node_data_mut(node) else {
panic!("BUG: node {:?} was not found in the dependency graph", node);
};
output.replace(new_output);
}
}
The task_has_output
, get_task_output
, and set_task_output
methods manipulate task outputs in NodeData::Task
.
Again, we are using panics here to indicate unrecoverable programming errors.
Dependencies
Now we need methods to query and manipulate dependencies. The edges in the graph are dependencies between tasks and files. Tasks can depend on other tasks and files, but there are no dependencies between files. An edge does not have its own dedicated representation, and is simply represented by two nodes: the source node and the destination node of the edge.
Add the following code to pie/src/store.rs
:
impl<T: Task> Store<T, T::Output> {
/// Get all dependencies of task `src`.
///
/// # Panics
///
/// Panics in development builds if `src` was not found in the dependency graph.
pub fn get_dependencies_of_task<'a>(&'a self, src: &'a TaskNode) -> impl Iterator<Item=&'a Dependency<T, T::Output>> + 'a {
debug_assert!(self.graph.contains_node(src), "BUG: node {:?} was not found in the dependency graph", src);
self.graph.get_outgoing_edge_data(src)
}
/// Add a file require `dependency` from task `src` to file `dst`.
///
/// # Panics
///
/// Panics if `src` or `dst` were not found in the dependency graph, or if a cycle is created by adding this dependency.
pub fn add_file_require_dependency(&mut self, src: &TaskNode, dst: &FileNode, dependency: FileDependency) {
match self.graph.add_edge(src, dst, Dependency::RequireFile(dependency)) {
Err(pie_graph::Error::NodeMissing) => panic!("BUG: source node {:?} or destination node {:?} was not found in the dependency graph", src, dst),
Err(pie_graph::Error::CycleDetected) => panic!("BUG: cycle detected when adding file dependency from {:?} to {:?}", src, dst),
_ => {},
}
}
/// Adds a task require `dependency` from task `src` to task `dst`.
///
/// # Errors
///
/// Returns `Err(())` if adding this dependency to the graph creates a cycle.
///
/// # Panics
///
/// Panics if `src` or `dst` were not found in the dependency graph.
pub fn add_task_require_dependency(&mut self, src: &TaskNode, dst: &TaskNode, dependency: TaskDependency<T, T::Output>) -> Result<(), ()> {
match self.graph.add_edge(src, dst, Dependency::RequireTask(dependency)) {
Err(pie_graph::Error::NodeMissing) => panic!("BUG: source node {:?} or destination node {:?} was not found in the dependency graph", src, dst),
Err(pie_graph::Error::CycleDetected) => Err(()),
_ => Ok(()),
}
}
}
The get_dependencies_of_task
method gets the dependencies (edge data of outgoing edges) of a task, and returns them as an iterator (which is empty if the task has no dependencies).
This method needs explicit lifetime annotations due to the signature of get_outgoing_edge_data
and the way we return an iterator using impl Iterator<...
.
We’re using debug_assert!
here to trigger a panic indicating an unrecoverable programming error only in development mode, because this check is too expensive to run in release (optimized) mode.
The add_file_require_dependency
method adds a file dependency.
Adding an edge to the graph can result in cycles, which are not allowed in a directed acyclic graph (DAG).
Therefore, graph.add_edge
can return an Err
indicating that there is a cycle.
In case of files, this cannot happen because files do not have outgoing dependencies, and the API enforces this by never taking a FileNode
as a source (src
) of an edge.
Tasks can depend on other tasks, so they can create cycles.
In add_task_require_dependency
, we propagate the cycle detected error (by returning Err(())
) to the caller because the caller has more information to create an error message for the user that made a cyclic task dependency.
Resetting tasks
Finally, when we determine that a task is inconsistent and needs to be executed, we first need to remove its output and remove its outgoing dependencies, as those will interfere with incrementality when not removed.
We do NOT want to remove incoming dependencies, as that would remove dependencies from other tasks to this task, which breaks incrementality, so we can’t just remove and re-add the task to the graph.
Add the reset_task
method that does this to pie/src/store.rs
:
impl<T: Task> Store<T, T::Output> {
/// Reset task `src`, removing its output and removing all its outgoing dependencies.
///
/// # Panics
///
/// Panics if task `src` was not found in the dependency graph.
pub fn reset_task(&mut self, src: &TaskNode) {
if let Some(NodeData::Task { output, .. }) = self.graph.get_node_data_mut(src) {
*output = None;
} else {
panic!("BUG: node {:?} was not found in the dependency graph", src);
}
self.graph.remove_outgoing_edges_of_node(src);
}
}
This will reset the task output back to None
, and remove all outgoing edges (dependencies).
Tests
Now we’ve implemented everything we need for implementing the top-down context, but first we will write some tests.
Testing file mapping
Add the following code to pie/src/store.rs
for testing the file mapping:
#[cfg(test)]
mod test {
use crate::Context;
use crate::stamp::{FileStamper, OutputStamper};
use super::*;
/// Task that returns its owned string. Never executed, just used for testing the store.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct StringConstant(String);
impl StringConstant {
pub fn new(string: impl Into<String>) -> Self { Self(string.into()) }
}
impl Task for StringConstant {
type Output = String;
fn execute<C: Context<Self>>(&self, _context: &mut C) -> Self::Output {
self.0.clone()
}
}
#[test]
fn test_file_mapping() {
let mut store: Store<StringConstant, String> = Store::default();
let path_a = PathBuf::from("hello.txt");
let node_a = store.get_or_create_file_node(&path_a);
assert_eq!(node_a, store.get_or_create_file_node(&path_a)); // Same node
assert_eq!(&path_a, store.get_file_path(&node_a)); // Same file path
let path_b = PathBuf::from("world.txt");
let node_b = store.get_or_create_file_node(&path_b);
assert_eq!(node_b, store.get_or_create_file_node(&path_b));
assert_eq!(&path_b, store.get_file_path(&node_b));
assert_ne!(node_a, node_b); // Different nodes
}
#[test]
#[should_panic]
fn test_file_mapping_panics() {
let mut fake_store: Store<StringConstant, String> = Store::default();
let fake_node = fake_store.get_or_create_file_node("hello.txt");
let store: Store<StringConstant, String> = Store::default();
store.get_file_path(&fake_node);
}
}
We create a simple task StringConstant
because we need a Task
implementation to test Store
, as Store
is generic over a Task
type.
We will never execute it because Store
does not execute tasks.
Test test_file_mapping
checks whether the file node mapping works as expected:
- get_or_create_file_node calls with the same path should produce the same FileNode.
- get_or_create_file_node calls with different paths should produce different FileNodes.
This works because "hello.txt"
and "world.txt"
are different paths, thus their Eq
and Hash
implementations ensure they get separate spots in the file_to_node
hash map.
Test test_file_mapping_panics
triggers the panic in get_file_path
by creating a FileNode
with a “fake store”, and then using that rogue file node in another store.
While it is unlikely that we will make this mistake when using Store
, it is good to confirm that this panics.
Rust Help: Testing Panics
The #[should_panic]
attribute makes the test succeed if it panics, and fail if it does not panic.
Testing task mapping
Test the task mapping by inserting the following code into the test
module (before the last }
):
#[test]
fn test_task_mapping() {
let mut store = Store::default();
let task_a = StringConstant::new("Hello");
let node_a = store.get_or_create_task_node(&task_a);
assert_eq!(node_a, store.get_or_create_task_node(&task_a)); // Same node
assert_eq!(&task_a, store.get_task(&node_a)); // Same task
let task_b = StringConstant::new("World");
let node_b = store.get_or_create_task_node(&task_b);
assert_eq!(node_b, store.get_or_create_task_node(&task_b));
assert_eq!(&task_b, store.get_task(&node_b));
assert_ne!(node_a, node_b); // Different nodes
}
#[test]
#[should_panic]
fn test_task_mapping_panics() {
let mut fake_store = Store::default();
let fake_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let store: Store<StringConstant, String> = Store::default();
store.get_task(&fake_node);
}
We test this in the same way as the file mapping.
Again, this works because StringConstant("Hello")
and StringConstant("World")
are different due to their derived Eq
and Hash
implementations, which make them different due to the strings being different.
Likewise, StringConstant::new("Hello")
and StringConstant::new("Hello")
are equal even if they are created with 2 separate invocations of new
.
These (in)equalities might seem quite obvious, but it is important to keep in mind because incrementality can only work if we can identify equal tasks at a later time, so that we can check their dependencies and return their cached output when those dependencies are consistent. Later on we will also see that this is important for soundness of the incremental build system.
Testing task outputs
Test task outputs by inserting the following code into the test
module:
#[test]
fn test_task_outputs() {
let mut store = Store::default();
let output_a = "Hello".to_string();
let task_a = StringConstant::new(&output_a);
let node_a = store.get_or_create_task_node(&task_a);
let output_b = "World".to_string();
let task_b = StringConstant::new(&output_b);
let node_b = store.get_or_create_task_node(&task_b);
// Assert that tasks have no output by default.
assert!(!store.task_has_output(&node_a));
assert!(!store.task_has_output(&node_b));
// Set output for task A, assert that A has that output but B is unchanged.
store.set_task_output(&node_a, output_a.clone());
assert!(store.task_has_output(&node_a));
assert_eq!(store.get_task_output(&node_a), &output_a);
assert!(!store.task_has_output(&node_b));
// Set output for task B, assert that B has that output but A is unchanged.
store.set_task_output(&node_b, output_b.clone());
assert!(store.task_has_output(&node_a));
assert_eq!(store.get_task_output(&node_a), &output_a);
assert!(store.task_has_output(&node_b));
assert_eq!(store.get_task_output(&node_b), &output_b);
}
#[test]
#[should_panic]
fn test_task_has_output_panics() {
let mut fake_store = Store::default();
let fake_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let store: Store<StringConstant, String> = Store::default();
store.task_has_output(&fake_node);
}
#[test]
#[should_panic]
fn test_get_task_output_panics() {
let mut store = Store::default();
let node = store.get_or_create_task_node(&StringConstant::new("Hello"));
store.get_task_output(&node);
}
#[test]
#[should_panic]
fn test_set_task_output_panics() {
let mut fake_store = Store::default();
let fake_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let mut store: Store<StringConstant, String> = Store::default();
store.set_task_output(&fake_node, "Hello".to_string());
}
Test test_task_outputs
ensures that:
- task_has_output only returns true if the given task has an output,
- and that get_task_output returns the output set by set_task_output for the given task.
Test test_get_task_output_panics
triggers a panic when we call get_task_output
for a task that has no output, which is an invalid usage of Store
that is more likely to happen than the other panics.
Testing dependencies
Test dependencies by inserting the following code into the test
module:
#[test]
fn test_dependencies() {
let mut store = Store::default();
let output_a = "Hello".to_string();
let task_a = StringConstant::new(output_a.clone());
let node_a = store.get_or_create_task_node(&task_a);
let output_b = "World".to_string();
let task_b = StringConstant::new(output_b.clone());
let node_b = store.get_or_create_task_node(&task_b);
let path_c = PathBuf::from("hello.txt");
let node_c = store.get_or_create_file_node(&path_c);
assert_eq!(store.get_dependencies_of_task(&node_a).next(), None);
assert_eq!(store.get_dependencies_of_task(&node_b).next(), None);
// Add file dependency from task A to file C.
let file_dependency_a2c = FileDependency::new(&path_c, FileStamper::Exists).unwrap();
store.add_file_require_dependency(&node_a, &node_c, file_dependency_a2c.clone());
let deps_of_a: Vec<_> = store.get_dependencies_of_task(&node_a).cloned().collect();
assert_eq!(deps_of_a.get(0), Some(&Dependency::RequireFile(file_dependency_a2c.clone())));
assert_eq!(deps_of_a.get(1), None);
assert_eq!(store.get_dependencies_of_task(&node_b).next(), None);
// Add task dependency from task B to task A.
let task_dependency_b2a = TaskDependency::new(task_a.clone(), OutputStamper::Equals, output_a.clone());
let result = store.add_task_require_dependency(&node_b, &node_a, task_dependency_b2a.clone());
assert_eq!(result, Ok(()));
let deps_of_a: Vec<_> = store.get_dependencies_of_task(&node_a).cloned().collect();
assert_eq!(deps_of_a.get(0), Some(&Dependency::RequireFile(file_dependency_a2c.clone())));
assert_eq!(deps_of_a.get(1), None);
let deps_of_b: Vec<_> = store.get_dependencies_of_task(&node_b).cloned().collect();
assert_eq!(deps_of_b.get(0), Some(&Dependency::RequireTask(task_dependency_b2a.clone())));
assert_eq!(deps_of_b.get(1), None);
// Add file dependency from task B to file C.
let file_dependency_b2c = FileDependency::new(&path_c, FileStamper::Exists).unwrap();
store.add_file_require_dependency(&node_b, &node_c, file_dependency_b2c.clone());
let deps_of_a: Vec<_> = store.get_dependencies_of_task(&node_a).cloned().collect();
assert_eq!(deps_of_a.get(0), Some(&Dependency::RequireFile(file_dependency_a2c.clone())));
assert_eq!(deps_of_a.get(1), None);
let deps_of_b: Vec<_> = store.get_dependencies_of_task(&node_b).cloned().collect();
assert_eq!(deps_of_b.get(0), Some(&Dependency::RequireTask(task_dependency_b2a.clone())));
assert_eq!(deps_of_b.get(1), Some(&Dependency::RequireFile(file_dependency_b2c.clone())));
assert_eq!(deps_of_b.get(2), None);
// Add task dependency from task A to task B, creating a cycle.
let task_dependency_a2b = TaskDependency::new(task_a.clone(), OutputStamper::Equals, output_a.clone());
let result = store.add_task_require_dependency(&node_a, &node_b, task_dependency_a2b);
assert_eq!(result, Err(())); // Creates a cycle: error
}
#[test]
#[should_panic]
fn test_get_dependencies_of_task_panics() {
let mut fake_store = Store::default();
let fake_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let store: Store<StringConstant, String> = Store::default();
let _ = store.get_dependencies_of_task(&fake_node);
}
#[test]
#[should_panic]
fn test_add_file_require_dependency_panics() {
let mut fake_store = Store::default();
let fake_file_node = fake_store.get_or_create_file_node("hello.txt");
let fake_task_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let mut store: Store<StringConstant, String> = Store::default();
let dependency = FileDependency::new("hello.txt", FileStamper::Exists).unwrap();
store.add_file_require_dependency(&fake_task_node, &fake_file_node, dependency);
}
#[test]
#[should_panic]
fn test_add_task_require_dependency_panics() {
let mut fake_store = Store::default();
let output = "Hello".to_string();
let task = StringConstant::new(&output);
let fake_task_node = fake_store.get_or_create_task_node(&task);
let mut store: Store<StringConstant, String> = Store::default();
let dependency = TaskDependency::new(task, OutputStamper::Equals, output);
let _ = store.add_task_require_dependency(&fake_task_node, &fake_task_node, dependency);
}
The test_dependencies
test is a bit more involved because it ensures that:
- get_dependencies_of_task returns the dependencies of the given task. If the task has no dependencies, the iterator is empty. We test whether an iterator is empty by getting its first element with .next() and asserting that it is None.
- get_dependencies_of_task returns the dependencies of the given task in the order in which they were added, which will be important for soundness later. The graph library returns dependencies in insertion order.
- add_task_require_dependency adds a dependency to the correct task.
- creating a cycle with add_task_require_dependency results in it returning Err(()).
Note that the StringConstant
task does not actually create file or task dependencies, but since Store
never executes a task, we can pretend that it does in tests.
Testing task reset
Finally, test task reset by inserting the following code into the test
module:
#[test]
fn test_reset() {
let mut store = Store::default();
let output_a = "Hello".to_string();
let task_a = StringConstant::new(output_a.clone());
let task_a_node = store.get_or_create_task_node(&task_a);
let output_b = "World".to_string();
let task_b = StringConstant::new(output_b.clone());
let task_b_node = store.get_or_create_task_node(&task_b);
let path = PathBuf::from("hello.txt");
let file_node = store.get_or_create_file_node(&path);
// Set outputs for task A and B.
store.set_task_output(&task_a_node, output_a.clone());
assert!(store.task_has_output(&task_a_node));
assert_eq!(store.get_task_output(&task_a_node), &output_a);
store.set_task_output(&task_b_node, output_b.clone());
assert!(store.task_has_output(&task_b_node));
assert_eq!(store.get_task_output(&task_b_node), &output_b);
// Add file dependency for task A and B.
let file_dependency = FileDependency::new(&path, FileStamper::Exists).unwrap();
store.add_file_require_dependency(&task_a_node, &file_node, file_dependency.clone());
let deps_of_a: Vec<_> = store.get_dependencies_of_task(&task_a_node).cloned().collect();
assert_eq!(deps_of_a.get(0), Some(&Dependency::RequireFile(file_dependency.clone())));
assert_eq!(deps_of_a.get(1), None);
store.add_file_require_dependency(&task_b_node, &file_node, file_dependency.clone());
let deps_of_b: Vec<_> = store.get_dependencies_of_task(&task_b_node).cloned().collect();
assert_eq!(deps_of_b.get(0), Some(&Dependency::RequireFile(file_dependency.clone())));
assert_eq!(deps_of_b.get(1), None);
// Reset only task A.
store.reset_task(&task_a_node);
// Assert that task A is reset.
assert!(!store.task_has_output(&task_a_node));
assert_eq!(store.get_dependencies_of_task(&task_a_node).next(), None);
// Assert that task B is unchanged.
assert!(store.task_has_output(&task_b_node));
assert_eq!(store.get_task_output(&task_b_node), &output_b);
let deps_of_b: Vec<_> = store.get_dependencies_of_task(&task_b_node).cloned().collect();
assert_eq!(deps_of_b.get(0), Some(&Dependency::RequireFile(file_dependency.clone())));
assert_eq!(deps_of_b.get(1), None);
}
#[test]
#[should_panic]
fn test_reset_task_panics() {
let mut fake_store = Store::default();
let fake_node = fake_store.get_or_create_task_node(&StringConstant::new("Hello"));
let mut store: Store<StringConstant, String> = Store::default();
store.reset_task(&fake_node);
}
Here, we ensure that a task with an output and dependencies no longer has an output and dependencies after a reset, while another task is left untouched.
Confirm that the store implementation works with cargo test
.
Download source code
Top-down Context
We’ve implemented all the prerequisites for incremental top-down building.
Now we will create the TopDownContext
type which implements the Context
trait in an incremental way.
Top-down context basics
Add the top_down
module to pie/src/context/mod.rs
:
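A single line again; a sketch:
pub mod top_down;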
Create the pie/src/context/top_down.rs
file and add the following to get started:
use std::fs::File;
use std::io;
use std::path::Path;
use crate::{Context, fs, Task};
use crate::dependency::{FileDependency, TaskDependency};
use crate::stamp::{FileStamper, OutputStamper};
use crate::store::{Store, TaskNode};
pub struct TopDownContext<T, O> {
store: Store<T, O>,
}
impl<T: Task> TopDownContext<T, T::Output> {
pub fn new() -> Self {
Self {
store: Store::default(),
}
}
}
impl<T: Task> Context<T> for TopDownContext<T, T::Output> {
fn require_file_with_stamper<P: AsRef<Path>>(&mut self, path: P, stamper: FileStamper) -> Result<Option<File>, io::Error> {
todo!()
}
fn require_task_with_stamper(&mut self, task: &T, stamper: OutputStamper) -> T::Output {
todo!()
}
}
The TopDownContext
type is generic over tasks T
and their outputs O
, owns a Store
, and can be created using new
.
TopDownContext
implements Context
, and the main challenge will be implementing the require_file_with_stamper
and require_task_with_stamper
methods incrementally and correctly.
Requiring files
Tasks such as ReadStringFromFile
which we’ve used in tests before call context.require_file
to declare that they depend on a file in the filesystem.
For incrementality, we need to add this dependency to the dependency graph.
This dependency will go from the current executing task to the file.
Therefore, we will need to keep track of the current executing task.
Change pie/src/context/top_down.rs
to add a field for tracking the current executing task, and use it in require_file_with_stamper
:
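A sketch of the struct change; the new field is also initialized in new:
pub struct TopDownContext<T, O> {
  store: Store<T, O>,
  // The task that is currently executing, if any; used to attribute new dependencies.
  current_executing_task: Option<TaskNode>,
}
impl<T: Task> TopDownContext<T, T::Output> {
  pub fn new() -> Self {
    Self {
      store: Store::default(),
      current_executing_task: None,
    }
  }
}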
We’re not setting current_executing_task
yet, as that is the responsibility of require_task_with_stamper
which we will implement later.
In require_file_with_stamper
we’re now getting the current executing task.
If there is no current executing task, which only happens if a user directly calls require_file
on a context, we don’t make a dependency and just open the file.
Now we need to add the file dependency. Change pie/src/context/top_down.rs
to do this:
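Putting both steps together, a sketch of the complete method:
fn require_file_with_stamper<P: AsRef<Path>>(&mut self, path: P, stamper: FileStamper) -> Result<Option<File>, io::Error> {
  let path = path.as_ref();
  let Some(current_executing_task_node) = &self.current_executing_task else {
    // No current executing task: no dependency to create, just open the file.
    return fs::open_if_file(path);
  };
  let node = self.store.get_or_create_file_node(path);
  let (dependency, file) = FileDependency::new_with_file(path, stamper)?;
  self.store.add_file_require_dependency(current_executing_task_node, &node, dependency);
  Ok(file)
}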
We simply create or get an existing file node, create a file dependency, and add the file require dependency to the graph via store
.
Errors are propagated to the caller, so they can react accordingly to filesystem operation failures.
Requiring tasks
To implement require_task_with_stamper
, we need to check whether we should execute a task.
A task should be executed either if it’s new (it does not have an output stored yet), or if at least one of its dependencies is inconsistent.
If we don’t execute it, then it must have an output value and all its dependencies are consistent, so we just return its output value.
Change pie/src/context/top_down.rs
to implement this logic:
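A sketch of this logic; should_execute_task is implemented at the end of this section:
fn require_task_with_stamper(&mut self, task: &T, stamper: OutputStamper) -> T::Output {
  let node = self.store.get_or_create_task_node(task);
  let output = if self.should_execute_task(&node) {
    self.store.reset_task(&node);
    // Set the current executing task, restoring the previous one after execution.
    let previous_executing_task = self.current_executing_task.replace(node);
    let output = task.execute(self);
    self.current_executing_task = previous_executing_task;
    self.store.set_task_output(&node, output.clone());
    output
  } else {
    // Consistent: should_execute_task guarantees that an output is stored.
    self.store.get_task_output(&node).clone()
  };
  // Creating the task dependency is added below.
  output
}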
We first create or get an existing task node.
Then, we check whether the task should be executed with should_execute_task
which we still need to implement.
If that returns true, we reset the task, set the current executing task, actually execute the task, restore the previous executing task, and set the task output.
Otherwise, we get the output of the task from the store, which cannot panic because should_execute_task
ensures that the task has an output if it returns false.
Finally, we return the output.
We still need to create a task dependency. Change pie/src/context/top_down.rs
to add the dependency (a sketch follows the discussion below).
If there is no current executing task, which occurs when a user requires the initial task, we skip creating a dependency. Otherwise, we create a dependency and add it to the store. However, creating a task dependency can create cycles, and we need to handle that error.
At this point, we need to make a hard decision about the API of our library.
require_task_with_stamper
returns the task output, with no opportunity to return an error.
If we want to propagate this error, we’d need to change the Context::require_task
API to return Result<T::Output, CycleError>
.
However, because tasks call these methods on Context
, we’d also need to change Task::execute
to return Result<T::Output, CycleError>
.
That would require all tasks to propagate these cycle errors every time they require another task.
Furthermore, some tasks want to return their own kinds of errors, where T::Output
will be Result<AnOutput, AnError>
.
In that case, the concrete return type would be Result<Result<AnOutput, AnError>, CycleError>
, which is annoying to deal with.
On the other hand, we can panic when a cycle is found, which requires no changes to the API. We do end up in a mostly unrecoverable state, so a panic is a valid option. However, this is not ideal, because it means the build system can panic due to invalid task dependencies created by the user of the system. Panics will (most of the time) stop the program, which can be annoying to deal with.
This is a hard trade-off to make. Either we propagate errors which will not end the program but will introduce a lot of boilerplate and annoyance in task implementations. Or we panic which will end the program but introduces no boilerplate.
In this tutorial, we will go with panics on cycles, because it results in a much simpler system.
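A sketch of the dependency creation in require_task_with_stamper, panicking on cycles:
// At the end of require_task_with_stamper, before returning `output`:
if let Some(current_executing_task_node) = &self.current_executing_task {
  let dependency = TaskDependency::new(task.clone(), stamper, output.clone());
  if self.store.add_task_require_dependency(current_executing_task_node, &node, dependency).is_err() {
    let current_executing_task = self.store.get_task(current_executing_task_node);
    panic!("Cyclic task dependency; requiring {:?} from {:?} creates a cycle", task, current_executing_task);
  }
}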
How to Recover from Panics?
Panics either abort the program (when panics are set to abort in Cargo.toml
), or unwind the call stack and then end the program.
When panics abort, there is nothing we can do about it.
A panic will immediately abort the program.
When panics unwind, the call stack is unwound, which still runs all destructors (
Drop
), and this unwinding can be caught.
We can catch unwinding panics with
catch_unwind
, which is a way to recover from panics.
This does require that the types used in the closure passed to catch_unwind
are unwind safe.
This is because panics exit a function early, which can mess up some invariants of your code.
For example, a call to set a task output can be skipped when a panic occurs, breaking a code invariant.
Therefore, types such as &mut T
are not unwind safe by default, because these invariants can break under panics.
Note that unwind safety is something different than the general safety guarantees provided by Rust: type-safe, memory-safe, thread-safe. An unwind unsafe type is still type-safe, memory-safe, and thread-safe.
Unwind safety can be more easily achieved by using owned types which run destructors when the function call ends, which work under normal circumstances, but also when unwinding panics.
In the context of the PIE build system, if we panic on unrecoverable errors, but want to allow catching these panics, we need to think about unwind safety. At any point we panic, we need to think about keeping the system in a valid state.
Another way to recover from panics is to run the panicking code on a different thread. If the code panics, it will only end that thread, effectively allowing panic recovery. However, this does require some form of thread-safety, because you are moving a computation to a different thread. Furthermore, some platforms do not have access to threads, such as WASM, where this approach would not work.
A final note is that care must be taken when unwinding panics across foreign function interfaces (FFI).
Checking tasks
The final piece to our puzzle is the should_execute_task
implementation.
Add the following code to pie/src/context/top_down.rs
:
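A minimal sketch of this implementation follows; the Store helpers get_dependencies_of_task and task_has_output are illustrative names:
fn should_execute_task(&mut self, node: &TaskNode) -> bool {
  // Clone the dependencies into a Vec up-front, so that we do not hold a borrow of
  // the store while `is_inconsistent` mutably borrows `self` for recursive checking.
  let dependencies: Vec<_> = self.store.get_dependencies_of_task(node).cloned().collect();
  for dependency in dependencies {
    // `Err` results are ignored for now; we handle dependency check errors later.
    if let Ok(Some(_)) = dependency.is_inconsistent(self) {
      return true; // A dependency is inconsistent: the task must be executed.
    }
  }
  // All dependencies are consistent: only execute if the task has no output (i.e., is new).
  !self.store.task_has_output(node)
}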
The premise of should_execute_task
is simple: go over the dependencies of a task until dependency.is_inconsistent
is true, at which point we return true.
If all dependencies are consistent, then return true only if the task has no output.
Otherwise, return false.
However, there are some complications due to borrowing.
Checking if a task dependency is inconsistent requires recursive checking: TaskDependency::is_inconsistent
requires a &mut Context
to call Context::require_task
, which in turn can require this method again.
To that end, we pass self
to is_inconsistent
, because self
is an instance of TopDownContext
which implements Context
.
In this method, self
is &mut self
, a mutable borrow.
Therefore, we cannot have any other borrows active while is_inconsistent
is being called, because that would violate one of the safety mechanisms of Rust where mutable borrows are exclusive.
Getting the task’s dependencies from the store requires a borrow, so we cannot hold onto that borrow.
We get around that here by cloning the dependencies and collecting them into a Vec
.
We also document this fact in a comment, to explain to readers (us in the future) why we do this cloning, and to prevent refactorings that would just run into the same borrowing issue again.
Cloning and collecting does have a performance overhead as we need to clone the dependencies and heap allocate a Vec
to store them.
For this tutorial, that is fine, but in a real-world application we should minimize cloning if possible and look into reducing heap allocations.
Rust Help: Reference Counting to Avoid Clones
Cloning a Dependency
results in heap allocations, because cloning FileDependency
clones a PathBuf
which is a heap allocated string (basically a Vec<u8>
), and cloning a TaskDependency
clones the Task
, which may require allocations as well.
One way to avoid heap allocations in both kinds of dependencies is to store the PathBuf
and Task
in a reference-counting pointer Rc
.
Then, there will only be one heap allocated PathBuf
and Task
, and cloning just increments the reference count.
The upside is that this approach is easy to implement and reduces allocations.
The downside is that clones require incrementing the reference count, which is a write operation that does have a tiny bit of overhead.
In many cases, this overhead is smaller than cloning data when the data is large enough or requires heap allocations.
In our case, it would probably be worth doing this, but benchmarking is required to confirm this.
Note that instead of always wrapping tasks in a Rc
, task authors could implement Task
on Rc<TheirTask>
instead.
Since Rc
implements Clone
, any time we call task.clone()
, we would just increase the reference count instead.
When working in a multi-threaded situation, you would use the thread-safe
Arc
instead.
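For example, a small sketch of cheap Rc clones:
use std::path::PathBuf;
use std::rc::Rc;
fn main() {
  let path: Rc<PathBuf> = Rc::new(PathBuf::from("input.txt"));
  let clone = path.clone(); // No new heap allocation: only a reference count increment.
  assert!(Rc::ptr_eq(&path, &clone)); // Both point to the same allocation.
}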
How to Avoid Heap Allocations from String?
A technique for reducing allocations on strings (and string-like types such as PathBuf
) is to apply small string optimization, where small strings are stored inline instead of requiring a heap allocation.
This only works if the strings are usually small enough to fit inline on the stack (for example, 32 bytes).
Another technique for strings is string interning, where equal strings are stored in a central place and then re-used everywhere. This technique is great when we use the same string a lot of times. That may be a good strategy for a build system, where we work with the same file paths over and over.
There are several crates implementing these techniques, but I have not used one myself yet, so I cannot recommend one.
How to Avoid Heap Allocations from Collecting into Vec?
Collecting the elements of an iterator into a Vec
requires heap allocations as Vec
is allocated on the heap.
We can avoid or at least reduce the number of heap allocations by re-using the same Vec
instead of creating a new one.
Instead of collecting, you would store the Vec
in the struct, clear it, and then extend
it with the iterator.
When you clear
a Vec
, it removes all the elements, but keeps the heap allocated space.
Only if you would add more elements than it has space for, another heap allocation would be required, which will happen less and less frequently when you keep reusing the same Vec
.
The downside is that you are keeping this heap allocated space for as long as you keep reusing the same Vec
, which could waste some memory, but usually this is not a big problem.
You could of course call vec.shrink_to_fit()
after not using it for a while to free up this space.
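A small example of this reuse pattern:
fn main() {
  let mut numbers: Vec<u32> = Vec::new();
  for round in 1..=3 {
    numbers.clear(); // Drops the elements, but keeps the heap-allocated capacity.
    numbers.extend(0..round * 10); // Refills the Vec, reallocating only if capacity is exceeded.
    println!("round {round}: length {}, capacity {}", numbers.len(), numbers.capacity());
  }
}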
However, we cannot apply this technique here, because if we store the Vec
in TopDownContext
, we would run into the same borrowing problem again.
This technique also requires that you have mutable access to the Vec
in order to mutate it.
Both of these limitations can be overcome by using a
Cell
.
Cell
allows mutation to its inner value in an immutable context.
The catch is that you cannot get a reference to its inner value, you can only take
the value out, mutate it, and then set
it back.
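A small example of this take-mutate-set pattern:
use std::cell::Cell;
fn main() {
  let cell: Cell<Vec<i32>> = Cell::new(Vec::new()); // Note: not declared `mut`.
  let mut vec = cell.take(); // Moves the Vec out, leaving an empty default Vec behind.
  vec.push(1); // Mutate the value while it is out of the cell.
  cell.set(vec); // Move the mutated Vec back into the cell.
  assert_eq!(cell.take(), vec![1]);
}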
Unfortunately, even this technique cannot be fully applied to should_execute_task
, because it is called recursively and therefore the Cell
will be empty when we try to take
the Vec
out.
If we want to avoid heap allocations from collecting new Vec
s in should_execute_task
, we would need to come up with a creative solution.
But this is outside of the scope of even this extra information block, so we’ll just leave it at that.
Finally, we need to do something with dependency checking failures.
We’ve ignored the case where dependency.is_inconsistent
returns Err
.
When dependency checking results in an error, we should store the error for the user to investigate, and assume the dependency is inconsistent.
Change pie/src/context/top_down.rs
to store dependency check errors in a dependency_check_errors field, and give users access to them through a get_dependency_check_errors method.
Then change should_execute_task
to store these errors:
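As a fragment sketch, the dependency check loop in should_execute_task then becomes:
for dependency in dependencies {
  match dependency.is_inconsistent(self) {
    Ok(Some(_)) => return true, // Inconsistent: the task must be executed.
    Err(e) => {
      // Store the error for the user to inspect, and assume the dependency is inconsistent:
      // better to execute the task too often than not often enough.
      self.dependency_check_errors.push(e);
      return true;
    }
    Ok(None) => {} // Consistent: check the next dependency.
  }
}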
It took us a while, but now we’ve implemented an incremental build system with dynamic dependencies 🎉. Let’s set up a simple example to see the fruits of our labour.
Download source code
Incrementality Example
In this example, we will run our build system and show off simple incrementality with a task that reads a string from a file.
ReadStringFromFile
task
Create the pie/examples
directory, and create the pie/examples/incremental.rs
file with the following contents:
#![allow(unused_imports, unused_variables)]
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use dev_shared::{create_temp_dir, write_until_modified};
use pie::{Context, Task};
use pie::context::top_down::TopDownContext;
use pie::stamp::FileStamper;
/// Task that reads a string from a file.
#[derive(Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug)]
struct ReadStringFromFile(PathBuf, FileStamper);
impl ReadStringFromFile {
fn new(path: impl AsRef<Path>, stamper: FileStamper) -> Self {
Self(path.as_ref().to_path_buf(), stamper)
}
}
impl Task for ReadStringFromFile {
type Output = Result<String, io::ErrorKind>;
fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output {
println!("Reading from {} with {:?} stamper", self.0.file_name().unwrap().to_string_lossy(), self.1);
let file = context.require_file_with_stamper(&self.0, self.1).map_err(|e| e.kind())?;
if let Some(mut file) = file {
let mut string = String::new();
file.read_to_string(&mut string).map_err(|e| e.kind())?;
Ok(string)
} else {
Err(io::ErrorKind::NotFound)
}
}
}
The ReadStringFromFile
task is similar to the one we defined earlier in a test, but this one accepts a FileStamper
as input, and propagates errors by returning a Result
.
We cannot use std::io::Error
as the error in the Result
, because it does not implement Clone
nor Eq
, which need to be implemented for task outputs.
Therefore, we use std::io::ErrorKind
which does implement these traits.
Exploring incrementality
We’ve implemented the task; now add a main
function to pie/examples/incremental.rs
:
fn main() -> Result<(), io::Error> {
let temp_dir = create_temp_dir()?;
let input_file = temp_dir.path().join("input.txt");
write_until_modified(&input_file, "Hi")?;
let mut context = TopDownContext::new();
let read_task = ReadStringFromFile::new(&input_file, FileStamper::Modified);
println!("A) New task: expect `read_task` to execute");
// `read_task` is new, meaning that we have no cached output for it, thus it must be executed.
let output = context.require_task(&read_task)?;
assert_eq!(&output, "Hi");
Ok(())
}
We create a temporary file, create a task, create a context, and require our first task.
Run this example with cargo run --example incremental
.
You should see the println!
in ReadStringFromFile
appear in your console as the incremental context correctly determines that this task is new (i.e., has no output) and must be executed.
It should look something like:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.60s
Running `target/debug/examples/incremental`
A) New task: expect `read_task` to execute
Reading from input.txt with Modified stamper
Reuse
If we require the task again, what should happen?
Insert the following code into the main
method:
println!("\nB) Reuse: expect no execution");
// `read_task` is not new and its file dependency is still consistent. It is consistent because the modified time of
// `input_file` has not changed, thus the modified stamp is equal.
let output = context.require_task(&read_task)?;
assert_eq!(&output, "Hi");
Running with cargo run --example incremental
should produce output like:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.34s
Running `target/debug/examples/incremental`
A) New task: expect `read_task` to execute
Reading from input.txt with Modified stamper
B) Reuse: expect no execution
We don’t see the println!
from ReadStringFromFile
, so it was not executed, so our incremental build system has correctly reused its output!
Normally we would write a test to confirm that the task was executed the first time, and that it was not executed the second time.
However, this is not trivial.
How do we know if the task was executed?
We could track it with a global mutable boolean that ReadStringFromFile
sets when it executes, but this quickly becomes a mess.
Therefore, we will look into creating a proper testing infrastructure in the next chapter.
For now, we will continue this example with a couple more interesting cases. The comments in the code explain in more detail why the build system behaves in this way.
Inconsistent file dependency
Insert into the main
method:
write_until_modified(&input_file, "Hello")?;
println!("\nC) Inconsistent file dependency: expect `read_task` to execute");
// The file dependency of `read_task` is inconsistent due to the changed modified time of `input_file`.
let output = context.require_task(&read_task)?;
assert_eq!(&output, "Hello");
If we change the file (using write_until_modified
to ensure that the modified time changes to trigger the Modified
file stamper) and require the task, it should execute, because the file dependency of the task is no longer consistent.
Different tasks
Insert into the main
method:
let input_file_b = temp_dir.path().join("input_b.txt");
write_until_modified(&input_file_b, "Test")?;
let read_task_b_modified = ReadStringFromFile::new(&input_file_b, FileStamper::Modified);
let read_task_b_exists = ReadStringFromFile::new(&input_file_b, FileStamper::Exists);
println!("\nD) Different tasks: expect `read_task_b_modified` and `read_task_b_exists` to execute");
// Task `read_task`, `read_task_b_modified` and `read_task_b_exists` are different, due to their `Eq` implementation
// determining that their paths and stampers are different. Therefore, `read_task_b_modified` and `read_task_b_exists`
// are new tasks, and must be executed.
let output = context.require_task(&read_task_b_modified)?;
assert_eq!(&output, "Test");
let output = context.require_task(&read_task_b_exists)?;
assert_eq!(&output, "Test");
The identity of tasks is determined by their Eq
and Hash
implementations, which are typically derived to compare and hash all their fields.
Therefore, if we create read tasks for a different input file input_file_b
and a different stamper FileStamper::Exists
, these read tasks are not equal to the existing read task, and thus are new tasks with a different identity.
We require read_task_b_modified
and read_task_b_exists
, they are new, and are therefore executed.
Same file different stampers
Insert into the main
method:
write_until_modified(&input_file_b, "Test Test")?;
println!("\nE) Different stampers: expect only `read_task_b_modified` to execute");
// Both `read_task_b_modified` and `read_task_b_exists` read from the same file, but they use different stampers.
// Therefore, `read_task_b_modified` must be executed because the modified time has changed, but `read_task_b_exists`
// will not be executed because its file dependency stamper only checks for existence of the file, and the existence
// of the file has not changed.
//
// Note that using an `Exists` stamper for this task does not make a lot of sense, since it will only read the file
// on first execute and when it is recreated. But this is just to demonstrate different stampers.
let output = context.require_task(&read_task_b_modified)?;
assert_eq!(&output, "Test Test");
let output = context.require_task(&read_task_b_exists)?;
assert_eq!(&output, "Test");
Here we write to input_file_b
and then require read_task_b_modified
and read_task_b_exists
.
We expect read_task_b_modified
to be executed, but read_task_b_exists
to be skipped, because its file dependency only checks for the existence of the input file, which has not changed.
This shows that tasks can depend on the same file with different stampers, which influences whether the tasks are affected by a file change individually.
Of course, using an Exists
stamper for ReadStringFromFile
does not make a lot of sense, but this is for demonstration purposes only.
Running cargo run --example incremental
now should produce output like:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.35s
Running `target/debug/examples/incremental`
A) New task: expect `read_task` to execute
Reading from input.txt with Modified stamper
B) Reuse: expect no execution
C) Inconsistent file dependency: expect `read_task` to execute
Reading from input.txt with Modified stamper
D) Different tasks: expect `read_task_b_modified` and `read_task_b_exists` to execute
Reading from input_b.txt with Modified stamper
Reading from input_b.txt with Exists stamper
E) Different stampers: expect only `read_task_b_modified` to execute
Reading from input_b.txt with Modified stamper
Feel free to experiment more with this example (or new example files) before continuing. In the next chapter, we will define minimality and soundness, set up an infrastructure for testing those properties, and fix issues uncovered by testing.
Download source code
Testing Incrementality and Correctness
So far, we have not fully tested whether our build system is incremental and correct. We have reasoned that it is incremental and correct, implemented an example that seems to be incremental and correct, and have unit tests covering most individual components of the build system. However, we have not yet performed integration testing, where all components of the build system are integrated and tested together. Furthermore, we haven’t really defined incremental and correct. In this chapter, we will define those more concretely, do integration testing to test whether our build system really is incremental and correct, and (spoilers!) fix uncovered incrementality and correctness issues.
Incremental and Correct
In essence, a build system is incremental (also called minimal) if it performs the least amount of work possible to bring the system in a consistent state again after a change. More concretely, an incremental build system executes at most all inconsistent tasks. If it executes more tasks than necessary, it is not fully incremental. A trivial way to be incremental is to never execute anything, but that is of course not correct.
On the other hand, a build system is correct (also called sound) if it performs all work required to bring the system in a consistent state again after a change. More concretely, a correct build system executes at least all inconsistent tasks. If it executes fewer tasks than necessary, it is not correct. A trivial way to be correct is to execute everything, but that in turn is not incremental.
Combining these definitions: a correct incremental build system executes exactly all inconsistent tasks.
Whether a task is inconsistent or not, is characterized by its dependencies. A task is inconsistent when any of its dependencies are inconsistent, and consequently only consistent when all its dependencies are consistent. A file dependency is inconsistent if its file stamp changes. A task dependency is inconsistent if, after recursively checking the task, its output stamp changes. An inconsistent task is made consistent by executing it, because executing it makes all its dependencies consistent!
New tasks are tasks that have not yet been executed (no cached output), and are deemed inconsistent, and thus must be executed. Once executed, they have had a chance to create dependencies, and are no longer new: their consistency then depends on the consistency of their dependencies.
Tasks Without Dependencies
Tasks without dependencies (that are not new) are forever deemed consistent, and never have to be re-executed. This is rare in practice, but can be useful for one-time expensive calculations.
By defining incremental and correct in terms of dependencies (through consistency), a task author forgetting to create a dependency, or choosing the wrong stamper, does not change whether our build system is incremental and correct. PIE works under the assumption that task authors correctly list all dependencies that mark their task as affected by a change when it actually is.
The Ever-Changing Filesystem
One issue with this definition is that we do not control the filesystem: changes to the filesystem can happen at any time during the build. Therefore, we would need to constantly check file dependencies for consistency, and we can never be sure that a task is really consistent! That makes incremental builds infeasible.
To solve that problem, we will introduce the concept of a build session in which we only check tasks for consistency once. Once a task has been executed or checked, we don’t check it again that session, solving the problem of constantly having to check file dependencies. A new session has to be created to check those tasks again. Therefore, sessions are typically short-lived, and are created whenever file changes should be detected again.
Integration Testing
In this chapter, we will show incrementality and correctness by integration testing. However, this requires quite some setup, as testing incrementality requires checking whether tasks are executed or not. Therefore, we will create an infrastructure for tracking build events which we will use to test incrementality.
Then we will spend several sections writing integration tests to find issues, and fix them.
Proving Incrementality and Correctness?
We will continue as follows:
- Introduce sessions and change the API to work with sessions: a Session type for performing builds in a session, and the Pie type as the entry point that manages sessions. We do this first as it introduces API changes that would be annoying to deal with later.
- Create infrastructure to track build events for testing and debugging purposes. Create the Tracker trait, and implement a WritingTracker for debugging and an EventTracker for testing.
- Create integration tests that test incrementality and correctness.
- Find a bug where superfluous dependencies are being created, and fix it.
- Find a soundness hole where multiple tasks write to the same file. Fix it by tracking file write dependencies separately from read dependencies, and catch these mistakes with dynamic verification.
- Find a soundness hole where a task reads from a file before another task writes to it. Fix it by catching these mistakes with dynamic verification.
- Find a soundness hole where cyclic task execution can still occur. Fix it by changing how task dependencies are stored.
Incrementality with Sessions
A task is consistent if its dependencies are consistent, and consistency of file dependencies is based on the filesystem. However, the filesystem can change during a build, meaning that a task can be affected by multiple different changes in one build. For example, after executing a task, it could immediately be affected by a change in a source file again without the build system knowing about it, which would be neither minimal nor sound.
Therefore, we will introduce the concept of a session. Builds are only performed in a session, and at most one session may exist at any given time. In one session, each task is checked or executed at most once, meaning that changes made to source files during a session are not guaranteed to be detected.
The result is that if a task is deemed inconsistent at the time it is checked, it will be executed and made consistent, and will not be checked nor executed any more that session. If a task is deemed consistent at the time it is checked, it will not be checked any more that session. This simplifies incrementality and correctness, as we do not need to worry about checking tasks multiple times. Furthermore, it is also an optimisation, as requiring the same task many times only results in one check.
We will continue as follows:
- Create the Session type to hold all session data, and the Pie type as an entry point into the build system that manages a session.
- Update TopDownContext to work with Session.
- Update the incrementality example to work with Session and Pie.
- Ensure minimality by keeping track of whether a task has been required this session.
PIE and Session
Change the imports in pie/src/lib.rs
:
Now add the Pie
and Session
types to pie/src/lib.rs
:
/// Main entry point into PIE, a sound and incremental programmatic build system.
pub struct Pie<T, O> {
store: Store<T, O>,
}
impl<T: Task> Default for Pie<T, T::Output> {
fn default() -> Self { Self { store: Store::default() } }
}
impl<T: Task> Pie<T, T::Output> {
/// Creates a new build session. Only one session may be active at once, enforced via mutable (exclusive) borrow.
pub fn new_session(&mut self) -> Session<T, T::Output> { Session::new(self) }
/// Runs `f` inside a new build session.
pub fn run_in_session<R>(&mut self, f: impl FnOnce(Session<T, T::Output>) -> R) -> R {
let session = self.new_session();
f(session)
}
}
/// A session in which builds are executed.
pub struct Session<'p, T, O> {
store: &'p mut Store<T, O>,
current_executing_task: Option<TaskNode>,
dependency_check_errors: Vec<io::Error>,
}
impl<'p, T: Task> Session<'p, T, T::Output> {
fn new(pie: &'p mut Pie<T, T::Output>) -> Self {
Self {
store: &mut pie.store,
current_executing_task: None,
dependency_check_errors: Vec::default(),
}
}
/// Requires `task`, returning its up-to-date output.
pub fn require(&mut self, task: &T) -> T::Output {
todo!("Create TopDownContext with this session, and require the task")
}
/// Gets all errors produced during dependency checks.
pub fn dependency_check_errors(&self) -> &[io::Error] { &self.dependency_check_errors }
}
We set up the types such that Pie
owns the store, and Session
owns all data for a build session that TopDownContext
previously owned.
We put the store in Pie
because we want to keep the dependency graph and task outputs between build sessions, otherwise we cannot be incremental.
A Session
is created with Pie::new_session
, which borrows Pie
mutably, ensuring that there can only be one Session
instance (per Pie
instance).
run_in_session
is a convenience method that runs the given function inside a new session.
Session::require
should require the task with the top-down context and return its up-to-date output, which we will implement once we’ve changed TopDownContext
.
The dependency check errors can be accessed with Session::dependency_check_errors
.
Note that Session
also has access to Store
, because TopDownContext
needs access to the store.
The store is mutably borrowed from Pie
.
Therefore, the Session
struct is generic over the 'p
lifetime, where the p
stands for Pie
.
We can leave out this lifetime in Pie::new_session
, because the compiler infers it for us, but we must be explicit in structs and most impls.
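For example, hypothetical usage of this API (with a placeholder task) looks like this:
let mut pie = Pie::default();
let mut session = pie.new_session(); // Exclusively borrows `pie`.
let output = session.require(&task); // `task` is a placeholder for some `Task` value.
drop(session); // The exclusive borrow ends here,
let mut session = pie.new_session(); // so a new session can now be created.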
Check that the code compiles (but gives warnings) with cargo check
.
Now we need to modify TopDownContext
to work with Session
.
Update TopDownContext
Change TopDownContext
to only contain a mutable reference to Session
in pie/src/context/top_down.rs
:
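A sketch of the new shape of TopDownContext (details assumed):
pub struct TopDownContext<'p, 's, T, O> {
  session: &'s mut Session<'p, T, O>,
}
impl<'p, 's, T: Task> TopDownContext<'p, 's, T, T::Output> {
  pub fn new(session: &'s mut Session<'p, T, T::Output>) -> Self { Self { session } }
  /// Convenience method for `Session`: requires the initial task that a user requires.
  pub fn require_initial(&mut self, task: &T) -> T::Output {
    self.require_task(task) // `require_task` comes from the `Context` implementation.
  }
}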
Here, we use lifetime 's
to denote the lifetime of a session, and make TopDownContext
generic over it.
new
now just accepts a mutable reference to the session.
The get_dependency_check_errors
method can be removed.
We add a require_initial
convenience method for Session
.
In the rest of the file, we need to update the impl
lines to include the lifetimes, and we need to replace most instances of self
with self.session
.
You could do this with the following find-replace regex: self\.([\w\d_]+)\.
-> self.session.$1.
Change pie/src/context/top_down.rs
:
Now we change Session
to use TopDownContext
.
Update Session
Change pie/src/lib.rs
:
We reset the current_executing_task
to None
, to be sure that we start a build without an executing task.
Then, we just create a TopDownContext
and call require_initial
.
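A sketch of what Session::require then looks like:
pub fn require(&mut self, task: &T) -> T::Output {
  self.current_executing_task = None; // Ensure we start the build without an executing task.
  TopDownContext::new(self).require_initial(task)
}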
Finally, we can now make the context
module private, as users of the library run builds using Session
, instead of having to create a context implementation.
Change pie/src/lib.rs
:
Check that the code compiles with cargo check --lib
.
This only checks if the library builds, but not any examples.
We need to update the incrementality example to work with these changes.
Update incremental example
Change pie/examples/incremental.rs
to use sessions:
When we only require one task, we replace context.require_task
with pie.new_session().require
.
When we want to require multiple tasks, we use new_session
and call session.require
multiple times.
It is very important to create a new session each time in this example, because a task is only checked/executed once each session. If we use a single session, our changes are never seen, and we just execute each task once, which is not what we want. Therefore, every time we make changes to source files, or expect that changes have been made to source files, we must create a new session.
In changes D and E, Rust is smart enough to allow creating a new session even though the previous session
variable is still active, because it knows that we don’t use that previous session anymore.
Check that the example works with cargo run --example incremental
, and check that the rest of the code works by running cargo test
.
Incrementality
Now we can ensure incrementality by keeping track whether a task has been required this session.
Change pie/src/lib.rs
:
We add the consistent
field to Session
which is a hash set over task nodes.
We create a new one each session, because we only want to keep track of which tasks are consistent on a per-session basis.
Now change the top-down context in pie/src/context/top_down.rs
to use this:
At the start of requiring a task, we check whether the task is already deemed consistent this session, using the consistent
hash set in Session
.
If the task is consistent, we skip execution by using !already_consistent &&
in the if check.
Because &&
is short-circuiting (also called lazy), we even skip the entire should_execute
call that checks whether we should execute a task, when the task is already consistent.
This increases performance when a lot of consistent tasks are required.
Finally, at the end of require
, we insert the task node into the consistent
hash set, to denote that the task is now consistent this session.
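As a fragment sketch, the relevant parts of requiring a task now look roughly like this:
let already_consistent = self.session.consistent.contains(&node);
if !already_consistent && self.should_execute_task(&node) { // Short-circuits when consistent.
  // ... execute the task and store its output ...
}
// ... create the task dependency and get the (cached) output ...
self.session.consistent.insert(node); // The task is now consistent this session.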
That’s it! This was a simple change due to the work we did before to get the Session
API in place.
With this new API in place, incrementality of task checking and execution in place, and all code adjusted to work with it, we can continue with tracking build events.
Download source code
Tracking Build Events
So far we have had no convenient way to inspect what our build system is doing, apart from println!
debugging or attaching a debugger to the program.
In this section, we will change that by tracking build events for debugging and integration testing purposes.
We will:
- Create a Tracker trait that receives build events through method calls. The Tracker trait can be implemented in different ways to handle build events in different ways.
- Implement a NoopTracker that does nothing, removing the tracking overhead.
- Make the build system generic over Tracker, such that Context implementations call methods on the tracker to create build events.
- Implement a WritingTracker that writes build events to standard output or standard error, for debugging purposes.
- Implement an EventTracker that stores build events for later inspection, for integration testing purposes.
- Implement a CompositeTracker that forwards build events to 2 other trackers, so we can use multiple trackers at the same time.
Tracker
trait
Add the tracker
module to pie/src/lib.rs
:
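This is just a module declaration, presumably public so that users of the library can choose a tracker:
pub mod tracker;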
Then create the pie/src/tracker
directory, create the pie/src/tracker/mod.rs
file, and add the following content:
use std::io;
use crate::dependency::{Dependency, FileDependency, Inconsistency, TaskDependency};
use crate::stamp::OutputStamper;
use crate::Task;
/// Trait for tracking build events. Can be used to implement logging, event tracing, progress tracking, metrics, etc.
#[allow(unused_variables)]
pub trait Tracker<T: Task> {
/// Start: a new build.
fn build_start(&mut self) {}
/// End: completed build.
fn build_end(&mut self) {}
/// End: created a file `dependency`.
fn require_file_end(&mut self, dependency: &FileDependency) {}
/// Start: require `task` using `stamper`.
fn require_task_start(&mut self, task: &T, stamper: &OutputStamper) {}
/// End: required a task, resulting in a task `dependency` and `output`, and the task `was_executed`.
fn require_task_end(&mut self, dependency: &TaskDependency<T, T::Output>, output: &T::Output, was_executed: bool) {}
/// Start: check consistency of `dependency`.
fn check_dependency_start(&mut self, dependency: &Dependency<T, T::Output>) {}
/// End: checked consistency of `dependency`, possibly found `inconsistency`.
fn check_dependency_end(
&mut self,
dependency: &Dependency<T, T::Output>,
inconsistency: Result<Option<&Inconsistency<T::Output>>, &io::Error>
) {}
/// Start: execute `task`.
fn execute_start(&mut self, task: &T) {}
/// End: executed `task` resulting in `output`.
fn execute_end(&mut self, task: &T, output: &T::Output) {}
}
The Tracker
trait is generic over Task
.
Here, we chose to put the Task
trait bound on the trait itself.
This will not lead to cascading trait bounds, as the Tracker
trait will only be used as a bound in impl
s, not in structs or other traits.
Tracker
has methods corresponding to events that happen during a build, such as a build starting, requiring a file, requiring a task, checking a dependency, and executing a task.
All but the require_file
event have start and end variants to give trackers control over nesting these kinds of events.
The end variants usually have more parameters, as more info is available when something has finished.
Tracker methods accept &mut self
so that tracker implementations can perform mutation, such as storing a build event.
We provide default methods that do nothing so that implementors of Tracker
only have to override the methods for events they are interested in.
We use #[allow(unused_variables)]
on the trait to not give warnings for unused variables, as all variables are unused due to the empty default implementations.
Rust Help: References in Result and Option
The check_dependency_end
method accepts the inconsistency as Result<Option<&Inconsistency<T::Output>>, &io::Error>
.
The reason we accept it like this is that many methods in Result
and Option
take self
, not &self
, and therefore cannot be called on &Result<T, E>
and &Option<T>
.
We can turn &Result<T, E>
into Result<&T, &E>
with
as_ref
(same for Option
).
Since trackers always want to work with Result<&T, &E>
, it makes more sense for the caller of the tracker method to call as_ref
to turn their result into Result<&T, &E>
.
The final reason to accept Result<&T, &E>
is that if you have a &T
or &E
, you can easily construct a Result<&T, &E>
with Ok(&t)
and Err(&e)
.
However, you cannot construct a &Result<T, E>
from &T
or &E
, so Result<&T, &E>
is a more flexible type.
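For example:
use std::io;
fn main() {
  let result: Result<String, io::Error> = Ok(String::from("hi"));
  // `as_ref` turns `&Result<T, E>` into `Result<&T, &E>` without consuming the result.
  let borrowed: Result<&String, &io::Error> = result.as_ref();
  assert_eq!(borrowed.ok(), Some(&String::from("hi")));
}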
Are these Default Methods Useful?
Adding a method to Tracker
with a default implementation ensures that implementations of Tracker
do not have to be changed to work with the new method.
This is both good and bad.
Good because we can add methods without breaking compatibility.
Bad because we can forget to handle a new method, which can lead to problems with, for example, a composite tracker that forwards events to 2 trackers.
In this tutorial we chose the convenient option, but be sure to think about these kinds of trade-offs yourself!
Check that the code compiles with cargo test
.
No-op tracker
Add a no-op tracker, which is a tracker that does nothing, by adding the following code to pie/src/tracker/mod.rs
:
/// [`Tracker`] that does nothing.
#[derive(Copy, Clone, Debug)]
pub struct NoopTracker;
impl<T: Task> Tracker<T> for NoopTracker {}
Due to the default methods that do nothing on Tracker
, this implementation is extremely simple.
Rust Help: Removing Tracker Overhead
We will use generics to select which tracker implementation to use.
Therefore, all calls to trackers are statically dispatched, and could be inlined.
Because NoopTracker
only has empty methods, and those empty methods can be inlined, using NoopTracker
will effectively remove all tracking code from your binary, thus removing the overhead of tracking if you don’t want it.
In this tutorial, we do not annotate methods with
#[inline]
, meaning that the Rust compiler (and the LLVM backend) will make its own decisions on what to make inlineable and what not.
If you care about performance here, be sure to annotate those default empty methods with #[inline]
.
Using the Tracker
trait
Now we will make the build system generic over Tracker
, and insert Tracker
calls in context implementations.
Make Pie
and Session
generic over Tracker
by modifying pie/src/lib.rs
:
We use A
as the generic argument for tracker types in the source code.
The Pie
struct owns the tracker, similarly to how it owns the store.
Pie
can be created with a specific tracker with with_tracker
, and provides access to the tracker with tracker
and tracker_mut
.
Rust Help: Default Type
We assign NoopTracker
as the default type for trackers in Pie
, so that no tracking is performed when we use the Pie
type without an explicit tracker type.
The Default
implementation only works with NoopTracker
, because we impl Default for Pie<T, T::Output>
, which is equivalent to impl Default for Pie<T, T::Output, NoopTracker>
due to the default type.
We make Session
generic over trackers, and mutably borrow the tracker from Pie
, again like we do with the store.
For convenience, Session
also provides access to the tracker with tracker
and tracker_mut
.
Now we make TopDownContext
generic over Tracker
, and insert calls to tracker methods.
Modify pie/src/context/top_down.rs
:
We make TopDownContext
generic over trackers, and call methods on the tracker:
- In require_initial we call build_start/build_end to track builds.
- In require_file_with_stamper we call require_file_end to track file dependencies.
- In require_task_with_stamper we call require_task_start/require_task_end to track task dependencies.
  - We extract should_execute into a variable, and pull dependency out of the if, so that we can pass them to tracker.require_task_end.
  - We also call execute_start/execute_end to track execution.
- In should_execute_task we call check_dependency_start/check_dependency_end to track dependency checking.
  - We extract inconsistency into a variable, and convert it into the right type for check_dependency_end.
Check that the code compiles with cargo test
.
Existing code should keep working due to the NoopTracker
default type in Pie
.
We won’t modify NonIncrementalContext
to use a tracker, as NonIncrementalContext
has no state, so we cannot pass a tracker to it.
Implement writing tracker
Now we can implement some interesting trackers.
We start with a simple WritingTracker
that writes build events to some writer.
Add the writing
module to pie/src/tracker/mod.rs
:
Then create the pie/src/tracker/writing.rs
file and add:
use std::io::{self, BufWriter, Stderr, Stdout, Write};
use crate::dependency::{Dependency, FileDependency, Inconsistency, TaskDependency};
use crate::stamp::OutputStamper;
use crate::Task;
use crate::tracker::Tracker;
/// [`Tracker`] that writes events to a [`Write`] instance, for example [`Stdout`].
#[derive(Clone, Debug)]
pub struct WritingTracker<W> {
writer: W,
indentation: u32,
}
impl WritingTracker<BufWriter<Stdout>> {
/// Creates a [`WritingTracker`] that writes to buffered standard output.
pub fn with_stdout() -> Self { Self::new(BufWriter::new(io::stdout())) }
}
impl WritingTracker<BufWriter<Stderr>> {
/// Creates a [`WritingTracker`] that writes to buffered standard error.
pub fn with_stderr() -> Self { Self::new(BufWriter::new(io::stderr())) }
}
impl<W: Write> WritingTracker<W> {
/// Creates a [`WritingTracker`] that writes to `writer`.
pub fn new(writer: W) -> Self {
Self {
writer,
indentation: 0,
}
}
/// Gets the writer of this writing tracker.
pub fn writer(&self) -> &W { &self.writer }
/// Gets the mutable writer of this writing tracker.
pub fn writer_mut(&mut self) -> &mut W { &mut self.writer }
}
The WritingTracker
is generic over a writer W
that must implement std::io::Write
, which is a standard trait for writing bytes to something.
with_stdout
and with_stderr
are used to create buffered writers to standard output and standard error.
new
can be used to create a writer to anything that implements Write
, such as a File
.
writer
and writer_mut
are for retrieving the underlying writer.
Add some utility functions for WritingTracker
to pie/src/tracker/writing.rs
:
#[allow(dead_code)]
impl<W: Write> WritingTracker<W> {
fn writeln(&mut self, args: std::fmt::Arguments) {
self.write_indentation();
let _ = writeln!(&mut self.writer, "{}", args);
}
fn write(&mut self, args: std::fmt::Arguments) {
let _ = write!(&mut self.writer, "{}", args);
}
fn write_nl(&mut self) {
let _ = write!(&mut self.writer, "\n");
}
fn indent(&mut self) {
self.indentation = self.indentation.saturating_add(1);
}
fn unindent(&mut self) {
self.indentation = self.indentation.saturating_sub(1);
}
fn write_indentation(&mut self) {
for _ in 0..self.indentation {
let _ = write!(&mut self.writer, " ");
}
}
fn flush(&mut self) {
let _ = self.writer.flush();
}
}
writeln
and write
will mainly be used for writing text.
The text to write is passed into these methods using std::fmt::Arguments
for flexibility, accepting the result of format_args!
.
WritingTracker
keeps track of indentation
to show recursive dependency checking and execution, which is controlled with indent
and unindent
.
Since we are usually writing to buffers, we must flush
them to observe the output.
Failing Writes
Writes can fail, but we silently ignore them in this tutorial (with let _ = ...
) for simplicity.
You could panic when writing fails, but panicking when writing to standard output fails is probably going a bit too far.
You could store the latest write error and give access to it, which at least allows users of WritingTracker
to check for some errors.
In general, tracking events can fail, but the current Tracker
API does not allow for propagating these errors with Result
.
This is in turn because TopDownContext
does not return Result
for require_task
due to the trade-offs discussed in the section on TopDownContext
.
Rust Help: Saturating Arithmetic
We use
saturating_add
and
saturating_sub
for safety, which are saturating arithmetic operations that saturate at the numeric bounds instead of overflowing.
For example, 0u32.saturating_sub(1)
will result in 0
instead of overflowing into 4294967295
.
These saturating operations are not really needed when calls to indent
and unindent
are balanced.
However, if we make a mistake, it is better to write no indentation than to write 4294967295 spaces of indentation.
Alternatively, we could use standard arithmetic operations, which panic on overflow in debug/development mode, but silently wrap around in release mode.
Now we can implement the tracker using these utility methods.
Add the Tracker
implementation to pie/src/tracker/writing.rs
:
impl<W: Write, T: Task> Tracker<T> for WritingTracker<W> {
fn build_start(&mut self) {
self.indentation = 0;
}
fn build_end(&mut self) {
self.writeln(format_args!("🏁"));
self.flush();
}
fn require_file_end(&mut self, dependency: &FileDependency) {
self.writeln(format_args!("- {}", dependency.path().display()));
}
fn require_task_start(&mut self, task: &T, _stamper: &OutputStamper) {
self.writeln(format_args!("→ {:?}", task));
self.indent();
self.flush();
}
fn require_task_end(&mut self, _dependency: &TaskDependency<T, T::Output>, output: &T::Output, _was_executed: bool) {
self.unindent();
self.writeln(format_args!("← {:?}", output));
self.flush();
}
fn check_dependency_start(&mut self, dependency: &Dependency<T, T::Output>) {
match dependency {
Dependency::RequireTask(d) => {
self.writeln(format_args!("? {:?}", d.task()));
self.indent();
self.flush();
},
_ => {},
}
}
fn check_dependency_end(
&mut self,
dependency: &Dependency<T, T::Output>,
inconsistency: Result<Option<&Inconsistency<T::Output>>, &io::Error>
) {
match dependency {
Dependency::RequireFile(d) => {
match inconsistency {
Err(e) => self.writeln(format_args!("✗ {} (err: {:?})", d.path().display(), e)),
Ok(Some(Inconsistency::File(s))) =>
self.writeln(format_args!("✗ {} (old: {:?} ≠ new: {:?})", d.path().display(), d.stamp(), s)),
Ok(None) => self.writeln(format_args!("✓ {}", d.path().display())),
_ => {}, // Other variants cannot occur.
}
},
Dependency::RequireTask(d) => {
self.unindent();
match inconsistency {
Ok(Some(Inconsistency::Task(s))) =>
self.writeln(format_args!("✗ {:?} (old: {:?} ≠ new: {:?})", d.task(), d.stamp(), s)),
Ok(None) => self.writeln(format_args!("✓ {:?}", d.task())),
_ => {}, // Other variants cannot occur.
}
}
}
self.flush()
}
fn execute_start(&mut self, task: &T) {
self.writeln(format_args!("▶ {:?}", task));
self.indent();
self.flush();
}
fn execute_end(&mut self, _task: &T, output: &T::Output) {
self.unindent();
self.writeln(format_args!("◀ {:?}", output));
self.flush();
}
}
We implement most tracker methods and write what is happening, using some unicode symbols to signify events:
- 🏁: end of a build,
- -: created a file dependency,
- →: start requiring a task,
- ←: end of requiring a task,
- ?: start checking a task dependency,
- ✓: end of dependency checking, when the dependency is consistent,
- ✗: end of dependency checking, when the dependency is inconsistent,
- ▶: start of task execution,
- ◀: end of task execution.
We flush
the writer after every event to ensure that bytes are written out.
When a task is required, checked, or executed, we increase indentation to signify the recursive checking/execution.
When a task is done being required, checked, or executed, we decrease the indentation again.
In check_dependency_end
we write the old and new stamps if a dependency is inconsistent.
This tracker is very verbose. You can add configuration booleans to control what should be written, but in this tutorial we will keep it simple like this.
Check that the code compiles with cargo test
.
Let’s try out our writing tracker in the incrementality example by modifying pie/examples/incremental.rs
:
We remove the println!
statements from tasks and create Pie
with WritingTracker
.
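The change amounts to something like the following sketch, assuming with_tracker takes the tracker by value:
use pie::tracker::writing::WritingTracker;
// Create a Pie instance with a writing tracker instead of the default no-op tracker.
let mut pie = Pie::with_tracker(WritingTracker::with_stdout());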
Now run the example with cargo run --example incremental
, and you should see the writing tracker print to standard output:
Compiling pie v0.1.0 (/pie)
Finished dev [unoptimized + debuginfo] target(s) in 0.44s
Running `target/debug/examples/incremental`
A) New task: expect `read_task` to execute
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input.txt", Modified)
▶ ReadStringFromFile("/tmp/.tmpS2I0ZA/input.txt", Modified)
- /tmp/.tmpS2I0ZA/input.txt
◀ Ok("Hi")
← Ok("Hi")
🏁
B) Reuse: expect no execution
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input.txt", Modified)
✓ /tmp/.tmpS2I0ZA/input.txt
← Ok("Hi")
🏁
C) Inconsistent file dependency: expect `read_task` to execute
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input.txt", Modified)
✗ /tmp/.tmpS2I0ZA/input.txt (old: Modified(Some(SystemTime { tv_sec: 1703256537, tv_nsec: 965189499 })) ≠ new: Modified(Some(SystemTime { tv_sec: 1703256537, tv_nsec: 969189501 })))
▶ ReadStringFromFile("/tmp/.tmpS2I0ZA/input.txt", Modified)
- /tmp/.tmpS2I0ZA/input.txt
◀ Ok("Hello")
← Ok("Hello")
🏁
D) Different tasks: expect `read_task_b_modified` and `read_task_b_exists` to execute
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Modified)
▶ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Modified)
- /tmp/.tmpS2I0ZA/input_b.txt
◀ Ok("Test")
← Ok("Test")
🏁
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Exists)
▶ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Exists)
- /tmp/.tmpS2I0ZA/input_b.txt
◀ Ok("Test")
← Ok("Test")
🏁
E) Different stampers: expect only `read_task_b_modified` to execute
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Modified)
✗ /tmp/.tmpS2I0ZA/input_b.txt (old: Modified(Some(SystemTime { tv_sec: 1703256537, tv_nsec: 969189501 })) ≠ new: Modified(Some(SystemTime { tv_sec: 1703256537, tv_nsec: 973189502 })))
▶ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Modified)
- /tmp/.tmpS2I0ZA/input_b.txt
◀ Ok("Test Test")
← Ok("Test Test")
🏁
→ ReadStringFromFile("/tmp/.tmpS2I0ZA/input_b.txt", Exists)
✓ /tmp/.tmpS2I0ZA/input_b.txt
← Ok("Test")
🏁
Implement event tracker
The writing tracker is great for debugging purposes, but we cannot use it to check whether our build system is incremental and sound.
To check incrementality and soundness, we need to be able to check whether a task has executed or not, and check the order of build events.
Therefore, we will implement the EventTracker
that stores build events for later inspection.
Add the event
module to pie/src/tracker/mod.rs
:
Then create the pie/src/tracker/event.rs
file and add:
use std::ops::RangeInclusive;
use std::path::{Path, PathBuf};
use crate::dependency::{FileDependency, TaskDependency};
use crate::stamp::{FileStamp, FileStamper, OutputStamp, OutputStamper};
use crate::Task;
use crate::tracker::Tracker;
/// [`Tracker`] that stores [events](Event) in a [`Vec`], useful in testing to assert that a context implementation is
/// incremental and sound.
#[derive(Clone, Debug)]
pub struct EventTracker<T, O> {
events: Vec<Event<T, O>>,
}
impl<T: Task> Default for EventTracker<T, T::Output> {
fn default() -> Self { Self { events: Vec::new() } }
}
/// Enumeration of important build events.
#[derive(Clone, Debug)]
pub enum Event<T, O> {
RequireFileEnd(RequireFileEnd),
RequireTaskStart(RequireTaskStart<T>),
RequireTaskEnd(RequireTaskEnd<T, O>),
ExecuteStart(ExecuteStart<T>),
ExecuteEnd(ExecuteEnd<T, O>),
}
/// End: required file at `path` using `stamper` to create `stamp`.
#[derive(Clone, Debug)]
pub struct RequireFileEnd {
pub path: PathBuf,
pub stamper: FileStamper,
pub stamp: FileStamp,
pub index: usize,
}
/// Start: require `task` using `stamper`.
#[derive(Clone, Debug)]
pub struct RequireTaskStart<T> {
pub task: T,
pub stamper: OutputStamper,
pub index: usize,
}
/// End: required `task` resulting in `output`, using `stamper` to create `stamp`, and the task `was_executed`.
#[derive(Clone, Debug)]
pub struct RequireTaskEnd<T, O> {
pub task: T,
pub output: O,
pub stamper: OutputStamper,
pub stamp: OutputStamp<O>,
pub was_executed: bool,
pub index: usize,
}
/// Start: execute `task`.
#[derive(Clone, Debug)]
pub struct ExecuteStart<T> {
pub task: T,
pub index: usize,
}
/// End: executed `task`, producing `output`.
#[derive(Clone, Debug)]
pub struct ExecuteEnd<T, O> {
pub task: T,
pub output: O,
pub index: usize,
}
The EventTracker
stores build events in a Vec
.
The Event
enumeration mimics the relevant Tracker
methods, but uses structs with all arguments in owned form (for example task: T
instead of task: &T
) as we want to store these events.
We also store the index of every event, so we can easily check whether an event happened before or after another.
Add the tracker implementation to pie/src/tracker/event.rs
:
impl<T: Task> Tracker<T> for EventTracker<T, T::Output> {
fn build_start(&mut self) {
self.events.clear();
}
fn require_file_end(&mut self, dependency: &FileDependency) {
let data = RequireFileEnd {
path: dependency.path().into(),
stamper: *dependency.stamper(),
stamp: *dependency.stamp(),
index: self.events.len()
};
self.events.push(Event::RequireFileEnd(data));
}
fn require_task_start(&mut self, task: &T, stamper: &OutputStamper) {
let data = RequireTaskStart { task: task.clone(), stamper: stamper.clone(), index: self.events.len() };
self.events.push(Event::RequireTaskStart(data));
}
fn require_task_end(&mut self, dependency: &TaskDependency<T, T::Output>, output: &T::Output, was_executed: bool) {
let data = RequireTaskEnd {
task: dependency.task().clone(),
stamper: *dependency.stamper(),
stamp: dependency.stamp().clone(),
output: output.clone(),
was_executed,
index: self.events.len()
};
self.events.push(Event::RequireTaskEnd(data));
}
fn execute_start(&mut self, task: &T) {
let data = ExecuteStart { task: task.clone(), index: self.events.len() };
self.events.push(Event::ExecuteStart(data));
}
fn execute_end(&mut self, task: &T, output: &T::Output) {
let data = ExecuteEnd { task: task.clone(), output: output.clone(), index: self.events.len() };
self.events.push(Event::ExecuteEnd(data));
}
}
We implement the relevant methods from Tracker
and store the build events as Event
instances in self.events
.
When a new build starts, we clear the events.
Now we will add code to inspect the build events. This is quite a bit of code that we will be using in integration testing to test incrementality and soundness. We’ll add it in just two steps to keep the tutorial going, and we will use this code in the next section, but feel free to take some time to inspect the code.
First we add some methods to Event
to make finding the right event and getting its data easier for the rest of the code.
Add the following code to pie/src/tracker/event.rs
:
impl<T: Task> Event<T, T::Output> {
/// Returns `Some(&data)` if this is a [require file end event](Event::RequireFileEnd) for file at `path`, or `None`
/// otherwise.
pub fn match_require_file_end(&self, path: impl AsRef<Path>) -> Option<&RequireFileEnd> {
let path = path.as_ref();
match self {
Event::RequireFileEnd(data) if data.path == path => Some(data),
_ => None,
}
}
/// Returns `Some(&data)` if this is a [require start event](Event::RequireTaskStart) for `task`, or `None` otherwise.
pub fn match_require_task_start(&self, task: &T) -> Option<&RequireTaskStart<T>> {
match self {
Event::RequireTaskStart(data) if data.task == *task => Some(data),
_ => None,
}
}
/// Returns `Some(&data)` if this is a [require end event](Event::RequireTaskEnd) for `task`, or `None` otherwise.
pub fn match_require_task_end(&self, task: &T) -> Option<&RequireTaskEnd<T, T::Output>> {
match self {
Event::RequireTaskEnd(data) if data.task == *task => Some(data),
_ => None,
}
}
/// Returns `true` if this is a task execute [start](Event::ExecuteStart) or [end](Event::ExecuteEnd) event.
pub fn is_execute(&self) -> bool {
match self {
Event::ExecuteStart(_) | Event::ExecuteEnd(_) => true,
_ => false,
}
}
/// Returns `true` if this is an execute [start](Event::ExecuteStart) or [end](Event::ExecuteEnd) event for `task`.
pub fn is_execute_of(&self, task: &T) -> bool {
match self {
Event::ExecuteStart(ExecuteStart { task: t, .. }) |
Event::ExecuteEnd(ExecuteEnd { task: t, .. }) if t == task => true,
_ => false,
}
}
/// Returns `Some(&data)` if this is an [execute start event](Event::ExecuteStart) for `task`, or `None` otherwise.
pub fn match_execute_start(&self, task: &T) -> Option<&ExecuteStart<T>> {
match self {
Event::ExecuteStart(data) if data.task == *task => Some(data),
_ => None,
}
}
/// Returns `Some(&data)` if this is an [execute end event](Event::ExecuteEnd) for `task`, or `None` otherwise.
pub fn match_execute_end(&self, task: &T) -> Option<&ExecuteEnd<T, T::Output>> {
match self {
Event::ExecuteEnd(data) if data.task == *task => Some(data),
_ => None,
}
}
}
These methods check if the current event is a specific kind of event, and return their specific data as Some(data)
, or None
if it is a different kind of event.
Finally, we add methods to EventTracker
for inspecting events.
Add the following code to pie/src/tracker/event.rs
:
impl<T: Task> EventTracker<T, T::Output> {
/// Returns a slice over all events.
pub fn slice(&self) -> &[Event<T, T::Output>] {
&self.events
}
/// Returns an iterator over all events.
pub fn iter(&self) -> impl Iterator<Item=&Event<T, T::Output>> {
self.events.iter()
}
/// Returns `true` if `predicate` returns `true` for any event.
pub fn any(&self, predicate: impl FnMut(&Event<T, T::Output>) -> bool) -> bool {
self.iter().any(predicate)
}
/// Returns `true` if `predicate` returns `true` for exactly one event.
pub fn one(&self, predicate: impl FnMut(&&Event<T, T::Output>) -> bool) -> bool {
self.iter().filter(predicate).count() == 1
}
/// Returns `Some(v)` for the first event `e` where `f(e)` returns `Some(v)`, or `None` otherwise.
pub fn find_map<R>(&self, f: impl FnMut(&Event<T, T::Output>) -> Option<&R>) -> Option<&R> {
self.iter().find_map(f)
}
/// Finds the first [require file end event](Event::RequireFileEnd) for `path` and returns `Some(&data)`, or `None`
/// otherwise.
pub fn first_require_file(&self, path: &PathBuf) -> Option<&RequireFileEnd> {
self.find_map(|e| e.match_require_file_end(path))
}
/// Finds the first [require file end event](Event::RequireFileEnd) for `path` and returns `Some(&index)`, or `None`
/// otherwise.
pub fn first_require_file_index(&self, path: &PathBuf) -> Option<&usize> {
self.first_require_file(path).map(|d| &d.index)
}
/// Finds the first require [start](Event::RequireTaskStart) and [end](Event::RequireTaskEnd) event for `task` and
/// returns `Some((&start_data, &end_data))`, or `None` otherwise.
pub fn first_require_task(&self, task: &T) -> Option<(&RequireTaskStart<T>, &RequireTaskEnd<T, T::Output>)> {
let start_data = self.find_map(|e| e.match_require_task_start(task));
let end_data = self.find_map(|e| e.match_require_task_end(task));
start_data.zip(end_data)
}
/// Finds the first require [start](Event::RequireTaskStart) and [end](Event::RequireTaskEnd) event for `task` and
/// returns `Some(start_data.index..=end_data.index)`, or `None` otherwise.
pub fn first_require_task_range(&self, task: &T) -> Option<RangeInclusive<usize>> {
self.first_require_task(task).map(|(s, e)| s.index..=e.index)
}
/// Returns `true` if any task was executed.
pub fn any_execute(&self) -> bool {
self.any(|e| e.is_execute())
}
/// Returns `true` if `task` was executed.
pub fn any_execute_of(&self, task: &T) -> bool {
self.any(|e| e.is_execute_of(task))
}
/// Returns `true` if `task` was executed exactly once.
pub fn one_execute_of(&self, task: &T) -> bool {
self.one(|e| e.match_execute_start(task).is_some())
}
/// Finds the first execute [start](Event::ExecuteStart) and [end](Event::ExecuteEnd) event for `task` and returns
/// `Some((&start_data, &end_data))`, or `None` otherwise.
pub fn first_execute(&self, task: &T) -> Option<(&ExecuteStart<T>, &ExecuteEnd<T, T::Output>)> {
let start_data = self.find_map(|e| e.match_execute_start(task));
let end_data = self.find_map(|e| e.match_execute_end(task));
start_data.zip(end_data)
}
/// Finds the first execute [start](Event::ExecuteStart) and [end](Event::ExecuteEnd) event for `task` and returns
/// `Some(start_data.index..=end_data.index)`, or `None` otherwise.
pub fn first_execute_range(&self, task: &T) -> Option<RangeInclusive<usize>> {
self.first_execute(task).map(|(s, e)| s.index..=e.index)
}
}
We add several general inspection methods:
- slice and iter provide raw access to all stored Events,
- any and one are for checking predicates over all events,
- find_map finds the first event for which a given function returns Some, returning that output.
Then we add methods for specific kinds of events, following the general methods. For example, first_require_task finds the first require task start and end events for a task, and returns their event data as a tuple. first_require_task_range finds the same events, but returns their indices as a RangeInclusive<usize>.
Check that the code compiles with cargo test.
Implement composite tracker
Currently, we cannot use both EventTracker and WritingTracker at the same time. We want both, so that we can check incrementality and soundness while also reading the standard output log for debugging. Therefore, we will implement a CompositeTracker that forwards build events to 2 trackers. Add the following code to pie/src/tracker/mod.rs:
/// [`Tracker`] that forwards build events to 2 trackers.
#[derive(Copy, Clone, Debug)]
pub struct CompositeTracker<A1, A2>(pub A1, pub A2);
impl<T: Task, A1: Tracker<T>, A2: Tracker<T>> Tracker<T> for CompositeTracker<A1, A2> {
fn build_start(&mut self) {
self.0.build_start();
self.1.build_start();
}
fn build_end(&mut self) {
self.0.build_end();
self.1.build_end();
}
fn require_file_end(&mut self, dependency: &FileDependency) {
self.0.require_file_end(dependency);
self.1.require_file_end(dependency);
}
fn require_task_start(&mut self, task: &T, stamper: &OutputStamper) {
self.0.require_task_start(task, stamper);
self.1.require_task_start(task, stamper);
}
fn require_task_end(&mut self, dependency: &TaskDependency<T, T::Output>, output: &T::Output, was_executed: bool) {
self.0.require_task_end(dependency, output, was_executed);
self.1.require_task_end(dependency, output, was_executed);
}
fn check_dependency_start(&mut self, dependency: &Dependency<T, T::Output>) {
self.0.check_dependency_start(dependency);
self.1.check_dependency_start(dependency);
}
fn check_dependency_end(
&mut self,
dependency: &Dependency<T, T::Output>,
inconsistency: Result<Option<&Inconsistency<T::Output>>, &io::Error>
) {
self.0.check_dependency_end(dependency, inconsistency);
self.1.check_dependency_end(dependency, inconsistency);
}
fn execute_start(&mut self, task: &T) {
self.0.execute_start(task);
self.1.execute_start(task);
}
fn execute_end(&mut self, task: &T, output: &T::Output) {
self.0.execute_end(task, output);
self.1.execute_end(task, output);
}
}
CompositeTracker is a tuple struct containing 2 trackers; it implements all tracker methods and forwards them to the 2 contained trackers. Its tuple fields are pub, so it can be constructed with CompositeTracker(t1, t2), and the contained trackers can be accessed with c.0 and c.1. Check that the code compiles with cargo test.
Now that the build event tracking infrastructure is in place, we can start integration testing!
Download source code
Integration Testing
Testing utilities
First we start by adding testing utilities (it never ends, does it?) that will make writing integration tests more convenient. Unfortunately, we can’t use dev_shared for this, as we would need to add a dependency from dev_shared to pie, resulting in a dependency cycle because pie already depends on dev_shared.
Development Dependency Cycle
If you were to create this cycle, the code would still compile, but there would be 2 different instances of pie at the same time: one with unit testing enabled (#[cfg(test)]), and one without. Even though these libraries are very similar, they are effectively 2 completely different libraries. When pie uses code from dev_shared that depends again on pie, there will be errors about types and traits not matching. This is probably a bug in cargo, or at least undesired behaviour: it should either allow this cycle and make it work correctly, or disallow it.
We will put the utilities in a common file and use that as a module in integration tests. Create the pie/tests directory, create the pie/tests/common directory, and create the pie/tests/common/mod.rs file. Add the following code to pie/tests/common/mod.rs:
use std::io::{BufWriter, ErrorKind, Stdout};
use pie::{Context, Pie, Task};
use pie::tracker::CompositeTracker;
use pie::tracker::event::EventTracker;
use pie::tracker::writing::WritingTracker;
/// Testing tracker composed of an [`EventTracker`] for testing and stdout [`WritingTracker`] for debugging.
pub type TestTracker<T> = CompositeTracker<EventTracker<T, <T as Task>::Output>, WritingTracker<BufWriter<Stdout>>>;
pub fn test_tracker<T: Task>() -> TestTracker<T> {
CompositeTracker(EventTracker::default(), WritingTracker::with_stdout())
}
/// Testing [`Pie`] using [`TestTracker`].
pub type TestPie<T> = Pie<T, <T as Task>::Output, TestTracker<T>>;
pub fn test_pie<T: Task>() -> TestPie<T> {
TestPie::with_tracker(test_tracker())
}
These are just types and functions to create TestPie instances, which are Pie instances using CompositeTracker<EventTracker, WritingTracker> as tracker, where the writing tracker writes to standard output. Add the following to pie/tests/common/mod.rs:
/// Testing extensions for [`TestPie`].
pub trait TestPieExt<T: Task> {
/// Require `task` in a new session, assert that there are no dependency check errors, then runs `test_assert_func`
/// on the event tracker for test assertion purposes.
fn require_then_assert(
&mut self,
task: &T,
test_assert_func: impl FnOnce(&EventTracker<T, T::Output>),
) -> T::Output;
/// Require `task` in a new session, asserts that there are no dependency check errors.
fn require(&mut self, task: &T) -> T::Output {
self.require_then_assert(task, |_| {})
}
/// Require `task` in a new session, then assert that it is not executed.
fn require_then_assert_no_execute(&mut self, task: &T) -> T::Output {
self.require_then_assert(task, |t|
assert!(!t.any_execute_of(task), "expected no execution of task {:?}, but it was executed", task),
)
}
/// Require `task` in a new session, then assert that it is executed exactly once.
fn require_then_assert_one_execute(&mut self, task: &T) -> T::Output {
self.require_then_assert(task, |t|
assert!(t.one_execute_of(task), "expected one execution of task {:?}, but it was not executed, or was executed more than once", task),
)
}
}
impl<T: Task> TestPieExt<T> for TestPie<T> {
fn require_then_assert(&mut self, task: &T, test_assert_func: impl FnOnce(&EventTracker<T, T::Output>)) -> T::Output {
let mut session = self.new_session();
let output = session.require(task);
assert!(session.dependency_check_errors().is_empty(), "expected no dependency checking errors, but there are \
dependency checking errors: {:?}", session.dependency_check_errors());
test_assert_func(&self.tracker().0);
output
}
}
We define an extension trait TestPieExt with a require_then_assert method, which requires a task in a new session, asserts that there are no dependency check errors, and then gives us the opportunity to perform additional assertions via a function with access to the EventTracker. This is very convenient for integration testing, as most tests will follow the pattern of requiring a task and then asserting properties. This trait also provides:
- require, which is require_then_assert without an assertion closure,
- require_then_assert_no_execute, which after requiring asserts that the task has not been executed, using !t.any_execute_of(task) from EventTracker,
- require_then_assert_one_execute, which does the same but asserts that the task has been executed exactly once.
We implement TestPieExt for TestPie so that we can call require_then_assert on any TestPie instance.
Rust Help: Extension Trait
Rust does not allow adding methods to an existing type or trait; this restriction exists to ensure forward compatibility. For example, if your library could add a foo method to String, and a later Rust version added a String::foo method to the standard library, then all users of your library would run into an ambiguity and fail to compile. Extension traits are a pattern in Rust where we add methods to an existing type via a new trait (typically named TraitExt) and an implementation of that trait for the existing type. Because the extension trait must be imported to make its methods available in the current module, this can only cause compatibility issues if the trait is actually imported.
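As a minimal, self-contained illustration of the pattern (with hypothetical names, unrelated to our build system):
use std::path::PathBuf;
/// Hypothetical extension trait adding a method to the existing PathBuf type.
trait PathBufExt {
  fn with_txt_extension(self) -> PathBuf;
}
impl PathBufExt for PathBuf {
  fn with_txt_extension(mut self) -> PathBuf {
    self.set_extension("txt");
    self
  }
}
fn main() {
  // The method is only available because PathBufExt is in scope here.
  let path = PathBuf::from("build/log").with_txt_extension();
  assert_eq!(path, PathBuf::from("build/log.txt"));
}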
We still need to define a task for testing.
Add the following to pie/tests/common/mod.rs:
/// Testing tasks enumeration.
#[derive(Clone, Eq, PartialEq, Hash, Debug)]
pub enum TestTask {
Return(&'static str),
}
impl Task for TestTask {
type Output = Result<TestOutput, ErrorKind>;
fn execute<C: Context<Self>>(&self, _context: &mut C) -> Self::Output {
match self {
TestTask::Return(string) => Ok(string.to_string().into()),
}
}
}
/// [`TestTask`] output enumeration.
#[derive(Clone, Eq, PartialEq, Hash, Debug)]
pub enum TestOutput {
String(String),
}
impl From<String> for TestOutput {
fn from(value: String) -> Self { Self::String(value) }
}
impl TestOutput {
pub fn as_str(&self) -> &str {
match self {
Self::String(s) => &s,
}
}
}
We define a TestTask enumeration containing all testing tasks, which for now is just a Return task that returns its string, and implement Task for it. The Output for TestTask is Result<TestOutput, ErrorKind>, so that we can propagate IO errors in the future. TestOutput enumerates all possible outputs for TestTask, which for now is just a String. We implement From<String> for TestOutput so we can easily convert Strings into TestOutputs, and as_str performs the opposite operation. Check that the code compiles with cargo test.
First integration test
Now we’re ready to test incrementality and soundness of the top-down incremental context through integration tests.
Create the pie/tests/top_down.rs file and add to it:
use std::io;
use assert_matches::assert_matches;
use pie::tracker::event::*;
use crate::common::{test_pie, TestPieExt, TestTask::*};
mod common;
#[test]
fn test_execution() -> Result<(), io::Error> {
let mut pie = test_pie();
let task = Return("Hello, World!");
let output = pie.require_then_assert(&task, |tracker| {
let events = tracker.slice();
assert_matches!(events.get(0), Some(Event::RequireTaskStart(RequireTaskStart { task: t, .. })) if t == &task);
assert_matches!(events.get(1), Some(Event::ExecuteStart(ExecuteStart { task: t, .. })) if t == &task);
assert_matches!(events.get(2), Some(Event::ExecuteEnd(ExecuteEnd { task: t, .. })) if t == &task);
assert_matches!(events.get(3), Some(Event::RequireTaskEnd(RequireTaskEnd { task: t, .. })) if t == &task);
})?;
assert_eq!(output.as_str(), "Hello, World!");
Ok(())
}
In this first test_execution test, we make sure that new tasks are executed, assert that the order of operations is correct, and check the task output. We use require_then_assert to require the task and then perform assertions through a closure. We use tracker.slice() to get a slice of all build events, and assert (using assert_matches! again) that the following operations happen in order:
- start requiring task,
- start executing task,
- done executing task,
- done requiring task.
require_then_assert returns the output of the task, which is a Result, so we first propagate the error with ?. Finally, we assert that the output equals what we expect. Check that this test succeeds with cargo test. To see what test failures look like, temporarily change events.get(2) to events.get(3), for example.
Rust Help: Integration Testing
Integration tests in Rust are for testing whether the different parts of your library work together correctly. Integration tests have access to the public API of your crate. In this top_down.rs integration test file, we import common/mod.rs by creating a module for it via mod common;. If we create another integration testing file, we would again create a module for it in that file. This is because every file in the tests directory is compiled as a separate crate, and can basically be seen as a separate lib.rs or main.rs file. Putting the testing utilities in a common directory ensures that they will not be compiled as a separate integration testing crate.
Testing incrementality and soundness
We will now test incrementality and soundness.
No dependencies
Let’s first test that requiring a task without dependencies twice only executes it once. Add the following test to pie/tests/top_down.rs:
#[test]
fn test_reuse() -> Result<(), io::Error> {
let mut pie = test_pie();
let task = Return("Hello, World!");
// New task: execute.
let output = pie.require(&task)?;
assert_eq!(output.as_str(), "Hello, World!");
// Nothing changed: no execute
pie.require_then_assert_no_execute(&task)?;
Ok(())
}
We use require and require_then_assert_no_execute from TestPieExt to require the same task twice, in two different sessions. Since Return has no dependencies, it should only ever be executed once, after which its output is cached for all eternity. Check that this test succeeds with cargo test.
Cargo runs tests in parallel by default, which is good for running all tests as fast as possible (and it’s also safe, due to Rust’s memory-safety and thread-safety guarantees!). However, this interleaves the standard outputs of all tests, which makes reading the build log from our writing tracker impossible. If you want to see the standard output, either:
- Run tests consecutively with: cargo test -- --test-threads=1
- Run a single test in the top_down integration test file with: cargo test --test top_down test_reuse
The second command should result in something like:
Finished test [unoptimized + debuginfo] target(s) in 0.02s
Running tests/top_down.rs (target/debug/deps/top_down-d1d18ce2b849af7d)
running 1 test
→ Return("Hello, World!")
▶ Return("Hello, World!")
◀ Ok(String("Hello, World!"))
← Ok(String("Hello, World!"))
🏁
→ Return("Hello, World!")
← Ok(String("Hello, World!"))
🏁
test test_reuse ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s
Testing file dependencies
Next we want to test that a task with dependencies is not executed if its dependencies are consistent, and is executed when any of its dependencies are inconsistent. Therefore, we need to add a task that has dependencies.
Modify pie/tests/common/mod.rs:
We add a ReadFile task that requires a file and returns its content as a string, similar to tasks we have implemented before.
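A sketch of what this addition can look like (hedged: the exact way the required file is read follows our earlier ReadFile implementations and may differ in detail):
// New TestTask variant:
ReadFile(PathBuf, FileStamper),

// In TestTask::execute:
TestTask::ReadFile(path, stamper) => {
  // Create the file dependency, then read the file's contents.
  context.require_file_with_stamper(path, *stamper).map_err(|e| e.kind())?;
  let string = std::fs::read_to_string(path).map_err(|e| e.kind())?;
  Ok(string.into())
}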
Modify pie/tests/top_down.rs to add a new test:
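A sketch of the test, using the create_temp_dir, write, and write_until_modified helpers from dev_shared that we use throughout these tests:
#[test]
fn test_require_file() -> Result<(), io::Error> {
  let mut pie = test_pie();
  let temp_dir = create_temp_dir()?;
  let file = temp_dir.path().join("in.txt");
  write(&file, "Hello, World!")?;
  let task = ReadFile(file.clone(), FileStamper::Modified);
  // New task: executed.
  pie.require_then_assert_one_execute(&task)?;
  // Nothing changed: not executed.
  pie.require_then_assert_no_execute(&task)?;
  // File dependency became inconsistent: executed.
  write_until_modified(&file, "Hello, World! Changed.")?;
  pie.require_then_assert_one_execute(&task)?;
  // A different stamper makes this a new task: executed once. Changing the
  // file does not change its exists-stamp: not executed again.
  let task = ReadFile(file.clone(), FileStamper::Exists);
  pie.require_then_assert_one_execute(&task)?;
  write_until_modified(&file, "Hello, World! Changed again.")?;
  pie.require_then_assert_no_execute(&task)?;
  Ok(())
}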
In test_require_file, we first create a temporary directory and file, and a ReadFile task that reads from file. We require the task several times, asserting whether it should be executed or not:
- The task is new, so it should be executed, which we assert with require_then_assert_one_execute.
- The task is not new, and its single require file dependency is still consistent, so it should not be executed.
- We change the file the task depends on with write_until_modified.
- We require the task again. This time it should be executed, because its file dependency became inconsistent.
We repeat the test with the FileStamper::Exists stamper, which correctly results in the task only being executed once. It is a new task because its stamper is different, and it is not re-executed when the file is changed, because FileStamper::Exists only checks whether the file exists. Note that in general, FileStamper::Exists is not a good stamper to use with ReadFile, because the task will then only be re-executed when the file is added or removed. But for testing purposes, this is fine. Check that this test succeeds with cargo test.
Testing task dependencies
Now it’s time to test the more complicated task dependencies. For that, we’ll implement a task that depends on another task. Modify pie/tests/common/mod.rs:
We add a ToLower task that requires another task (stored as Box<TestTask>) to get a String, which it then converts to lowercase. We also add the into_string method to TestOutput, for conveniently getting an owned String out of a TestOutput.
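A sketch of these additions (the match arms follow the pattern of the earlier tasks):
// New TestTask variant:
ToLower(Box<TestTask>),

// In TestTask::execute:
TestTask::ToLower(string_provider_task) => {
  let string = context.require_task(string_provider_task.as_ref())?.into_string();
  Ok(string.to_lowercase().into())
}

// New TestOutput method:
pub fn into_string(self) -> String {
  match self {
    Self::String(s) => s,
  }
}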
Rust Help: Boxing to Prevent Cyclic Size Calculation
We store the string-providing task as Box<TestTask> in order to prevent cyclic size calculation, which would cause TestTask to have an undetermined size. This is due to several reasons:
- In Rust, values are stored on the stack by default. To store something on the stack, Rust needs to know its size at compile-time.
- The size of an enum is the size of its largest variant.
- The size of a struct is the sum of the sizes of its fields.
If we don’t box the task, then to calculate the size of the ToLower enum variant, we need to calculate the size of TestTask, which requires calculating the size of the ToLower variant, and so forth. Therefore, we can’t calculate the size of ToLower nor TestTask, which is an error. Boxing solves this because Box<TestTask> allocates a TestTask on the heap and stores a pointer to it. The size of Box<TestTask> is the size of one pointer, breaking the cycle in the size calculations. Note that this explanation simplifies many aspects of Rust’s size calculation.
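The classic minimal example of such a recursive type (a sketch unrelated to our build system) is a linked list:
// Without the Box, this enum would contain itself, giving it infinite size.
enum List {
  Cons(i32, Box<List>),
  Nil,
}
fn main() {
  use List::{Cons, Nil};
  let list = Cons(1, Box::new(Cons(2, Box::new(Nil))));
  // The outer value is one i32 plus one pointer; the rest lives on the heap.
  if let Cons(head, _) = &list { assert_eq!(*head, 1); }
}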
Now add a test to pie/tests/top_down.rs:
In test_require_task, we create a read task that reads from file, and a lower task that requires read.
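The setup might look as follows (a sketch; the file contents match the outputs asserted later in this test):
#[test]
fn test_require_task() -> Result<(), io::Error> {
  let mut pie = test_pie();
  let temp_dir = create_temp_dir()?;
  let file = temp_dir.path().join("in.txt");
  write(&file, "HELLO WORLD!")?;
  let read = ReadFile(file.clone(), FileStamper::Modified);
  let lower = ToLower(Box::new(read.clone()));
  // The three require-then-assert blocks discussed below go here.
  Ok(())
}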
In this test, we want to test three properties:
- When we require lower for the first time, it will require read, which will require file; read will return the contents of file as a string, and lower will turn that string into lowercase and return it.
- When we require lower when file has not been changed, no task is executed.
- When we require lower when file’s contents have changed, then first read must be executed, and then lower must be executed with the output of read.
Initial require
Test the first property by adding the following code to pie/tests/top_down.rs:
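A sketch of this block (hedged: the chapter’s full listing uses assert_matches! where this sketch uses expect, and groups the temporal sanity checks into an assert_task_temporally_sound helper):
let output = pie.require_then_assert(&lower, |tracker| {
  // Both tasks must be required and executed.
  let lower_require = tracker.first_require_task_range(&lower).expect("lower was not required");
  let lower_execute = tracker.first_execute_range(&lower).expect("lower was not executed");
  let read_require = tracker.first_require_task_range(&read).expect("read was not required");
  let read_execute = tracker.first_execute_range(&read).expect("read was not executed");
  // Sanity checks: ends come after starts; execution is nested inside requiring.
  assert!(lower_require.end() > lower_require.start());
  assert!(lower_execute.start() > lower_require.start());
  assert!(lower_require.end() > lower_execute.end());
  // read is required and executed while lower is being executed.
  assert!(read_require.start() > lower_execute.start());
  assert!(read_execute.end() < lower_execute.end());
  // file is required while read is being executed.
  let file_require = tracker.first_require_file_index(&file).expect("file was not required");
  assert!(file_require > read_execute.start());
  assert!(file_require < read_execute.end());
})?;
assert_eq!(output.as_str(), "hello world!");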
We require lower for the first time, and then assert properties using require_then_assert. The comment shows the expected build log. Inside require_then_assert, we make extensive use of the indexes and ranges from our EventTracker, and use assert_matches! to ensure these indexes and ranges exist (i.e., return Some). Ranges (RangeInclusive) are just the start and end indices of events, accessed with .start() and .end(). Indices are numbers (usize) that we can compare using the standard > operator; a higher index indicates that the event happened later. We get the ranges for requiring and executing the lower and read tasks, asserting that both tasks are required and executed. Then we perform some sanity checks in assert_task_temporally_sound:
- Require and execute end events should come after their start events.
- A task only starts being executed after it starts being required. If a task is executed without being required (and thus without being checked), we are breaking incrementality.
- A task must only finish being required after it has finished being executed. If requiring a task ends before executing it, we are breaking soundness, because we would return an inconsistent value to the requiring task.
We confirm that file is required and store the corresponding event index in file_require. Then we assert several properties:
- read is required/executed while lower is being required/executed. If read were executed after lower finished executing, we would be breaking soundness, because we would have executed lower without first requiring/executing its dependencies. If read were executed before lower started executing, we would be breaking incrementality, due to executing a task that was not required. (In this test, we would not really break incrementality if this happened, but in general we could.)
- file is required while read is being executed: a sanity check to ensure the file dependency is made by the right task.
Finally, we assert that the final output of requiring lower is "hello world!", which is the contents of the file in lowercase. Check that this test succeeds with cargo test. That concludes the first property that we wanted to test!
No changes
The second one is easier: when file has not changed, no task is executed. Add the following code to pie/tests/top_down.rs:
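A sketch of this addition:
// Nothing changed: no task should be executed.
pie.require_then_assert_no_execute(&lower)?;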
Here we change nothing and use require_then_assert_no_execute to assert that no task is executed. Check that this test succeeds with cargo test.
Changed file affects task
Now we test the third property, testing soundness and incrementality after a change. Add the following code to pie/tests/top_down.rs:
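A sketch of this block (hedged: the chapter’s version also repeats the temporal sanity checks shown earlier):
write_until_modified(&file, "!DLROW OLLEH")?;
let output = pie.require_then_assert(&lower, |tracker| {
  // Both tasks are required and executed, and the file is required.
  assert!(tracker.one_execute_of(&read));
  assert!(tracker.one_execute_of(&lower));
  assert!(tracker.first_require_file_index(&file).is_some());
  // read executes before lower executes, but still while lower is being required.
  let read_execute = tracker.first_execute_range(&read).expect("read was not executed");
  let lower_require = tracker.first_require_task_range(&lower).expect("lower was not required");
  let lower_execute = tracker.first_execute_range(&lower).expect("lower was not executed");
  assert!(read_execute.start() > lower_require.start());
  assert!(read_execute.end() < lower_execute.start());
})?;
assert_eq!(output.as_str(), "!dlrow olleh");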
We first change the file contents such that read’s file dependency becomes inconsistent, and then require lower again. In the require_then_assert block, we first assert that both tasks are required and executed, that file is required, and perform the sanity checks again.
Now let’s go back to the build log in the comment, which is a lot more complicated this time due to recursive consistency checking. The gist is:
- To check whether lower should be executed, we check its dependencies: a task dependency to read.
  - To check whether read should be executed, we check its dependencies: a file dependency, which is inconsistent, thus we execute read.
  - read executes and now returns "!DLROW OLLEH" instead of "HELLO WORLD!".
- Then we are back to checking lower’s task dependency to read, which is inconsistent because read returns a different value, which is inconsistent due to the equals output stamper.
- Thus, we execute lower, which requires read.
- We can skip checking read, because we already checked and executed it: it is deemed consistent this session. We immediately return its output "!DLROW OLLEH" to lower.
- lower turns the string lowercase and returns it.
Note that we are executing read before executing lower this time (but still while requiring lower). This is important for incrementality: if read had not returned a different output, we would not have to execute lower, because its equals output stamp would still be consistent (we call this early cutoff). We test this property with the last 3 assertions in the require_then_assert block. Finally, we assert that the output is "!dlrow olleh" as expected. Confirm that this test succeeds with cargo test.
Now that we’re testing task dependencies anyway, let’s also test a fourth property: the early cutoff behaviour.
Early cutoff
Early cutoff can happen in this test when read is re-executed due to its file dependency being inconsistent (its modified stamp changed), but returns the same output as last time. In that case, we don’t have to execute lower, because its task dependency to read is still consistent (the equals output stamp is the same). We can trigger this case by changing file such that its last modified date changes while its contents stay the same. Add the following code to pie/tests/top_down.rs:
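A sketch of this addition; write_until_modified with the same contents changes only the modification time:
write_until_modified(&file, "!DLROW OLLEH")?;
pie.require_then_assert(&lower, |tracker| {
  // read re-executes, but returns the same output, so lower is cut off early.
  assert!(tracker.one_execute_of(&read));
  assert!(!tracker.any_execute_of(&lower));
})?;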
We change file in the way we just discussed, and then assert that read is executed but lower is not. Confirm that this test succeeds with cargo test.
Early cutoff is one of the great benefits of a build system with precise dynamic dependencies. In larger builds, it can cut off large parts of the build which do not need to be executed.
In our build system, we only have the simple equals output stamper. But if you extend the build system with user-defined stampers (which isn’t too hard), task authors have much more control over early cutoff. For example, we could require a task that parses a configuration file, but use a stamper that extracts only the particular configuration option our task is using. Then, our task will only be re-executed if that configuration option changes.
Thus, stampers can increase the precision of task dependencies, which in turn increases incrementality with early cutoff.
Nice! These tests give quite some confidence that what we’ve been doing so far is sound and incremental. We can (and should) of course write more tests for better coverage of the implementation; for example, we haven’t tested tasks with multiple dependencies yet. However, in this tutorial we will move on to a couple of specific tests first, because there are several issues still hiding in our implementation: (at least) one bug, and three soundness holes. After we’ve uncovered and fixed those issues, feel free to write more tests yourself!
Download source code
Fix Superfluous Task Dependency
There is actually a massive bug in our implementation. If you noticed the issue while following this tutorial, you’re pretty observant! If you didn’t catch it, don’t worry. I actually didn’t catch it either up until this point!
I decided to keep the bug in to show how hard it is to test incrementality and soundness.
Let’s start by adding another testing task which we need to uncover the bug, and then write a test that manifests the bug.
Add ToUpper task
Modify pie/tests/common/mod.rs to add another task:
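A sketch of the addition, mirroring ToLower:
// New TestTask variant:
ToUpper(Box<TestTask>),

// In TestTask::execute:
TestTask::ToUpper(string_provider_task) => {
  let string = context.require_task(string_provider_task.as_ref())?.into_string();
  Ok(string.to_uppercase().into())
}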
The ToUpper task does (as expected) the opposite of the ToLower task: it requires the string-providing task and returns the string in uppercase.
Test case setup
Now we set up a test case to uncover the bug. Modify pie/tests/top_down.rs to add another test:
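A sketch of the setup (hedged; the strings follow the outputs asserted in this section):
#[test]
fn test_no_superfluous_task_dependencies() -> Result<(), io::Error> {
  let mut pie = test_pie();
  let temp_dir = create_temp_dir()?;
  let file = temp_dir.path().join("in.txt");
  write(&file, "Hello, World!")?;
  let read = ReadFile(file.clone(), FileStamper::Modified);
  let lower = ToLower(Box::new(read.clone()));
  let upper = ToUpper(Box::new(lower.clone()));
  // lower is new: read and lower execute; upper is never required.
  let output = pie.require(&lower)?;
  assert_eq!(output.as_str(), "hello, world!");
  // upper is new: only upper executes; read and lower are still consistent.
  let output = pie.require_then_assert(&upper, |tracker| {
    assert!(tracker.one_execute_of(&upper));
    assert!(!tracker.any_execute_of(&read));
    assert!(!tracker.any_execute_of(&lower));
  })?;
  assert_eq!(output.as_str(), "HELLO, WORLD!");
  Ok(())
}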
This test is similar to the previous one, but adds a ToUpper task that requires the ToLower task. We first require ToLower and assert that only ToLower and ReadFile are executed. ToUpper should not be executed, because we have not required it, and neither ToLower nor ReadFile require it. Then we require ToUpper and assert that it is executed. Neither ToLower nor ReadFile should be executed, because their dependencies are still consistent. Check that this test, so far, succeeds with cargo test. You can inspect the build log with cargo test --test top_down test_no_superfluous_task_dependencies to see what is going on, but it should look pretty normal. The important part of this setup is that ToLower returns "hello, world!".
Manifest the bug
Manifest the bug by modifying pie/tests/top_down.rs:
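A sketch of the addition:
// Capitalize the l characters: ReadFile must execute and return "HeLLo, WorLd!",
// ToLower must execute but returns "hello, world!" again, so ToUpper should be
// cut off early and not execute.
write_until_modified(&file, "HeLLo, WorLd!")?;
pie.require_then_assert(&upper, |tracker| {
  assert!(tracker.one_execute_of(&read));
  assert!(tracker.one_execute_of(&lower));
  assert!(!tracker.any_execute_of(&upper));
})?;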
We change file in a very specific way: we capitalize the l characters to L characters. We do this to trigger early cutoff. By changing file in this way, we expect ReadFile to execute and return "HeLLo, WorLd!". This in turn means that ToLower’s task dependency to ReadFile is inconsistent, because the output changed, so ToLower is executed. However, ToLower changes those L characters back into l and returns "hello, world!", which is the same as last time. Therefore, ToUpper’s task dependency to ToLower is still consistent, and we can cut off the build early. We assert this inside the require_then_assert block. But if you run the tests with cargo test, this test will fail! How can that be?
Test test_no_superfluous_task_dependencies will fail as expected; we will fix it in this section!
Inspect the build log with cargo test --test top_down test_no_superfluous_task_dependencies. The third (last) build log should look like this:
→ ToUpper(ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)))
? ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
→ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
✗ /tmp/.tmpzsuOJk/in.txt (old: Modified(Some(SystemTime { tv_sec: 1703256545, tv_nsec: 289192956 })) ≠ new: Modified(Some(SystemTime { tv_sec: 1703256545, tv_nsec: 293192959 })))
▶ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
- /tmp/.tmpzsuOJk/in.txt
◀ Ok(String("HeLLo, WorLd!"))
← Ok(String("HeLLo, WorLd!"))
✗ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified) (old: Equals(Ok(String("Hello, World!"))) ≠ new: Equals(Ok(String("HeLLo, WorLd!"))))
▶ ToUpper(ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)))
→ ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified))
? ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
→ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
← Ok(String("HeLLo, WorLd!"))
✗ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified) (old: Equals(Ok(String("Hello, World!"))) ≠ new: Equals(Ok(String("HeLLo, WorLd!"))))
▶ ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified))
→ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
← Ok(String("HeLLo, WorLd!"))
◀ Ok(String("hello, world!"))
← Ok(String("hello, world!"))
◀ Ok(String("HELLO, WORLD!"))
← Ok(String("HELLO, WORLD!"))
In this last build, ToUpper is required, and it checks its dependency to ReadFile. But that shouldn’t happen, because ToUpper only has a dependency to ToLower! There seems to be a bug where ToLower’s task dependency to ReadFile somehow ended up with ToUpper. We need to go back to our consistency checking code to find the cause.
Finding the cause
In the previous chapter, we implemented dynamic dependencies, including an is_inconsistent method to check whether a dependency is consistent. This is the code we used for task dependencies:
pub fn is_inconsistent<C: Context<T>>(&self, context: &mut C) -> Option<OutputStamp<T::Output>> {
let output = context.require_task(&self.task);
let new_stamp = self.stamper.stamp(output);
if new_stamp == self.stamp {
None
} else {
Some(new_stamp)
}
}
To check if a task dependency is consistent, we call require_task on the context (which calls require_task_with_stamper with a default stamper). Later on, we implemented this require_task_with_stamper method for TopDownContext:
self.session.tracker.require_task_start(task, &stamper);
let node = self.session.store.get_or_create_task_node(task);
// Get required task output by executing it if needed, or by getting the output from the store if not needed.
let already_consistent = self.session.consistent.contains(&node);
let should_execute = !already_consistent && self.should_execute_task(&node);
let output = if should_execute {
self.session.tracker.execute_start(task);
self.session.store.reset_task(&node);
let previous_executing_task = self.session.current_executing_task.replace(node);
let output = task.execute(self);
self.session.current_executing_task = previous_executing_task;
self.session.store.set_task_output(&node, output.clone());
self.session.tracker.execute_end(task, &output);
output
} else {
// Correctness: when `should_execute_task` returns `true`, the above block is executed. Otherwise this block is
// executed and `should_execute_task` ensures that the task has an output.
self.session.store.get_task_output(&node).clone()
};
let dependency = TaskDependency::new(task.clone(), stamper, output.clone());
self.session.tracker.require_task_end(&dependency, &output, should_execute);
// Create task require dependency if a task is currently executing (i.e., we are not requiring the initial task).
if let Some(current_executing_task_node) = &self.session.current_executing_task {
if self.session.store.add_task_require_dependency(current_executing_task_node, &node, dependency).is_err() {
let current_executing_task = self.session.store.get_task(current_executing_task_node);
panic!("Cyclic task dependency; current executing task '{:?}' is requiring task '{:?}' which was already required", current_executing_task, task);
}
}
self.session.consistent.insert(node);
output
}
}
In the if let Some(current_executing_task_node) block, we add a task dependency from the current executing task (if any) to the task being required. This is the cause of the bug: even when we are only consistency checking a task to see whether it should be executed, we end up adding a task dependency from the current executing task to it, which is not correct. We only manifested the bug in the last test due to having a chain of 2 task dependencies, and by carefully controlling what is being executed and what is being checked.
Recall the second build in the test_no_superfluous_task_dependencies
test.
The build log for that build looks like:
→ ToUpper(ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)))
▶ ToUpper(ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)))
→ ToLower(ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified))
? ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
→ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
✓ /tmp/.tmpzsuOJk/in.txt
← Ok(String("Hello, World!"))
✓ ReadFile("/tmp/.tmpzsuOJk/in.txt", Modified)
← Ok(String("hello, world!"))
◀ Ok(String("HELLO, WORLD!"))
← Ok(String("HELLO, WORLD!"))
In this build we are executing ToUpper, then requiring ToLower, then consistency checking ReadFile, which in turn requires ReadFile. At the end of that require, an incorrect dependency from ToUpper to ReadFile is created (although this is not really visible in the log). This incorrect dependency later breaks incrementality. To fix this bug, we need to make sure that we only add task dependencies when an executing task directly requires another task, not when consistency checking!
Fixing the bug
To fix the bug, we will separate the process of making a task consistent, which does not add task dependencies, from requiring a task, which can add task dependencies. We will split this logic into a make_task_consistent method. Modify the top-down context in pie/src/context/top_down.rs:
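A sketch of the extraction, based on the require_task_with_stamper code shown above (hedged: the exact signature, and how require_task_with_stamper threads the was-executed flag to the tracker, may differ). require_task_with_stamper then calls make_task_consistent, keeping the tracker’s require events and the dependency-creation block:
fn make_task_consistent(&mut self, task: &T) -> T::Output {
  let node = self.session.store.get_or_create_task_node(task);
  // Get task output by executing it if needed, or from the store if not.
  let already_consistent = self.session.consistent.contains(&node);
  let should_execute = !already_consistent && self.should_execute_task(&node);
  let output = if should_execute {
    self.session.tracker.execute_start(task);
    self.session.store.reset_task(&node);
    let previous_executing_task = self.session.current_executing_task.replace(node);
    let output = task.execute(self);
    self.session.current_executing_task = previous_executing_task;
    self.session.store.set_task_output(&node, output.clone());
    self.session.tracker.execute_end(task, &output);
    output
  } else {
    self.session.store.get_task_output(&node).clone()
  };
  self.session.consistent.insert(node);
  output
}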
We extracted the core of the require_task_with_stamper method into a make_task_consistent method, and call make_task_consistent in require_task_with_stamper. This is a refactoring, so require_task_with_stamper behaves the same as before. To reiterate: make_task_consistent makes the given task consistent by checking it, executing it if it is inconsistent, and returning its output. If it is already consistent, we return the cached output. In both cases we also use and update self.session.consistent: the set of tasks that are already consistent this session.
Now we need to change the is_inconsistent methods on dependencies to use make_task_consistent instead. However, the is_inconsistent method is generic over Context, which does not expose the make_task_consistent method. We also do not want to expose make_task_consistent to users of the library, as it would allow task authors to make tasks consistent without adding dependencies, which could break incrementality. Therefore, we will define a MakeConsistent trait in the dependency module, and have TopDownContext implement it. Modify pie/src/dependency.rs:
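A sketch of the trait and the updated task-dependency check (hedged; exact bounds may differ):
pub trait MakeConsistent<T: Task> {
  fn make_task_consistent(&mut self, task: &T) -> T::Output;
}

// In the TaskDependency impl, is_inconsistent now bounds on MakeConsistent:
pub fn is_inconsistent<C: MakeConsistent<T>>(&self, context: &mut C) -> Option<OutputStamp<T::Output>> {
  let output = context.make_task_consistent(&self.task);
  let new_stamp = self.stamper.stamp(output);
  if new_stamp == self.stamp { None } else { Some(new_stamp) }
}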
We add the MakeConsistent trait and use it in is_inconsistent. Now we go back to TopDownContext and implement that trait. Modify pie/src/context/top_down.rs:
We implement the MakeConsistent trait on TopDownContext, forwarding it to the make_task_consistent method. We also need to implement this trait for the NonIncrementalContext. Modify pie/src/context/non_incremental.rs:
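For the non-incremental context, making a task consistent is simply executing it. A sketch, assuming NonIncrementalContext implements Context<T>:
impl<T: Task> MakeConsistent<T> for NonIncrementalContext {
  fn make_task_consistent(&mut self, task: &T) -> T::Output {
    // No caching or checking: always execute.
    task.execute(self)
  }
}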
If you run cargo test now, test_no_superfluous_task_dependencies should succeed, indicating that we fixed the bug! However, test_require_task now fails 😅. An assert!(execute.start() > require.start()) in this test now fails. This is a sanity check asserting that a task only starts executing after it starts being required. It fails because our changes have correctly removed several superfluous task requires, which influences these assertions. Inspect the build log for this test with cargo test --test top_down test_require_task. The second build now looks like:
→ ToLower(ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified))
? ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified)
✓ /tmp/.tmpngoUZ4/in.txt
✓ ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified)
← Ok(String("hello world!"))
In this second build, ReadFile is no longer required; instead, it is only checked. This is correct, and does not make any assertions fail. The third build now looks like:
→ ToLower(ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified))
? ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified)
✗ /tmp/.tmpngoUZ4/in.txt (old: Modified(Some(SystemTime { tv_sec: 1703256546, tv_nsec: 177193271 })) ≠ new: Modified(Some(SystemTime { tv_sec: 1703256546, tv_nsec: 181193272 })))
▶ ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified)
- /tmp/.tmpngoUZ4/in.txt
◀ Ok(String("!DLROW OLLEH"))
✗ ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified) (old: Equals(Ok(String("HELLO WORLD!"))) ≠ new: Equals(Ok(String("!DLROW OLLEH"))))
▶ ToLower(ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified))
→ ReadFile("/tmp/.tmpngoUZ4/in.txt", Modified)
← Ok(String("!DLROW OLLEH"))
◀ Ok(String("!dlrow olleh"))
← Ok(String("!dlrow olleh"))
In this third build, ReadFile is also no longer required at the start; instead, it is only checked there (using make_task_consistent). It is required later in the build, while ToLower is executing, which is correct. The problem is that the assertion simply does not hold anymore: a task can now be executed without first being required. What does hold is that a task is only executed after being checked or required. However, we don’t track checking in the event tracker, so we will just remove this assertion to keep the tutorial going. We will also update the expected build logs in the comments to reflect our changes.
Fix the tests by modifying pie/tests/top_down.rs:
Confirm this fixes the tests with cargo test
.
All tests are green! 🎉🎉🎉
Download source code
Prevent Overlapping File Writes
So far we have only considered reading files in the build system. However, there are many tasks that also write files. For example, a C compiler reads a C source file (and header files) and writes an object file with the compiled result, which is typically an intermediate file that gets passed to a linker. Another example is a file copy task that reads a file and copies it to another file.
We can handle file writes in tasks with context.require_file. However, what should happen when two tasks write to the same file? In a non-incremental setting, the last writer wins by overwriting (or appending to) the file. Does this behaviour also occur in our incremental build system? Unfortunately, not always: because we can require individual tasks in a specific order, we can cause an inconsistency where the first writer wins. This is a bit tricky to explain without an example, so we will first add some testing tasks and write a test that showcases the problem.
In this section, we will continue with:
- Add the WriteFile and Sequence tasks to the testing tasks.
- Create a test_overlapping_file_write test to showcase the issue.
- Introduce a new kind of dependency: a provide file dependency for writing to (and creating) files.
- Prevent overlapping file writes by checking for them at runtime, fixing the issue.
- Improve and add additional tests.
Add WriteFile and Sequence tasks
Add the WriteFile and Sequence tasks to pie/tests/common/mod.rs:
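A sketch of the additions (the WriteFile arm matches the one shown later in this section, except that it returns TestOutput::Unit here):
// New TestTask variants:
WriteFile(Box<TestTask>, PathBuf, FileStamper),
Sequence(Vec<TestTask>),

// In TestTask::execute:
TestTask::WriteFile(string_provider_task, path, stamper) => {
  let string = context.require_task(string_provider_task.as_ref())?.into_string();
  write_until_modified(path, string.as_bytes()).map_err(|e| e.kind())?;
  context.require_file_with_stamper(path, *stamper).map_err(|e| e.kind())?;
  Ok(TestOutput::Unit)
}
TestTask::Sequence(tasks) => {
  // Require all tasks in order, propagating the first error (if any).
  for task in tasks {
    context.require_task(task)?;
  }
  Ok(TestOutput::Unit)
}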
WriteFile requires a string-providing task to produce a string, writes that string to the given file, then requires the file with the given stamper to create a dependency. It uses write_until_modified to ensure that writes change the modification time, which we need for consistent testing. Sequence requires multiple tasks, stored as a Vec<TestTask>. Both return TestOutput::Unit when successful, but propagate errors. TestOutput::Unit is like (), the unit type with a single value. Because TestOutput now has two variants, the as_str and into_string methods can now fail with a panic (which is fine for testing).
Why not use the Inconsequential Stamper?
Sequence ignores Result::Ok outputs from required tasks, but propagates Result::Err outputs. Therefore, we cannot use the Inconsequential output stamper, as it would not re-execute Sequence when a task it requires goes from returning Ok to Err, and vice versa. We could, however, implement a stamper that ignores changes to the Ok variant of results but not the Err variant, to increase incrementality.
Test to showcase the issue
Now we write a test to showcase the issue. Add the following test to pie/tests/top_down.rs:
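A sketch of the test (hedged: the name and file contents follow the description and assertions below):
#[test]
fn test_overlapping_file_write() -> Result<(), io::Error> {
  let mut pie = test_pie();
  let temp_dir = create_temp_dir()?;
  let output_file = temp_dir.path().join("out.txt");
  let ret = Return("Hi there");
  let write_1 = WriteFile(Box::new(ret.clone()), output_file.clone(), FileStamper::Modified);
  let input_file = temp_dir.path().join("in.txt");
  write(&input_file, "Hello, World!")?;
  let read = ReadFile(input_file.clone(), FileStamper::Modified);
  let write_2 = WriteFile(Box::new(read.clone()), output_file.clone(), FileStamper::Modified);
  let seq = Sequence(vec![write_1.clone(), write_2.clone()]);
  // Require seq: both writes execute, and the last writer (write_2) wins.
  pie.require(&seq)?;
  assert_eq!(read_to_string(&output_file)?, "Hello, World!");
  // Make write_2 inconsistent, but require only write_1.
  write_until_modified(&input_file, "World, Hello?")?;
  pie.require(&write_1)?;
  // The first writer wins now, even though write_2 is inconsistent.
  assert_eq!(read_to_string(&output_file)?, "Hi there");
  Ok(())
}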
In this test, we create 2 WriteFile tasks that both write to output_file. write_1 gets the string to write from ret, which returns "Hi there". write_2 gets the string to write from read, which reads the contents of input_file. Both write tasks are put into a Sequence: first write_1, then write_2. We require seq and assert that output_file contains the result of executing write_2, which requires read to get the contents of input_file. This result makes sense; it is what would happen in a non-incremental setting. However, we then modify input_file to make write_2 inconsistent, and require write_1 directly instead of requiring seq. The result is that output_file now contains "Hi there", even though write_2 is inconsistent!
This behaviour stems from the fact that we can require individual tasks, which is actually a great feature, not a bug! When we require a task, we are asking the build system to make that task consistent and to give us its up-to-date output. We are not asking the build system to make all tasks consistent. The build system recursively checks and executes only the tasks that are absolutely necessary to make the required task consistent. If it did not do that, it would not truly be incremental!
Therefore, we cannot (and shouldn’t) get rid of this feature, and instead need to find another solution to this problem. While we require tasks “manually” here, through the Pie/Session API, this problem can also occur with tasks that require other tasks. For example, if seq were just Sequence(vec![write_1]), we would end up in the same inconsistent state when requiring seq. Especially in large incremental builds with many different tasks, this can easily occur accidentally, causing subtle incrementality bugs.
Let’s go back to the test. In the test, output_file is not in a consistent state, because write_2 is inconsistent and needs to be executed to bring output_file into a consistent state. However, if write_2 wrote to another file, there would be no inconsistency. Let’s write a test with separate output files. Add the following test to pie/tests/top_down.rs:
#[test]
fn test_separate_output_files() -> Result<(), io::Error> {
let mut pie = test_pie();
let temp_dir = create_temp_dir()?;
let ret = Return("Hi there");
let output_file_1 = temp_dir.path().join("out_1.txt");
let write_1 = WriteFile(Box::new(ret.clone()), output_file_1.clone(), FileStamper::Modified);
let input_file = temp_dir.path().join("in.txt");
write(&input_file, "Hello, World!")?;
let read = ReadFile(input_file.clone(), FileStamper::Modified);
let output_file_2 = temp_dir.path().join("out_2.txt");
let write_2 = WriteFile(Box::new(read.clone()), output_file_2.clone(), FileStamper::Modified);
let seq = Sequence(vec![write_1.clone(), write_2.clone()]);
pie.require(&seq)?;
assert_eq!(read_to_string(&output_file_1)?, "Hi there");
assert_eq!(read_to_string(&output_file_2)?, "Hello, World!");
write_until_modified(&input_file, "World, Hello?")?;
// Require `write_1` to make `output_file_1` consistent.
pie.require_then_assert_no_execute(&write_1)?;
assert_eq!(read_to_string(&output_file_1)?, "Hi there");
// Require `write_2` to make `output_file_2` consistent.
pie.require_then_assert_one_execute(&write_2)?;
assert_eq!(read_to_string(&output_file_2)?, "World, Hello?");
Ok(())
}
Here, write_1 writes to output_file_1, and write_2 writes to output_file_2. Thus, requiring write_1 makes output_file_1 consistent, and requiring write_2 makes output_file_2 consistent. The last two require-then-assert statements do this, and there are no inconsistencies with these separate output files.
Therefore, to prevent confusion, inconsistencies, and (subtle) incrementality bugs, we will detect overlapping file writes and disallow them. Before continuing, confirm that both tests succeed with cargo test. We will modify the first test to assert the desired behaviour later.
Reduce Programming Errors by Returning Paths
In this last test, we can still make a programming error where we read an output file without first requiring the task that makes that output file consistent.
We can mostly solve that by having WriteFile
return the path it wrote to:
TestTask::WriteFile(string_provider_task, path, stamper) => {
let string = context.require_task(string_provider_task.as_ref())?.into_string();
write_until_modified(path, string.as_bytes()).map_err(|e| e.kind())?;
context.require_file_with_stamper(path, *stamper).map_err(|e| e.kind())?;
Ok(TestOutput::Path(path.clone()))
}
Then you can have WriteFile
take ownership of the path so we don’t accidentally use it:
let ret = Return("Hi there");
let write_1 = WriteFile(Box::new(ret.clone()), temp_dir.path().join("out_1.txt"), FileStamper::Modified);
And you can read the output file with:
{
let output_file = pie.require_then_assert_no_execute(&write_1)?;
assert_eq!(read_to_string(output_file.as_path())?, "Hi there");
}
You can still manually construct the path to the output file and read it to break this, but at least this prevents most accidental reads.
Implement provided files
We currently have no means to disallow overlapping file writes. We only have one kind of file dependency: require file, which is currently used both for reading from and writing to files. It is perfectly fine for multiple tasks to read from a single file, so we can’t disallow multiple tasks from creating a require file dependency to the same file. Therefore, we must introduce a new kind of dependency for writing to (and creating) files: the provide file dependency. A file may only be provided by one task. To implement this dependency, we will:
- Add a ProvideFile variant to Dependency.
- Update Tracker and its implementations to handle provide file dependencies.
- Add an add_file_provide_dependency method to Store.
- Add provide_file methods to Context and implement them in the context implementations.
Add ProvideFile variant to Dependency
Modify pie/src/dependency.rs:
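A sketch of the updated enum (hedged: the names of the existing variants are assumptions based on how they are used in this chapter):
pub enum Dependency<T, O> {
  RequireFile(FileDependency),
  ProvideFile(FileDependency),
  RequireTask(TaskDependency<T, O>),
}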
We add the ProvideFile variant to Dependency, handle it in is_inconsistent, and update the unit test to also test that variant. If you compile the code, you’ll get an error because this new variant needs to be handled in WritingTracker, so let’s update the trackers first.
Update Tracker and implementations
Update pie/src/tracker/mod.rs:
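A sketch of the additions, mirroring require_file_end:
// In the Tracker trait:
fn provide_file_end(&mut self, dependency: &FileDependency);

// In the Tracker implementation of CompositeTracker:
fn provide_file_end(&mut self, dependency: &FileDependency) {
  self.0.provide_file_end(dependency);
  self.1.provide_file_end(dependency);
}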
We add provide_file_end to Tracker and handle it in CompositeTracker. Update pie/src/tracker/writing.rs:
We change require_file_end to print r instead of -, to distinguish file requires from file provides. We implement the provide_file_end method, printing the provided file. Finally, we support the Dependency::ProvideFile variant by adding a branch for it in the match statement, which fixes the compilation error. Check that everything works with cargo test.
Add add_file_provide_dependency method to Store
First we need to support provide file dependencies in the Store. Update pie/src/store.rs:
We add the add_file_provide_dependency method, which does the same as add_file_require_dependency but creates a ProvideFile dependency instead. We update the test_dependencies unit test to create a provide file dependency, and add a test checking that add_file_provide_dependency panics when used incorrectly. Confirm the changes work with cargo test.
Add methods to Context and implementations
We are not creating provide file dependencies yet, so let’s work on that now. Add methods to Context, enabling tasks to create provide file dependencies, in pie/src/lib.rs:
These methods are like require_file, but must be called after writing to the file, so that the stamper creates a stamp that includes the (meta)data that was written to the file. Therefore, these methods do not return a File handle: the caller already creates a file handle for writing.
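A sketch of the two trait methods (hedged: the bounds and the default stamper follow the require_file pair and may differ):
// Provide the file at the given path, with the given stamper.
fn provide_file_with_stamper<P: Into<PathBuf>>(&mut self, path: P, stamper: FileStamper) -> Result<(), io::Error>;

// Provide the file with a default stamper (assumed to be Modified here).
fn provide_file<P: Into<PathBuf>>(&mut self, path: P) -> Result<(), io::Error> {
  self.provide_file_with_stamper(path, FileStamper::Modified)
}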
Implement this method in pie/src/context/non_incremental.rs:
The non-incremental context just ignores provided files. Implement the method in pie/src/context/top_down.rs:
Again, this method is similar to the requiring version, except that it creates a provide file dependency, and returns () instead of a file handle. Check that your changes work with cargo test.
Detect and disallow overlapping provided files
Now we will detect and disallow overlapping provided files. The only source of provided files is the provide_file_with_stamper method we just implemented, so we can easily check for overlapping dependencies there. Whenever a provide file dependency is made, we just need to check whether a task is already providing that file, and disallow that. First, add a method to Store to get the providing task (if any) for a file. Modify pie/src/store.rs:
get_task_providing_file does exactly that. We get an iterator over the incoming dependency edges of the destination file node using get_incoming_edges. We use filter_map to both filter out non-provide-file dependencies and map to a TaskNode when the dependency is a provide file dependency. Only tasks can be the source of provide file dependencies, so it is always correct to create a TaskNode here. We get the first (if any) element of the iterator with .next(), which is the task that provides the file. Because we will disallow multiple tasks from providing a file, this method returns Option<TaskNode>: there can only be 0 or 1 tasks providing a file.
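Following that description, the method might look roughly like this (a sketch: the exact graph API, node types, and debug assertion are assumptions based on the surrounding text):
pub fn get_task_providing_file(&self, dst: &FileNode) -> Option<TaskNode> {
  // Hypothetical sanity check in development builds, like in get_dependencies_of_task.
  debug_assert!(self.graph.contains_node(dst), "BUG: file node does not exist");
  self.graph.get_incoming_edges(dst)
    .filter_map(|(n, d)| match d {
      Dependency::ProvideFile(_) => Some(TaskNode(*n)),
      _ => None,
    })
    .next()
}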
We modify the dependency test again, testing that get_task_providing_file returns what we expect. As a sanity check, we assert (in development builds, like in get_dependencies_of_task) that the file node must exist in the dependency graph, and test that as well.
Now we’ll use this in provide_file_with_stamper to panic when overlap is detected. Modify pie/src/context/top_down.rs:
If we find a task that provides the file, we panic with a message explaining why. Note that we don’t have to check whether previous_providing_task_node == current_executing_task_node and skip the panic in that case. This is because when we execute a task, we first reset it, which removes all its outgoing dependencies. Therefore, current_executing_task_node cannot already have a provide file dependency to the file. Unless it provides the same file twice, but that is overlap we also want to disallow.
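A sketch of the check inside provide_file_with_stamper (hedged: node creation and the exact message differ; the message must contain "Overlapping provided file" for the test below):
if let Some(previous_providing_task_node) = self.session.store.get_task_providing_file(&node) {
  let current_executing_task = self.session.store.get_task(current_executing_task_node);
  let previous_providing_task = self.session.store.get_task(&previous_providing_task_node);
  panic!("Overlapping provided file; file '{}' is provided by task '{:?}' but was already provided by task '{:?}'",
    path.display(), current_executing_task, previous_providing_task);
}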
We discussed panicking in the section on the incremental top-down context, but want to reiterate it here. Instead of panicking, we could have provide_file_with_stamper return an error indicating that overlap was found. However, that error would then propagate throughout the entire API. Tasks would have to propagate it in their execute method, meaning that Context::require would also be able to return this error. When tasks already return their own errors, you’d end up with return types such as Result<Result<AnOutput, AnError>, OverlapError>, which are annoying to deal with. This is a hard trade-off to make, but in this tutorial (and in the actual PIE library) we panic.
Confirm our changes work with cargo test. Wait, shouldn’t the overlap test now fail? No: we haven’t changed our WriteFile task to use provide_file yet. Let’s fix that now. Modify pie/tests/common/mod.rs:
Just replace require_file_with_stamper with provide_file_with_stamper in WriteFile. Running cargo test should now make the overlap test fail!
Modify the test in pie/tests/top_down.rs:
We change the test into one that should panic. We use expected = "Overlapping provided file" to indicate that the panic message should include "Overlapping provided file", so that the test does not succeed due to another, unrelated panic. Unfortunately, tests that should panic may not return a Result. We work around that by wrapping the entire test in a nested run function that returns a Result, and by calling run().unwrap() at the end of the test. We rename the test to test_overlapping_provided_file_panics, which better describes what it is testing and what is expected. And we simplify the test a lot, because it will panic when we call require, so the rest of the test is no longer required. Run cargo test to check that this test now succeeds.
Test test_overlapping_provided_file_panics (was: test_overlapping_file_write) should now succeed.
Let’s add two more tests: one confirming that overlap is detected when we manually require two different tasks, and one confirming that requiring (and executing) the same task twice does not cause overlap. Add these tests to pie/tests/top_down.rs:
Confirm that these tests also succeed with cargo test.
We’re now preventing the inconsistencies of overlapping file writes that occur in an incremental setting. This does require some care when writing to files in a programmatic incremental build system, as task authors need to ensure that distinct tasks only write to distinct files. And we only detect this at run-time, while running the build, so task authors must test their tasks, combinations of tasks, and with different inputs, to have some certainty that their tasks have no overlapping file writes. However, I think this kind of run-time checking is preferable over incremental builds being inconsistent or incorrect.
Detect Overlap Statically?
As far as I know, there is no easy way to detect overlap statically in the presence of dynamic dependencies and incrementality. You’d have to encode file names and paths in the type system, and restrict what kind of names and paths you can use.
Matthew Hammer et al. developed Fungi, a typed functional language for incremental computation with names, to solve these kinds of problems, but it is quite involved! Be sure to read that paper and their previous work on Adapton (non-HTTPS) if you’re interested in that line of research.
In the next section, we will detect and disallow another inconsistency in incremental builds: hidden dependencies.
Download source code
Prevent Hidden Dependencies
There is one more file-based inconsistency in our incremental build system that we need to prevent: hidden dependencies. A hidden dependency occurs when a task R requires file F that is provided by another task P, without task R requiring task P.
Hidden dependencies are problematic for the same reason as overlapping provided files: we can require tasks in a specific order that causes an inconsistency. For example, we could first require task R, which reads file F, and then we could require task P, which writes to and changes file F in such a way that R’s dependency to it becomes inconsistent. This is incorrect, because we made task R consistent while its file dependency to F is inconsistent, so R should be inconsistent!
To prevent this problem, task R needs to require task P. Then, when task R is required, task P will always first be made consistent, first writing its changes to file F, before task R reads file F.
This is all a bit abstract so let’s do the same as the previous section: write tests to show the problem. In this section, we will:
- Create tests to showcase the hidden dependency problem.
- Prevent hidden dependencies by checking for them at runtime, fixing the issue.
- Improve and add additional tests.
Test to showcase the issue
Add the following test to pie/tests/top_down.rs
:
// Hidden dependency tests
#[test]
fn test_hidden_dependency() -> Result<(), io::Error> {
let mut pie = test_pie();
let temp_dir = create_temp_dir()?;
let file = temp_dir.path().join("in_out.txt");
write(&file, "Hello, World!")?;
let read = ReadFile(file.clone(), FileStamper::Modified);
let input_file = temp_dir.path().join("in.txt");
write(&input_file, "Hi there")?;
let read_for_write = ReadFile(input_file.clone(), FileStamper::Modified);
let write = WriteFile(Box::new(read_for_write.clone()), file.clone(), FileStamper::Modified);
// Require `write` and `read`, assert they are executed because they are new.
pie.require_then_assert_one_execute(&write)?;
assert_eq!(read_to_string(&file)?, "Hi there");
let output = pie.require_then_assert_one_execute(&read)?;
assert_eq!(output.as_str(), "Hi there");
// Although there is a hidden dependency here (`read` doesn't require `write`), we happened to have required `write`
// and `read` in the correct order, so there is no inconsistency yet. The output of `read` is `"Hi there"` which is
// correct.
Ok(())
}
In this test, task read reads from file, and task write writes to file. Task write gets the string to write through read_for_write, which reads it from input_file. There is a hidden dependency here, because read reads file, which is provided by write, without read actually requiring write. We can say that the dependency from read to write is hidden by file. We first require write, assert that it is executed, and assert that file now contains "Hi there", which is what write wrote into file. Then we require read and assert that it is executed and returns "Hi there". Even though there is a hidden dependency, we have not observed an inconsistency yet, because we required the tasks in the correct order.
Now extend this test in pie/tests/top_down.rs:
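A sketch of the extension (the string "Hello There!" follows the description below):
// Change input_file such that read_for_write, and thus write, becomes inconsistent.
write_until_modified(&input_file, "Hello There!")?;
// Require the tasks in the opposite order: first read, then write. read is
// still consistent, so it is not executed and returns the stale "Hi there".
let output = pie.require_then_assert_no_execute(&read)?;
assert_eq!(output.as_str(), "Hi there");
// Only now does write execute and write "Hello There!" to file.
pie.require_then_assert_one_execute(&write)?;
assert_eq!(read_to_string(&file)?, "Hello There!");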
We change input_file such that read_for_write becomes inconsistent, making write inconsistent. Basically, write now wants to write "Hello There!" to file. We then require the tasks in the opposite order, first read and then write, but the result is incorrect. Requiring read still returns "Hi there", even though write is inconsistent and needs to first write "Hello There!" to file before read reads it! Requiring read should really return "Hello There!".
Similarly to overlapping provided files, this inconsistent behaviour is caused by the ability to require individual tasks, and our build system (incrementally and correctly) making only the required task (and its dependencies) consistent. This inconsistent behaviour is undesirable, and should be prevented.
Before continuing, confirm the test succeeds with `cargo test`.
We will modify this test to assert the desired behaviour later.
Prevent hidden dependencies
There are two ways in which a hidden dependency can be manifested:
- When a task R requires file F: if F is provided by task P, and R does not require P, there is a hidden dependency.
- When a task P provides file F: if F is required by tasks R*, and one or more tasks from R* does not require P, there is a hidden dependency.
We already saw an example of the first case in the test. The second case occurs when a task first requires a file that is not yet provided by a task, but then later on a task provides it. In both cases, the hidden dependency can result in tasks reading from a file that will later be written to (provided) by another task, which leaves those reading tasks in an inconsistent state.
We will need to check for both cases.
The first case can be checked in the `require_file_with_stamper` method, and the second in `provide_file_with_stamper`. Both checks need some way to query whether a task depends on another task. We could query whether task R depends on P directly, and that would work fine. However, sometimes task R will not require P directly, but require it through some other task(s) that eventually require P. This is still correct, because P will be made consistent before R. Therefore, we need to add a method to `Store` to query whether a task directly or indirectly (also called transitively) depends on another task. Furthermore, for the second check we need to get all tasks that require a file, for which we will also need a `Store` method.
Add `Store` methods
Let’s add those methods to `Store`. Modify `pie/src/store.rs`:
We add the `get_tasks_requiring_file` method, which does what it says on the tin. It is almost identical to `get_task_providing_file`, but returns an `Iterator`, because multiple tasks can require a single file. We also have to make the lifetimes more explicit, to explain to Rust that the lifetimes on `self` and `dst` are not related to the implicit `'_` lifetime of the iterator. This works because we are not borrowing anything in the iterator: our `filter_map` copies nodes with `TaskNode(*n)`. The `contains_transitive_task_dependency` method also does what it says. Luckily, the graph library takes care of this query.
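For reference, here is a sketch of what these two methods could look like. The `get_incoming_edges` and `contains_transitive_edge` calls on the graph, and the `is_require_file` helper on `Dependency`, are assumptions about the APIs in use; adapt them to your implementation.

/// Get an iterator over all task nodes that require file `dst`.
pub fn get_tasks_requiring_file<'a>(&'a self, dst: &'a FileNode) -> impl Iterator<Item=TaskNode> + 'a {
  self.graph.get_incoming_edges(dst)
    // Keep only require-file dependencies; copying the node with `TaskNode(*n)`
    // ensures the iterator does not borrow from `self` or `dst`.
    .filter_map(|(n, d)| if d.is_require_file() { Some(TaskNode(*n)) } else { None })
}

/// Checks whether task `src` directly or transitively depends on task `dst`.
pub fn contains_transitive_task_dependency(&self, src: &TaskNode, dst: &TaskNode) -> bool {
  self.graph.contains_transitive_edge(src, dst)
}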
Per usual, add some tests for these methods in `pie/src/store.rs`:
We assert that the new methods return what is expected in `test_dependencies`, and add tests confirming panics when these methods are used on non-existent nodes.
Add checks to `TopDownContext`
Now we can add hidden dependency checks to `TopDownContext`. Add the checks to `pie/src/context/top_down.rs`:
We add the first check to `require_file_with_stamper`, where the current executing task is requiring a file. We check whether a task provides the file being required, and (inside the `if let`) check that the current executing task requires the providing task. If not, we panic with a hidden dependency error. Similarly, in `provide_file_with_stamper`, we perform the second check. Because multiple tasks can require a file, we perform the check for every requiring task with a `for` loop. If any requiring task fails the check, there is a hidden dependency, and we panic with an error.
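The two checks could look something like the following sketch; variable names such as `file_node` and `current_executing_task_node` are assumptions based on this chapter’s implementation, and the error formatting is simplified.

// In require_file_with_stamper: if a task provides the file being required, the
// current executing task must (transitively) require that providing task.
if let Some(providing_task_node) = self.session.store.get_task_providing_file(&file_node) {
  if !self.session.store.contains_transitive_task_dependency(current_executing_task_node, &providing_task_node) {
    panic!("Hidden dependency; requiring file '{}' which is provided by another task, without a task dependency to that task", path.display());
  }
}

// In provide_file_with_stamper: every task that already requires the file being
// provided must (transitively) require the current executing task.
for requiring_task_node in self.session.store.get_tasks_requiring_file(&file_node) {
  if !self.session.store.contains_transitive_task_dependency(&requiring_task_node, current_executing_task_node) {
    panic!("Hidden dependency; providing file '{}' which is required by another task, without a task dependency from that task", path.display());
  }
}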
That’s it! Test your changes with `cargo test`, which should make the `test_hidden_dependency` test fail, as expected!
Fixing and improving the tests
Like with the overlapping provided file test, we’ll heavily simplify our test to only test that it panics. Modify `pie/tests/top_down.rs`:

We check for a “Hidden dependency” panic, rename the test, wrap it in a nested `run` function to support `Result`, and simplify it. The second call to `require_then_assert_one_execute` will panic due to a hidden dependency: `read` requires `file` without a task dependency to `write`.
Now add a test for the second case to `pie/tests/top_down.rs`:
#[test]
#[should_panic(expected = "Hidden dependency")]
fn test_provide_hidden_dependency_panics() {
fn run() -> Result<(), io::Error> {
let mut pie = test_pie();
let temp_dir = create_temp_dir()?;
let file = temp_dir.path().join("in_out.txt");
write(&file, "Hello, World!")?;
let read = ReadFile(file.clone(), FileStamper::Modified);
let write = WriteFile(Box::new(Return("Hi there")), file.clone(), FileStamper::Modified);
pie.require_then_assert_one_execute(&read)?;
pie.require_then_assert_one_execute(&write)?;
Ok(())
}
run().unwrap();
}
Here, the second call to `require_then_assert_one_execute` will panic due to a hidden dependency: `write` provides `file`, which is required by `read`, which does not have a task dependency to `write`.
Confirm both tests succeed with `cargo test`. All tests are succeeding again 🎉: test `test_require_hidden_dependency_panics` (was: `test_hidden_dependency`) should now succeed.
We should also write some tests that show that non-hidden (visible?) dependencies do actually work. However, our `ReadFile` task is not capable of making task dependencies at all, so we will need to fix that first (and unfortunately refactor all uses of `ReadFile`). Modify `pie/tests/common/mod.rs`:

We add an optional task argument to `ReadFile`, which we require when the read task is executed. We call this optional task argument an `origin`, a shorthand for “originating task”. This is a pattern that appears in programmatic build systems, where a task requires certain files, but those files could be provided by another task. Instead of `Option<Box<TestTask>>`, we could also use `Vec<TestTask>` if multiple originating tasks are required.
Due to disallowing hidden dependencies, we need to make these originating tasks explicit, which unfortunately requires some additional work when authoring tasks. However, the reward is that the build system will incrementalize running our tasks for free, and also ensure that the incremental build is correct. Also, I think it is not such a bad idea to be explicit about these dependencies, because these really are dependencies that exist in the build!
Returning Paths
A slightly cleaner approach would be to make `WriteFile` return the path it wrote to, and change `ReadFile` to accept a task in place of its `PathBuf` argument. Then we could pass a `WriteFile` task as the path argument for `ReadFile`. We already hinted at this approach in the “Reduce Programming Errors by Returning Paths” block from the previous section. However, that change would require a bigger refactoring, so we’ll go with the simpler (but also more flexible) approach in this tutorial.
Now we need to refactor the tests to provide `None` as the origin task for every `ReadFile` task we create. Modify `pie/tests/top_down.rs`:
Confirm your changes are correct with `cargo test`.

Now add the following test to `pie/tests/top_down.rs`:
#[test]
fn test_non_hidden_dependency() -> Result<(), io::Error> {
let mut pie = test_pie();
let temp_dir = create_temp_dir()?;
let file = temp_dir.path().join("in_out.txt");
write(&file, "Hello, World!")?;
let input_file = temp_dir.path().join("in.txt");
write(&input_file, "Hi There!")?;
let read_input = ReadFile(input_file.clone(), FileStamper::Modified, None);
let write = WriteFile(Box::new(read_input.clone()), file.clone(), FileStamper::Modified);
let read = ReadFile(file.clone(), FileStamper::Modified, Some(Box::new(write.clone())));
// Require `read`, which requires `write` to update the provided file. All tasks are executed because they are new.
let output = pie.require_then_assert(&read, |tracker| {
assert!(tracker.one_execute_of(&read));
assert!(tracker.one_execute_of(&write));
assert!(tracker.one_execute_of(&read_input));
})?;
// `read` should output what `write` wrote, which is what `read_input` read from `input_file`.
assert_eq!(output.as_str(), "Hi There!");
Ok(())
}
This is similar to earlier tests, but now we create an explicit dependency from `read` to `write` by passing in the `write` task as the last argument to `ReadFile`. When we require `read`, it will first require its origin task `write` to make `file` up-to-date, and then require `file` and read from it. This is not a hidden dependency: `file` is provided by `write`, but `read` has a dependency to `write`!
Now let’s test what happens if we remove `file`. Modify the test in `pie/tests/top_down.rs`:

When we remove `file` and require `read`, `read` will check its task dependency to `write`. `write` is inconsistent due to the file being removed (its modified stamp becomes `None`), so it will re-execute and re-generate the provided file! This is another great benefit of the precise dynamic dependencies in programmatic builds: removing an intermediate or output file does not break the build. Instead, the file is just re-generated as needed, and the build is brought into a consistent state again. Similarly, modifying `file` would result in the same behaviour: the provided file is re-generated and does not break the build.

Unfortunately, `read` is re-executed as well, because its dependency to `file` is inconsistent due to the changed modified date of `file`. If we implement a file contents hash stamper and use that as the stamper for `file`, we can prevent this re-execution, because the file contents are still the same. This of course is not free, as hashing file contents has an I/O and processing (CPU) overhead. In this case, `read` is so simple that the overhead of a hash stamper would be larger than the gains of not executing `read`. But for expensive tasks with lots of I/O operations and/or processing, a file contents hash stamper makes a lot of sense.
As the last test, we will modify `input_file` and confirm that changes to that file propagate to `read`. Modify the test in `pie/tests/top_down.rs`:

Confirm the test succeeds with `cargo test`.
We’ve now fixed the last file dependency inconsistency in the build system. Absence of overlapping provided files ensures that provided files always end up in a consistent state, even in a programmatic incremental setting. Absence of hidden dependencies ensures that when a task requires a file, that file is always in a consistent (up-to-date) state.
We needed to prevent these issues in order to make the builds correct under incrementality. However, we can also use these properties in alternative context implementations to reason about whether certain states can occur. For example, the actual PIE library contains a bottom-up incremental build context that instead performs incremental builds from the bottom up. There we can use the absence of overlapping provided files and hidden dependencies to reason that a bottom-up build can correctly skip checking tasks in certain situations, increasing incrementality. We do not (currently) cover bottom-up builds in this tutorial, but I found it important to highlight that these are fundamental properties.
A file or directory can be a symbolic link to another file or directory. In this tutorial we do not deal with symbolic links at all, and this is a threat to correctness. For example, a task could circumvent a hidden dependency by creating a new symbolic link that links to the file it wants to read, where the linked-to file is provided by a task. An overlapping provided file can be made in a similar way.
Therefore, we should resolve symbolic links, right… right? Surely this should be easy.
One does not simply resolve a symbolic link in an incremental system. The problem is that creating a dependency to a symbolic link is actually creating two dependencies:
- a dependency to the symbolic link file/directory, with the stamper working on the link,
- a dependency to the linked-to file/directory, with the stamper working on the linked-to file/directory.
But wait, there’s more. A symbolic link can point to a file that is itself another symbolic link! Therefore, any file dependency could become many file dependencies, and we have to recursively traverse the symbolic link tree.
What if I told you that there can even be cycles in symbolic links? In that case, creating a file dependency actually creates infinitely many file dependencies!
We chose not to deal with this in the tutorial for simplicity. In fact, I would almost refuse to support symbolic links, as they are the root of all evil from an incremental build system’s perspective.
Performance Impact of Symbolic Links
Symbolic links can be a performance problem, because resolving a symbolic link requires a system call, and we would need to resolve every path. We need to resolve every path because any path could point to a file or directory that again points to a different file or directory (and this can even be recursive!). Therefore, the presence of symbolic links turns simple and cheap path operations into complex and expensive system calls.
There is an even simpler way than symbolic links to circumvent our checks: just create different paths that point to the same file.
For example, `in_out.txt` and `./in_out.txt` both point to the same file, but are different paths (i.e., comparing them with `Eq` will return `false`). The issue is that we use non-canonical paths in the dependency graph, and thus also to check for overlapping provided files and hidden dependencies. Instead, we should first canonicalize a path: converting relative paths to absolute ones, removing excess `..` and `.` parts, and more.
We could use Rust’s `canonicalize` function, but on Windows this returns paths that many tools do not support. The dunce library resolves this issue by canonicalizing to more compatible paths on Windows.
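A tiny illustration of the problem and of canonicalization; this sketch assumes a file `in_out.txt` exists in the current directory and that dunce is added as a dependency:

use std::path::Path;

fn main() -> std::io::Result<()> {
  // Two different paths pointing to the same file compare as unequal.
  assert_ne!(Path::new("in_out.txt"), Path::new("./in_out.txt"));
  // After canonicalization they are equal; note that this also resolves symbolic links.
  let a = dunce::canonicalize("in_out.txt")?;
  let b = dunce::canonicalize("./in_out.txt")?;
  assert_eq!(a, b);
  Ok(())
}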
However, canonicalizing a path also resolves symbolic links. If we resolve symbolic links but do not create separate dependencies to link files and linked-to files, we are breaking incrementality and correctness there.
We have four options:
- Write our own path canonicalization function that does not resolve symbolic links. Document that a dependency to a symbolic link only results in a dependency to the symbolic link file, which breaks incrementality and correctness when the linked-to file changes.
- Write our own path canonicalization function that also correctly resolves symbolic links by creating dependencies to both link files and linked-to files, handle recursion, and handle cycles.
- Canonicalize the path. Document that a dependency to a symbolic link only results in a dependency to the pointed-to file, which breaks incrementality and correctness when the link changes.
- Don’t care about any of this.
In this tutorial, we go for option 4 for simplicity. Personally, I would choose option 3, unless it is critical that symbolic links are handled in a correct way (then I’d have to choose option 2 and be grumpy).
There are many other ways to circumvent the hidden dependency check. A simple one is to just not create a dependency! We cannot fully waterproof our system, just like you can circumvent Rust’s safety with `unsafe` or by sharing mutable state via files. That is fine. We should at least try our best to catch accidents, such as accidentally using different non-canonical paths for the same file.
In the next section, we will fix the remaining correctness issue related to cyclic tasks.
Download source code
Prevent Cycles
In this section, we will fix the remaining correctness issue with cyclic tasks.
Didn’t we already catch dependency graph cycles in the Incremental Top-Down Context section? Yes, you remembered right! However, there is a corner case that we didn’t handle. The issue is that we add a task dependency to the dependency graph only after the task has finished executing. We do this because we need the output from executing the task to create the dependency.
But what would happen if we made a task that just requires itself? Let’s figure that out in this section, in which we will:
- Add cyclic tasks to the testing tasks.
- Create tests to showcase the cyclic task execution problem.
- Prevent cycles by reserving a task dependency before executing the task.
Add cyclic testing tasks
We don’t have any testing tasks to easily construct different kinds of cycles yet, so we will add those first.
Modify `pie/tests/common/mod.rs`:

We add the `RequireSelf` task, which directly requires itself. We also add the `RequireA` and `RequireB` tasks, which require each other in a cycle. We want to prevent both of these kinds of cycles.
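A sketch of the new variants and their `execute` arms, assuming the `TestTask` enum from the common testing module:

// New variants in the TestTask enum:
//   RequireSelf,
//   RequireA,
//   RequireB,
// And their arms in execute, each requiring the task that closes the cycle:
TestTask::RequireSelf => context.require_task(&TestTask::RequireSelf),
TestTask::RequireA => context.require_task(&TestTask::RequireB),
TestTask::RequireB => context.require_task(&TestTask::RequireA),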
Add cycle tests
Now add tests that check whether requiring these tasks (correctly) panics due to cycles. Modify `pie/tests/top_down.rs`:
// Cycle tests
#[test]
#[should_panic(expected = "Cyclic task dependency")]
fn require_self_panics() {
let mut pie = test_pie();
pie.require(&RequireSelf).unwrap();
}
#[test]
#[should_panic(expected = "Cyclic task dependency")]
fn require_cycle_a_panics() {
let mut pie = test_pie();
pie.require(&RequireA).unwrap();
}
#[test]
#[should_panic(expected = "Cyclic task dependency")]
fn require_cycle_b_panics() {
let mut pie = test_pie();
pie.require(&RequireB).unwrap();
}
These tests are simple: require the task, and that’s it. Which of these tests will correctly result in a cyclic task dependency panic?

Running these tests will result in infinite recursion, which should quickly cause a stack overflow. However, be sure to save everything in the event your computer becomes unresponsive.

Tests `require_self_panics`, `require_cycle_a_panics`, and `require_cycle_b_panics` will fail as expected, which we will fix in this section! Run the tests with `cargo test`, or skip running them (and comment them out) if you don’t want to waste battery life running infinite recursions. These tests will infinitely recurse and thus fail.
The issue is that we only add a dependency to the dependency graph after the task has executed. We do this because we need the output from the executing task to create the dependency. Therefore, no dependencies are ever added to the dependency graph in these tests, because a task never finishes executing! This in turn causes the cycle detection to never trigger, because there is no cycle in the dependency graph.
To fix this, we need to add task dependencies to the dependency graph before we execute the task. But we cannot do this, because we need the output of the task to create the task dependency. Therefore, we need to first reserve a task dependency in the dependency graph, which creates an edge in the dependency graph without any attached data.
Reserving task dependencies
To support reserved task dependencies, we will first add a `ReservedRequireTask` variant to `Dependency`. Modify `pie/src/dependency.rs`:

The `ReservedRequireTask` variant has no data, as this variant needs to be creatable before we have the output of the task. A reserved task dependency cannot be consistency checked, so we panic if this occurs; but this will never occur if our implementation is correct. A reserved task dependency is only added from the current executing task that is being made consistent, and we never check a task that is already consistent this session. So as long as the reserved task dependency is updated to a real `RequireTask` dependency within this session, we will never check a reserved task dependency. Again, it is great that we have defined these kinds of properties/invariants of the build system, so that we can informally reason about whether certain situations can occur or not.
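A sketch of the extended `Dependency` enum; the other variants follow earlier sections of this tutorial, so treat the exact shapes as assumptions:

pub enum Dependency<T, O> {
  RequireFile(FileDependency),
  ProvideFile(FileDependency),
  RequireTask(TaskDependency<T, O>),
  /// A reserved task dependency: an edge without data, created before the
  /// required task executes, and updated to `RequireTask` once its output is known.
  ReservedRequireTask,
}

// In the consistency checking method, encountering a reserved dependency is a bug:
//   Dependency::ReservedRequireTask => panic!("BUG: consistency checking a reserved task dependency"),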
This change breaks the `WritingTracker`, which we will update in `pie/src/tracker/writing.rs`:

Since reserved task dependencies are never checked, we can just ignore them in the `check_dependency_end` method.
Now we update the `Store` with a method to reserve a task dependency, and a method to update a reserved task dependency to a real one. Modify `pie/src/store.rs`:

We rename `add_task_require_dependency` to `reserve_task_require_dependency`, change it to add `Dependency::ReservedRequireTask` as the edge dependency data, and remove the `dependency` parameter since we don’t need it anymore. Note that this method still creates the dependency edge, and can thus still create cycles which need to be handled. This is good, because it allows us to catch cycles before we start checking and potentially executing a task. For example, this will catch the self-cycle created by `TestTask::RequireSelf`, because `graph.add_edge` returns a cycle error on a self-cycle. We add the `update_task_require_dependency` method to update a reserved task dependency to a real one.
As per usual, we will update the tests. Modify `pie/src/store.rs`:

We update `test_dependencies` to reserve and update task dependencies, and rename `test_add_task_require_dependency_panics`. We also add two tests for `update_task_require_dependency`.
The store now handles reserved task dependencies, so now we can use them in `TopDownContext`. Task dependencies are made in `require_task_with_stamper`, so we just need to update that method. Modify `pie/src/context/top_down.rs`:
Before we call `make_task_consistent` and potentially execute a task, we first reserve the task dependency (if a task is currently executing). Since `reserve_task_require_dependency` can now create cycles, we move the cycle check to the start. As mentioned before, this will catch self-cycles. Additionally, this adds a dependency edge to the graph that is needed to catch future cycles, such as the cycle between `TestTask::RequireA` and `TestTask::RequireB`. For example, `TestTask::RequireA` executes and requires `TestTask::RequireB`, so we reserve an edge from A to B. Then we require and execute B, which requires A, so we reserve an edge from B to A, and the cycle is detected! If the edge from A to B was not in the graph before executing B, we would not catch this cycle.

After the call to `make_task_consistent` we have the consistent output of the task, and we update the reserved dependency to a real one with `update_task_require_dependency`.
This will correctly detect all cyclic tasks.
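Put together, the updated flow in `require_task_with_stamper` could look something like this sketch; helper names such as `make_task_consistent` follow this chapter, so treat the details as assumptions:

let node = self.session.store.get_or_create_task_node(task);
// Reserve the dependency edge *before* making the task consistent, catching cycles early.
if let Some(current_executing_task_node) = &self.current_executing_task {
  if self.session.store.reserve_task_require_dependency(current_executing_task_node, &node).is_err() {
    panic!("Cyclic task dependency; requiring task '{:?}' created a cycle", task);
  }
}
// Make the task consistent, which may execute it, producing its consistent output.
let output = self.make_task_consistent(task, &node);
// Now that we have an output, update the reserved dependency to a real one.
if let Some(current_executing_task_node) = &self.current_executing_task {
  let dependency = TaskDependency::new(task.clone(), stamper, output.clone());
  self.session.store.update_task_require_dependency(current_executing_task_node, &node, dependency);
}
output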
Confirm your changes work and all tests now succeed with `cargo test`.

Tests `require_self_panics`, `require_cycle_a_panics`, and `require_cycle_b_panics` should now succeed.
We don’t need to write additional tests, as these 3 tests capture the kind of cycles we wanted to fix. Additional positive tests are not really needed, as the other tests cover the fact that cycles are only detected when there actually is one.
This is the last correctness issue that needed to be solved. Our programmatic incremental build system is now truly incremental (minimal) and correct (sound)! There are of course certain caveats, such as non-canonical paths and symbolic links, which would need to be solved for additional correctness. We will not do that in this tutorial, but feel free to solve those issues (and write tests for them!).
Download source code
In the next chapter, we will implement a “parser development” application using PIE, which can do batch builds but also provides an interactive parser development environment, using a single set of tasks.
Project: Interactive Parser Development
To demonstrate what can be done with the programmatic incremental build system we just created, we will develop a “parser development” build script and interactive editor as a project. In this project, we can develop a grammar for a new (programming) language, and test that grammar against several example files written in the new language.
It will have both a batch mode and an interactive mode. In the batch mode, the grammar is checked and compiled, the example program files are parsed with the grammar, and the results are printed to the terminal. The interactive mode will start up an interactive editor in which we can develop and test the grammar interactively. We will develop tasks to perform grammar compilation and parsing, and incrementally execute them with PIE. Both batch and interactive mode will use the same tasks!
We will use pest as the parser framework, because it is written in Rust and can be easily embedded into an application. Pest uses Parsing Expression Grammars (PEGs), which are easy to understand; that is also good for this project.
For the GUI, we will use Ratatui, which is a cross-platform terminal GUI framework, along with tui-textarea for a text editor widget. We could use a more featured GUI framework like egui, but for this project we’ll keep it simple and runnable in a terminal.
As a little teaser, this is what the interactive mode looks like:
We will continue as follows:
- Implement compilation of pest grammars and parsing of text with the compiled grammar.
- Create tasks for grammar compilation and parsing.
- Parse CLI arguments and run these tasks in a non-interactive setting.
- Create a terminal GUI for interactive parser development.
Compiling Grammars and Parsing
First we will implement compilation of pest grammars, and parsing text with a compiled grammar.
A pest grammar contains named rules that describe how to parse something. For example, `number = { ASCII_DIGIT+ }` means that a `number` is parsed by parsing 1 or more `ASCII_DIGIT`, with `ASCII_DIGIT` being a builtin rule that parses the ASCII digits 0-9.

Add the following dev-dependencies to `pie/Cargo.toml`:
- pest is the library for parsing with pest grammars.
- pest_meta validates, optimises, and compiles pest grammars.
- pest_vm provides parsing with a compiled pest grammar, without having to generate Rust code for grammars, enabling interactive use.
Create the `pie/examples/parser_dev/main.rs` file and add an empty main function to it:
fn main() {
}
Confirm the example can be run with `cargo run --example parser_dev`.
Let’s implement the pest grammar compiler and parser. Add `parse` as a public module to `pie/examples/parser_dev/main.rs`:

We will add larger chunks of code from now on, compared to the rest of the tutorial, to keep things going.

Create the `pie/examples/parser_dev/parse.rs` file and add to it:
use std::collections::HashSet;
use std::fmt::Write;
/// Parse programs with a compiled pest grammar.
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct CompiledGrammar {
rules: Vec<pest_meta::optimizer::OptimizedRule>,
rule_names: HashSet<String>,
}
impl CompiledGrammar {
/// Compile the pest grammar from `grammar_text`, using `path` to annotate errors. Returns a [`Self`] instance.
///
/// # Errors
///
/// Returns `Err(error_string)` when compiling the grammar fails.
pub fn new(grammar_text: &str, path: Option<&str>) -> Result<Self, String> {
match pest_meta::parse_and_optimize(grammar_text) {
Ok((builtin_rules, rules)) => {
let mut rule_names = HashSet::with_capacity(builtin_rules.len() + rules.len());
rule_names.extend(builtin_rules.iter().map(|s| s.to_string()));
rule_names.extend(rules.iter().map(|s| s.name.clone()));
Ok(Self { rules, rule_names })
},
Err(errors) => {
let mut error_string = String::new();
for mut error in errors {
if let Some(path) = path.as_ref() {
error = error.with_path(path);
}
error = error.renamed_rules(pest_meta::parser::rename_meta_rule);
let _ = writeln!(error_string, "{}", error); // Ignore error: writing to String cannot fail.
}
Err(error_string)
}
}
}
}
The `CompiledGrammar` struct contains a parsed pest grammar, consisting of a `Vec` of optimised parsing rules and a hash set of rule names. We will use this struct as the output of a task in the future, so we derive `Clone`, `Eq`, and `Debug`.

The `new` function takes the text of a pest grammar and an optional file path for error reporting, and creates a `CompiledGrammar` or an error in the form of a `String`. We’re using `String`s as errors in this example for simplicity. We compile the grammar with `pest_meta::parse_and_optimize`. If successful, we gather the rule names into a hash set and return a `CompiledGrammar`. If not, multiple errors are returned, which are first preprocessed with `with_path` and `renamed_rules`, and then written to a single `String` with `writeln!`, which is returned as the error.
Now we implement parsing using a `CompiledGrammar`. Add the `parse` method to `pie/examples/parser_dev/parse.rs`:
`parse` takes the text of the program to parse, the rule name to start parsing with, and an optional file path for error reporting. We first check whether `rule_name` exists by looking it up in `self.rule_names`, and return an error if it does not exist. We have to do this because `pest_vm` panics when the rule name does not exist, which would kill the entire program. If the rule name is valid, we create a `pest_vm::Vm` and `parse`. If successful, we get a `pairs` iterator that describes how the program was parsed, which is typically used to create an Abstract Syntax Tree (AST) in Rust code. However, for simplicity we just format the pairs as a `String` and return that. If not successful, we do the same as in the previous function, but for a single error instead of multiple.

Unfortunately, we cannot store a `pest_vm::Vm` in `CompiledGrammar`, because `Vm` implements neither `Clone` nor `Eq`. Therefore, we have to create a new `Vm` every time we parse, which has a small performance overhead, but that is fine for this example.
To check whether this code does what we want, we’ll write a test for it (yes, you can add tests to examples in Rust!). Add to `pie/examples/parser_dev/parse.rs`:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_compile_parse() -> Result<(), String> {
// Grammar compilation failure.
let result = CompiledGrammar::new("asd = { fgh } qwe = { rty }", None);
assert!(result.is_err());
println!("{}", result.unwrap_err());
// Grammar that parses numbers.
let compiled_grammar = CompiledGrammar::new("num = { ASCII_DIGIT+ }", None)?;
println!("{:?}", compiled_grammar);
// Parse failure
let result = compiled_grammar.parse("a", "num", None);
assert!(result.is_err());
println!("{}", result.unwrap_err());
// Parse failure due to non-existent rule.
let result = compiled_grammar.parse("1", "asd", None);
assert!(result.is_err());
println!("{}", result.unwrap_err());
// Parse success
let result = compiled_grammar.parse("1", "num", None);
assert!(result.is_ok());
println!("{}", result.unwrap());
Ok(())
}
}
We test grammar compilation failure and success, and parse failure and success. Run this test with `cargo test --example parser_dev -- --show-output`, which also shows what the returned `String`s look like.
Task Implementation
Now we’ll implement tasks for compiling a grammar and parsing.
Add task
as a public module to pie/examples/parser_dev/main.rs
:
Create the pie/examples/parser_dev/task.rs
file and add to it:
use std::io::Read;
use std::path::{Path, PathBuf};
use pie::{Context, Task};
use crate::parse::CompiledGrammar;
/// Tasks for compiling a grammar and parsing files with it.
#[derive(Clone, Eq, PartialEq, Hash, Debug)]
pub enum Tasks {
CompileGrammar { grammar_file_path: PathBuf },
Parse { compiled_grammar_task: Box<Tasks>, program_file_path: PathBuf, rule_name: String }
}
impl Tasks {
/// Create a [`Self::CompileGrammar`] task that compiles the grammar in file `grammar_file_path`.
pub fn compile_grammar(grammar_file_path: impl Into<PathBuf>) -> Self {
Self::CompileGrammar { grammar_file_path: grammar_file_path.into() }
}
/// Create a [`Self::Parse`] task that uses the compiled grammar returned by requiring `compiled_grammar_task` to
/// parse the program in file `program_file_path`, starting parsing with `rule_name`.
pub fn parse(
compiled_grammar_task: &Tasks,
program_file_path: impl Into<PathBuf>,
rule_name: impl Into<String>
) -> Self {
Self::Parse {
compiled_grammar_task: Box::new(compiled_grammar_task.clone()),
program_file_path: program_file_path.into(),
rule_name: rule_name.into()
}
}
}
/// Outputs for [`Tasks`].
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum Outputs {
CompiledGrammar(CompiledGrammar),
Parsed(Option<String>)
}
We create a `Tasks` enum with:
- A `CompileGrammar` variant for compiling a grammar from a file.
- A `Parse` variant that uses the compiled grammar returned from another task to parse a program in a file, starting parsing with a specific rule given by name.

`compile_grammar` and `parse` are convenience functions for creating these variants. We derive `Clone`, `Eq`, `Hash`, and `Debug`, as these are required for tasks. We create an `Outputs` enum for storing the results of these tasks, and derive the required traits.

Since both tasks will require a file, and we’re using `String`s as errors, we implement a convenience function for this. Add to `pie/examples/parser_dev/task.rs`:
:
fn require_file_to_string<C: Context<Tasks>>(context: &mut C, path: impl AsRef<Path>) -> Result<String, String> {
let path = path.as_ref();
let mut file = context.require_file(path)
.map_err(|e| format!("Opening file '{}' for reading failed: {}", path.display(), e))?
.ok_or_else(|| format!("File '{}' does not exist", path.display()))?;
let mut text = String::new();
file.read_to_string(&mut text)
.map_err(|e| format!("Reading file '{}' failed: {}", path.display(), e))?;
Ok(text)
}
`require_file_to_string` is like `context.require_file`, but converts all errors to `String`. Now we implement `Task` for `Tasks`. Add to `pie/examples/parser_dev/task.rs`:
impl Task for Tasks {
type Output = Result<Outputs, String>;
fn execute<C: Context<Self>>(&self, context: &mut C) -> Self::Output {
match self {
Tasks::CompileGrammar { grammar_file_path } => {
let grammar_text = require_file_to_string(context, grammar_file_path)?;
let compiled_grammar = CompiledGrammar::new(&grammar_text, Some(grammar_file_path.to_string_lossy().as_ref()))?;
Ok(Outputs::CompiledGrammar(compiled_grammar))
}
Tasks::Parse { compiled_grammar_task, program_file_path, rule_name } => {
let Ok(Outputs::CompiledGrammar(compiled_grammar)) = context.require_task(compiled_grammar_task.as_ref()) else {
// Return `None` if compiling grammar failed. Don't propagate the error, otherwise the error would be
// duplicated for all `Parse` tasks.
return Ok(Outputs::Parsed(None));
};
let program_text = require_file_to_string(context, program_file_path)?;
let output = compiled_grammar.parse(&program_text, rule_name, Some(program_file_path.to_string_lossy().as_ref()))?;
Ok(Outputs::Parsed(Some(output)))
}
}
}
}
The output is `Result<Outputs, String>`: either an `Outputs` if the task succeeds, or a `String` error if it does not. In `execute` we match on our variant and either compile a grammar or parse, both of which are mostly straightforward. In the `Parse` variant, we require the compile grammar task, but don’t propagate its errors and instead return `Ok(Outputs::Parsed(None))`. We do this to prevent duplicate errors: if we propagated the error, the grammar compilation error would be duplicated into every parse task.

Confirm the code compiles with `cargo build --example parser_dev`. We won’t test this code here, as we’ll use these tasks in the `main` function next.
CLI for Incremental Batch Builds
We have tasks for compiling grammars and parsing files, but we need to pass file paths and a rule name into these tasks.
We will pass this data to the program via command-line arguments.
To parse command-line arguments, we will use clap, which is an awesome library for easily parsing command-line arguments.
Add clap as a dependency to pie/Cargo.toml
:
We’re using the `derive` feature of clap to automatically derive a full-featured argument parser from a struct. Modify `pie/examples/parser_dev/main.rs`:
The `Args` struct contains exactly the data we need: the path to the grammar file, the name of the rule to start parsing with, and the paths of the program files to parse. We derive an argument parser for `Args` with `#[derive(Parser)]`. Then we parse the command-line arguments in `main` with `Args::parse()`.

Test this program with `cargo run --example parser_dev -- --help`, which should result in usage help for the program. Note that the names, ordering, and doc comments of the fields are used to generate this help.
You can test out several more commands:
cargo run --example parser_dev --
cargo run --example parser_dev -- foo
cargo run --example parser_dev -- foo bar
cargo run --example parser_dev -- foo bar baz qux
Now let’s use these arguments to actually compile the grammar and parse the example program files. Modify `pie/examples/parser_dev/main.rs`:
In `compile_grammar_and_parse`, we create a new `Pie` instance that writes the build log to stderr, and create a new build session. Then we require a compile grammar task using the `grammar_file_path` from `Args`, and write any errors to the `errors` `String`. We then require a parse task for every path in `args.program_file_paths`, using the previously created `compile_grammar_task` and `args.rule_name`. Successes are printed to stdout and errors are written to `errors`. Finally, we print `errors` to stdout if there are any.
To test this out, we need a grammar and some test files. Create `grammar.pest`:
num = @{ ASCII_DIGIT+ }
main = { SOI ~ num ~ EOI }
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
You don’t need to fully understand pest grammars to finish this example. However, I will explain the basics of this grammar here. Feel free to learn and experiment more if you are interested.
Grammars are lists of rules, such as `num` and `main`. This grammar parses numbers with the `num` rule, matching 1 or more `ASCII_DIGIT` with repetition.

The `main` rule ensures that there is no additional text before and after a `num` rule, using `SOI` (start of input) and `EOI` (end of input), and using the `~` operator to sequence these rules. We set the builtin `WHITESPACE` rule to `{ " " | "\t" | "\n" | "\r" }` so that spaces, tabs, newlines, and carriage return characters are implicitly allowed between sequenced rules.

The `@` operator before `{` indicates that `num` is an atomic rule, disallowing implicit whitespace. We want this on the `num` rule so that we can’t add spaces in between the digits of a number (try removing it and see!).

The `_` operator before `{` indicates that `WHITESPACE` is a silent rule that does not contribute to the parse result. This is important when processing the parse result into an Abstract Syntax Tree (AST). In this example we just print the parse result, so silent rules are not really needed, but I included it for completeness.
Create `test_1.txt` with:

42

And create `test_2.txt` with:

foo

Run the program with `cargo run --example parser_dev -- grammar.pest main test_1.txt test_2.txt`. This should result in a build log showing that the grammar is successfully compiled, that one file is successfully parsed, and that one file has a parse error.
Unfortunately, there is no incrementality between different runs of the example, because the `Pie` `Store` is not persisted. The `Store` only exists in memory while the program is running, and is then thrown away. Thus, there cannot be any incrementality. To get incrementality, we need to serialize the `Store` before the program exits, and deserialize it when the program starts. This is possible and not actually that hard; I just never got around to explaining it in this tutorial. See the Side Note: Serialization section at the end for info on how this can be implemented.
If you are using a bash-like shell on a UNIX-like OS, you can hide the build log by redirecting stderr to `/dev/null` with: `cargo run --example parser_dev -- grammar.pest main test_1.txt test_2.txt 2>/dev/null`. Otherwise, you can hide the build log by replacing `WritingTracker::with_stderr()` with a `NoopTracker`.
Feel free to experiment a bit with the grammar, example files, etc. before continuing. We will develop an interactive editor next however, which will make experimentation easier!
Interactive Parser Development
Now we’ll create an interactive version of this grammar compilation and parsing pipeline, using Ratatui to create a terminal GUI.
Since we need to edit text files, we’ll use tui-textarea, which is a text editor widget for Ratatui.
Ratatui works with multiple backends, with crossterm being the default backend since it is cross-platform.
Add these libraries as a dependency to pie/Cargo.toml
:
We continue as follows:
- Set up the scaffolding for a Ratatui application.
- Create a text editor `Buffer` using tui-textarea to edit the grammar and example program files.
- Draw and update those text editor `Buffer`s, and keep track of the active buffer.
- Save `Buffer`s back to files and run the `CompileGrammar` and `Parse` tasks to provide feedback on the grammar and example programs.
- Show the build log in the application.
Ratatui Scaffolding
We will put the editor in a separate module, and start out with the basic scaffolding of a Ratatui “Hello World” application. Add `editor` as a public module to `pie/examples/parser_dev/main.rs`:

Create the `pie/examples/parser_dev/editor.rs` file and add the following to it:
use std::io;
use crossterm::event::{DisableMouseCapture, EnableMouseCapture, Event, KeyCode, KeyEventKind};
use crossterm::terminal::{disable_raw_mode, enable_raw_mode, EnterAlternateScreen, LeaveAlternateScreen};
use ratatui::backend::{Backend, CrosstermBackend};
use ratatui::Terminal;
use ratatui::widgets::Paragraph;
use crate::Args;
/// Live parser development editor.
pub struct Editor {}
impl Editor {
/// Create a new editor from `args`.
pub fn new(_args: Args) -> Result<Self, io::Error> {
Ok(Self {})
}
/// Run the editor, drawing it into an alternate screen of the terminal.
pub fn run(&mut self) -> Result<(), io::Error> {
// Setup terminal for GUI rendering.
enable_raw_mode()?;
let mut backend = CrosstermBackend::new(io::stdout());
crossterm::execute!(backend, EnterAlternateScreen, EnableMouseCapture)?;
let mut terminal = Terminal::new(backend)?;
terminal.clear()?;
// Draw and process events in a loop until a quit is requested or an error occurs.
let result = loop {
match self.draw_and_process_event(&mut terminal) {
Ok(false) => break Ok(()), // Quit requested
Err(e) => break Err(e), // Error
_ => {},
}
};
// First undo our changes to the terminal.
disable_raw_mode()?;
crossterm::execute!(terminal.backend_mut(), LeaveAlternateScreen, DisableMouseCapture)?;
terminal.show_cursor()?;
// Then present the result to the user.
result
}
fn draw_and_process_event<B: Backend>(&mut self, terminal: &mut Terminal<B>) -> Result<bool, io::Error> {
terminal.draw(|frame| {
frame.render_widget(Paragraph::new("Hello World! Press Esc to exit."), frame.size());
})?;
match crossterm::event::read()? {
Event::Key(key) if key.kind == KeyEventKind::Release => return Ok(true), // Skip releases.
Event::Key(key) if key.code == KeyCode::Esc => return Ok(false),
_ => {}
};
Ok(true)
}
}
The `Editor` struct will hold the state of the editor application. It is currently empty, but we’ll add fields to it later. Likewise, the `new` function doesn’t do a lot right now, but it is scaffolding for when we add state; it returns a `Result` because it can fail in the future.

The `run` method sets up the terminal for GUI rendering, draws the GUI and processes events in a loop until stopped, and then undoes our changes to the terminal. It is set up in such a way that undoing our changes to the terminal happens regardless of whether there is an error or not (although panics would still skip that code and leave the terminal in a bad state). This is a standard program loop for Ratatui.

Rust Help: Returning From Loops

The `draw_and_process_event` method first draws the GUI, currently just a hello world message, and then processes events such as key presses. Currently, this skips key releases because we are only interested in presses, and returns `Ok(false)` if escape is pressed, causing us to `break` out of the `loop`.
Now we need to go back to our command-line argument parsing and add a flag indicating that we want to start up an interactive editor. Modify `pie/examples/parser_dev/main.rs`:

We add a new `Cli` struct with an `edit` field that is settable by a short (`-e`) or long (`--edit`) flag, and flatten `Args` into it. Using this new `Cli` struct keeps `Args` clean, since the existing code does not need to know about the `edit` flag. Instead of using a flag, you could also define a separate command for editing. In `main`, we now parse `Cli` instead, check whether `cli.edit` is set, and create and run the editor if it is. Otherwise, we do a batch build.
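A sketch of the `Cli` struct and the updated `main`, assuming clap’s derive API (`#[command(flatten)]` embeds `Args`):

use clap::Parser;

#[derive(Parser)]
pub struct Cli {
  /// Start up an interactive parser development editor.
  #[arg(short, long)]
  edit: bool,
  #[command(flatten)]
  args: Args,
}

fn main() {
  let cli = Cli::parse();
  if cli.edit {
    // Error handling is elided in this sketch.
    editor::Editor::new(cli.args).unwrap().run().unwrap();
  } else {
    compile_grammar_and_parse(cli.args);
  }
}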
Try out the code with `cargo run --example parser_dev -- test.pest main test_1.test test_2.test -e` in a terminal, which should open up a separate screen with a hello world text. Press escape to exit the application.

If the program ever panics, your terminal will be left in a bad state. In that case, you’ll have to reset your terminal back to a good state, or restart your terminal.
Text Editor Buffer
The goal of this application is to develop a grammar alongside example programs of that grammar, getting feedback whether the grammar is correct, but also getting feedback whether the example programs can be parsed with the grammar.
Therefore, we will need to draw multiple text editors along with space for feedback, and be able to swap between active editors.
This will be the responsibility of the Buffer
struct which we will create in a separate module.
Add the buffer
module to pie/examples/parser_dev/editor.rs
:
Then create the pie/examples/parser_dev/editor/buffer.rs
file and add to it:
#![allow(dead_code)]
use std::fs::{File, read_to_string};
use std::io::{self, Write};
use std::path::PathBuf;
use crossterm::event::Event;
use ratatui::Frame;
use ratatui::layout::{Constraint, Direction, Layout, Rect};
use ratatui::style::{Color, Modifier, Style};
use ratatui::widgets::{Block, Borders, Paragraph, Wrap};
use tui_textarea::TextArea;
/// Editable text buffer for a file.
pub struct Buffer {
path: PathBuf,
editor: TextArea<'static>,
feedback: String,
modified: bool,
}
impl Buffer {
/// Create a new [`Buffer`] for file at `path`.
///
/// # Errors
///
/// Returns an error when reading file at `path` fails.
pub fn new(path: PathBuf) -> Result<Self, io::Error> {
let text = read_to_string(&path)?;
let mut editor = TextArea::from(text.lines());
// Enable line numbers. Default style = no additional styling (inherit).
editor.set_line_number_style(Style::default());
Ok(Self { path, editor, feedback: String::default(), modified: false })
}
/// Draws this buffer with `frame` into `area`, highlighting it if it is `active`.
pub fn draw(&mut self, frame: &mut Frame, area: Rect, active: bool) {
// Determine and set styles based on whether this buffer is active. Default style = no additional styling (inherit).
let mut cursor_line_style = Style::default();
let mut cursor_style = Style::default();
let mut block_style = Style::default();
if active { // Highlight active editor.
cursor_line_style = cursor_line_style.add_modifier(Modifier::UNDERLINED);
cursor_style = cursor_style.add_modifier(Modifier::REVERSED);
block_style = block_style.fg(Color::Gray);
}
self.editor.set_cursor_line_style(cursor_line_style);
self.editor.set_cursor_style(cursor_style);
// Create and set the block for the text editor, bordering it and providing a title.
let mut block = Block::default().borders(Borders::ALL).style(block_style);
if let Some(file_name) = self.path.file_name() { // Add file name as title.
block = block.title(format!("{}", file_name.to_string_lossy()))
}
if self.modified { // Add modified to title.
block = block.title("[modified]");
}
self.editor.set_block(block);
// Split area up into a text editor (80% of available space), and feedback text (minimum of 7 lines).
let areas = Layout::default()
.direction(Direction::Vertical)
.constraints(vec![Constraint::Percentage(80), Constraint::Min(7)])
.split(area);
// Render text editor into first area (`areas[0]`).
frame.render_widget(self.editor.widget(), areas[0]);
// Render feedback text into second area (`areas[1]`).
let feedback = Paragraph::new(self.feedback.clone())
.wrap(Wrap::default())
.block(Block::default().style(block_style).borders(Borders::ALL - Borders::TOP));
frame.render_widget(feedback, areas[1]);
}
/// Process `event`, updating whether this buffer is modified.
pub fn process_event(&mut self, event: Event) {
self.modified |= self.editor.input(event);
}
/// Save this buffer to its file if it is modified. Does nothing if not modified. Sets as unmodified when successful.
///
/// # Errors
///
/// Returns an error if writing buffer text to the file fails.
pub fn save_if_modified(&mut self) -> Result<(), io::Error> {
if !self.modified {
return Ok(());
}
let mut file = io::BufWriter::new(File::create(&self.path)?);
for line in self.editor.lines() {
file.write_all(line.as_bytes())?;
file.write_all(b"\n")?;
}
file.flush()?;
self.modified = false;
Ok(())
}
/// Gets the file path of this buffer.
pub fn path(&self) -> &PathBuf { &self.path }
/// Gets the mutable feedback text of this buffer.
pub fn feedback_mut(&mut self) -> &mut String { &mut self.feedback }
}
A `Buffer` is a text editor for a text file at a certain `path`. It keeps track of a text editor with `TextArea<'static>`, `feedback` text, and whether the text was `modified` in relation to the file. `new` creates a `Buffer` and is fallible due to reading a file.

The `draw` method draws/renders the buffer (using the Ratatui `frame`) into `area`, with `active` signifying that this buffer is active and should be highlighted differently. The first part sets the style of the editor, mainly highlighting an active editor by using `Color::Gray` as the block style. Default styles indicate that no additional styling is done, basically inheriting the style from a parent widget (i.e., a block), or using the style from your terminal. The second part creates a block that renders a border around the text editor and renders a title on the upper border. The third part splits up the available space into space for the text editor (80%) and space for the feedback text (at least 7 lines), and renders the text editor and feedback text into those spaces. The layout can of course be tweaked, but it works for this example.

`process_event` lets the text editor process input events, and updates whether the text has been modified. `save_if_modified` saves the text to the file, but only if modified. `path` gets the file path of the buffer. `feedback_mut` returns a mutable borrow of the feedback text, enabling modification of the feedback text.

It is up to the user of `Buffer` to keep track of the active buffer, passing `active: true` to the `draw` method of that buffer, and calling `process_event` on the active buffer. That’s exactly what we’re going to implement next.
Drawing and Updating Buffers
We’ll create `Buffer`s in `Editor` and keep track of the active buffer. To keep this example simple, we’ll create buffers only for the grammar file and the example program files given as command-line arguments. If you want more or fewer example files, you’ll have to exit the application, add those example files to the command-line arguments, and then start the application again. Modify `pie/examples/parser_dev/editor.rs`:

`Editor` now has a list of buffers via `Vec<Buffer>`, and keeps track of the active buffer via `active_buffer`, which is an index into `buffers`. In `new`, we create buffers based on the grammar and program file paths in `args`. The buffers `Vec` is created in such a way that the first buffer is always the grammar buffer, with the rest being example program buffers. The grammar buffer always exists because `args.grammar_file_path` is mandatory, but there can be 0 or more example program buffers.

`draw_and_process_event` now splits up the available space. First vertically: as much space as possible is reserved for buffers, with at least 1 line reserved for a help line at the bottom. Then horizontally: half of the horizontal space is reserved for the grammar buffer, and the other half for program buffers. The vertical space for program buffers (`program_buffer_areas`) is divided further, evenly split between all program buffers. Then, the buffers are drawn in the corresponding spaces, with `active` only being `true` if we are drawing the active buffer, based on the `active_buffer` index.

In the event processing code, we match the Control+T shortcut and increase the `active_buffer` index. We wrap back to 0 when the `active_buffer` index would overflow, using the modulo (%) operator, ensuring that `active_buffer` is always a correct index into the `buffers` `Vec`. Finally, if none of the other shortcuts match, we send the event to the active buffer.
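The wrap-around itself is a one-liner; this sketch follows the event matching from the scaffolding, and assumes `KeyModifiers` is imported from crossterm:

// Control+T: cycle to the next buffer, wrapping back to 0 via modulo so the
// index always stays within bounds of `self.buffers`.
Event::Key(key) if key.code == KeyCode::Char('t') && key.modifiers.contains(KeyModifiers::CONTROL) => {
  self.active_buffer = (self.active_buffer + 1) % self.buffers.len();
}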
Try out the code again with `cargo run --example parser_dev -- test.pest main test_1.test test_2.test -e` in a terminal. This should open up the application with a grammar buffer on the left, and two program buffers on the right. Use Control+T to swap between buffers, and escape to exit.
Saving Buffers and Providing Feedback
Next up is saving the buffers, running the compile grammar and parse tasks, and showing feedback from those tasks in the feedback space of the buffers. Modify `pie/examples/parser_dev/editor.rs`:

The biggest addition is at the bottom: the `save_and_update_buffers` method. This method first clears the feedback text of all buffers, and saves all buffers (if `save` is `true`). Then we create a new PIE session and require the compile grammar task and the parse tasks, similar to `compile_grammar_and_parse` in the main file. Here, we instead `writeln!` the results to the feedback text of the buffers. We store the `rule_name` in `Editor`, as it is needed to create parse tasks, and we store a `Pie` instance so that we can create new PIE sessions to require tasks.

When the Control+S shortcut is pressed, we call `save_and_update_buffers` with `save` set to `true`. We also call `save_and_update_buffers` in `Editor::new` to provide feedback when the application starts, but with `save` set to `false`, so we don’t immediately save all files. Finally, we update the help line to include the Control+S shortcut.
Try out the code again with `cargo run --example parser_dev -- test.pest main test_1.test test_2.test -e` in a terminal. Now you should be able to make changes to the grammar and/or example programs, press Control+S to save modified files, and get feedback on grammar compilation and parsing of the example programs. If you like, you can go through the pest parser book and experiment with/develop a parser.
Showing the Build Log
We’ll add one more feature to the editor: showing the build log. We can do this by writing the build log to an in-memory text buffer, and drawing that text buffer. Modify `pie/examples/parser_dev/editor.rs`:
In `new` we now create the `Pie` instance with a writing tracker: `WritingTracker::new(Cursor::new(Vec::new()))`. This writing tracker writes to a `Cursor`, specifically a `Cursor<Vec<u8>>`, for which `Write` is implemented. We modify the type of the `pie` field to include the tracker type to reflect this: `WritingTracker<Cursor<Vec<u8>>>`. Build logs will then be written to the `Vec<u8>` inside the `Cursor`.

To draw the build log in between the buffers and the help line, we first modify the layout split into `root_areas`: buffers now take up 70% of the vertical space, and we add a new constraint for the build log, which takes 30% of the vertical space. We access the in-memory buffer via `self.pie.tracker().writer().get_ref()`, convert it to a string via `String::from_utf8_lossy`, and convert that to Ratatui `Text`, which can be passed to `Paragraph::new` and also gives us line information for scrolling the build log. The scroll calculation is explained in the comments. We then draw the build log as a `Paragraph`. Finally, we update the area for the help line from `root_areas[1]` to `root_areas[2]`, as adding the layout constraint shifted the index up.
Try out the code again with `cargo run --example parser_dev -- test.pest main test_1.test test_2.test -e` in a terminal. Pressing Control+S causes tasks to be required, which is shown in the build log. Try modifying a single file to see which tasks PIE executes, or what effect an error in the grammar has.

And with that, we’re done with the interactive parser development example 🎉🎉🎉!
Conclusion
In this example, we developed tasks for compiling a grammar and parsing files with that grammar, and then used those tasks to implement both a batch build, and an interactive parser development environment.
In the introduction, we motivated programmatic incremental build systems with the key properties of: programmatic, incremental, correct, automatic, and multipurpose. Did these properties help with the implementation of this example application?
- Programmatic: due to the build script – that is: the compile grammar and parse tasks – being written in the same programming language as the application, it was extremely simple to integrate. We also didn’t have to learn a separate language, we could just apply our knowledge of Rust!
- Incremental: PIE incrementalized the build for us, so we didn’t have to implement incrementality. This saves a lot of development effort, as implementing incrementality correctly is complicated.
- The batch build is unfortunately not incremental due to not having implemented serialization in this tutorial, but this is not a fundamental limitation. See Side Note: Serialization for info on how to solve this.
- Correct: PIE ensures the build is correct, so we don’t have to worry about glitches or inconsistent data, again saving development effort that would otherwise be spent on ensuring incrementality is correct.
- For a real application, we should write tests to increase the confidence that our build is correct, because PIE can only check for correctness at runtime.
- Automatic: we didn’t manually implement incrementality, but only specified the dependencies: from compile grammar/parse task to a file, and from parse tasks to compile grammar tasks.
- Multipurpose: we reused the same tasks for both a batch build and for use in an interactive environment, without any modifications. Again, this saves development time.
So yes, I think that programmatic incremental build systems – and in particular PIE – help a lot when developing applications that require incremental batch builds or interactive pipelines, and especially when both are required. The main benefit is reduced development effort, due to not having to solve the problem of correct incrementality, due to easy integration, and due to only needing to know and use a single programming language.
Larger applications with more features and complications that need incrementality would require an even bigger implementation effort. Therefore, larger applications could benefit even more from using PIE. Of course, you cannot really extrapolate that from this small example. However, I have applied PIE to a larger application: the Spoofax Language Workbench, and found similar benefits. More info on this can be found in the appendix.
You should of course decide for yourself whether a programmatic incremental build system really helped with implementing this example. Every problem is different, and requires separate consideration as to what tools best solve a particular problem.
This is currently the end of the guided programming tutorial. In the appendix chapters, we discuss PIE implementations and publications, related work, and future work.
Download source code
Side Note: Serialization
To get incrementality between different runs (i.e., processes) of the program, we need to serialize the Store before the program exits, and deserialize the Store when the program starts.
The de-facto standard (and awesome) serialization library in Rust is serde.
See the PIE in Rust repository at the pre_type_refactor tag for a version of PIE with serde serialization.
For example, the Store struct has annotations for deriving serde::Deserialize and serde::Serialize.
These attributes are somewhat convoluted due to serialization being optional, and due to the H generic type parameter, which should not be included in the serialization bounds.
You should derive serde::Deserialize and serde::Serialize for all required types in the PIE library, but also for all tasks and all task outputs.
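To illustrate the bounds issue, here is a minimal self-contained sketch with a simplified stand-in Store; the real annotations in the repository are more involved because serialization is optional there:

use std::marker::PhantomData;
use serde::{Deserialize, Serialize};

// H is only used for hashing, so it should not receive Serialize/Deserialize
// bounds. The serde bound attribute overrides the bounds the derive would
// otherwise generate for all type parameters.
#[derive(Serialize, Deserialize)]
#[serde(bound(serialize = "T: Serialize", deserialize = "T: Deserialize<'de>"))]
struct Store<T, H> {
  task_outputs: Vec<T>,
  hasher: PhantomData<H>,
}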
The pie_graph library supports serialization when the serde feature is enabled, which it is by default.
Then, see this serialization integration test.
PIE Implementations & Publications
Implementations
In this tutorial, you implemented a large part of PIE, the programmatic incremental build system that I developed during my PhD and Postdoc. There are currently two versions of PIE:
- PIE in Rust, a superset of what you have been developing in this tutorial.
  - The largest differences between PIE in this tutorial and the PIE library are:
    - Support for arbitrary task and resource types, achieved by using trait objects to provide dynamic dispatch.
    - Resource abstraction, which enables resources other than files. Resources are global mutable state where the state is not handled by the PIE library (as opposed to task inputs and outputs), but read and write access to that state is handled by PIE. Files (as PathBuf) are a resource, but so is a hashmap.
    - Terminology differences. The PIE library uses read and write for resource dependencies instead of require and provide. This allows us to use require only for tasks, and read and write only for resources. It uses checkers instead of stampers.
  - The motivation for developing a PIE library in Rust was to test whether the idea of a programmatic incremental build system really is programming-language agnostic, to serve as a target for developing this tutorial, and to get a higher-performance implementation compared to the Java implementation of PIE.
  - In my opinion, implementing PIE in Rust as part of this tutorial is a much nicer experience than implementing it in Java, due to the more powerful type system and the great tooling provided by Cargo. However, supporting multiple task types, which we didn't do in this tutorial, is a bit of a pain due to requiring trait objects, which can be really complicated to work with in certain cases. In Java, everything is like a trait object, and you get many of these things for free, at the cost of garbage collection and performance of course.
- PIE in Java. The motivation for using Java was so that we could use PIE to correctly incrementalize the Spoofax Language Workbench, a set of tools and interactive development environment (IDE) for developing programming languages. In Spoofax, you develop a programming language by defining the aspects of your language in domain-specific meta-languages, such as SDF3 for syntax definition, and Statix for type system and name binding definitions.
Applying PIE to Spoofax culminated in Spoofax 3 (sometimes also called Spoofax-PIE), a new version of Spoofax that uses PIE for all tasks such as generating parsers, running parsers on files to create ASTs, running highlighters on those ASTs to provide syntax highlighting for code editors, etc. Because all tasks are PIE tasks, we can do correct and incremental batch builds of language definitions, but also live development of those language definitions in an IDE, using PIE to get feedback such as inline errors and syntax highlighting as fast as possible.
Publications
We wrote two papers about programmatic incremental build systems and PIE, for which updated versions are in my dissertation:
- Chapter 7, page 83: PIE: A Domain-Specific Language for Interactive Software Development Pipelines.
  This describes a domain-specific language (DSL) for programmatic incremental build systems, and introduces the PIE library in Kotlin. This implementation was later changed to a pure Java library to reduce the number of dependencies.
- Chapter 8, page 109: Scalable Incremental Building with Dynamic Task Dependencies.
  This describes a hybrid incremental build algorithm that builds from the bottom up, only switching to top-down building when necessary. Bottom-up builds are more efficient for changes with a small effect (i.e., most changes), due to only checking the part of the dependency graph affected by changes. Therefore, this algorithm scales down to small changes while scaling up to large dependency graphs.
Unfortunately, we did not implement (hybrid) bottom-up building in this tutorial due to a lack of time. However, the PIE in Rust library has a bottom-up context implementation which you can check out. Due to similarities between the top-down and bottom-up context, some common functionality was extracted into an extension trait.
We published a summary/abstract paper as well:
Two master students graduated on extensions to PIE:
- Roelof Sol: Task Observability in Change Driven Incremental Build Systems with Dynamic Dependencies. A problem with bottom-up builds is that tasks stay in the dependency graph forever, even if they are no longer needed. Even though those tasks are not executed (because they are not needed), they do need to be checked, and they increase the size of the dependency graph, which in turn adds overhead to several graph operations. To solve that problem, we introduce task observability.
  A task is observable if and only if it is explicitly observed by the user of the build system through directly requiring the task (via Session::require), or if it is implicitly observed by a require task dependency from another task. Otherwise, the task is unobserved. The build system updates the observability status of tasks while the build is executing. Unobserved tasks are never checked, removing the checking overhead. Unobserved tasks can be removed from the dependency graph in a "garbage collection" pass, removing graph operation overhead. Removing unobserved tasks is flexible: during the garbage collection pass you can decide to keep a task in the dependency graph if you think it will become observed again, to keep its cached output. You can also remove the provided (intermediate or output) files of an unobserved task to clean up disk space, which is correct due to the absence of hidden dependencies!
  Currently, observability is implemented in the Java implementation of PIE, but not yet in the Rust implementation.
- Ivo Wilms: Extending the DSL for PIE. This improves and solves many problems in the original PIE DSL implementation. It introduces a module system, compatibility with dependency injection, and generics with subtyping into the DSL. Generics and subtyping have a proper type system implementation in the Statix meta-DSL.
One paper was published about using PIE:
- Constructing Hybrid Incremental Compilers for Cross-Module Extensibility with an Internal Build System. This paper introduces a compiler design approach for reusing parts of a non-incremental compiler to build an incremental compiler, using PIE to perform the incrementalization. The approach is applied to Stratego, a term transformation meta-DSL with several cross-cutting features that make incremental compilation hard. The result is the Stratego 2 compiler, which is split up into multiple PIE tasks to do incremental parsing (per file), incremental name analysis, and incremental compilation. Stratego 2 was also extended with gradual typing at a later stage, where the gradual typing was also performed in PIE tasks.
Related Work
There are several other programmatic incremental build systems and works published about them. This subsection discusses them. For additional related work discussion, check the related work sections of chapter 7 (page 104) and chapter 8 (page 126) of my dissertation.
Pluto
PIE is based on Pluto, a programmatic incremental build system developed by Sebastian Erdweg et al. This is not a coincidence, as Sebastian Erdweg was my PhD promotor, and we developed and wrote the “Scalable Incremental Building with Dynamic Task Dependencies” paper together.
The Pluto paper provides a more formal proof of incrementality and correctness for the top-down build algorithm, which provides confidence that this algorithm works correctly, but also explains the intricate details of the algorithm very well. Note that Pluto uses “builder” instead of “task”. In fact, a Pluto builder is more like an incremental function that does not carry its input, whereas a PIE task is more like an incremental closure that includes its input.
PIE uses almost the same top-down build algorithm as Pluto, but there are some technical changes that make PIE more convenient to use.
In Pluto, tasks are responsible for storing their output and dependencies, called “build units”, which are typically stored in files.
In PIE, the library handles this for you.
The downside is that PIE requires a mapping from a Task (using its Eq and Hash impls) to its dependencies and output (which is what the Store does), and some modifications had to be made to the consistency checking routines.
The upside is that tasks don't have to manage these build unit files, and the central Store can efficiently manage the entire dependency graph.
Especially this central dependency graph management is useful for the bottom-up build algorithm, as we can use dynamic topological sort algorithms for directed acyclic graphs.
Other Incremental Build Systems with Dynamic Dependencies
Build Systems à la Carte shows a systematic and executable framework (in Haskell) for developing and comparing build systems. It compares the impact of design decisions such as what persistent build information to store, the scheduler to use, static/dynamic dependencies, whether it is minimal, supports early cutoff, and whether it supports distributed (cloud) builds. Even though the Haskell code might be a bit confusing if you’re not used to functional programming, it is a great paper that discusses many aspects of programmatic incremental build systems and how to implement them.
Shake
Shake is an incremental build system implemented in Haskell, described in detail in the Shake Before Building paper. The main difference in the model between Shake and PIE is that Shake follows a more target-based approach as seen in Make, where targets are build tasks that provide the files of the target. Therefore, the output (provided) files of a build task need to be known up-front. The upside of this approach is that build scripts are easier to read and write and easier to parallelize. However, the main downside is that it is not possible to express build tasks where the names of provided files are only known after executing the task. For example, compiling a Java class with inner classes results in a class file for every inner class with a name based on the outer and inner class, which is not known up-front.
Implementation-wise, Shake supports explicit parallelism, whereas PIE cannot (at the time of writing).
Parallel builds in PIE are tricky because two build tasks executing in parallel could require/provide (read/write) to the same file, which can result in data races.
Shake avoids this issue by requiring provided files to be specified as targets up-front, speeding up builds through explicit parallelism.
In PIE, this might be solvable with a protocol where tasks first call a Context method to tell PIE about the files that will be provided, or the directory in which files will be provided, so PIE can limit parallelism on those files and directories.
Tasks that do not know this up-front cannot be executed in parallel, but can still be executed normally.
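As a hypothetical sketch of such a protocol (these methods do not exist in PIE):

// Hypothetical extension of the Context trait: tasks declare up-front which
// files or directories they will provide, so the scheduler can serialize tasks
// whose declared files/directories overlap, and run the remaining tasks in
// parallel.
trait ParallelContext: Context {
  fn will_provide_file(&mut self, path: &std::path::Path);
  fn will_provide_directory(&mut self, directory: &std::path::Path);
}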
Rattle
Rattle is a build system focussing on easily turning build scripts into incremental and parallel build scripts without requiring dependency annotations, described in detail in the Build Scripts with Perfect Dependencies paper. To make this possible, Rattle has a very different model compared to PIE.
Rattle build scripts consist of (terminal/command-line) commands such as gcc -c main.c, and simple control logic/manipulation to work with the results of commands, such as if checks, for loops, or changing the file extension in a path.
Therefore, future commands can use values of previous commands, and use control logic to selectively or iteratively execute commands.
Commands create dynamic file dependencies, both reading (require) and writing (provide), which are automatically detected with dependency tracing at the OS level.
There are no explicit dependencies between commands, but implicit dependencies arise, for example when a command reads a file that another command writes.
Rattle incrementally executes the commands of a build script, skipping commands for which no files have changed. The control logic/manipulation around the commands is not incrementalized. Rattle build scripts can be explicitly parallelized, but Rattle also implicitly parallelizes builds by speculatively executing future commands. If speculation results in a hazard, such as a command reading a file and then a command writing to that file – equivalent to a hidden dependency in PIE – then the build is inconsistent and must be restarted without speculative parallelism.
Core difference
The best way I can explain the core difference is that Rattle builds a single build script which is a stream of commands with file dependencies; whereas in PIE, every build task is in essence its own build script that produces an output value, with file dependencies but also dependencies between build tasks. Both models have merit!
The primary advantage of the Rattle model is that existing build scripts, such as Make scripts or even just Bash scripts, can be easily converted to Rattle build scripts by converting the commands and control logic/manipulation into Rattle. No file dependencies have to be specified since they are automatically detected with file dependency tracing. Then, Rattle can parallelize and incrementally execute the build script. Therefore, Rattle is great for incrementalizing and parallelizing existing Make/Bash/similar build scripts with very low effort.
While it is possible to incrementalize these kinds of builds in PIE, the effort will be higher due to having to split commands into tasks, and having to report the file dependencies to PIE. If PIE had access to reliable cross-platform automated file dependency tracing, we could reduce this effort by building a "command task" that executes arbitrary terminal/command-line commands. However, reliable cross-platform file dependency tracing does not exist (to my knowledge, at the time of writing). The library that Rattle uses, Fsatrace, has limitations such as not detecting reads/writes to directories, and requiring system integrity protection to be disabled on macOS. Therefore, Rattle also (as mentioned in the paper, frustratingly) inherits the limitations of this library.
Compared to Rattle, the primary advantages of programmatic incremental build systems (i.e., the PIE model) are:
- PIE can rebuild a subset of the build script, instead of only the entire build script.
- The entire build is incrementalized (using tasks as a boundary), not just commands.
- Tasks can return any value of the programming language, not just strings.
- Tasks are modular, and can be shared using the mechanism of the programming language.
These properties are a necessity for use in interactive environments, such as code editors, IDEs, or other user-facing interactive applications. Therefore, the PIE model is more suited towards incrementalization in interactive environments, but can still be used to do incremental batch builds.
Implicit Parallelism (Speculative Execution)
Rattle supports both implicit and explicit parallelization, whereas PIE does not at the time of writing. Explicit parallelism was already discussed in the Shake section.
After a first build, Rattle knows which commands have been executed and can perform implicit parallelization by speculatively executing future commands. If a hazard occurs, the build is restarted without speculation (other recovery mechanisms are also mentioned in the paper), although the evaluation shows that this is rare, and even then the builds are still fast due to incrementality and explicit parallelism.
After the initial build, PIE also has full knowledge of the build script. In fact, we know more about the build script due to tracking both the file dependencies and the dependencies between tasks. However, just like Rattle, PIE doesn't know whether the tasks that were required last time will be the tasks that are required this time. In principle, none of the tasks that were required last time need to be required the next time. Therefore, if we were to speculatively execute future tasks, we could run into similar hazards: hidden dependencies and overlapping provided files.
However, I think that restarting the build without speculative execution, when a hazard is detected, is incorrect in PIE. This is because PIE keeps track of the entire dependency graph, including task output values, which would not be correct after a hazard. Restarting the build could then produce a different result, because PIE uses the previously created dependency graph for incrementality. In Rattle this is correct because it only keeps track of file dependencies of commands.
So I am currently not sure if and how we could do implicit parallelism in PIE.
Self-Tracking
Self-tracking is the ability of an incremental build system to correctly react to changes in the build script. If a part of the build script changes, that part should be re-executed.
Rattle supports self-tracking without special support for it, because Rattle makes no assumption about the build script, and re-executes the build script every time (while skipping commands that have not been affected). Therefore, build script changes are handled automatically.
PIE supports self-tracking by creating a dependency to the source code or binary file of a task.
However, this requires support from the programming language to find the source or binary file corresponding to a task.
In the Java implementation of PIE, we can use class loaders to get the (binary) class files for tasks and related files.
In the Rust implementation of PIE, we have not yet implemented self-tracking.
In Rust, we could implement self-tracking by writing a procedural macro that can be applied to Task implementations, embedding a self-tracking dependency (probably a hash over the Task impl) into the Task::execute method.
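For example, such a macro could hypothetically expand to something like the following sketch (all names here are made up; no such macro or Context method exists in the Rust implementation):

impl Task for ParseTasks {
  fn execute<C: Context>(&self, context: &mut C) -> Result<Outputs, Error> {
    // Inserted by the (hypothetical) procedural macro: a compile-time hash over
    // the source text of this impl, recorded as a self-tracking dependency.
    const IMPL_HASH: u128 = 0x1234_5678_9abc_def0; // placeholder value
    context.require_impl_hash(IMPL_HASH); // hypothetical Context method
    // ... original task body follows ...
  }
}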
However, since PIE is fully programmatic, tasks can use arbitrary code. To be fully correct, we’d need to over-approximate: check whether the binary of the program has changed and consider all tasks inconsistent if the binary has changed. In practice, the approach from the Java implementation of PIE works well, alongside a version number that gets updated when code used by tasks changes semantics in a significant way.
Cloud Builds
Rattle could support "cloud builds" where the output files of a command are stored on a server, using the hashed inputs (command string and read files) of the command as a key. Subsequent builds that run commands with matching hashes could then just download the output files and put them in the right spot. It is unclear if Rattle actually does this, but they discuss it (and several problems in practice) in the paper.
PIE does not currently support this, but could support it in a similar way (with the same practical problems).
In essence, the Store as implemented in this tutorial is such a key-value store, except that it is stored locally.
We also cache task outputs, but they could be stored in a similar way.
Whether this is a good idea depends on the task. For tasks that are expensive to execute, querying a server and getting the data from the server can be faster than executing the task. For tasks that are cheap to execute, just executing it can be faster.
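Conceptually, such a cache is a key-value store from input hashes to outputs, as in this self-contained sketch (all names are hypothetical):

use std::collections::HashMap;

// Key: a hash over the task/command and the contents of its read files.
// Value: the serialized task output (a real system would also store the
// provided files).
struct BuildCache {
  entries: HashMap<u64, Vec<u8>>,
}

impl BuildCache {
  // Return the cached output when the key matches; otherwise execute and cache.
  fn get_or_execute(&mut self, input_hash: u64, execute: impl FnOnce() -> Vec<u8>) -> Vec<u8> {
    self.entries.entry(input_hash).or_insert_with(execute).clone()
  }
}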
Future work
I’d like to go over all kinds of extensions to the build system, as there are a lot of interesting ones. Unfortunately, those will not be guided like the rest of this programming tutorial, due to a lack of time.