Maximilian Schwenger

What is RustTyC?

RustTyC provides an interface that allows for an intuitive translation of inference rules based on a Hindney-Milner-like type lattice into rust code. If I lost you completely with the last sentence, congrats for not being in academia! But fret not, as per usual, these are big words enabling precise reasoning — something we won't need all that much. Anyway, let me clarify a couple things first so we are on a common ground. For this, I need to know your background knowledge.

I'm eager to learn more.

You don't know what a type system is but are keen to find out? Perfect, I'll tell you about types, how they are used, and what this library is in my popular science blog post on Programming Languages and Type Systems (coming soon). If this sparks your interest just read on for more on RustTyC specifically.

Non-Theoretician.

You have first-hand experience with types, have fought a bitter battle with the Borrow Checker or two. Type theory is not entirely your area of expertise but you're willing to dig deeper?
Just read on.

Straight to the beef!

I eat Hindney-Milner for breakfast and my first tattoo was a type lattice. Just give me the specifics of RustTyC.
Jump ahead to here.

Let's Talk Theory

Did you knwo that as little as 5 grams of theory are worth two kilos of reading the API?

Of Inference Rules and Lattices

We academics love, cherish, worship inference rules. The concept is simple: An inference rule tells you what you need, the premise, to learn something new, the conclusion. Assume you have a furry thing, with sloppy ears, that enjoys licking your face and swipes the entire coffee table with a tail whip when excited. What do you have? If you have a pet other than a dog in mind, I seriously have to know about it. Here, the description is the premise, and the knowledge that it's a dog is the conclusion. Simple enough. We can only draw the conclusion if all premises are fulfilled. I.e., if you only have a wet face and a cleared coffee table, the culprit might be a dog, but it might also be a dragon for all we know.

Inference rules enable modular reasoning about types. Practically, we first design a couple of rules for any possible expression in our syntax in isolation, i.e., some rules for addition, some rules for conditionals, etc. Then, given a concrete program, we apply the rules accordingly by traversing the abstract syntax tree.

Let's get more concrete. Assume the program contains a statement assigning the value of 5 + 3 to the variable c. What's the type of c? Well, the inference rule for assignments will be something like 'given the expression on the right-hand side of the = has type t, then the variable on the left-hand side has the same type.' Well, that isn't particularly useful, as it requires us to know the type of 5 + 3 first. Luckily, we have an inference rule for that. Like I said: we academics are obsessed with inference rules. The rule for addition says 'provided the operands are numeric, the result will be as well.' Thanks a bunch, now we have to apply even more inference rules because we need to figure out whether 5 and 3 are numerics. Don't worry, there will be a rule stating that numeric literals like 5 and 3 will have a numeric type. Now we climb the tree back up by learning that 5 and 3 are numerics, thus 5 + 3 is numeric, thus c is numeric. Awesome!

The next step is to map the numeric type to something that's actually realizable on a computer. Here, we tend to distinguish between several varieties of numeric types such as '(signed) integers', 'unsigned integers', and 'floating point numbers'. The more general, abstract 'numeric' type encompasses all three varieties, so a numeric value can be either one of them. Let's add another restriction on our type system: In some languages with a focus on safety or mathematics like rust or SML, these varieties are strictly separated. That means, we can't add a signed and an unsigned value without further ado. And voilà, belive it or not, we now got our type lattice. A lattice is fancy-speak for a hierarchical structure. In our case, the numeric type is the most general type and encompasses the three variants (integer, unsigned, float). Since we embrace family values, we call them the children of the numeric type. All three are strictly separated, there is no relation between them except for their sibling-ship.

For the practical consequences of this lattice, consider an addition of two variables. If both of them are numeric, then the result will be numeric as well. Simple enough. Conversely, if one is an integer and the other one is a floating point number, then we have ourselves a type error: we can't add an integer and a float. Lastly, if one is a numeric and one is an unsigned integer, we learn two things. First, we know that we would get a type error if the numeric type would be anything other than an unsigned integer. Thus, optimistic as we are, we assume it to be exactly that. Secondly, since we now add two unsigned integers, the result will be an unsigned integer a well.

And this, my friends, is type inference based on inference rules of a type lattice. Phew, that's enough basics for now. Let me just fix some nomenclature real quick. A value that is higher up in the lattice is more abstract, a type that is lower is more concrete. When climbing the lattice up as much as possible, we'll get a unique element, the top element. This is a type for which we essentially have no information. It's anything. Conversely, when climbing to the bottom of the lattice, we get the bottom element, that is a type error. The combination of two types such as the addition before is called a 'meet' operation. The first paragraph of the next section clarifies the structure of the lattice for academic folk, so feel free to skip.

Let's Dive Into RustTyC

You'll learn how to import the lib and create a simple type system in no time!

Lattice Structure for Academics

RustTyC assumes the lattice to be a bounded meet-semilattice where the top element represents the unconstrained type. A type t is more concrete that t' if t ≤ t'. In this case, t' is more abstract than t. While the lattice does not necessarily have a lower bound, meet operations can fail, resulting in a type error. The error is arguably the bottom element. However, in practice, it's clearer to handle errors than silently creating a type-variant representing an error.

Note that a type lattice can equivalently be defined as a bounded join-semilattice. The decision is more or less arbitrary, there are apostles for both ways, so no matter which one I choose, someone will be mad.

Usage

Enough beating around the bush: Let's start type-checking!

The first step is to add the dependency to your Cargo.toml:


rusttyc = "^0.3.1"

Next, let's talk data structures. I assume you already have some code data structure representing programs you want to type-check. We'll call this the AST. Then, you need some representation of types, which we'll call AbsTy. Most likely, this is an enum listing all possible types a value can have, including fully unresolved types (Top/Unconstrained/Infer), abstract types (Numeric) and recursive type (Option(Box<AbsTy>) or Either(Box<AbsTy>, Box<AbsTy>)).

To obtain a lattice structure, implement the rusttyc::Variant type. This requires you to implement three functions and provide an associated error type (Variant::Err). Before getting to the meet of the implementation (do you see what I did there?), let's cover the simple functions: Variant::top(), which provides the top element of the type lattice, and Variant::arity(&self) -> Arity, which provides the arity of a specific type. The arity can be either Arity::Variable or Arity::Fixed(usize). Nothing too exciting here. Lastly, there is the Variant::meet(Partial<Self>, Partial<Self>) -> Result<Partial<Self>, Self::Err> function. If you forget about the Partial thingy for a second, this is exactly what we would expect: it takes two abstract types and provides a new one according to the rules for type lattices. If the two types are incompatible, it returns an error with some debug information. Note that type checker will enrich the error with some more context information to make it easily traceable, hence producing a TcErr. If you are curious, have a glance at it here. So, what's the deal with the Partial? A Partial is the combination of a Variant and the least number of children the respective type has. So, for example, a numeric type typically does not have a child. Thus, when wrapped in a Partial its Partial::least_arity will always be 0. A tuple-type, however, is a different story entirely. Suppose a variable represents a nonuple (9-tuple), but the type inference only infered that there will be a child at places 0, 3 and 4. In this case, its Partial::least_arity will be 5 (recall: index 4 occupied indicates at least 5 elements). Similarly, if the type is the top variant, it might resolve to a tuple or an option, hence it arity can only be a lower bound.

What's left to do now is to start collecting constraints. In your code, create a new type checker with rusttyc::TypeChecker::new() and traverse your AST. For each node in the AST create a rusttyc::TcKey and impose a set of constraints on it. The key represents either a node of the AST (whatever this may be) or a variable. The special point of variables is that they might occur several times in the AST and refer to the same object. There are several ways to handle this challenge, the by far easiest is to let the type checker take care of it. For this, call the TypeChecker::get_var_key(&mut self, var: Var) -> TcKey function. Calling this function multiple times with the same variable will return the same key; for your convenience.

Assume the AST is the tree representation of c := a + 3. You'll want to traverse the tree by recursively calling something like a tc_ast function. Assume tc is the type checker and the function returns a Result<TcKey, TcErr<AbsTy>> where the key is the key containing the type of the node. We'll discuss the nodes bottom to top. The first node is the variable a. There's not much to do here, just retrieve a key for a and return it: Ok(tc.get_var_key(var)). Next is the integer literal 3. Assume such a literal should bind the value to the type AbsTy::Unsigned. Thus, use:


let key = tc.new_term_key();  // Generate a fresh key.
// Set an explicit type as bound for `key`.
tc.impose(key.concretizes_explicit(AbsTy::Unsigned))?;  
Ok(key)

Now the interesting part: performing the addition. The idea is to check both sub-terms recursively, meet their types, and return the result.


let left = tc_expr(&mut tc, lhs);
let right = tc_expr(&mut tc, rhs);
let key = tc.new_term_key();
tc.impose(key.is_meet_of(left, right))?;
Ok(key)

Well, that was simpler than expected, eh?

Let's wrap up by assigning the value. For this, we recursively check the expression, retrieve the key for c and equate it with the result of the expression.


let res = tc_expr(rhs);
let key = tc.get_var_key(lhs);
tc.impose(lhs.is_sym_meet_of(key))?;
Ok(key)

And that's it! Retrieve the result of the whole procedure by generating a type table.


let type_table = tc.type_check()?;

Done!

(A-)Symmetric Relations

One thing you need to keep in mind is that type relations are like friendships: some are symmetric, some are not. And all is well if everyone is aware of that. So RustTyc offers to impose both kinds of relations. A symmetric relation between two keys k, and k' entails that a refinement of one also refines the other. Suppose k is in an asymmetric relation with k', e.g. k is more concrete than k'. In this case, refining the type of k' entails a refinement of k, but not vice versa. A regular meet (TcKey::is_meet_of(...)) is inherently asymmetric, the symmetric counterpart is TcKey::is_sym_meet_of(...).

What next?

Where to find more information and how to contribute

Find out more about RustTyC and contribute on the github page (repo, suprisingly extensive docs). It also contains some examples. If you want an immense example, check out the type checker of the formal specification language RTLola (project, github)

If you have any questions or feedback, I'm curious to hear from you!