Introduction
lsts is a proof assistant and sometimes a programming language. LSTS is designed for assisted reasoning in mathematical proofs. For those feeling particularly adventurous, it may even be possible to use as a programming language.
- Lightweight. LSTS is only 3K sloc of Rust code with no dependencies.
- Fast. Benchmarks compare to Rust itself for compilation speed and efficiency.
- Familiar. The Syntax is usually similar to Coq, ML, or Haskell.
Intended Audience
This tutorial and reference has been compiled for people who have significant mathematical or programming experience and are looking to try something new. LSTS is anything but average. There are still huge pieces of functionality missing from LSTS. There are also huge pieces of functionality that are already outperforming comparable software products. Hopefully this guide will help you find something inspiring. Soon this guide will help you also find something useful. You can help us improve more quickly by suggesting improvements or asking questions.
Contributing
LSTS is free and open source. You can find the source code on GitHub and issues and feature requests can be posted on the GitHub issue tracker. LSTS relies on the community to fix bugs and add features: if you would like to contribute, please read the Code of Conduct guide and consider opening a pull request.
License
The LSTS source and documentation are released under the MIT License.
Installation
There are multiple ways to install the lsts CLI tool. Choose any one of the methods below that best suit your needs.
Build from source using Rust
To build the lsts
executable from source, you will first need to install Rust and Cargo.
Follow the instructions on the Rust installation page.
Once you have installed Rust, the following command can be used to build and install lsts:
cargo install lsts
This will automatically download lsts from crates.io, build it, and install it in Cargo's global binary directory (~/.cargo/bin/
by default).
To uninstall, run the command cargo uninstall lsts
.
Installing the latest master version
The version published to crates.io will ever so slightly be behind the version hosted on GitHub. If you need the latest version you can build the git version of lsts yourself.
cargo install --git https://github.com/rust-lang/LSTS.git lsts
Again, make sure to add the Cargo bin directory to your PATH
.
If you are interested in making modifications to lsts itself, check out the Contributing Guide for more information.
LSTS Syntax
LSTS syntax is derived from Lambda Calculus. Extensions to the Lambda Calculus syntax are modelled after those from previous languages also derived from Lambda Calculus. Some of these languages include Coq, ML, and Haskell.
Terms
Terms are all desugared into Lambda Calculus expressions.
1;
"abc";
2 + 3;
"[" + (for x in range(1,25) yield x^3).join(",") + "]";
The iterator syntax and method call are all rewritten as a Lambda Calculus expression.
Types
Type definitions define logical statements that are then attached to Terms. All valid Terms have at least one Type. Some Terms may have more than one Type. Types may define invariant properties. These invariant properties impose preconditions and postconditions on what values may occupy that Type. Values going into a Type must satisfy that Type's preconditions. Values coming out of a Term are then known to have satisfied each Type's invariants.
type Prime: Integer
where self > 1
and a:Integer. 2>a || a>self-1 || self%a != 0
Statements
Statements connect logic to form conclusions. Each Statement has a Term part and a Type part. A Statement may optionally have a label so it can be referenced directly later. Statements, when applied, provide new information to the Type of a Term. When a Statement is applied, it must match the pattern of its application context. An application context consists of a Term and a Type, which is then compared to the Term and Type of the Statement. These Term x Type relations form the basis of strict reasoning for LSTS.
forall @inc_odd x: Odd. Even = x + 1;
forall @dec_odd x: Odd. Even = x - 1;
forall @inc_even x: Even. Odd = x + 1;
forall @dec_even x: Even. Odd = x - 1;
((8: Even) + 1) @inc_even : Odd
Terms
Terms inhabit Types. Terms can be evaluated to produce Values. A Value has a finite representation. When run as a program, these values will be represented as 1s and 0s.
True: Boolean;
1: Integer;
1.2: Real;
1.2+3i: Complex;
'a': Character;
"bc": String;
(1,True): (Odd, Boolean);
[7, 11]: Prime[];
{2, 6, 10}: Set<Even>;
{1=2, 3=4, 5=6, 7=8}: Map<Integer,Integer>;
Value Expressions
Values are existential proofs that a type bound can be satisfied. The expression "1 is an Integer" proves that integers exist. We refer to this important relationship as inhabitance. Values inhabit Types. Types inhabit Kinds. A type with no values is said to be vacuous.
Literal Values
Literal values are assigned to a Type through a combination of inference and regex pattern matching. A Literal Type Definition will define a grammar that matches some value string. An example of a literal type is the traditional concept of an Integer. An Integer is denoted as a string of digits: [0-9]+.
After creating a value, it can be passed around
1 + 2
or assigned to a binding
let x:Integer = 5
Block Expressions
Blocks group one or more statements together into a logical unit. The existence of block statements is to some extent an artifact of the development team's familiarity to programming languages and unfamiliarity with traditional proof assistants. All block statements exist to create quantified proofs of properties in a style familiar to programmers.
Loops
Loops can prove very interesting properties of language.
A while loop can prove that a condition is terminal, subject to some computation.
let more(): Boolean;
while more() {
//does this ever end?
}
For completeness' sake we have also built most traditional control-flow blocks.
for x:Prime in primes {
//this is not unlike a forall statement
}
loop {
//this is called a 1+ loop
} while( do_continue() )
If Conditionals
Where would we be without branching conditionals?
if a then b
else if c then d
else e
We also define pattern matching conditionals which unpack structured data.
if let Some(a) = b {
a: Odd
}
Exhaustive Pattern Matching
For complex value unpacking we have match blocks.
match x {
EnumBranch1(y) => { y: Prime; },
EnumBranch2(z,w) => { z: Odd; w: Even; },
}
Grouping Blocks
If you just want to group some statements together, for this we have simple brace blocks.
{
let x: Odd;
a: Integer;
b: Prime;
x + a + b: Integer;
}
Compound Values
A compound value is made up of smaller parts. Some compounds contain heterogenous components. Some compounds contain homogenous components. The user can define custom compound values as new types. Other compound values are implicitly introduced.
Tuples
A Tuple is an extension of the English concept of "double", "triple", "quadruple", "n-tuple". Tuples contain a number of heterogenous values. A Tuple value is introduced using parentheses and comma syntax. Tuple types are also introduced using parentheses and comma syntax.
(1,'a',"bc",True) : (Number, Character, String, Boolean)
Homogenous Tuples
Homogenous Tuples have multiple elements of the same type. In some languages this data structure is called an array or maybe a list or vector. In LSTS we use some syntax that is similar to these other languages, but in documentation we still call these homogenous tuples.
(1,2,3,5,8) : Integer[5]
Named Tuples
Tuples can be given names and type signatures in a type definition. Named tuples must declare the type of all of their contained values.
type MyStruct = {
x: Number,
c: Character,
s: String,
b: Boolean
}
After a named tuple has been defined, it can be instantiated with similar syntax.
MyStruct { x=1, c='a', s="bc", b=True }
Union Types
Union Types define branched types. A Union value may match with one of, but not multiple of, the Union's branches. Each branch in a Union is given a tag. This kind of Union is sometimes called a Tagged Union for that reason.
type MyUnion = FirstTag { x:Integer }
| SecondTag { y:Even }
| ThirdTag { z:Odd }
| FourthTag { x:Prime, y:Prime, z:Prime }
Union Types are instantiated with the same syntax as for named tuples.
SecondTag { y:8 }
Lists
Lists have builtin syntax for their types and for their values.
let xs:Integer[3] = [1,2,3];
let vs:Odd[] = [3,5,11,15,17]
Sets
Sets have builtin syntax for their types and for their values.
let ss:Set<Integer> = {1, 2, 2, 3}
Maps
Maps have builtin syntax for their types and for their values.
let ms:Map<Integer,Prime> = { 1=2, 2=3, 3=5, 4=7 }
Quantified Values
Quantified Values inhabit Quantified Types. This concept maps exactly to the programming concept of iterators, so we will just steal that syntax and use it here.
Iterators
A naked iterator looks much like a for loop.
for i:Integer in ls yield i+1
Containers
Iterators can fill containers,
like sets
{for a:Integer in s yield a*a}
or lists
[for a:Integer in s yield a*a]
or maps.
{for a:Integer in s yield a=a*a}
Fallible Arrows
Arrow Terms, also called Lambdas, or Functions, are a central concept of functional programming. In LSTS most functions are defined as let bindings. When a function is defined as a let binding, its binding is considered total over its type. A total function means that, given any term satisfying the function's domain type, a value will be returned that satisfies the function's range type.
By contrast, some arrows are fallible. A match expression uses many possible arrows to return values from each case of the pattern. These arrows do not match all values of their types, so they are not total. Functions that are not total are also called fallible. If-let expressions also use fallible arrows.
//f is infallible and total
let f(x: Integer) = x*x;
//match arms are fallible
//considered as a whole, this match is total
let m = match t {
Some(x) => { f(x) },
None => { f(1) },
};
//if-let arms are fallible
//this expression is not total unless you consider the implicit else expression
if let Some(x) = t {
f(x)
};
Let bindings can also be marked fallible. An example of a fallible function is division. Any number divided by zero is undefined. There is no result, the result is just undefined. Below we define this type signature for LSTS.
let $"/"(n: Number, d: Number): Number
where fails if d == 0;
Types
A Type describes a property of a Term. Terms are multifaceted and can be described from a variety of perspectives. A Term can have multiple properties, so Terms can also have multiple Types.
Each Term is associated with a singular Type Expression. A Type Expression represents all properties of that term, structured in Conjunctive Normal Form (CNF). Conjunctive Normal Form is just a way of saying that all the Ands are at the top level, above the Ors.
Two Types can be attached to one Term.
a: Integer + Prime
One Type can imply another Type. When this happens there is an implicit expansion of the implication to enforce CNF.
type Aa: Bb;
a : Aa // Aa + Aa => Bb
//rewrite as just Aa + Bb
a : Aa + Bb
Literal Types
Literal Types are defined by a grammar that matches a subset of value characters. LSTS does not really define any builtin literal types. These literal types are all left to preludes and the user to define.
type Number;
type Integer : Number = /^[0-9][_0-9]*([eE][_0-9]+)?$/;
type Real : Number = /^[0-9][_.0-9]*([eE][-]?[_0-9]+)?$/;
type Complex : Number = /^[0-9][_.0-9]*([eE][-]?[_0-9]+)?([+][0-9][_.0-9]*([eE][-]?[_0-9]+)?[i])?$/;
As you can see in the above code section, numerical types can be associated with a definitive regular grammar. There are plans to support Context-Free Grammars and even Free Grammars but this feature is only in discussion.
When the type checker sees a literal value it needs to associate a literal type with it. Here the first type definition that matches the longest possible literal will be assigned. In the above example 34 would be assigned to the type Integer. By contrast 34.0 would become a Real number because it matches a longer string than the Integer.
Structured Types
Structured types are also referred to as Named Tuples. A structured type is a named type with optional parameters. An example of a structured type would be the concept of a Point in N-dimensional space.
struct Point2D<N:Number> = { x:N, y:N };
struct Point3D<N:Number> = { x:N, y:N, z:N };
struct Point4D<N:Number> = { x:N, y:N, z:N, w:N };
Type parameters can be inferred when the struct is instantiated.
Point3D { x:1, y:2, z:3 }
If you want to provide a more explicit parameter bound, then you can annotate the parameters on the constructor.
Point2D<Prime> { x:7, y:17 }
Union Types
Union Types define multiple branches of possible structured data. Each branch must also be annotated with a tag. For this reason Union Types are also called Tagged Unions. Each branch of a Union Type is declared with a tag and its structured data. Multiple branches are separated with a vertical bar.
type Boolean = True | False;
type Option<A> = None | Some { value:A };
Union types are instantiated in identical fashion to structured data. The only difference is that instead of the type name, the tag name is used to prefix the data.
Some { value=5 }
Parameterized Types
Type parameters create generic types. A Generic Type is a single type definition that can have multiple type signatures. Each type signature will have a similar, but not quite exactly the same, structure. Parameters can be added to any type definition to create a generic type.
type Point2D<N:Integer> = { x:N, y:N };
type Result<OK,ERR> = Ok { ok: OK }
| Err { err: ERR };
Type parameters can also be added to any parameterized type in its constructor. This creates a very strict enforcement of the type's parameter bounds.
Ok<Odd,String> { ok=1 }
Unification of Type Parameters
Unification of Type Parameters is broken into two logical steps: unification and substitution. Unification checks that two type expressions are compatible and records any substitutions that are required. Substitution unifies all required substitutions, then replaces them in the type expression. This distinction holds for both regular type parameters and variables in Constant types.
By example
Integer => X = X where X=Integer
X => Integer = {}
[1] => [x] = [x] where [x|1]
[x] => [1] = {}
When a variable appears in multiple places in a type expression, it will be substituted multiple times. These substitutions must unify with each other under the rules of the Most General Unifier.
By example
MGU( Integer, Number ) = Number
MGU( Integer, Boolean ) = {}
MGU( [1], [self%2==1] ) = [self%2==1]
MGU( [1], [2] ) = {}
Type Invariants
Type invariants carry a lot of the weight of proofs in practice. Type invariants are sometimes call refinement types. A Refinement Type describes a subset of the values of a Total Type.
A Total Type describes all values that can be represented. UnsignedInteger32 would be a Total Type. UnsignedInteger32 can represent all 32-bit unsigned integers.
By comparison, a refinement type, or implicitly any type with invariants, would not be Total. Odd numbers are a subset of all Integers. Odd numbers might be represented as Integers, but not all Integers are Odd.
987654321 : Odd + Integer
Type Invariants are properties that are attached to a type definition. The definition of Odd is as follows:
type Odd: Integer
where self%2 | 1
Multiple properties can be attached to the same type definition.
type Factor23: Integer
where self%2 | 0
and self%3 | 0;
The self keyword represents some value of the declared type. The bar syntax indicates that the preceding expression produces the following expression. When a value is bound by a type that has invariants, each invariant is checked as a precondition to satisfy that type. If a bound value does not satisfy its type's preconditions, then an error will occur.
Postconditions are the properties that we know about any given value. We may not know the exact value of a term, however we may know some of its properties. When this happens, we can often use the postconditions of a type to complete a proof.
In the above example of Factor23, we know that any value of that type will divide evenly into 2 and 3. This can be used as part of a proof, or in a program it can be used for constant folding.
let x:Factor23;
if x%2 == 0 then {
//yes, always this branch
} else {
//no, never this branch
}
Constant Types
Constant Types have their own Kind. Constants are very important in most programs because they validate the invariant properties of types. All Constants are either Integer or Boolean values. There are also some builtin functions for manipulation of Constant values.
Some values are given constant types. This creates a relationship between Term values and Constant types.
1: [1];
4: [4];
When a Term does a computation, sometimes the Type might also follow along at the type level.
1 + 3 : [1 + 3]
Type invariants are evaluated in the context of Constant types.
let x:Odd;
x % 2 : [1]
Type Unification
Unification is the process of testing whether one type implies another. Sometimes unification is written as A <: B, which is called subsumption. For our use we tend towards arrow notation A => B which is called implication. The primary difference between the two is the directionality of the operation. A <: B is equivalent to B => A, for the extent of our usage at least.
In LSTS we often use the term Property and Type interchangeably. The primary difference in usage between these two concepts is how they are attached to Terms. A Term can typically have many properties. A Term can typically have only one type. We break this convention when we say that Types are Properties. In LSTS, Terms have multiple Properties. Equivalently, Terms have multiple Types.
This would be all well and good if there were no consequences to this breaking from convention. There are consequences however. Due to these consequences we sometimes need to say, there can only be one type.
One case where you can only have one type is in a normal kind. It does not make much sense to say that a term is both a Second and a Metre. A measurement can only have one Unit. We extend this principal to apply to all normal kinds. Normal kinds can only be inhabited by a single Type.
A similar situation appears when we talk about representation of Terms. It is OK to say that a term is an Odd Integer, but not a String Integer. Both String and Integer are valued types. Valued Types have a representation and imply a value. Terms are only allowed to have one Valued Type. Valued Types are sometimes called Data Types. Data Types are singular.
This preface is all we need to explain before even getting started to talk about unification. Unification is the beating heart of LSTS and it is very important that we get it right.
Implication vs Most General Unifier
Implication naturally has a direction, of that which is to potentially change somehow and therefore must be satisfied by what already exists on the other side. If nothing can change then we are checking for exact equality. If both sides can change, then the result is the Most General Unifier. This is why there are three unification operations: implication, MGU, and structural equality.
A x B | A can narrow | A cannot narrow |
---|---|---|
B can narrow | MGU | B => A |
B cannot narrow | A => B | structural equality |
Unification, as implemented
First let's look at the structure of a Type before we start moving anything around.
type Type =
Any
| Named(String,Type[])
| And(Type[])
| Arrow(Type,Type)
| Tuple(Type[])
| Product(Type[])
| Ratio(Type,Type)
| Constant(Term)
Bottom is represented as the empty conjunctive. Tuple is order-sensitive. Nil is represented as the empty tuple. Product is order-insenstive.
Unification is represented as the binary arrow operator: A => B. However, before we can start unification we must satisfy some preconditions. The following function gets called before any unification.
let apply_preconditions( left_type: Type, right_type: Type ) = {
//rewrite Types with implied types explicitly included
//A + A => B will be rewritten as A + B
expand_subtypes( left_type );
expand_subtypes( right_type );
//Constant types will be evaluated and normalized
//Constants may not become a Value, but they will be normal
reduce_constant_expressions( left_type );
reduce_constant_expressions( right_type );
//Types will be converted into CNF
//Products and Ratios will be reduced and ordered
normalize( left_type );
normalize( right_type );
//Assert that all preconditions are satisfied
//Check that Type is in CNF
//Check that Normal Kinds and Data Types are singular
assert_preconditions( left_type );
assert_preconditions( right_type );
}
The actual unification function can be very fragile in some cases. How each case is treated has an outsized effect on the rest of our codebases.
let $"=>"(left_type: Type, right_type: Type): Type {
match (left_type, right_type) {
//the bottom type implies nothing
(And(lts), _) if lts.length==0 => { raise TypeError("Bottom implies nothing") },
//any type implies Any type
(l, Any) => { l },
//type variables get substituted for their greatest-common-denominator
(Named(lv,lps), rt) if lv.is_uppercase => {
if lv in substitutions {
substitutions[lv] = gcd(substitutions[lv], rt)
} else {
substitutions[lv] = rt
};
substitutions[lv]
},
(lt, Named(rv,rps)) if rv.is_uppercase => {
if rv in substitutions {
substitutions[rv] = gcd(lt, substitutions[rv])
} else {
substitutions[rv] = lt
};
substitutions[rv]
},
//And Unification has highest precedence
(_, And(rts) as rt) if rts.length==0 => { rt },
(And(lts) as lt, And(rts)) => {
let mts = [];
for rt in rts {
match lt => rt {
And(tts) => { mts.append(tts); },
tt => { mts.push(tt); },
}
}
mts.sort(); mts.dedup();
if mts.length==0 { raise TypeError("nothing implies Bottom") }
else if mts.length==1 { mts[0] }
else { And(mts) }
},
(And(lts), rt) => {
let mts = [];
for ltt in lts {
//it is OK if everything doesn't unify
//it is not OK if nothing unifies
if let Ok(nt) = try {ltt => rt} {
match nt {
And(tts) => { mts.append(tts); },
tt => { mts.push(tt); },
}
}
}
mts.sort(); mts.dedup();
if mts.length==0 { raise TypeError("nothing implies Bottom") }
else if mts.length==1 { mts[0] }
else { And(mts) }
},
(lt, And(rts)) => {
let mts = [];
for rt in rts {
if let Ok(nt) = try {lt => rt} {
match nt {
And(tts) => { mts.append(tts); },
tt => { mts.push(tt); },
}
} else {
raise TypeError("Implicit Narrowing of And Types is not permitted on the right")
}
}
mts.sort(); mts.dedup();
if mts.length==0 { raise TypeError("nothing implies Bottom") }
else if mts.length==1 { mts[0] }
else { And(mts) }
}
//Ratio Types have next precedence
(Ratio(nl,dl), Ratio(nr,dr)) => {
Ratio(nl => nr, dl => dr)
},
(lt, Ratio(nr,Tuple(dr))) if dr.length==0 => {
lt => nr
},
//Everything else is a mixed bag
(Named(lv,lps), Named(rv,rps)) if lv==rv && lps.length==rps.length => {
Named(lv, [for (lp,rp) in zip(lps,rps) yield (lp => rp)])
}
(Arrow(pl,bl), Arrow(pr,br)) => {
Arrow(pl => pr, bl => br)
},
(Product(la), Product(ra)) if la.length==ra.length => {
Product([for (lt,rt) in zip(la,ra) yield (lt => rt)]
},
(Tuple(la), Tuple(ra)) if la.length==ra.length => {
Tuple([for (lt,rt) in zip(la,ra) yield (lt => rt)]
},
(Constant(lt), Constant(rt)) => {
if lt == rt {
Constant(lt)
} else {
raise TypeError("Constant Types are not equal")
}
}
}
}
Statements
Statements apply logic to connect Terms and Types. LSTS Statements provide rich algebraic manipulation of Terms and Types. Statements can be either logical theorems or axioms.
axiom @reflexive a:A. [True] = a == a;
forall @reflexive_int a:Integer. [True] = a == a @reflexive;
Syntax
A Statement is formed by constructing
- The keyword axiom or forall
- An optional label
- Optional parameters
- The concluding Type
- The Term definition
Deduction
In Strict Mode statements must be constructed only from axioms, definitions, or from other theorems rooted in axioms. Deduction is label-directed, so it takes near-constant time. Every statement is validated separately, so unfounded theorems will generate separate errors. Theorems resulting from unfounded theorems can be made to generate a warning with the compiler flag: --warn-unfounded.
Theorem Statements
Theorems are Logical Arguments that can be proven. In Strict Mode theorems must be proven by deduction from axioms and definitions.
Hints
Hints apply theorems to contexts inside of Terms. Hints attempt to match a Statement definition against the Term and its Type context. Statement labels can be overloaded to attempt to match multiple patterns in a single Hint. Successful application of a Hinted Statement will decorate the target Term with new Types. Failed application of a Hint will generate an Error.
Time Complexity of Strict Mode
Proof by Deduction of Theorems takes O(n) time, where n is the lexical size of the theorem. Enabling warnings of all theorems derived from unfounded theorems takes a little extra space and time.
Axiom Statements
Axioms don't need to be proven by deduction. The assumption that an axiom is valid is the only difference between an axiom and a theorem. Axioms are a well researched topic in mathematical logic; if an axiom from a prelude is unfamiliar to you, then don't use it. Be warned though that familiar axioms may look a bit different when declared in LSTS. LSTS is very strict, whereas things like the basic axioms of algebra are usually defined casually. Preludes will also define axioms differently for each use-case.
Deduction
All theorems in a Strict LSTS proof can be thought of as a single tree branching out from the axioms. All theorems must be derived from axioms, definitions, or other well-founded theorems. Cycles are not allowed in this tree of theorems, all theorems must lead straight to axioms.
Preludes
Choosing axioms is one of the most important parts of defining a prelude. Not all axioms are appropriate for all preludes. For this reason some preludes are incompatible with each other.
Kinds
Kinds are types for types. Kinds are simple, having a name and potentially some parameters. Types have exactly one Kind. Terms can have multiple Types, so they can have multiple Kinds. Types are attached with a single colon. Kinds are attached with a double colon. There are 3 builtin Kinds: Term, Constant, and Nil. New kinds do not need to be defined before they are used.
5 : Prime :: Term;
2 : Second :: Unit;
1 : [1] :: Constant;
3 : [3] + Integer + Second :: Constant + Term + Unit;
Normal Kinds
We have mentioned that types can be marked as normal. When any Type is marked as normal, the entire Kind that it belongs to is marked as normal. The kind Term can never be marked as normal. Attempting to mark a Term Type as normal will result in an error.
Normal kinds are very useful during typecasting. For most typecasting, the cast must be direct.
let a : Real;
let b : Integer = a as Integer;
For normal kinds, typecasting will search for a viable path that will satisfy the cast. Normal typecasting has two phases: the Into-Normal phase, and the Out-Of-Normal phase.
During the Into-Normal phase, the compiler will try to normalize the type. Multiple casts may be necessary to accomplish this transformation.
let mph: Mile / Hour;
let mps: Metre / Second = mph as Meter / Second;
In the above example, the compiler must typecast both Miles to Metres and Hours to Seconds. Note also that the Hours to Seconds cast is on the denominator. When a Unit cast is on the denominator, the constant conversion factor must be inverted. A well implemented prelude will take this into account.
During the Out-Of-Normal phase, the compiler will try to take the normalized type and convert it into the annotated type. This process works mostly identical to the Into-Normal phase, only with a different target. When Into-Normal and Out-of-Normal phases are connected, multi-step casting can be inferred for fairly complex targets.
let mph: Mile / Hour;
let kmpm: Kilo<Metre> / Minute = mph as Kilo<Metre> / Minute;
Operations on Kinds
Sometimes we need to manipulate Kinds or check their properties. For this we have several operations that work on Kinds. The definition for Kinds is very simple. Kinds can be Nil, Named, or multiple Named.
type Kind =
Nil
| Named(String,Kind[])
| And(Kind[])
The first operation on Kinds is the implication operator. This is similar to the Type implication operator, just much simpler. If a first Kind contains all of the second Kind, then the first implies the second.
let $"=>"(left_kind: Kind, right_kind: Kind): Boolean = {
match (left_kind, right_kind) {
(_, Nil) => { True },
(Nil, _) => { False },
(Named(_), Named(_)) => { left_kind == right_kind },
(And(ks), Named(_)) => { right_kind in ks },
(Named(_), And(ks)) => { False },
(And(lks), And(rks)) => { rks in lks },
}
}
The second significant operation on Kinds is conjoining two Kinds.
let $"+"(left_kind: Kind, right_kind: Kind): Kind = {
match (left_kind, right_kind) {
(_, Nil) => { left_kind },
(Nil, _) => { right_kind },
(Named(_), Named(_)) => { And([left_kind, right_kind]) },
(And(ks), Named(_)) => { And(ks.append(right_kind)) },
(Named(_), And(ks)) => { And(ks.append(left_kind)) },
(And(lks), And(rks)) => { And(lks + rks) },
}
}
Example Programs
LSTS is also a programming language so you can use it to calculate and discover answers to questions. More than just existential proofs, programs can be used to compute. That is why we call it Computer Science.
LSTS takes inspiration from both Haskell and ML for its syntax and semantics. This might surprise some to say that LSTS is like both of these languages. The reason for the surprise is that Haskell is lazily evaluated, whereas ML is eager. You can't have both... can you?
LSTS leaves as much as possible up to the programmer. For example, LSTS is derived from Typed Lambda Calculus, but it derives no types. Similarly, LSTS has no builtin evaluation strategy. When choosing a backend that implements an evaluation strategy, then the programmer must choose between lazy or eager, but not until then. LSTS just type-checks things, and nothing else.
Here are some backends that LSTS can target:
- HVM
- lazy
- massively parallel
- compiled or interpreted
- compilation outputs platform binary or WASM
Code Playground
// Creates a tree with `2^n` elements
(Gen 0) = (Leaf 1)
(Gen n) = (Node (Gen(- n 1)) (Gen(- n 1)))
// Adds all elements of a tree
(Sum (Leaf x)) = x
(Sum (Node a b)) = (+ (Sum a) (Sum b))
// Performs 2^n additions in parallel
(Main n) = (Sum (Gen n))
Finding Right Triangles
Haskell has great list comprehensions. Iterators in LSTS are not that bad either. One great example of list comprehensions comes from a popular Haskell Tutorial. In this example, Haskell is used to find all right triangles with a perimeter of length 24.
Here is the Haskell code.
[ (a,b,c) | c <- [1..10], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2, a+b+c == 24]
Here is the LSTS code.
[ for c in range(1,10)
for b in range(1,c)
for a in range(1,b)
if a^2 + b^2 == c^2 && a+b+c == 24
yield (a,b,c) ]
LSTS takes more characters to write the same program. However, seeing each code for the first time, LSTS would maybe be easier to read. This is how list comprehensions compare for these two languages.
The output for both programs should be [(6,8,10)].
Jump as a Control Flow primitive
It may surprise some to find out that LSTS has non-functional control flow operators. Namely, return and raise both potentially cause stack unwinding to occur. We can account for this control flow in a functional program by using Monads. However, it still doesn't feel functional.
Return
Return yields control flow to the end of the function providing a return value. Even Haskell has a return operator. In Haskell, the return function is defined as part of Monad and maybe works a little different than a naive programmer would expect. In LSTS, the return function actually just jumps to the end of the function. LSTS is not pure functional in this sense.
Raise
The raise keyword is even more problematic to explain in a functional perspective. Raising exceptions can potentially throw control flow back several function calls until it is caught. An exception is either caught by a try block or else it will trace all the way up the call stack to cause a Runtime Error. Some languages may use Result types to make exceptions more explicit. In LSTS the type system is powerful enough to infer Results, so we save the programmer some trouble and just let exceptions fly.
Functional Programming
Some of our backends don't like dirty control flow. This is OK. Monadic code will be generated, but it still may not feel functional. This control flow is a lie. LSTS is still purely functional, though it may not seem so.
fn what_error(x: Integer) {
raise TheError("Where is my type signature...")
}
fn catch_error() {
match try { what_error(3) } {
Ok(()) => { print("OK") }
Error(TheError(msg)) => { print(msg) }
}
}
Batteries Not Included
The title of this example is a pun on the Python motto "batteries included". LSTS is a type-checker. LSTS does nothing else.
LSTS may connect with a bunch of other software products that include batteries. However, LSTS by itself is very minimal. LSTS does not even define "if", although it is given special syntax.
In this example we define "if". An if statement has three arguments. One argument for the Boolean branching condition. Two arguments for each conditional branch. The if statement returns the value of one of the branches. All branches and the return type are parameterized. The shared parameter will become the most-general-unifier of both branches.
let $"if"(condition: Boolean, branch1: A, branch2: A): A;
The same lack of features applies to the type system as well. The Boolean type referenced above must be defined by the user.
type Boolean = True | False;
Numbers are user defined as well.
type Number;
type Integer: Number = /^[0-9][_0-9]*([eE][_0-9]+)?$/;
type Real : Number = /^[0-9][_.0-9]*([eE][-]?[_0-9]+)?$/;
L1 Intermediate Representation
The default backend for interpretation and compilation is L1IR. It is called L1IR because it is the intermediate representation for L1, which is the default prelude for LSTS. L1 is a term from linguistics that means your native language, i.e. the first language that you learn.
L1IR can be compiled or interpreted. Both of these options can be optimized through compilation by using the cranelift code generator.
L1IR is a simple functional programming language. It only has three datatypes:
- Literal Strings
- Functions
- Tuples
Using only these three datatypes it is still possible to implement the full feature set of L1. Initial benchmarks show that compiled L1IR is reasonably fast, within an order of magnitude of Rust with -O3 optimization.
Proofs
LSTS is a theorem prover. LSTS may also be a programming language.
LSTS proofs are built on something called the Curry-Howard Equivalence. The equivalence states and demonstrates that programs are proofs and that proofs are programs. There is a virtuous one-to-one correspondence between proofs and programs. Despite this wonderful finding, most programs do not intend to be proofs. Likewise, most proofs do not intend to be programs.
LSTS is built for this Equivalence. Programs that look like just code, may actually be intended as a proof. Following are some examples of how we can make use of LSTS and the Curry-Howard Equivalence to write better programs and more easily certify proofs.
Rules Based Logic
Proofs in LSTS are built by connecting type definitions to quantified statements. Types are logic at rest. Quantifed Statements are logic in motion. A Type knows what it is. A Statement shows what it is.
Types
Type definitions define logical statements that are then attached to Terms. All valid Terms have at least one Type. Some Terms may have more than one Type. Types may define invariant properties. These invariant properties impose preconditions and postconditions on what values may occupy that Type. Values going into a Type must satisfy that Type's preconditions. Values coming out of a Term are then known to have satisfied each Types' invariants.
type Even: Integer
where self % 2 | 0;
type Odd: Integer
where self % 2 | 1;
Quantified Statements
Quantified Statements connect logic to form conclusions. Each Quantified Statement has a Term part and a Type part. A Quantified Statement may optionally have a label so it can be referenced directly later. Quantified Statements, when applied, provide new information to the Type of a Term. When a Quantified Statement is applied, it must match the pattern of its application context. An application context consists of a Term and a Type, which is then compared to the Term and Type of the Statement. These Term x Type relations form the basis of strict reasoning for LSTS.
forall @inc_odd x: Odd. Even = x + 1;
forall @dec_odd x: Odd. Even = x - 1;
forall @inc_even x: Even. Odd = x + 1;
forall @dec_even x: Even. Odd = x - 1;
((x: Even) + 1) @inc_even : Odd
Trivial Proofs
Some proofs are so simple that just stating the goal is enough for our system to infer the validity of the proof. Most commonly this happens with value manipulation where constant folding will infer that a statement is true without any further work from the programmer.
forall x:Integer. x + 0 == x;
forall x:Integer, y:Integer. x + y == y + x;
forall x:Integer, y:Integer, z:Integer. x + y + z == z + y + x;
Q.E.D.
Proof of the Irrationality of the Square Root of Two
Proposition: The square root of 2 is irrational. Proof:
let $"/"(x:X, y:Y): X/Y;
let $"*"(x:X, y:Y): X*Y;
let square(x:X): X*X;
type Pt; let p:Pt;
type Qt; let q:Qt;
let sqrt_of_two: Pt/Qt;
square(sqrt_of_two) * square(q): Pt*Pt; //2 * q*q = p*p
square(p) / square(sqrt_of_two): Qt*Qt; //p*p / 2 = q*q
p / square(sqrt_of_two) : ?/(); //2 is a factor of p
We define rational division, multiplication, and square of whole numbers. Whole numbers are proposed at the type level, rather than at the term level. Any whole number may be represented as a type T.
Let p be a whole number Pt. Let q be a whole number Qt. Suppose the square root of 2 can be represented as a rational number, having numerator Pt and denominator Qt.
By definition the square of the square root of 2 should be equal to 2. Therefore 2 is equal to (Pt * Pt)/(Qt * Qt). By the following algebraic manipulation we show that 2 must be a factor of p. (Pt * Pt)/(Qt * Qt) is not a factor of p. Therefore the square root of 2 cannot be rational.
Q.E.D.
Proof of the Infinitude of Prime Numbers
Proposition: There is an infinite number of Prime numbers. Proof:
let primes:Prime[];
let p = primes.product() + 1;
forall d:primes. p%d == 1;
Suppose that there are a finite number of Prime numbers. A Prime number is any Integer greater than 1, not evenly divisible by other Prime numbers.
Let p be the product of all Prime numbers plus 1.
No Prime is a factor of p. Therefore p is a Prime number. p is not in the list of Prime numbers.
Our supposition that there is a finite number of Prime numbers leads to a contradiction. Therefore the Prime numbers must be infinite.
Q.E.D.
Strict Proof Verification
It is straight-forward to write a mixed proof verifying some aspects of the proof and leaving the rest to plain English for the reader. For those so inclined, it is also possible to verify the entirety of the proof in LSTS. The first step for writing a strict proof is to declare the assumptions and the conclusion.
The Assumptions and Conclusion method of starting a proof fits conveniently into the form of a function definition. The arguments to the function become the assumptions. The return value of the function becomes the conclusion.
let some_proof <A:Integer,B:Odd,C:Prime> (
assumption1: Property1<A,B>,
assumption2: Property2<C> ):
Conclusion<A,B,C> = {
//Here we will derive the conclusion
//starting from the assumptions
}
This code snippet can be read as
- for all quantified variables in the assumptions: A, B, C
- assume the given properties: Property1<A,B>, Property2<C>
- show that there exists a concluding result: Conclusion<A,B,C>
There is a strong relation between proofs and programs. This is a very direct relationship between proof objects and simple function declarations.
Proof Objects
In strict mode, LSTS will also produce a proof object. A proof object is a program that, when run, will attempt to reduce all logical statements made in the LSTS program. The proof object runs on a much simpler kernel of logical statements, so it may help expose some flaws that occur in higher-level logic. When run, the proof object will produce True, False, or diverge. Currently, for proof objects, we are targeting languages that closely resemble Lambda Calculus to make the translation easier.
// Creates a tree with `2^n` elements
(Gen 0) = (Leaf 1)
(Gen n) = (Node (Gen(- n 1)) (Gen(- n 1)))
// Adds all elements of a tree
(Sum (Leaf x)) = x
(Sum (Node a b)) = (+ (Sum a) (Sum b))
// Performs 2^n additions in parallel
(Main n) = (Sum (Gen n))
Justified Algebraic Manipulation
When algebra is used to manipulate constants, it comes with a cost. By default, any algebraic manipulation is permitted. This means that arbitrary jumps towards proof objectives are permitted. In strict mode, algebraic manipulation must be justified.
Algebra in Strict Mode
By default arbitrary algebra is permitted.
[if a%2==0 then 1 else 2]\[a%2|1] : [if 1==0 then 1 else 2] : [2]
In strict mode, the algebraic substitution operator \ is only permitted when the rule applied appears in an attached invariant.
(x:Odd) \ [self%2|1]
This restriction is necessary to construct sound proof objects. Without the restriction, it would be easy to jump to arbitrary conclusions.
Skolemization in Proofs
Skolemization is, rougly speaking, the practice of writing existential quantifiers as arrow types. This helps simplify our type system by eliminating existential quantifiers.
forall x:X. exists y:X. x + y == x^2
becomes
let g: X => X;
forall x:X. x + g(x) == x^2
In the Assumptions and Conclusions method of proof verification, the skolem function goes after the assumptions. This is equivalent to saying "forall universal quantifiers and some assumptions, there exists a skolem function".
let some_proof <A:Integer,B:Odd,C:Prime> (
assumption1: Property1<A,B>):
Conclusion<A,B,C> = {
let skolem1: (A,B,C) -> Integer;
//Here we will derive the conclusion
//starting from the assumptions
}
Most General Unifier
The Most General Unifier of two type expressions is an intersection of their statements. The MGU is sort of like a Least Common Denominator for logical expressions. MGU is used to unify type variables that appear in multiple places.
let $"if"(cond: Boolean, branch1: T, branch2: T): T
In the above definition the two branches are unified by a single type variable. This single type variable cannot be too specific. Every statement that appears in branch1's version of T must also appear in branch2's version of T. New information should not be introduced in between these branches. The best result of unification here will always be the MGU of branch1 and branch2. This MGU result then goes on to become the return type of the if statement.
MGUs by example
A type expression, intersected with itself, will always yield itself, and its full self.
? | ? = ?
A | A = A
Aa | Aa = Aa
Aa + Bb | AA + Bb = AA + Bb
Aa -> Bb | Aa -> Bb = Aa -> Bb
(Aa,Bb) | (Aa,Bb) = (Aa,Bb)
Aa*Bb | Aa*Bb = Aa*Bb
Aa/Bb | Aa/Bb = Aa/Bb
[1] | [1] = [1]
A plural type expression, intersected with another type expression, will yield its intersection.
? + Aa | ? = ?
Aa + Bb | Aa = Aa
Aa -> Bb + Cc | Aa -> Bb = Aa -> Bb
(Aa,Bb) + Cc | (Aa,Bb) = (Aa,Bb)
Aa*Bb + Cc | Aa*Bb = Aa*Bb
Aa/Bb + Cc | Aa/Bb = Aa/Bb
[1] + [x] | [1] = [1]
A nested plural type expression, intersected with another type expression, will yield its nested intersection.
{? + Aa} -> Aa | ? -> Aa = ? -> Aa
{? + Aa} -> Aa | Aa -> Aa = Aa -> Aa
({Aa + Bb},Cc) | (Aa,Cc) = (Aa,Cc)
{Aa + Bb}*Cc | Aa*Cc = Aa*Cc
{Aa + Bb}/Cc | Aa/Cc = Aa/Cc
There are several special cases.
A/() | A = A
Any failure to unify in a compound type yields the bottom type.
Aa -> Aa | Aa -> Bb = {}
(Aa,Bb) | (Aa,Cc) = {}
Aa*Bb | Aa*Cc = {}
Aa/Bb | Aa/Cc = {}
Implication Unifier
The Implication Unifier of two type expressions is a directed intersection of their statements. Implication is similar to Most General Unifier with the added constraint that information can only flow from left to right.
Implication by example
A type expression, implied with itself, will always yield itself, and its full self.
? => ? = ?
A => A = A
Aa => Aa = Aa
Aa + Bb => AA + Bb = AA + Bb
Aa -> Bb => Aa -> Bb = Aa -> Bb
(Aa,Bb) => (Aa,Bb) = (Aa,Bb)
Aa*Bb => Aa*Bb = Aa*Bb
Aa/Bb => Aa/Bb = Aa/Bb
[1] => [1] = [1]
The left-hand-side can narrow to unify. The right-hand-side cannot narrow.
Aa + Bb => Aa = Aa
Aa + Bb => Bb = Bb
Aa + Bb + Cc => Aa + Bb = Aa + Bb
Aa => Aa + Bb = {}
Bb => Aa + Bb = {}
Aa + Bb => Aa + Bb + Cc = {}
Components of compound types can narrow also.
{Aa + Bb} -> Aa => Aa -> Aa = {}
Aa -> {Aa + Bb} => Aa -> Aa = Aa -> Aa
Aa -> Aa => {Aa + Bb} -> Aa = Aa -> Aa
Aa -> Aa => Aa -> {Aa + Bb} = {}
({Aa + Bb},Aa) => (Aa,Aa) => (Aa,Aa)
(Aa,Aa) => ({Aa + Bb},Aa) => {}
{Aa+Bb)*Aa => Aa*Aa => Aa*Aa
Aa*Aa => {Aa+Bb}*Aa => {}
{Aa+Bb)/Aa => Aa/Aa => Aa*Aa
Aa/Aa => {Aa+Bb}/Aa => {}
There are several special cases.
A/() => A = A
A => A/() = A
Any failure to unify in a compound type yields the bottom type.
Aa -> Aa => Aa -> Bb = {}
(Aa,Bb) => (Aa,Cc) = {}
Aa*Bb => Aa*Cc = {}
Aa/Bb => Aa/Cc = {}
Parameter Unification yields the Most General Unifier of all substitutions. Parameters are invariant with respect to arrow types etc.
(Aa,Aa) -> Aa | (X,X) -> X = (Aa,Aa) -> Aa
(Aa,Bb) -> Cc | (X,X) -> X = {}
({Aa+Bb},{Aa+Bb}) -> {Aa+Bb+Cc} | (X,X) -> X = ({Aa+Bb},{Aa+Bb}) -> {Aa+Bb}
({Aa+Bb},{Aa+Cc}) -> {Aa+Bb+Cc} | (X,X) -> X = (Aa,Aa) -> Aa
({Aa+Bb},{Aa+Cc}) -> {Bb+Cc} | (X,X) -> X = {}
Metaprogramming
LSTS is a thin layer of syntax and logic. Almost all of this thin layer is also exposed to the programmer to manipulate directly. The basic data structure of an LSTS program is the Abstract Syntax Tree (AST). The AST stores a structured representation of the program as defined by its syntactic definition. At the most general level we can refer to an LSTS program as an AST Graph. This graph data structure contains all the terms and their relations to one another.
Some other features available to programmers are things like namespaces, types, and kinds. Each of these data structures can be obtained and referenced through several APIs. When used in combination some impressive features can be developed quite naturally. Many of the LSTS Pearls use metaprogramming.
Traversing the AST
The basic data structure of metaprogramming is the Abstract Syntax Tree (AST) and its graph. The AST Graph contains a sparse representation of all Terms and their relations to one another. Here is an example program that accesses the AST Graph.
let ast_edge_count() = sum(
for $s in $
for $e in $s
yield $e.length
);
Here $
is the set of all symbols defined in the current namespace.
Each $s
is a top-level AST node.
Each $e
is a local-level AST node.
$e.length
is the number of outgoing references to other AST nodes.
Keep in mind that the AST Graph may change as the program is analyzed. The most common change to the AST Graph is for new nodes to be added.
Metaprogramming with Types and Kinds
Types and Kinds can be accessed by the programmer for metaprogramming.
let unique_type_count() = count(
for $s in $
for $e in $s
yield $$e
);
Here we count the number of unique Types in a program.
$
is the set of all symbols defined in the current namespace.
Each $s
is a top-level AST node.
Each $e
is a local-level AST node.
$$e
is the type of $e
.
Then the count
function counts the number of unique types in the list.
let unique_kind_count() = count(
for $s in $
for $e in $s
yield $$$e
);
Here we count the number of unique Kinds in a program.
$
is the set of all symbols defined in the current namespace.
Each $s
is a top-level AST node.
Each $e
is a local-level AST node.
$$$e
is the kind of $e
.
lsts Command Line Interface
LSTS is distributed as a Command Line tool. The program processes files ending with the .tlc file extension. .tlc stands for Typed Lambda Calculus. To find a list of available commands, type lsts help.
lsts help
parse [filenames] -- parse files but nothing more
check [filenames] -- parse and typecheck files
build [filenames] -- compile provided files as a program
run [filenames] -- execute provided files as a program
lsts parse Documentation
lsts help
parse [filenames] -- parse files but nothing more
check [filenames] -- parse and typecheck files
build [filenames] -- compile provided files as a program
run [filenames] -- execute provided files as a program
lsts check Documentation
lsts help
parse [filenames] -- parse files but nothing more
check [filenames] -- parse and typecheck files
build [filenames] -- compile provided files as a program
run [filenames] -- execute provided files as a program
lsts build Documentation
lsts help
parse [filenames] -- parse files but nothing more
check [filenames] -- parse and typecheck files
build [filenames] -- compile provided files as a program
run [filenames] -- execute provided files as a program
lsts run Documentation
lsts run [filenames]
This command compiles the provided files with the JIT compiler and runs them. The warm up time for JIT compilation before execution should be reasonable, usually taking less than a second.
Soundness Guarantees
Soundness is defined here as "only True statements can be proven".
Currently there is no constructive proof of the logical soundness of the type system. Similarly there is currently no proof that the type system is strongly normalizing.
While many of these things may improve with time, there are some unique challenges that LSTS faces. Most of these problems stem from the fact that the user is allowed to ascribe multiple types to terms.
Why are multiple types a problem?
When you have multiple types, it is possible to create a type signature that can only be satisfied by multiple values. LSTS does not permit multiple valued terms, and the suggestion is fairly absurd. What should happen in this case is that LSTS should issue a Type Error and terminate. Currently this does not always happen.
For example, a term can not be both an Integer and a String. The following code can't be satisfied.
let x: Integer + String;
The previous example may seem trivial, but this problem is not always so obvious. Take for example the exclusive definitions of Even and Odd.
let x: Even + Odd;
No single number is both Even and Odd. However, to prove this we need to do some algebra with the definition of Even and Odd. The contradiction here, or in an even more contrived contradiction, is not trivial.
So what do we do?
LSTS issues Type Errors when it sees a problem. It may need some help in seeing the contradiction. This is what extended proofs are for. There is no end-all for squashing contradictions here.
Kinded Proofs
Kinds allow the program to setup a "cleanroom" environment for the type system. The canonical example of polykinding is for the Unit kind.
The International System of Units defines a unit as a Number combined with a Unit. In LSTS this can be modelled as the number being a Term and the unit being a Unit. The SI system additionally defines several units as "base units". All units that are not base units can be written as a compound of base units. Examples of base units are Seconds or Metres. Examples of non-base units are Joules or Teslas.
In LSTS it is convenient to mark base units as normal. A normal Type can be preferred during type conversion. For example, if a unit of Speed is multiplied by a unit of Time, it will result in a unit of Distance. When we look closer at these units we see that Speed is a unit of Distance over Time. It is only within this normal form that we can discover that Speed * Time = Distance.
Kinded Projection
If you want to work within a Kind and only within that Kind then you should use Kind Projection. A compound Type x: Real + Second can be narrowed into a Unit with double colon syntax: x :: Unit. This will leave the term with only the type Second, stripping the variable of its other associated Types.
Benchmark Statistics
Here we benchmark the compiler on different stress tests.
benchmark | time |
---|---|
big file 10k loc | 22.937 ms |
Here we benchmark the default evaluation strategy (L1) on different stress tests.
benchmark | language | time |
---|---|---|
1M 2^20 | Rust | 4.8860006 |
1M 2^20 | L1 | 3.8207040 |
1M 2^20 | Ocaml | 2.5900000 |
note: Rust is compiled without optimizations enabled because it constant folds the entire benchmark
Contributors
Here is a list of the contributors who have helped improving LSTS.
- Andrew Johnson andrew-johnson-4
- Diego Melo POWRFULCOW89
- XChy XChy
If you feel you're missing from this list, feel free to add yourself in a PR.