Source :

A. Jesse Jiryu Davis

Current and Future Tools for Interactive TLA+

This is a summary of a talk I gave at the 2021 TLA+ conference with my MongoDB colleague Samyukta Lanka.

Update: See this Twitter thread for important corrections. Many of the features we asked for are already possible, but obscure.

Table Of Contents

Precise vs. holistic

In our experience there are two ways to understand a system: precisely or holistically. (This isn’t a super well-developed philosophy, just our observation.) The precise way of understanding involves well-defined questions like, “Does the system obey a particular invariant or property?” There are lots of powerful tools for this; this is what TLA+ and TLC are for.

The holistic way of understanding involves vague questions like, “Does the system generally conform to my theory of the system?” Holistic understanding usually requires interaction or visualization. Samy and I are programmers, and programmers use tools when we want to understand code: We can interact with a program by trying some inputs and watching the program run. We observe it with debugging, logging, and profiling. We can use visualization tools like flame charts and call graphs.

In contrast, TLA+ feels like math to us. TLA+ writers spend less time trying things out, and more time reasoning, which is slower and prone to different sorts of mistakes. So, our proposal in this talk is to make TLA+ more like programming: more interactive and visual. We’ll review the existing tools and suggest new ones.

Questions about your spec

Spec authors ask different questions at different times, and different tools are best for answering different questions. For example:

  • Does my spec imply my invariants / properties?
  • Why is my invariant / property false?
  • What does this TLA+ expression mean?
  • Is the spec generally behaving as intended?
  • Why isn’t my action enabled?
  • How does this edit affect the spec’s behavior?
  • How do I use TLA+ to communicate behaviors to other people?

The first question, “Does my spec imply my invariants / properties”, is the most thoroughly supported. It’s the main purpose of model-checking and proofs, and it’s where most of the research seems to go. TLC and TLAPS are production-quality, and there are new tools like Apalache.

As we go down the list there are vaguer questions like “Is the spec generally behaving as intended?” or “How do I use TLA+ to communicate behaviors to other people?” These are decreasingly well-supported. The tools are only prototypes or don’t exist at all.

Why is my invariant or property false?

You ask this question when you have a wrong hypothesis about your spec: you either didn’t write the spec you thought you did, or the spec doesn’t behave as you think. What precisely is the mismatch between your hypothesis and reality? How can the tools help you find it? Specifying Systems § 14.5.2 “Debugging a Specification” is a particularly good section about this, so we won’t repeat it, but we’ll try to add a bit.

Error traces are the main technique for answering this question. You can view the plain-text trace in a .out file, or a GUI trace in the TLA+ Toolbox or VS Code.

Samy’s and my colleague at MongoDB, Siyuan Zhou, made an alternative called tla-trace-formatter:

This is a nice HTML version of a TLA+ trace. It’s obviously easier to read than the plain-text output from TLC. And unlike the Toolbox or VS Code GUIs, you can publish this on the web and share it with colleagues to communicate about spec behaviors. We think that “export to HTML” would be a nice feature for the Toolbox.

What does this TLA+ expression mean?

If you want to experiment and see what some constant TLA+ expression evaluates to, the new REPL is useful. Our former colleague Will Schultz contributed it last year (before he left MongoDB for grad school). Samy and I think it’s especially helpful for beginners, but it’ll be occasionally valuable for any TLA+ user. Download a recent TLC build (the latest release is December 2020 and doesn’t have it) and invoke the REPL thus:

$ java -cp tla2tools.jar tlc2.REPL
Enter a constant-level TLA+ expression.

Let’s say you wanted to understand Bags better. (The REPL preloads some standard modules like Bags and Sequences.)

(tla+) SetToBag({"a", "b"})
[a |-> 1, b |-> 1]
(tla+) SetToBag({1, 2})
<<1, 1>>

When we were playing with this, we were surprised to see that SetToBag({1, 2}) is a tuple. We had to think about it for a second, and it felt like a discovery, something we didn’t know we didn’t know. That’s the great advantage of interacting with a system instead of just reasoning about it.

The REPL is a prototype. It doesn’t have syntax highlighting. We hear that it can extend and instantiate user-written modules, but it’s not well-supported and it crashed when we tried it. If it were more well-developed the REPL would make TLA+ much more interactive.

Is the spec behaving as intended?

Samy and I have this vague question when we start writing specs. It’s not a precise assertion, it’s a general lack of confidence that we’ve specified what we think we’ve specified, without mistakes. If it were a program we’d add print statements or breakpoints, give it some input and see what it does. We don’t have a specific hypothesis like “the program obeys this invariant or that temporal property”, we just watch it run and see if anything looks wrong to us. But print output is confusing in TLA+ model-checking.

Let’s take the HourClock module from Specifying Systems and add a print expression to see how the “hour” variable changes.

HCini == hr \in (1 .. 12)
HCnxt ==
    /\ hr' = IF hr # 12 THEN hr + 1 ELSE 1
    /\ PrintT(<<"hr is ", hr, "hr' is", hr'>>)
HC == HCini /\ [][HCnxt]_hr

Here’s the output:

<<"hr is ", 4, "hr' is", 5>>
<<"hr is ", 3, "hr' is", 4>>
<<"hr is ", 2, "hr' is", 3>>
<<"hr is ", 5, "hr' is", 6>>

We expected hr to begin at some value and increase sequentially like “4 5 6”, but instead we get a random-looking order of states. The model-checker is doing a parallelized breadth-first search of states, not a straightforward sequence of steps like a program would, so we don’t see one behavior in order.

TLC’s simulation mode makes print more useful. Enable it in the Toolbox’s “TLC Options” pane and set max traces to 1. Oddly, you also have to set “Number of worker threads” to 1. Instead of this UI, we think a Toolbox button that says “show me one random behavior” would be a useful feature for spec developers to get quick feedback. Anyway, once simulation mode is configured you get what you want:

<<"hr is ", 4, "hr' is", 5>>
<<"hr is ", 5, "hr' is", 6>>
<<"hr is ", 6, "hr' is", 7>>
<<"hr is ", 7, "hr' is", 8>>

Each time you run TLC, it gives you one example behavior. The hour clock starts at an arbitrary value, in this case 4, and then you see consecutive states of one behavior, so now print is actually useful. This is a great way to see if the spec is generally behaving according to your theory of the system, without asking any precise questions.

But, a random sample of traces may not cover the part of the spec you’re most worried about. You can home in on worrisome behaviors by overriding the initial predicate:

Either hand-write it, or if you have an error trace, the Toolbox has a feature to copy any state of the trace into the initial predicate.

Visualizing the state graph

Besides print expressions, another tool that’s tempting for beginners is visualizing the state graph with Graphviz. The Hour Clock’s graph looks like this:

We’d prefer a nice circle of 12 states. (It would look like a clock!) Tweaking the TLC output and using neato produces:

Let’s see if the state graph is useful for answering, “Is the spec behaving as intended?” Imagine we incorrectly add an AM/PM indicator to the clock. It starts with am = TRUE, and it flips at noon and midnight.

\* Incorrectly add am/pm to HourClock
EXTENDS Naturals
HCini == hr \in (1 .. 12) /\ am = TRUE
HCnxt == 
    /\ hr' = IF hr # 12 THEN hr + 1 ELSE 1
    \* Oops, AM/PM should flip at noon/midnight, not 1 o'clock.
    /\ am' = IF hr = 12 THEN ~am ELSE am
HC == HCini /\ [][HCnxt]_<<hr, am>>

But we have a bug. AM actually flips when the hour goes from 12 to 1, not when it goes from 11 to 12 as intended. If we were debugging a program we might step through it and see our mistake. With TLA+, will the mistake be obvious from the state graph?