11.9.22

Cyber Ontology Stamp Collection

Just a short and shallow note about knowledge representation, specifically in the domain of cybersecurity. As usual, I am the sole owner of my opinions.

Ontology and Knowledge Graphs

Ontology is the word you use when you get tired of saying "machine-readable representation of concepts" too many ways. The machine-readable part is important: over the decades, everyone working in large organizations runs into the need to model their domain in software, but software devs need crisp, stable definitions to work with. The people working in the domain (the domain experts) do not have these ready, and sometimes keep their definitions deliberately flexible. So defining an ontology is not a philosophical exercise in the nature of things; it is tied to the goal of helping the domain experts do their work.

An ontology is like a controlled vocabulary. As software devs know, there is rarely a universally accepted standard machine-readable representation: everyone defines their own databases, schema definitions etc. Therefore, ontology enters the picture when there are multiple (typically large) independent organizations and we are interested in exchanging information. Unlike XML or other file formats, an ontology defines classes of objects (entities) and binary relations between them. This comes from description logic as well as the semantic web / RDF, which I won't go into.

The essential thing is that in the description logic / semantic web / etc. approach, knowledge representation can be separated into terminology definitions (TBox) and assertions/statements (ABox). The terminology definitions are similar to a schema or object model and define what kind of objects and relations we can talk about. For instance, we may want to talk about blog, author, post and connections such as isOwner, wrote, linksTo. The assertion "Author Burak owns Blog Uncut Blocks of Wood" can then use these definitions.
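To make the TBox/ABox split concrete, here is a toy sketch in Python using subject-predicate-object triples. The class and relation names mirror the blog example above; the encoding is ad hoc, not RDF or OWL.

```python
# TBox: the terminology - which classes and relations exist.
TBOX = {
    ("Author", "isA", "Class"),
    ("Blog", "isA", "Class"),
    ("Post", "isA", "Class"),
    ("isOwner", "relates", ("Author", "Blog")),
    ("wrote", "relates", ("Author", "Post")),
    ("linksTo", "relates", ("Post", "Post")),
}

# ABox: the assertions that use the terminology.
ABOX = {
    ("Burak", "instanceOf", "Author"),
    ("Uncut Blocks of Wood", "instanceOf", "Blog"),
    ("Burak", "isOwner", "Uncut Blocks of Wood"),
}

def check(abox, tbox):
    """Very rough sanity check: every relation used in the ABox is declared in the TBox."""
    declared = {s for (s, p, o) in tbox if p == "relates"}
    for (s, p, o) in abox:
        if p != "instanceOf" and p not in declared:
            raise ValueError(f"relation {p} not declared in the TBox")

check(ABOX, TBOX)
```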

Of course none of this works really well in practice. Barry Smith, an expert on ontology, says as much in the first 10 minutes of his entertaining talk on Basic Formal Ontology (BFO): "people go and create thousands of ontologies and reproduce the silos that ontologies were supposed to overcome". This is reality.

Cybersecurity: STIX and ATT&CK

Let's shift gears and pick an interesting domain now, one where we really all are interested in progress. Would it not be nice if the various words that people in charge of cybersecurity are throwing around were organized in a consistent manner, and actionable, useful information ("intelligence") shared between organizations? Would this not let organizations exchange information, and rapidly so, and realize the dream of having good actors build a united front against the bad guys?

Let's not get ahead of ourselves. If you listened to Barry Smith, using ontologies is not the silver bullet. It is nevertheless interesting to see what organizations are throwing out there. And if you are here for the knowledge representation, you can still see cybersecurity as an interesting case study of how these things can play out in practice.

When we talk about a controlled vocabulary, we are talking about a form of standardization. Richard Struse (Homeland Security, MITRE, OASIS, ...) saw the need for and opportunity of defining a controlled vocabulary that goes by the name Structured Threat Information Expression (STIX™). The MITRE ATT&CK "framework" is a collection of objects describing attacker behavior.

STIX defines the terminology, the objects of interest.

For the technically minded, there are JSON schemas for the domain objects and likewise for the relations, and it is a good exercise to build code that reads these things. There is also a protocol called TAXII for realizing the information exchange.

STIX Objects

We have 19 domain objects, and relations between them. I will not list them all, but give an example of the "targets" relation. Consider the statement: the attack-pattern 'SQL Injection' targets the identity 'domain administrators'.
  • We have objects of type attack pattern
  • We have objects of type identity (more on this later)
  • We can have a relation (directed edge) of type targets between an attack-pattern object and an identity object.

You get the idea. There are various relationships, and in the end we can imagine a graph with objects being the nodes and the relations being the edges.
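To make this concrete, here is a minimal sketch of what such objects could look like as JSON. The field names (type, spec_version, id, name, relationship_type, source_ref, target_ref) follow the STIX 2.1 JSON conventions as far as I know; the identifiers are freshly generated and the snippet is illustrative, not a validated STIX bundle.

```python
import json
import uuid

def stix_id(obj_type: str) -> str:
    # STIX identifiers look like "<type>--<UUID>"; these UUIDs are random, not canonical.
    return f"{obj_type}--{uuid.uuid4()}"

attack_pattern = {
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": stix_id("attack-pattern"),
    "name": "SQL Injection",
}

identity = {
    "type": "identity",
    "spec_version": "2.1",
    "id": stix_id("identity"),
    "name": "domain administrators",
    "identity_class": "group",
}

# The statement "attack-pattern 'SQL Injection' targets identity 'domain administrators'"
# becomes a relationship object: a directed, typed edge between the two nodes.
relationship = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": stix_id("relationship"),
    "relationship_type": "targets",
    "source_ref": attack_pattern["id"],
    "target_ref": identity["id"],
}

print(json.dumps([attack_pattern, identity, relationship], indent=2))
```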

MITRE ATT&CK

As mentioned: MITRE ATT&CK is a knowledge base that makes use of the STIX vocabulary. It is a collection of statements that uses the terminology. It is called a knowledge base because, unlike in a relational database, the knowledge and added value come from the connections.

It seems that the cybersecurity industry picked up MITRE ATT&CK as a buzzword to signal features/capability, something to enhance the credibility of cyber services and products. It is very interesting that the vocabulary definitions - the ontology - did not generate as much buzz. To some extent, this is expected: the domain model is something left to domain experts, and the service industry sells the expertise and need not burden the customers with "the details". If you want to get into cybersecurity, you have the option of learning from the ATT&CK "framework", but also of picking up the lingo from the underlying interchange format. It is not meant as a learning resource for beginners, but the open source style availability of all this is clearly a good thing.

Socio-technical limits of knowledge exchange

This approach to knowledge graphs is entirely standard, but an obvious thing to point out is that it is very limiting that relations are binary. Since this is standard in these approaches to ontologies, let's not get hung up on it.

We have a shared/standardized vocabulary, what is not to like? Why have we not crushed the bad guys yet? There are some practical obstacles to "information sharing" in general, as Phil Venables lays out in a recent post, Crucial Questions from Governments and Regulators. There are clearly more obstacles, such as trust, but no need to go further.

Let's instead narrow this down to organizations that are capable of building their own software and are committed to keeping their own threat intel. We get to more specific socio-technical obstacles. This shines new light on the silo problem already mentioned above.

Challenging the vocabulary. Take the identity object type. The docs say "Identities can represent actual individuals, organizations, or groups (e.g., ACME, Inc.) as well as classes of individuals, organizations, systems or groups (e.g., the finance sector)." What if I want to organize my threat intelligence in a way that does not lump all of these together?

Reasonable arguments can be made to defend this choice: that a more precise ontology that distinguishes between groups of people or roles is beyond the scope of STIX; that this is for exchange in the end, and therefore some kind of mapping is to be expected. However, it looks like there are some real limits that are not easily overcome by any machine-readable representation.

Are the statements useful? In the end, all this information is for people. People who may work with different capabilities, different models, different ideas on granularity and naming. What if the organization supposed to consume this information cannot do anything with the attack pattern "SQL Injection"? Or they very much could, but choose not to represent SQL Injection as its own thing, because it is useless for their purposes? Every organization is subject to its own boundaries. A list of attack patterns is not by itself a systematic approach: it is literally just a list of stuff.

What are the processes that will make use of this information? In any case, it seems that adaptation, mapping and possibly human curation are necessary if one wants the exchanged information to have practical benefits.

Detection

There are other types of information that can be shared, for example open source collections of detection rules.

At least here, it is very clear what the process is: enhance security monitoring so that it covers the relevant (!) parts of these open source repositories. Once again: not everything in there might be relevant. The same curation/adaptation may be necessary, and maybe mapping between the log formats and the internal representation so these rules make sense. However, due to the narrower scope of detection, it is immediately clear what one would use these things for.

Wrap up. Socrates strikes.

I promised a shallow post, and here we are. Knowledge representation is an exciting technical challenge, one that is relevant to many industries and is being picked up in many large organizations (e.g. the use of ontologies in life sciences, genetics, or the military). Alas, we can also see how using an ontology as a domain model comes with limitations: relations are binary, and representing knowledge has to be tied to some actual human process in order to be useful.

I will end with this Socrates quote (source Socrates on the forgetfulness that comes with writing):

Well, then, those who think they can leave written instructions for an art, as well as those who accept them, thinking that writing can yield results that are clear or certain, must be quite naive and truly ignorant of [Thamos’] prophetic judgment: otherwise, how could they possibly think that words that have been written down can do more than remind those who already know what the writing is about?

I can get excited about information sharing when it comes to education, "speaking the same language." When it comes to practical cybersecurity processes, the usefulness of machine-readable knowledge seems to stand and fall with the tools and how they support the people who need to make use of this information. I have doubts that ontology can serve as the defining elements of a community or movement of the good guys. However, it also seems better than nothing in a field like cybersecurity that needs to deal with a fast-changing environment of evolving technology and adversaries. Exchanging threat intelligence with STIX may help those who "already know what the writing is about," and after due mapping, adaptation, curation to their own model, serve as one step towards coordinated response.

7.6.22

There is only stream processing.

This post is about what to call stream processing. Strangely enough, it has little to do with terminology, but almost everything to do with perspective.

If you are pressed for time (ha!), the conclusion is that "stream processing vs batch processing" is a false dichotomy. Batch processing is merely an implementation of stream processing. Every time people use these as opposites, they are making the wrong abstraction.

Can a reasonable definition be general?

A reasonable definition of data stream is a sequence of data chunks, where the length of the sequence is either "large" or unbounded.

This is a pretty general definition, esp if we leave open what "large" means.

  • Anything transmitted over TCP, say a file or HTTP response
  • a Twitter feed
  • system or application logs, say metadata from any user's visit to this blog
  • the URLs of websites you visit
  • location data acquired from your mobile phone
  • personal details of children who start school every year

It also stands to reason that stream processing is processing of a data stream. A data stream is data that is separated in time.

When a definition is that general, engineers get suspicious. Should it not make a difference whether a chunk of data arrives every year or every millisecond? Every conversation about stream processing happens in a context where we know a lot more about the temporal nature of the sequence.

The same engineers, when dealing with distributed systems, have no problem referring to large data sets as a whole, even though they know full well that the data may be distributed over many machines. We may well call a large data set data that is separated in space. A data stream could also be both, separated in time and space (maybe we could call that a "distributed data stream").

At the warehouse

Where am I going with this? Let's quickly establish that "batch processing" is but a form of stream processing, in the above definition.

What do people call batch processing? The etymology of this goes back to punch cards, but it is not about those.

Batch processing is frequently found in data warehousing, or the extract-transform-load (ETL) process. This is a setup that has been around since the 1970s, and that does not make it bad. What is essential is that data is periodically ingested (extract), say in the form of large files. It is then turned (transform) into a uniform representation that is suitable for various kinds of querying (load).

Accumulating data and processing it periodically - we have seen this before. Does this fit the general definition of a data stream? Surely, since there is a notion of new data coming in.
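A toy sketch of the claim, with an invented record format: the same processing code works whether the chunks are a handful of nightly extract files or an unbounded live feed.

```python
from typing import Iterable, Iterator

Record = dict

def transform(record: Record) -> Record:
    # The "T" of ETL: normalize into the uniform representation used for querying.
    return {**record, "amount_chf": round(record["amount"] * record["fx_rate"], 2)}

def process(chunks: Iterable[list]) -> Iterator[Record]:
    # One definition of the processing; it never needs to know whether the sequence
    # of chunks is a few nightly files or an unbounded feed.
    for chunk in chunks:
        for record in chunk:
            yield transform(record)

# "Batch" use: a bounded sequence of large chunks (say, one file per night).
nightly_files = [
    [{"amount": 10.0, "fx_rate": 0.9}, {"amount": 5.0, "fx_rate": 1.1}],
    [{"amount": 7.5, "fx_rate": 0.95}],
]
print(list(process(nightly_files)))

# "Streaming" use: exactly the same code over an unbounded generator of chunks.
def live_feed() -> Iterator[list]:
    while True:
        yield [{"amount": 1.0, "fx_rate": 1.0}]  # pretend this blocks until new data arrives
```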

What could be alternative terminology? Tyler Akidau uses the word "batch engine" in this post on stream processing. So the good old periodic processing could be called "stream processing done with a batch engine."

I promised that this is not only about terminology. When did people first feel the need to distinguish batch processing from stream processing?

Data stream management systems

The people who systematically needed to distinguish processing of data streams from processing of simply large data sets were the database community. Dear academics, I don't mean to hurt any feelings, but I will just count all papers on "data stream processing" or "complex event processing" as database community.

The database researchers implemented efficient evaluation of a (reasonably) standard, high-level, declarative language: relational queries (SQL). Efficient evaluation and performance meant making the best use of available, bounded resources. As part of this journey, architectures appeared that already pass streams of partial results between operators (such as the Volcano, or iterator, model).

Take a look at the relational query (SELECT becomes $\pi$, WHERE becomes $\sigma$, JOIN becomes $\Join$). What would happen if we flipped the direction of the arrows? Instead of bounded data like the "Employee" and "Building" tables, we could have continuous streams that go through the machinery.

Exercise: draw the diagram above, with all arrows reversed, and think about how this could be a useful stream processing system. Maybe there is a setup procedure that has to happen in each building before an engineer starts working from there and they would like to know about arrivals and new departures.
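Here is a rough sketch of the reversed-arrow reading: instead of operators pulling tuples from tables, each operator pushes the tuples it receives downstream. The row shapes and the building/engineer flavor are invented for the exercise.

```python
from typing import Callable, Iterable

Row = dict
Sink = Callable[[Row], None]

def select(pred: Callable[[Row], bool], out: Sink) -> Sink:
    # sigma: forward only the rows that satisfy the predicate.
    return lambda row: out(row) if pred(row) else None

def project(cols: list, out: Sink) -> Sink:
    # pi: keep only the named columns.
    return lambda row: out({c: row[c] for c in cols})

def push_all(rows: Iterable[Row], into: Sink) -> None:
    for row in rows:
        into(row)

results: list = []
pipeline = select(lambda r: r["building"] == "B1",
                  project(["name"], results.append))

# Arrivals can trickle in one by one instead of being read from a stored table.
push_all([{"name": "Ada", "building": "B1"},
          {"name": "Bob", "building": "B2"}], pipeline)
print(results)   # [{'name': 'Ada'}]
```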

Data stream management systems (DSMS) became a thing 20 years ago when implementors realized that most of their stuff would continue to work when tuples come in continuously, mutatis mutandis.

  • Michael Stonebraker, of PostgreSQL fame, built a system called Aurora and founded a company StreamBase systems
  • At Stanford they wrote about the Stanford Stream Data Management System and Continuous Query Language (CQL)
  • (I'm not going to do a survey here)

Academic authors will delineate their field using descriptions as follows.

  • "high" rate of updates
  • requirement to provide results in "real time" (bounded time)
  • limited resources (e.g. memory)

We are now getting closer to the more specific meaning people attach to "stream processing:" not only do we receive the data in chunks at different times, we also need to produce results in a "short" amount of time, or with "bounded" machine resources.

In order to understand why database folks went streaming, one should know that evaluating queries in a DBMS with high performance is also a form of stream processing. When a user types a SQL query at the prompt (SELECT ... FROM ..., the all-caps giving that distinctive pre-history feel), the result should come in as quickly as possible, and there are surely some limits on the machine resources. So generalizing query evaluation to continuously arriving data really was a logical next step to overcome limitations.

Interlude: Windowing and micro-batching

If you did the exercise, you may wonder how a join is supposed to work when we take a streaming perspective. There are multiple answers.

A particularly nice and useful scenario is when the join can be considered a simple lookup, for example when each row in a data stream is enriched with something that is looked up via a service.

Another scenario is the windowed stream join: here we consider bounded segments (windows) of two data streams and join what we find in those. Usually, this requires some guarantees: the streams are a priori not synchronous. They may either be synchronous enough, or one may use some amount of buffering and periodically process what is in the buffer.

Wait - did I just write "periodic processing"? That is right; it looks like when stream processing is hardcore enough, it contains periodic processing again. This is where people will usually say things like "micro-batch." There are simply scenarios in stream processing that cannot be done without periodic processing (everything that involves windows in processing time).
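A minimal sketch of such a processing-time windowed join: buffer both streams and periodically join whatever arrived in the window (micro-batching). The event shapes and the join key are made up; a real system would also have to deal with out-of-order data, watermarks and so on.

```python
from collections import defaultdict

class WindowedJoin:
    """Buffer two streams and join the buffered contents periodically (micro-batch)."""

    def __init__(self):
        self.left = defaultdict(list)   # key -> rows seen in the current window
        self.right = defaultdict(list)

    def on_left(self, key, value):
        self.left[key].append(value)

    def on_right(self, key, value):
        self.right[key].append(value)

    def flush(self):
        # Called once per window (e.g. by a timer): join and clear the buffers.
        out = [(k, l, r)
               for k in self.left.keys() & self.right.keys()
               for l in self.left[k] for r in self.right[k]]
        self.left.clear()
        self.right.clear()
        return out

j = WindowedJoin()
j.on_left("user-1", {"click": "/home"})
j.on_right("user-1", {"purchase": 42})
print(j.flush())   # [('user-1', {'click': '/home'}, {'purchase': 42})]
```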

A horizontal argument

Now, the words "limited machine resources" meant something different 20 years ago. We can (and do) build systems that involve many machines and communication, sharding or "horizontal scaling". From the good old MapReduce to NoSQL and Craig Chambers' FlumeJava (which lives happily on as Apache Beam) to Spark and Cloud Dataflow, there is a series of systems and APIs that deal with stream processing in a distributed manner.

Tying the knot

The irony of talking about "batch processing" today is that periodic processing of large files also involves distributed stream processing underneath. When a large input data set is processed with a MapReduce, it is distributed across a set of workers that each produce partial results locally in the map phase. The shuffle phase then takes care of getting all partial results to the right place for the reduce phase. The "separation in space" is also a "separation in time": a particular partition can be "done" before others, which means that the result is itself a stream.

Depending on the level at which one is discussing a system, the performance expectations we have, and the trade-offs, one may be able to ignore spatial and temporal separation. It seems that recognizing the temporal separation always brings advantages.

8.4.22

Polaroids from Swiss Cyber Security Days SCSD Day 1

Here are a few things I took home from Day 1 of the Swiss Cyber Security Days conference (SCSD), which was two days ago. I work as a software engineer and am used to dealing with technical angles, but cyber security is one of the areas where technological change impacts society at large. I made the decision to go shortly after the Russia-Ukraine war broke out; if cyber security was a pressing problem before, the new political era should make it impossible for society to ignore, and I wanted to find out first-hand what that means.

(Note: the title is a kind of metaphor that is resolved at the end. The public domain images I include here are random things from the interwebs, not actually from the conference.)

Switzerland Officials and National Councillors

Day 1 was opened according to official protocol by Doris Fiala, member of the Swiss National Council, member of the parliamentary Security Policy Committee, and president of SCSD. She greeted many of the present officials and parliamentary members. The morning featured Swiss officials, including high-ranking ones who are directly responsible for dealing with cyber security.

The speakers provided perspectives on Swiss national policy and the role of the federal government and the armed forces. The officials spoke in the national languages German and French, often switching from one to the other in the same talk, and there was simultaneous translation (like in UN conferences). A few highlights:

Florian Schütz has been acting as the Swiss federal Cyber Security Delegate since 2019 and reported on strategy and outlook. He talked about the activities of the National Cyber Security Center NCSC, and how it relates to the financial sector security center which was founded this week in Zurich. Cyber security means something different in every sector, and a federal center would be limited in depth; this can be addressed by creating and supporting sector-specific cyber security centers where decision makers from companies in that sector can participate directly. He also pointed out that most companies in Switzerland are SMBs, and even if many are aware of the problem now, they do not know what concretely to do about the risk.

The majority of Swiss companies have less than 0.5M CHF revenue per year, of which they might be able to set aside 3-5% for cyber security. This is not a lot! Getting companies to take the risk seriously and improve their security posture through awareness and training was picked up by other speakers.

The challenge for official institutions is that they have in general not been made aware of cyber security incidents, since companies are not incentivized to report breaches. There is a current proposal to make reporting mandatory, which would help improve visibility.

Major General Alain Vuitel is director of the Project Cyber Command of the Swiss Armed Forces, and talked about the process of establishing the Cyber Command. This is a significant step, since the army is structured into four existing commands and the Cyber Command will become the fifth one. Vuitel pointed out the importance of information in warfare and that this is a factor in Ukraine, pointing to the impact on civil society and the collateral damage from attacks on communication infrastructure like satellites, as well as disinformation campaigns.

The cyber attack on the KA-SAT satellite communications network on Feb 24th, the day Russia attacked Ukraine, led to loss of maintenance capabilities for a German operator of wind turbines. Also, in the weeks leading up to the outbreak of war, there were cyber attacks on Ukrainian infrastructure. War is no longer only armed conflict; covert operations and information war mean that in times of "peace", there are attacks going on, and this accepted fact was observed in real time. Vuitel also pointed out that modern communication brings a new potential: every citizen with a mobile phone can act as a sensor, providing information, provided they have the freedom to act, which underlines the importance of communication infrastructure.


Finally, he pointed out that while recruitment is challenging, Switzerland has a system of mandatory military service and a militia system. This means the armed forces are able to reach large parts of the population and provide them with cyber security training, and it provided them with the means to recruit their first cyber battalion.

Judith Bellaiche, National Councillor and CEO of SWICO, described the evolution of cyber security as reflected in parliamentary discussions over the last 15 years. Political discourse in the Swiss parliament reflects society, and the topic of cyber security evolved from early years of becoming aware, demanding reports and making sense of it all, to increasing political demands for new official competencies and laws. Most impressively, she summarized this as raw fear: the parliamentary representatives feel that the situation is out of control and demand action, with proposals to expand the role of the state. To validate this impression, representative surveys were conducted, and it turned out that the general population is even more afraid and leans towards expanding the role of the state. This is a political development which will likely have consequences. It would be a major paradigm change if the role of the state were expanded to protect private companies from cyber attacks; however, it does not look like the private sector can fix this by itself. The debate on expanding the technical and organizational competencies of the state is certain to affect Swiss society.

Practitioners and the Private Sector: Ransomware

In a podium discussion organized by the World Economic Forum on "What's next for multistakeholder action against cybercrime," it was articulated that ransomware is the biggest topic in cyber security. Countless companies are victims of ransomware attacks, and the conference featured experience reports from incident responders. Jacky Fox, managing director Accenture Security, told a war story of Irish hospitals losing their entire infrastructure: doctors having to resort to pen and paper, unable to operate diagnostic machines, unable to effectively treat patients, and incident responders having to face decisions whose impact is a matter of life and death.

keyboard with lock - to stand for ransomware

Serge Droz, seasoned incident responder and director of FIRST (Forum of Incident Response and Security Teams), pointed out that case data is important. Investigating a single case of ransomware is impossible, while data from 20-30 cases provides a basis for investigation, as one can recognize patterns. He also pointed out the difference in roles within a multistakeholder action: incident responders have as first priority the return to normal conditions, while prosecution's first priority is to identify the attacker and do forensics.

Maya Bundt, Cyber Practice Leader at Swiss Re and representing insurers, pointed out how the idea of data sharing is much debated but rarely gets concrete. There are many different kinds of "data" involved in ransomware attacks, from technical security-relevant data like indicators of compromise to case-specific data (how did something happen) or data specific to the victim company, and in the debate this is often all mixed up. For insurers, what matters is cost, and the ransom itself is often only a small fraction of the cost to a company, which needs to deal with data loss, getting back to business and forensics. It was pointed out that the role of the insurance company is to make the risk transferable, while the risk associated with a cyber security attack is not completely transferable. Insurers do play an important role since they can demand that companies implement practices and improve their posture, and they have an interest in doing so, because if too many clients are attacked, insurance gets so expensive that nobody will be able to afford it anymore.

The White House: Chris Inglis on getting left of the attack

Chris Inglis currently serves as the first US National Cyber Director and advisor to the President of the United States Joe Biden on cybersecurity, and of course the US perspective was a highlight. His talk was a balance of rallying support and putting "cyberspace" into perspective with a conceptual framework. Cyberspace is a permanent reality that affects everyone's life. It affects whether electricity is available, whether public transportation works and whether your local store is able to operate. Consequently, the defense of cyberspace is not about the defense of technical "stuff", but about the critical functions that are served by the stuff. From this, he derives three factors that cyber defense is about:
  • the stuff: already since the 60s-70s, we know about notions and challenges of technical quality
  • the people: not only the ones who are adjacent to stuff, but who work for and benefit from the critical functions that affect everyone's lives
  • the doctrine, or "roles and responsibilities" that we assign in society. He mentions the supply-chain attack SolarWinds and how many actors were aware but thought someone else was responsible, and that adversaries are actively looking out for weak doctrine.

Defense is fragmented, because cyberspace is not perceived as a universal, ambient sphere of society but as disconnected patches, which means adversaries can deal with a target one at a time but are not countered with a coordinated response.

He advocates for resilience by design, arguing that there are too many variables for anything to be called "secure." Actually defend, not the "stuff" but the critical functions; ensure that anomalies can be detected as early as possible and countered with maximal leverage, away from current practices that assume an ineffective and meaningless division of labor. He points out that cyberspace is a shared resource, and this type of defense has been done successfully many times before: in the automobile industry, in the aviation industry, for food and drugs. When asked about the role of regulation, he answered that first comes a general understanding of what constitutes common practice, tailored to the sector of interest, and only then might there be regulation as a last resort, preferably coordinated and not in 50 overlapping variants.

Wrapping up

There was a lot more to the conference, but I will end here. It is clear that society at large, including Switzerland, is definitely aware and things will happen. I noticed people mentioning "supply chain" and also SBOM and the recent NIST standard, how to improve the security posture in healthcare, which depends on industry as well as certifications (security patches can apparently break certification), and much more. I picked "polaroid" as a title because, just like the good old notions, practices and challenges around software quality that in Chris Inglis's words have accompanied the software industry since the 60s, polaroid (instant pictures) is something that comes, goes bankrupt, and comes back, since we cannot really do without it.

I share the optimism of some of the speakers like Serge Droz who say that after many years of helplessness, due to raised awareness and a more coordinated action, we as society should be able to tackle the ransomware phenomenon. Beyond technical questions, establishing practices and security standards will require everyone, society at large to take part in the effort.

29.3.22

The ultimate cross language guide to modules and packages

| Intent \ Language | go      | java    | rust            | npm     | racket     |
|-------------------|---------|---------|-----------------|---------|------------|
| source file       | -       | -       | -               | -       | module     |
| logical grouping  | package | package | module          | module  | collection |
| unit of release   | module  | module  | package (crate) | package | package    |

racket is special in that a racket package (*) may happily mess with multiple collections.

11.12.21

Worked example of automata runs as sheaves

This post is about "sheaf semantics," a way to model, understand and talk about dynamic phenomena. The situation is that you have some representation of such phenomena - possibly partial - you would like to assign meaning to it, and you are looking for a mathematical/logical model for that purpose. Here, we have such a possible mathematical model. This is a semantic technique - we do not consider the representation you may have on your hands, the data, evidence or proof, only the model.

As we will see, sheaves can help describe observations of dynamic phenomena in a precise way — even if these observations are partial, ambiguous, or admit multiple interpretations. The definition of a sheaf is technical and presupposes familiarity with a bunch of abstract math. Many people who need to describe "observations of dynamic phenomena" as part of their day job will not find it very accessible.

I am going to use automata theory as an example, since I assume it is understood by a lot more people (informaticians, programmers) than sheaves. There are textbooks on regular expressions and automata, but I am quite sure there are no textbooks that would connect them to sheaves. I am not advocating that automata theory be taught this way, but believe this might be a good way to introduce a method and vocabulary that one may find helpful to describe (fragments of) reality. For anyone who shares the motivation, but finds this example simplistic, Joseph A. Goguen's paper "Sheaf semantics for concurrent interacting objects" may be a good thing to read next. $\newcommand{\redq}{\color{red}{q}}$ $\newcommand{\redQ}{\color{red}{Q}}$

Warm-up: Deterministic Automata

This is (roughly) how I learnt about deterministic automata (state machines):

  • we have a set $\Sigma$ of input symbols $a, b, \ldots$ called alphabet. $\Sigma^*$ is the set of sequences $w$ of such letters (or, if you prefer, the free monoid on generators $\Sigma$), with $\epsilon$ the empty word.
  • $\redQ$ a set of states (assumed finite),
  • a transition function $\delta: \redQ \times \Sigma \rightarrow \redQ$.

We extend the transition function $\delta: \redQ \times \Sigma \rightarrow \redQ$ to a transition function $\delta^*: \redQ \times \Sigma^* \rightarrow \redQ$ as usual, meaning: $$\begin{array}{ll} \delta^*(q, \epsilon) &= q \\ \delta^*(q, aw) &= \delta^*(\delta(q, a), w) \\ \end{array} $$

This extended transition function tells us where we end up if we start in a state $\redq$ and feed it a word.

Finite state machine example with comments:  two states for door open, door closed
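A small sketch of the transition function and its extension $\delta^*$, using the two-state door automaton from the figure; the state and input symbol names are my own choice for the example.

```python
# Door automaton: states "open"/"closed", input symbols "push"/"pull" (names invented).
delta = {
    ("closed", "pull"): "open",
    ("closed", "push"): "closed",
    ("open",   "push"): "closed",
    ("open",   "pull"): "open",
}

def delta_star(q, word):
    # delta*(q, epsilon) = q ;  delta*(q, a w) = delta*(delta(q, a), w)
    for a in word:
        q = delta[(q, a)]
    return q

print(delta_star("closed", ["pull", "push"]))   # 'closed'
```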

So far so good. Now, let's imagine that we do not only want the destination state, but also the sequence of states we passed through.

We can write a sequence of alternating state and input symbols $$[ \redq_{\color{red}{(0)}} ] \ \ a_{(0)} \redq_{\color{red}{(1)}} ~ a_{(1)} \redq_{\color{red}{(2)}} ~ \ldots ~ a_{(n-1)} \redq_{(n)}$$ such that $\redq_{(0)} = \redq_0$ and $\delta(\redq_{\color{red}{(i)}}, a_{(i)}) = \redq_{\color{red}{(i+1)}}$. Since the $\redq_0$ does not add much value, let's define a run as being this sequence, without $\redq_0$: $$(a_{(0)}, \redq_{\color{red}{(1)}}) ~ (a_{(1)}, \redq_{\color{red}{(2)}}) ~ \ldots ~ (a_{(n-1)}, \redq_{(n)})$$

The set of all possible runs counts as a "behavior" of the automaton described by the transition function $\delta$ and starting state $\redq_0$. Since $\delta$ is a function, a given word $w$ will correspond to exactly one run, not more, not less.

Let's write down the set of all runs, for a non-empty word $w$: $$ \mathrm{runs}(w) = \{ v\ \in (\Sigma \times \redQ)^*\ \ |\ \mathit{IsRun}(v)\ \}$$ where $\mathit{IsRun}$ is a constraint $v_{(i)} = (a_{(i)}, \delta(\redq_{(i)}, a_{(i)}))$ for $0 \leq i \lt n$, where $\redq_{(i)}$ is either $\redq_0$ or obtained from $v_{(i-1)}$. So far, this is not new or different from the $\delta^*$ definition above.
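Continuing the door automaton sketch above, the run induced by a word (dropping $\redq_0$) can be computed directly:

```python
def run(q0, word):
    # The run records the pairs (a_(i), q_(i+1)); the start state q0 itself is dropped.
    out, q = [], q0
    for a in word:
        q = delta[(q, a)]
        out.append((a, q))
    return out

print(run("closed", ["pull", "push", "push"]))
# [('pull', 'open'), ('push', 'closed'), ('push', 'closed')]
```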

Presheaf

Let us take a different path to this definition of "set of runs". We call prefix domain any set $U$ of natural numbers that is downward-closed with respect to the normal ordering of natural numbers. Think of such sets to be assembled in a space $X$ like $\{\{\}, \{0\}, \{0, 1\}, \ldots, \{0, ..., n - 1\}, \ldots\} \cup \{\omega\}$ that is closed under pair-wise intersection and arbitrary union. The letter $\omega$ here stands for the set of all natural numbers, and by "arbitrary" we mean even an infinite union of prefix domains would be a prefix domain. The set $ \{0, ..., n - 1\}$ is abbreviated with $[n]$, with $[0]$ being empty set.

The conditions do not require that "all" possible prefixes are present in $X$. A space of prefix domains can be finite or have gaps, like $\{[0], [1], [2], [5]\}$.

A word of length $n$ can be described as a function $w: [n] \rightarrow \Sigma$. Since we want to talk about the run on a word, let's consider an "observation" over $U$ to be a function that maps each point of the prefix domain to a pair $(a, \redq)$ at that position. These functions are the reason why we called those sets "domains". In symbols: $$\mathcal{O}(U) = \{~ f: U \rightarrow \Sigma \times \redQ ~|~ \mathit{IsRun}(f) \ \} $$ where $\mathit{IsRun}$ are the same constraints as before:

  • if $0 \in U$, then $f(0) = (a, \delta(\redq_0, a))$ for some $a \in \Sigma$
  • if $u \in U$ and $u > 0$ and $f(u-1) = (\_, \redq)$, then $f(u) = (a, \delta(\redq, a))$ for some $a \in \Sigma$

This really is saying the same thing as before, except that for any position $u$, we have a definition for $f(u)$ that is connected directly to $f(u-1)$, which we can do because $U$ is downwards-closed.

The functions $f$ are called sections of $\mathcal{O}$ over $U$.

Restriction and Contravariance

For prefix domains $U, V$, if we have $U \subseteq V$, then $U$ is — you guessed it — a prefix domain for a prefix of $V$. Write $i: U \rightarrow V$ for the inclusion of $U$ into $V$.

To this inclusion corresponds an operation we can do on any section $f \in \mathcal{O}(V)$, namely the restriction $f\big|_U: U \rightarrow \Sigma \times \redQ$ which is the same as $f$ but only defined on domain $U$.

This correspondence can be written as $\mathcal{O}(i): \mathcal{O}(V) \rightarrow \mathcal{O}(U)$. Note the direction of the arrow.

With a few mild conditions regarding the "identity inclusion" and composition, we can consider $\mathcal{O}$ a contravariant functor. And any contravariant functor $\mathcal{O}: X^{op} \rightarrow \mathsf{Sets}$ for topological space $X$ is called a presheaf.

Non-determinism and the Sheaf Condition

For a presheaf to be considered a sheaf it needs to satisfy the sheaf condition, (1) locality and (2) gluing. Before we get to these, we want to consider non-determinism.

A nondeterministic automaton has a transition relation $\delta \subseteq \redQ \times \Sigma \times \redQ$ instead of a function.

For figuring out the set of runs, we could now keep track of the sets of states the automaton can be in. Let's not do that and use our presheaves instead: $$\mathcal{O}(U) = \{~ f: U \rightarrow \Sigma \times \redQ ~|~ \mathit{IsRun}(f) \ \} $$ where $\mathit{IsRun}$ is very similar to the constraint we had before, just adapted to the non-determinism that comes from $\delta$ being a relation:

  • if $0 \in U$, then $f(0) = (a, \redq)$ for some $a$ and $\redq$ with $(\redq_0, a, \redq) \in \delta$
  • for all $u \in U$, if $u > 0$ and $f(u-1) = (\_, \redq)$ and $f(u) = (a, \redq')$ then $(\redq, a, \redq') \in \delta$

The determinism is gone, as intended: we describe the set of runs, but if someone showed us a suffix of a particular run we would not (in general) be able to tell what happened before. Depending on the $\delta$, we may be able to make some inference.
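A sketch that enumerates the sections $\mathcal{O}([n])$ for a small nondeterministic automaton, together with the restriction map; the states, alphabet and transition relation are invented for illustration.

```python
Sigma = ["a", "b"]
Q = ["q0", "q1", "q2"]
q_start = "q0"
# Nondeterministic transition relation: on "a", q0 can go to either q1 or q2.
delta = {("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q1"), ("q2", "b", "q0")}

def sections(n):
    """All f: [n] -> Sigma x Q satisfying IsRun, represented as tuples of pairs."""
    runs = [()]
    for _ in range(n):
        runs = [r + ((a, q),)
                for r in runs
                for a in Sigma
                for q in Q
                if ((r[-1][1] if r else q_start), a, q) in delta]
    return runs

def restrict(f, m):
    # Restriction along the inclusion [m] -> [n]: just forget the tail.
    return f[:m]

for f in sections(2):
    print(f, "restricts to", restrict(f, 1))
```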

Sheaf Condition

We almost have enough material to understand the sheaf condition. A set of elements $(U_i)_{i \in I}$ from our space covers $U$ if $U \subseteq \bigcup_{i \in I} U_i$.

  • locality: if $(U_i)_{i \in I}$ cover $U$ and $s, t \in \mathcal{O}(U)$ with $s\big|_{U_i} = t\big|_{U_i}$ for all $i$, then $s = t$.
  • gluing: if $(U_i)_{i \in I}$ cover $U$ and we have $s_i \in \mathcal{O}(U_i)$ such that they agree on the overlaps $U_i \cap U_j$, then there is a section $s \in \mathcal{O}(U)$ and $s\big|_{U_i} = s_i$.

For our space of prefix domains and automata runs, we shall find that it satisfies the sheaf condition (what can you say about covers for prefix domain?).

Snippet Domains and More States

The prefix domains were chosen as a simple example, and the condition gets more interesting if we consider other spaces. Consider "snippet domains", which are sets of contiguous sequences. Now the sheaf condition is more useful, as you can glue the snippets of a run together to obtain an actual run.
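A sketch of the gluing step for such snippet domains, reusing the toy nondeterministic automaton from the previous sketch: two observed fragments of a run that agree on their overlap are combined into one section. Dictionaries keyed by position stand in for the functions $f$.

```python
def agree_on_overlap(f, g):
    common = f.keys() & g.keys()
    return all(f[u] == g[u] for u in common)

def glue(f, g):
    """Gluing: sections over two overlapping snippet domains that agree on the
    overlap combine into one section over the union of the domains."""
    assert agree_on_overlap(f, g)
    return {**f, **g}

# Two fragments of the same run, as dictionaries position -> (symbol, state).
f = {0: ("a", "q1"), 1: ("b", "q1"), 2: ("b", "q1")}
g = {2: ("b", "q1"), 3: ("b", "q1")}

print(glue(f, g))  # a section over {0, 1, 2, 3}
```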

From automata (finite state machines), one can start generalizing to other structures. What if the state space is very large (parameterized in some way)? Or we leave certain things unspecified or asynchronous like in I/O automata.

Modeling observations as sheaves means defining a suitable domain for fragments of observation, along with mappings that provide the observation data while doing this in a way that permits the combination of the fragments into a "complete observation". These domains do not have to involve time, but tracking behavior over time is a domain where formal approaches quickly get unwieldy and the intuitive sense we can make of the sheaf condition can prove helpful. There is a whole, and the sheaves describe its "parts". When we solve a jigsaw puzzle, we are taking advantage of a gluing condition of sorts.

Many of these advantages remain in effect when working only with presheaves. Consider transitions between states that are labeled not with symbols, but with expressions. This opens up the direction of modeling concurrent processes, like Joyal, Nielsen and Winskel describe in Bisimulation from Open Maps.

Coda

In math, it seems to be well known and somewhat accepted that knowledge acquired by some same-generation group of researchers may get "lost" if it does not make it to textbooks (it appears in some monographs and is available to specialists, but from undergrads to grad school to full professors, people need to traverse a rocky path to get to an understanding which was a mere baseline in that same-generation group of researchers).

In informatics, we have means of codifying knowledge in the forms of programs, type systems. It is then ironic that formalization itself leads to "loss" since the various syntactic and insignificant choices lead to fragmentation among researchers. This can be observed in the modeling of concurrent systems, where the scientific process has led to whole subfields where specialists in one area could not be bothered to learn the language of the others.

If we want to preserve and codify knowledge in ways that do not get fragmented in space or time, it may be good to make use of more abstract modeling languages, as provided by category theory, and leverage the succinctness that comes from aspects that are left informal (semantic techniques). Presheaves seem like a technique where formal and informal can be mixed in ways that can be useful in this sense.

23.1.21

What are Commuting Conversions?

The term "commuting conversions" in proof theory refers to a proof transformation of natural deduction proofs that involve sums (disjunction and existential formulae). The motivation for these comes from a desirable equivalence relation among natural deduction proofs that is lost when $\vee$ and $\exists$ are added. \(\newcommand{\orelim}[0]{\color{blue}{(\vee{\mathrm{\small ELIM}})}}\) Consider the following natural deduction rules $$ \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array} } {C}~\color{blue}{(\vee{\mathrm{\small ELIM}})} $$ $$ \cfrac{\begin{array}[cc]{} \\ {} \\ \exists x . (x)\end{array} \quad \begin{array}[c]{} [P(t)] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}} {C}~\color{blue}{(\exists{\mathrm{\small ELIM}})} $$ When the formula $C$ is not the end of the proof, but there is another formula $W$ that is derived, then we get multiple proofs to derive the same thing. These multiple proofs of the same thing are all normal in the sense that there are no "detours" in the form of implication introduction and elimination. If we want a normal form to be unique, then we must necessarily pick one among these possible proof and designate it as "normal." This happens by adding additional conversions. $$ \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array} } {\cfrac{~C~}{~W~}} ~ \rightsquigarrow ~ % \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \\ \hline \phantom{[}W\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \\ \hline \phantom{[}W\phantom{]} \end{array} } {W}$$ $$ \cfrac{\begin{array}[cc] {} & [P(t)] \\ {} & \vdots \\ \exists x . P(x) & C \\ \end{array}}{\cfrac{~C~}{~W~}} ~ \rightsquigarrow ~ \cfrac{\begin{array}[cc] {} & [P(t)] \\ {} & \vdots \\ {} & C \\ \exists x . P(x) & \overline{~W~} \\ \end{array}}{W} $$

Pattern matching example

Programmers may appreciate the following presentation of the same thing in Scala-like syntax. This:
$\color{blue}{W}$($r$ match { case Left(a) => $\color{blue}{s}$; case Right(b) => $\color{blue}{t}$ })
is the same as this:
$r$ match { case Left(a) => $\color{blue}{W(s)}$; case Right(b) => $\color{blue}{W(t)}$ }
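The same conversion once more as a runnable sketch: applying W to the result of the case analysis gives the same value as pushing W into both branches. Left/Right, W, s and t are placeholders.

```python
class Left:
    def __init__(self, value): self.value = value

class Right:
    def __init__(self, value): self.value = value

def W(c):
    return c + 1          # any further derivation/computation on C

s = lambda a: a * 2       # the branch for Left
t = lambda b: b - 1       # the branch for Right

def before(r):
    # W applied to the result of the case analysis
    return W(s(r.value) if isinstance(r, Left) else t(r.value))

def after(r):
    # the commuting conversion: W pushed into each branch
    return W(s(r.value)) if isinstance(r, Left) else W(t(r.value))

assert before(Left(3)) == after(Left(3)) == 7
assert before(Right(3)) == after(Right(3)) == 3
```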

26.12.20

Kripke semantics and Tableaux for Intuitionistic Logic

Simply-typed $\lambda$ calculus terms can be seen as encoding of natural deduction proofs of intuitionistic propositional logic (IPC - the C is for calculus; IPL is also common). This is the Curry Howard correspondence.

So we can prove an IPC formula by finding a suitable term of simply-typed $\lambda$ calculus. Sounds great - but there are other ways to represent proofs and I wanted to understand better how structure of these natural deduction proofs relates to the structure of those other proofs. \(\newcommand{\qfalse}[0]{\color{blue} {\bot}}\) \(\newcommand{\qp}[0]{\color{blue} p}\) \(\newcommand{\qq}[0]{\color{blue} q}\) \(\newcommand{\qX}[0]{\color{blue} X}\) \(\newcommand{\qY}[0]{\color{blue} Y}\) \(\newcommand{\qZ}[0]{\color{blue} Z}\) \(\newcommand{\qand}[0]{\color{blue} \wedge}\) \(\newcommand{\qor}[0]{\color{blue} \vee}\) \(\newcommand{\qimplies}[0]{\rightarrow}\) \(\newcommand{\qneg}[1]{{\neg #1}}\) \(\newcommand{\andintro}[0]{\color{blue}{(\wedge{\mathrm{\small INTRO}})}}\) \(\newcommand{\andelim}[0]{\color{blue}{(\wedge{\mathrm{\small ELIM}})}}\) \(\newcommand{\andelimleft}[0]{\color{blue}{(\wedge{\mathrm{\small ELIM/1}})}}\) \(\newcommand{\andelimright}[0]{\color{blue}{(\wedge{\mathrm{\small ELIM/2}})}}\) \(\newcommand{\orintro}[0]{\color{blue}{(\vee{\mathrm{\small INTRO}})}}\) \(\newcommand{\orelim}[0]{\color{blue}{(\vee{\mathrm{\small ELIM}})}}\) \(\newcommand{\implintro}[0]{\color{blue}{(\rightarrow{\mathrm{\small INTRO}})}}\) \(\newcommand{\implelim}[0]{\color{blue}{(\rightarrow{\mathrm{\small ELIM}})}}\) \(\newcommand{\notelim}[0]{\color{blue}{(\neg{\mathrm{\small ELIM}})}}\) \(\newcommand{\botelim}[0]{\color{blue}{(\bot{\mathrm{\small ELIM}})}}\)

As an example, consider the formula $\qneg{\qneg{(\qp \qor \qneg{\qp)}}}$. We can take $\qneg{\qX}$ as abbreviation for $\qX \qimplies \bot$. Is it "obvious" how to find a term of type $((\qp \qor \qneg{\qp}) \qimplies \bot) \qimplies \bot$? Here is a natural deduction proof of (the names are from this list of intuitionistic natural deduction rules from the post above; I can't be bothered with typesetting a tree). $$\begin{array}{lll} 1. & \qneg{(\qp \qor \qneg{\qp)}} & \mathrm{(assumption)} \\ 2. & \qp & \mathrm{(assumption)} \\ 3. & \qp \qor \qneg{\qp} & \mathrm{(via}~\orintro/1~\mathrm{from~2.}\\ 4. & \bot & \mathrm{(via}~\implelim~\mathrm{apply~1.~to~3.)} \\ 5. & \qneg{\qp} & \mathrm{(via}~\implintro~\mathrm{from~[2.]~to~4.)} \\ 6. & \qp \qor \qneg{\qp} & \mathrm{(via}~\orintro/2~\mathrm{from~5.)} \\ 7. & \bot & \mathrm{(via}~\implelim~\mathrm{apply~1.~to~6.)} \\ 8. & \qneg{\qneg{(\qp \qor \qneg{\qp)}}}& \mathrm{(via}~\implintro~\mathrm{from~[1.]~to~7.)} \\ \end{array} $$
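Via Curry-Howard, the proof above corresponds to a term of type $((\qp \qor \qneg{\qp}) \qimplies \bot) \qimplies \bot$. Here is its shape, sketched in Python with Left/Right as the two injections (the comments refer to the numbered proof lines); in a typed language this is where the type checker would confirm the proof.

```python
class Left:
    def __init__(self, value): self.value = value

class Right:
    def __init__(self, value): self.value = value

# Proof term for ((p v ~p) -> bot) -> bot:
#   lambda k:                        -- 1. assume k : ~(p v ~p)
#     k(Right(                       -- 6./7. apply k to the right injection of
#        lambda p: k(Left(p))))      -- 5. the proof of ~p built from lines 2.-4.
proof = lambda k: k(Right(lambda p: k(Left(p))))
```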

Maybe a better question to ask is: why does this proof look the way it looks? How, starting from the formula, can we arrive at such a proof, with steps that are so simple that we can leave them to a machine?

Intuitionistic Tableaux

Kripke semantics and tableaux give us an alternative view of proving IPC formulae. The tableau calculus we discuss here is not very different from sequent calculus; it uses "block tableaux" where each node is a set of (signed) formulae. It will be a refutation calculus - despite proving intuitionistic logic!

The definitions here follow Melvin Fitting's book [1] and Smullyan's unified presentation [2].

In contrast to block tableaux, "analytic tableaux" have one formula per node, and we consider the set of all formulas on a branch instead of looking at a single node. These and connexion methods are out of scope today, just like modal logic or the algebraic view (denotational semantics). The only differences from the aforementioned texts are that instead of using negation as primitive, I will use $\qfalse$ as primitive, and that I make it explicit when a principal $T \qX$ formula has to be preserved in tableau extension.

[1] Melvin Fitting. Intuitionistic Logic, Model Theory and Forcing [researchgate pdf].

[2] Raymond M. Smullyan. A generalization of intuitionistic and modal logics. https://doi.org/10.1016/S0049-237X(08)71543-X

Syntax for Intuitionistic and Minimal Propositional Logic

Once more, we define the well-formed formulae (wff):

  • a propositional variable $\qp$ is a wff
  • $\qfalse$ is a wff
  • if $\qX$ and $\qY$ are wff, then $\qX \wedge \qY$ is a wff.
  • if $\qX$ and $\qY$ are wff, then $\qX \vee \qY$ is a wff.
  • if $\qX$ and $\qY$ are wff, then $\qX \qimplies \qY$ is a wff.

We use $\qq,\qp$ for propositional variables and $\qX, \qY, \qZ$ as metavariables for formulae. The symbol $\qfalse$ stands for "falsehood", "falsity" or "absurd" and is also written as $\bot$. We use $\qfalse$ and $\qimplies$ as primitive (built-in) symbols, and define negation $\qneg{X}$ as an abbreviation for $\qX \qimplies \qfalse$. This goes well with the intuitionist notion that "a proof of $\qneg{X}$ amounts to a construction that transforms a proof $\qX$ to a proof of falsehood."

Kripke semantics

A (Kripke) model for intuitionistic propositional logic is a triple $\langle \mathcal{G}, \mathcal{R}, \Vdash \rangle$ where $\mathcal{G}$ is a set of states, $\mathcal{R}$ an accessibility relation between states that is reflexive and transitive, and $\Vdash$ a relation between states and formulae that satisfies conditions $K_0 - K_4$ below. We use $\Gamma, \Delta$ for elements of $\mathcal{G}$, read $\Gamma \Vdash \qX$ as "$\Gamma$ forces $\qX$" and write $\Gamma^\star$ for "any $\Delta$ with $\Gamma \mathcal{R} \Delta$".

($K_0$) If $\Gamma \Vdash \qX$ then $\Gamma^\star \Vdash \qX$

($K_1$) $\Gamma \Vdash \qX \qand \qY$ iff $\Gamma \Vdash \qX$ and $\Gamma \Vdash \qY$

($K_2$) $\Gamma \Vdash \qX \qor \qY$ iff $\Gamma \Vdash \qX$ or $\Gamma \Vdash \qY$

($K_3$) $\Gamma \Vdash \qX \qimplies \qY$ iff for every $\Gamma^\star$, either $\Gamma^\star \not\Vdash \qX$ or $\Gamma^\star \Vdash \qY$.

($K_4$) $\Gamma \not\Vdash \qfalse$.

A formula is valid in a model if $\Gamma \Vdash \qX$ for all $\Gamma \in \mathcal{G}$. A formula is valid if it is valid in all models. Note that with $K_0$ in mind, we could also have written conditions $K_1$ and $K_2$ differently:

($K_1'$) $\Gamma \Vdash \qX \qand \qY$ iff for every $\Gamma^\star$, $\Gamma^\star \Vdash \qX$ and $\Gamma^\star \Vdash \qY$

($K_2'$) $\Gamma \Vdash \qX \qor \qY$ iff for every $\Gamma^\star$, $\Gamma^\star \Vdash \qX$ or $\Gamma^\star \Vdash \qY$

In the above definition, it is understood that $\Vdash$ is "seeded" with a relation $\Vdash^{0}$ between states and propositional variables $\qp$. This $\Vdash^{0}$ is then extended to a relation between states and whole formulae. It is also common to define states directly as subsets of the "available" propositional variables (the ones forced in that state). Here is an example, with two states $\Gamma = \{\qq\}$ and $\Delta = \{\qp, \qq\}$:

The relation $\Vdash$ takes all reachable states into consideration, and once a state $\Gamma$ forces $\qq$, any subsequent $\Gamma^\star$ must force it, too. The "truth in a state" is rather more dynamic than the absolute "truth" we are used to from classical logic. One can give this a temporal epistemic interpretation and say states represent subjective knowledge of true facts, and that this knowledge "grows over time" (as we access subsequent states).

As an exercise: convince yourself that $\Delta \Vdash \qp \qand \qq$, that $\Gamma \not\Vdash \qp$, and that the formula $\qp \qimplies \qq$ is valid in this model. Note further that $\qp \vee \qneg{\qp}$ is not valid. We have $\Gamma \not\Vdash \qp$, which is very different from saying $\Gamma \Vdash \qneg{\qp}$.
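The exercise can also be checked mechanically. Below is a small sketch of the forcing relation for the two-state model $\Gamma = \{\qq\}$, $\Delta = \{\qp, \qq\}$ with $\Gamma \mathcal{R} \Delta$; formulas are encoded as tuples and negation is $\qX \qimplies \qfalse$ as in the text.

```python
# States and the (reflexive, transitive) accessibility relation of the example model.
G, D = "Gamma", "Delta"
R = {G: {G, D}, D: {D}}            # successors of each state (Gamma R Delta)
atoms = {G: {"q"}, D: {"p", "q"}}  # propositional variables forced in each state

# Formulas: ("var", name) | ("false",) | ("and"|"or"|"imp", X, Y)
def forces(state, phi):
    tag = phi[0]
    if tag == "var":
        return phi[1] in atoms[state]
    if tag == "false":
        return False                 # K4: no state forces falsehood
    if tag == "and":
        return forces(state, phi[1]) and forces(state, phi[2])
    if tag == "or":
        return forces(state, phi[1]) or forces(state, phi[2])
    if tag == "imp":                 # K3: quantifies over all accessible states
        return all((not forces(s, phi[1])) or forces(s, phi[2]) for s in R[state])
    raise ValueError(tag)

def neg(X): return ("imp", X, ("false",))

p, q = ("var", "p"), ("var", "q")
print(forces(D, ("and", p, q)))                       # True
print(forces(G, p))                                   # False
print(all(forces(s, ("imp", p, q)) for s in R))       # p -> q is valid in the model: True
print(all(forces(s, ("or", p, neg(p))) for s in R))   # excluded middle fails at Gamma: False
```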

[When Kripke models are used for modal logics, people consider accessibility relation with different properties, leading to different modal logics; there is an explicit symbol $\square$ to talk about accessibility, and our reflexive and (in particular) transitive $\mathcal{R}$ corresponds to S4 modal logic, with the S4 axiom $\square \qp \rightarrow \square\square \qp$. We will just mention here that intuitionistic logic can be interpreted in S4 modal logic.] \(\newcommand{\qT}[0]{T}\) \(\newcommand{\qF}[0]{F}\) \(\newcommand{\qS}[0]{\mathrm{S}}\)

Kripke semantics for Signed Formulae

On towards a proof system for IPC based on an intuitionistic tableau calculus (or just intuitionistic tableaux). To this end, we introduce signed formulae, using symbols $T$ and $F$ and writing $T\qX, F\qX$ for a signed version of formula $X$:
  • $\Gamma \vdash T\qX$ means $\Gamma \Vdash \qX$ and
  • $\Gamma \vdash F\qX$ means $\Gamma \not\Vdash \qX$.

In classical logic, $T\qX$ would mean $\qX$ is true and $F\qX$ would mean $\qX$ is false (or $\qneg{X}$ is true). The way to understand $\Gamma \vdash F\qX$ non-classically is to read it as "it is not known that $\Gamma \Vdash \qX$." Note that $\Gamma$ is merely a state, not a context or set of formulae - we are using this assertion as a building block to compare proof situations. For example, the assertions $T\qX \qand \qY$ and $F\qX$ cannot both be true at the same time (or in the same state).

We follow Smullyan's uniform notation, and classify our formulae into different types (whether they are "conjunctive" or "disjunctive" in nature):

  • type $\alpha$ is any signed formula of the shape $T\qX \qand \qY, F\qX \qor \qY, F\qX \qimplies \qY, F\qfalse$
  • type $\beta$ is any formula $F\qX \qand \qY, T\qX \qor \qY, T\qX \qimplies \qY, T\qfalse$.

Since we added $\qfalse$ as primitive, we need to say something about its components, too. We treat $\qfalse$ like we treat an empty disjunction.

For a signed formula $\alpha$ ($\beta$) we call certain signed subformulae its components $\alpha_1$ and $\alpha_2$ (same for $\beta$), according to this table: $$ \begin{array}{c|c|c}% \alpha & \alpha_1 & \alpha_2 \\ \hline T\qX \qand \qY & T\qX & T\qY \\ \hline F\qX \qor \qY & F\qX & F\qY \\ \hline F\qX \qimplies \qY & T\qX & F\qY \\ \hline F \qfalse & F \qfalse & F \qfalse \end{array} \quad \quad \begin{array}{c|c|c}% \beta & \beta_1 & \beta_2 \\ \hline F\qX \qand \qY & F\qX & F\qY \\ \hline T\qX \qor \qY & T\qX & T\qY \\ \hline T\qX \qimplies \qY & F\qX & T\qY \\ \hline T \qfalse & T \qfalse & T \qfalse \end{array} $$

Again, for $\qfalse$ we have subformulae that are not proper, but this is not going to be a concern.

Let us call the corresponding $F\qX$ the conjugate $\overline{T\qX}$ of $T\qX$, and vice versa. Note that the conjugate of a formula of type $\alpha$ is of type $\beta$ and vice versa. In classical logic, a set of signed formulae $\{T\qX_1,\ldots,T\qX_n, F\qY_1, \ldots, F\qY_m\}$ would correspond to a sequent calculus sequent $\qX_1, \ldots \qX_n \longrightarrow \qY_1, \ldots, \qY_m$. The sequent would be provable iff the set of signed formulae is unsatisfiable. The distinction into types $\alpha$ and $\beta$ explains how a formula is handled: a conjunctive formula $\alpha$ is split into its components, while a disjunctive formula $\beta$ makes the proof branch into two possibilities.

For intuitionistic logic, we are going to call all formulae $T \qX$ permanent and all formulae $F \qX$ co-permanent. With all this setup, we can remix our definition of Kripke model and $\Vdash$ into the following conditions using signed formulae $\qX$:

$(U_0(a))$ Either $\Gamma \Vdash \qX$ or $\Gamma \Vdash \overline{\qX}$, but not both.

$(U_0(b))$ If $\Gamma \Vdash \qX$ and $\qX$ is permanent, then $\Gamma^\star \Vdash \qX$.

$(U_1)$ For any permanent $\alpha$, $\Gamma \Vdash \alpha$ iff for every $\Gamma^\star$, $\Gamma^\star \Vdash \alpha_1$ and $\Gamma^\star \Vdash \alpha_2$

$(U_2)$ For any permanent $\beta$, $\Gamma \Vdash \beta$ iff for every $\Gamma^\star$, $\Gamma^\star \Vdash \beta_1$ or $\Gamma^\star \Vdash \beta_2$

The behavior on co-permanent $\alpha$ and $\beta$ is determined by $U_1$ and $U_2$ too, by taking into consideration $(U_0(b))$. Observe:

$(U_1')$ For any co-permanent $\alpha$, $\Gamma \Vdash \alpha$ iff for some $\Gamma^\star$, $\Gamma^\star \Vdash \alpha_1$ and $\Gamma^\star \Vdash \alpha_2$

$(U_2')$ For any co-permanent $\beta$, $\Gamma \Vdash \beta$ iff for some $\Gamma^\star$, $\Gamma^\star \Vdash \beta_1$ or $\Gamma^\star \Vdash \beta_2$

As an exercise: Go back to the example above, and convince yourself that $\Delta \Vdash T\qp \qand \qq$ and $\Gamma \Vdash F\qp$. Check that the formula $T \qp \qimplies \qq$ is valid in every state, using the conditions $U_\star$ above. Now do the same for $T \qp \vee \qneg{p}$, remembering that it stands for $T \qp \vee (\qp \qimplies \qfalse)$. Now, to keep it interesting, what about $\Gamma \Vdash F \qp \vee \qneg{p}$? Is $F \qp \vee \qneg{p}$ valid?

A final definition is to call a formula special if it is co-permanent but has a permanent component, otherwise it is called ordinary. The only special formula we have is $F\qX \qimplies \qY$, of type $\alpha$. If we had negation as a built-in operator, that would be a second special $\alpha$. Check out Smullyan's paper for predicate logic and other generalizations that require special $\beta$.

Tableau Proofs

A tableau proof is like a sequent calculus proof written upside-down: we draw it with the root at the top, growing towards the bottom. Instead of $S \cup \qX$, we write $S, \qX$. A vertical bar $\vert$ indicates that there are two children. $\qS_P$ means $\{ \qT\qX \vert \qT\qX \in \qS\}$, and $\alpha_P$ means repeat $\alpha$ if it is permanent (same for $\beta$). This is just to emphasize that $T \qX$ (possibly) needs to be preserved; in sequent calculus, this would surface as a structural operation. $$ \begin{array}{ll} (A1) & \mathrm{for~ordinary} ~ \alpha: ~ \LARGE \frac{\qS,~ \alpha }{\qS,~ \alpha_P,~ \alpha_1,~ \alpha_2 } \\ (A2) & \mathrm{for~special} ~ \alpha: ~ \LARGE \frac{\qS,~ \alpha }{\qS_P,~ \alpha_1,~ \alpha_2 } \\ (B1) & \mathrm{for~ordinary} ~ \beta: ~ \LARGE \frac{\qS,~ \beta }{\qS,~ \beta_P,~ \beta_1 ~ \vert ~ \qS,~ \beta_P, ~\beta_2 } \\ \end{array} $$

Now, a configuration is a collection $[~\qS_1, \qS_2, \ldots, \qS_n~]$ of sets of signed formulae. A rule takes a configuration and replaces one $\qS_i$ with its result (or its two results). A tableau is a finite sequence of configurations $C_1, \ldots C_n$ in which each configuration except the first resulted from its predecessor by application of a rule.

A set of signed formulae is closed if it contains both $\qT \qX$ and $\qF \qX$ for some formula $\qX$ or if it contains $\qT \qfalse$. A configuration $[~\qS_1, \qS_2, \ldots, \qS_n~]$ is closed if all sets $\qS_i$ are closed. A tableau $C_1, \ldots C_n$ is closed if some $C_i$ is closed.

A tableau for a set of signed formulae $\qS$ is a tableau $C_1, \ldots, C_n$ with $C_1 = \{\qS\}$. A finite set of signed formulae $\qS$ is inconsistent if some tableau for $\qS$ is closed. Otherwise, $\qS$ is consistent. A formula $\qX$ is a theorem, written $\vdash_I \qX$, if $\{\qF \qX\}$ is inconsistent, and a closed tableau for $\{\qF\qX\}$ is called a proof of $\qX$.

What we have done here is to define provability of $\qX$ as inconsistency of $\qF\qX$. This refutation calculus may not be appealing to an intuitionist.
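To make the rules concrete, here is a rough, depth-bounded sketch of the procedure for our $\qfalse$-primitive language: the α/β classification, the special α rule that keeps only the permanent (T-signed) part of the set, and preservation of permanent β formulae. This is my own reading of the rules; the encoding is ad hoc and the search is not optimized.

```python
# Formulas: ("var", name) | ("false",) | ("and"|"or"|"imp", X, Y)
# Signed formulas: ("T", X) or ("F", X). Negation ~X is encoded as X -> false.

def decompose(sf):
    """Classify a signed formula and return (kind, special, c1, c2), or None for literals.
    T(false)/F(false) are treated as literals here; T(false) closes a branch directly."""
    sign, X = sf
    if X[0] in ("var", "false"):
        return None
    a, b = X[1], X[2]
    table = {
        ("T", "and"): ("alpha", False, ("T", a), ("T", b)),
        ("F", "or"):  ("alpha", False, ("F", a), ("F", b)),
        ("F", "imp"): ("alpha", True,  ("T", a), ("F", b)),   # the only special formula
        ("F", "and"): ("beta",  False, ("F", a), ("F", b)),
        ("T", "or"):  ("beta",  False, ("T", a), ("T", b)),
        ("T", "imp"): ("beta",  False, ("F", a), ("T", b)),
    }
    return table[(sign, X[0])]

def closed(S):
    return ("T", ("false",)) in S or any(("F", X) in S for (sign, X) in S if sign == "T")

def inconsistent(S, depth=10):
    """Try to find a closed tableau for the set S of signed formulas (depth-bounded search)."""
    S = frozenset(S)
    if closed(S):
        return True
    if depth == 0:
        return False
    for sf in S:
        d = decompose(sf)
        if d is None:
            continue
        kind, special, c1, c2 = d
        if kind == "alpha":
            # Special alpha (F X->Y): keep only the permanent (T-signed) formulas.
            base = frozenset(x for x in S if x[0] == "T") if special else S - {sf}
            if inconsistent(base | {c1, c2}, depth - 1):
                return True
        else:
            # Beta: branch; permanent (T-signed) betas are kept around, as in the example proof.
            base = S if sf[0] == "T" else S - {sf}
            if inconsistent(base | {c1}, depth - 1) and inconsistent(base | {c2}, depth - 1):
                return True
    return False

def provable(X):
    return inconsistent({("F", X)})

def neg(X):
    return ("imp", X, ("false",))

p = ("var", "p")
print(provable(neg(neg(("or", p, neg(p))))))  # True:  |- not not (p or not p)
print(provable(("or", p, neg(p))))            # False: excluded middle is not provable
```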

    Let's look at a proof of $\qneg{\qneg{(\qX \vee \qneg{X})}}$ in our $\qfalse$-primitive logic. $$ \begin{array}{l} [~\{\qF \qneg{\qneg{(\qX \vee \qneg{\qX})})} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}})\} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}}), \qF \qX \vee \qneg{\qX}\} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}}), \qF \qX, \qF \qneg{\qX}\} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}}), \qT \qX\} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}}), \qF \qX \vee (\qneg{\qX}), \qT \qX\} ] \\ [~\{\qT \qneg{(\qX \vee \qneg{\qX}}), \qF \qX, \qF \qneg{\qX}, \qT \qX\} ] \\ \end{array} $$ The same proof in a bit more verbose: $$ \begin{array}{l} [~\{\qF ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse) \qimplies \qfalse\}~] \\ \alpha (F\qimplies) \\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse, \qF \bot)\}~] \\ \beta (T\qimplies) \\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse), \qF (\qX \vee (\qX \qimplies \qfalse)), \qF \qfalse\}~, \{\ldots, \qT \qfalse\}] \\ \alpha (F\qor)\\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse), \qF \qX, \qF (\qX \qimplies \qfalse), \qF \qfalse\}~, \{\ldots, \qT \qfalse\}] \\ \alpha (F\qimplies)\\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse), \qT \qX, \qF \qfalse\}, \{\ldots, \qT \qfalse\}] \\ \beta (T\qimplies)\\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse), \qF (\qX \vee (\qX \qimplies \qfalse)), \qT \qX, \qF \qfalse \}, \\ ~ \{\ldots, \qT \qfalse\}, \{\ldots, \qT \qfalse\}] \\ \alpha (F\qor)\\ [~\{\qT ((\qX \vee (\qX \qimplies \qfalse)) \qimplies \qfalse), \qF \qX, \qF (\qX \qimplies \qfalse), \qT \qX, \qF \qfalse \}, \\ ~ \{\ldots, \qT \qfalse\}, \{\ldots, \qT \qfalse\}] \\ \end{array} $$

What is convenient about this?

In applying the rules of the tableau calculus, we are taking the formula we want to prove apart and dealing with its subformulae until we close a branch (configuration). However, it is not an entirely deterministic procedure, in the sense that we still need to pick which of the formulas we want to take apart, and if we pick the wrong one, we may have to backtrack. Automated reasoning! And it should be easy to convert the result into a natural deduction proof (similar to Prawitz's view that a sequent calculus proof can be seen as a description of how to construct a natural deduction proof).

Another thing that is great about the tableaux method is that if, after taking everything apart, you end up with open branches (that cannot be closed), you can read them as models that provide counter-examples. The application of a "special" rule indicates that we transitioned to another state.

What is special about intuitionistic tableaux?

Let's consider that signed formulae $\qF \qX$ correspond to the right-hand side of a sequent calculus sequent (the succedent). So unlike Gentzen's LJ, it is quite fine to have multiple formulae mingle on the right-hand side and stay intuitionistic, as long as the right-$\rightarrow$ rule (i.e. $\qF \qX \qimplies \qY$) keeps only one. This has also been explained here:

[3] Valeria de Paiva, Luiz Carlos Pereira. A short note on intuitionistic propositional logic with multiple conclusions [researchgate pdf]

The assertion $\qF \qX \qimplies \qY$ says: "at present, we think we can prove $\qX$ without at the same time proving $\qY$". In considering such a possible proof $\qT \qX$, we need to do bookkeeping to ensure that anything we prove along the way is kept separate from other formulae $\qF \qZ$ that we labeled as not-yet-known. This is why in the special $\alpha$ rule we use $\qS_P$ instead of $\qS$. This looks like the essence of intuitionistic reasoning as far as proof rules are concerned. It also makes intuitionistic tableaux a bit more complicated than classical ones.

Coda

I hope the above sheds some light on the connection between tableau calculi, Kripke semantics and intuitionistic logic. Soundness and completeness can be found in Fitting's book, as well as generalizations to modal logic and predicate logic. Of course, for stronger logics, carrying over soundness and completeness does not mean we retain decidability; stronger logics require not only taking formulae apart, but also instantiating them, which increases non-determinism and the search space. (Slightly updated 2022-01-22, for clarity.)
    I hope the above sheds some light on the connection between tableaux calculus, Kripke semantics and intuitionistic logic. Soundness and completeness can be found in Fitting's book, as well as generalizations to modal logic, predicate logic. Of course, for stronger logics, carrying over soundness and completeness does not mean we retain decidability; stronger logics require not only taking formulae apart, but also instantiating them, which increases non-determinism and search space. (Slightly updated 2022-01-22, for clarity.)