## Building an Interpreter for Propositional Logic

In the first post of this series, we built an interpreter to work with a basic subset of propositional logic. In this post, we are going to extend that interpreter to handle the full range of valid expressions in classical propositional logic. Specifically, this means we are going to allow for

• handling stacked negation operators (e.g. ~~T),
• parentheses to group sub-expressions (e.g. ~(T v F) & T),
• multiple binary operators to be used in one expression (e.g. T & F v T), and
• handling the proper operator precedence of the logical operators

These additions will give our interpreter the capability to evaluate any properly formed expression of classical propositional logic.1

### Stacked Negations

The simplest addition to make will be the stacked negation operators. When we left our grammar, the formula rule was like so:

 1 formula :: (NOT)? term 

The ? meant that a :NOT token could be present before a term zero or one times. We want to allow the :NOT token to present zero or many times. How might we accomplish this? Well, what precisely can be negated? Is it simply a term (e.g. T or F)? No. We are saying that an expression of the form ~~T is valid. What is the abstract syntax tree of this expression? It is a negation operator whose operand is … another negation operator, but this operator has a term operand.

When we start thinking about what this expression is actually encoding, we should see that a negation operator is not necessarily followed by a term; it is followed by a formula. But a logical formula (that is, a subexpression) can also have no negation operator and simply be either a :TRUE token or a :FALSE token (aka a term):

 1 formula :: (NOT)* formula | term 

In order to encode this new formula rule in our parser, we need simply to change what we pass into the creation of the AST::Negation object:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 class Parser # ... # formula :: (NOT)* formula | term def formula token = @current_token if token.type == :NOT eat(:NOT) return AST::Negation.new(formula) else return term end end end 

We can add some tests to the run_tests method we defined in the last post to ensure that this new method is working properly:

 1 2 3 4 5 6 7 def run_tests # ... assert_interpret_equals('~~T', true) assert_interpret_equals('~~~F', true) assert_interpret_equals('~~F & F', false) assert_interpret_equals('F v ~~T', true) end 

You can find the script we have built to this point in this revision of this Gist

### Parenthetical Grouping

Just like in arithmetic, expressions in propositional logic can use parentheses to explicitly create sub-expressions. When parentheses are used in this way, it is explicitly encoding that the grouped sub-expression has higher precedence than the rest of the expression.

The first step in allowing for this feature is to update our Lexer to create the appropriate tokens for these characters:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 class Lexer # ... def tokens @input.split('').map do |char| # for each char, there are only 6 possible things to do case char when ' ' next when '~' Token.new(:NOT, '~') when '&' Token.new(:AND, '&') when 'v' Token.new(:OR, 'v') when '>' Token.new(:IFSO, '>') when '(' Token.new(:LPAREN, '(') when ')' Token.new(:RPAREN, ')') when 'T' Token.new(:TRUE, 'T') when 'F' Token.new(:FALSE, 'F') end end.compact end end 

Next, we need to update our grammar to allow for such grouped sub-expressions.

 1 2 3 expression :: formula ((AND | OR | IFSO) formula)* formula :: (NOT)* formula | LPAREN expression RPAREN | term term :: TRUE | FALSE 

We say that an expression comes between parenthese, and not a formula, because any valid logical expression can be placed between parens, not just a negation operation or a term.

So, we need to update the Parser#formula method to handle this case:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 class Parser # ... # formula :: (NOT)* formula | LPAREN expression RPAREN | term def formula token = @current_token if token.type == :NOT eat(:NOT) return AST::Negation.new(formula) elsif token.type == :LPAREN eat(:LPAREN) result = expression eat(:RPAREN) return result else return term end end end 

Let’s add some more tests to the run_tests method to ensure that this new feature is working properly as well:

 1 2 3 4 5 6 def run_tests # ... assert_interpret_equals('T & (F v T)', true) assert_interpret_equals('~(T & (F v T))', false) assert_interpret_equals('~(T & (F v T)) > T', true) end 

You can find the script we have built to this point in this revision of this Gist

### Multiple Binary Operators

Our current Parser does not properly handle expressions with multiple binary operators. As it is, if we were to parse the expression T & F v T, we would get an abstract syntax tree of this shape:

The disjunction is completely ignored! Before we can fix this, we must first determine why and where this is happening. Well, we know that we handle parsing binary operators in the Parser#expression method, so let’s start looking there:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 class Parser # ... # expression :: formula ((AND | OR | IFSO) formula)? def expression result = formula token = @current_token if token.type == :AND eat(:AND) result = AST::Conjunction.new(result, formula) elsif token.type == :OR eat(:OR) result = AST::Disjunction.new(result, formula) elsif token.type == :IFSO eat(:IFSO) result = AST::Implication.new(result, formula) end result end end 

What we find is that we don’t allow for recursive expressions; that is, if a formula is followed by an operator, we presume that the right hand operand is also a formula. In order to allow for complex expressions, we need to update a grammar to:

 1 expression :: formula ((AND | OR | IFSO) expression)* 

This allows for recursive expressions. And, to encode that in our method, we simply replace the formula param passed to the AST operator intializers with a recursive call to expression:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 class Parser # ... # expression :: formula ((AND | OR | IFSO) expression)* def expression result = formula token = @current_token if token.type == :AND eat(:AND) result = AST::Conjunction.new(result, expression) elsif token.type == :OR eat(:OR) result = AST::Disjunction.new(result, expression) elsif token.type == :IFSO eat(:IFSO) result = AST::Implication.new(result, expression) end result end end 

We can add a couple more tests to ensure that our change is doing what we expect:

 1 2 3 4 5 6 7 def run_tests # ... assert_interpret_equals('T & F v T', true) assert_interpret_equals('F & T v T', false) assert_interpret_equals('~T & F v T', false) assert_interpret_equals('~F & T v T', true) end 

You can find the script we have built to this point in this revision of this Gist

### Operator Precedence

The final issue we have with our interpreter is that is doesn’t properly handle the precedence of the logical operators. Operator precedence describes the order of operations in an expression. In arithmetic, I remember learning this mnemonic to remember the order of operations: “Please excuse my dear aunt Sally”. This is an acronym mnemonic for “Parentheses exponentiation multiplication division addition substraction”, or, to put it in tabular form:

Operator Precedence
^ 1
* 2
/ 2
+ 3
- 3

This order of operations says that the arithmetic expression 1^2 * 3 / 4 + 5 - 6 should be evaluated as ((((1^2) * 3) / 4) + 5) - 6. So, operator precedence tells our interpreter what order to evaluate the operations in. Should it evaluate the multiplication before the addition, or vice versa?

In propositional logic, the operator precedence is “negation conjunction disjunction implication”:

Operator Precedence
~ 1
& 2
v 3
> 4

So, let’s look back at one of the tests we added for handling multiple binary operators:

 1 assert_interpret_equals('F & T v T', false) 

Given the operator precedence, how should F & T v T be understood? Since conjunction (&) has a higher precedence than disjunction (v), it should be read as (F & T) v T, which would evaluate as true; however, we can see that our test expects it to be false. If you have been running the code as we have gone along, step by step, you would have seen your tests pass. This is because our interpreter has no sense of operator precedence and so evaluates the expression from left to right (e.g., in this case, it reads that expression as F & (T v T)).

We need to encode the precedence of our operators, but how do we do so?

I’ll be honest, my first thought was to tweak the order of the branches in the if/else clause of the Parser#expression methods. This will not solve the problem. Regardless of the order of the conditions, if all of those conditions live in the same method, the parser will still imply precedence from left to right. To encode the logic that this operator take precedence over that operator, regardless of which comes first in the expression, we need the precedence levels to be encoded as separate methods; that is, we need to expand our grammar.

This is the state of our grammar after our additions and improvements above:

 1 2 3 expression :: formula ((AND | OR | IFSO) expression)* formula :: (NOT)* formula | LPAREN expression RPAREN | term term :: TRUE | FALSE 

We can recall from the section on the visitor pattern from the previous post that our abstract syntax tree is evaluated from the lowest (leftmost) nodes up. So, for a tree like this:

We will visit nodes until we get the bottom, leftmost T node. The interpreter will then negate it (moving up the tree from the T node to its parent), and then evaluate that result as the lefthand side of the conjunction with the F node on the right. I bring this back up because it will help us to encode our operator precedence. Since our AST is evaluated from the bottom up, we want the operators with the highest precendences to be put lower in the tree. We can see that negation is already encoded as having a higher precedence than conjunction, since the expression ~T & F put the negation operator node below the conjunction node. But, why?

Well, the short answer is what I was getting at before, at the parser level, precedence is encoded via rule/method layering. Since the negation operator is defined as being a part of the formula rule/method, which is used as a part of the expression rule/method, it will always be placed lower than the binary operators defined as a part of the expression rule/method in the abstract syntax tree that the parser generates. So, if we want to distinguish the operator precedence of the binary operators, we will need to define a distinct rule for each of them, and those rules will need to be ordered to encode the precedence. Let’s start defining these rules simply by taking the expression rule and splitting out the operators into separate rules:

 1 expression :: formula ((AND | OR | IFSO) expression)* 

becomes

 1 2 3 conjunction :: formula (AND expression)* disjunction :: formula (OR expression)* implication :: formula (IFSO expression)* 

Now, how do we encode the operator precedence order? The key is the how the grammar rule definitions “nest”. In the original grammar, expression was the topmost rule, and it called into formula, which in term called into term. And our abstract syntax tree would mirror this nesting; Boolean values were placed beneath the negation operator, which would be placed beneath any binary operators. So, in order to encode the operator precedence of the binary operators, we need to nest the rules properly. Since conjunction has the highest precedence, we want it lowest in the tree, just above negation:

 1 2 3 4 expression :: conjunction ((OR | IFSO) expression)* conjunction :: formula (AND expression)* formula :: (NOT)* formula | LPAREN expression RPAREN | term term :: TRUE | FALSE 

Disjunction is next; we want it to end up above conjunctions in the tree, so:

 1 2 3 4 5 expression : disjunction (IFSO expression)* disjunction :: conjunction (OR expression)* conjunction :: formula (AND expression)* formula :: (NOT)* formula | LPAREN expression RPAREN | term term :: TRUE | FALSE 

And now our expression rule has simply become our implication rule. We now have a grammar where the operators with the lower precedence are higher in the levels of rules. Let’s now encode this grammar in our parser.

The only change we must make is what method we call for the right hand side of an operation. Right now, each operation rule calls expression to evaluate the right hand side. If we leave our grammar as is, we would actually end up with the exact same result as when our grammar only had the expression, formula, and term rules. This is because if we “restart” the evaluation chain at the top (the expression rule/method) for every right hand operand, our parser will end up grouping operations from left to right again. Since the new expression, disjunction, and conjunction rules/methods all check if the current token is their respective operator, if we evaluate the right hand side of any of these operations back at expression, we will always end up running a method that finds the next operator and thus groups the sub-expressions in a simplistic, left-to-right manner.

Maybe an actual example will help explain this point. Let’s return to our earlier example of F & T v T. What will happen when our parsers evaluates the tokens that represent this expression?

First, the expression method is called inside of the parse method

 1 2 3 def parse expression end 

The expression method then calls the disjunction method to hydrate its internal result variable

 1 2 3 4 def expression result = disjunction # ... end 

The disjunction method then calls the conjunction method to hydrate its internal result variable

 1 2 3 4 def disjunction result = conjunction # ... end 

The conjunction method then calls the formula method to hydrate its internal result variable

 1 2 3 4 def conjunction result = formula # ... end 

The formula method checks if the current token is a negation operator

 1 2 3 4 5 6 def formula token = @current_token if token.type == :NOT # ... end 

It isn’t (its a true token), so it checks if the current token is a left parens token

 1 2 3 4 5 def formula # ... elsif token.type == :LPAREN # ... end 

It isn’t, so it calls the term method

 1 2 3 4 5 6 def formula # ... else return term end end 

The term method checks if the current token is a true token

 1 2 3 4 5 6 def term token = @current_token if token.type == :TRUE # ... end 

It isn’t, so it checks if the current token is a false token

 1 2 3 4 5 def term # ... elsif token.type == :FALSE # ... end 

It is, so it moves the current token pointer to the next token in the stream (now the current_token is the & conjunction operator token) and returns the false value to the formula method

 1 2 3 4 5 6 7 def term # ... elsif token.type == :FALSE eat(:FALSE) return AST::Atom.new(false) # ... end 

The formula method then immediately returns that value to the conjunction method

 1 2 3 4 5 6 def formula # ... else return term end end 

The conjunction method therefore sets the internal result variable of the conjunction method to the false value

 1 2 3 4 def conjunction result = formula # ... end 

And then it checks if the new current token is the conjunction operator token

 1 2 3 4 5 6 7 def conjunction # ... token = @current_token if token.type == :AND # ... end 

It is, so it moves the current token point to the next token in the stream (now the current token is the T true token) and updates its internal result variable to a conjunction node where the left hand operand is the previous result value (aka the false value) and the right hand operand is a call to expression

 1 2 3 4 5 6 7 def conjunction # ... if token.type == :AND eat(:AND) result = AST::Conjunction.new(result, expression) end end 

So we have now called the expression method and the current token is now pointing to the the T token. The parser will make its way through the various methods until it gets back to the term method, which will return the true value node and move the current token pointer to the v disjunction operator token. In the process of finishing the evaluation of this second call to expression (the call originating from the conjunction method), the disjunction method will be called and it will check if the current token is the disjunction operator. And since it is, it will create a new disjunction node and call expression itself with the new current token set to the final T token. This third call to expression (originating from the disjunction method) will return the true value node so that the right hand side of the disjunction node is that true value. And then the result of that call to disjunction will be returned all the way up to the second call to expression. The right hand side of the conjunction node will therefore be this disjunction node, and with that our parser will finish. It will output an conjunction node where the left hand side is a false value and the right hand side is a disjunction. This is the exact same (faulty) result that our parser had at the end of the last post.

If we call expression to evaluate the right hand side of our binary operations, our parser will end up continuing to parse the order of operations simply based on left-to-right appearance order. In order to ensure the actual order of operations, we need a way to “cap” how high back up the hierarchy methods our parser can go. To put it another way, we want to ensure that if our parser finds a conjunction as the first operation it hits, it guarantees that the right hand side of that conjunction can only be either a Boolean value or a negation operation (since negation has a higher precedence than conjunction); the right hand side of a conjunction cannot be a disjunction (unless of course we use parentheses to explicitly group our expression that way, but we have already solved that problem). In order to achieve this result, we simply need to change the methods that are called for the right hand side of a binary operation to be calls back to that same method. Since our method calls are nested, this will allow the right hand side to be evaluated as any operation or value that has either the same precedence as lower.

I know that this was a pretty large “detour”, but it took me a while to feel like I fully understood why our grammar (and thus our parser) needed to have this shape to properly encode our operator precedence. So, our final grammar is:

 1 2 3 4 5 expression : disjunction (IFSO expression)* disjunction :: conjunction (OR disjunction)* conjunction :: formula (AND conjunction)* formula :: (NOT)* formula | LPAREN expression RPAREN | term term :: TRUE | FALSE 

And our final parse rule methods are:

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 class Parser # ... # expression :: disjunction (IFSO expression)* def expression result = disjunction token = @current_token if token.type == :IFSO eat(:IFSO) result = AST::Implication.new(result, expression) end result end # disjunction :: conjunction (OR disjunction)* def disjunction result = conjunction token = @current_token if token.type == :OR eat(:OR) result = AST::Disjunction.new(result, disjunction) end result end # conjunction :: formula (AND conjunction)* def conjunction result = formula token = @current_token if token.type == :AND eat(:AND) result = AST::Conjunction.new(result, conjunction) end result end # formula :: (NOT)* formula | LPAREN expression RPAREN | term # remains the same # term :: TRUE | FALSE # remains the same end 

We can now update our non-grouped binary expression tests:

 1 2 3 4 5 def run_tests # ... assert_interpret_equals('F & T v T', true) assert_interpret_equals('~F & T v T', true) end 

### Wrapping Up

We have now successfully implemented an interpreter that handles the full suite of valid expressions in classical propositional logic:

• it handles stacked negation operators (e.g. ~~T),
• it handles parentheses to group sub-expressions (e.g. ~(T v F) & T),
• it handles multiple binary operators used in one expression (e.g. T & F v T), and
• it handles the proper operator precedence of the logical operators

Some of these bits of functionality were easier to implement than others, but we learned a lot more about our parser and the grammar needed to describe a language with the added complexity of robust classical logic.

In the next post, I want us to expand our interpreter to allow for variables in our expressions, like P & Q. That should be fun.

You can find the script we have built to this point in this revision of this Gist

1. I say “classical propositional logic” because modern propositional logic has many more valid operators. But this is an addition we will get to in the next post.