andersch.dev

<2022-05-19 Thu>

LR parser

An LR parser reads the characters of a text from beginning to end, i.e. left to right (LR), without ever backtracking. During this process, it groups the characters together to tokens using a lexer function. The tokens themselves are then grouped together to subtrees, which are stored on a stack. For every new token it decides what to do with the stack by consulting the parse table.

Example for LR-parsing a simple text

  1. Parser starts out at beginning of string. Stack is empty.

    Text:  x * y + z
    Pos:   ^
    Stack:
    
  2. Parser sees first character and recognizes it as a variable. Pushes [var] on the stack.

    Text:  x * y + z
    Pos:   ^
    Stack: [var]
    
  3. Same with the next 2 tokens.

    Text:  x * y + z
    Pos:        ^
    Stack: [var][*][var]
    
  4. Parser table for + specifies a reduction. This means the last 3 tokens get popped off the stack and get grouped together as a new parent subtree on the stack. Then the + and the variable z can be pushed on.

    Text:  x * y + z
    Pos:           ^
    Stack: [(product var * var)][+][var]
    
  5. Reaching the end of the text means another reduction. This creates the root node of the final syntax tree.

    Text:  x * y + z
    Pos:           ^
    Stack: [(sum (product var * var) + var)]
    

Problems with LR parsing

Take the Javascript code x = (y);. Here x is assigned to y. On the other hand, take x = (y) => z;. Here x is assigned to a function that takes y as a parameter.

Without looking ahead or guessing and then checking, an LR parser cannot know which one it's gonna be. One way to solve this problem is to use generalized LR parsing - or GLR parsing.

Resources