Stack graphs

Last updated: 2020-05-19

Scope graphs provide a single framework for performing name resolution, while abstracting away the specific name resolution rules for each of the languages that we want to support. The basic idea is to represent the definitions and references in a program using graph. A binding maps a reference to all of the possible definitions that the reference could refer to. Because we’ve represented definitions and references as a graph, bindings are represented by paths within that graph.

Unfortunately, scope graphs are not incremental, at least as presented in the original papers. As one example, Scopes as Types presents some rewrite rules that let you handle field access in a record or object type — for instance, being able to resolve the foo part of A.foo. In Scopes as Types, we’d handle this by performing the path-finding algorithm when constructing the graph, to find out what A resolves to. Once we find its definition, we then attach the reference to foo to that result.

This is not incremental because the reference to foo “lives” over in some unrelated part of the graph. If A is defined in a different file (or worse, a different package), then we have to update that distant part of the graph with the node representing the reference. We end up cluttering the graph for a class definition with nodes representing all of its field references, which is especially problematic for popular packages with lots of downstream dependencies.

Our key insight is to recognize that when resolving A.foo, we have to “pause” the resolution of foo, start a resolution of A, and once we’ve found the binding for A, resume the original resolution of foo. This describes a stack! So instead of having our path-finding algorithm look for exactly one symbol, we can keep track of a stack of things to look for. To correctly resolve foo, we do still have to eventually make our way over to the part of the graph where A is defined. But by having a stack of path-finding targets, we can resolve the reference to A.foo with a single execution of the path-finding algorithm. And most importantly, each “chunk” of the overall graph only depends on “local” information from the original source file. (a.k.a., it’s incremental!)

Because of this focus on a stack of things to look for, we’re calling our modification of scope graphs stack graphs. There’s enough different between the two frameworks (especially in terms of the “gadgets” of graph that you create for a particular chunk of code) that we think it’s worth having distinct names for them.

Incremental stack graphs

While searching for a path in an incremental stack graph, we keep track of two stacks: a symbol stack and a scope stack. Broadly speaking, the symbol stack keeps track of what symbols we’re trying to resolve, while the scope stack gives us control over which particular scopes we look for those symbols in.

There are several different kinds of node that can appear in a stack graph; as we search for a path representing a binding, each kind of node has different rules for how it interacts with the two stacks.

Scopes

Root scope

This is a singleton node — there is exactly one of them in the entire stack graph. When we start to compute stack graphs incrementally, the root node in each subgraph represents a single root node: this is the node where the subgraphs are linked together into a single large overall graph.

Internal scope

A scope that is needed to represent the name binding structure of the content of the current file, but that we don’t need to refer to by name in other parts of the stack graph.

Exported scope

A scope that can appear on the scope stack. Used to represent, for example, the parameters passed into a function call.

Symbol stack nodes

The symbol stack keeps track of all of the things that we are currently looking for. These will typically be symbols that come from the source code: when the user clicks on a reference, we push the reference’s symbol onto the stack so that we can search for the corresponding definition (which will be represented by a node that pops that symbol off the stack, leaving it empty). However, we will sometimes need to search for other things. For instance, we’ll handle field access (like A.foo) by searching for a fake symbol called ..

Push symbol

Colored green. Pushes the content of the node onto the symbol stack, indicating a new thing that we have to start searching for. If the node is dark green with a solid border (sometimes called a “reference node”), then this is a “real” reference that the user can click on and that exists in the syntax tree of the source file. If the node is light green with a dashed border (sometimes called an “internal push node”), then we’re using this node to push something onto the stack, but it doesn’t represent a real reference in the source code. (Note that fake symbols like . will only appear in internal push nodes, since by definition those can’t come from the user’s source code.)

Pop symbol

Colored red. Pops the content of the node off of the symbol stack. (If the symbol stack is empty, or if the top of it is something else, then the current potential path is not valid.). If the node is dark red with a solid thick border (sometimes called a “definition node”), then this is a “real” definition that exists in the syntax tree of the source files, and that could be the target of a jump-to-def binding. If the node is light red with a dashed/dotted border (sometimes called an “internal pop node”), then we’re using this node to pop something off of the stack, but it doesn’t represent a real definition in the source code. (Note that fake symbols like . will only appear in internal pop nodes, since by definition those can’t come from the user’s source code.)

Scope stack nodes

The scope stack gives us the ability to change the part of the stack graph where we’re currently looking for things. This is used for things like function calls, where we need to package up a context (containing the actual parameters) and send it over to another part of the source code. (The definition of function doesn’t know in advance which parameters will be passed in — and can receive different parameters for different calls.)

Jump to scope

Pops a scope reference from the top of the scope stack and jumps to it. We resume the path-finding algorithm from the popped scope’s node.

Drop all scopes

Removes all scope references from the scope stack, ignoring all of them.

Hybrid nodes

We’ve talked about how we can do things with the scopes on the scope stack, but we haven’t talked about to put things onto the scope stack. The nodes in this section interact with both stacks.

Push scoped symbol

Just like the push symbol nodes, but for scoped symbols. A scoped symbol is a symbol with a list of scopes attached to it.

These nodes create a new scoped symbol, and pushes it onto the symbol stack. The scoped symbol’s scope list is the current contents of the scope stack, prepended with the scope mentioned in the push node.

Pop scoped symbol

Just like the pop symbol nodes, but for scoped symbols. If the symbol stack is empty, the current potential path is not valid. If the top of the symbol stack is not a scoped symbol, or if the symbol’s content doesn’t match what’s in the node, the current potential path is not valid.

These nodes pop off the top of the symbol stack, and replace the contents of the scope stack with the scoped symbol’s scope list.

Graph construction patterns

This section describes several patterns for constructing the subgraph that corresponds to a particular snippet of code.

Python

Module

# a/b.py defines module "a.b"
...

Import
import a.b

Import star
from a.b import *

Import symbol
from a.b import foo

Import renamed symbol
from a.b import foo as bar

Top-level assignment
foo = expr

Last item in file

Class

class A:
    ...

Class field
class A:
    foo = expr

Instance method
class A:
    def foo(...):
        ...

Formal parameter
class A:
    def ...(..., foo, ...):
        ...

Expressions

Symbol reference
foo

Field access

expr.foo

Function call

foo(...)

Positional actual parameter

...(..., expr, ...)

Keyword actual parameter

...(..., foo=expr, ...)

Larger examples

Using the patterns described above, we can construct the full stack graph for several interesting programs:

Python

Overlapping import statements

# main.py₀
import a
import b
print(a.foo)
# a.py₅
foo = 1
# b.py₇
foo = 2

Overlapping import-star statements

# main.py₀
from a import *
from b import *
print(foo)
# a.py₄
foo = 1
# b.py₆
foo = 2

Sequenced import-star statements

# main.py₀
from a import *
print(foo)
# a.py₃
from b import *
# b.py₅
foo = 2

# main.py₀
class A:
    foo = 1

print(A.foo)

# main.py₀
bar = 1

class A:
    foo = bar

print(A.foo)

# main.py₀
class A:
    bar = 1

def foo(x):
    return A

print(foo(A).bar)

# main.py₀
class A:
    bar = 1

def foo(x):
    return x

print(foo(A).bar)

Paths represent name bindings

With this set of rules for constructing stack graphs, bindings between references and definitions are represented by paths, where each edge leaves the symbol and scope stacks in a well-defined state, and where both stacks are empty at the beginning and end of the path.

For instance, in the “sequenced import star” example:

# main.py₀
from a import *
print(foo)
# a.py₃
from b import *
# b.py₅
foo = 2

We can resolve the foo2 reference in main.py to the definition of foo6 in b.py because the following well-formed path exists:

Let’s formalize it!

Specification text can be dense, so throughout we’ll use commentary like this to explain what’s going on.

Definitions

A symbol is a string of characters that represents a portion of source code.

A stack graph is a graph representing the name binding structure of a program. It consists of a list of nodes and a list of edges.

A node is any of the following:

A push symbol node is also called a reference node if it has a location.

A pop symbol node is also called a definition node if it has a location.

In the graphical notation, we make a clear visual distinction between “exported” and “internal” push and pop nodes. The only difference between an exported and internal node is whether it represents a reference or definition that the user can click on. Therefore, in this formalization, we don’t have separate node types for exported vs internal nodes; instead, the location field of a push or pop node is optional. If it’s present, then it’s a real reference or definition that the user can click on; if it’s missing, then it’s not!

An edge consists of a source node, a sink node, and a label.

The source node and sink node of every edge in a stack graph must be one of the nodes in the stack graph.

There is no edge in a stack graph whose source node is a jump to scope node.

A scoped symbol consists of a symbol and an optional nonempty list of scope identifiers.

A path consists of:

The source node of each edge in a path must be the same as the sink node of the previous edge.

If a path’s edges list is empty, then its starting node must be the same as its ending node.

If a path’s edges list is nonempty, then the starting node of the path must be the same as the source node of the first edge in the path, and the ending node of the path must be the same as the sink node of the last edge in the path.

In the formalization, every element of a symbol stack is called a “scoped symbol”, even if there’s no attached scope stack. For “non-scoped” symbols, the scope identifiers is just empty.

A path is complete if all of the following are true:

  1. Its starting node is a reference node.

  2. Its ending node is a definition node.

  3. Its symbol stack and scope stack are empty.

A complete path is what represents a full name binding that resolves a reference to a definition.

Algorithm 1: Appending an edge to a path

This algorithm looks long, but that’s just because there are a bunch of different node types that we have to handle. Our goal is to add a new edge to a path, while ensuring that all of the “rules” are followed. (For instance, you can’t enter a pop node if the top of the symbol stack doesn’t match.)

We can attempt to append an edge to a path as follows:

  1. If the source node of edge is not the same as the ending node of path, then abort.

  2. Let node be the sink node of edge.

  3. If node is a push symbol node, then:

    1. Let scoped symbol be a new scoped symbol whose:

      • symbol is node’s symbol
      • scope identifier list is missing
    2. Push scoped symbol onto the beginning of the path’s symbol stack.

  4. If node is a push scoped symbol node, then:

    1. Let scoped symbol be a new scoped symbol whose:

      • symbol is node’s symbol
      • scope identifier list is a copy of path’s scope stack prepended with node’s scope identifier
    2. Push scoped symbol onto the beginning of the path’s symbol stack.

  5. If node is a pop symbol node, then:

    1. If path’s symbol stack is empty, then abort.

    2. Let top be the first element of path’s symbol stack.

    3. If top’s symbol is not equal to node’s symbol, then abort.

    4. If top’s scope identifier list is not missing, then abort.

    5. Otherwise, pop top off the beginning of path’s symbol stack.

  6. If node is a pop scoped symbol node, then:

    1. If path’s symbol stack is empty, then abort.

    2. Let top be the first element of path’s symbol stack.

    3. If top’s symbol is not equal to node’s symbol, then abort.

    4. If top’s scope identifier list is missing, then abort.

    5. Otherwise, pop top off the beginning of path’s symbol stack.

    6. Replace path’s scope stack with top’s scope identifier list.

  7. If node is a drop scopes node, then:

    1. Clear path’s scope stack.
  8. Append edge to path’s edges list.

  9. Set path’s ending node to be node.

Algorithm 2: Resolving a jump to scope node in a path

If a path ends with a jump to scope node, then we have to actually perform the jump! There had better be a scope on the scope stack for us to jump to…

We can attempt to resolve a path as follows:

  1. Let node be the ending node of path.

  2. If node is a jump to scope node:

    1. If path’s scope stack is empty, then the jump to scope node is invalid. Abort.

    2. Pop scope identifier from the beginning of path’s scope stack.

    3. Let resolved node be a new exported scope node whose scope identifier is scope identifier.

    4. Let jump edge be a new edge whose:

      • source node is node
      • sink node is resolved node
      • label is jump
    5. Append jump edge to path’s edges list.

    6. Set path’s ending node to be resolved node.

Algorithm 3: Finding name binding paths

a.k.a. “the path-finding algorithm”

This is a fixed-point algorithm — we start with a set of initial paths, and keep trying to grow each one for as long as we can. Whenever we encounter a complete path during this process, we stash it away as one of our final results.

Given a stack graph, we can find the set of name binding paths for references, which is a list of (some of the) reference nodes in stack graph, as follows:

  1. Let current paths and complete paths be empty sets of paths.

  2. For each reference in references:

    1. Let initial path be a new path whose:

      • starting node and ending node are both reference
      • edges list is empty
      • symbol stack contains reference’s symbol as a single element
      • scope stack is empty
    2. Append initial path to current paths.

  3. While current paths is not empty:

    1. Remove an arbitrary path from current paths.

    2. Let extensions be all of the edges in stack graph whose source node is the same as path’s ending node.

    3. For each edge in extensions:

      1. If edge already appears in path’s edges list, skip it. [TODO: Investigate a better cycle detection heuristic.]

      2. Let new path be a copy of path.

      3. Attempt to append edge to new path. If unsuccessful, skip this edge.

      4. Attempt to resolve new path. If unsuccessful, skip this edge.

      5. If new path is complete, append it to complete paths.

      6. Append new path to current paths.

  4. Return complete paths.

Example name binding path

Given the above definitions and algorithms, we can find the name binding path from foo2 to foo6 with the following sequence of paths:

Starting node Ending node Symbol stack Scope stack
foo2 foo2 foo  
foo2 .1 . foo  
foo2 a1 a . foo  
foo2 a3 . foo  
foo2 .3 foo  
foo2 .4 . foo  
foo2 b4 b . foo  
foo2 b5 . foo  
foo2 .5 foo  
foo2 foo6    

The final path is complete, and is therefore represents a valid name binding in the source program.

Partial paths are incremental

The stack graphs presented here are incremental, since we can produce the subgraph for each file without having to look at the contents of any other file in the repo, or of any upstream or downstream dependencies.

For instance, revisiting the “sequenced import star” example again, we can create separate subgraphs for each file:

# main.py₀
from a import *
print(foo)

# a.py₃
from b import *

# b.py₅
foo = 2

This is great, because it means that when we receive a new commit for a repository, we only have to examine, and generate new stack subgraphs for, the files that are changed as part of that commit. We need to keep track of which per-file subgraphs belong to each commit, just like git has to do for the raw file content.

Having done that, one possible way to find name binding paths would be to load in all of the subgraphs that belong to the repo’s most recent commit, union them together into the combined graph for the commit, and run the path-finding algorithm described above. However, we think that this will require too much computation at query time.

Instead, we want to precompute parts of the path-finding algorithm, by calculating partial paths for each file. Because stack graphs have limited places where a path can cross from one file into another, we can calculate all of the possible partial paths that reach those “import/export” points. As we will see later, there is an efficient way to query for “compatible” partial paths, and to concatenate them together, producing the original “full” paths that represent our name bindings.

Definitions

A file is a reference to a source code file.

Every node in a stack graph except for the root node belongs to a file. The root node belongs to every file.

A partial path consists of:

Every scope identifier that appears in a partial path’s symbol stack postcondition or scope stack postcondition must appear somewhere in its symbol stack precondition or scope stack precondition.

Each partial path belongs to a single file. The source node and sink node of all of the edges in the partial path belong to this file.

The main difference between a path and a partial path is that the latter can be used “in the middle of” on existing path. That means we have to keep track of the “precondition” of the partial path — that is, what needs to be true of the current symbol and scope stacks when we “paused” the path-finding algorithm, for this partial path to represent a valid sequence of next steps? As one example, the partial path might start with a pop node. In a regular path, that would be invalid — you can’t start with a pop node, since there’s nothing on the symbol stack to pop off! However, in a partial path, that’s perfectly fine! We just have to make sure that the “incoming” symbol stack starts with the correct symbol, which the partial path will try to immediately pop off.

A precondition symbol consists of:

A postcondition symbol consists of:

An attached scope list consists of:

An attached scope list is empty if its list of scope identifiers is empty and its scope list variable is missing.

A partial path is as complete as possible if all of the following are true:

  1. Its starting node is the root node, an exported scope node, or a reference node:

  2. Its ending node is the root node, a jump to scope node, or a definition node.

  3. If its starting node is a reference node, then its symbol stack precondition is empty and its scope stack precondition is missing.

  4. If its ending node is a definition node, then its symbol stack postcondition is empty and its scope stack postcondition is empty.

In our regular path-finding algorithm, we were able to keep growing each path until we ran out of eligible edges, because the full scope graph for the entire program was available to us. When we try to find patrial paths, that won’t be true — we need to grow the partial path “as much as we can” with the set of nodes and edges available in the current file.

A partial path is productive if any of the following are true:

  1. Its starting node and ending node are different.

  2. Its symbol stack precondition and symbol stack postcondition are different.

  3. Its scope stack precondition and scope stack postcondition are different.

This bit isn’t fully fleshed out yet, to be honest, but it will be used to ensure that we’re not getting into cycles when stitching partial paths together.

Algorithm 4: Appending an edge to a partial path

This is very much like algorithm 1. The main difference is that we have to build up the preconditions for our partial paths as we’re “growing” them. For instance, when growing a path, trying to enter a pop node with an empty symbol stack is an error. When growing a partial path, that adds a new element to the partial path’s precondition.

We can attempt to append an edge to a partial path as follows:

  1. If the source node of edge is not the same as the ending node of path, then abort.

  2. Let node be the sink node of edge.

  3. If node is a push symbol node, then:

    1. Let postcondition symbol be a new postcondition symbol whose:

      • symbol is node’s symbol
      • attached scope list is missing
    2. Push postcondition symbol onto the beginning of the path’s symbol stack postcondition.

  4. If node is a push scoped symbol node, then:

    1. Let attached scope list be a copy of path’s scope stack postcondition.

    2. Prepend node’s scope identifier to attached scope list’s list of scope identifiers.

    3. Let postcondition symbol be a new postcondition symbol whose:

      • symbol is node’s symbol
      • attached scope list is attached scope list
    4. Push postcondition symbol onto the beginning of the path’s symbol stack postcondition.

  5. If node is a pop symbol node, then:

    1. If path’s symbol stack postcondition is empty, then:

      1. Let precondition symbol be a new precondition symbol whose:

        • symbol is node’s symbol
        • scope list variable is missing
      2. Push precondition symbol onto the end of path’s symbol stack precondition.

    2. Otherwise:

      1. Let top be the first element of path’s symbol stack postcondition.

      2. If top’s symbol is not equal to node’s symbol, then abort.

      3. If top’s attached scope list is not missing, then abort.

      4. Otherwise, pop top off the beginning of path’s symbol stack postcondition.

  6. If node is a pop scoped symbol node, then:

    1. If path’s symbol stack postcondition is empty, then:

      1. Let precondition symbol be a new precondition symbol whose:

        • symbol is node’s symbol
        • scope list variable is a fresh scope list variable not already present in path
      2. Push precondition symbol onto the end of path’s symbol stack precondition.

      3. Replace path’s scope stack postcondition with an attached scope list whose list of scope identifiers is empty, and whose scope list variable is precondition symbol’s scope list variable.

    2. Otherwise:

      1. Let top be the first element of path’s symbol stack postcondition.

      2. If top’s symbol is not equal to node’s symbol, then abort.

      3. If top’s attached scope list is missing, then abort.

      4. Otherwise, pop top off the beginning of path’s symbol stack postcondition.

      5. Replace path’s scope stack postcondition with top’s attached scope list.

  7. If node is a drop scopes node, then:

    1. Clear path’s scope stack postcondition.
  8. Append edge to path’s edges list.

  9. Set path’s ending node to be node.

Algorithm 5: Resolving a jump to scope node in a partial path

This is very similar to algorithm 2. The main difference is that we might not know in advance which scope we’re going to jump to. (That can happen when the partial path has a scope stack precondition — if the scope stack needs to be set up correctly before entering the partial path, then the scope that we jump to will be different depending on what other paths we stitch this partial path together with.)

We can attempt to resolve a partial path as follows:

  1. Let node be the ending node of path.

  2. If node is a jump to scope node:

    1. Let postcondition be path’s scope stack postcondition.

    2. If postcondition is empty, then the jump to scope node is invalid. Abort.

    3. If postcondition’s list of scope identifiers is empty, then we don’t have enough information to resolve the jump to scope node yet. Return.

    4. Otherwise, pop scope identifier from the beginning of postcondition’s list of scope identifiers.

    5. Let resolved node be a new exported scope node whose scope identifier is scope identifier.

    6. Let jump edge be a new edge whose:

      • source node is node
      • sink node is resolved node
      • label is jump
    7. Append jump edge to path’s edges list.

    8. Set path’s ending node to be resolved node.

Algorithm 6: Finding partial paths in a file

And this is almost a carbon-copy of algorithm 3; we just use our new partial-path-related definitions of appending and resolving.

We can find the set of partial paths for a stack subgraph as follows:

  1. Let current paths and complete paths be empty lists of partial paths.

  2. For each node in stack subgraph that is the root node, a reference node, or an exported scope node whose scope identifier is the scope identifier of some push scoped symbol node in stack subgraph:

    1. Let initial scope stack be a fresh scope list variable.

    2. Let initial path be a new partial path whose:

      • starting node and ending node are both node
      • edges list is empty
      • symbol stack precondition is empty
      • symbol stack postcondition is empty
      • scope stack precondition is initial scope stack
      • scope stack postcondition is initial scope stack
    3. If node is a reference node, then:

      1. Let initial symbol be a new postcondition symbol whose:

        • symbol is node’s symbol
        • attached scope list is empty
      2. Append initial symbol to initial path’s symbol stack postcondition.

      3. Clear initial path’s scope stack precondition and scope stack postcondition.

    4. Append initial path to current paths.

  3. While current paths is not empty:

    1. Remove an arbitrary path from current paths.

    2. Let extensions be all of the edges in stack graph whose source node is the same as path’s ending node.

    3. For each edge in extensions:

      1. If edge already appears in path’s edges list, skip it. [TODO: Investigate a better cycle detection heuristic.]

      2. Let new path be a copy of path.

      3. Attempt to append edge to new path. If unsuccessful, skip this edge.

      4. Attempt to resolve new path. If unsuccessful, skip this edge.

      5. If new path is as complete as possible and productive, append it to complete paths.

      6. Append new path to current paths.

  4. Return complete paths.

Stitching partial paths together

Having described how to generate partial paths for a stack graph, we can now talk about what we want to do with them. Our core operation is that we can concatenate partial paths together. Our conjecture is that for any complete path in the original stach graph (which included nodes and edges for all of the files that we care about), there is some sequence of partial paths (each of which comes from exactly one file) that we can concatenate together to yield the complete path. Moreover, as we are building up a potential path, we need it to be easy to query our database to find “eligible” partial paths that we can concatenate on next. If all of that is true, then we have an efficient incremental version of our original path-finding algorithm.

Definitions

A path (left) is compatible with a partial path (right) if all of the following are true:

  1. Either of the following are true:

    1. The ending node of left and the starting node of right are both the root node.

    2. The ending node of left and the starting node of right are both exported scope nodes, whose scope identifiers are the same.

  2. The symbol stack of left satisfies the symbol stack precondition of right.

  3. The scope stack of left satisfies the scope stack precondition of right.

A symbol stack satisfies a symbol stack precondition if all of the following are true:

  1. The length of symbol stack is ≥ the length of symbol stack precondition.

  2. For each precondition symbol in symbol stack precondition, and corresponding scoped symbol in symbol stack, all of the following are true:

    1. The precondition symbol’s symbol and the scoped symbol’s symbol are the same.

    2. The precondition symbol’s scope list variable and the scoped symbol’s list of scope identifiers are either both missing or both present.

A scope stack satisfies a scope stack precondition if the following is true:

  1. If the scope stack precondition’s scope list variable is missing, then the scope stack is empty.

    or, equivalently

    If the scope stack is not empty, then the scope stack precondition’s scope list variable is not missing.

These three definitions — compatibility, symbol stack satisfaction, and scope stack satisfaction — are probably the most important parts of this specification, because they translate into efficient database queries if you happen to store your pile of partial paths in a relational database!

Algorithm 7: Promoting a partial path to a path

We can attempt to promote a partial path to a path as follows:

  1. If partial path’s symbol stack precondition is not empty, abort.

  2. If partial path’s scope stack precondition is not missing, abort.

    Note: Because of the above two checks, we know that there are no scope identifiers in partial path’s precondition. That means that there cannot be any scope identifiers in its postcondition, either.

  3. Let symbol stack be a new empty list of scoped symbols.

  4. For each postcondition symbol in partial path’s symbol stack postcondition:

    1. Let scoped symbol be a new scoped symbol whose:

      • symbol is postcondition symbol’s symbol
      • list of scope identifiers is missing
    2. If postcondition symbol’s attached scope list is present, set scoped symbol’s list of scope identifiers to be the same as attached scope list’s list of scope identifiers.

    3. Append scoped symbol to symbol stack.

  5. Let path be a new path whose:

    • starting node is partial path’s starting node
    • ending node is partial path’s ending node
    • list of edges is partial path’s list of edges
    • symbol stack is symbol stack
    • scope stack is partial path’s scope stack postcondition’s list of scope identifiers
  6. Return path.

Algorithm 8: Concatenating a partial path onto a path

This is admittedly a bit dense. Conceptually, it’s not that bad — we just want to append the edges from right onto the edges of left, giving us a new path. However, we also have to make sure that the partial path’s precondition is satisfied. And because the precondition can contain scope list variables, we also have to figure out what parts of the incoming symbol and scope stacks those variables “capture”. And then we have to construct the new symbol and scope stacks using those bindings. Which…takes a few steps to plumb everything through.

We can attempt to concatenate a path left and a partial path right as follows:

  1. If left is not compatible with right, abort.

  2. Let bindings be a new empty mapping from scope list variables to lists of scope identifiers.

  3. Let new symbol stack be a copy of left’s symbol stack.

  4. For each precondition symbol in right’s symbol stack precondition, and corresponding scoped symbol in left’s symbol stack:

    1. Remove scoped symbol from new symbol stack.

    2. If precondition symbol’s scope list variable is not missing, then:

      1. Let scope list variable be precondition symbol’s scope list variable.

      2. Let scope identifiers be scoped symbol’s list of scope identifiers.

        (We know that this field must be present because we’ve already checked that left and right are compatible.)

      3. Add a mapping from scope list variable to scope identifers to bindings.

        (We know that there can’t already be an entry for scope list variable because scope list variables are required to be unique across the entirety of a partial path.)

  5. If right’s scope stack precondition’s scope list variable is not missing, then:

    1. Let scope list variable be right’s scope stack precondition’s scope list variable.

    2. Let scope identifiers be left’s scope stack.

    3. Add a mapping from scope list variable to scope identifers to bindings.

      (We know that there can’t already be an entry for scope list variable because scope list variables are required to be unique across the entirety of a partial path.)

  6. For each postcondition symbol in right’s symbol stack postcondition, in reverse order:

    1. Let scoped symbol be a new scoped symbol whose:

      • symbol is postcondition symbol’s symbol
      • list of scope identifiers is empty
    2. If postcondition symbol’s attached scope list is not missing:

      1. Let attached scope list be postcondition symbol’s attached scope list.

      2. Let scope identifiers be a copy of attached scope list’s list of scope identifiers.

      3. If attached scope list’s scope list variable is not missing:

        1. Let bound scope identifiers be the result of looking up attached scope list’s scope list variable in bindings.

          We know this entry must be present because in a well-formed partial path, any scope list variable mentioned in the postcondition must also be present in the precondition.

        2. Append bound scope identifiers to the end of scope identifiers.

      4. Set scoped symbol’s list of scope identifiers to be scope identifiers.

    3. Prepend scoped symbol to the beginning of new symbol stack.

  7. With scope stack postcondition as right’s scope stack postcondition, do the following:

    1. Let new scope stack be a copy of scope stack postcondition’s list of scope identifiers.

    2. If scope stack postcondition’s scope list variable is not missing:

      1. Let bound scope identifiers be the result of looking up scope stack postcondition’s scope list variable in bindings.

        We know this entry must be present because in a well-formed partial path, any scope list variable mentioned in the postcondition must also be present in the precondition.

      2. Append bound scope identifiers to the end of new scope stack.

  8. Let new path be a new path whose:

    • starting node is left’s starting node
    • ending node is right’s ending node
    • list of edges is the concatenation of left’s list of edges and right’s list of edges
    • symbol stack is new symbol stack
    • scope stack is new scope stack

Algorithm 9: Finding name binding paths using partial paths

Given a set of files and a set of reference nodes, we can find the set of paths that define which definitions each of those references map to:

For the in-repository case, files will be all of the files of the current repo at the current commit. For cross-repository, files will contain that, as well as all of the files in any upstream dependency’s repository (at the most recent commit that satisfies the current repo’s package manifest).

  1. Let database be the set of partial paths that belong to any file in files.

  2. Let current paths and complete paths be new empty sets of paths.

  3. For each partial path in database whose starting node is an element of reference nodes:

    1. Try to promote partial path into an initial path. If unsuccessful, skip this partial path.

    2. Add initial path to current paths.

  4. While current paths is not empty:

    1. Remove an arbitrary path from current paths.

    2. Let extensions be the set of partial paths in database that are compatible with path.

    3. For each partial path in extensions:

      1. [TODO: We need a cycle detection heuristic here too.]

      2. Let new path be the result of attempting to concatenate partial path onto path. If unsuccessful, skip this partial path.

      3. Attempt to resolve new path. If unsuccessful, skip this partial path.

      4. If new path is complete, append it to complete paths.

      5. Append new path to current paths.

  5. Return complete paths.

Find all references algorithm

Do the same thing as above, but you know, backwards. [TODO: Do this for real — we’ll need equivalents for algorithms 8 and 9, I think.]

More examples

Example paths

Given the above, we can find all of the partial paths in our three example files:

# main.py₀
from a import *
print(foo)

Symbols

Symbol Location Type
__main__0 null Def
a1 1,5 - 1,6 Ref
foo2 2,6 - 2,9 Ref

The definition for the file’s module has a null location because it’s the file as a whole that defines the module; the textual name of the module doesn’t actually appear anywhere in the file.

Partial paths

Symbols Scopes From Edges To Scopes Symbols
    a1 P root   a
    foo2 P×5 root   a . foo
__main__ σ root P __main__0 σ  
__main__ . σ root P×8 root σ a .

Note that when a partial path starts at a reference, or ends at a definition, what we’re storing is a reference to the corresponding symbol entry, not the text of the symbol.

# a.py₃
from b import *

Symbols

Symbol Location Type
a3 null Def
b4 1,5 - 1,6 Ref

Partial paths

Symbols Scopes From Edges To Scopes Symbols
    b4 P root   b
a σ root P a3 σ  
a . σ root P×7 root σ b .
# b.py₅
foo = 2

Symbols

Symbol Location Type
b5 null Def
foo6 1,0 - 1,3 Def

Partial paths

Symbols Scopes From Edges To Scopes Symbols
b σ root P b5 σ  
b . foo σ root PPPPP foo6 σ  

Turning the crank

  1. User clicks on foo2 in main.py. Load in all of the partial paths that start with that reference, and promote them to paths. The current paths set starts off as:

    # From Edges To Scopes Symbols
    1 foo2 P×5 root   a . foo
  2. Path 1 has two compatible partial paths in a.py:

    # Symbols Scopes From Edges To Scopes Symbols
    1a a σ root P a3 σ  
    1b a . σ root P×7 root σ b .

    Concatenating path 1 with partial path 1a results in an invalid path (it ends in a definition but doesn’t have an empty ending symbol stack), so we discard it.

    After concatenating path 1 with the partial path 1b, the current paths list at the end of this iteration is:

    # From Edges To Scopes Symbols
    2 foo2 P×12 root   b . foo

    In the binding set that we create while concatenating, σ is mapped to an empty scope stack.

  3. Path 2 has two compatible partial paths in b.py:

    # Symbols Scopes From Edges To Scopes Symbols
    2a b σ root P b5 σ  
    2b b . foo σ root P×5 foo6 σ  

    Concatenating path 2 with partial path 2a results in an invalid path (it ends in a definition but doesn’t have an empty ending symbol stack), so we discard it.

    After concatenating path 2 with partial path 2b, the current paths list at the end of this iteration is:

    # From Edges To Scopes Symbols
    3 foo2 P×17 foo6    

    In the binding set that we create while concatenating, σ is mapped to an empty scope stack.

  4. All of the current paths are valid and complete, and so we have our set of resolved definitions.

A more complex example

We can split the “print the value of a class field through a function parameter” example into separate files:

# main.py₀
from a import *
from b import *
print(foo(A).bar)

Symbols

Symbol Location Type
__main__0 null Def
a1 1,5 - 1,6 Ref
b2 2,5 - 2,6 Ref
foo3 3,6 - 3,9 Ref
A4 3,10 - 3,11 Ref
bar6 3,13 - 3,16 Ref

Partial paths

Symbols Scopes From Edges To Scopes Symbols
    a1 P root   a
    b2 P root   b
    foo3 P×5 root   b . foo
    foo3 P×6 root   a . foo
    A4 P×5 root   b . A
    A4 P×6 root   a . A
    bar6 P×8 root   b . foo ()/5 . bar
    bar6 P×9 root   a . foo ()/5 . bar
0 σ (5) P×7 root σ b . A
0 σ (5) P×8 root σ a . A
__main__ σ root P __main__0 σ  
__main__ . σ root P×8 root σ b .
__main__ . σ root P×9 root σ a .
# a.py₇
def foo(x):
    return x₁₀

Symbols

Symbol Location Type
a7 null Def
foo8 1,4 - 1,7 Def
x9D 1,8 - 1,9 Def
x10 2,11 - 2,12 Ref

Partial paths

Symbols Scopes From Edges To Scopes Symbols
    x10 P×4 x9D    
a σ root P a7 σ  
a . foo σ root P×5 foo8 σ  
a . foo ()1 σ root P×13 (•) σ1 x
a . foo ()1 σ root P×13 (•) σ1 0
# b.py₁₁
class A₁₂:
    bar₁₃ = 1

Symbols

Symbol Location Type
b11 null Def
A12 1,6 - 1,7 Def
bar13 2,4 - 2,7 Def

Partial paths

Symbols Scopes From Edges To Scopes Symbols
b σ root P b11 σ  
b . A σ root P×5 A12 σ  
b . A . bar σ root P×8 bar13 σ  
b . A ()1 . bar σ root P×11 bar13 σ1  

Turning the crank

  1. User clicks on bar6 in main.py. Load in all of the partial paths that start with that reference, and promote them to paths. The current paths set starts off as:

    # From Edges To Scopes Symbols
    1 bar6 P×8 root   b . foo ()/5 . bar
    2 bar6 P×9 root   a . foo ()/5 . bar
  2. Path 1 has one compatible partial path in b.py:

    # Symbols Scopes From Edges To Scopes Symbols
    1a b   root P b11    

    However, concatenating the two together results in an invalid path (since it ends at definition b11 but does not have an empty ending symbol stack). So we discard path 1.

    Path 2 has four compatible partial paths in a.py:

    # Symbols Scopes From Edges To Scopes Symbols
    2a a σ root P a7 σ  
    2b a . foo σ root P×5 foo8 σ  
    2c a . foo ()1 σ root P×13 (•) σ1 x
    2d a . foo ()1 σ root P×13 (•) σ1 0

    Concatenating path 2 with partial paths 2a or 2b results in invalid paths (since they end at definition a7 and foo8 but do not have an empty ending symbol stack or scope stacks). So we discard partial paths 2a and 2b.

    After concatenating path 2 with partial paths 2c and 2d respectively, the current paths list at the end of this iteration is:

    # From Edges To Scopes Symbols
    3’ bar6 P×9, jump, P×13 (•) 5 x . bar
    4’ bar6 P×9, jump, P×13 (•) 5 0 . bar

    In the binding set that we create while concatenating, σ is mapped to an empty scope stack, and σ1 is mapped to the scope stack ⟨5⟩.

    Each of these paths end in a jump to scope node, which we then resolve:

    # From Edges To Scopes Symbols
    3 bar6 P×9, jump, P×13 (5)   x . bar
    4 bar6 P×9, jump, P×13 (5)   0 . bar
  3. Path 3 has no compatible partial paths in any file, so we discard it.

    Path 4 has two compatible partial paths in main.py:

    # Symbols Scopes From Edges To Scopes Symbols
    4a 0 σ (5) P×7 root σ b . A
    4b 0 σ (5) P×8 root σ a . A

    After concatenating path 4 with partial paths 4a and 4b respectively, the current paths list at the end of this iteration is:

    # From Edges To Scopes Symbols
    5 bar6 P×9, jump, P×20 root   b . A . bar
    6 bar6 P×9, jump, P×21 root   a . A . bar

    In the binding set that we create while concatenating, σ is mapped to an empty scope stack.

  4. Path 6 has one compatible partial path in a.py:

    # Symbols Scopes From Edges To Scopes Symbols
    6a a σ root P a7 σ  

    However, concatenating the two together results in an invalid path (since it ends at definition a7 but does not have an empty ending symbol stack). So we discard path 6.

    Path 5 has three compatible partial paths in b.py:

    # Symbols Scopes From Edges To Scopes Symbols
    5a b σ root P b11 σ  
    5b b . A σ root P×5 A12 σ  
    5c b . A . bar σ root P×8 bar13 σ  

    Concatenating path 5 with partial paths 5a or 5b results in invalid paths, so we discard them.

    After concatenating path 5 with partial path 5c, the current paths list at the end of this iteration is:

    # From Edges To Scopes Symbols
    7 bar6 P×9, jump, P×28 bar13    

    In the binding set that we create while concatenating, σ is mapped to an empty scope stack.

  5. All of the current paths are valid and complete, and so we have our set of resolved definitions.

Open questions