Supercharge your code with AST: Abstract Syntax Trees

Cole Turner

July 20, 2020 · 7 min read

Abstract Syntax Tree (AST) is a fancy technical term for the structure that programming language parsers build to understand our code. For example, take the following code:

console.log("JavaScript ASTs are fun!");

A JavaScript parser will take this code and turn it into an Abstract Syntax Tree (AST). The tree appears as a deeply nested object:

{
  "type": "CallExpression",
  "callee": {
    "type": "MemberExpression",
    "computed": false,
    "object": {
      "type": "Identifier",
      "name": "console"
    },
    "property": {
      "type": "Identifier",
      "name": "log"
    }
  },
  "arguments": [
    {
      "type": "Literal",
      "value": "JavaScript ASTs are fun!",
      "raw": "\"JavaScript ASTs are fun!\""
    }
  ]
}

The tree also includes tokens and location data (such as character offsets) that identify where each element of the tree appears in the source code. Pretty neat!
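To make the idea of a tree with locations concrete, here is a minimal sketch in plain JavaScript that walks the AST from the example and uses each node's location to slice the matching text out of the source. The `range` values and the `walk` helper are assumptions added for illustration; real parsers report locations in their own formats.

```javascript
const source = 'console.log("JavaScript ASTs are fun!");';

// The AST from the example above, with illustrative `range` values added
// (assumption: [start, end) character offsets into `source`).
const ast = {
  type: "CallExpression",
  range: [0, 39],
  callee: {
    type: "MemberExpression",
    computed: false,
    range: [0, 11],
    object: { type: "Identifier", name: "console", range: [0, 7] },
    property: { type: "Identifier", name: "log", range: [8, 11] },
  },
  arguments: [
    {
      type: "Literal",
      value: "JavaScript ASTs are fun!",
      raw: '"JavaScript ASTs are fun!"',
      range: [12, 38],
    },
  ],
};

// Hypothetical helper: visits every object that has a string `type`,
// depth first, in property order.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  for (const key of Object.keys(node)) {
    walk(node[key], visit);
  }
}

// Pair each node type with the source text it covers.
const found = [];
walk(ast, (node) => {
  const [start, end] = node.range;
  found.push(`${node.type}: ${source.slice(start, end)}`);
});

console.log(found.join("\n"));
```

Every node, from the whole call down to the single identifier `log`, maps back to an exact slice of the original text.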

Codemods

What's great about Abstract Syntax Trees (AST) is that we can make changes to the tree to create new source code. Using JavaScript, we can write a codemod to transform the input code using a set of pre-programmed instructions.

ELI5: AST vs Codemod

To recap, AST is the representation of our source code's syntax. A codemod makes changes to the AST to transform the input into a new output. Codemods are given a set of instructions to modify properties within the tree, which contains deeply nested objects that represent the contents of the source code.
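As a rough illustration of that recap, the sketch below is a hand-rolled "codemod" in plain JavaScript: it changes one property in the tree and then prints the tree back into source code. The `print` and `logToWarn` functions are hypothetical helpers written for this example, and the printer only understands the four node types in the tree; real tools handle far more.

```javascript
// Parsed form of: console.log("JavaScript ASTs are fun!")
const ast = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    computed: false,
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
  },
  arguments: [
    {
      type: "Literal",
      value: "JavaScript ASTs are fun!",
      raw: '"JavaScript ASTs are fun!"',
    },
  ],
};

// A tiny printer: turns the tree back into source text.
function print(node) {
  switch (node.type) {
    case "CallExpression":
      return `${print(node.callee)}(${node.arguments.map(print).join(", ")})`;
    case "MemberExpression":
      return node.computed
        ? `${print(node.object)}[${print(node.property)}]`
        : `${print(node.object)}.${print(node.property)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return node.raw;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// The "codemod": rewrite console.log(...) calls to console.warn(...).
function logToWarn(node) {
  if (
    node.type === "CallExpression" &&
    node.callee.type === "MemberExpression" &&
    node.callee.object.name === "console" &&
    node.callee.property.name === "log"
  ) {
    node.callee.property.name = "warn";
  }
  return node;
}

const before = print(ast);
const after = print(logToWarn(ast));

console.log(before); // console.log("JavaScript ASTs are fun!")
console.log(after); // console.warn("JavaScript ASTs are fun!")
```

The only change made was to a single `name` property; the new source code falls out of printing the modified tree.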

AST: Types and Properties

A node in the Abstract Syntax Tree is any object that can stand alone to represent a piece of source code. The tree is made up of many nodes, which can nest within each other, and which contain the recipe for how to print the syntax.

Every node has the following properties:

Type Kind
Location (start, end)
Tokens - used to identify syntactical choices

Every kind of node also has its own set of properties, which add more detail to the recipe for printing that node.

Example 1: Function Declaration

A Function Declaration is the name given to the AST Node type that represents how we declare functions in JavaScript.

function petTheDog(dog) {
}

When we print out the AST for this function declaration, that looks like this:

{
  "type": "FunctionDeclaration",
  "id": {
    "type": "Identifier",
    "name": "petTheDog"
  },
  "params": [
    {
      "type": "Identifier",
      "name": "dog"
    }
  ],
  "body": {
    "type": "BlockStatement",
    "body": []
  },
  "generator": false,
  "expression": false,
  "async": false
}

In the example above you will see a few types:

Function Declaration
Identifier
Block Statement

The Function Declaration type is the ancestor node (the whole source code). The name of the function is represented as an Identifier, which is used for variables and function names. The first parameter in the function signature, dog, is also represented as an Identifier. Lastly there is the Block Statement, which represents the source code that would appear within the curly brackets. The body attribute would normally be filled with other kinds of nodes.
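Reading those properties back out of the node takes only a few lines. This is a minimal sketch in plain JavaScript; `describeFunction` is a hypothetical helper made up for this example, and it only names parameters that are plain identifiers.

```javascript
// The FunctionDeclaration node from the example above.
const node = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "petTheDog" },
  params: [{ type: "Identifier", name: "dog" }],
  body: { type: "BlockStatement", body: [] },
  generator: false,
  expression: false,
  async: false,
};

// Hypothetical helper: summarizes a FunctionDeclaration node.
function describeFunction(fn) {
  if (fn.type !== "FunctionDeclaration") {
    throw new TypeError(`Expected a FunctionDeclaration, got ${fn.type}`);
  }
  // Anything other than a plain identifier is shown by its node type.
  const params = fn.params.map((param) =>
    param.type === "Identifier" ? param.name : `<${param.type}>`
  );
  const prefix = `${fn.async ? "async " : ""}function${fn.generator ? "*" : ""}`;
  return {
    name: fn.id.name,
    params,
    isEmpty: fn.body.body.length === 0,
    signature: `${prefix} ${fn.id.name}(${params.join(", ")})`,
  };
}

const summary = describeFunction(node);
console.log(summary.signature); // function petTheDog(dog)
console.log(summary.isEmpty); // true
```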

Isn't that really neat? Let's look at another example to compare.

Example 2: Variable Declaration

A Variable Declaration is composed of many types, and is a way to represent assigning a value to an identifier.

const truths = {
  blackLives: "Matter"
};

This variable declaration assigns an object as the value to the truths variable. Here's how that would appear in AST:

{
  "type": "VariableDeclaration",
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": {
        "type": "Identifier",
        "name": "truths",
        "range": [6, 12]
      },
      "init": {
        "type": "ObjectExpression",
        "properties": [
          {
            "type": "Property",
            "key": {
              "type": "Identifier",
              "name": "blackLives"
            },
            "computed": false,
            "value": {
              "type": "Literal",
              "value": "Matter",
              "raw": "\"Matter\""
            },
            "kind": "init",
            "method": false,
            "shorthand": false
          }
        ]
      }
    }
  ],
  "kind": "const"
}

In the example above you will see a few types:

Variable Declaration
Object Expression
Property
Literal

The Variable Declaration type represents the entire source code. The Object Expression represents the value that is assigned to the Identifier. The object expression is a node that contains multiple properties, each of which represents a key/value pair. The property in this example uses a Property node type, containing instructions for how to describe the key/value pair. The key is represented as an Identifier and the value is represented by the Literal node type. This example also demonstrates some of the additional properties (method, shorthand, computed) that make up the source code recipe.
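To show how the pieces relate, here is a small sketch in plain JavaScript that reads the declaration node and rebuilds the plain object it describes. `toPlainObject` is a hypothetical helper for this example, and it deliberately handles only literal values with non-computed keys.

```javascript
// The VariableDeclaration node from the example above (locations omitted).
const declaration = {
  type: "VariableDeclaration",
  kind: "const",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "truths" },
      init: {
        type: "ObjectExpression",
        properties: [
          {
            type: "Property",
            key: { type: "Identifier", name: "blackLives" },
            computed: false,
            value: { type: "Literal", value: "Matter", raw: '"Matter"' },
            kind: "init",
            method: false,
            shorthand: false,
          },
        ],
      },
    },
  ],
};

// Hypothetical helper: converts an ObjectExpression into a plain object.
function toPlainObject(objectExpression) {
  const result = {};
  for (const prop of objectExpression.properties) {
    if (prop.computed || prop.value.type !== "Literal") {
      throw new Error("This sketch only handles literal values with static keys");
    }
    // Keys are either identifiers (blackLives) or literals ("black-lives").
    const key = prop.key.type === "Identifier" ? prop.key.name : prop.key.value;
    result[key] = prop.value.value;
  }
  return result;
}

const [declarator] = declaration.declarations;
const value = toPlainObject(declarator.init);
const summary = `${declaration.kind} ${declarator.id.name} = ${JSON.stringify(value)}`;

console.log(summary); // const truths = {"blackLives":"Matter"}
```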

AST Types: All the Things!

To represent every element of source code, we need tens to hundreds of types. The examples above cover only a handful of the types used to describe JavaScript source code.

For a more comprehensive list of all the types, check out the Esprima Abstract Syntax Tree Format guide.

Supercharging your source code with JSCodeshift

Now that we understand Abstract Syntax Trees, and how we can use codemods, it's time to introduce JSCodeshift.

jscodeshift is a toolkit for running codemods over multiple JavaScript or TypeScript files. It provides:
- A runner, which executes the provided transform for each file passed to it. It also outputs a summary of how many files have (not) been transformed.
- A wrapper around recast, providing a different API. Recast is an AST-to-AST transform tool and also tries to preserve the style of original code as much as possible.
via JSCodeshift repository.

The Basics: JSCodeshift 101

This utility is great for writing and running codemods that can be re-used across your codebase.

In addition to the types we've learned from the Abstract Syntax Tree, there are two new object types that jscodeshift provides:

Node Path: an object that contains a node, with additional metadata for manipulating that node or navigating through the tree.
Collection: an iterable used to navigate through many Node Paths.

These objects will help you structure your codemod to easily walk through the source code and transform the input. There are also some additional helper methods that make it easier to traverse the syntax tree:

// `j` is the jscodeshift API; calling it on a string reads it into a `Collection`
const j = api.jscodeshift;
const root = j(sourceCode);

// Finds any node in a collection
const collection = root.find(j.Identifier, { name: 'dog' });

// Searches a collection for variable declarations
root.findVariableDeclarators('truths');

// Renames variable declarators
root.findVariableDeclarators('facts').renameTo('truths');

// Replace a node
root.find(j.Identifier, { name: 'foo' }).forEach((nodePath) => {
  j(nodePath).replaceWith(j.identifier('bar'));
});

// Prints the output code from the syntax tree
root.toSource({
  /* print options */
});

In the example above, you will notice that the jscodeshift API provides objects for any of the AST node types. For example, for call expressions, the API provides:

// Here `j` is `api.jscodeshift`
// Uppercase is used for searching/identifying types
j(sourceCode).find(j.CallExpression)

// Lowercase is used for composing nodes of a type
j.callExpression(callee, args)

Finding and Filtering

In the examples above we saw that jscodeshift has an API for finding nodes by a certain type. Here's another example:

const j = api.jscodeshift;
const root = j(sourceCode);

// Find nodes of a given type, returning a collection
root.find(j.FunctionDeclaration);

In a larger codebase, this will return all of the function declarations. However, if we don't want to transform all of the functions, we will need to add additional filters.

const j = api.jscodeshift;
const root = j(sourceCode);

// Filter the find operation with additional metadata
// Only match functions with the name "petTheDog"
root.find(j.FunctionDeclaration, { id: { name: 'petTheDog' } });

Now our transform will only run against the function declarations matched by the filter. In some cases, we will need to be very precise with our filters, and use deep matching to target only the code we want to change.

What if I need to filter by an item in an array?

The jscodeshift API provides mechanisms for deep filtering, such as only matching a specific item in an array:

const j = api.jscodeshift;
const root = j(sourceCode);

// Filter the find operation with additional metadata
// Only match functions named "petTheDog" whose second parameter is "dogBreed"
root.find(j.FunctionDeclaration, {
  id: { name: 'petTheDog' },
  // The empty first slot leaves arg0 unconstrained
  params: [/* arg0 */, { name: 'dogBreed' }],
});

In the example above, now our find operation will only match function declarations where the name is petTheDog and the second parameter is an identifier with the name dogBreed. That's super useful for avoiding false matches.

What if I need to filter with some custom logic?

The find operation also has the ability to specify functions as a matcher, so you can filter deeply with custom logic.

const j = api.jscodeshift;
const root = j(sourceCode);

// Filter the find operation with additional metadata
// Only match functions named "petTheDog" whose second parameter starts with "dog"
root.find(j.FunctionDeclaration, {
  id: { name: 'petTheDog' },
  // The empty first slot leaves arg0 unconstrained
  params: [/* arg0 */, { name: (value) => value.startsWith('dog') }],
});

Now our find operation performs a similar match, but instead of requiring the second parameter to be named dogBreed, it matches any petTheDog function whose second parameter name starts with the string "dog". This would match both function petTheDog(dog, dogBreed) and function petTheDog(dog, dogColor).
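If you are curious how a filter like this could be evaluated, here is a simplified sketch in plain JavaScript. It is not jscodeshift's actual implementation; the `matches` and `fn` helpers are hypothetical, written only to show the idea of comparing a node against a partial object, skipping empty array slots, and calling function matchers.

```javascript
// Hypothetical matcher: does `value` satisfy the partial `filter`?
function matches(value, filter) {
  // Function matchers decide for themselves.
  if (typeof filter === "function") return Boolean(filter(value));
  // Objects and arrays match when every listed key matches.
  if (filter !== null && typeof filter === "object") {
    if (value === null || typeof value !== "object") return false;
    // Object.keys skips empty slots in arrays like [, { ... }],
    // so an empty slot places no constraint on that position.
    return Object.keys(filter).every(
      (key) =>
        Object.prototype.hasOwnProperty.call(value, key) &&
        matches(value[key], filter[key])
    );
  }
  // Everything else must be strictly equal.
  return value === filter;
}

// Hypothetical helper that builds a bare FunctionDeclaration node.
const fn = (name, ...params) => ({
  type: "FunctionDeclaration",
  id: { type: "Identifier", name },
  params: params.map((param) => ({ type: "Identifier", name: param })),
  body: { type: "BlockStatement", body: [] },
});

const filter = {
  id: { name: "petTheDog" },
  params: [/* arg0 */, { name: (value) => value.startsWith("dog") }],
};

const results = {
  dogBreed: matches(fn("petTheDog", "dog", "dogBreed"), filter),
  dogColor: matches(fn("petTheDog", "dog", "dogColor"), filter),
  breed: matches(fn("petTheDog", "dog", "breed"), filter),
  otherName: matches(fn("petTheCat", "cat", "dogBreed"), filter),
  oneParam: matches(fn("petTheDog", "dog"), filter),
};

console.log(results);
```

In this sketch the first two functions match, while the other three fail on the parameter name, the function name, and the missing second parameter respectively.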

This makes finding and filtering easy to use, and flexible enough to change only the code you want!

Leveling up with jscodeshift

Now you're ready to get started writing codemods. To learn more about JSCodeshift, check out the repository. It's also worth getting familiar with recast, which is the transformer and printer that is used under the hood.

Check out this awesome, curated list of codemods for inspiration!

Starting your AST journey with ASTExplorer.net

In all the examples above, we've parsed the source code and printed the representation as JSON. To do this, I've used a tool called ASTExplorer.net.

This tool is really useful for parsing, printing, and transforming all kinds of source code. It is absolutely essential for getting started and writing codemods. ASTExplorer allows you to input source code, write your codemod, browse the syntax tree, and view the transform output on the fly!

Ready, Set, Go!

That's all you need to get started with Abstract Syntax Trees, Codemods, and JSCodeshift. These tools are really useful for large scale refactors, or making repeated changes with ease.

Here are some ideas for your next codemod:

Refactoring an API or making codebase platform changes.
Migrating or removing deprecated packages (underscore/lodash -> native prototypes).
Upgrading from ES5 to ES6.
