Posted on March 27, 2018

Grok Your Bazel Build: The Action Graph

Bazel has powerful tools to inspect and monitor your build processes. A recent addition is the ability to dump the action graph.

The action graph is different from the target dependency graph, which is generated from Bazel’s loading phase. You might know the target graph from bazel query:

→ bazel query 'deps(//my:target)' --output=graph > target_graph.in
→ dot -Tpng < target_graph.in > target_graph.png
→ open target_graph.png

If you’re looking for the target graph, check out this Bazel blog post on visualizing your build.

The action graph contains a different set of information: file-level dependencies, full command lines, and other information Bazel needs to execute the build. If you are familiar with Bazel’s build phases, the action graph is the output of the loading and analysis phases and is consumed during the execution phase.

However, Bazel does not necessarily execute every action in the graph. It only executes an action if it has to; in other words, the action graph is a superset of what is actually executed.

The action graph is generated by:

validating the target graph
analyzing the target graph
creating artifact representations
resolving artifacts’ file paths to paths relative to the execution root
applying any required configuration, like platform-specific compiler flags

You can obtain it using bazel dump with these flags:

--action_graph=path/to/output: Specifies the location of the output file. This is relative to the WORKSPACE root. You can also provide an absolute path.
--action_graph:targets=//my:target: Specifies the target(s) you’re interested in.
--action_graph:include_cmdline=true: Specifies whether to include the full generated command lines.
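If you dump graphs regularly, you may want to script the invocation. Here is a minimal Python sketch that only assembles the argument list from the three flags above; it does not invoke Bazel itself, and the helper name `action_graph_dump_cmd` is made up for illustration.

```python
def action_graph_dump_cmd(output, target, include_cmdline=True):
    """Assemble a `bazel dump` invocation from the three flags described above.

    Only builds the argument list; it does not run Bazel.
    """
    return [
        "bazel",
        "dump",
        f"--action_graph={output}",
        f"--action_graph:targets={target}",
        f"--action_graph:include_cmdline={'true' if include_cmdline else 'false'}",
    ]


if __name__ == "__main__":
    print(" ".join(action_graph_dump_cmd("action_graph.bin", "//my:target")))
```

You could hand the result to `subprocess.run(..., check=True)` from the workspace root, once the target has been analyzed.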

Dumping the graph

Let’s walk through an example of dumping the action graph of an Android application build. We will use the Android example packaged in the Bazel source tree. Note that this requires the Android SDK and NDK:

→ git clone https://github.com/bazelbuild/bazel bazel_graph && cd bazel_graph
# Uncomment android_{sdk, ndk}_repository lines in WORKSPACE
→ grep "android_" WORKSPACE
android_sdk_repository(name = "androidsdk")
android_ndk_repository(name = "androidndk")

Add --experimental_strict_action_env to the project .bazelrc so that actions use a static value for $PATH instead of inheriting your shell’s $PATH.

→ cat .bazelrc
build --experimental_strict_action_env

The android_binary target is //examples/android/java/bazel:hello_world. It’s defined in examples/android/java/bazel/BUILD:

android_binary(
    name = "hello_world",
    srcs = glob([
        "MainActivity.java",
        "Jni.java",
    ]),
    manifest = "AndroidManifest.xml",
    resource_files = glob(["res/**"]),
    deps = [
        ":jni",
        ":lib",
        "@androidsdk//com.android.support:appcompat-v7-25.0.0",
    ],
)

Let’s start by running the loading and analysis phases, skipping the execution phase with the --nobuild flag.

→ bazel build --nobuild //examples/android/java/bazel:hello_world
INFO: Analysed target //examples/android/java/bazel:hello_world (31 packages loaded).
INFO: Found 1 target...
INFO: Elapsed time: 9.818s
INFO: Build completed successfully, 0 total actions

Note the 0 total actions. This doesn’t mean that no actions were generated, only that none were executed.

Let’s dump the graph from the Bazel server:

→ bazel dump --action_graph=action_graph.bin \
    --action_graph:targets=//examples/android/java/bazel:hello_world \
    --action_graph:include_cmdline=true
Warning: this information is intended for consumption by developers
only, and may change at any time.  Script against it at your own risk!

Dumping action graph to 'action_graph.bin'

We specify

--action_graph:targets=//examples/android/java/bazel:hello_world

because the flag’s default value is the target pattern ..., which dumps every analyzed target recursively.

Check that the output is not empty:

→ ls -al action_graph.bin
-rw-r--r--  1 jin  staff  101765 Mar 24 23:02 action_graph.bin

If it is empty, it means that Bazel hasn’t analyzed the target. Make sure that build --nobuild and dump --action_graph:targets are referencing the same target.

Reading the graph

action_graph.bin is a raw, binary-encoded protobuf message, and analysis.proto is the file that defines its message types. Let’s use the protobuf compiler, protoc, to decode it:

→ protoc --decode=analysis.ActionGraphContainer \
    src/main/protobuf/analysis.proto \
    < action_graph.bin > action_graph.txt

For reference, I’ve uploaded my action_graph.txt here. It’s human-readable plain text, so that’s great!
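If you’d rather poke at the decoded file from a script than with grep, a rough parser is enough. This is a minimal Python sketch that assumes protoc writes one field, one `name {` opener, or one `}` per line, as in the excerpts in this post; `parse_textproto` is a hypothetical name, not the protobuf library’s parser. With generated Python bindings for analysis.proto you could use the protobuf library’s own `text_format` module instead.

```python
import re
from collections import defaultdict

# "name {" opens a nested message; "name: value" is a single field.
BLOCK_RE = re.compile(r"^(\w+)\s*\{$")
FIELD_RE = re.compile(r"^(\w+):\s*(.+)$")


def parse_textproto(text):
    """Parse protoc's text output into nested dicts mapping field names to lists.

    Every field becomes a list because, without the schema, we cannot tell
    repeated fields from singular ones. Escape sequences in strings are kept.
    """
    root = defaultdict(list)
    stack = [root]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: unmatched '}}'")
            stack.pop()
            continue
        block = BLOCK_RE.match(line)
        if block:
            child = defaultdict(list)
            stack[-1][block.group(1)].append(child)
            stack.append(child)
            continue
        field = FIELD_RE.match(line)
        if field:
            name, value = field.groups()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # strip the surrounding quotes
            stack[-1][name].append(value)
            continue
        raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    if len(stack) != 1:
        raise ValueError("unterminated message block")
    return root
```

For example, `parse_textproto(open("action_graph.txt").read())["actions"]` would give you one dict per Action message.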

Analyzing the graph

Now that we can read the graph, let’s analyze some of the useful bits. The file contains a ton of information!

The top level message type is ActionGraphContainer. Let’s investigate each of these message types one by one.

message ActionGraphContainer {
  repeated Artifact artifacts = 1;
  repeated Action actions = 2;
  repeated Target targets = 3;
  repeated DepSetOfFiles dep_set_of_files = 4;
  repeated Configuration configuration = 5;
  repeated AspectDescriptor aspect_descriptors = 6;
  repeated RuleClass rule_classes = 7;
}

RuleClass

Starting with the simplest, we have a single RuleClass message.

rule_classes {
  id: "0"
  name: "android_binary"
}

This is no surprise: we dumped the action graph of an android_binary target.

Target

targets {
  id: "0"
  label: "//examples/android/java/bazel:hello_world"
  rule_class_id: "0"
}

Correspondingly, there’s also one Target message. We see that it encodes the id of the target’s RuleClass. In this case, the rule_class_id refers to android_binary.
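Messages point at each other by string ids rather than by nesting. A small Python sketch, using plain dicts copied by hand from the two messages above (the helper names are hypothetical), shows how to follow the target’s rule_class_id:

```python
def index_by_id(messages):
    """Map each message's id to the message itself."""
    return {message["id"]: message for message in messages}


def rule_class_name(target, rule_classes_by_id):
    """Follow a Target's rule_class_id reference to the RuleClass name."""
    return rule_classes_by_id[target["rule_class_id"]]["name"]


# Hand-copied from the RuleClass and Target messages above.
rule_classes = [{"id": "0", "name": "android_binary"}]
targets = [
    {
        "id": "0",
        "label": "//examples/android/java/bazel:hello_world",
        "rule_class_id": "0",
    }
]

if __name__ == "__main__":
    by_id = index_by_id(rule_classes)
    for target in targets:
        print(target["label"], "->", rule_class_name(target, by_id))
```

The same index-then-look-up pattern applies to the other *_id fields that appear in the messages below.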

Configuration

configuration {
  id: "0"
  mnemonic: "darwin-fastbuild"
  platform_name: "darwin"
}

We have one build configuration, whose mnemonic is darwin-fastbuild. It refers to our execution platform (macOS) and the fastbuild compilation mode.

Artifact

artifacts {
  id: "16"
  exec_path: "external/local_jdk/bin/javac"
}

artifacts {
  id: "190"
  exec_path: "bazel-out/android-armeabi-v7a-fastbuild/bin/external/androidsdk/com.android.support/_aar/unzipped/resources/support-vector-drawable-25.0.0"
  is_tree_artifact: true
}

artifacts {
  id: "227"
  exec_path: "examples/android/java/bazel/res/values/styles.xml"
}

artifacts {
  id: "229"
  exec_path: "bazel-out/host/genfiles/external/androidsdk/aapt_runner.sh"
}

artifacts {
  id: "349"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_unsigned.apk"
}

Every file that Bazel handles is an Artifact. It represents:

a source file
or a derived output file

The “file” can also be a directory (e.g. artifact 190), which is referred to as a TreeArtifact. Check out the detailed documentation on the different Artifact types here.

exec_path is the relative path of the Artifact within the execution root. The execution root is the working directory where Bazel executes all actions during the execution phase:

→ bazel info execution_root
.....................
/private/var/tmp/_bazel_jin/ed227ac31d5e65f9c3effb1d1fe2605e/execroot/io_bazel

The exec_paths come in different prefix flavours:

external/..: Contains symlinks to external repositories, such as @local_jdk and @androidsdk.
examples/..: Contains the source files. This is a symlink to the actual examples/ folder.
bazel-out/host/genfiles/..: Contains generated sources, usually from genrules, for the host BuildConfiguration.
bazel-out/darwin-fastbuild/bin/..: Contains derived binary outputs for the darwin target BuildConfiguration.
bazel-out/android-armeabi-v7a-fastbuild/bin/..: Contains derived binary outputs for the android-armeabi-v7a target BuildConfiguration.
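To sort a large dump’s artifacts into these buckets, you could classify each exec_path by its prefix. This Python sketch hard-codes the layout seen in this particular build (`bazel-out/<configuration>/<bin|genfiles>/...`), so treat it as a heuristic rather than a Bazel API; `describe_exec_path` is a hypothetical name.

```python
# Subdirectory names under bazel-out/<configuration>/ seen in this build.
_OUTPUT_KINDS = {"bin": "derived binary output", "genfiles": "generated source"}


def describe_exec_path(exec_path):
    """Classify an exec_path by prefix, returning (kind, configuration).

    configuration is None for files that are not derived outputs. This is a
    heuristic based on the directory layout of this build, not a Bazel API.
    """
    parts = exec_path.split("/")
    if parts[0] == "bazel-out" and len(parts) >= 3:
        return _OUTPUT_KINDS.get(parts[2], "derived output"), parts[1]
    if parts[0] == "external":
        return "external repository file", None
    return "source file", None


if __name__ == "__main__":
    for path in [
        "external/local_jdk/bin/javac",
        "examples/android/java/bazel/res/values/styles.xml",
        "bazel-out/host/genfiles/external/androidsdk/aapt_runner.sh",
        "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_unsigned.apk",
    ]:
        print(describe_exec_path(path), path)
```

Note that bazel-out/ is checked first, because derived outputs can themselves contain an external/ segment further down the path.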

DepSetOfFiles

dep_set_of_files {
  id: "198"
  transitive_dep_set_ids: "136"
  direct_artifact_ids: "292"
}

dep_set_of_files {
  id: "136"
  transitive_dep_set_ids: "137"
  direct_artifact_ids: "337"
}

A depset is a data structure for collecting data on transitive dependencies. It’s optimized for time- and space-efficient merging, because depsets commonly grow very large, scaling to hundreds of thousands of files. Read the documentation to learn more about depsets.

In our protobuf, a dep_set_of_files can refer to other depsets with transitive_dep_set_ids, or directly to artifacts with direct_artifact_ids.

It’s crucial to highlight this ability to recursively refer to other depsets: it’s what makes depsets space efficient. Rule implementations should not flatten depsets to lists unless they are at the top level, because flattening large depsets incurs huge memory consumption.
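That warning concerns rule implementations. When you inspect a dump offline, expanding a depset into its artifact ids is exactly what you want. A minimal Python sketch, with a hypothetical `expand_depset` helper, follows the depset references and visits each nested depset only once, so a sub-depset shared by several parents is expanded a single time. The traversal is plain depth-first and may not match the order Bazel itself uses.

```python
def expand_depset(depset_id, depsets):
    """Collect every artifact id reachable from a depset, visiting each nested
    depset at most once.

    `depsets` maps a depset id to a dict with optional "direct_artifact_ids"
    and "transitive_dep_set_ids" lists. Order: depth-first, with a depset's
    direct artifacts before those of the depsets it contains.
    """
    seen_sets = set()
    seen_artifacts = set()
    result = []
    stack = [depset_id]
    while stack:
        current = stack.pop()
        if current in seen_sets:
            continue
        seen_sets.add(current)
        node = depsets[current]
        for artifact_id in node.get("direct_artifact_ids", []):
            if artifact_id not in seen_artifacts:
                seen_artifacts.add(artifact_id)
                result.append(artifact_id)
        # Push in reverse so nested depsets are visited in their listed order.
        stack.extend(reversed(node.get("transitive_dep_set_ids", [])))
    return result


depsets = {
    "198": {"transitive_dep_set_ids": ["136"], "direct_artifact_ids": ["292"]},
    "136": {"transitive_dep_set_ids": ["137"], "direct_artifact_ids": ["337"]},
    # Hypothetical: depset 137's contents are not shown in this post.
    "137": {"direct_artifact_ids": ["16"]},
}

if __name__ == "__main__":
    print(expand_depset("198", depsets))  # ['292', '337', '16']
```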

Action

Finally, we have Action. An action, as described in the protobuf’s documentation, is a function from Artifact to Artifact. It might be easier to think of an Action as all of the information required to create an output file, which usually includes a command line.

actions {
  target_id: "0"
  action_key: "e121f7eb29e0828eef502582d5134d37"
  mnemonic: "ResourceExtractor"
  configuration_id: "0"
  arguments: "bazel-out/host/bin/external/bazel_tools/tools/android/resource_extractor"
  arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
  arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
  input_dep_set_ids: "198"
  output_ids: "338"
}

targets {
  id: "0"
  label: "//examples/android/java/bazel:hello_world"
  rule_class_id: "0"
}

dep_set_of_files {
  id: "198"
  transitive_dep_set_ids: "136"
  direct_artifact_ids: "292"
}

artifacts {
  id: "338"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
}

artifacts {
  id: "292"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
}

In this selected Action, we are extracting resources out of a jar using a tool called resource_extractor. The full command line is captured as the list of arguments, with the first argument being the executable. Every file referenced in the command line must be an Artifact, either in the transitive depset(s) listed in input_dep_set_ids or among the output artifact(s) in output_ids. This enables Bazel to discover the actions it must run to produce a requested output artifact.
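Those references let you work backwards from a requested output to every action it depends on, which also illustrates why the action graph is a superset of what runs. Here is a Python sketch under these assumptions: actions and depsets are plain dicts shaped like the messages above, the helper names are hypothetical, and every action except ResourceExtractor is invented for the demo. It ignores caching entirely, whereas Bazel would also skip actions whose outputs are already up to date.

```python
def input_artifact_ids(action, depsets):
    """Expand an action's input depsets into a sorted list of artifact ids."""
    seen_sets, found = set(), set()
    stack = list(action.get("input_dep_set_ids", []))
    while stack:
        depset_id = stack.pop()
        if depset_id in seen_sets:
            continue
        seen_sets.add(depset_id)
        depset = depsets[depset_id]
        found.update(depset.get("direct_artifact_ids", []))
        stack.extend(depset.get("transitive_dep_set_ids", []))
    return sorted(found)


def actions_for_output(artifact_id, actions, depsets):
    """Return the action_keys of all actions transitively needed for artifact_id.

    Finds the action whose output_ids contain the artifact, then repeats for
    each of that action's inputs. Source files have no producing action, so
    the walk stops there.
    """
    producer = {}
    for action in actions:
        for output_id in action.get("output_ids", []):
            producer[output_id] = action
    needed, visited, stack = {}, set(), [artifact_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        action = producer.get(current)
        if action is None or action["action_key"] in needed:
            continue
        needed[action["action_key"]] = action
        stack.extend(input_artifact_ids(action, depsets))
    return list(needed)


depsets = {
    "198": {"transitive_dep_set_ids": ["136"], "direct_artifact_ids": ["292"]},
    "136": {"direct_artifact_ids": ["337"]},  # trimmed: the real one also points at 137
    "900": {"direct_artifact_ids": ["227"]},  # hypothetical
}
actions = [
    {"action_key": "e121f7eb29e0828eef502582d5134d37", "mnemonic": "ResourceExtractor",
     "input_dep_set_ids": ["198"], "output_ids": ["338"]},
    # Hypothetical producer of hello_world_deploy.jar (artifact 292).
    {"action_key": "hypothetical-deploy-jar", "mnemonic": "JavaDeployJar",
     "input_dep_set_ids": ["900"], "output_ids": ["292"]},
    # Hypothetical action that artifact 338 does not depend on.
    {"action_key": "hypothetical-apk", "mnemonic": "ApkBuilder",
     "input_dep_set_ids": ["900"], "output_ids": ["349"]},
]

if __name__ == "__main__":
    print(actions_for_output("338", actions, depsets))
```

Asking for artifact 338 pulls in the ResourceExtractor action and the hypothetical producer of hello_world_deploy.jar, but not the unrelated ApkBuilder action.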

The action_key is computed based on the command line that will be executed, which contains information like compiler flags, library locations and system headers. This enables Bazel to keep track of actions to invalidate and re-run incrementally, and cache aggressively if there is no need to rerun an action.
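To see why hashing the command line supports incremental builds, here is a toy Python sketch. It is not Bazel’s algorithm: it hashes only the mnemonic and the arguments, and it uses MD5 merely because the keys in the dump happen to be 32 hex digits. `illustrative_action_key` is a hypothetical name.

```python
import hashlib


def _update(digest, text):
    """Feed one length-prefixed string into the digest, so that the argument
    lists ["ab", "c"] and ["a", "bc"] hash differently."""
    data = text.encode("utf-8")
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)


def illustrative_action_key(mnemonic, arguments):
    """Hash the parts of an action that should force a rerun when they change.

    Illustration only; this is not Bazel's action_key computation.
    """
    digest = hashlib.md5()
    _update(digest, mnemonic)
    for argument in arguments:
        _update(digest, argument)
    return digest.hexdigest()


if __name__ == "__main__":
    base = ["resource_extractor", "hello_world_deploy.jar", "extracted.jar"]
    changed = base[:-1] + ["extracted_v2.jar"]
    print(illustrative_action_key("ResourceExtractor", base))
    print(illustrative_action_key("ResourceExtractor", changed))
```

Changing any single argument, such as an output path or a compiler flag, yields a different key; an unchanged key is what allows a build tool to reuse a cached result.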

The Action’s configuration_id is 0, as this action is executed with the darwin-fastbuild BuildConfiguration.

Each Action has a mnemonic, a short human-readable string that summarizes what the Action does. We can grep the protobuf for all mnemonics to see mostly Android-related actions, like AndroidDexer and RClassGenerator. (The grep also picks up the mnemonic of the Configuration message, which is why darwin-fastbuild shows up at the end.)

→ grep "mnemonic" action_graph.txt | sort | uniq
  mnemonic: "AaptPackage"
  mnemonic: "AaptSplitResourceApk"
  mnemonic: "AndroidBuildSplitManifest"
  mnemonic: "AndroidDexManifest"
  mnemonic: "AndroidDexer"
  mnemonic: "AndroidInstall"
  mnemonic: "AndroidStripResources"
  mnemonic: "AndroidZipAlign"
  mnemonic: "ApkBuilder"
  mnemonic: "ApkSignerTool"
  mnemonic: "CppLink"
  mnemonic: "Desugar"
  mnemonic: "DexBuilder"
  mnemonic: "DexMerger"
  mnemonic: "Fail"
  mnemonic: "FileWrite"
  mnemonic: "InjectMobileInstallStubApplication"
  mnemonic: "JavaDeployJar"
  mnemonic: "JavaSourceJar"
  mnemonic: "Javac"
  mnemonic: "ManifestMerger"
  mnemonic: "RClassGenerator"
  mnemonic: "ResourceExtractor"
  mnemonic: "ShardClassesToDex"
  mnemonic: "Symlink"
  mnemonic: "Turbine"
  mnemonic: "darwin-fastbuild"
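To count only the mnemonics of Action messages, and to see how often each one occurs, a small Python sketch can track which top-level message it is inside. It assumes protoc’s one-field-per-line layout, and `count_action_mnemonics` is a hypothetical name.

```python
from collections import Counter


def count_action_mnemonics(text):
    """Count the mnemonic fields of top-level `actions { ... }` blocks only,
    skipping e.g. the Configuration message's mnemonic."""
    counts = Counter()
    depth = 0
    in_action = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            if depth == 0:
                # A new top-level message starts; remember whether it is an Action.
                in_action = line.split()[0] == "actions"
            depth += 1
        elif line == "}":
            depth -= 1
        elif depth == 1 and in_action and line.startswith("mnemonic:"):
            counts[line.split(":", 1)[1].strip().strip('"')] += 1
    return counts
```

For example, `count_action_mnemonics(open("action_graph.txt").read()).most_common(5)` would list the five most frequent kinds of action in the dump.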

Summary

The action graph is a powerful tool to gain introspection into Bazel’s analysis and execution phases. It provides just enough information to visualize the Action data structure before it is transformed into the executable command line that you would see with the --subcommands flag.

If you wish to learn more about the underlying data representation of the action graph, check out the design document of Bazel’s parallel evaluation and incrementality model, Skyframe.

Jin
