jin.crypt.sg | Jin's blog

Generating pretty-printed sources with Bazel

2018-08-11T00:01:00Z

Posted on August 11, 2018


Originally written as an answer for StackOverflow.

Introduction

Pretty-printers are excellent for enforcing style standards across the codebase. In this article, we’ll show how to use Bazel to generate pretty-printed sources in your build.

This method involves writing a new Bazel macro and rule. There is another method via aspects, but we are not covering that in this article.

For hermeticity reasons, Bazel does not modify your source files in place. If you want formatting-on-save (e.g. with gofmt or prettier), please use editor plugins instead.

As an example, let’s use the C++ tutorial from the Bazel C++ examples and clang-format for pretty-printing.

Setup

Let’s first mess up the formatting of main/hello-world.cc:

#include <ctime>



#include <string>

#include <iostream>

std::string get_greet(const std::string& who) { return "Hello " + who; }

void print_localtime() {
  std::time_t result =
    std::time(nullptr);
  std::cout << std::asctime(std::localtime(&result));
}

int main(int argc, char** argv) {
  std::string who = "world";
  if (argc > 1) {who = argv[1];}
  std::cout << get_greet(who) << std::endl;
  print_localtime();


  return 0;
}

And this is the BUILD file to build main/hello-world.cc:

# In main/BUILD
cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
)

Macro: `clang_formatted_cc_binary`

Since cc_binary doesn’t know anything about clang-format or pretty-printing in general, let’s create a macro called clang_formatted_cc_binary and replace cc_binary with it. The BUILD file now looks like this:

# In main/BUILD
load("//:clang_format.bzl", "clang_formatted_cc_binary")

clang_formatted_cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
)

Next, create a file called clang_format.bzl with a macro named clang_formatted_cc_binary. The macro is currently just a wrapper around native.cc_binary:

# In clang_format.bzl
def clang_formatted_cc_binary(**kwargs):
    native.cc_binary(**kwargs)

At this point, you can build the cc_binary target, but it’s not running clang-format yet. Let’s add an intermediary rule, which we’ll call clang_format_srcs, to do that inside clang_formatted_cc_binary:

# In clang_format.bzl
def clang_formatted_cc_binary(name, srcs, **kwargs):
    # Using a filegroup for code cleanliness
    native.filegroup(
        name = name + "_unformatted_srcs",
        srcs = srcs,
    )

    clang_format_srcs(
        name = name + "_formatted_srcs",
        srcs = [name + "_unformatted_srcs"],
    )

    native.cc_binary(
        name = name,
        srcs = [name + "_formatted_srcs"],
        **kwargs
    )

Note that the cc_binary now compiles the formatted sources, but we retain the original name attribute so that cc_binary can be replaced in place with clang_formatted_cc_binary within BUILD files.

Rule: `clang_format_srcs`

Finally, we’ll write the implementation of the clang_format_srcs rule, in the same clang_format.bzl file:

# In clang_format.bzl
def _clang_format_srcs_impl(ctx):
    formatted_files = []

    for unformatted_file in ctx.files.srcs:
        # Declare one output per input, named formatted_<basename>.
        formatted_file = ctx.actions.declare_file("formatted_" + unformatted_file.basename)
        formatted_files.append(formatted_file)

        # clang-format is looked up on the PATH of the action's shell,
        # so Bazel does not track it as an input.
        ctx.actions.run_shell(
            inputs = [unformatted_file],
            outputs = [formatted_file],
            progress_message = "Running clang-format on %s" % unformatted_file.short_path,
            command = "clang-format %s > %s" % (unformatted_file.path, formatted_file.path),
        )

    # Expose the formatted files as this target's default outputs.
    return [DefaultInfo(files = depset(formatted_files))]

clang_format_srcs = rule(
    attrs = {
        "srcs": attr.label_list(allow_files = True),
    },
    implementation = _clang_format_srcs_impl,
)

Here’s what this clang_format_srcs rule is doing:

Go through every source file in the target’s srcs attribute
For each source file, declare an output source file with the formatted_ prefix
Run clang-format on the unformatted file to produce the formatted output.
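
The bookkeeping in those three steps can be sketched outside of Bazel in plain Python. This is only an illustration and not Starlark: plan_format_actions and the tuple layout are hypothetical names, the output directory is passed in by hand, and no command is actually executed.

```python
import posixpath

def plan_format_actions(src_paths, out_dir):
    """Mirror the rule's loop: one formatted_<basename> output per source."""
    actions = []
    for src in src_paths:
        out = posixpath.join(out_dir, "formatted_" + posixpath.basename(src))
        # Same command shape that the rule hands to ctx.actions.run_shell.
        command = "clang-format %s > %s" % (src, out)
        actions.append((src, out, command))
    return actions

for src, out, command in plan_format_actions(["main/hello-world.cc"], "bazel-bin/main"):
    print(command)
```

Each tuple corresponds to one action: a single input, a single declared output, and the shell command that connects them.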

Results

Now, by executing bazel build //main:hello-world, Bazel runs the actions in clang_format_srcs before running the cc_binary compilation actions on the formatted files. We can prove this by running bazel build with the --subcommands flag:

$ bazel build //main:hello-world --subcommands
..
SUBCOMMAND: # //main:hello-world_formatted_srcs [action 'Running clang-format on main/hello-world.cc']
.. 
SUBCOMMAND: # //main:hello-world [action 'Compiling main/formatted_hello-world.cc']
.. 
SUBCOMMAND: # //main:hello-world [action 'Linking main/hello-world']
..

Looking at the contents of formatted_hello-world.cc, it looks like clang-format did its job:

#include <ctime>
#include <string>

#include <iostream>

std::string get_greet(const std::string& who) { return "Hello " + who; }

void print_localtime() {
  std::time_t result = std::time(nullptr);
  std::cout << std::asctime(std::localtime(&result));
}

int main(int argc, char** argv) {
  std::string who = "world";
  if (argc > 1) {
    who = argv[1];
  }
  std::cout << get_greet(who) << std::endl;
  print_localtime();
  return 0;
}

If all you want are the formatted sources without compiling them, you can build the target with the _formatted_srcs suffix from clang_format_srcs directly:

$ bazel build //main:hello-world_formatted_srcs
INFO: Analysed target //main:hello-world_formatted_srcs (0 packages loaded).
INFO: Found 1 target...
Target //main:hello-world_formatted_srcs up-to-date:
  bazel-bin/main/formatted_hello-world.cc
INFO: Elapsed time: 0.247s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

Questions to Ask Before Writing A Bazel Rule

2018-07-01T00:01:00Z

Posted on July 1, 2018


Do you need a rule? Can you write a macro to compose and reuse existing rules? Or an aspect to traverse the existing build graph and execute additional actions?
What does your rule do? Does it already exist?
What files, if any, does it take as inputs?
What tool does it use? A compiler? A shell script?
Is the tool deterministic? Does every invocation of the tool with the same inputs generate the same outputs?
How is the tool provided to the rule? A binary installed in /usr/bin? A repository rule? Toolchains?
What output files does it generate?
Does the rule depend on the outputs of other rules using providers?
Does the rule provide inputs to other rules using providers?
What actions do you need to construct in order to generate the output files from the input files using the tool?

Grok Your Bazel Build: The Action Graph

2018-03-27T00:01:00Z

Posted on March 27, 2018


Bazel has powerful tools to inspect and monitor your build processes. A recent addition is the Action Graph.

The action graph is different from the target dependency graph, which is generated from Bazel’s loading phase. You might know the target graph from bazel query:

→ bazel query 'deps(//my:target)' --output=graph > target_graph.in
→ dot -Tpng < target_graph.in > target_graph.png
→ open target_graph.png

If you’re looking for the target graph, check out this Bazel blog post on visualizing your build.

The action graph contains a different set of information: file-level dependencies, full command lines, and other information Bazel needs to execute the build. If you are familiar with Bazel’s build phases, the action graph is the output of the loading and analysis phases and is used during the execution phase.

However, Bazel does not necessarily execute every action in the graph. It only executes an action if it has to; that is, the action graph is a superset of what is actually executed.

The action graph is generated by:

validating the target graph
analyzing the target graph
creating artifact representations
resolving artifacts’ filepaths to the relative paths in the execution root
applying any required configuration, like platform-specific compiler flags.

You can obtain it using bazel dump with these flags:

--action_graph=path/to/output: Specifies the location of the output file. This is relative to the WORKSPACE root. You can also provide an absolute path.
--action_graph:targets=//my:target: Specifies the target(s) you’re interested in.
--action_graph:include_cmdline=true: Specifies whether to include the full generated command lines.

Dumping the graph

Let’s walk through an example of dumping the action graph of an Android application build. We will use the Android example packaged in the Bazel source tree. Note that this requires the Android SDK and NDK:

→ git clone https://github.com/bazelbuild/bazel bazel_graph && cd bazel_graph
# Uncomment android_{sdk, ndk}_repository lines in WORKSPACE
→ grep "android_" WORKSPACE
android_sdk_repository(name = "androidsdk")
android_ndk_repository(name = "androidndk")

Add --experimental_strict_action_env to the project .bazelrc to prevent $PATH pollution.

→ cat .bazelrc
build --experimental_strict_action_env

The android_binary target is //examples/android/java/bazel:hello_world. It’s defined in examples/android/java/bazel/BUILD:

android_binary(
    name = "hello_world",
    srcs = glob([
        "MainActivity.java",
        "Jni.java",
    ]),
    manifest = "AndroidManifest.xml",
    resource_files = glob(["res/**"]),
    deps = [
        ":jni",
        ":lib",
        "@androidsdk//com.android.support:appcompat-v7-25.0.0",
    ],
)

Let’s start by running the loading and analysis phase, and skipping the execution phase with the --nobuild flag.

→ bazel build --nobuild //examples/android/java/bazel:hello_world
INFO: Analysed target //examples/android/java/bazel:hello_world (31 packages loaded).
INFO: Found 1 target...
INFO: Elapsed time: 9.818s
INFO: Build completed successfully, 0 total actions

Note 0 total actions. This doesn’t mean that there are no generated actions, but that there are no executed actions.

Let’s dump the graph from the Bazel server:

→ bazel dump --action_graph=action_graph.bin \
    --action_graph:targets=//examples/android/java/bazel:hello_world \
    --action_graph:include_cmdline=true
Warning: this information is intended for consumption by developers
only, and may change at any time.  Script against it at your own risk!

Dumping action graph to 'action_graph.bin'

We specify

--action_graph:targets=//examples/android/java/bazel:hello_world

because the default value of the flag is ..., which will dump every analyzed target, recursively.

Check that the output is not empty:

→ ls -al action_graph.bin
-rw-r--r--  1 jin  staff  101765 Mar 24 23:02 action_graph.bin

If it is empty, it means that Bazel hasn’t analyzed the target. Make sure that build --nobuild and dump --action_graph:targets are referencing the same target.

Reading the graph

action_graph.bin is a raw protobuf message. analysis.proto is the protobuf that defines the types of the message. Let’s use the protobuf compiler, protoc, to decode it:

→ protoc --decode=analysis.ActionGraphContainer \
    src/main/protobuf/analysis.proto \
    < action_graph.bin > action_graph.txt

For reference, I’ve uploaded my action_graph.txt here. It’s in human readable plain text, so that’s great!

Analyzing the graph

Now that it is possible to read the graph, we can analyze some of the useful bits: the file contains a ton of information!

The top level message type is ActionGraphContainer. Let’s investigate each of these message types one by one.

message ActionGraphContainer {
  repeated Artifact artifacts = 1;
  repeated Action actions = 2;
  repeated Target targets = 3;
  repeated DepSetOfFiles dep_set_of_files = 4;
  repeated Configuration configuration = 5;
  repeated AspectDescriptor aspect_descriptors = 6;
  repeated RuleClass rule_classes = 7;
}

RuleClass

Starting with the simplest, we have one RuleClass message.

rule_classes {
  id: "0"
  name: "android_binary"
}

This is no surprise: we dumped the action graph of an android_binary target.

Target

targets {
  id: "0"
  label: "//examples/android/java/bazel:hello_world"
  rule_class_id: "0"
}

Correspondingly, there’s also one Target message. We see that it encodes the id of the target’s RuleClass. In this case, the rule_class_id refers to android_binary.

Configuration

configuration {
  id: "0"
  mnemonic: "darwin-fastbuild"
  platform_name: "darwin"
}

We have one build configuration mnemonic: darwin-fastbuild. This is a reference to our execution platform (macOS) and the fastbuild compilation mode.

Artifact

artifacts {
  id: "16"
  exec_path: "external/local_jdk/bin/javac"
}

artifacts {
  id: "190"
  exec_path: "bazel-out/android-armeabi-v7a-fastbuild/bin/external/androidsdk/com.android.support/_aar/unzipped/resources/support-vector-drawable-25.0.0"
  is_tree_artifact: true
}

artifacts {
  id: "227"
  exec_path: "examples/android/java/bazel/res/values/styles.xml"
}

artifacts {
  id: "229"
  exec_path: "bazel-out/host/genfiles/external/androidsdk/aapt_runner.sh"
}

artifacts {
  id: "349"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_unsigned.apk"
}

Every file that Bazel handles is an Artifact. It represents:

a source file
or a derived output file

The “file” can also be a directory (e.g. artifact 190), which is referred to as a TreeArtifact. Check out the detailed documentation on the different Artifact types here.

exec_path is the relative path of the Artifact within the execution root. The execution root is the working directory where Bazel executes all actions during the execution phase:

→ bazel info execution_root
.....................
/private/var/tmp/_bazel_jin/ed227ac31d5e65f9c3effb1d1fe2605e/execroot/io_bazel

The exec_paths come in different prefix flavours:

external/..: Contains symlinks to external repositories, such as @local_jdk and @androidsdk.
examples/..: Contains the source files. This is a symlink to the actual examples/ folder.
bazel-out/host/genfiles/..: Contains generated sources, usually from genrules, for the host target BuildConfiguration.
bazel-out/darwin-fastbuild/bin/..: Contains derived binary outputs for the darwin target BuildConfiguration.
bazel-out/android-armeabi-v7a-fastbuild/bin/..: Contains derived binary outputs for the android-armeabi-v7a target BuildConfiguration.

DepSetOfFiles

dep_set_of_files {
  id: "198"
  transitive_dep_set_ids: "136"
  direct_artifact_ids: "292"
}

dep_set_of_files {
  id: "136"
  transitive_dep_set_ids: "137"
  direct_artifact_ids: "337"
}

Depset is a data structure for collecting data on transitive dependencies. It’s optimized to be time and space efficient around merging, because it’s common to have very large depsets, scaling to hundreds of thousands of files. Read the documentation to learn more about depsets.

In our protobuf, a dep_set_of_files can refer to other depsets with transitive_dep_set_ids, or directly to artifacts with direct_artifact_ids.

It’s crucial to highlight the ability to recursively refer to other depsets: it’s an important catalyst for space efficiency. Rule implementations should not flatten depsets to lists unless they are at the top level. Flattening large depsets incurs huge memory consumption.
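
To make the nesting concrete, here is a toy depset in Python. It is a sketch of the idea only, not Bazel's implementation: the DepSet class and its to_list method are hypothetical, and the traversal order here is arbitrary. The IDs are taken from the dump above; the contents of depset 137 are not shown there, so it is left empty as a placeholder.

```python
class DepSet:
    """A toy nested set: direct items plus references to other DepSets."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)  # shared by reference, never copied

    def to_list(self):
        """Flatten on demand, visiting each nested DepSet only once."""
        seen_sets, seen_items, out = set(), set(), []
        stack = [self]
        while stack:
            node = stack.pop()
            if id(node) in seen_sets:
                continue
            seen_sets.add(id(node))
            for item in node.direct:
                if item not in seen_items:
                    seen_items.add(item)
                    out.append(item)
            stack.extend(reversed(node.transitive))
        return out

d137 = DepSet()  # placeholder: contents not shown in the dump
d136 = DepSet(direct=["337"], transitive=[d137])
d198 = DepSet(direct=["292"], transitive=[d136])

print(d198.to_list())
```

Building d198 costs one small object no matter how large d136 is, because d136 is referenced rather than copied. The expensive part is to_list, which is why flattening is best left to the top level.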

Action

Finally, we have Action. An action, as described in the protobuf’s documentation, is a function from Artifact to Artifact. It might be easier to think of an Action as all of the information required to create an output file, which usually contains a command line representation.

actions {
  target_id: "0"
  action_key: "e121f7eb29e0828eef502582d5134d37"
  mnemonic: "ResourceExtractor"
  configuration_id: "0"
  arguments: "bazel-out/host/bin/external/bazel_tools/tools/android/resource_extractor"
  arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
  arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
  input_dep_set_ids: "198"
  output_ids: "338"
}

targets {
  id: "0"
  label: "//examples/android/java/bazel:hello_world"
  rule_class_id: "0"
}

dep_set_of_files {
  id: "198"
  transitive_dep_set_ids: "136"
  direct_artifact_ids: "292"
}

artifacts {
  id: "338"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
}

artifacts {
  id: "292"
  exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
}

In this selected Action, we are extracting resources out of a jar using a tool called resource_extractor. The full command line is captured in the list of arguments, with the first argument as the executable. Every file referenced in the command line must be an Artifact in either the transitive depset(s) listed in input_dep_set_ids or the artifact(s) listed in output_ids. This enables Bazel to discover the actions to run in order to produce a requested output artifact.

The action_key is computed based on the command line that will be executed, which contains information like compiler flags, library locations and system headers. This enables Bazel to keep track of actions to invalidate and re-run incrementally, and cache aggressively if there is no need to rerun an action.
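
The caching idea can be illustrated with a toy key function in Python. This is an assumption-laden sketch: toy_action_key is a hypothetical name, it hashes only the argument list with SHA-256, and Bazel's real key computation is not reproduced here. The "--flag" argument is made up to show a changed command line.

```python
import hashlib

def toy_action_key(arguments):
    """Digest of the argument list; a stand-in, not Bazel's real algorithm."""
    digest = hashlib.sha256()
    for arg in arguments:
        # Length-prefix each argument so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(("%d:" % len(arg)).encode("utf-8"))
        digest.update(arg.encode("utf-8"))
    return digest.hexdigest()

args = [
    "bazel-out/host/bin/external/bazel_tools/tools/android/resource_extractor",
    "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar",
]
key = toy_action_key(args)
print(key == toy_action_key(list(args)))         # same command line, same key
print(key == toy_action_key(args + ["--flag"]))  # changed command line, new key
```

An unchanged key means the previous result can be reused; a changed key means the action has to run again.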

The Action’s configuration_id is 0, as this action is executed with the darwin-fastbuild BuildConfiguration.

Each Action has a mnemonic, which is a short human readable string to quickly understand what the Action is doing. We can grep the protobuf for all mnemonics to see mostly Android-related actions, like AndroidDexer and RClassGenerator.

→ grep "mnemonic" action_graph.txt | sort | uniq
  mnemonic: "AaptPackage"
  mnemonic: "AaptSplitResourceApk"
  mnemonic: "AndroidBuildSplitManifest"
  mnemonic: "AndroidDexManifest"
  mnemonic: "AndroidDexer"
  mnemonic: "AndroidInstall"
  mnemonic: "AndroidStripResources"
  mnemonic: "AndroidZipAlign"
  mnemonic: "ApkBuilder"
  mnemonic: "ApkSignerTool"
  mnemonic: "CppLink"
  mnemonic: "Desugar"
  mnemonic: "DexBuilder"
  mnemonic: "DexMerger"
  mnemonic: "Fail"
  mnemonic: "FileWrite"
  mnemonic: "InjectMobileInstallStubApplication"
  mnemonic: "JavaDeployJar"
  mnemonic: "JavaSourceJar"
  mnemonic: "Javac"
  mnemonic: "ManifestMerger"
  mnemonic: "RClassGenerator"
  mnemonic: "ResourceExtractor"
  mnemonic: "ShardClassesToDex"
  mnemonic: "Symlink"
  mnemonic: "Turbine"
  mnemonic: "darwin-fastbuild"
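
The last line of that output, darwin-fastbuild, is not an action: Configuration messages carry a mnemonic field too, so a plain grep picks it up. A small Python sketch of the same tally, with counts, is below. count_mnemonics is a hypothetical helper and the sample lines are a hand-written excerpt in the same text format, not the real dump.

```python
import re
from collections import Counter

MNEMONIC_RE = re.compile(r'^\s*mnemonic:\s*"([^"]*)"')

def count_mnemonics(lines):
    """Count each mnemonic value in protobuf text-format lines."""
    counts = Counter()
    for line in lines:
        match = MNEMONIC_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    'actions {',
    '  mnemonic: "ResourceExtractor"',
    '}',
    'actions {',
    '  mnemonic: "Javac"',
    '}',
    'configuration {',
    '  mnemonic: "darwin-fastbuild"',
    '}',
]
for name, n in sorted(count_mnemonics(sample).items()):
    print(n, name)
```

To run it on the real file, pass an open file object, for example count_mnemonics(open("action_graph.txt")).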

Summary

The action graph is a powerful tool to gain introspection into Bazel’s analysis and execution phases. It provides just enough information to visualize the Action data structure before it is transformed into an executable command line as seen with the --subcommands flag.

If you wish to learn more about the underlying data representation of the action graph, check out the design document of Bazel’s parallel evaluation and incrementality model, Skyframe.

5 minute guide to Bazel, Part 2: Command lines and tools

2018-02-19T00:01:00Z

Posted on February 19, 2018


The aim of this guide is to get you up and running with Bazel as fast as possible. The steps will assume you have Bazel installed.

This part will show how to run a command line using genrule. This rule is the generic way to specify sources, a tool (like a shell script), a command line, and the outputs. You can think of it as a way to define a function in your BUILD file with the following signature:

genrule :: (name, sources, tool, command) -> output

In this example, we want to create a C source file, copy it using cp, run sed on the copy with a shell script, and build an executable from the result.

Let’s get started in an empty directory called dir.

Create an empty WORKSPACE file.

dir $ touch WORKSPACE

Create a file called main.c and write some C in it.

// dir/main.c

#include <stdio.h>

int main(int argc, char **argv) {
  printf("Hello Blaze.\n");
  return 0;
}

Write a BUILD file with the genrule to copy main.c.

# dir/BUILD

genrule(
  name = "copy_of_main",
  srcs = ["main.c"],
  outs = ["copy_of_main.c"],
  cmd = "cp $< $@",
)

$< expands to the location of main.c, and $@ expands to the location of copy_of_main.c. See the full list of supported variables here.
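
As a rough mental model, the expansion can be mimicked in Python. expand_genrule_cmd is a hypothetical helper, and the single-file checks reflect my understanding that these shorthands are meant for rules with exactly one source and one output; treat the exact error behaviour as an assumption rather than a description of Bazel.

```python
def expand_genrule_cmd(cmd, srcs, outs):
    """Expand "$<" and "$@" in a genrule-style command string."""
    if "$<" in cmd and len(srcs) != 1:
        raise ValueError("$< needs exactly one source file")
    if "$@" in cmd and len(outs) != 1:
        raise ValueError("$@ needs exactly one output file")
    expanded = cmd
    if "$<" in cmd:
        expanded = expanded.replace("$<", srcs[0])
    if "$@" in cmd:
        expanded = expanded.replace("$@", outs[0])
    return expanded

print(expand_genrule_cmd("cp $< $@", ["main.c"], ["bazel-genfiles/copy_of_main.c"]))
```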

Let’s build this target, //:copy_of_main.

dir $ bazel build //:copy_of_main
....................
INFO: Analysed target //:copy_of_main (7 packages loaded).
INFO: Found 1 target...
Target //:copy_of_main up-to-date:
  bazel-genfiles/copy_of_main.c
INFO: Elapsed time: 10.323s, Critical Path: 0.08s
INFO: Build completed successfully, 2 total actions

dir $ cat bazel-genfiles/copy_of_main.c
#include <stdio.h>

int main(int argc, char **argv) {
  printf("Hello Blaze.\n");
  return 0;
}

The file is copied successfully!

Use the tools attribute to specify a separate tool to run in the cmd string.

We want to substitute the word “Blaze” with “Bazel” in the source code, because that’s the name the build system was open sourced with. Let’s write the genrule for that:

# dir/BUILD

# ...

genrule(
  name = "renamed_main",
  srcs = ["copy_of_main.c"],
  outs = ["renamed_main.c"],
  tools = ["substitute.sh"],
  cmd = "$(location substitute.sh) 'Blaze' 'Bazel' $< $@",
)

location is Bazel’s helper function to resolve the location of the tool when this command is executed.

Then, create a file substitute.sh that calls out to sed:

#!/bin/bash

sed "s/$1/$2/" "$3" > "$4"

Don’t forget to make it executable with chmod u+x substitute.sh.
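
For readers less familiar with sed, here is roughly what the script does, sketched in Python. Without the g flag, sed replaces only the first match on each line, which the count=1 argument mirrors. Note that Python's re syntax differs from sed's basic regular expressions, so this is an approximation that holds for plain words like the ones used here; substitute is a hypothetical helper name.

```python
import re

def substitute(pattern, replacement, text):
    """Replace the first match of pattern on each line, like sed 's/p/r/'."""
    return "".join(
        re.sub(pattern, replacement, line, count=1)
        for line in text.splitlines(keepends=True)
    )

source = 'printf("Hello Blaze.\\n");\n'
print(substitute("Blaze", "Bazel", source), end="")
```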

Build the target //:renamed_main.

dir $ bazel build :renamed_main
INFO: Analysed target //:renamed_main (0 packages loaded).
INFO: Found 1 target...
Target //:renamed_main up-to-date:
  bazel-genfiles/renamed_main.c
INFO: Elapsed time: 0.261s, Critical Path: 0.07s
INFO: Build completed successfully, 2 total actions

dir $ cat bazel-genfiles/renamed_main.c
#include <stdio.h>

int main(int argc, char **argv) {
  printf("Hello Bazel.\n");
  return 0;
}

We are now using the correct name.

To wrap it all up, let’s use cc_binary from Part 1.

# dir/BUILD

cc_binary(
  name = "hello_bazel",
  srcs = [":renamed_main"],
)

dir $ bazel run //:hello_bazel
INFO: Analysed target //:hello_bazel (3 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
  bazel-bin/hello_bazel
INFO: Elapsed time: 5.817s, Critical Path: 0.52s
INFO: Build completed successfully, 5 total actions

INFO: Running command line: bazel-bin/hello_bazel
Hello Bazel.

This is how we can use genrule to preprocess files before passing them in to other rules. It’s a simple and flexible way to create pipelines using Bazel.

5 minute guide to Bazel, Part 1: C and C++

2018-02-18T00:01:00Z

Posted on February 18, 2018


The aim of this guide is to get you up and running with Bazel as fast as possible. The steps will assume you have Bazel installed.

Some quick notes before we start: the most important idea about Bazel is that it is declarative.

You should never need to type out the intermediary build steps; that is the responsibility of the language/platform rule authors. The build steps are hidden away in the rule implementations so you can focus on just telling Bazel what sources to build.

Let’s get started. Each example here assumes that you’re in an empty directory called dir.

Create an empty WORKSPACE file.

dir $ touch WORKSPACE

Create a file called main.c and write some C in it.

// dir/main.c

#include <stdio.h>

int main(int argc, char **argv) {
  printf("Hello Bazel.\n");
  return 0;
}

Write a BUILD file and tell Bazel you want an executable built from main.c.

# dir/BUILD

cc_binary(
  name = "hello_bazel",
  srcs = ["main.c"],
)

The cc_binary rule is all Bazel needs to know that you want to build C/C++ sources.

Build and run the hello_bazel target, //:hello_bazel.

dir $ bazel run //:hello_bazel
...............
INFO: Analysed target //:hello_bazel (9 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
  bazel-bin/hello_bazel
INFO: Elapsed time: 12.423s, Critical Path: 0.44s
INFO: Build completed successfully, 5 total actions

INFO: Running command line: bazel-bin/hello_bazel
Hello Bazel.

// refers to the directory level where the WORKSPACE is. : specifies a target in a BUILD file.
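
The two parts of a label can be pulled apart with a few lines of Python. This is a simplified sketch: parse_label is a hypothetical helper that handles only absolute labels within the current workspace and ignores external repositories and relative labels.

```python
def parse_label(label):
    """Split an absolute label like //pkg/path:target into (package, target)."""
    if not label.startswith("//"):
        raise ValueError("only absolute //... labels are handled in this sketch")
    body = label[2:]
    if ":" in body:
        package, target = body.split(":", 1)
    else:
        # Shorthand: //foo/bar means //foo/bar:bar.
        package, target = body, body.rsplit("/", 1)[-1]
    return package, target

print(parse_label("//:hello_bazel"))
print(parse_label("//examples/android/java/bazel:hello_world"))
```

An empty package, as in //:hello_bazel, means the BUILD file sits right next to the WORKSPACE file.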

If you just want to build it, use bazel build instead of bazel run.

dir $ bazel build //:hello_bazel
INFO: Analysed target //:hello_bazel (9 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
  bazel-bin/hello_bazel
INFO: Elapsed time: 2.058s, Critical Path: 0.24s
INFO: Build completed successfully, 5 total actions

The executable is in the bazel-bin symlink: bazel-bin/hello_bazel.

dir $ cp bazel-bin/hello_bazel hello_bazel
dir $ ./hello_bazel
Hello Bazel.

That’s it!

Semantics | Notes on Types and Programming Languages

2017-05-03T00:01:00Z

Posted on May 3, 2017

Semantics

Notes on Types & Programming Languages by Benjamin Pierce (2002)

The design of a programming language can be divided into two parts: syntax and semantics.

The syntax describes what it looks like.

The semantics describes what it should do.

There are many ways a program can be written with valid syntax but turn nonsensical when evaluated. These nonsensical evaluations are known as runtime errors.

Semantics formally describes how programs should be evaluated. Programs that are well-formed according to a language’s semantics do not get stuck.

There are three main styles of describing semantics: operational, denotational, and axiomatic.

Operational semantics

Operational semantics uses the idea that languages are abstract machines and evaluation of a program is a series of state transitions from an initial to a final state.

Transition functions define how each state transitions to the next, if there is one. If there is no such next state, the machine has either completed its evaluation successfully or hit a runtime error and got stuck. The program halts in both cases.

Every term in the computer program has some meaning, and its form finalizes when the state transitions are complete. State transitions may be single or multi-step.

There are two major ways to write operational semantics: small-step or big-step.

Small-step semantics breaks down behaviour into granular simplification steps. A single simplification step might not reach a finalized form; sometimes multiple steps are needed.

Big-step semantics composes multiple small-step rules that evaluate into a finalized form into a single rule. Such a rule is equivalent to its multi-step counterpart.

Since operational semantics is styled after abstract machine behaviour, it is useful as a reference for implementation.
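
As a small illustration, here is a sketch of both styles in Python for the boolean language with if-expressions from the early chapters of the book. The tuple encoding of terms and the function names are my own choices for this note; the rule names in the comments follow the book.

```python
# Terms: True, False, or ("if", cond, then_branch, else_branch).

def is_value(term):
    return term is True or term is False

def step(term):
    """One small-step transition, or None if no rule applies."""
    if is_value(term):
        return None
    _, cond, then_branch, else_branch = term
    if cond is True:
        return then_branch            # E-IfTrue
    if cond is False:
        return else_branch            # E-IfFalse
    reduced = step(cond)              # E-If: reduce the condition first
    if reduced is None:
        return None
    return ("if", reduced, then_branch, else_branch)

def run_small_step(term):
    """Apply step until no transition exists; count the transitions."""
    steps = 0
    while True:
        nxt = step(term)
        if nxt is None:
            return term, steps
        term, steps = nxt, steps + 1

def eval_big_step(term):
    """Evaluate straight to a final value in one recursive judgement."""
    if is_value(term):
        return term
    _, cond, then_branch, else_branch = term
    return eval_big_step(then_branch if eval_big_step(cond) else else_branch)

program = ("if", ("if", True, False, True), True, ("if", False, True, False))
print(run_small_step(program))
print(eval_big_step(program))
```

Both functions arrive at the same value for this program; the small-step version also exposes how many transitions it took to get there.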

Origins: John McCarthy on Semantics of Lisp (1960)

Denotational semantics

Denotational semantics uses the idea that languages are mathematical objects. Unlike operational semantics, evaluation and implementation details are abstracted away.

An interpretation function is defined to map terms in a program to elements in semantic domains (also known as its denotation), removing any occurrences of the original syntax.

Semantic domains are designed to model specific language features, and their study is called domain theory.

Checking whether two programs are the same is achievable by comparing their denotations.
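
A minimal sketch of this in Python, for a toy language of integer addition and multiplication. The term encoding and the denote function are hypothetical; the semantic domain here is simply Python's integers.

```python
def denote(term):
    """Map a term to its denotation, a plain integer."""
    if isinstance(term, int):
        return term
    op, left, right = term
    if op == "add":
        return denote(left) + denote(right)
    if op == "mul":
        return denote(left) * denote(right)
    raise ValueError("unknown operator: %r" % (op,))

p = ("add", 2, ("mul", 3, 4))
q = ("add", ("mul", 4, 3), 2)
print(denote(p), denote(q), denote(p) == denote(q))
```

The two terms differ syntactically, but no syntax is left once they are mapped into the domain, so comparing them is just comparing integers.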

Laws can be derived from the semantic domains and are used for language specifications to verify correctness of an implementation.

The properties of the semantic domains can be used to show impossible instances in a language.

Origins: Christopher Strachey, Dana Scott on “Toward a mathematical semantics for computer languages” (1970, 1971)

Axiomatic semantics

Axiomatic semantics is closely related to Hoare Logic. Instead of deriving laws from operational or denotational behaviour definitions, the laws themselves define the semantics of the language.

This reversal simplifies reasoning about a program, leading to developments in software verification.

Two different program implementations with the same set of initial and final assertions (laws) are considered to have the same semantics.

The terms that happen between assertions are just used to prove the assertions themselves and do not contribute to the semantics.

Assertions define relationships between variables and other moving parts in a program, and some of these assertions remain invariant throughout execution. This is the important invariance concept that underlies axiomatic semantics.
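
A rough way to get a feel for this is to write the assertions as runtime checks in Python. This is only an approximation, since Hoare Logic proves the assertions for all inputs rather than checking them on one run; the function names are hypothetical.

```python
def sum_to(n):
    assert n >= 0                          # precondition
    total, i = 0, 0
    while i < n:
        assert total == i * (i + 1) // 2   # loop invariant
        i += 1
        total += i
    assert total == n * (n + 1) // 2       # postcondition
    return total

def sum_to_closed_form(n):
    assert n >= 0                          # same precondition
    total = n * (n + 1) // 2
    assert total == n * (n + 1) // 2       # same postcondition
    return total

print(sum_to(10), sum_to_closed_form(10))
```

The two bodies look nothing alike, but they satisfy the same initial and final assertions, so from the axiomatic point of view they mean the same thing.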

Origins: Tony Hoare on Hoare Logic (1969)

A brief guide for potential NUS Computer Science undergraduates

2017-02-26T00:00:00Z

Posted on February 26, 2017


Update Dec 2017: I’ve graduated from NUS. Specific references in this article about NUS and the computing faculty may be outdated. Please contact me if there’s information that should be updated.

As there has been growing interest in CS undergraduate courses over the past few years, I would like to share my experience as a CS major at National University of Singapore, and also shed light on the common misconceptions that people may have. This essay will also be focussed on National University of Singapore’s curriculum and programmes, because I’m most familiar with it.

My background: I’m a fourth year CS major. Prior to this, I graduated from Ngee Ann Polytechnic with a Diploma in Network Systems & Security.

I made the decision to read CS upon realising the gaps in my knowledge of technology.

I felt confident in designing and implementing network systems, but never understood why network protocols were designed in that manner. A quick Wikipedia search on an algorithm used in network protocols, Dijkstra’s Shortest Path, inundated me with so much math that I quickly realized a CS education would provide the foundational theoretical knowledge to understand these algorithms.

Should I study CS?

You know how to use computers as a tool to get things done. However, you’ve probably never learnt why and how they work behind the scenes.

Consider Google Search. Have you ever wondered why it seems to know everything and how it works behind the scenes? How did your search come back with millions of results in a fraction of a second? How did the information get from Google’s servers to your screen?

CS is the science of computational processes, like how Physics is the science of nature. It’s a foundational science that enables you to solve problems across disciplines and subfields.

It’s about taking problems, figuring out what needs to be solved, and devising a step-by-step procedure to compute the solution. These problems come from other fields like healthcare, finance, environmental science, space exploration, game development, or something that you have an interest in. Anything.

If you’re a problem solver, CS will sharpen your mind to produce articulate and well-reasoned solutions, and to communicate them across domains.

If you’re not, CS will equip you with the mental toolbox to approach complex problems with confidence.

You’ll learn to break down complex problems into little problems that can be solved systematically. I recommend reading CS if you’ve enjoyed mathematical and logical challenges.

If this sounds interesting to you, then yes, go forth and study CS.

Computer Science is difficult.

CS is not a walk in the park.

No decent undergraduate degree program is a walk in the park.

A well-designed curriculum will begin with battle-tested fundamental courses. They will expand your mind and change the way you think about the world.

Don’t assume that you’re just going to learn how to program; you can do that in 2 weeks with an online course.

CS will flex your brain muscles and teach you how to reason rigorously in the various subfields, such as computer graphics, artificial intelligence and even programming languages themselves.

Wikipedia has a good outline of the subfields in CS.

What is CS at National University of Singapore like?

CS is taught in the School of Computing (SoC).

There are about 1300 undergraduates in SoC (as of 2017) across CS, Information Systems, Information Security, Business Analytics and Computational Biology.

As a freshman, you’ll do a lot of common modules, so the first year tends to be similar to that of other majors.

The full SoC CS curriculum is available on the website.

The fundamental modules include:

introductory programming methodologies (CS1010 and S, X, E variants, CS1101S, CS2030)
data structures and algorithms (CS2040, CS3230)
calculus (MA1521)
discrete mathematics (CS1231)
linear algebra (MA1101R)
software engineering (CS2103)

After these, it’s generally assumed that you know how to code and are able to pick up new languages as needed.

For example, a parallel programming module will assume that you know how to code in C, or can pick it up in 1-2 weeks, since the syllabus is focussed on parallelism concepts.

The most hardware-oriented module is CS2100, Computer Organisation. You’ll learn lower-level concepts like logic, CPU design and basic assembly programming. Anything lower than that enters the realm of Computer Engineering, where modules are coded with CG instead of CS. CS students are not required to do electrical and electronic engineering modules.

From the second year onwards, you’ll specialize in a technical area of study called a Focus Area. There are ten of them.

Most students will spend their summer vacations on internships and exchanges. NUS Overseas Colleges is a popular choice for the entrepreneurially minded.

I highly recommend doing internships that are self-sourced, and not on the list provided in the faculty internship portal. Self-sourced internships usually result in more interesting companies and projects.

A SoC alumnus has compiled information on self-sourcing internships in Project Intern.

Don’t worry about a lack of programming background.

Most come in fresh – I didn’t (I learnt programming in NP), but I didn’t feel advantaged at any point in time.

I felt challenged by the first programming module I took in SoC, CS1101S, which I highly recommend freshmen take if they have the opportunity. It replaces CS1010.

If you want to prepare yourself, start reading /r/programming and Hacker News, and familiarize yourself with the idea that CS is not solely about programming but about understanding how to compute things.

Programming is a tool for implementing computations that solve a problem, much as a hammer is a tool for driving nails into a piece of wood to create something.

However, disliking programming may lead to difficulty understanding CS materials, as they are intertwined.

Most NUS CS undergraduates will do Java at some point, especially in the Data Structures & Algorithms module. While that is a good starting point, don’t be afraid to branch out to other languages once you feel that you know Java well enough.

A quick glance at http://learnxinyminutes.com will show you the plethora of programming languages out there.

Popular programming languages are typically general purpose, meaning they can be used in many applications, but each language has its own niche. For example, Python and Ruby are often used for scripting, while Java is often used for building enterprise systems. There are many guides on what language to learn first; I will not delve into that here.

Learning one language well makes learning successive languages easier.

But I’m bad at mathematics!

CS has an enormous intersection with mathematics.

I disliked mathematics prior to University. The lack of interest and curiosity stemmed from rote learning formulas in secondary school and tuition classes. It just didn’t seem like an interesting subject and I associated it with boredom and dread.

However, mathematics pedagogy in University is on a whole other level. You will finally understand why certain things in math are the way they are, and maybe it’ll start to seem more interesting to you.

Your teachers are now passionate professors who are experts in their research fields, and are usually patient enough to explain concepts to you if you ask.

I’m still not great at mathematics, but NUS CS has made me see math in a different light. For example, matrices in linear algebra are used heavily in Computer Graphics and Game Development, and graph theory is used in databases and computer program analysis.

Both CS and mathematics force you to think in terms of abstractions. Improving in one domain helps in the other.

Do grades matter in CS?

The idea of studying for grades is probably ingrained deep into you after the 10-something years toiling through the Singapore education system. It’s time to get rid of that.

In CS, grades serve as your personal benchmark. Being able to score well in a module gives you a sense of confidence that you have understood the subject materials.

This does not mean you can mess school up and get away with it. Consistently scoring C’s and D’s is a signal that you have not understood the material intuitively, which will pose problems when you’re taking higher-level modules.

Grades do not mean much when applying for industry roles, such as software engineering. No interviewer has ever asked me for my grades.

Relevant internships and side-projects, on the other hand, are great ways to convey your skills to potential employers.

From these, potential employers can infer your willingness to learn and try new things, which translates to a potentially great attitude in the working environment. Self-directed learning is important in CS.

However, grades are still relatively important for graduate school applications, along with a research portfolio.

What are my job prospects?

CS is one of the most versatile degrees to find work with.

It signifies that you’ve been through a rigorous curriculum and possess the ability to take on large and complex problems.

It is up to you to prove that you’re able to do it.

You can be a web developer, AI researcher, data scientist, software engineer, devops engineer, mobile app developer… the list goes on.

Each of those roles requires a specialized skill set, but CS forms the foundation of all of them.

Non-CS domains, like biology and finance, are full of problems that can be readily solved using CS techniques. See: DNA Editing with CRISPR, human genome project, anti-bank fraud systems and insider trading analysis.

Summary

CS is a ubiquitous field.

It manifests itself in many different forms, and in many cases, it doesn’t even involve a computer.

To succeed in University, you’ll need to learn how to learn. That is, identify your gaps in knowledge and figure out how to fill them up efficiently.

The Singapore technology and research ecosystem is expanding rapidly with massive government support — there has never been a better time to pursue CS.

Feel free to ping me on Twitter or by email with questions or comments.

All the best!

Resources

Notes to NUS CS Freshmen, from the future

webuild.sg - List of technology meetup groups in Singapore

engineers.sg - Meetup video recordings in Singapore

What every computer science major should know - Matt Might

Plug: NUS Hackers

I’m part of the NUS Hackers core team.

We’re a group of people who want to spread the hacker culture.

That means not hesitating to build and break stuff for fun and knowledge, sharing, and just being curious about how things work. Our weekly Friday Hacks cover very different topics; the idea is to expose the NUS community to various technical topics they wouldn’t have otherwise learnt in class.

We also run a hackerspace in NUS.

What about Information Systems?

IS is a degree that sits at the intersection of business and IT, while CS is deeply grounded in mathematics and logic. Many CS graduates eventually exit academia into engineering roles though.

It’s easier to see the difference by comparing the core modules of these majors, IS first and then CS:

IS2101 Business and Technical Communication*
IS2102 Requirements Analysis and Design
IS2103 Enterprise Systems Development Concepts
IS2104 Software Team Dynamics
IS3101 Management of Information Systems
IS3102 Enterprise Systems Development Project
IS4100 IT Project Management

CS2010 Data Structures and Algorithms II
CS2100 Computer Organisation
CS2103T Software Engineering
CS2105 Introduction to Computer Networks
CS2106 Introduction to Operating Systems
CS3230 Design and Analysis of Algorithms
