Originally written as an answer for StackOverflow.
Pretty-printers are excellent for enforcing style standards across the codebase. In this article, we’ll show how to use Bazel to generate pretty-printed sources in your build.
This method involves writing a new Bazel macro and rule. There is another method via aspects, but we are not covering that in this article.
For hermeticity reasons, Bazel does not modify your source files in place. If you want formatting-on-save (e.g. with gofmt
or prettier
), please use editor plugins instead.
As an example, let’s use the C++ tutorial from the Bazel C++ examples and clang-format
for pretty-printing.
Let’s first mess up the formatting of main/hello-world.cc
:
#include <ctime>
#include <string>
#include <iostream>
std::string get_greet(const std::string& who) { return "Hello " + who; }
void print_localtime() {
std::time_t result =
std::time(nullptr);
std::cout << std::asctime(std::localtime(&result));
}
int main(int argc, char** argv) {
std::string who = "world";
if (argc > 1) {who = argv[1];}
std::cout << get_greet(who) << std::endl;
print_localtime();
return 0;
}
And this is the BUILD file to build main/hello-world.cc
:
# In main/BUILD
cc_binary(
name = "hello-world",
srcs = ["hello-world.cc"],
)
clang_formatted_cc_binary
Since cc_binary
doesn’t know anything about clang-format
or pretty-printing in general, let’s create a macro called clang_formatted_cc_binary
and replace cc_binary
with it. The BUILD file now looks like this:
# In main/BUILD
load("//:clang_format.bzl", "clang_formatted_cc_binary")
clang_formatted_cc_binary(
name = "hello-world",
srcs = ["hello-world.cc"],
)
Next, create a file called clang_format.bzl
with a macro named clang_formatted_cc_binary
. The macro is currently just a wrapper around native.cc_binary
:
# In clang_format.bzl
def clang_formatted_cc_binary(**kwargs):
    native.cc_binary(**kwargs)
At this point, you can build the cc_binary
target, but it’s not running clang-format
yet. Let’s add an intermediary rule to do that in clang_formatted_cc_binary
which we’ll call clang_format_srcs
:
# In clang_format.bzl
def clang_formatted_cc_binary(name, srcs, **kwargs):
    # Using a filegroup for code cleanliness
    native.filegroup(
        name = name + "_unformatted_srcs",
        srcs = srcs,
    )
    clang_format_srcs(
        name = name + "_formatted_srcs",
        srcs = [name + "_unformatted_srcs"],
    )
    native.cc_binary(
        name = name,
        srcs = [name + "_formatted_srcs"],
        **kwargs
    )
Note that cc_binary now compiles the formatted sources, but we kept the original name attribute so that cc_binary can be replaced with clang_formatted_cc_binary in place within BUILD files.
clang_format_srcs
Finally, we’ll write the implementation of the clang_format_srcs
rule, in the same clang_format.bzl
file:
# In clang_format.bzl
def _clang_format_srcs_impl(ctx):
    formatted_files = []
    for unformatted_file in ctx.files.srcs:
        formatted_file = ctx.actions.declare_file("formatted_" + unformatted_file.basename)
        formatted_files.append(formatted_file)
        ctx.actions.run_shell(
            inputs = [unformatted_file],
            outputs = [formatted_file],
            progress_message = "Running clang-format on %s" % unformatted_file.short_path,
            command = "clang-format %s > %s" % (unformatted_file.path, formatted_file.path),
        )
    # Return the outputs through the DefaultInfo provider (legacy struct returns are deprecated).
    return [DefaultInfo(files = depset(formatted_files))]

clang_format_srcs = rule(
    attrs = {
        "srcs": attr.label_list(allow_files = True),
    },
    implementation = _clang_format_srcs_impl,
)
Here’s what this clang_format_srcs
rule is doing:
Iterate over every file in the srcs attribute.
Declare an output file with the formatted_ prefix for each of them.
Run clang-format on the unformatted file to produce the formatted output.
Now, by executing bazel build //main:hello-world
, Bazel runs the actions in clang_format_srcs
before running the cc_binary
compilation actions on the formatted files. We can prove this by running bazel build
with the --subcommands
flag:
$ bazel build //main:hello-world --subcommands
..
SUBCOMMAND: # //main:hello-world_formatted_srcs [action 'Running clang-format on main/hello-world.cc']
..
SUBCOMMAND: # //main:hello-world [action 'Compiling main/formatted_hello-world.cc']
..
SUBCOMMAND: # //main:hello-world [action 'Linking main/hello-world']
..
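To make the per-file mapping in clang_format_srcs concrete, here is a minimal sketch in plain Python (not Starlark, and not how Bazel implements it). The function name plan_format_actions and the out_dir parameter are assumptions made for illustration; the sketch only mirrors how each source is paired with a formatted_ output and a clang-format command line.

```python
import posixpath


def plan_format_actions(srcs, out_dir):
    """Return one (input, output, command) triple per source file.

    Hypothetical helper: it imitates the loop in the rule implementation,
    it does not run clang-format or talk to Bazel.
    """
    actions = []
    for src in srcs:
        # Same naming scheme as declare_file("formatted_" + basename):
        # keep the file name, add a prefix, place it in the output directory.
        out = posixpath.join(out_dir, "formatted_" + posixpath.basename(src))
        command = "clang-format %s > %s" % (src, out)
        actions.append((src, out, command))
    return actions
```

Calling plan_format_actions(["main/hello-world.cc"], "bazel-bin/main") yields a single action whose output is bazel-bin/main/formatted_hello-world.cc, which matches the file name seen in the compile step above.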
Looking at the contents of formatted_hello-world.cc
, it looks like clang-format
did its job:
#include <ctime>
#include <string>
#include <iostream>
std::string get_greet(const std::string& who) { return "Hello " + who; }
void print_localtime() {
  std::time_t result = std::time(nullptr);
  std::cout << std::asctime(std::localtime(&result));
}
int main(int argc, char** argv) {
  std::string who = "world";
  if (argc > 1) {
    who = argv[1];
  }
  std::cout << get_greet(who) << std::endl;
  print_localtime();
  return 0;
}
If all you want are the formatted sources without compiling them, you can build the target with the _formatted_srcs
suffix from clang_format_srcs
directly:
$ bazel build //main:hello-world_formatted_srcs
INFO: Analysed target //main:hello-world_formatted_srcs (0 packages loaded).
INFO: Found 1 target...
Target //main:hello-world_formatted_srcs up-to-date:
bazel-bin/main/formatted_hello-world.cc
INFO: Elapsed time: 0.247s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
Do you need a rule? Can you write a macro to compose and reuse existing rules? Or an aspect to traverse the existing build graph and execute additional actions?
What does your rule do? Does it already exist?
What files, if any, does it take as inputs?
What tool does it use? A compiler? A shell script?
Is the tool deterministic? Does every invocation of the tool with the same inputs generate the same outputs?
How is the tool provided to the rule? A binary installed in /usr/bin
? A repository rule? Toolchains?
What output files does it generate?
Does the rule depend on the outputs of other rules using providers?
Does the rule provide inputs to other rules using providers?
What actions do you need to construct in order to generate the output files from the input files using the tool?
Bazel has powerful tools to inspect and monitor your build processes. A recent addition is the Action Graph.
The action graph is different from the target dependency graph, which is generated from Bazel’s loading phase. You might know the target graph from bazel query
:
→ bazel query 'deps(//my:target)' --output=graph > target_graph.in
→ dot -Tpng < target_graph.in > target_graph.png
→ open target_graph.png
If you’re looking for the target graph, check out this Bazel blog post on visualizing your build.
The action graph contains a different set of information: file-level dependencies, full command lines, and other information Bazel needs to execute the build. If you are familiar with Bazel’s build phases, the action graph is the output of the loading and analysis phases and is used during the execution phase.
However, Bazel does not necessarily execute every action in the graph. It executes an action only when it has to, so the action graph is a superset of what is actually executed.
The action graph is generated by:
You can obtain it using bazel dump
with these flags:
--action_graph=path/to/output
: Specifies the location of the output file. This is relative to the WORKSPACE
root. You can also provide an absolute path.
--action_graph:targets=//my:target
: Specifies the target(s) you’re interested in.
--action_graph:include_cmdline=true
: Specifies whether to include the full generated command lines.
Let’s walk through an example of dumping the action graph of an Android application build. We will use the Android example packaged in the Bazel source tree. Note that this requires the Android SDK and NDK:
→ git clone https://github.com/bazelbuild/bazel bazel_graph && cd bazel_graph
# Uncomment android_{sdk, ndk}_repository lines in WORKSPACE
→ grep "android_" WORKSPACE
android_sdk_repository(name = "androidsdk")
android_ndk_repository(name = "androidndk")
Add --experimental_strict_action_env
to the project .bazelrc
to prevent $PATH
pollution.
→ cat .bazelrc
build --experimental_strict_action_env
The android_binary
target is //examples/android/java/bazel:hello_world
. It’s defined in examples/android/java/bazel/BUILD
:
android_binary(
name = "hello_world",
srcs = glob([
"MainActivity.java",
"Jni.java",
]),
manifest = "AndroidManifest.xml",
resource_files = glob(["res/**"]),
deps = [
":jni",
":lib",
"@androidsdk//com.android.support:appcompat-v7-25.0.0",
],
)
Let’s start by running the loading and analysis phase, and skipping the execution phase with the --nobuild
flag.
→ bazel build --nobuild //examples/android/java/bazel:hello_world
INFO: Analysed target //examples/android/java/bazel:hello_world (31 packages loaded).
INFO: Found 1 target...
INFO: Elapsed time: 9.818s
INFO: Build completed successfully, 0 total actions
Note 0 total actions
. This doesn’t mean that there are no generated actions, but that there are no executed actions.
Let’s dump the graph from the Bazel server:
→ bazel dump --action_graph=action_graph.bin \
--action_graph:targets=//examples/android/java/bazel:hello_world \
--action_graph:include_cmdline=true
Warning: this information is intended for consumption by developers
only, and may change at any time. Script against it at your own risk!
Dumping action graph to 'action_graph.bin'
We specify
--action_graph:targets=//examples/android/java/bazel:hello_world
because the default value of the flag is ...
, which will dump every analyzed target, recursively.
Check that the output is not empty:
→ ls -al action_graph.bin
-rw-r--r-- 1 jin staff 101765 Mar 24 23:02 action_graph.bin
If it is empty, it means that Bazel hasn’t analyzed the target. Make sure that build --nobuild
and dump --action_graph:targets
are referencing the same target.
action_graph.bin
is a raw protobuf message. analysis.proto
is the protobuf that defines the types of the message. Let’s use the protobuf compiler, protoc
, to decode it:
→ protoc --decode=analysis.ActionGraphContainer \
src/main/protobuf/analysis.proto \
< action_graph.bin > action_graph.txt
For reference, I’ve uploaded my action_graph.txt
here. It’s in human readable plain text, so that’s great!
Now that it is possible to read the graph, we can analyze some of the useful bits: the file contains a ton of information!
The top level message type is ActionGraphContainer
. Let’s investigate each of these message types one by one.
message ActionGraphContainer {
repeated Artifact artifacts = 1;
repeated Action actions = 2;
repeated Target targets = 3;
repeated DepSetOfFiles dep_set_of_files = 4;
repeated Configuration configuration = 5;
repeated AspectDescriptor aspect_descriptors = 6;
repeated RuleClass rule_classes = 7;
}
Starting with the simplest, we have one RuleClass
message.
rule_classes {
id: "0"
name: "android_binary"
}
This is no surprise: we dumped the action graph of an android_binary
target.
targets {
id: "0"
label: "//examples/android/java/bazel:hello_world"
rule_class_id: "0"
}
Correspondingly, there’s also one Target
message. We see that it encodes the id
of the target’s RuleClass
. In this case, the rule_class_id
refers to android_binary
.
configuration {
id: "0"
mnemonic: "darwin-fastbuild"
platform_name: "darwin"
}
We have one build configuration mnemonic: darwin-fastbuild
. This is a reference to our execution platform (macOS) and the fastbuild
compilation mode.
artifacts {
id: "16"
exec_path: "external/local_jdk/bin/javac"
}
artifacts {
id: "190"
exec_path: "bazel-out/android-armeabi-v7a-fastbuild/bin/external/androidsdk/com.android.support/_aar/unzipped/resources/support-vector-drawable-25.0.0"
is_tree_artifact: true
}
artifacts {
id: "227"
exec_path: "examples/android/java/bazel/res/values/styles.xml"
}
artifacts {
id: "229"
exec_path: "bazel-out/host/genfiles/external/androidsdk/aapt_runner.sh"
}
artifacts {
id: "349"
exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_unsigned.apk"
}
Every file that Bazel handles is an Artifact
. It represents:
a source file
or a derived output file
The “file” can also be a directory (e.g. artifact 190
), which is referred to as a TreeArtifact
. Check out the detailed documentation on the different Artifact types here.
exec_path
is the relative path of the Artifact within the execution root. The execution root is the working directory where Bazel executes all actions during the execution phase:
→ bazel info execution_root
.....................
/private/var/tmp/_bazel_jin/ed227ac31d5e65f9c3effb1d1fe2605e/execroot/io_bazel
The exec_paths come in different prefix flavours:
external/..: Contains symlinks to external repositories, such as @local_jdk and @androidsdk.
examples/..: Contains the source files. This is a symlink to the actual examples/ folder.
bazel-out/host/genfiles/..: Contains generated sources, usually from genrules, for the host target BuildConfiguration.
bazel-out/darwin-fastbuild/bin/..: Contains derived binary outputs for the darwin target BuildConfiguration.
bazel-out/android-armeabi-v7a-fastbuild/bin/..: Contains derived binary outputs for the android-armeabi-v7a target BuildConfiguration.
dep_set_of_files {
id: "198"
transitive_dep_set_ids: "136"
direct_artifact_ids: "292"
}
dep_set_of_files {
id: "136"
transitive_dep_set_ids: "137"
direct_artifact_ids: "337"
}
Depset
is a data structure for collecting data on transitive dependencies. It’s optimized to be time and space efficient around merging, because it’s common to have very large depsets, scaling to hundreds of thousands of files. Read the documentation to learn more about depsets.
In our protobuf, a dep_set_of_files
can refer to other depsets with transitive_dep_set_ids
, or directly to artifacts with direct_artifact_ids
.
It’s crucial to highlight the ability to recursively refer to other depsets: it’s an important catalyst for space efficiency. Rule implementations should not flatten depsets to lists unless they are at the top level. Flattening large depsets incurs huge memory consumption.
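To see how the nested references resolve, here is a small Python sketch that walks a dep_set_of_files and collects every artifact id reachable from it. The dictionary layout and the name flatten_depset are assumptions for illustration, and depset 137 (not shown in the excerpt above) is stood in for by an empty entry. This is exactly the flattening that rule implementations should avoid on large depsets; it is shown here only to clarify what the ids mean.

```python
def flatten_depset(depset_id, depsets):
    """Collect all artifact ids reachable from depset_id, each listed once.

    `depsets` maps a depset id to a dict with the two fields seen in the
    protobuf text: transitive_dep_set_ids and direct_artifact_ids.
    """
    seen_depsets = set()
    seen_artifacts = set()
    artifacts = []
    stack = [depset_id]
    while stack:
        current = stack.pop()
        if current in seen_depsets:
            continue  # shared sub-depsets are visited only once
        seen_depsets.add(current)
        node = depsets[current]
        for artifact_id in node["direct_artifact_ids"]:
            if artifact_id not in seen_artifacts:
                seen_artifacts.add(artifact_id)
                artifacts.append(artifact_id)
        stack.extend(node["transitive_dep_set_ids"])
    return artifacts


DEPSETS = {
    "198": {"transitive_dep_set_ids": ["136"], "direct_artifact_ids": ["292"]},
    "136": {"transitive_dep_set_ids": ["137"], "direct_artifact_ids": ["337"]},
    # Assumption: depset 137 is not part of the excerpt, so it is left empty.
    "137": {"transitive_dep_set_ids": [], "direct_artifact_ids": []},
}
```

Because each depset only stores references to other depsets, two depsets that share a large transitive set both point at the same node instead of copying its contents.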
Finally, we have Action
. An action, as described in the protobuf’s documentation, is a function from Artifact
to Artifact
. It might be easier to think of an Action
as all of the information required to create an output file, which usually contains a command line representation.
actions {
target_id: "0"
action_key: "e121f7eb29e0828eef502582d5134d37"
mnemonic: "ResourceExtractor"
configuration_id: "0"
arguments: "bazel-out/host/bin/external/bazel_tools/tools/android/resource_extractor"
arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
arguments: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
input_dep_set_ids: "198"
output_ids: "338"
}
targets {
id: "0"
label: "//examples/android/java/bazel:hello_world"
rule_class_id: "0"
}
dep_set_of_files {
id: "198"
transitive_dep_set_ids: "136"
direct_artifact_ids: "292"
}
artifacts {
id: "338"
exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/_dx/hello_world/extracted_hello_world_deploy.jar"
}
artifacts {
id: "292"
exec_path: "bazel-out/darwin-fastbuild/bin/examples/android/java/bazel/hello_world_deploy.jar"
}
In this selected Action
, we are extracting resources out of a jar
using a tool called resource_extractor
. The full command line is captured with the list of arguments
with the first argument
as the executable. Every file referenced in the command line must be an Artifact
in either the transitive depset(s) input_dep_set_ids
or artifact(s) output_ids
. This enables Bazel to discover actions to run in order to get a requested output artifact.
The action_key
is computed based on the command line that will be executed, which contains information like compiler flags, library locations and system headers. This enables Bazel to keep track of actions to invalidate and re-run incrementally, and cache aggressively if there is no need to rerun an action.
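The idea behind such a key can be sketched in a few lines of Python. This is only an illustration of hashing a command line into a fingerprint; Bazel’s real key computation covers more than the arguments and is not reproduced here. The function name action_key and the length-prefix encoding are assumptions.

```python
import hashlib


def action_key(arguments):
    """Fingerprint a command line (a list of strings) as a hex digest.

    Hypothetical stand-in for a real action key: equal command lines give
    equal keys, so an unchanged key means the action need not be re-run.
    """
    digest = hashlib.md5()
    for argument in arguments:
        encoded = argument.encode("utf-8")
        # Length-prefix each argument so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(str(len(encoded)).encode("ascii") + b":")
        digest.update(encoded)
    return digest.hexdigest()
```

Changing a single flag or path in the argument list produces a different digest, which is the signal a build system can use to invalidate the cached result of that action.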
The Action
’s configuration_id
is 0
, as this action is executed with the darwin-fastbuild
BuildConfiguration.
Each Action
has a mnemonic, which is a short human readable string to quickly understand what the Action
is doing. We can grep the protobuf for all mnemonics to see mostly Android-related actions, like AndroidDexer
and RClassGenerator
.
→ grep "mnemonic" action_graph.txt | sort | uniq
mnemonic: "AaptPackage"
mnemonic: "AaptSplitResourceApk"
mnemonic: "AndroidBuildSplitManifest"
mnemonic: "AndroidDexManifest"
mnemonic: "AndroidDexer"
mnemonic: "AndroidInstall"
mnemonic: "AndroidStripResources"
mnemonic: "AndroidZipAlign"
mnemonic: "ApkBuilder"
mnemonic: "ApkSignerTool"
mnemonic: "CppLink"
mnemonic: "Desugar"
mnemonic: "DexBuilder"
mnemonic: "DexMerger"
mnemonic: "Fail"
mnemonic: "FileWrite"
mnemonic: "InjectMobileInstallStubApplication"
mnemonic: "JavaDeployJar"
mnemonic: "JavaSourceJar"
mnemonic: "Javac"
mnemonic: "ManifestMerger"
mnemonic: "RClassGenerator"
mnemonic: "ResourceExtractor"
mnemonic: "ShardClassesToDex"
mnemonic: "Symlink"
mnemonic: "Turbine"
mnemonic: "darwin-fastbuild"
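The same listing can be produced with a short Python sketch that also counts how often each mnemonic occurs. It assumes the decoded text format shown above, where every mnemonic sits on its own line; the name count_mnemonics is made up for this example. Like the grep, it also picks up the configuration’s mnemonic (darwin-fastbuild).

```python
import re
from collections import Counter

# Matches lines such as:   mnemonic: "Javac"
MNEMONIC_LINE = re.compile(r'^\s*mnemonic:\s*"([^"]*)"\s*$')


def count_mnemonics(text):
    """Count occurrences of each mnemonic in decoded action graph text."""
    counts = Counter()
    for line in text.splitlines():
        match = MNEMONIC_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it on the dump, read action_graph.txt into a string and pass it in; sorted(count_mnemonics(text)) then gives the unique names in the same order as sort | uniq for these ASCII names.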
The action graph is a powerful tool to gain introspection into Bazel’s analysis and execution phases. It provides just enough information to visualize the Action
data structure before it is transformed into an executable command line as seen with the --subcommands
flag.
If you wish to learn more about the underlying data representation of the action graph, check out the design document of Bazel’s parallel evaluation and incrementality model, Skyframe.
The aim of this guide is to get you up and running with Bazel as fast as possible. The steps will assume you have Bazel installed.
This part will show how to run a command line using genrule
. This rule is the generic way to specify sources, a tool (like a shell script), a command line, and the outputs. You can think of it as a way to define a function in your BUILD
file with the following signature:
genrule :: (name, sources, tool, command) -> output
In this example, we want to create a C source file, copy it using cp
, and run sed
on it with a shell script, and build an executable from the result.
Let’s get started in an empty directory called dir
.
Create a WORKSPACE file.
dir $ touch WORKSPACE
Create main.c and write some C in it.
// dir/main.c
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello Blaze.\n");
return 0;
}
Create a BUILD file with the genrule to copy main.c.
# dir/BUILD
genrule(
name = "copy_of_main",
srcs = ["main.c"],
outs = ["copy_of_main.c"],
cmd = "cp $< $@",
)
$<
expands to the location of main.c
, and $@
expands to the location of copy_of_main.c
. See the full list of supported variables here.
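As a rough mental model of that expansion, the Python sketch below substitutes the two variables into a cmd string. It is a simplification under stated assumptions: expand_genrule_cmd is a hypothetical name, only $< and $@ are handled, and the real expansion in Bazel supports many more variables. It does mirror one documented constraint: each of these two variables is only valid when there is exactly one source or one output, respectively.

```python
def expand_genrule_cmd(cmd, srcs, outs):
    """Substitute $< and $@ in a genrule-style command string.

    Hypothetical helper for illustration only; paths are plain strings.
    """
    if "$<" in cmd:
        if len(srcs) != 1:
            raise ValueError("$< requires exactly one source file")
        cmd = cmd.replace("$<", srcs[0])
    if "$@" in cmd:
        if len(outs) != 1:
            raise ValueError("$@ requires exactly one output file")
        cmd = cmd.replace("$@", outs[0])
    return cmd
```

For the copy_of_main rule, expand_genrule_cmd("cp $< $@", ["main.c"], ["bazel-genfiles/copy_of_main.c"]) gives the command cp main.c bazel-genfiles/copy_of_main.c.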
Build the target //:copy_of_main.
dir $ bazel build //:copy_of_main
....................
INFO: Analysed target //:copy_of_main (7 packages loaded).
INFO: Found 1 target...
Target //:copy_of_main up-to-date:
bazel-genfiles/copy_of_main.c
INFO: Elapsed time: 10.323s, Critical Path: 0.08s
INFO: Build completed successfully, 2 total actions
dir $ cat bazel-genfiles/copy_of_main.c
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello Blaze.\n");
return 0;
}
The file is copied successfully!
Use the tools attribute to specify a separate tool to run in the cmd string.
We want to substitute the word “Blaze” with “Bazel” in the source code, because that’s the name the build system was open sourced with. Let’s write the genrule
for that:
# dir/BUILD
# ...
genrule(
name = "renamed_main",
srcs = ["copy_of_main.c"],
outs = ["renamed_main.c"],
tools = ["substitute.sh"],
cmd = "$(location substitute.sh) 'Blaze' 'Bazel' $< $@",
)
location
is Bazel’s helper function to resolve the location of the tool when this command is executed.
Then, create a file substitute.sh
that calls out to sed
:
#!/bin/bash
sed "s/$1/$2/" "$3" > "$4"
Don’t forget to make it executable with chmod u+x substitute.sh
.
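For readers less familiar with sed, the following Python sketch shows what that substitution does to the file contents. It is an approximation: sed interprets its first argument as a regular expression, while this version treats it as a literal string, which is equivalent for plain words like Blaze. The name substitute_first is an assumption.

```python
def substitute_first(old, new, text):
    """Replace the first occurrence of `old` on every line of `text`.

    Mirrors sed "s/old/new/" (no g flag) for literal, non-regex patterns.
    """
    return "".join(
        line.replace(old, new, 1)
        for line in text.splitlines(keepends=True)
    )
```

Only the first match on each line is replaced, so a line containing the word twice keeps its second occurrence, just as with the sed command without the g flag.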
Build //:renamed_main.
dir $ bazel build :renamed_main
INFO: Analysed target //:renamed_main (0 packages loaded).
INFO: Found 1 target...
Target //:renamed_main up-to-date:
bazel-genfiles/renamed_main.c
INFO: Elapsed time: 0.261s, Critical Path: 0.07s
INFO: Build completed successfully, 2 total actions
dir $ cat bazel-genfiles/renamed_main.c
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello Bazel.\n");
return 0;
}
We are now using the correct name.
Build the result with the cc_binary from Part 1.
# dir/BUILD
cc_binary(
name = "hello_bazel",
srcs = [":renamed_main"],
)
dir $ bazel run //:hello_bazel
INFO: Analysed target //:hello_bazel (3 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
bazel-bin/hello_bazel
INFO: Elapsed time: 5.817s, Critical Path: 0.52s
INFO: Build completed successfully, 5 total actions
INFO: Running command line: bazel-bin/hello_bazel
Hello Bazel.
This is how we can use genrule
to preprocess files before passing them in to other rules. It’s a simple and flexible way to create pipelines using Bazel.
The aim of this guide is to get you up and running with Bazel as fast as possible. The steps will assume you have Bazel installed.
Some quick notes before we start: the most important idea about Bazel is that it is declarative.
You should never need to type out the intermediary build steps; that is the responsibility of the language/platform rule authors. The build steps are hidden away in the rule implementations so you can focus on just telling Bazel what sources to build.
Let’s get started. Each example here assumes that you’re in an empty directory called dir
.
Create a WORKSPACE file.
dir $ touch WORKSPACE
Create main.c and write some C in it.
// dir/main.c
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello Bazel.\n");
return 0;
}
Create a BUILD file and tell Bazel you want an executable built from main.c.
# dir/BUILD
cc_binary(
name = "hello_bazel",
srcs = ["main.c"],
)
The cc_binary
rule is all Bazel needs to know that you want to build C/C++ sources.
Run the hello_bazel target, //:hello_bazel.
dir $ bazel run //:hello_bazel
...............
INFO: Analysed target //:hello_bazel (9 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
bazel-bin/hello_bazel
INFO: Elapsed time: 12.423s, Critical Path: 0.44s
INFO: Build completed successfully, 5 total actions
INFO: Running command line: bazel-bin/hello_bazel
Hello Bazel.
//
refers to the directory level where the WORKSPACE
is. :
specifies a target in a BUILD
file.
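The structure of such a label can be illustrated with a tiny Python parser. This is a sketch for workspace-relative labels only (no @repository prefix and no relative :target form); parse_label is a hypothetical name, not a Bazel API.

```python
def parse_label(label):
    """Split a label like //pkg/path:target into (package, target)."""
    if not label.startswith("//"):
        raise ValueError("only //-style labels are handled in this sketch")
    package, separator, target = label[2:].partition(":")
    if not separator:
        # Shorthand: //foo/bar refers to the target bar in package foo/bar.
        target = package.rsplit("/", 1)[-1]
    if not target:
        raise ValueError("label has no target name: %r" % label)
    return package, target
```

For //:hello_bazel the package is the empty string, meaning the BUILD file sits next to the WORKSPACE file, and the target is hello_bazel.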
To build the executable without running it, use bazel build instead of bazel run.
dir $ bazel build //:hello_bazel
INFO: Analysed target //:hello_bazel (9 packages loaded).
INFO: Found 1 target...
Target //:hello_bazel up-to-date:
bazel-bin/hello_bazel
INFO: Elapsed time: 2.058s, Critical Path: 0.24s
INFO: Build completed successfully, 5 total actions
The executable is in the bazel-bin
symlink: bazel-bin/hello_bazel
.
dir $ cp bazel-bin/hello_bazel hello_bazel
dir $ ./hello_bazel
Hello Bazel.
That’s it!
The design of a programming language can be divided into two parts: syntax and semantics.
The syntax describes what it looks like.
The semantics describes what it should do.
There are many ways a program can be written with valid syntax but turn nonsensical when evaluated. These nonsensical evaluations are known as runtime errors.
Semantics formally describes how programs should be evaluated. Programs that are well-formed according to a language’s semantics do not get stuck.
There are three main styles of describing semantics: operational, denotational, and axiomatic.
Operational semantics uses the idea that languages are abstract machines and evaluation of a program is a series of state transitions from an initial to a final state.
Transition functions define how each state transitions to the next, if there is one. If there is no next state, the machine has either completed its evaluation successfully or hit a runtime error and got stuck. The program halts in both cases.
Every term in the computer program has some meaning, and its form finalizes when the state transitions are complete. State transitions may be single or multi-step.
There are two major ways to write operational semantics: small-step or big-step.
Small-step semantics breaks down behaviour into granular simplification steps. A single simplification step does not necessarily produce a finalized form; sometimes multiple steps are needed.
Big-step semantics composes multiple small-step rules that evaluate into a finalized form into a single rule. Such a rule is equivalent to its multi-step counterpart.
Since operational semantics is styled after abstract machine behaviour, it is useful as a reference for implementation.
Origins: John McCarthy on Semantics of Lisp (1960)
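To make the small-step versus big-step distinction concrete, here is a sketch for a toy language of integer addition and division. The term encoding (an int, or a tuple of operator and two subterms) and the function names are assumptions chosen for this example, not a standard notation.

```python
def step(term):
    """Apply one small-step rule; return None when no rule applies."""
    if isinstance(term, int):
        return None  # a value is a final state
    op, left, right = term
    if not isinstance(left, int):
        reduced = step(left)  # reduce the left operand first
        return None if reduced is None else (op, reduced, right)
    if not isinstance(right, int):
        reduced = step(right)
        return None if reduced is None else (op, left, reduced)
    if op == "add":
        return left + right
    if op == "div" and right != 0:
        return left // right
    return None  # stuck: division by zero or unknown operator


def run_small_steps(term):
    """Return the whole sequence of states from `term` until no step applies."""
    trace = [term]
    while True:
        next_term = step(trace[-1])
        if next_term is None:
            return trace
        trace.append(next_term)


def big_step(term):
    """Evaluate `term` to its final value in one recursive rule per operator."""
    if isinstance(term, int):
        return term
    op, left, right = term
    left_value, right_value = big_step(left), big_step(right)
    if op == "add":
        return left_value + right_value
    if op == "div" and right_value != 0:
        return left_value // right_value
    raise ValueError("evaluation is stuck")
```

For a term that evaluates successfully, the last state of run_small_steps equals the result of big_step. For a term that divides by zero, the small-step trace ends in a non-value state (the machine is stuck), while big_step has no rule to apply and raises.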
Denotational semantics uses the idea that languages are mathematical objects. Unlike operational semantics, evaluation and implementation details are abstracted away.
An interpretation function is defined to map terms in a program to elements in semantic domains (also known as its denotation), removing any occurrences of the original syntax.
Semantic domains are designed to model specific language features; the study of these domains is called domain theory.
Checking whether two programs are the same is achievable by comparing their denotations.
Laws can be derived from the semantic domains and are used for language specifications to verify correctness of an implementation.
The properties of the semantic domains can be used to show impossible instances in a language.
Origins: Christopher Strachey, Dana Scott on “Toward a mathematical semantics for computer languages” (1970, 1971)
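As a small illustration of comparing programs through their denotations, the sketch below interprets a toy language of linear expressions over one variable x. The semantic domain is assumed to be pairs (a, b) standing for the function that maps x to a*x + b; the term encoding and the name denote are made up for this example.

```python
def denote(term):
    """Map a term to its denotation (a, b), meaning the function x -> a*x + b."""
    if isinstance(term, int):
        return (0, term)  # a constant ignores x
    if term == "x":
        return (1, 0)  # the variable itself
    op, left, right = term
    if op == "add":
        (a1, b1), (a2, b2) = denote(left), denote(right)
        return (a1 + a2, b1 + b2)
    if op == "scale":
        # ("scale", k, t) multiplies the term t by the integer constant k.
        a, b = denote(right)
        return (left * a, left * b)
    raise ValueError("unknown operator: %r" % (op,))
```

The terms ("add", "x", "x") and ("scale", 2, "x") look different syntactically, yet both denote (2, 0), so they are the same program in this semantics. No trace of the original syntax remains in the denotation.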
Axiomatic semantics is intuitively related to Hoare Logic. Instead of deriving laws from operational or denotational behaviour definitions, the laws themselves define the semantics of the language.
This reversal simplifies reasoning about a program, leading to developments in software verification.
Two different program implementations with the same set of initial and final assertions (laws) are considered to have the same semantics.
The terms that happen between assertions are just used to prove the assertions themselves and do not contribute to the semantics.
Assertions define relationships between variables and other moving parts in a program, and some of these assertions remain invariant throughout execution. This is the important invariance concept that underlies axiomatic semantics.
Origins: Tony Hoare on Hoare Logic (1969)
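The idea that two implementations with the same initial and final assertions share the same semantics can be sketched as follows. This Python example checks a Hoare-style triple by testing it on a finite set of sample states, which is only evidence and not a proof; the names holds, inc_direct and inc_roundabout are assumptions for illustration.

```python
def holds(pre, command, post, states):
    """Check {pre} command {post} on every sample state satisfying pre."""
    for state in states:
        if pre(state):
            after = command(dict(state))  # run on a copy of the state
            if not post(after):
                return False
    return True


def inc_direct(state):
    state["x"] = state["x"] + 1
    return state


def inc_roundabout(state):
    # A different sequence of intermediate steps with the same net effect.
    state["x"] = state["x"] + 2
    state["x"] = state["x"] - 1
    return state


SAMPLE_STATES = [{"x": n} for n in range(-3, 4)]
```

Both inc_direct and inc_roundabout satisfy the triple with precondition x >= 0 and postcondition x > 0 on these samples, so from the axiomatic point of view they are indistinguishable, even though the intermediate steps differ.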
Update Dec 2017: I’ve graduated from NUS. Specific references in this article about NUS and the computing faculty may be outdated - please contact me if there’s information that should be updated.
As there has been growing interest in CS undergraduate courses over the past few years, I would like to share my experience as a CS major at National University of Singapore, and also shed light on the common misconceptions that people may have. This essay will also be focussed on National University of Singapore’s curriculum and programmes, because I’m most familiar with it.
My background: I’m a fourth year CS major. Prior to this, I graduated from Ngee Ann Polytechnic with a Diploma in Network Systems & Security.
I made the decision to read CS upon realising the gaps in my knowledge of technology.
I felt confident in designing and implementing network systems, but never understood why network protocols were designed in that manner. A quick Wikipedia search on a network protocol algorithm, Dijkstra’s Shortest Path, inundated me with so much math that I quickly realized that a CS education would provide the foundational theoretical knowledge to understand these algorithms.
You know how to use computers as a tool to get things done. However, you’ve probably never learnt why and how they work behind the scenes.
Consider Google Search. Have you ever wondered why it seems to know everything and how it works behind the scenes? How did your search come back with millions of results in a fraction of a second? How did the information get from Google’s servers to your screen?
CS is the science of computational processes, like how Physics is the science of nature. It’s a foundational science that enables you to solve problems across disciplines and subfields.
It’s about taking problems, figuring out what needs to be solved, and providing a step-by-step procedure to compute the solution. These problems come from other fields like healthcare, finance, environmental science, space exploration, game development, or something that you have an interest in. Anything.
If you’re a problem solver, CS will sharpen your mind to produce articulate and well-reasoned solutions, and to communicate them across domains.
If you’re not, CS will equip you with the mental toolbox to approach complex problems with confidence.
You’ll learn to break down complex problems into little problems that can be solved systematically. I recommend reading CS if you’ve enjoyed mathematical and logical challenges.
If this sounds interesting to you, then yes, go forth and study CS.
CS is not a walk in the park.
No decent undergraduate degree program is a walk in the park.
A well-designed curriculum will begin with battle-tested fundamental courses. They will expand your mind and change the way you think about the world.
Don’t assume that you’re just going to learn how to program; you can do that in 2 weeks with an online course.
CS will flex your brain muscles and teach you how to reason rigorously in the various subfields, such as computer graphics, artificial intelligence and even programming languages themselves.
Wikipedia has a good outline of the subfields in CS.
CS is taught in the School of Computing (SoC).
There are about 1300 undergraduates in SoC (as of 2017) across CS, Information Systems, Information Security, Business Analytics and Computational Biology.
As a freshman, you’ll do a lot of common modules, so the first year tends to be similar to that of other majors.
The full SoC CS curriculum is available on the website.
The fundamental modules include:
After these, it’s generally assumed that you know how to code and are able to pick up the languages as needed.
For example, a parallel programming module will assume that you know how to code in C, or can pick it up in 1-2 weeks, since the syllabus is focussed on parallelism concepts.
The most hardware oriented module is CS2100, Computer Organisation. You’ll learn lower level concepts like logic, CPU design and basic assembly programming. Anything lower than that enters the realm of Computer Engineering, where modules are coded with CG instead of CS. CS students are not required to do electrical and electronic engineering modules.
From the second year onwards, you’ll specialize into a technical area of study called a Focus Area. There are ten of them.
Most students will spend their summer vacations on internships and exchanges. NUS Overseas Colleges is a popular choice for the entrepreneurial minded.
I highly recommend doing internships that are self-sourced, and not on the list provided in the faculty internship portal. Self-sourced internships usually result in more interesting companies and projects.
A SoC alumnus has compiled information on self-sourcing internships in Project Intern.
Most come in fresh – I didn’t (I learnt programming in NP), but I didn’t feel advantaged at any point in time.
I felt challenged by the first programming module I took in SoC, CS1101S, which I highly recommend freshmen to take, if you have the opportunity. It replaces CS1010.
If you want to prepare yourself, start reading /r/programming, Hacker News, and familiarize yourself with the idea that CS is not solely about programming but understanding how to compute things.
Programming is a tool to implement computations to solve a problem, like a hammer as the tool to drive nails into a piece of wood to create something.
However, disliking programming may lead to difficulty understanding CS materials, as they are intertwined.
Most NUS CS undergraduates will do Java at some point, especially in the Data Structures & Algorithms module. While that is a good starting point, don’t be afraid to branch out to other languages once you feel that you know Java well enough.
A quick glance at http://learnxinyminutes.com will show you the plethora of programming languages out there.
Popular programming languages are typically general purpose - meaning they can be used in many applications - but each language has its own niche area. For example, Python and Ruby are often used for scripting, while Java is used for building enterprise systems. There are many guides on what language to learn first; I will not delve into that here.
Learning one language well makes learning successive languages easier.
CS has an enormous intersection with mathematics.
I disliked mathematics prior to University. The lack of interest and curiosity stemmed from rote learning formulas in secondary school and tuition classes. It just didn’t seem like an interesting subject and I associated it with boredom and dread.
However, mathematics pedagogy in University is on a whole other level. You will finally understand why certain things in math are the way they are, and maybe it’ll start to seem more interesting to you.
Your teachers are now passionate professors who are experts in their research fields, and are usually patient enough to explain concepts to you if you ask.
I’m still not great at mathematics, but NUS CS has made me see math in a different light. For example, matrices in linear algebra are used heavily in Computer Graphics and Game Development, and graph theory is used in databases and computer program analysis.
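To make the graphics example concrete, here is a minimal sketch of how a matrix rotates a point on a 2D canvas. The function name `rotate` is made up for illustration and is not taken from any particular module or library.

```python
import math


def rotate(point, degrees):
    """Rotate a 2D point counter-clockwise about the origin."""
    theta = math.radians(degrees)
    # Standard 2x2 rotation matrix: [[cos, -sin], [sin, cos]]
    matrix = [
        [math.cos(theta), -math.sin(theta)],
        [math.sin(theta), math.cos(theta)],
    ]
    x, y = point
    # Matrix-vector multiplication, written out by hand
    return (
        matrix[0][0] * x + matrix[0][1] * y,
        matrix[1][0] * x + matrix[1][1] * y,
    )


# Rotating the point (1, 0) by 90 degrees lands (up to rounding) on (0, 1)
x, y = rotate((1.0, 0.0), 90)
print(round(x, 6), round(y, 6))
```

Graphics and game engines apply the same idea in 3D, usually with larger matrices that also handle scaling and translation.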
Both CS and mathematics force you to think in terms of abstractions. Improving in one domain helps in the other.
The idea of studying for grades is probably deeply ingrained in you after 10-something years of toiling through the Singapore education system. It’s time to get rid of that.
In CS, grades serve as your personal benchmark. Being able to score well in a module gives you a sense of confidence that you have understood the subject materials.
This does not mean you can mess school up and get away with it. Scoring consistent C’s and D’s is a signal that you have not understood the material intuitively, which will pose problems when you’re taking higher-level modules.
Grades do not mean much when applying for industry roles, such as software engineering. No interviewer has ever asked me for my grades.
Relevant internships and side-projects, on the other hand, are great ways to convey your skills to potential employers.
From these, potential employers can infer your willingness to learn and try new things, which translates to a potentially great attitude in the working environment. Self-directed learning is important in CS.
However, grades are still relatively important for graduate school applications, along with a research portfolio.
CS is one of the most versatile degrees to find work with.
It signifies that you’ve been through a rigorous curriculum and possess the ability to take on large and complex problems.
It is up to you to prove that you’re able to do it.
You can be a web developer, AI researcher, data scientist, software engineer, devops engineer, mobile app developer… the list goes on.
Each of those roles requires a specialized skill set, but CS forms the foundation of all of them.
Non-CS domains, like biology and finance, are full of problems that can be readily solved using CS techniques. See: DNA editing with CRISPR, the Human Genome Project, bank anti-fraud systems and insider trading analysis.
CS is a ubiquitous field.
It manifests itself in many different forms, and in many cases, it doesn’t even involve a computer.
To succeed in University, you’ll need to learn how to learn. That is, identify the gaps in your knowledge and figure out how to fill them efficiently.
The Singapore technology and research ecosystem is expanding rapidly with massive government support — there has never been a better time to pursue CS.
Feel free to ping me on Twitter or by email with questions or comments.
All the best!
Notes to NUS CS Freshmen, from the future
webuild.sg - List of technology meetup groups in Singapore
engineers.sg - Meetup video recordings in Singapore
What every computer science major should know - Matt Might
I’m part of the NUS Hackers coreteam.
We’re a group of people who want to spread the hacker culture: the idea of not hesitating to build and break stuff for fun and knowledge, sharing, and just being curious about how things work. Our weekly Friday Hacks cover very different topics; the idea is to expose the NUS community to various technical topics they wouldn’t otherwise have learnt in class.
We also run a hackerspace in NUS.
IS is a degree that sits at the intersection of business and IT, while CS is deeply grounded in mathematics and logic. Many CS graduates eventually leave academia for engineering roles, though.
The difference is easier to see by comparing the core modules of these majors:
IS2101 Business and Technical Communication*
IS2102 Requirements Analysis and Design
IS2103 Enterprise Systems Development Concepts
IS2104 Software Team Dynamics
IS3101 Management of Information Systems
IS3102 Enterprise Systems Development Project
IS4100 IT Project Management
CS2010 Data Structures and Algorithms II
CS2100 Computer Organisation
CS2103T Software Engineering
CS2105 Introduction to Computer Networks
CS2106 Introduction to Operating Systems
CS3230 Design and Analysis of Algorithms