Sunday 4 May 2014

Origin of the Go toolchain

I will devote some time to presenting what I understand to be the origins of the toolchain. I think it helps to explain why the design looks the way it does. Please take it with a grain of salt because I may not have a complete and accurate picture. I ask any experts out there to please point out any misunderstandings or inaccuracies.

The Go toolchain has a history that extends back to Plan 9, an operating system developed at Bell Labs starting about 30 years ago and intended to be the successor to Unix. You need only check the Wikipedia article to see the resemblance between the Go mascot and the Plan 9 mascot. I could devote an entire post to Plan 9 alone. I encourage anyone with an understanding of Unix/Linux to give it a try in a VirtualBox image.

To make a long story short, Plan 9 is a distributed operating system that allows you to treat a collection of computers as one logical system. A system can be composed of nodes with different processor architectures (e.g. arm, powerpc, x86, mips). To support such a system there needs to be a platform-independent programming language, which in Plan 9 is ANSI C. There also needs to be an easy way to compile and distribute architecture-specific binaries.

It is for this reason that Plan 9 has a variety of C compilers that you can invoke. Each one compiles C code into an architecture-specific object file. The object files are linked together with the correct linker for the architecture to produce the final binary. This can be repeated for each architecture so that your program can run anywhere in the system. Plan 9's sophisticated union mounts are configured so that the '/bin' directory usually contains the binaries for the architecture of the current node, which makes the whole arrangement transparent to most users.

Plan 9's compilers, linkers and assemblers use a two-character naming convention: the first character identifies the architecture and the second identifies the tool. One example is the "8c" tool, which is the C compiler for x86. Similarly, "5l" is the linker for ARM. I'm not certain in all cases how the first character is chosen. I found a table with the historical architectures in the C compiler documentation. When you want to refer to the tools in an architecture-independent way there are two-letter names: cc (C compiler), ld (linker).

Getting back to the Go toolchain, it retains the naming convention with some additions. They have chosen the number '6' to represent the relatively new AMD64 (64-bit x86) architecture. Also, they have added the letter 'g' to represent the Go compiler (e.g. 8g for the x86 Go compiler, gc for the general name). The Go tool hides most of these internals behind its high-level commands, but you can still find them in the pkg/tool/archname directory of your installation.
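To make the convention concrete, here is a minimal sketch in Go that builds the two-character names from an architecture and a tool kind. The function name toolName and the table archPrefix are my own inventions for illustration and are not part of any Go package. I have only listed the three prefixes mentioned in this post, and I am assuming that the suffix 'a' denotes the assembler, alongside 'c', 'l' and 'g'.

```go
package main

import "fmt"

// archPrefix maps an architecture name (spelled the way Go spells GOARCH)
// to the first character of the legacy tool names. Only the prefixes
// discussed in this post are listed; the table is illustrative, not complete.
var archPrefix = map[string]byte{
	"arm":   '5',
	"amd64": '6',
	"386":   '8', // 32-bit x86
}

// toolName returns the two-character tool name for an architecture and a
// tool kind: 'g' (Go compiler), 'c' (C compiler), 'l' (linker) or
// 'a' (assembler, an assumption on my part).
func toolName(arch string, kind byte) (string, error) {
	p, ok := archPrefix[arch]
	if !ok {
		return "", fmt.Errorf("unknown architecture %q", arch)
	}
	return string([]byte{p, kind}), nil
}

func main() {
	for _, arch := range []string{"386", "amd64", "arm"} {
		for _, kind := range []byte("gcla") {
			name, err := toolName(arch, kind)
			if err != nil {
				fmt.Println(err)
				continue
			}
			fmt.Printf("%-6s %c -> %s\n", arch, kind, name)
		}
	}
}
```

With this table, asking for the Go compiler on amd64 yields "6g" and the linker on arm yields "5l", which matches the names described above.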

The Go toolchain also uses a unique family of high-level assembly languages originally devised in Plan 9. As a result, you generally can't copy assembly language snippets written for other assemblers and embed them in your Go programs. For example, there are special move pseudo-instructions in place of the usual load/store instructions that are common in some architectures, such as PowerPC. Certain addressing modes are also shared across all of the assemblers.

Go has its own object file and archive file formats, based on those of Plan 9. Plan 9 existed before ELF so you'll find that even the Linux 'nm' tool is unable to parse the archive (*.a) files in your GOPATH/pkg directory. You will notice the same kinds of issues using any OS-specific tool on these files in Windows, Mac OS, Solaris and FreeBSD. Fortunately, Go provides its own 'nm' and 'objdump' tools to inspect object and archive files. In most cases, you can use OS tools to analyse the final executable because the Go linker eventually converts everything into the OS-specific format.
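One way to see the difference between the intermediate files and the final executable is to look at the first few bytes of each file. The sketch below is a small Go program that guesses the container format from those leading "magic" bytes. The function name detectFormat is hypothetical, and the list of signatures is deliberately short. I am also assuming, to the best of my knowledge, that Go's package archives start with the traditional Unix ar signature even though the members inside are in Go's own format.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// detectFormat guesses a file's container format from its leading bytes.
// It only knows a handful of signatures and is a sketch, not a full parser.
func detectFormat(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte("!<arch>\n")):
		return "ar archive" // e.g. a *.a package archive (assumption, see above)
	case bytes.HasPrefix(head, []byte("\x7fELF")):
		return "ELF" // Linux, FreeBSD, Solaris
	case bytes.HasPrefix(head, []byte("MZ")):
		return "PE" // Windows
	case bytes.HasPrefix(head, []byte{0xcf, 0xfa, 0xed, 0xfe}),
		bytes.HasPrefix(head, []byte{0xce, 0xfa, 0xed, 0xfe}):
		return "Mach-O" // Mac OS, little-endian 64-bit and 32-bit
	}
	return "unknown"
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: detect file...")
		return
	}
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		head := make([]byte, 8)
		n, err := io.ReadFull(f, head)
		f.Close()
		// Short or empty files are fine; report any other read error.
		if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %s\n", path, detectFormat(head[:n]))
	}
}
```

Run it against a file from your GOPATH/pkg directory and against a binary you have built, and you should see two different answers. It only identifies the outer container; to look inside, use Go's own 'nm' and 'objdump'.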

I'm going to stop here. I hope that this post has been useful, interesting and accurate.
