Autotools: The Learning Curve Lessens – Finally!

The Long and Winding Road

I’ve been waiting a LONG time to write this blog entry – over 5 years. Yesterday, after a final couple of eye-opening epiphanies, I think I finally got my head around the GNU Autotools well enough to explain them properly to others. This blog entry begins a series of articles on the use of Autotools. The hope is that others will not have to suffer the same pain-staking climb.

If the articles are well received, there may be a book in it in the end. Believe me, it’s about time for a really good book on the subject. The only book I’m aware of (besides the GNU software manuals) is the New Rider’s 2000 publication of GNU AUTOCONF, AUTOMAKE and LIBTOOL, affectionately known in the community as “The Goat Book”, and so named for the picture on the front cover.

The authors, Gary Vaughan, Ben Elliston, Tom Tromey and Ian Lance Taylor, are well-known in the industry, to say the least – indeed, they’re probably the best people I know of to write such a book. However, as fast as open source software moves these days, a book published in 2000 might as well have been published in 1980. Nevertheless, because of the absolute need for any book on this subject, it’s still being sold new in bookstores. In fairness to the authors, they’ve maintained an online copy through February of 2006 (as of the last time I checked). Regardless, even two years is too long in this business.

As well as it’s written, the biggest gripe I have with the Goat Book is the same gripe I have with the GNU manuals themselves. I’m talking about the sheer number of bits of information that are just assumed to be understood by the reader. The situation is excusable – even reasonable – in the case of the manuals, due to the limited scope of a software manual. My theory is that these guys have been in the business for so long (decades, actually) that many of these topics have become second-nature to them.

The problem, as I see it, is that a large percentage of their readership today are young people just starting out with Unix and Linux. You see, most of these “missing bits” are centered around Unix itself. Sed, for example: What a dream of a tool to work with – I love it! More to the point, however: A solid understanding of the basic functionality of sed is important to grasping the proper use of Autotools. This is true because much of the proper use of Autotools truly involves the proper extension of Autotools.

Another problem is that existing documentation is more reference material than solution-oriented information. I’ll try to make these articles solve real problems, rather than just find new ways to regurgitate the same old reference material found in the manuals.

As you’ve no doubt gathered by now, I’m not an expert on this topic. I don’t have decades of experience in Unix or Linux – well, no more than one decade anyway. But I am a software engineer with significant experience in systems software design and development on multiple hardware and software platforms. As I mentioned in my opening paragraph, I’ve worked extensively with Autotools for about 5 years now. Most of that time was spent in trying to get these tools to do things the wrong way – before finally discovering the way others were doing it.

Claiming not to be an expert gives me a bit of literary – and technical – latitude. To put it bluntly, I’m hoping to gather comments on these articles. So I state here and now: Please comment. I welcome all comments on methods, techniques, tradeoffs and even personal experiences.

I make this statement right up front for the sake of my integrity. I seem to recall a few years ago that Herb Sutter posted a series of articles on the C++ moderated newsgroup entitled GotW – an acronym for “Guru of the Week”. Each article presented a problem in object-oriented software design, specifically related to C++, and elicited responses from the community at large. In and of itself, it was a great idea, and the response was overwhelming. I very much enjoyed reading the GotW threads. But I must say that it surprised me a bit when I saw a book published a year later – Exceptional C++ – that contained most of the information in these threads. Well, I say, good for Herb. And in fairness, perhaps he didn’t plan to write the book until after he’d received such great response. But it feels more comfortable to me to indicate my intentions up front.

Who Should Use Autotools?

I’m going to make a rather broad and sweeping statement here: If you’re writing open source software targeting Unix or Linux systems, then you should be using GNU Autotools. I’m sure I sound a bit biased here. I shouldn’t be, given the number of long nights I’ve spent working around what appeared to be shortcomings in the Autotools system. Normally, I would have been angry enough to toss the entire thing out the window and write a good hand-coded Makefile. But the one truth that I always came back to was the fact that there are literally thousands of projects out there that are apparently very successfully using Autotools. That was too much for me. My pride wouldn’t let me give up.

What if you don’t work on open source software? What if you’re writing proprietary software for Unix or Linux systems? Then, I say, you should still be using Autotools. Even if you only ever intend to target a single distribution of Linux, Autotools will provide you with a build environment that is flexible enough to allow your project to build successfully on future versions or distributions with virtually no changes to the build scripts. This fact, in and of itself, is enough to warrant my statement.

In fact, about the only scenario where it makes sense NOT to use GNU Autotools is the one in which you are writing software for non-Unix platforms only – Microsoft Window comes to mind. Some people will tell you that Autotools can be successfully used on Windows as well, but my opinion is that the POSIX-based approach to software configuration management is just too alien for Windows development. While it can be done, the tradeoffs are too significant to justify the use of an unmodified version of Autotools on Windows.

I’ve seen some project managers develop a custom version of Autotools that allows the use of all native Windows tools. These projects were maintained by people who spent much of their time reconfiguring Autotools to do things it was never intended to do in a totally hostile and foreign environment. Quite frankly, Microsoft has some of the best tools on the planet for Windows software development. If I were developing a Windows software package, I’d use Microsoft’s tools exclusively. In fact, I often write portable software that targets both Linux and Windows. In these cases, I maintain two separate build environments – one for Windows, and one based on Autotools for everything else.

An Overview of Autoconf

If you’ve ever downloaded, built and installed software from a “tarball” (a gzipped or bzipped tar archive, often sporting one of the common extensions, .tar.gz, .tgz or .tar.bz2), then you’re well aware of the fact that there is a common theme to this process. It usually looks something like this:

$ gzip -cd hackers-delight-1.0.tar.gz | tar -xvf -
...
$ cd hackers-delight-1.0
$ ./configure
$ make all
$ sudo make install

NOTE: I have to assume some level of information on your part, and I’m stating right now that this is it. If you’ve performed this sequence of commands before, and you know what it means, then you’ll have no trouble following these articles.

Most developers know and understand the purpose of the make utility. But what’s this “configure” thing? The use of configuration scripts (often named simply, “configure”) started a long time ago on Unix systems because of variety imposed by the fast growing and divergent set of Unix platforms. It’s interesting to note that while Unix systems have generally followed the defacto-standard Unix kernel interface for decades, most software that does anything significant generally has to stretch outside the boundaries. I call it a defacto-standard because POSIX wasn’t actually standardized until recently. POSIX as a standard was more a documentation effort than a standardization effort, although it is a true standard today. It was designed around the existing set of Unix code bases, and for good reason – it takes a long time to incorporate significant changes into a well-established operating system kernel. It was easier to say, “Here’s how it’s currently being done by most.”, than to say, “Here’s how it should be done – everyone change!” Even so, most systems don’t implement all facets of POSIX. So configure scripts are designed to find out what capabilities your system has, and let your Makefiles know about them.

This approach worked well for literally decades. In the last 15 years however, with the advent of dozens of Linux distributions, the explosion of feature permutations has made writing a decent configure script very difficult – much more so than writing the Makefiles for a new project. Most people have generated configure scripts for their projects using a common technique – copy and modify a similar project’s configure script.

Autoconf changed this paradigm almost overnight. A quick glance at the AUTHORS file in the Savannah Autoconf project repository will give you an idea of the number of people that have had a hand in the making of autoconf. The original author was David MacKenzie, who started the autoconf project as far back as 1991. Now, instead of modifying, debugging and losing sleep over literally thousands of lines of supposedly portable shell script, developers can write a short meta-script file, using a concise macro API language, and let autoconf generate the configure script.

A generated configure script is more portable, more correct, and more maintainable than a hand-code version of the same script. In addition, autoconf often catches semantic or logic errors that the author would have spent days debugging. Before autoconf, it was not uncommon for a project developer to literally spend more time on the configure script for the build environment than on the project code itself!

What’s in a Configure Script?

The primary tasks of a typical configure script are:

  • Generate an include file (often called config.h) for inclusion by project source code.
  • Set environment variables so that make can quietly select major build options.
  • Set user options for a particular make environment – such as debug flags, etc.

For more complex projects, configure scripts often generated the project Makefile(s) from one or more templates maintained by the project manager. A Makefile template would contain configuration variables in an easily recognized format. The configure script would replace these variables with values determined during configuration – either from command line options specified by the user, or from a thorough analysis of the platform environment. Often this analysis would entail such things as checking for the existence of certain include files and libraries, searching various file system paths for required utilities, and even running small programs designed to indicate the feature set of the shell or C compiler. The tool of choice here for variable replacement was sed. A simple sed command can replace all of the configuration variables in a Makefile template in a single pass through the file.

Autoconf to the Rescue

Praise to David MacKenzie for having the foresight to – metaphorically speaking – stop and sharpen the axe! Otherwise we’d still be writing (copying) and maintaining long, complex configure scripts today.

The input to autoconf is … (drum roll please) … shell script. Man, what an anti-climax! Okay, so it’s not pure shell script. That is, it’s shell script with macros, plus a bunch of macro definition files – both those that ship with an autoconf distribution, as well as those that you write. The macro language used is called m4. “m-what?!”, you ask? The m4 utility is a general purpose macro language processor that was originally written by none other than Brian Kernighan and Dennis Ritchie in 1977. (The name m4 means “m plus 4 more letters” or the word “macro” – cute, huh?).

Some form of the m4 macro language processor is found on every Unix and Linux variant (as well as other systems) in use today. In fact, this proliferance is the primary reason for it’s use in autoconf. The design goals of autoconf included primarily that it should run on all systems without the addition of complex tool chains and utility sets. Autoconf depends on the existence of relatively few tools, including m4, sed and some form of the bourne shell, as well as many of the standard Unix utilities such as chmod, chown, mkdir, rm, ln and others. Autoconf generates somewhere around 15 thousand lines of portable shell script code that is unrelated to any additional code that you add to it’s main input file! This overhead is boiler plate functionality that existed in most of the well-designed configure scripts that were written (copied) and maintained in the days before autoconf.

Autoconf in Action

Probably the easiest way to get started with autoconf is to use the autoscan utility to scan your project directory from the root down, and generate the necessary configure.ac script – the primary input file to autoconf. If you’d rather do it manually, you can start with as few as three macro calls, as follows:

# configure: generated from configure.ac by autoconf
AC_INIT([my-project], [1.0])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

echo "Configuration for package ${PACKAGE}, version ${VERSION} complete."
echo "Now type 'make' to continue."

In future articles, I’ll build on this initial script by adding additional macros and shell script to solve various problems that I’ve run into in the past. I believe these to be common problems related to build environments, and I expect others will feel the same.

AC_INIT actually takes three parameters: The package name, the package version and an email address for reporting bugs. The email address is optional and m4 allows trailing parameters (and separating commas) to simply be omitted, as shown in the example. AC_INIT sets some project definitions that are used throughout the rest of the generated configuration script. These variables may be referenced later in the configure.ac script as the environment variables ${PACKAGE} and ${VERSION}, as indicated by the echo statements at the bottom of the script.

This example assumes you have a template for your Makefile called Makefile.in in your top-level project directory (next to the configure.ac script). This file should look exactly like your Makefile, with one exception. Any text you want autoconf to replace should be marked with autoconf replacement variables, like this:

# Makefile: generated from Makefile.in by autoconf

PACKAGE = @PACKAGE@
VERSION = @VERSION@

all : $(PACKAGE)

$(PACKAGE) : main.c
    echo "Building $(PACKAGE), version $(VERSION)."
    gcc main.c -o $@

In fact, any file you list in AC_CONFIG_FILES (separated by white space) will be generated from a file of the same name with a .in extension, and found in the same directory. Autoconf generates sed commands into the configure script that perform this simple string replacement when the configure script is executed. Sed is a Stream EDitor, which is a fancy way of saying that it doesn’t require an entire source file to be loaded into memory while it’s doing it’s thing. Rather, it watches a stream of bytes as they go by, replacing text in the stream with other text, as specified on it’s command line. The expression list passed to sed by the configure script is built by autoconf from a list of variables defined by various autoconf macros, many of which we’ll cover in greater detail later.

Note in these example scripts, that we’ve used three different kinds of variables; autoconf replacement variables are text surrounded by ‘@’ signs, environment variables are indicated by normal shell syntax like this: ${variable}, and make variables, which are almost the same as shell variables, except that parenthesis are used instead of french braces: $(variable). In fact, we set make variables from the text replaced by autoconf at the top of Makefile.in. If you were to look at the contents of the generated Makefile, this is what you’d see:

# Makefile: generated from Makefile.in by autoconf

PACKAGE = my-project
VERSION = 1.0

all : $(PACKAGE)

$(PACKAGE) : main.c
    echo "Building $(PACKAGE), version $(VERSION)."
    gcc main.c -o $@

The important thing to notice here is that the autoconf variables are the ONLY items replaced in Makefile.in while generating the Makefile. The reason this is important to understand is that it helps you to realize the flexibility you have when allowing autoconf to generate a file from a template. This flexibility will become more apparent as we get into various use cases for the pre-defined autoconf macros, and even later when we delve into the topic of writing your own autoconf macros.

Summary

It would be a great learning experience to take an existing project and just apply autoconf to the task of generating your configure script. Forget about the rest of the Autotools right now. Just focus on autoconf. There are actually a fair number of popular open source projects in the world that only use autoconf.

I’ll continue my treatise on autoconf in the next article. In the meantime, please use the following short reading list to put you to sleep at night for the next month or so! You really do need the background information. I’ll try to cover the basics of each of these next time, but you’ll want to be at least familiar with them before then.

  1. The GNU Autoconf Manual
  2. The GNU M4 Manual
  3. The General Electric SED Tutorial
  4. Additional Reading on Unix Shell Script, etc

Advertisements