Autoconf Macros, Exposed (AT/3)

Feb 23

In my last article I mentioned that I’d get more into the standard autoconf macros this time. The easiest way to do this is really to start with a clean project directory and run autoscan on it. However, before we can do that, we should probably spend a little time on project organization.

Project Structure

What follows is a typical project directory hierarchy. Open source projects generally have some sort of catchy name – often they’re named after some past hero or ancient god or something. Let’s call this the jupiter project – mainly because that way I don’t have to come up with a name that matches some non-existent functionality! For jupiter, we’ll start with a directory structure something like this:

$ cd projects
$ mkdir -p jupiter/src
$ touch jupiter/Makefile
$ touch jupiter/src/Makefile
$ touch jupiter/src/main.c
$ cd jupiter
$

Woot! One directory called src, one C source file called main.c, and a Makefile for each of these two directories. Minimal yes, but hey, this is a new project and everyone knows that the key to a successful open source project is evolution, right? Start small and grow as required.

The Makefiles provide support for a few key targets in an open source project: all, clean and dist (to create a source tarball from our project directory structure). The top-level Makefile handles dist, while all and clean are passed down to src/Makefile. Here are the contents of each of the files in our project:

$ cat Makefile
package = jupiter
version = 1.0
distdir = $(package)-$(version)

all clean jupiter:
        $(MAKE) -C src $@

dist: $(distdir).tar.gz

$(distdir).tar.gz: $(distdir)
        tar chof - $(distdir) | gzip -9 -c >$(distdir).tar.gz
        rm -rf $(distdir)

$(distdir):
        -rm -rf $(distdir)
        mkdir -p $(distdir)/src
        cp Makefile $(distdir)
        cp src/Makefile $(distdir)/src
        cp src/main.c $(distdir)/src
$
$ cat src/Makefile
all: jupiter

jupiter: main.c
        gcc -g -O0 main.c -o jupiter

clean:
        -rm jupiter
$
$ cat src/main.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[])
{
        printf("Hello from %s!\n", argv[0]);
        return 0;
}
$

Using Autoscan

Now, as I mentioned up front, the simplest way to create a (mostly) complete configure.ac file is to run the autoscan utility, which is part of the autoconf package. This is so simple – just cd into the jupiter directory, and type “autoscan” at the prompt. In less than a second you’re left with a couple of new files in your project directory:

$ autoscan
autom4te: configure.ac: no such file or directory
autoscan: /usr/bin/autom4te failed with exit status: 1
$ ls -l
total 8
-rw-r--r-- 1 user users   0 2008-02-21 21:30 autoscan.log
-rw-r--r-- 1 user users 563 2008-02-21 21:30 configure.scan
-rw-r--r-- 1 user users 377 2008-02-21 20:50 Makefile
drwxr-xr-x 2 user users  96 2008-02-21 21:17 src
$

Okay, so what about that error message? While it is autoscan’s job to generate a configure.ac file, it sort of expects that you’ve taken a little initiative on your own and hand created something for it to analyze. Its actual job is to tell you what’s wrong with your existing configure.ac. Thus, if you run autoscan on a project directory that doesn’t contain an existing configure.ac, you’ll get the above warning, and autoscan will exit with an error. Regardless, it still creates a configure.scan file that works just fine – in fact, there’s really no difference between the file generated with or without an existing configure.ac file. However, it’s instructional to see what does happen when you run autoscan in a directory with an existing configure.ac – say an empty file of that name. We’ll clean up, add an empty configure.ac and run it again:

$ rm configure.scan autoscan.log
$ touch configure.ac
$ autoscan
configure.ac: warning: missing AC_CHECK_HEADERS([stdlib.h]) wanted by: src/main.c:2
configure.ac: warning: missing AC_HEADER_STDC wanted by: src/main.c:2
configure.ac: warning: missing AC_PREREQ wanted by: autoscan
configure.ac: warning: missing AC_PROG_CC wanted by: src/main.c
$ ls -l
total 12
drwxr-xr-x 2 user users 120 2008-02-21 21:35 autom4te.cache
-rw-r--r-- 1 user users 287 2008-02-21 21:35 autoscan.log
-rw-r--r-- 1 user users   0 2008-02-21 21:35 configure.ac
-rw-r--r-- 1 user users 563 2008-02-21 21:35 configure.scan
-rw-r--r-- 1 user users 377 2008-02-21 20:50 Makefile
drwxr-xr-x 2 user users  96 2008-02-21 21:17 src
$

First, notice that there’s an autom4te.cache directory that wasn’t there when we ran autoscan without a configure.ac. The reason for this is that autoscan actually runs autoconf on the existing configure.ac with options to warn us when things are incorrect. This causes the cache directory to be created. But, enough playing around – let’s take a look at the generated configure.scan file. Since we don’t have an existing configure.ac (of any value), let’s rename the generated file right away:

$ mv configure.scan configure.ac
$ cat configure.ac
#                                               -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_PREREQ(2.59)
AC_INIT(FULL-PACKAGE-NAME, VERSION, BUG-REPORT-ADDRESS)
AC_CONFIG_SRCDIR([src/main.c])
AC_CONFIG_HEADER([config.h])

# Checks for programs.
AC_PROG_CC

# Checks for libraries.

# Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS([stdlib.h])

# Checks for typedefs, structures, and compiler characteristics.

# Checks for library functions.

AC_CONFIG_FILES([Makefile
                 src/Makefile])
AC_OUTPUT
$

NOTE: The contents of your configure.ac file may differ slightly from mine, depending on the version of autoconf you have installed. I have version 2.59 installed, but if your version of autoscan is newer or older, you may see some slight differences.

autoscan really does a lot of the work for you. The GNU autoconf manual states that you should manually tailor this file to your project before using it. This is true, but there are only a few key issues. We’ll cover each of these as we come to them. Let’s start at the top of the file and work our way down.

Initialization and Package Information

The AC_PREREQ macro simply defines the lowest version of autoconf that may be used to successfully process this configure.ac script. The manual indicates that AC_PREREQ is the only macro that may be used before AC_INIT. The reason for this should be obvious – you’d like to be able to ensure you’re using a late enough version of autoconf before you begin processing any other macros, which may be version dependent. As it turns out, AC_INIT is not version dependent anyway, so you may place it first, if you’re so inclined.

The AC_INIT macro (as its name implies) initializes the autoconf system. It accepts up to three arguments, PACKAGE, VERSION, and an optional BUG-REPORT argument. The PACKAGE argument is intended to be the name of your package. It will end up as the first string in an automake-generated tarball when you run “make dist”. In fact, distribution tarballs will be named PACKAGE-VERSION.tar.gz, so bear this in mind when you choose your package name and version string.

m4 macro arguments, including VERSION, are just strings. m4 doesn’t attempt to interpret any of the text it processes, although it does have a built-in eval macro that will evaluate an expression and resolve it to a new string (e.g., eval(2*2) becomes the string “4”). While the VERSION argument can be anything you like, there are a few conventions that will make life easier for you if you follow them. The widely used convention is to pass in MAJOR.MINOR (e.g., 1.2). However, there’s nothing that says you can’t use MAJOR.MINOR.REVISION if you want, and it doesn’t do any harm to do this. None of the resulting VERSION macros (autoconf, shell or make) are parsed or analysed anywhere; they are only used in various places as replacement text, so if you want, you can even add non-numeric text into this macro, such as ‘0.15-alpha’.

One thing to note, however, is that the AC_INIT arguments must be static text. That is, they can’t be shell variables, and autoconf will flag attempts to use shell variables in these arguments as errors. I once tried to use a shell variable in the VERSION argument so that I could substitute my Subversion revision number into the VERSION argument at configure time. I spent a couple of weeks trying to figure out how to trick autoconf into letting me use a shell variable as the REVISION field. Eventually, I discovered the following trick, which I implemented in my top-level Makefile.am:

distdir = $(PACKAGE)-$(VERSION).$(SVNREV)

The distdir make variable controls the name of the distribution directory and tarball file name generated by automake, and setting it in the top-level Makefile causes it to propagate down to all lower level Makefiles, as well. But I digress; this discussion is getting too deep into automake, so we’ll return to it in a later article specifically on automake.

The AC_CONFIG_SRCDIR macro is just a sanity check. Its purpose is to let the generated configure script confirm that the directory it has been pointed at is in fact the correct project source directory, which it does by checking that the named file exists there. The argument can be a relative path to any source file you like – I try to pick one that sort of defines the project, so that if I ever decide to reorganize source code, I’m not likely to lose it – but it doesn’t really matter. If you do lose it, you can always change the argument passed to AC_CONFIG_SRCDIR.
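To make the sanity check concrete, here is a minimal shell sketch of the kind of test involved. It is not the code that autoconf generates: the function name check_srcdir, the messages and the scratch directory are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not autoconf's actual code): succeed only if the
# marker file named in AC_CONFIG_SRCDIR is readable under the source dir.
check_srcdir() {
    srcdir=$1
    unique_file=$2
    if test -r "$srcdir/$unique_file"; then
        echo "ok: found $unique_file in $srcdir"
    else
        echo "error: cannot find sources ($unique_file) in $srcdir" >&2
        return 1
    fi
}

# Demo against a scratch copy of the jupiter layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/jupiter/src"
: > "$scratch/jupiter/src/main.c"

check_srcdir "$scratch/jupiter" src/main.c && right_dir=yes || right_dir=no
check_srcdir "$scratch" src/main.c 2>/dev/null && wrong_dir=yes || wrong_dir=no

echo "project directory accepted: $right_dir"
echo "parent directory accepted:  $wrong_dir"
rm -rf "$scratch"
```

Pointing the check at the parent directory fails because src/main.c is not found relative to it, which is exactly the kind of mistake the macro is meant to catch.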

The AC_CONFIG_HEADER macro is a little more functional. Its job is to specify the NAME of an include file that will be generated by the configure script from a template file called NAME.in (which is itself generated by autoheader – more on autoheader later). The generated file contains C preprocessor definitions in the following format:

/* config.h.  Generated by configure.  */
/* config.h.in.  Generated from configure.ac by autoheader.  */

/* The build architecture of this build. */
#define BUILDARCH "i586"

/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1

...

/* Define to 1 if your <sys/time.h> declares `struct tm'. */
/* #undef TM_IN_SYS_TIME */

/* Version number of package */
#define VERSION "1.0"

This file is intended to be included in your source code in locations where you might wish to test a configured option in the code itself using the C preprocessor. For instance, in the sample config.h file above, it appears that autoconf has determined that we have the dlfcn.h header file on this system, so we might add the following code to a source file in our project that uses dynamic loader functionality:

...

#if HAVE_CONFIG_H
# include <config.h>
#endif

#if HAVE_DLFCN_H
# include <dlfcn.h>
#else
# error Sorry, this code requires dynamic loader (dlfcn.h) functionality.
#endif

...

#if HAVE_DLFCN_H
   handle = dlopen("/usr/lib/libwhatever.so", RTLD_NOW);
#endif

...

We may be able to get along at compile time without the dynamic loader functionality if we need to, but it would be nice to have it. Perhaps your project will function in a limited manner without it. Sometimes you just have to bail out with a compiler error (as this code does) if the key functionality is missing. Often this is an acceptable first-attempt solution, until someone comes along and adds support to the code base for some other dynamic loader service that is perhaps available on non-dlfcn-oriented systems.

One important point here is that config.h is only included if HAVE_CONFIG_H is defined in your compilation environment. But doesn’t that definition happen in config.h?! The short answer is no. HAVE_CONFIG_H must be either defined by you on your compiler command line, or automatically defined on the compiler command line by automake-generated Makefiles. (Are you beginning to get the feeling that autoconf really shines when used in conjunction with automake?)
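The effect of the command-line definition can be tried by hand. The sketch below is only an illustration under a few assumptions: the file names probe.c, plain and configured are invented, the config.h is written by hand rather than generated by configure, and a C compiler is assumed to be reachable as $CC or cc.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)

# Hand-written stand-in for the config.h that configure would generate.
cat > "$demo_dir/config.h" <<'EOF'
#define HAVE_DLFCN_H 1
EOF

cat > "$demo_dir/probe.c" <<'EOF'
#if HAVE_CONFIG_H
# include <config.h>
#endif
#include <stdio.h>

int main(void)
{
#if HAVE_DLFCN_H
    puts("HAVE_DLFCN_H is visible");
#else
    puts("HAVE_DLFCN_H is not visible");
#endif
    return 0;
}
EOF

compiler=${CC:-cc}
without_flag=
with_flag=

# No -DHAVE_CONFIG_H: config.h is never included, so its macros are unseen.
if "$compiler" -I"$demo_dir" -o "$demo_dir/plain" "$demo_dir/probe.c"; then
    without_flag=$("$demo_dir/plain")
fi

# With -DHAVE_CONFIG_H the guard at the top of probe.c pulls in config.h.
if "$compiler" -DHAVE_CONFIG_H -I"$demo_dir" -o "$demo_dir/configured" "$demo_dir/probe.c"; then
    with_flag=$("$demo_dir/configured")
fi

echo "without -DHAVE_CONFIG_H: $without_flag"
echo "with -DHAVE_CONFIG_H:    $with_flag"
rm -rf "$demo_dir"
```

The same source file behaves differently depending only on a compiler flag, which is why the flag has to come from the build system rather than from config.h itself.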

Checks for Programs

The AC_PROG_CC macro ensures that you have a working C language compiler. This call was added to configure.scan when autoscan noticed that I had C source files in my project directory. If I’d had files suffixed with .cxx or .C, it would have inserted a call to the AC_PROG_CXX macro, as well.

Other important programs you might need to check for are lex and yacc, sed or awk, etc. If so, you can add calls to AC_PROG_LEX, AC_PROG_YACC, AC_PROG_SED, or AC_PROG_AWK yourself. There are about a dozen different programs you can check for using these specialized macros.

If you need to check for the existence of a program not covered by these more specialized macros, you can call the generic AC_CHECK_PROG macro, or you can write your own special purpose macro – we’ll cover writing macros later. Now let me highlight a common problem with autoconf. Take a look at the formal definition of AC_CHECK_PROG found in the autoconf manual:

AC_CHECK_PROG(variable, prog-to-check-for, value-if-found, [value-if-not-found], [path], [reject])

Check whether program prog-to-check-for exists in PATH. If it is found, set variable to value-if-found, otherwise to value-if-not-found, if given. Always pass over reject (an absolute file name) even if it is the first found in the search path; in that case, set variable using the absolute file name of the prog-to-check-for found that is not reject. If variable was already set, do nothing. Calls AC_SUBST for variable.

I can extract the following clearly defined functionality from this description:

1. If prog-to-check-for is found in the system search path, then variable is set to value-if-found, otherwise it’s set to value-if-not-found.
2. If reject is specified (as a full path), then skip it if it’s found first, and continue to the next matching program in the system search path.
3. If reject is found first in the path, and then another match is found besides reject, set variable to the absolute path name of the second (non-reject) match.
4. If variable is already set by the user in the environment, then variable is left untouched (thereby allowing the user to override the check by setting variable before running configure).
5. AC_SUBST is called on variable to make it an autoconf substitution variable.

At first read, there appears to be a terrible conflict of interest here: We can see in point 1 that variable will be set to one or the other of two specified values, based on whether or not prog-to-check-for is found in the system search path. But then in point 3 it states that variable will be set to the full path of some program, but only if reject is found first and skipped. Clearly the documentation needs a little work.

Discovering the real functionality of AC_CHECK_PROG is as easy as reading a little shell script. While you could spend your time looking at the definition of AC_CHECK_PROG in /usr/share/autoconf/autoconf/programs.m4, the problem with this approach is that you’re one level removed from the actual shell code performing the check. Wouldn’t it be better to just look at the resulting shell script generated by AC_CHECK_PROG? Okay, then modify your new configure.ac file in this manner (the changes are the filled-in AC_INIT arguments and the new AC_CHECK_PROG line):

...
AC_PREREQ(2.59)
AC_INIT([jupiter], [1.0], [jupiter-devel@lists.example.com])
AC_CONFIG_SRCDIR([src/main.c])
AC_CONFIG_HEADER([config.h])

# Checks for programs.
AC_PROG_CC
AC_CHECK_PROG([bash_var], [bash], [yes], [no], [$PATH], [/usr/sbin/bash])
...

Now just execute autoconf and then open the resulting configure script and search for something specific to the definition of AC_CHECK_PROG. I used the string “ac_cv_prog_bash_var”, a shell variable generated by the macro call. You may have to glance at the definition of a macro to find reasonable search text:

$ autoconf
$ vi -c /ac_cv_prog_bash_var configure
...
# Extract the first word of "bash", so it can be a program name with args.
set dummy bash; ac_word=$2
echo "$as_me:$LINENO: checking for $ac_word" >&5
echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6
if test "${ac_cv_prog_bash_var+set}" = set; then
  echo $ECHO_N "(cached) $ECHO_C" >&6
else
  if test -n "$bash_var"; then
  ac_cv_prog_bash_var="$bash_var" # Let the user override the test.
else
  ac_prog_rejected=no
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
  IFS=$as_save_IFS
  test -z "$as_dir" && as_dir=.
  for ac_exec_ext in '' $ac_executable_extensions; do
  if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
    if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/sbin/bash"; then
       ac_prog_rejected=yes
       continue
     fi
    ac_cv_prog_bash_var="yes"
    echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5
    break 2
  fi
done
done

if test $ac_prog_rejected = yes; then
  # We found a bogon in the path, so make sure we never use it.
  set dummy $ac_cv_prog_bash_var
  shift
  if test $# != 0; then
    # We chose a different compiler from the bogus one.
    # However, it has the same basename, so the bogon will be chosen
    # first if we set bash_var to just the basename; use the full file name.
    shift
    ac_cv_prog_bash_var="$as_dir/$ac_word${1+' '}$@"
  fi
fi
  test -z "$ac_cv_prog_bash_var" && ac_cv_prog_bash_var="no"
fi
fi
bash_var=$ac_cv_prog_bash_var
if test -n "$bash_var"; then
  echo "$as_me:$LINENO: result: $bash_var" >&5
echo "${ECHO_T}$bash_var" >&6
else
  echo "$as_me:$LINENO: result: no" >&5
echo "${ECHO_T}no" >&6
fi
...

If you did happen to look at the definition of AC_CHECK_PROG, your first thought might have been, “Don’t these people know how to indent?!”. Remember the m4 white space rules: Leading white space is stripped off, but trailing white space is part of the argument value. This rule pretty much defines your indentation style when writing m4 macros. A glance at the resulting shell code shows that the authors of AC_CHECK_PROG really understand this concept. The macro definition may look terrible, but the generated code isn’t too bad.

Wow! We immediately see by the opening comment that AC_CHECK_PROG has some undocumented functionality: You can pass in arguments with the program name if you wish. But why would you want to? Well, let’s look farther. We can probably fairly accurately deduce that the reject parameter was added into the mix in order to allow your configure script to search for a particular version of a tool. (Could it possibly be that someone might really rather use the GNU C compiler instead of the Solaris C compiler?)

In fact, it appears that variable really is set based on a tri-state condition. If reject is not used, then variable can only be either value-if-found or value-if-not-found. But if reject is used, then variable can also be the full path of the first program found that is not reject! Well, that is exactly what the documentation stated, but examining the generated code yields insight into the authors’ intended use of this macro. We probably should have called AC_CHECK_PROG this way, instead:

AC_CHECK_PROG([bash_shell], [bash -x], [bash -x],,, [/usr/sbin/bash])

Now it makes more sense, and we can see by this example that the manual is in fact accurate, if not clear. If reject is not specified, and bash is found in the system path, then bash_shell will be set to bash -x. If it’s not found in the system path, then bash_shell will be set to the empty string. If, on the other hand, reject is specified, and the undesired version of bash is found first in the path, then bash_shell will be set to the full path of the next version found in the path, along with the originally specified arguments (-x). The bash_shell variable may now be used by the rest of our script to run the desired bash shell, if it doesn’t test out as empty. Wow! No wonder it was hard to document in a way that’s easy to understand! But quite frankly, a good example of the intended use of this macro, along with a couple of sentences of explanation would have made all the difference.
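For what it’s worth, the three outcomes can be reproduced outside of configure with a few lines of plain shell. What follows is a simplified re-creation of the logic in the generated script, not the macro itself: the function name check_prog, the cp_ variable names, the program name mytool and the scratch directories are all assumptions for the sake of the demo, and the caching and user-override steps are left out.

```shell
#!/bin/sh
# Hypothetical re-creation of the AC_CHECK_PROG search, for illustration only.
# usage: check_prog PROG VALUE_IF_FOUND VALUE_IF_NOT_FOUND SEARCH_PATH [REJECT]
check_prog() {
    cp_prog=$1; cp_found=$2; cp_missing=$3; cp_path=$4; cp_reject=${5-}
    set -- $cp_prog              # first word is the program name
    cp_word=$1
    cp_result=
    cp_rejected=no
    cp_save_ifs=$IFS
    IFS=:
    for cp_dir in $cp_path; do
        IFS=$cp_save_ifs
        test -n "$cp_dir" || cp_dir=.
        if test -f "$cp_dir/$cp_word" && test -x "$cp_dir/$cp_word"; then
            if test "$cp_dir/$cp_word" = "$cp_reject"; then
                cp_rejected=yes
                continue
            fi
            cp_result=$cp_found
            break
        fi
    done
    IFS=$cp_save_ifs
    if test "$cp_rejected" = yes && test -n "$cp_result"; then
        # Swap the bare program name for the full path, keep the arguments.
        set -- $cp_result
        shift
        cp_result="$cp_dir/$cp_word${1:+ $*}"
    fi
    test -n "$cp_result" || cp_result=$cp_missing
    echo "$cp_result"
}

# Two scratch directories, each holding an executable called mytool.
demo=$(mktemp -d)
mkdir "$demo/first" "$demo/second"
for d in first second; do
    printf '#!/bin/sh\n' > "$demo/$d/mytool"
    chmod +x "$demo/$d/mytool"
done
search="$demo/first:$demo/second"

plain=$(check_prog mytool yes no "$search")
absent=$(check_prog no-such-tool yes no "$search")
skipped=$(check_prog "mytool -x" "mytool -x" "" "$search" "$demo/first/mytool")

echo "found, no reject:     $plain"
echo "not found:            $absent"
echo "first match rejected: $skipped"
rm -rf "$demo"
```

The first call prints yes and the second prints no. The third prints the full path of the copy in the second directory followed by -x, which is the tri-state behavior described above.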

Does (Project) Size Matter?

An issue that might have occurred to you by now is the size of my toy project. I mean, c’mon! One source file?! But, I’ve used autoscan to generate configure scripts for projects with several hundred C++ source files, and some pretty complex build steps. It takes a few seconds longer to run autoscan on a project of this size, but it works just as well. For a basic build, the generated configure script only needed to be touched up a bit – project name, version, etc. To add in compiler optimization options for multiple target tool sets, it took a bit more work. I’ll cover these sorts of issues in another article.

Summary

Well, that was a lot, but it was fairly important that we cover these basics in one shot. Even so, we only made it to the half-way point in our new configure.ac file. I’ll cover the rest of it in the next article. Meanwhile, spend some time generating configure.ac files using autoscan against some of your projects. And listen, don’t get discouraged by all this configuration functionality. I’m guessing that after reading this article, you’re swimming in configuration soup so thick you’re wondering if it’s all worth it! Don’t worry, most of it is optional and is intended to be added incrementally as needed.

For instance, assume you are writing your program to run on Red Hat Linux – say Fedora 6 – and you’ve used a very basic configure.ac file to manage your configuration – perhaps the very one generated by autoscan, with the few necessary required modifications. You may not even be including config.h yet in any of your source files. That’s okay. But then later, a colleague decides he wants to run your code on Solaris. He tries to build your tarball and notices that some portions of your code are not tailored to Solaris very well – a structure field is different, or a library is missing. No problem, you tell him – you can just add certain checks to your configure.ac, and perhaps add config.h to a problematic source file in order to allow your code to determine at compile time the special cases needed for a Solaris build. This is how 99 percent of all autoconf-based project configuration scripts get to where they are today – incrementally.

Digging Deeper into Autoconf (AT/2)

Feb 20

I left off last time with a brief overview of autoconf. This time I’m going to dig a bit deeper. I’ll show you how autoconf works from the inside out, and give you a few insights on where to look when you can’t find the answer in the documentation – or even in a Google search.

Transforming Text

Autoconf is a text stream transforming tool. The input is your configure.ac file, some autoconf input files, and several autoconf macro definition files. The output is your configure script. The transforming tool is simply the m4 macro processor. To be clear, autoconf does little more than call m4 on your configure.ac file (sometimes called configure.in – an older, deprecated naming convention for autoconf input files) to generate your configure script. This fact is almost worth repeating, because it’s the root cause of most people’s frustration with autoconf. Thus, a solid understanding of m4 syntax and semantics is key to understanding why certain issues crop up in configure.ac files.

An Autoconf-Oriented M4 Tutorial

Let’s begin with a short tutorial on m4 – I mean long enough to be useful, but short enough to be interesting. I’ll tell you just what you need to know to use autoconf well. Check out the GNU m4 manual for a much more in-depth treatise on the subject.

The fundamental job of a macro processor is text replacement. Sounds simple, but all sorts of problems crop up when the text you’re trying to replace contains the meta-characters or strings that are supposed to delineate text you’re trying to replace.

There are two basic approaches to this problem. The first is to choose meta-characters or delineation strings that no one would ever consider using in their document. The second is to define some sort of escaping mechanism that will allow authors to indicate to the processor which meta-characters or delineation strings should be taken literally, and which should be considered by the processor as such. This is called quoting or escaping. Since m4 is a general purpose macro processor, it’s difficult to conceive of a set of delimiter characters or words that would be considered unnecessary within the scope of all possible input languages. However, m4 uses both of these mechanisms to some degree. How this is so will become clear in a few paragraphs.

m4 parses a stream of input text into a series of tokens. Each token is interpreted in one of three different ways: as a name, as a quoted string, or as any other character that is not a name or a quoted string. In addition, m4 recognizes a secondary form of quoting specifically designed for comments.

Names are sequences of alpha-numeric characters (letters, digits and underscores), where the first character is not a digit. A quoted string is any sequence of text within m4 quote delimiters, which by default are the characters ` (backtick) and ' (apostrophe). Comments are sequences of characters delimited by the comment delimiters, which by default are the characters # and CR (the carriage-return, new-line or end-of-line delimiter).

All other characters in the input stream are treated individually as separate tokens. I emphasize this statement because understanding this concept is very important. It means that white space characters are each individual tokens. Unlike a C language lexical analyzer, which simply treats a run of white space characters as a token separator, m4 treats two consecutive SPACE characters as two separate individual tokens. I’ll wager that most of your configure.ac parsing and expansion problems will disappear when you fully grasp the meaning of this statement and then apply it to your build system.

m4 provides a built-in macro, changequote, that modifies the quoting delimiters. Because unbalanced single quotes are often found in shell script, and because unbalanced square brackets are rare in shell script, autoconf changes the m4 quote characters from ` and ' to [ and ].

Let’s take a short text file (which I’ve named sample1.in) through m4 and see what comes out. Play with the spacing on the text a bit and notice the changes to the output, bearing in mind my earlier comments on white space as tokens. In this example, I also introduce the dnl macro:

changequote([,])dnl
Some text that will pass through with no changes.
[[Here's a quoted line of text.]]
# Here's a comment

Like many Unix utilities, m4 is written as a filter, which means that, by default, it accepts input on STDIN and sends output to STDOUT. A single file name on the command line redirects STDIN from the file, and output may be redirected to a file if desired. Here is the m4 command and the output:

$ m4 sample1.in
Some text that will pass through with no changes.
[Here's a quoted line of text.]
# Here's a comment
$

The first thing to notice here is the results of processing the changequote and dnl macros. The changequote macro expands to nothing, but has some interesting side effects. You already (mostly) understand changequote, but dnl is new. This is another m4 built-in macro whose sole purpose is to discard the rest of a line of text, including the terminating CR, passing none of it through as output. Effectively dnl expands to less than nothing. The name of the macro is actually an acronym for “discard to next line”. Without dnl the output text would contain an extra CR before the word, “Some”. Why? Remember the rules: White space characters are tokens, not garbage. If it’s not a macro name or quoted text, then it’s passed through unchanged, along with all the other tokens in the input stream.
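The effect of dnl is easy to see by feeding m4 the same two lines with and without it. This is just a sketch: the macro called name, the scratch file names and the count_lines helper are invented for the demo, and the default m4 quote characters are used rather than autoconf’s square brackets.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Same input twice; only the second copy ends the define line with dnl.
cat > "$tmp/without.m4" <<'EOF'
define(`name', `value')
name
EOF
cat > "$tmp/with.m4" <<'EOF'
define(`name', `value')dnl
name
EOF

without_dnl=$(m4 "$tmp/without.m4")
with_dnl=$(m4 "$tmp/with.m4")

# Command substitution strips trailing newlines only, so the blank line
# left behind by the define call survives at the front of the output.
count_lines() { printf '%s\n' "$1" | wc -l | tr -d ' '; }

echo "without dnl: $(count_lines "$without_dnl") line(s) of output"
echo "with dnl:    $(count_lines "$with_dnl") line(s) of output"
rm -rf "$tmp"
```

The define call itself expands to nothing, but the end-of-line character after its closing parenthesis is an ordinary token and is passed through unless dnl swallows it.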

Another item of interest is the reduction in quote nesting on the quoted string. I passed in “[[text]]”, but I got back “[text]”. m4 removes one level of quoting with each pass through a stream of input text. This is often another sore spot with autoconf users. A solid understanding of m4 quote removal will fix some of the nastiest problems you’re likely to encounter when writing configure.ac files. When you define a macro with embedded quotes, the quotes are removed in the definition text. When you then call that macro, the call is replaced with the unquoted version of the text, and the expanded text is again processed for macro calls. If you wanted the expanded text to contain the quotes, then you’ll have to double quote it in the macro definition.
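Here is a small sketch of that double-quoting rule in action. The macro names inner, once and twice are made up, and the example sticks to the default m4 quotes (` and ') instead of autoconf’s square brackets.

```shell
#!/bin/sh
tmp=$(mktemp -d)

cat > "$tmp/quotes.m4" <<'EOF'
define(`inner', `expanded')dnl
define(`once', `inner')dnl
define(`twice', ``inner'')dnl
once
twice
EOF

quote_demo=$(m4 "$tmp/quotes.m4")
printf '%s\n' "$quote_demo"
rm -rf "$tmp"
```

The call to once prints expanded, because its definition text is the bare word inner, which is expanded again when the result is rescanned. The call to twice prints inner, because one level of quoting survived in the definition and protects the word during the rescan.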

Macros themselves are words, optionally followed by a set of parentheses encapsulating a comma-separated list of arguments. Each time m4 parses a word token, it scans its macro definition list for a match. If it finds one, then it looks for an open parenthesis as the next token. All macro arguments are optional, and may be omitted. Missing arguments are treated by m4 as if the empty string had been passed. If leading or middle arguments are not needed, they may be omitted, but the separating comma characters must be in place. If a trailing argument is not needed it may simply be omitted, along with its preceding comma separator.
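A quick way to convince yourself of these argument rules is a throwaway macro that simply echoes its first three arguments. The macro name show and the angle-bracket decoration are arbitrary choices for this sketch, which again uses the default m4 quotes.

```shell
#!/bin/sh
tmp=$(mktemp -d)

cat > "$tmp/args.m4" <<'EOF'
define(`show', `<$1|$2|$3>')dnl
show(a, b, c)
show(a)
show(, , c)
show
EOF

args_demo=$(m4 "$tmp/args.m4")
printf '%s\n' "$args_demo"
rm -rf "$tmp"
```

The four calls print <a|b|c>, <a||>, <||c> and <||> respectively: omitted trailing arguments, omitted leading arguments and a call with no argument list at all are each filled in with empty strings.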

In fact, the changequote macro is written in such a way that it may be used with zero, one or two arguments. Zero arguments means “reset to the default quote delimiters” – ` and '. In this case, the entire argument list may be omitted, including the parentheses. One argument means, “Use the parameter value as the opening quote delimiter, and the default ' as the closing quote delimiter” (at least in GNU m4). Here’s another interesting example to lead us into the next aspect of macro expansion (sample2.in):

changequote([ ,])dnl
[Here's an (apparently) quoted line of text.]
[ Here's a real quoted line of text.]

Processing with m4 generates the following output:

$ m4 sample2.in
[Here's an (apparently) quoted line of text.]
Here's a real quoted line of text.
$

Can you see what happened here? White space matters. I added one space after the first [ in the changequote argument list. The opening quote is now “[ ” (a bracket followed by a space), not “[”, and m4 honors this request. There are actually some strange rules about using white space in argument lists, and unfortunately they’re important, so here they are:

The opening parenthesis in a macro call must immediately follow the macro name, or the macro is called with no arguments and the argument list is treated as part of the input text stream.
If too few arguments are passed in a macro call, the missing arguments are each considered to be the empty string.
If too many arguments are passed in a macro call, m4 will discard the extra arguments up to the closing parenthesis.
Unquoted leading white space (SPACE, TAB, VTAB, CR, LF, and FF) is discarded, but unquoted trailing white space is considered part of the argument.

Regarding the last point in the list above, quoted white space is always considered part of an argument, so it pays to always quote your arguments. Even when you do, however, spacing can still bite you. I’ll prove it with another example (sample3.in):

changequote(`[' , `]')dnl
`Unquoted text'
[Unquoted text]
[ Quoted text]

Processing with m4 reveals some apparently strange behavior:

$ m4 sample3.in
`Unquoted text'
[Unquoted text]
Quoted text
$

“Wha…?”, you ask? Remember the rules: Trailing white space is always considered to be part of the argument, even if that trailing white space follows the closing quote of a quoted portion of the argument. An argument to a macro call is defined as the entire string of input tokens between the opening parenthesis or the comma following the previous argument, and the next comma or closing parenthesis, minus any leading unquoted white space. Okay, just one more example (sample4.in):

define(macro,$1)dnl
macro(

      `  '... text `quoted text'

      a
, ...)

In this example, macro is defined to expand to its first argument. The result of calling m4 on sample4.in is then:

$ m4 sample4.in
  ... text quoted text

      a

$

Leading unquoted white space (spaces, tabs, line feeds, etc.) is discarded, but everything else up to the comma is part of the argument.

What Comes with Autoconf?

Well, that’s enough of m4 for now, but I’ll return to it later when I discuss defining your own autoconf macros. In the meantime, let’s look at what comes with the autoconf distribution. I’m going to look at the file list that installs with an autoconf rpm on an rpm-based Linux distribution. Sometimes I find that it’s easier to comprehend a package if I know what’s in it, so long ago I learned to use the rpm -ql command to list the files installed by a package:

$ rpm -ql autoconf
/usr/bin/autoconf
/usr/bin/autoheader
/usr/bin/autom4te
/usr/bin/autoreconf
/usr/bin/autoscan
/usr/bin/autoupdate
/usr/bin/ifnames
/usr/share/autoconf/Autom4te
/usr/share/autoconf/Autom4te/C4che.pm
/usr/share/autoconf/Autom4te/ChannelDefs.pm
/usr/share/autoconf/Autom4te/Channels.pm
/usr/share/autoconf/Autom4te/Configure_ac.pm
/usr/share/autoconf/Autom4te/FileUtils.pm
/usr/share/autoconf/Autom4te/General.pm
/usr/share/autoconf/Autom4te/Request.pm
/usr/share/autoconf/Autom4te/Struct.pm
/usr/share/autoconf/Autom4te/XFile.pm
/usr/share/autoconf/INSTALL
/usr/share/autoconf/autoconf/autoconf.m4
/usr/share/autoconf/autoconf/autoconf.m4f
/usr/share/autoconf/autoconf/autoheader.m4
/usr/share/autoconf/autoconf/autoscan.m4
/usr/share/autoconf/autoconf/autotest.m4
/usr/share/autoconf/autoconf/autoupdate.m4
/usr/share/autoconf/autoconf/c.m4
/usr/share/autoconf/autoconf/fortran.m4
/usr/share/autoconf/autoconf/functions.m4
/usr/share/autoconf/autoconf/general.m4
/usr/share/autoconf/autoconf/headers.m4
/usr/share/autoconf/autoconf/lang.m4
/usr/share/autoconf/autoconf/libs.m4
/usr/share/autoconf/autoconf/oldnames.m4
/usr/share/autoconf/autoconf/programs.m4
/usr/share/autoconf/autoconf/specific.m4
/usr/share/autoconf/autoconf/status.m4
/usr/share/autoconf/autoconf/types.m4
/usr/share/autoconf/autom4te.cfg
/usr/share/autoconf/autoscan/autoscan.list
/usr/share/autoconf/autotest/autotest.m4
/usr/share/autoconf/autotest/autotest.m4f
/usr/share/autoconf/autotest/general.m4
/usr/share/autoconf/m4sugar/m4sh.m4
/usr/share/autoconf/m4sugar/m4sh.m4f
/usr/share/autoconf/m4sugar/m4sugar.m4
/usr/share/autoconf/m4sugar/m4sugar.m4f
/usr/share/autoconf/m4sugar/version.m4

The only files I removed from this list (for the sake of brevity) were documentation files and directory names. That said, what you see here is the entire package. Now, let’s consider the set of files. The installed binaries include:

autoconf
autoheader
autom4te (pronounced “automate” – a bit of leet-speak here)
autoreconf
autoscan
autoupdate
ifnames.

I’ve already briefly discussed autoscan. The autoupdate utility is similar to autoscan, except that it updates an existing autoconf project to the current version of autoconf. If you wrote your project to an earlier version of autoconf, then autoupdate will update your configure.ac syntax to that of the currently installed version.

The autoheader and autom4te utilities, along with the .pm files used by autom4te, really deserve their own articles, so I’ll skip these for now. The ifnames program scans C source files and lists the identifiers they use in preprocessor conditionals such as #if and #ifdef, which helps when you’re writing a configure.ac for an existing code base (the man page has the details).

The autoreconf utility updates all of your generated files if they are older than the files used to generate them. Basically, you can quickly ensure that your build system is up to date by running autoreconf, instead of autoconf. If all files are already up to date, then nothing happens.

What’s left are all .m4 files. These files are all macro files and base configuration files used by autoconf to generate your configure script from your configure.ac file.

Finding Macro Definitions

Now that you know something about m4, and where autoconf files are installed, it’s a simple matter of recursively grep’ing the /usr/share/autoconf directory for the name of the macro to find out how it’s defined. Okay, it’s not that simple, but it’s helped me lots of times to figure out just what a macro actually does, despite what the manual tells me it does. Additionally, sometimes the manual just isn’t detailed enough to help me understand the full scope of a macro. And Google searches just don’t do the trick when you’re trying to figure out how to use an autoconf macro.
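
Here is one way to do that search, as a rough sketch rather than a polished tool. The function name find_macro is my own invention. The pattern assumes that definitions are written as AC_DEFUN([NAME], ...) or with a close variant such as AC_DEFUN_ONCE or AU_DEFUN, so a macro defined some other way will be missed:

```shell
#!/bin/sh
# find_macro NAME [DIR]: show where an autoconf macro is defined.
# DIR defaults to the install location from the rpm file list; pass a
# different directory if your distribution puts the .m4 files elsewhere.
find_macro() {
    macro=$1
    dir=${2:-/usr/share/autoconf}
    # Match "DEFUN", any suffix such as "_ONCE", then "([NAME]".
    # Anchoring on DEFUN skips mere uses such as AC_REQUIRE([NAME]).
    grep -rn "DEFUN[A-Z_]*(\[$macro]" "$dir"
}
```

Running find_macro AC_PROG_CC prints each matching line in file:line:text form. On the installation listed above I’d expect the hit to be in autoconf/c.m4, but check your own system. From there you can open the file and read the body of the macro.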

Summary

Okay, that’s enough for today. If you understood the examples and followed the discussion without too much head scratching, then you passed. Pat yourself on the back, but don’t stop there. Experiment with m4 to get a feel for it. Just run ‘m4’ from the command prompt and start entering text. In an interactive m4 session, each line of text is processed immediately, but the results of previous lines are remembered for the session (use Ctrl-D to quit). For example, if you call changequote, the quotes will be changed for subsequent input lines. Playing with m4 will help you to get a handle on the rules. The rules are simple, but rather strict, so you must internalize them in order to become proficient at writing autoconf input files.

Next time I’ll focus on some of the more important predefined autoconf macros.

Autotools: The Learning Curve Lessens – Finally!

Feb14

The Long and Winding Road

I’ve been waiting a LONG time to write this blog entry – over 5 years. Yesterday, after a final couple of eye-opening epiphanies, I think I finally got my head around the GNU Autotools well enough to explain them properly to others. This blog entry begins a series of articles on the use of Autotools. The hope is that others will not have to suffer the same painstaking climb.

If the articles are well received, there may be a book in it in the end. Believe me, it’s about time for a really good book on the subject. The only book I’m aware of (besides the GNU software manuals) is the New Riders 2000 publication of GNU AUTOCONF, AUTOMAKE and LIBTOOL, affectionately known in the community as “The Goat Book”, and so named for the picture on the front cover.

The authors, Gary Vaughan, Ben Elliston, Tom Tromey and Ian Lance Taylor, are well-known in the industry, to say the least – indeed, they’re probably the best people I know of to write such a book. However, as fast as open source software moves these days, a book published in 2000 might as well have been published in 1980. Nevertheless, because of the absolute need for any book on this subject, it’s still being sold new in bookstores. In fairness to the authors, they’ve maintained an online copy through February of 2006 (as of the last time I checked). Regardless, even two years is too long in this business.

As well as it’s written, the biggest gripe I have with the Goat Book is the same gripe I have with the GNU manuals themselves. I’m talking about the sheer number of bits of information that are just assumed to be understood by the reader. The situation is excusable – even reasonable – in the case of the manuals, due to the limited scope of a software manual. My theory is that these guys have been in the business for so long (decades, actually) that many of these topics have become second-nature to them.

The problem, as I see it, is that a large percentage of their readership today are young people just starting out with Unix and Linux. You see, most of these “missing bits” are centered around Unix itself. Sed, for example: What a dream of a tool to work with – I love it! More to the point, however: A solid understanding of the basic functionality of sed is important to grasping the proper use of Autotools. This is true because much of the proper use of Autotools truly involves the proper extension of Autotools.

Another problem is that existing documentation is more reference material than solution-oriented information. I’ll try to make these articles solve real problems, rather than just find new ways to regurgitate the same old reference material found in the manuals.

As you’ve no doubt gathered by now, I’m not an expert on this topic. I don’t have decades of experience in Unix or Linux – well, no more than one decade anyway. But I am a software engineer with significant experience in systems software design and development on multiple hardware and software platforms. As I mentioned in my opening paragraph, I’ve worked extensively with Autotools for about 5 years now. Most of that time was spent in trying to get these tools to do things the wrong way – before finally discovering the way others were doing it.

Claiming not to be an expert gives me a bit of literary – and technical – latitude. To put it bluntly, I’m hoping to gather comments on these articles. So I state here and now: Please comment. I welcome all comments on methods, techniques, tradeoffs and even personal experiences.

I make this statement right up front for the sake of my integrity. I seem to recall a few years ago that Herb Sutter posted a series of articles on the C++ moderated newsgroup entitled GotW – an acronym for “Guru of the Week”. Each article presented a problem in object-oriented software design, specifically related to C++, and elicited responses from the community at large. In and of itself, it was a great idea, and the response was overwhelming. I very much enjoyed reading the GotW threads. But I must say that it surprised me a bit when I saw a book published a year later – Exceptional C++ – that contained most of the information in these threads. Well, I say, good for Herb. And in fairness, perhaps he didn’t plan to write the book until after he’d received such great response. But it feels more comfortable to me to indicate my intentions up front.

Who Should Use Autotools?

I’m going to make a rather broad and sweeping statement here: If you’re writing open source software targeting Unix or Linux systems, then you should be using GNU Autotools. I’m sure I sound a bit biased here. I shouldn’t be, given the number of long nights I’ve spent working around what appeared to be shortcomings in the Autotools system. Normally, I would have been angry enough to toss the entire thing out the window and write a good hand-coded Makefile. But the one truth that I always came back to was the fact that there are literally thousands of projects out there that are apparently very successfully using Autotools. That was too much for me. My pride wouldn’t let me give up.

What if you don’t work on open source software? What if you’re writing proprietary software for Unix or Linux systems? Then, I say, you should still be using Autotools. Even if you only ever intend to target a single distribution of Linux, Autotools will provide you with a build environment that is flexible enough to allow your project to build successfully on future versions or distributions with virtually no changes to the build scripts. This fact, in and of itself, is enough to warrant my statement.

In fact, about the only scenario where it makes sense NOT to use GNU Autotools is the one in which you are writing software for non-Unix platforms only – Microsoft Windows comes to mind. Some people will tell you that Autotools can be successfully used on Windows as well, but my opinion is that the POSIX-based approach to software configuration management is just too alien for Windows development. While it can be done, the tradeoffs are too significant to justify the use of an unmodified version of Autotools on Windows.

I’ve seen some project managers develop a custom version of Autotools that allows the use of all native Windows tools. These projects were maintained by people who spent much of their time reconfiguring Autotools to do things it was never intended to do in a totally hostile and foreign environment. Quite frankly, Microsoft has some of the best tools on the planet for Windows software development. If I were developing a Windows software package, I’d use Microsoft’s tools exclusively. In fact, I often write portable software that targets both Linux and Windows. In these cases, I maintain two separate build environments – one for Windows, and one based on Autotools for everything else.

An Overview of Autoconf

If you’ve ever downloaded, built and installed software from a “tarball” (a gzipped or bzipped tar archive, often sporting one of the common extensions, .tar.gz, .tgz or .tar.bz2), then you’re well aware of the fact that there is a common theme to this process. It usually looks something like this:

$ gzip -cd hackers-delight-1.0.tar.gz | tar -xvf -
...
$ cd hackers-delight-1.0
$ ./configure
$ make all
$ sudo make install

NOTE: I have to assume some level of knowledge on your part, and I’m stating right now that this is it. If you’ve performed this sequence of commands before, and you know what it means, then you’ll have no trouble following these articles.

Most developers know and understand the purpose of the make utility. But what’s this “configure” thing? The use of configuration scripts (often named simply, “configure”) started a long time ago on Unix systems because of the variety imposed by the fast growing and divergent set of Unix platforms. It’s interesting to note that while Unix systems have generally followed the de facto standard Unix kernel interface for decades, most software that does anything significant generally has to stretch outside the boundaries. I call it a de facto standard because POSIX wasn’t actually standardized until recently. POSIX as a standard was more a documentation effort than a standardization effort, although it is a true standard today. It was designed around the existing set of Unix code bases, and for good reason – it takes a long time to incorporate significant changes into a well-established operating system kernel. It was easier to say, “Here’s how it’s currently being done by most.”, than to say, “Here’s how it should be done – everyone change!” Even so, most systems don’t implement all facets of POSIX. So configure scripts are designed to find out what capabilities your system has, and let your Makefiles know about them.

This approach worked well for literally decades. In the last 15 years however, with the advent of dozens of Linux distributions, the explosion of feature permutations has made writing a decent configure script very difficult – much more so than writing the Makefiles for a new project. Most people have generated configure scripts for their projects using a common technique – copy and modify a similar project’s configure script.

Autoconf changed this paradigm almost overnight. A quick glance at the AUTHORS file in the Savannah Autoconf project repository will give you an idea of the number of people that have had a hand in the making of autoconf. The original author was David MacKenzie, who started the autoconf project as far back as 1991. Now, instead of modifying, debugging and losing sleep over literally thousands of lines of supposedly portable shell script, developers can write a short meta-script file, using a concise macro API language, and let autoconf generate the configure script.

A generated configure script is more portable, more correct, and more maintainable than a hand-coded version of the same script. In addition, autoconf often catches semantic or logic errors that the author would have spent days debugging. Before autoconf, it was not uncommon for a project developer to literally spend more time on the configure script for the build environment than on the project code itself!

What’s in a Configure Script?

The primary tasks of a typical configure script are:

Generate an include file (often called config.h) for inclusion by project source code.
Set environment variables so that make can quietly select major build options.
Set user options for a particular make environment – such as debug flags, etc.
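
To make those three tasks concrete, here is a toy hand-written configure script of the sort people wrote before autoconf. It is only a sketch under several assumptions: the DEBUG switch and the config.mk file name are invented for the example, and the header probe merely looks in /usr/include instead of compiling a test program. It also writes the build settings to a small make include file rather than setting environment variables, because a script cannot change the environment of the shell that ran it.

```shell
#!/bin/sh
# A toy, hand-written configure script. This is NOT what autoconf generates.
# Output goes to a scratch directory so the sketch cannot clobber real files.
outdir=$(mktemp -d)

# User options: a hypothetical DEBUG=yes in the environment selects debug flags.
if [ "${DEBUG:-no}" = yes ]; then
    cflags="-g -O0"
else
    cflags="-O2"
fi

# Include file: record the result of a (very naive) feature probe in config.h.
{
    echo "/* config.h: generated by the toy configure script */"
    if [ -f /usr/include/unistd.h ]; then
        echo "#define HAVE_UNISTD_H 1"
    else
        echo "/* #undef HAVE_UNISTD_H */"
    fi
} > "$outdir/config.h"

# Build settings: write them where a Makefile could pick them up.
{
    echo "CC = ${CC:-cc}"
    echo "CFLAGS = $cflags"
} > "$outdir/config.mk"

echo "wrote $outdir/config.h and $outdir/config.mk"
```

Even this tiny script hints at the maintenance problem: every new header, library or option means another hand-written probe, and each probe has to behave on every platform you care about.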

For more complex projects, configure scripts often generated the project Makefile(s) from one or more templates maintained by the project manager. A Makefile template would contain configuration variables in an easily recognized format. The configure script would replace these variables with values determined during configuration – either from command line options specified by the user, or from a thorough analysis of the platform environment. Often this analysis would entail such things as checking for the existence of certain include files and libraries, searching various file system paths for required utilities, and even running small programs designed to indicate the feature set of the shell or C compiler. The tool of choice here for variable replacement was sed. A simple sed command can replace all of the configuration variables in a Makefile template in a single pass through the file.
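
The single-pass replacement described above might look something like the following. Treat it as a sketch: the template contents and the values assigned to cc and cflags are made up for the example, whereas a real configure script would work them out by probing the system.

```shell
#!/bin/sh
# Replace @VAR@ placeholders in a Makefile template with one sed pass.
workdir=$(mktemp -d)

# A two-variable template. The recipe line must start with a tab (\t).
printf '%s\n' \
    'CC = @CC@' \
    'CFLAGS = @CFLAGS@' \
    '' \
    'jupiter: main.c' > "$workdir/Makefile.in"
printf '\t$(CC) $(CFLAGS) main.c -o $@\n' >> "$workdir/Makefile.in"

# Stand-ins for values a real configure script would determine itself.
cc=gcc
cflags="-g -O0"

# One sed invocation, with one -e expression per configuration variable.
sed -e "s|@CC@|$cc|g" \
    -e "s|@CFLAGS@|$cflags|g" \
    "$workdir/Makefile.in" > "$workdir/Makefile"

cat "$workdir/Makefile"
```

I used | as the sed delimiter so that values containing slashes (paths, for instance) don’t need escaping. A value containing | or & would still need care, which is one more detail a hand-written script has to get right.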

Autoconf to the Rescue

Praise to David MacKenzie for having the foresight to – metaphorically speaking – stop and sharpen the axe! Otherwise we’d still be writing (copying) and maintaining long, complex configure scripts today.

The input to autoconf is … (drum roll please) … shell script. Man, what an anti-climax! Okay, so it’s not pure shell script. That is, it’s shell script with macros, plus a bunch of macro definition files – both those that ship with an autoconf distribution, as well as those that you write. The macro language used is called m4. “m-what?!”, you ask? The m4 utility is a general purpose macro language processor that was originally written by none other than Brian Kernighan and Dennis Ritchie in 1977. (The name m4 means “m plus 4 more letters” or the word “macro” – cute, huh?).

Some form of the m4 macro language processor is found on every Unix and Linux variant (as well as other systems) in use today. In fact, this ubiquity is the primary reason for its use in autoconf. The design goals of autoconf included primarily that it should run on all systems without the addition of complex tool chains and utility sets. Autoconf depends on the existence of relatively few tools, including m4, sed and some form of the Bourne shell, as well as many of the standard Unix utilities such as chmod, chown, mkdir, rm, ln and others. Autoconf generates somewhere around 15 thousand lines of portable shell script code that is unrelated to any additional code that you add to its main input file! This overhead is boilerplate functionality that existed in most of the well-designed configure scripts that were written (copied) and maintained in the days before autoconf.

Autoconf in Action

Probably the easiest way to get started with autoconf is to use the autoscan utility to scan your project directory from the root down, and generate the necessary configure.ac script – the primary input file to autoconf. If you’d rather do it manually, you can start with as few as three macro calls, as follows:

# configure: generated from configure.ac by autoconf
AC_INIT([my-project], [1.0])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

echo "Configuration for package ${PACKAGE_NAME}, version ${PACKAGE_VERSION} complete."
echo "Now type 'make' to continue."

In future articles, I’ll build on this initial script by adding additional macros and shell script to solve various problems that I’ve run into in the past. I believe these to be common problems related to build environments, and I expect others will feel the same.

AC_INIT takes the package name and the package version, followed by optional arguments, the first of which is an email address for reporting bugs. m4 allows trailing parameters (and their separating commas) to simply be omitted, as shown in the example. AC_INIT sets some project definitions that are used throughout the rest of the generated configuration script. Two of them may be referenced later in the configure.ac script as the shell variables ${PACKAGE_NAME} and ${PACKAGE_VERSION}, as indicated by the echo statements at the bottom of the script. (The shorter names PACKAGE and VERSION are defined by automake, not by autoconf, so they are empty in an autoconf-only project like this one.)

This example assumes you have a template for your Makefile called Makefile.in in your top-level project directory (next to the configure.ac script). This file should look exactly like your Makefile, with one exception. Any text you want autoconf to replace should be marked with autoconf replacement variables, like this:

# Makefile: generated from Makefile.in by autoconf

PACKAGE = @PACKAGE_NAME@
VERSION = @PACKAGE_VERSION@

all : $(PACKAGE)

$(PACKAGE) : main.c
    echo "Building $(PACKAGE), version $(VERSION)."
    gcc main.c -o $@

In fact, any file you list in AC_CONFIG_FILES (separated by white space) will be generated from a file of the same name with a .in extension, and found in the same directory. Autoconf generates sed commands into the configure script that perform this simple string replacement when the configure script is executed. Sed is a Stream EDitor, which is a fancy way of saying that it doesn’t require an entire source file to be loaded into memory while it’s doing its thing. Rather, it watches a stream of bytes as they go by, replacing text in the stream with other text, as specified on its command line. The expression list passed to sed by the configure script is built by autoconf from a list of variables defined by various autoconf macros, many of which we’ll cover in greater detail later.

Note that these example scripts use three different kinds of variables. Autoconf replacement variables are text surrounded by ‘@’ signs. Shell variables are indicated by normal shell syntax, like this: ${variable}. Make variables are written almost the same way as shell variables, except that parentheses are used instead of curly braces: $(variable). In fact, we set make variables from the text replaced by autoconf at the top of Makefile.in. If you were to look at the contents of the generated Makefile, this is what you’d see:

# Makefile: generated from Makefile.in by autoconf

PACKAGE = my-project
VERSION = 1.0

all : $(PACKAGE)

$(PACKAGE) : main.c
    echo "Building $(PACKAGE), version $(VERSION)."
    gcc main.c -o $@

The important thing to notice here is that the autoconf variables are the ONLY items replaced in Makefile.in while generating the Makefile. The reason this is important to understand is that it helps you to realize the flexibility you have when allowing autoconf to generate a file from a template. This flexibility will become more apparent as we get into various use cases for the pre-defined autoconf macros, and even later when we delve into the topic of writing your own autoconf macros.
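
A quick way to see exactly what configure will touch in a template is to list its @...@ tokens. The snippet below is a small sketch, assuming a grep that supports the -o option (GNU and BSD grep both do). The template it builds is a stand-in that uses the PACKAGE_NAME and PACKAGE_VERSION substitutions defined by AC_INIT:

```shell
#!/bin/sh
# List the distinct @VAR@ placeholders found in a template file.
workdir=$(mktemp -d)

# A stand-in Makefile.in. The recipe line must start with a tab (\t).
printf '%s\n' \
    'PACKAGE = @PACKAGE_NAME@' \
    'VERSION = @PACKAGE_VERSION@' \
    '' \
    'all : $(PACKAGE)' \
    '' \
    '$(PACKAGE) : main.c' > "$workdir/Makefile.in"
printf '\tgcc main.c -o $@\n' >> "$workdir/Makefile.in"

# -o prints only the matched text; sort -u removes duplicates.
placeholders=$(grep -o '@[A-Za-z_][A-Za-z0-9_]*@' "$workdir/Makefile.in" | sort -u)
printf '%s\n' "$placeholders"
```

This prints @PACKAGE_NAME@ and @PACKAGE_VERSION@ and nothing else. The make variables such as $(PACKAGE) and the automatic variable $@ don’t match the pattern, which is the point: they pass through configure untouched and are left for make to interpret.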

Summary

It would be a great learning experience to take an existing project and just apply autoconf to the task of generating your configure script. Forget about the rest of the Autotools right now. Just focus on autoconf. There are actually a fair number of popular open source projects in the world that only use autoconf.

I’ll continue my treatise on autoconf in the next article. In the meantime, please use the following short reading list to put you to sleep at night for the next month or so! You really do need the background information. I’ll try to cover the basics of each of these next time, but you’ll want to be at least familiar with them before then.

Open Sourcery

Technology, Open Source and Identity

Tag Archives: m4
