Autoconf Macros, Exposed (AT/3)

In my last article I mentioned that I’d get more into the standard autoconf macros this time. The easiest way to do this is really to start with a clean project directory and run autoscan on it. However, before we can do that, we should probably spend a little time on project organization.

Project Structure

What follows is a typical project directory hierarchy. Open source projects generally have some sort of catchy name – often they’re named after some past hero or ancient god or something. Let’s call this the jupiter project – mainly because that way I don’t have to come up with a name that matches some non-existent functionality! For jupiter, we’ll start with a directory structure something like this:

$ cd projects
$ mkdir -p jupiter/src
$ touch jupiter/Makefile
$ touch jupiter/src/Makefile
$ touch jupiter/src/main.c
$ cd jupiter
$

Woot! One directory called src, one C source file called main.c, and a Makefile for each of these two directories. Minimal yes, but hey, this is a new project and everyone knows that the key to a successful open source project is evolution, right? Start small and grow as required.

The Makefiles provide support for a few key targets in an open source project: all, clean and dist (to create a source tarball from our project directory structure). The top-level Makefile handles dist, while all and clean are passed down to src/Makefile. Here are the contents of each of the files in our project:

$ cat Makefile
package = jupiter
version = 1.0
distdir = $(package)-$(version)

all clean jupiter:
        $(MAKE) -C src $@

dist: $(distdir).tar.gz

$(distdir).tar.gz: $(distdir)
        tar chof - $(distdir) | gzip -9 -c >$(distdir).tar.gz
        rm -rf $(distdir)

$(distdir):
        -rm -rf $(distdir)
        mkdir -p $(distdir)/src
        cp Makefile $(distdir)
        cp src/Makefile $(distdir)/src
        cp src/main.c $(distdir)/src
$
$ cat src/Makefile
all: jupiter

jupiter: main.c
        gcc -g -O0 main.c -o jupiter

clean:
        -rm jupiter
$
$ cat src/main.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[])
{
        printf("Hello from %s!\n", argv[0]);
        return 0;
}
$

Using Autoscan

Now, as I mentioned up front, the simplest way to create a (mostly) complete configure.ac file is to run the autoscan utility, which is part of the autoconf package. This is so simple – just cd into the jupiter directory, and type “autoscan” at the prompt. In less than a second you’re left with a couple of new files in your project directory:

$ autoscan
autom4te: configure.ac: no such file or directory
autoscan: /usr/bin/autom4te failed with exit status: 1
$ ls -l
total 8
-rw-r--r-- 1 user users   0 2008-02-21 21:30 autoscan.log
-rw-r--r-- 1 user users 563 2008-02-21 21:30 configure.scan
-rw-r--r-- 1 user users 377 2008-02-21 20:50 Makefile
drwxr-xr-x 2 user users  96 2008-02-21 21:17 src
$

Okay, so what about that error message? While it is autoscan’s job to generate a configure.ac file, it sort of expects that you’ve taken a little initiative on your own and hand created something for it to analyze. Its actual job is to tell you what’s wrong with your existing configure.ac. Thus, if you run autoscan on a project directory that doesn’t contain an existing configure.ac, you’ll get the above warning, and autoscan will exit with an error. Regardless, it still creates a configure.scan file that works just fine – in fact, there’s really no difference between the file generated with or without an existing configure.ac file. However, it’s instructional to see what does happen when you run autoscan in a directory with an existing configure.ac – say an empty file of that name. We’ll clean up, add an empty configure.ac and run it again:

$ rm configure.scan autoscan.log
$ touch configure.ac
$ autoscan
configure.ac: warning: missing AC_CHECK_HEADERS([stdlib.h]) wanted by: src/main.c:2
configure.ac: warning: missing AC_HEADER_STDC wanted by: src/main.c:2
configure.ac: warning: missing AC_PREREQ wanted by: autoscan
configure.ac: warning: missing AC_PROG_CC wanted by: src/main.c
$ ls -l
total 12
drwxr-xr-x 2 user users 120 2008-02-21 21:35 autom4te.cache
-rw-r--r-- 1 user users 287 2008-02-21 21:35 autoscan.log
-rw-r--r-- 1 user users   0 2008-02-21 21:35 configure.ac
-rw-r--r-- 1 user users 563 2008-02-21 21:35 configure.scan
-rw-r--r-- 1 user users 377 2008-02-21 20:50 Makefile
drwxr-xr-x 2 user users  96 2008-02-21 21:17 src
$

First, notice that there’s an autom4te.cache directory that wasn’t there when we ran autoscan without a configure.ac. The reason for this is that autoscan acutally runs autoconf on the existing configure.ac with options to warn us when things are incorrect. This causes the cache directory to be created. But, enough playing around – let’s take a look at the generated configure.scan file. Since we don’t have an existing configure.ac (of any value), let’s rename the generated file right away:

$ mv configure.scan configure.ac
$ cat configure.ac
#                                               -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_PREREQ(2.59)
AC_INIT(FULL-PACKAGE-NAME, VERSION, BUG-REPORT-ADDRESS)
AC_CONFIG_SRCDIR([src/main.c])
AC_CONFIG_HEADER([config.h])

# Checks for programs.
AC_PROG_CC

# Checks for libraries.

# Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS([stdlib.h])

# Checks for typedefs, structures, and compiler characteristics.

# Checks for library functions.

AC_CONFIG_FILES([Makefile
                 src/Makefile])
AC_OUTPUT
$

NOTE: The contents of your configure.ac file may differ slightly from mine, depending on the version of autoconf you have installed. I have version 2.59 installed, but if your version of autoscan is newer or older, you may see some slight differences.

autoscan really does a lot of the work for you. The GNU autoconf manual states that you should manually tailor this file to your project before using it. This is true, but there are only a few key issues. We’ll cover each of these as we come to them. Let’s start at the top of the file and work our way down.

Initialization and Package Information

The AC_PREREQ macro simply defines the lowest version of autoconf that may be used to successfully process this configure.ac script. The manual indicates that AC_PREREQ is the only macro that may be used before AC_INIT. The reason for this should be obvious – you’d like to be able to ensure you’re using a late enough version of autoconf before you begin processing any other macros, which may be version dependent. As it turns out, AC_INIT is not version dependent anyway, so you may place it first, if you’re so inclined.

The AC_INIT macro (as its name implies) initializes the autoconf system. It accepts up to three arguments, PACKAGE, VERSION, and an optional BUG-REPORT argument. The PACKAGE argument is intended to be the name of your package. It will end up as the first string in an automake-generated tarball when you run “make dist”. In fact, distribution tarballs will be named PACKAGE-VERSION.tar.gz, so bear this in mind when you choose your package name and version string.

m4 macro arguments, including VERSION, are just strings. m4 doesn’t attempt to interpret any of the text it processes, although it does have a built-in eval macro that will evaluate an expression and resolve it to a new string (eg., eval(2*2) becomes the string “4”). While the VERSION argument can be anything you like, there are a few conventions that will make life easier for you if you follow them. The widely used convention is to pass in MAJOR.MINOR (eg., 1.2). However, there’s nothing that says you can’t use MAJOR.MINOR.REVISION if you want, and it doesn’t do any harm to do this. The none of the resulting VERSION macros (autoconf, shell or make) are parsed or analysed anywhere, but only used in various places as replacement text, so if you want, you can even add non-numeric text into this macro, such as ‘0.15-alpha’.

One thing to note, however is that the AC_INIT arguments must be static text. That is, they can’t be shell variables, and autoconf will flag attempts to use shell variables in these arguments as errors. I once tried to use a shell variable in the VERSION argument so that I could substitute my Subversion revision number into the VERSION argument at configure time. I spent a couple of weeks trying to figure out how to trick autoconf into letting me use a shell variable as the REVISION field. Eventually, I discovered the following trick, which I implemented in my top-level Makefile.am:

distdir = $(PACKAGE)-$(VERSION).$(SVNREV)

The distdir make variable controls the name of the distribution directory and tarball file name generated by automake, and setting it in the top-level Makefile causes it to propagate down to all lower level Makefiles, as well. But I digress; this discussion is getting too deep into automake, so we’ll return to it in a later article specifically on automake.

The AC_CONFIG_SRCDIR macro is just a sanity check. Its purpose is to ensure that autoconf knows that the directory on which autoconf is being executed is in fact the correct project directory. The argument can be a relative path to any source file you like – I try to pick one that sort of defines the project – in case I ever decide to reorganize source code, I’m not likely to lose it, but it doesn’t really matter. If you do lose it, you can always change the argument passed to AC_CONFIG_SRCDIR.

The AC_CONFIG_HEADER macro is a little more functional. Its job is to specify the NAME of an include file that will be generated by autoconf from a template file called NAME.in (which is itself generated by autoheader – more on autoheader later). This template file contains C source code in the following format:

/* config.h.  Generated by configure.  */
/* config.h.in.  Generated from configure.ac by autoheader.  */

/* The build architecture of this build. */
#define BUILDARCH "i586"

/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1

...

/* Define to 1 if your <sys/time.h> declares `struct tm'. */
/* #undef TM_IN_SYS_TIME */

/* Version number of package */
#define VERSION "1.0"

This file is intended to be included in your source code in locations where you might wish to test a configured option in the code itself using the C preprocessor. For instance, in the sample config.h file above, it appears that autoconf has determined that we have the dlfcn.h header file on this system, so we might add the following code to a source file in our project that uses dynamic loader functionality:

...

#if HAVE_CONFIG_H
# include <config.h>
#endif

#if HAVE_DLFCN_H
# include <dlfcn.h>
#else
# error Sorry, this code requires dynamic loader (dlfcn.h) functionality.
#endif

...

#if HAVE_DLFCN_H
   handle = dlopen("/usr/lib/libwhatever.so", RTLD_NOW);
#endif

...

We may be able to get along at compile time without the dynamic loader functionality if we need to, but it would be nice to have it. Perhaps, your project will function in a limited manner without it. Sometimes you just have to bail out with a compiler error (as this code does) if the key functionality is missing. Often this is an acceptable first-attempt solution, until someone comes along and adds support to the code base for some other dynamic loader service that is perhaps available on non-dlfcn-oriented systems.

One important point here is that config.h is only included if HAVE_CONFIG_H is defined in your compilation environment. But doesn’t that definition happen in config.h?! The short answer is no. HAVE_CONFIG_H must be either defined by you on your compiler command line, or automatically defined on the compiler command line by automake-generated Makefiles. (Are you beginning to get the feeling that autoconf really shines when used in conjunction with automake?)

Checks for Programs

The AC_PROG_CC macro ensures that you have a working C language compiler. This call was added to configure.scan when autoscan noticed that I had C source files in my project directory. If I’d had files suffixed with .cxx or .C, it would have inserted a call to the AC_PROG_CXX macro, as well.

Other important programs you might need to check for are lex and yacc, sed or awk, etc. If so, you can add calls to AC_PROG_LEX, AC_PROG_YACC, AC_PROG_SED, or AC_PROG_AWK yourself. There are about a dozen different programs you can check for using these specialized macros.

If you need to check for the existence of a program not covered by these more specialized macros, you can call the generic AC_CHECK_PROG macro, or you can write your own special purpose macro – we’ll cover writing macros later. Now let me highlight a common problem with autoconf. Take a look at the formal definition of AC_CHECK_PROG found in the autoconf manual:

AC_CHECK_PROG(variable, prog-to-check-for, value-if-found, [value-if-not-found], [path], [reject])

Check whether program prog-to-check-for exists in PATH. If it is found, set variable to value-if-found, otherwise to value-if-not-found, if given. Always pass over reject (an absolute file name) even if it is the first found in the search path; in that case, set variable using the absolute file name of the prog-to-check-for found that is not reject. If variable was already set, do nothing. Calls AC_SUBST for variable.

I can extract the following clearly defined functionality from this description:

  1. If prog-to-check-for is found in the system search path, then variable is set to value-if-found, otherwise it’s set to value-if-not-found.
  2. If reject is specified (as a full path), then skip it if it’s found first, and continue to the next matching program in the system search path.
  3. If reject is found first in the path, and then another match is found besides reject, set variable to the absolute path name of the second (non-reject) match.
  4. If variable is already set by the user in the environment, then variable is left untouched (thereby allowing the user to override the check by setting variable before running autoconf).
  5. AC_SUBST is called on variable to make it an autoconf substitution variable.

At first read, there appears to be a terrible conflict of interest here: We can see in point 1 that variable will be set to one or the other of two specified values, based on whether or not prog-to-check-for is found in the system search path. But then in point 3 it states that variable will be set to the full path of some program, but only if reject is found first and skipped. Clearly the documentation needs a little work.

Discovering the real functionality of AC_CHECK_PROG is as easy as reading a little shell script. While you could spend your time looking at the definition of AC_CHECK_PROG in /usr/share/autoconf/autoconf/programs.m4, the problem with this approach is that you’re one level removed from the actual shell code performing the check. Wouldn’t it be better to just look at the resulting shell script generated by AC_CHECK_PROG? Okay, then modify your new configure.ac file in this manner (the changes are highlighted):

...
AC_PREREQ(2.59)
AC_INIT([jupiter], [1.0], [jupiter-devel@lists.example.com])
AC_CONFIG_SRCDIR([src/main.c])
AC_CONFIG_HEADER([config.h])

# Checks for programs.
AC_PROG_CC
AC_CHECK_PROG([bash_var], [bash], [yes], [no], [$PATH], [/usr/sbin/bash])
...

Now just execute autoconf and then open the resulting configure script and search for something specific to the definition of AC_CHECK_PROG. I used the string “ac_cv_prog_bash_var”, a shell variable generated by the macro call. You may have to glance at the definition of a macro to find reasonable search text:

$ autoconf
$ vi -c /ac_cv_prog_bash_var configure
...
# Extract the first word of "bash", so it can be a program name with args.
set dummy bash; ac_word=$2
echo "$as_me:$LINENO: checking for $ac_word" >&5
echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6
if test "${ac_cv_prog_bash_var+set}" = set; then
  echo $ECHO_N "(cached) $ECHO_C" >&6
else
  if test -n "$bash_var"; then
  ac_cv_prog_bash_var="$bash_var" # Let the user override the test.
else
  ac_prog_rejected=no
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
  IFS=$as_save_IFS
  test -z "$as_dir" && as_dir=.
  for ac_exec_ext in '' $ac_executable_extensions; do
  if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
    if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/sbin/bash"; then
       ac_prog_rejected=yes
       continue
     fi
    ac_cv_prog_bash_var="yes"
    echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5
    break 2
  fi
done
done

if test $ac_prog_rejected = yes; then
  # We found a bogon in the path, so make sure we never use it.
  set dummy $ac_cv_prog_bash_var
  shift
  if test $# != 0; then
    # We chose a different compiler from the bogus one.
    # However, it has the same basename, so the bogon will be chosen
    # first if we set bash_var to just the basename; use the full file name.
    shift
    ac_cv_prog_bash_var="$as_dir/$ac_word${1+' '}$@"
  fi
fi
  test -z "$ac_cv_prog_bash_var" && ac_cv_prog_bash_var="no"
fi
fi
bash_var=$ac_cv_prog_bash_var
if test -n "$bash_var"; then
  echo "$as_me:$LINENO: result: $bash_var" >&5
echo "${ECHO_T}$bash_var" >&6
else
  echo "$as_me:$LINENO: result: no" >&5
echo "${ECHO_T}no" >&6
fi
...

If you did happen to look at the definition of AC_CHECK_PROG, your first thought might have been, “Don’t these people know how to indent?!”. Remember the m4 white space rules: Leading white space is stripped off, but trailing white space is part of the argument value. This rule pretty much defines your indentation style when writing m4 macros. A glance at the resulting shell code shows that the authors of AC_CHECK_PROG really understand this concept. The macro definition may look terrible, but the generated code isn’t too bad.

Wow! We immediately see by the opening comment that AC_CHECK_PROG has some undocumented functionality: You can pass in arguments with the program name if you wish. But why would you want to? Well, let’s look farther. We can probably fairly accurately deduce that the reject parameter was added into the mix in order to allow your configure script to search for a particular version of a tool. (Could it possibly be that someone might really rather use the GNU C compiler instead of the Solaris C compiler?)

In fact, it appears that variable really is set based on a tri-state condition. If reject is not used, then variable can only be either value-if-found or value-if-not-found. But if reject is used, then variable can also be the full path of the first program found that is not reject! Well, that is exactly what the documentation stated, but examining the generated code yields insight into the authors’ intended use of this macro. We probably should have called AC_CHECK_PROG this way, instead:

AC_CHECK_PROG([bash_shell], [bash -x], [bash -x],,, [/usr/sbin/bash])

Now it makes more sense, and we can see by this example that the manual is in fact accurate, if not clear. If reject is not specified, and bash is found in the system path, then bash_shell will be set to bash -x. If it’s not found in the system path, then bash_shell will be set to the empty string. If, on the other hand, reject is specified, and the undesired version of bash is found first in the path, then bash_shell will be set to the full path of the next version found in the path, along with the originally specified arguments (-x). The bash_shell variable may now be used by the rest of our script to run the desired bash shell, if it doesn’t test out as empty. Wow! No wonder it was hard to document in a way that’s easy to understand! But quite frankly, a good example of the intended use of this macro, along with a couple of sentences of explanation would have made all the difference.

Does (Project) Size Matter?

An issue that might have occurred to you by now is the size of my toy project. I mean, c’mon! One source file?! But, I’ve used autoscan to generate configure scripts for projects with several hundred C++ source files, and some pretty complex build steps. It takes a few seconds longer to run autoscan on a project of this size, but it works just as well. For a basic build, the generated configure script only needed to be touched up a bit – project name, version, etc. To add in compiler optimization options for multiple target tool sets, it took a bit more work. I’ll cover these sorts of issues in another article.

Summary

Well, that was a lot, but it was fairly important that we cover these basics in one shot. Even so, we only made it to the half-way point in our new configure.ac file. I’ll cover the rest of it in the next article. Meanwhile, spend some time generating configure.ac files using autoscan against some of your projects. And listen, don’t get discouraged by all this configuration functionality. I’m guessing that after reading this article, you’re swimming in configuration soup so thick you’re wondering if it’s all worth it! Don’t worry, most of it is optional and is intended to be added incrementally as needed.

For instance, assume you are writing your program to run on Red Hat Linux – say Fedora 6 – and you’ve used a very basic configure.ac file to manage your configuration – perhaps the very one generated by autoscan, with the few necessary required modifications. You may not even be including config.h yet in any of your source files. That’s okay. But then later, a collegue decides he wants to run your code on Solaris. He tries to build your tarball and notices that some portions of your code are not tailored to Solaris very well – a structure field is different, or a library is missing. No problem, you tell him – you can just add certain checks to your configure.ac, and perhaps add config.h to a problematic source file in order to allow your code to determine at compile time the special cases needed for a Solaris build. This is how 99 percent of all autoconf-based project configuration scripts get to where they are today – incrementally.

Advertisements

2 comments on “Autoconf Macros, Exposed (AT/3)

  1. Rexanneqy says:

    well done, brother

Comments are closed.