.\" .\" aegis - project change supervisor .\" Copyright (C) 1997, 1998, 2001-2003, 2005-2008, 2010, 2012 Peter Miller .\" .\" This program is free software; you can redistribute it and/or modify .\" it under the terms of the GNU General Public License as published by .\" the Free Software Foundation; either version 3 of the License, or .\" (at your option) any later version. .\" .\" This program is distributed in the hope that it will be useful, .\" but WITHOUT ANY WARRANTY; without even the implied warranty of .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU .\" General Public License for more details. .\" .\" You should have received a copy of the GNU General Public License .\" along with this program. If not, see . .\" .nh 1 "Efficient Makefiles" The central theme of this paper is the .I semantic side\[hy]effects of artificially separating a \f(CWMakefile\fP into the pieces necessary to perform a recursive \fImake\fP. However, once you have a large number of \f(CWMakefile\fPs, the speed at which .I make can interpret this multitude of files also becomes an issue. .LP Builds can take \[lq]forever\[rq] for both these reasons: the traditional fixes for the separated DAG may be building too much .I and your \f(CWMakefile\fP may be inefficient. .nh 2 "Deferred Evaluation" The text in a \f(CWMakefile\fP must somehow be read from a text file and understood by .I make so that the DAG can be constructed, and the specified actions attached to the edges. This is all kept in memory. .LP The input language for \f(CWMakefile\fPs is deceptively simple. A crucial distinction that often escapes both novices and experts alike is that \fImake\fP's input language is .I "text based," as opposed to token based, as is the case for C or AWK. .I Make does the very least possible to process input lines and stash them away in memory. .LP As an example of this, consider the following assignment: .TS box,center; lf(CW). OBJ = main.o parse.o .TE Humans read this as the variable OBJ being assigned two filenames \[lq]main.o\[rq] and \[lq]parse.o\[rq]. But .I make does not see it that way. Instead OBJ is assigned the \fIstring\fP \[lq]main.o parse.o\[rq]. It gets worse: .TS box,center; lw(2.5i)f(CW). SRC = main.c parse.c OBJ = $(SRC:.c=.o) .TE In this case humans expect .I make to assign two filenames to OBJ, but .I make actually assigns the string \[lq]$(SRC:.c=.o)\[rq]. This is because it is a .I macro language with deferred evaluation, as opposed to one with variables and immediate evaluation. .LP If this does not seem too problematic, consider the following \f(CWMakefile\fP: .TS box,center; lw(2.5i)f(CW). SRC = $(shell echo 'Ouch!' \e 1>&2 ; echo *.[cy]) OBJ = \e $(patsubst %.c,%.o,\e $(filter %.c,$(SRC))) \e $(patsubst %.y,%.o,\e $(filter %.y,$(SRC))) .sp 0.2 test: $(OBJ) $(CC) \-o $@ $(OBJ) .TE How many times will the shell command be executed? .B "Ouch!" It will be executed .I twice just to construct the DAG, and a further .I two times if the rule needs to be executed. .LP If this shell command does anything complex or time consuming (and it usually does) it will take .I four times longer than you thought. .LP But it is worth looking at the other portions of that OBJ macro. Each time it is named, a huge amount of processing is performed: .IP \(bu 2n The argument to \fIshell\fP is a single string (all built\[hy]in\[hy]functions take a single string argument). The string is executed in a sub\[hy]shell, and the standard output of this command is read back in, translating newlines into spaces. The result is a single string. .IP \(bu 2n The argument to \fIfilter\fP is a single string. This argument is broken into two strings at the first comma. These two strings are then each broken into sub\[hy]strings separated by spaces. The first set are the patterns, the second set are the filenames. Then, for each of the pattern sub\[hy]strings, if a filename sub\[hy]string matches it, that filename is included in the output. Once all of the output has been found, it is re\[hy]assembled into a single space\[hy]separated string. .IP \(bu 2n The argument to \fIpatsubst\fP is a single string. This argument is broken into three strings at the first and second commas. The third string is then broken into sub\[hy]strings separated by spaces, these are the filenames. Then, for each of the filenames which match the first string it is substituted according to the second string. If a filename does not match, it is passed through unchanged. Once all of the output has been generated, it is re\[hy]assembled into a single space\[hy]separated string. .LP Notice how many times those strings are disassembled and re\[hy]assembled. Notice how many ways that happens. .I "This is slow." The example here names just two files but consider how inefficient this would be for 1000 files. Doing it \fIfour\fP times becomes decidedly inefficient. .LP If you are using a dumb \fImake\fP that has no substitutions and no built\[hy]in functions, this cannot bite you. But a modern \fImake\fP has lots of built\[hy]in functions and can even invoke shell commands on\[hy]the\[hy]fly. The semantics of \fImake\fP's text manipulation is such that string manipulation in \fImake\fP is very CPU intensive, compared to performing the same string manipulations in C or AWK. .nh 2 "Immediate Evaluation" Modern .I make implementations have an immediate evaluation \[lq]\f(CW:=\fP\[rq] assignment operator. The above example can be re\[hy]written as .TS box,center; lw(2.5i)f(CW). .\" --------------------------| SRC := $(shell echo 'Ouch!' \e 1>&2 ; echo *.[cy]) OBJ := \e $(patsubst %.c,%.o,\e $(filter %.c,$(SRC))) \e $(patsubst %.y,%.o,\e $(filter %.y,$(SRC))) .sp 0.2 test: $(OBJ) $(CC) \-o $@ $(OBJ) .TE Note that \fIboth\fP assignments are immediate evaluation assignments. If the first were not, the shell command would always be executed twice. If the second were not, the expensive substitutions would be performed at least twice and possibly four times. .LP As a rule of thumb: always use immediate evaluation assignment unless you knowingly want deferred evaluation. .nh 2 "Include Files" Many \f(CWMakefile\fPs perform the same text processing (the filters above, for example) for every single \fImake\fP run, but the results of the processing rarely change. Wherever practical, it is more efficient to record the results of the text processing into a file, and have the \f(CWMakefile\fP include this file. .nh 2 "Dependencies" Don't be miserly with include files. They are relatively inexpensive to read, compared to \f[CW]$(shell)\fP, so more rather than less doesn't greatly affect efficiency. .LP As an example of this, it is first necessary to describe a useful feature of GNU Make: once a \f(CWMakefile\fP has been read in, if any of its included files were out\[hy]of\[hy]date (or do not yet exist), they are re\[hy]built, and then .I make starts again, which has the result that .I make is now working with up\[hy]to\[hy]date include files. This feature can be exploited to obtain automatic include file dependency tracking for C sources. The obvious way to implement it, however, has a subtle flaw. .TS box,center; lw(2.5i)f(CW). SRC := $(wildcard *.c) OBJ := $(SRC:.c=.o) .sp 0.2 test: $(OBJ) $(CC) \-o $@ $(OBJ) .sp 0.2 include dependencies .sp 0.2 dependencies: $(SRC) depend.sh $(CFLAGS) \e $(SRC) > $@ .TE The \f(CWdepend.sh\fP script prints lines of the form .QP \fIfile\fP\f(CW.o: \fP\fIfile\fP\f(CW.c\fP \fIinclude\fP\f(CW.h\fP ... .LP The most simple implementation of this is to use .I GCC, but you will need an equivalent awk script or C program if you have a different compiler: .TS box,center; lw(2.5i)f(CW). #!/bin/sh gcc \-MM \-MG "$@" .TE This implementation of tracking C include dependencies has several serious flaws, but the one most commonly discovered is that the \f(CWdependencies\fP file does not, itself, depend on the C include files. That is, it is not re\[hy]built if one of the include files changes. There is no edge in the DAG joining the \f(CWdependencies\fP vertex to any of the include file vertices. If an include file changes to include another file (a nested include), the dependencies will not be recalculated, and potentially the C file will not be recompiled, and thus the program will not be re\[hy]built correctly. .LP A classic build\[hy]too\[hy]little problem, caused by giving \fImake\fP inadequate information, and thus causing it to build an inadequate DAG and reach the wrong conclusion. .LP The traditional solution is to build too much: .TS box,center; lw(2.5i)f(CW). SRC := $(wildcard *.c) OBJ := $(SRC:.c=.o) .sp 0.2 test: $(OBJ) $(CC) \-o $@ $(OBJ) .sp 0.2 include dependencies .sp 0.2 \&.PHONY: dependencies .sp 0.2 dependencies: $(SRC) depend.sh $(CFLAGS) \e $(SRC) > $@ .TE Now, even if the project is completely up\[hy]do\[hy]date, the dependencies will be re\[hy]built. For a large project, this is very wasteful, and can be a major contributor to \fImake\fP taking \[lq]forever\[rq] to work out that nothing needs to be done. .LP There is a second problem, and that is that if any \fIone\fP of the C files changes, \fIall\fP of the C files will be re\[hy]scanned for include dependencies. This is as inefficient as having a \f(CWMakefile\fP which reads .TS box,center; lw(2.5i)f(CW). prog: $(SRC) $(CC) \-o $@ $(SRC) .TE What is needed, in exact analogy to the C case, is to have an intermediate form. This is usually given a \[lq]\f(CW.d\fP\[rq] suffix. By exploiting the fact that more than one file may be named in an include line, there is no need to \[lq]link\[rq] all of the \[lq]\f(CW.d\fP\[rq] files together: .TS box,center; lw(2.5i)f(CW). SRC := $(wildcard *.c) OBJ := $(SRC:.c=.o) .sp 0.2 test: $(OBJ) $(CC) \-o $@ $(OBJ) .sp 0.2 include $(OBJ:.o=.d) .sp 0.2 %.d: %.c depend.sh $(CFLAGS) \e $*.c > $@ .TE .LP This has one more thing to fix: just as the object (\f(CW.o\fP) files depend on the source files and the include files, so do the dependency (\f(CW.d\fP) files. .QP \fIfile\fP\f(CW.d \fIfile\fP\f(CW.o: \fP\fIfile\fP\f(CW.c\fP \fIinclude\fP\f(CW.h\fP .LP This means tinkering with the \f(CWdepend.sh\fP script again: .TS box,center; lw(3i)f(CW). #!/bin/sh gcc \-MM \-MG "$@" | sed \-e 's@^\e(.*\e)\e.o:@\e1.d \e1.o:@' .TE .LP This method of determining include file dependencies results in the \f(CWMakefile\fP including more files than the original method, but opening files is less expensive than rebuilding all of the dependencies every time. Typically a developer will edit one or two files before re\[hy]building; this method will rebuild the \fIexact\fP dependency file affected (or more than one, if you edited an include file). On balance, this will use less CPU, and less time. .LP In the case of a build where nothing needs to be done, .I make will actually do nothing, and will work this out very quickly. .LP However, the above technique assumes your project fits enitrely within the one directory. For large projects, this usually isn't the case. This means tinkering with the \f(CWdepend.sh\fP script again: .TS box,center; lw(2.5i)f(CW). #!/bin/sh DIR="$1" shift 1 case "$DIR" in "" | ".") gcc \-MM \-MG "$@" | sed \-e 's@^\e(.*\e)\e.o:@\e1.d \e1.o:@' ;; *) gcc \-MM \-MG "$@" | sed \-e "s@^\e(.*\e)\e.o:@$DIR/\e1.d \e $DIR/\e1.o:@" ;; esac .TE And the rule needs to change, too, to pass the directory as the first argument, as the script expects. .TS box,center; lw(2.5i)f(CW). %.d: %.c depend.sh `dirname $*` \e $(CFLAGS) $*.c > $@ .TE Note that the \f[CW].d\fP files will be relative to the top level directory. Writing them so that they can be used from any level is possible, but beyond the scope of this paper. .nh 2 "Multiplier" All of the inefficiencies described in this section compound together. If you do 100 \f(CWMakefile\fP interpretations, once for each module, checking 1000 source files can take a very long time \- if the interpretation requires complex processing or performs unnecessary work, or both. A whole project \fImake\fP, on the other hand, only needs to interpret a single \f(CWMakefile\fP. .\" vim: set ts=8 sw=4 et :