Copyright © 2015-2022,2023 by Thomas E. Dickey

Lint Tools – checking C/C++ programs

As a software developer, I use many tools. Here are some comments on the analysis tools which I have encountered.


Background

When I began programming, there were no static (or dynamic) analyzers, no profilers, no debuggers. There was only a compiler (initially in two parts: a precompiler and a compiler). In lieu of a debugger, the IBM 1620 conveniently responded to use of an uninitialized variable by going into an infinite loop. There were no compiler warnings (only errors). This was in fortran, of course.

A little later, as a graduate student, I used the watfiv (fortran) compiler, which had better diagnostics (many more error messages, still no warnings). There were other languages than fortran; I encountered a computer science student who said he was programming in SAIL, and that because of this, his program was automatically well-structured and well-formatted.

Perhaps. I encountered people in 1990 who said the same about Ada.

But my initial exposure to compilers was that they produced error messages, rather than warnings. The first case where I recall discussing compiler warnings was a few years later, regarding Univac's “ASCII FORTRAN” (Fortran 77). I showed Henry Bowlden a program where I complained that the compiler had not treated a goto into an if-then-else construct as an error—or at least warned about it. By then, compilers had evolved to the point of producing warnings as well as errors.

Later languages and tools introduced type-checking. Besides making the resulting programs more reliable, it made the languages easier to learn. Here are a couple of examples from the early 1980s:

Things went along in this way for some time, with newer compilers providing better (more specific) diagnostics.

In contrast to compiler diagnostics (static analysis), the means for testing programs lagged:

Static Tools

Instrumenting programs to make them testable or to measure their performance takes time, and requires running the programs in controlled conditions. But compilers give warnings for free, with the same result every time.

There is no sure-fire technique for improving programs by static analysis. Rather, there are a variety of techniques which can be learned, based on various tools. All of these evolve over time.

lint

I started using lint in 1986, continuing into the late 1990s. Initially it “just worked”: running lint became the first thing I would do when new changes broke a program, and it would usually find the problem.

By itself, lint could only go so far. It came with lint libraries telling the program about the interface of the C runtime (before C was standardized). But if I used a C library not known to lint, it was less useful.

I learned to make lint libraries. Initially (in 1991 or 1992) I did these by hand, converting header files (with comments for the parameter names) into the quasi-prototype form used by lint. The header files for X libraries were large, taking a lot of time.
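To illustrate the quasi-prototype form: a lint library source is ordinary C whose first comment marks it as a library, with stub definitions carrying the full parameter- and return-types. The function named here is a real curses call, but the stub body is my sketch, not generated output:

/* LINTLIBRARY */
#include <curses.h>

/* the definition tells lint the exact types; the body is never run */
/* ARGSUSED */
int
waddch(WINDOW *win, chtype ch)
{
    return (0);
}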

I noticed cproto early in 1993, and sent Chin Huang my changes to allow it to generate lint library sources. Having those, I could compile the sources and get usable lint libraries. This set of changes appeared in June 1993. I made further changes (including making the enhanced error reporting which works with the different types of yacc), and worked with Chin Huang off and on for the next few years.

I used this feature in ncurses (June 1996) to generate lint library sources for the four component libraries (ncurses, form, menu and panel). However, Solaris (unlike SunOS) had poor support for lint: Sun delivered stubs for the lint libraries, and its tools were not capable of producing usable ones. I used other platforms for lint, as long as those were available. Judging by my email, by around 2000 lint was gone.

Even after the tool itself was no longer useful, I kept the lint library text in ncurses because it is useful as a documentation aid. Updating the files was not completely automatic, e.g., changing attr_t to int to match the prototypes for the legacy attribute manipulation, and of course adding the copyright notice. Finally, in 2015 I wrote a script make-ncurses-llibs to generate the sources without requiring manual editing, and used that in preparing the ncurses6 release.

gcc

As an alternative to lint, gcc has some advantages and some disadvantages. The main disadvantage is that it has no way to detect that a function is not used in any of the various modules that comprise a program.

While I preferred lint, I found the compiler warnings from gcc useful. For instance, in a posting to comp.os.linux.misc in September 1995, I remarked:

From dickey Wed Sep  6 07:50:15 1995
Subject: Re: Lint for Linux?
Newsgroups: comp.os.linux.misc
References: <42ia4f$9k2@sifon.cc.mcgill.ca> <42iir4$81a@solaria.cc.gatech.edu>
Organization: Clark Internet Services, Inc., Ellicott City, MD USA
Distribution:
Lines: 28
X-Newsreader: TIN [UNIX 1.3 950824BETA PL0]

Byron A Jeff (byron@cc.gatech.edu) wrote:
: In article <42ia4f$9k2@sifon.cc.mcgill.ca>,
: Marc BRANCHAUD <marcnarc@cs.mcgill.ca> wrote:
: >
: >Hi, all...
: >
: >I'm trying to find a copy of lint to use with Linux.  I've scoured
: >sunsite and a few other places, and I can't find one at all.  Does it
: >even exist?
:
: I'll give the standard party line:
:
: gcc -Wall
:
: will give you about the same level of warning messages as lint.
Actually, no. There's a number of checks that lint does that gcc doesn't
(read the gcc internal to-do list stuff).

When I don't have lint, I use

gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion

(though -Wconversion has been butchered in the last release, so it's
really only useful with gcc 2.5.8)

--
Thomas E. Dickey
dickey@clark.net

I used those options in a wrapper script, gcc-normal. I also used an improved set of options in a wrapper script, gcc-strict, from a posting by Fergus Henderson. I created both of these scripts in April 1995. The former has grown a little since then, to allow it to ignore warnings from OSX and Solaris header files. Originally it was just this:

#!/bin/sh
# these are my normal development-options
OPTS="-Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion"
gcc $OPTS "$@"

With a configure script, those scripts can override the default compiler, e.g.,

CC=gcc-normal ./configure
./configure CC=gcc-normal

There were other compilers, of course. The best warnings came from DEC's compiler for OSF/1 (later Tru64). On other platforms, lint was where checking occurred, so the compilers were neglected.

However, gcc was available for most of the platforms on which I was developing.

A wrapper script worked well enough, except when defining quoted values on the command-line. To work around that, I incorporated a --with-warnings or --enable-warnings option into my configure scripts. Until August 1997, I maintained those within the separate programs' aclocal.m4 files, before combining them into an archive using acsplit and acmerge. Initially I just added checks for the available warnings, and later (in 2002) added macros to the archive for the options themselves.

Besides enabling warnings, my checks also tested for gcc support of features modeled on lint. You may have seen this chunk in curses.h:

/*
 * GCC (and some other compilers) define '__attribute__'; we're using this
 * macro to alert the compiler to flag inconsistencies in printf/scanf-like
 * function calls.  Just in case '__attribute__' isn't defined, make a dummy.
 * Old versions of G++ do not accept it anyway, at least not consistently with
 * GCC.
 */
#if !(defined(__GNUC__) || defined(__GNUG__) || defined(__attribute__))
#define __attribute__(p) /* nothing */
#endif
 
/*
 * We cannot define these in ncurses_cfg.h, since they require parameters to be
 * passed (that is non-portable).  If you happen to be using gcc with warnings
 * enabled, define
 *      GCC_PRINTF
 *      GCC_SCANF
 * to improve checking of calls to printw(), etc.
 */
#ifndef GCC_PRINTFLIKE
#if defined(GCC_PRINTF) && !defined(printf)
#define GCC_PRINTFLIKE(fmt,var) __attribute__((format(printf,fmt,var)))
#else
#define GCC_PRINTFLIKE(fmt,var) /*nothing*/
#endif
#endif
 
#ifndef GCC_SCANFLIKE
#if defined(GCC_SCANF) && !defined(scanf)
#define GCC_SCANFLIKE(fmt,var)  __attribute__((format(scanf,fmt,var)))
#else
#define GCC_SCANFLIKE(fmt,var)  /*nothing*/
#endif
#endif
 
#ifndef GCC_NORETURN
#define GCC_NORETURN /* nothing */
#endif
 
#ifndef GCC_UNUSED
#define GCC_UNUSED /* nothing */
#endif
 

I added that to ncurses in July 1996 to get checking comparable to these lint features:

/* PRINTFLIKEn */
        makes lint check the first (n-1) arguments as usual.  The n-th
        argument is interpreted as a printf(3) format string that is used
        to check the remaining arguments.

/* SCANFLIKEn */
        makes lint check the first (n-1) arguments as usual.  The n-th
        argument is interpreted as a scanf(3) format string that is used
        to check the remaining arguments.

/* NOTREACHED */
        At appropriate points, inhibit complaints about unreachable code.
        (This comment is typically placed just after calls to functions
        like exit(3)).

/* ARGSUSEDn */
        Makes lint check only the first n arguments for usage; a missing
        n is taken to be 0 (this option acts like the -v option for the
        next function).
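In ncurses, those macros then decorate the function declarations. Simplified from curses.h (the declaration there differs slightly):

/* check calls to printw() as if it were printf(): the format string is
 * argument 1, and the checked variable arguments begin at argument 2 */
extern int printw(const char *fmt, ...) GCC_PRINTFLIKE(1,2);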

lint uses comments; gcc provides attributes. But some compilers may provide __attribute__ as a macro, others as a built-in symbol. When I started writing these configure scripts:

Taking that into account, I made macros which would not rely on the macro/symbol difference, and wrote configure checks to ensure that the corresponding attributes were supported. Looking at the source-code for 2.7.2.3, released in July 1997, it seems that gcc by then supported each attribute that I used. But the configure checks are still used.

Still later (2004 and 2012), the Intel and clang compilers provided useful (and different) warnings. I added checks for these under the existing warning options. Although both try to imitate gcc, using the compiler-specific options gives better results, since neither imitates all of the options which I use.

Conversely, gcc's developers (starting, apparently, in version 8) decided to imitate clang. Writing in March 2019, that effort appears to have been largely unsuccessful, producing as many misleading messages as useful ones. For example, all of this came from one line of code in a build of ncurses using gcc 9.0.1:

In file included from ../ncurses/./tinfo/write_entry.c:39:
../ncurses/./tinfo/write_entry.c: In function ‘_nc_write_entry’:
../ncurses/curses.priv.h:865:18: warning: ‘%s’ directive writing up to 4095 bytes into a region of size 4094 [-Wformat-overflow=]
  865 | #define LEAF_FMT "%c"
      |                  ^~~~
../ncurses/./tinfo/write_entry.c:469:7: note: in expansion of macro ‘LEAF_FMT’
  469 |       LEAF_FMT "/%s", ptr[0], ptr);
      |       ^~~~~~~~
../ncurses/./tinfo/write_entry.c:469:18: note: format string is defined here
  469 |       LEAF_FMT "/%s", ptr[0], ptr);
      |                  ^~
In file included from ../ncurses/curses.priv.h:259,
                 from ../ncurses/./tinfo/write_entry.c:39:
../include/nc_string.h:81:46: note: ‘sprintf’ output between 3 and 4098 bytes into a destination of size 4096
   81 | #define _nc_SPRINTF             NCURSES_VOID sprintf
../ncurses/./tinfo/write_entry.c:468:2: note: in expansion of macro ‘_nc_SPRINTF’
  468 |  _nc_SPRINTF(linkname, _nc_SLIMIT(sizeof(linkname))
      |  ^~~~~~~~~~~

That is, the message points to a character format as the problem, rather than the (possibly) too-long string format which is appended to the character. At the same time, gcc is not doing enough analysis to determine that the appropriate checks were already made (a failing of clang as well).
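A reduced example (my own construction, not the ncurses code) which may provoke the same complaint from gcc 9 with -Wformat-overflow enabled:

#include <stdio.h>

#define LEAF_FMT "%c"

static char linkname[4096];
static char filename[4096];

void
make_linkname(void)
{
    const char *ptr = filename;
    /* the warning blames the "%c" directive, although the "%s" (up to
     * 4095 bytes from filename) is what can overflow the 4096-byte
     * destination -- and it ignores any length checks made earlier */
    sprintf(linkname, LEAF_FMT "/%s", ptr[0], ptr);
}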

coverity

I investigated lclint (later renamed to splint) in 1995, but found it too cumbersome and fragile to use because it could not handle the regular header files.

But Coverity is a useful product. It follows a program through several plausible steps, using the conditional checks to infer critical values of variables and to look for inconsistencies. Most of the reports (about 90%) are valid. It will occasionally find serious problems with a program. As with lint, the only way to find those is to fix the minor issues along the way.

My email first mentions its use by FreeBSD in August 2004. Largely due to promotional efforts by the Coverity people, I grew interested enough to ask to have my project scanned in April 2006. A year later, dialog and ncurses were accepted for scanning, and I made several fixes using its reports.

The initial workflow for this tool was based on releases, which did not mesh well with my development process. Coverity was more or less forgotten for a few years, until its developers streamlined the submission procedure and I got involved again in 2012. With the simpler procedure, I added several of the programs which I maintain.

clang

I have been using clang for both its compiler warnings and the static analysis feature since May 2010.

Clang provides useful checks also, but has some shortcomings relative to Coverity:

On the other hand, it runs on my machines, and can be run many times without running into resource limits.

The FreeBSD developers have replaced gcc with it (except for ports). Clang's developers have made it quasi-compatible with gcc, i.e., it handles not only gcc's non-standard headers but also most of its command-line options. That introduces problems with configure scripts, because clang does not return an error status for unrecognized options, even in many cases where the same option makes gcc report an error.

However, not all is well with Clang:

For example, with MacOS 10.13.2 (late 2017), I got these results from test-builds of xterm:

The difference between 2557 and 2673 is due to a type-error in XQuartz's header file

/usr/X11/include/X11/Xpoll.h

which has been present for many years.

The bigger number is a known, unresolved defect in clang:

cppcheck

The cppcheck project started early in 2009; I first noticed reports for it late in 2009. As I noted on the xorg mailing list (after reading the cppcheck source-code):

On Sat, 3 Oct 2009, Martin Ettl wrote:

> Hello friends,
>
> further analysation with the static code analysis tool cppcheck brought up another issue. The tool printed the following warning:
>
> .../xfree86/common/xf86AutoConfig.c,337,possible error,Dangerous usage of strncat. Tip: the 3rd parameter means maximum number of characters to append
>
> Take a look into the code at line 337:
> .....
> char path_name[256];
> .....
> 334        if (strncmp(&(direntry->d_name[len-4]), ".ids", 4) == 0) {
>            /* We need the full path name to open the file */
>            strncpy(path_name, PCI_TXT_IDS_PATH, 256);
> 337         strncat(path_name, "/", 1);
>            strncat(path_name, direntry->d_name, (256 - strlen(path_name) - 1));
> .....
>
> I is possible (suppose to be the string PCI_TXT_IDS_PATH) is 256
> characters long) that the array path_name is allready filled. Then (lin
> 337) an additional character is appended --> array index might be go out
> of range.

It's possible, but cppcheck isn't that smart.
It's "only" warning that its author disapproves of strncat.
cppcheck only notes its presence in the code, makes no analysis of the
parameters.  It's a _checking_ tool, by the way, not to be confused with
static analysis.

(dynamic allocation as an alternative is not necessarily an improvement)

According to the project metrics, it has grown by a factor of six since I first saw it in 2009. Existing checks have been improved (reducing the false positives) and new checks added. But essentially it is still a style checker. A static analyzer is expected to go beyond that, doing analysis of the data flows—more than the diagnostics emitted by an optimizing compiler, pointing out unused or uninitialized variables.

I pointed this out in 2015:

cppcheck is essentially only a style-checker (and like other tools which incorporate the developer's notion of "good style", its usefulness depends on various factors).

There are suitable tools for detecting memory leaks (such as valgrind); cppcheck is not one of those. Of course, you will find differing opinions on which are the best tools, and even on what a tool is suitable for, e.g., the blog entry “Valgrind is NOT a leak checker”.

Daniel Marjamäki, a cppcheck developer, did not disagree with my comment in his response. At the time I wrote those comments, I was mulling over a Coverity report regarding an incorrect sign-extension check in xterm's ResizeScreen for patch #319.

On the other hand, I have found the tool useful. David Binderman reported some warnings from cppcheck 1.73 which I used to improve the style of xterm, in patch #325. Most of the changes were to reduce the scope of variables.
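A typical scope-reduction, of the kind cppcheck's variableScope check suggests (my illustration, not an actual change from xterm):

static int
sum_positive(const int *vec, int len)
{
    int total = 0;
    int i;

    for (i = 0; i < len; ++i) {
        int value = vec[i];     /* declared in the innermost block which
                                 * uses it, rather than at function scope */
        if (value > 0)
            total += value;
    }
    return total;
}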

Dynamic Tools

Static analysis is only part of the process. When I was first introduced to computer programming, I was told:

Any interesting program has at least

The situation has not improved; interesting programs can be run in many more ways than a static analyzer can see. Dynamic analysis is used to explore a few ways the program might be run, to look for performance and reliability problems.

doalloc

I started looking for memory leaks in November 1992 by modifying a function doalloc in my directory editor's library. I had written this in 1986 as a wrapper for malloc and realloc (since pre-ANSI C could not be relied upon for reallocating a null pointer). Because I made all calls to malloc or realloc through this function, I could easily extend it to check for memory leaks.
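A minimal sketch of such a wrapper, using modern prototypes (the original was pre-ANSI, and grew its leak-checking hooks later):

#include <stdlib.h>

/* funnel all allocations through one function: pre-ANSI realloc()
 * could not be relied upon to accept a null pointer */
void *
doalloc(void *oldp, size_t amount)
{
    void *newp = (oldp != NULL)
        ? realloc(oldp, amount)
        : malloc(amount);
    /* with a single funnel, bookkeeping for leak-checks is easy to add */
    return newp;
}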

This was back in the days when tools for leak-checking were not common. So I wrote my own,

To do this effectively, the program which is being analyzed must also be modified to free memory which was allocated permanently, i.e., not normally freed while the program is running. This process exposes inconsistencies in the program's use of memory and usually will make it fail. Later (by 1996), I used ElectricFence to trigger these failures in the debugger, where I could get the file- and line-information from a traceback.
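A sketch of that pattern (the names are mine, not from the directory editor): permanent allocations are recorded, so that an exit-time hook can free them in a leak-checking build:

#include <stdlib.h>

#define MAX_PERM 100

static void *permanent[MAX_PERM];
static int num_permanent;

/* remember allocations which would normally never be freed */
void *
permalloc(size_t amount)
{
    void *p = malloc(amount);
    if (p != NULL && num_permanent < MAX_PERM)
        permanent[num_permanent++] = p;
    return p;
}

#ifdef NO_LEAKS
/* called just before exit, so the checker reports only real leaks */
void
free_permanent(void)
{
    while (num_permanent > 0)
        free(permanent[--num_permanent]);
}
#endif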

Later I encountered other programs using a similar approach: keep track of memory allocated, and report just before exiting. I improved some of those. For instance:

dbmalloc/dmalloc

One drawback to the approach I used with doalloc was that it relied upon debug-code which I compiled into the directory editor. It would be nicer to have an add-on library, so that I could simply re-link the program to get memory-leak checking (which is fast), rather than re-compile (which is slow).

People posted useful programs to Usenet; I collected the ones that were useful for my projects. I went in search of a memory-leak checker at the same time that I was extending doalloc.

The first one that I found was dbmalloc in November 1992, written by Conor Cahill, originally posted to comp.sources.misc in volume 32 (September 1992). This was patch-level 14; Cahill had posted a much smaller version in comp.sources.unix volume 22 (July 1990).

While dbmalloc worked, it had a few drawbacks:

That aspect of patching the system library was a drawback: it did not always work, and became less viable as shared libraries became more prevalent. Also, it was awkward not being able to turn the leak-checking off in a given executable. I looked for alternatives, and found dmalloc 3.1.0, published in July 1995.

There were later releases (see its homepage), but for this discussion the initial version from 1995-1996 is appropriate.

According to its NEWS file, it was first published 1993/04/06 on comp.sources.unix (but it is not present in the archives). Its ChangeLog states that development began in March 1992. The earliest reliable mention of dmalloc that I have found is a comment in the DJGPP mail archives (djgpp/1995/04/05/16:10:59) by Marty Leisner, saying that he had ported it to DJGPP the previous year. dmalloc's changelog comments on that for July 1994. That was probably version 2.1.0, from 1994/5/11.

As with dbmalloc, one must include its header and link with its static library to use it. On the other hand:

The two libraries used a different approach toward intercepting function calls to allow diagnosing them. dbmalloc

In contrast, dmalloc

At the same time, I preferred the reports from dbmalloc. It also seemed to provide better coverage.

I also found dbmalloc to be more robust. Indeed, on revisiting the tools to write this summary, I find that it was easy to get dbmalloc to build and work, but that is not the case for dmalloc. For both, it was necessary to delete their conflicting prototypes for malloc, etc. But dmalloc's configure script

Here is a simple demo to show how a program would be instrumented for both tools:

#include <stdlib.h>
#include <stdio.h>
 
#ifdef DBMALLOC
#include <dbmalloc.h>
#endif
 
#ifdef DMALLOC
#include <dmalloc.h>
#endif
 
int main(void)
{
        char *oops = malloc(100);       /* allocated, deliberately never freed */
#ifdef  DBMALLOC
        /* dump the list of unfreed allocations to a logfile */
        malloc_dump(fileno(fopen("dbmalloc.log", "w")));
#endif
        exit (oops != 0);
}

Turning on the DBMALLOC definition for the first tool, here is the resulting logfile:

************************** Dump of Malloc Chain ****************************
POINTER     FILE  WHERE         LINE      ALLOC        DATA     HEX DUMP
TO DATA      ALLOCATED         NUMBER     FUNCT       LENGTH  OF BYTES 1-7
-------- -------------------- ------- -------------- ------- --------------
0200C088 demo.c                    14 malloc(1)          100 01010101010101
0200C178 unknown                      malloc(2)          568 8034ADFB010101

Here is a report generated with dmalloc 3.1.0 (after making fixes for the problems noted):

1: Dmalloc version '3.1.0'.  UN-LICENSED copy.
1: dmalloc_logfile 'dmalloc.log': flags = 0x4f41d83, addr = 0
1: starting time = 1471795976
1: free count/bits:  31/7
1: basic-block 4096 bytes, alignment 8 bytes, heap grows up
1: heap: start 0x6d8000, end 0x6db000, size 12288 bytes, checked 1
1: alloc calls: malloc 1, realloc 0, calloc 0, free 0
1:  total memory allocated: 112 bytes (1 pnts)
1:  max in use at one time: 112 bytes (1 pnts)
1: max alloced with 1 call: 112 bytes
1: max alloc rounding loss: 16 bytes (12%)
1: max memory space wasted: 3952 bytes (96%)
1: final user memory space: basic 0, divided 1, 4080 bytes
1:    final admin overhead: basic 1, divided 1, 8208 bytes (66%)
1: not freed: '0x6da008|s1' (100 bytes) from 'demo.c:14'
1:   known memory not freed: 1 pointer, 100 bytes
1: ending time = 1471795976, elapsed since start = 0:0:0

Because neither tool did everything, I used both, depending on what I was trying to do.

I first mentioned dmalloc in email in July 1996, and incorporated it and dbmalloc as options in the configure script for ncurses later that year (1996/12/21). Up until then, I would simply edit the makefile for programs because I preferred to not make debugging features part of the build-scripts. I did this for ncurses first, because it uses several makefiles.

Later, I made similar changes to other configure scripts:

Finally, in 2006 I combined these options in a configure macro, and gradually added that to each program that I maintain.

atac

Checking for memory leaks is not the only useful thing to do at runtime. Developers would like to know how effective their test cases are, by measuring test-coverage. In 1995, some people at Bellcore made available a version of ATAC. I made some changes to allow it to work with gcc 2.7.0 (see the iBiblio archive), and was interested in this for a few years.

Here is my original announcement:

From dickey Sun Dec 31 13:10:39 1995
Subject: atac (test-coverage)
Newsgroups: comp.os.linux.announce
Organization: Clark Internet Services, Inc., Ellicott City, MD USA
Summary: atac3.3.13 ported to Linux
Keywords: test coverage
Lines: 16
X-Newsreader: TIN [UNIX 1.3 950824BETA PL0]
Status: RO

ATAC was written by some folks at BellCore, who're working on a newer
version (licensed).  I modified the version that they made available for
free so that it'll run on Linux with gcc 2.7.0 (mostly by accommodating
the non-standard features of gcc), and uploaded it to sunsite.unc.edu
(now in Incoming).  I've found it very useful for setting up regression
tests for my programs.

ATAC measures how thoroughly a program is tested by a set of tests using
data flow coverage techniques, identifies areas that are not well
tested, identifies overlap among tests, and finds minimal covering test
sets.  Atac displays C source code, highlighting code fragments not
covered by test executions.

--
Thomas E. Dickey
dickey@clark.net

I used this for improving the test-cases for cproto and diffstat. I also built ncurses with it, in September 1996. Unlike cproto and diffstat, there is no set of test cases for ncurses which can be run in batch mode. It was interesting to explore the coverage of the ncurses test-programs, but I quickly found that doing this properly would take a little time:

From dickey Wed Sep 25 20:33:03 1996
Subject: test-coverage of ncurses...
To: ncurses-list@netcom.com (Ncurses Mailing List)
Date: Wed, 25 Sep 1996 20:33:03 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Content-Length: 939
Status: RO

Just for grins (I had it on my list for this month, to gauge feasibility), I
build ncurses with atac (a test coverage tool that I did some work on last
year, so I could use it for analysis of some of my programs).  If/when I have
time, I've got on my list to design a test for lib_doupdate.c that'll exercise
more than the 5-10% of the paths/conditions that're being tested at present.

Basically atac generates a listing showing the places (and conditions) in the
code that aren't exercised (i.e., it highlights them).

The listings tend to be long, since most paths aren't exercised, and I didn't
spend a lot of time pruning the data down (the shortest I generated is ~11000
lines).  I use a script that converts vile into a pager (I'll post it if anyone
needs it).

I've put the listings I generated in

        ftp.clark.net:/pub/dickey/ncurses/atac-output.zip

(I'll leave it there a couple of days).

--
Thomas E. Dickey
dickey@clark.net

But gcc 2.7.2 introduced a problem, as I commented to Jürgen Pfeifer early in 1997:

From dickey Mon Jan 27 20:45:38 1997
Subject: simpler solutions
To: Juergen.Pfeifer@t-online.de (Juergen Pfeifer)
Date: Mon, 27 Jan 1997 20:45:38 -0500 (EST)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 1615
Lines: 31

I looked closely at the atac vs ({ ...  }) conflict, and realized that this is
supposed to be a gcc extension.  The one file that's giving me trouble isn't
ifdef'd to turn it off when gcc extensions are to be suppressed.  (And the gcc
#define __STRICT_ANSI__ isn't turned on when I set -ansi).

However, understanding the problem, I can work around it w/o modifying atac.
The ({ ...  }) delimits a compound statement, which is a block that can appear
(almost) anywhere an expression can.  That'd be rather difficult to implement
in atac, since it completely changes the notion of flow control.  So I'm
working around it by modifying <sys/time.h> to be ANSI-compliant (there's code
in the X distribution that works just fine).  I'm willing to fix bugs in atac,
but not to maintain a whole host of non-standard stuff.

The “compound statement” is a GCC extension, the statement expression. GCC's statement expressions are used in glibc's header files. Most compilers accept some extensions, but this particular one changes the syntax rather than just adding a keyword. An ANSI C compiler cannot handle this extension.
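For example (my illustration, not glibc's code), this macro uses a statement expression, and a strict ANSI C compiler rejects it:

/* a ({ ... }) block used where an expression is expected */
#define square_plus_one(x) ({ int _t = (x); _t * _t + 1; })

int n = square_plus_one(3);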

ATAC works by using the C preprocessor to add line- and column-markers to every token; starting with that, it adds its own functions for counting the number of times each part of the program is executed.

That sounds simple, but there are a few problems:

All of that is technical, and could be fixed. However, its incorporation of the GNU C preprocessor makes the resulting license too restrictive to bother with.

purify

In writing this page and looking for the state of the art in 1992 when I modified doalloc, I found a contemporary paper citing a 1992 paper with this bibliographic entry:

R. Hastings and B. Joyce. Fast detection of memory leaks and access errors. In Proceedings of the Winter ’92 USENIX conference, pages 125–136. USENIX Association, 1992.

The actual paper has "Purify" in the title.

We did not have Purify at the Software Productivity Consortium (SPC). It came on the market a little too late to be of interest to the management.

I encountered it after I left SPC, and joined a large development project. The management there was interested, and seeing that I was involved in making several improvements (such as converting to ANSI C), I was asked to evaluate two tools for possible use in 1995:

Insure++ was interesting because it had a nice user interface depicting the problems found and the corresponding stack traces. However, it was very slow on our Solaris servers. Purify performed reasonably well, and Quantify seemed as if it could be useful.

Even Purify was not fast. It relies upon making a modified executable (see patents by Reed Hastings), and for a newly compiled executable that takes time. Even after “purifying” the executable, it uses more memory (because it makes a map of the data to keep track of what was initialized, etc.). Our servers did not have much memory by today's standards. Also, some resource settings for Motif applications were ignored, making the windows lose their background pixmaps.

On the other hand, Purify required less setup than dbmalloc/dmalloc (essentially just linking with the purify tool). No special header file was needed. It has functions which can be called, but for casual use those (like most of the functions in dbmalloc/dmalloc) are unnecessary.

Quantify was not that useful. Our applications spent most of their time in a tiny area of its graphical display and the GUI had no way to select an area for detailed analysis.

prof/gprof

Quantify was not the only tool for profiling.

I was familiar with gprof by the time I started working with Eric Raymond and Zeyd Ben-Halim in 1995. After the initial discussion of incorrect handling of background color and (inevitably) compiler warnings, I sent this:

From dickey Sun Apr 23 06:57:14 1995
Subject: profiling comparison (brief)
To: esr@snark.thyrsus.com, esr@locke.ccil.org,
        zmbenhal@netcom.com (Zeyd M. Ben-Halim)
Date: Sun, 23 Apr 1995 06:57:14 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Content-Length: 1970
Status: RO

Here's a first cut of the differences between profiling my test case with 1.8.7
and 1.9.  (I added/subtracted in my calculator program to account for the major
differences between the numbers from gprof).

As you'll see, the biggest (but not all) differences were from
        1.36    wnoutrefresh
         .84    wrenderchar
         .66    write
         .38    relative_move
         .32    wchangesync
         .33    wladdch

My impression of the other numbers is that you're making fewer I/O calls, but
possibly emitting more characters.  Next pass, I'll make the mods to allow
redirection of the screen, so I can get a character count to verify this hunch.

btw, in one of zeyd's most recent patches, the definition of TERMINFO
got moved, so that the program doesn't make w/o patching the makefile.

-------------------------------------------------------------------------------
+5.56   5.56    #mcount (profiling 1.8.7)
-8.40   -2.84   #mcount (1.9)
+32.53  29.69   #ncurses (total 1.9)
-25.72  3.97    #ncurses (total 1.8.7) -- actual difference
+11.94  15.91   #write (1.8.7)
-12.61  3.30    #write (1.9)
+1.54   4.84    #IDcTransformLine (1.8.7)
-1.62   3.22    #IDcTransformLine (1.9)
+1.20   4.42    #wladdch (1.8.7)
-1.53   2.89    #wladdch (1.9)
+1.01   3.90    #_IO_vfprintf (1.8.7)
-.80    3.10    #_IO_vfprintf (1.9)
+.52    3.62    #_IO_file_overflow (1.8.7)
-.39    3.23    #_IO_file_overflow (1.9)
+.41    3.64    #strcat (1.8.7)
-.61    3.03    #strcat (1.9)
+.36    3.39    #tparm (1.8.7)
-.41    2.98    #tparm (1.9)
+.35    3.33    #_IO_default_xsputn
-.26    3.07    #_IO_default_xsputn (1.9)
+.34    3.41    #strcpy
-.39    3.02    #strcpy (1.9)
+.30    3.32    #wnoutrefresh
-1.66   1.66    #wnoutrefresh (1.9)
-.84    .82     #wrenderchar (1.9)
-.38    .44     #relative_move (1.9)
-.32    .12     #wchangesync (1.9)
+.22    .34     #baudrate (1.8.7!)
+.18    .52     #waddnstr
-.25    .27     #waddnstr (1.9)
-.00    .27     #baudrate (1.9)

--
Thomas E. Dickey
dickey@clark.net

I began work on the autoconf-generated configure script shortly afterward (during May 1995). After some discussion regarding the libraries that should be built, I extended the script to generate makefile rules for profiling. Combining all of the flavors in one build was not my idea (it makes things unnecessarily complex), but I agreed to make the changes:

### ncurses 1.9.2c -> 1.9.2d

* revised 'configure' script to produce libraries for normal, debug,
  profile and shared object models.

### ncurses 1.9.1 -> 1.9.2

* use 'autoconf' to implement 'configure' script.
* panels support added
* tic now checks for excessively long termcap entries when doing translation
* first cut at eliminating namespace pollution.

The profiling libraries for ncurses are not used often (if something is used, I get bug reports). Debian's package description says they are present in the debugging package for ncurses, but they are not there. The changelog says they have been gone for a while:

ncurses (5.2.20020112a-1) unstable; urgency=low

  * New upstream patchlevel.
    - Correct curs_set manual page (Closes: #121548).
    - Correct kbs for Mach terminal types (Closes: #109765).
  * Include a patch to improve clearing colored lines (Closes: #112561).
  * Build even shared library with debugging info; we strip it out anyway,
    but this makes the build directory more useful.
  * Build in separate object directories.
  * Build wide character support in new packages.
  * Change the -dbg packages to include debugging shared libraries in
    /usr/lib/debug; lose the profiling and static debugging libraries;
    ship unstripped libraries in -dev.
  * Don't generate debian/control or debian/shlibs.dummy.
  * Use debhelper in v3 mode.

 -- Daniel Jacobowitz <dan@debian.org>  Wed, 16 Jan 2002 22:20:00 -0500

For a while before that change, profiling had fallen into disuse with Linux, because no one was in a hurry to get gprof working after the transition to ELF in the mid-1990s. For example, here is part of a message I sent to vile's development list:

From dickey Mon Jul  5 21:52:37 1999
Subject: vile-8.3k.patch.gz
To: vile-code@foxharp.boston.ma.us (Paul Fox's list)
Date: Mon, 5 Jul 1999 21:52:37 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Status: RO
Content-Length: 1950
Lines: 36

vile 8.3k - patch 1999/7/5 - T.Dickey <dickey@clark.net>

Miscellenous fixes.  The biggest change is that I've got about 2/3 of the
fixes I had in mind for long lines.  This makes the ruler info (and related
cursor positioning) cached so it runs much faster.  That's for nonwrapped
lines (I got bogged down in the logic for wrapped lines - it works, but
the caching isn't doing anything useful there).  For both (wrapped/nonwrapped)
there are other performance improvements.

I used gcov to find the code I wanted to change.  Has anyone got a gprof that
works with Linux ELF format?

and in 2001, I had this to say:

Date: Thu, 5 Jul 2001 14:44:48 -0400 (EDT)
From: "Thomas E. Dickey" <dickey@herndon4.his.com>
To: <os2-unix@eyup.org>
Subject: Re: [UnixOS2] ncurses.build
In-Reply-To: <Pine.GSO.4.21.0107051917500.15084-100000@cdc-ultra1.cdc.informatik.tu-darmstadt.de>
Message-ID: <Pine.BSI.4.33.0107051443020.12415-100000@herndon4.his.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-os2-unix@eyup.org
Precedence: bulk
Reply-To: os2-unix@eyup.org
Status: RO
Content-Length: 639
Lines: 22

On Thu, 5 Jul 2001, Stefan Neis wrote:

> On Thu, 5 Jul 2001, Thomas E. Dickey wrote:
>
> > >             --with-debug \
> > >             --with-profile \
> >
> > I don't know if profiling works on OS/2.
>
> Works nicely since IIRC EMX-0.9a, standard -pg switch to gcc, so I
> suppose it'll work for ncurses, too.

"works" can be relative - it's been broken on Linux since the switch to
ELF libraries some years ago (it "works" now only in the sense that the
compile and gprof run, but the numbers are worthless because there's no
timing information).

-- 
T.E.Dickey <dickey@herndon4.his.com>
http://dickey.his.com
ftp://dickey.his.com

Still, after a while, gprof was useful:

I have used it frequently in improvements for mawk, starting in September 2009 with a discussion with Jim Mellander about the hashing algorithm. Rather than embed rules for profiling as done in ncurses, I added another wrapper script for configuring with the profiling option:

#!/bin/sh
CFLAGS='-pg' cfg-normal "$@"

Doing this for ncurses (to just get useful profiling libraries) is different, because I want to use the profiling version of the C runtime as well. Otherwise timing for strcpy, memset, etc., will be overlooked. Here is a script which I started in 1995, updated last in 2005 to accommodate a changed name for the C runtime profiling library:

#!/bin/sh
CFLAGS='-pg' \
LDFLAGS='-pg' \
LIBS="-lc_p" \
cfg-normal \
        --with-profile \
        --without-normal \
        --without-debug \
        "$@"

gcov

I began using gcov (GNU coverage analyzer) in 1999. It was not a replacement for ATAC because:

The last point (a different format of gprof) is the way I have used it, e.g., with vile, mawk and other programs. Using gprof to study more than one file is cumbersome.

Along the way, the options I needed for running gcov changed, from

#!/bin/sh
LDFLAGS="--coverage" \
CFLAGS='-fprofile-arcs -ftest-coverage' \
cfg-normal "$@"

to just

#!/bin/sh
CFLAGS='--coverage' cfg-normal "$@"

Some documentation states that the profile-arcs and test-coverage options are needed.

valgrind

I began using valgrind in 2002, on more than one program:

Urs Janßen summarized the choices in discussing a bug report:

ccmalloc (<http://www.inf.ethz.ch/personal/biere/projects/ccmalloc/>)
and valgrind (<http://developer.kde.org/~sewardj/>) are very usefull
(on Linux), MMS (<http://hem1.passagen.se/blizzar/mss/> also look
very usefull last time I tested it (~2 years ago).

and

*shrug* as I said before, I haven't used checkergcc for ages.
insure looked like a good leak/bounds checker but it very expensive.
ccmalloc is a bit noisy but imho the best freeware leak checker.
valgrind has an acceptable noise level but misses some usefull
informations like exact line numbers. I can't remember how good
MMS was (but it must have been better than dmalloc, dbmalloc and
electric fence).

Until 2004, I used Slackware for my development machine, and routinely built things like valgrind as they became available. I had a comment:

On Mon, 30 Sep 2002, Martin Klaiber wrote:

> Urs Janßen  wrote:
>
> > *shrug* as I said before, I haven't used checkergcc for ages.
>
> Probably a bug in checkergcc. In the docs they say that libc and
> ncurses are supported. I'll contact the developers.

perhaps (but when I last looked at checkergcc a few months ago on Debian,
there was no support for the current glibc, so it was useless).

valgrind is interesting, but I had to hack it to make it run with
Slackware 7.1 (it makes too many assumptions about the header files)

The nice thing about valgrind was that it produced reports comparable to purify. And it was free.

Because effective use of valgrind requires the program to be built with debugging options, I modified the configure scripts to combine the --with-no-leaks or --disable-leaks options with a new option, --with-valgrind. I did that first for vile, then ncurses, late in 2006.

Like purify, valgrind gave warnings that could be regarded as false positives. And like purify, valgrind allows those to be suppressed by configuration files. Ruben Thomas contributed a few sample suppression files for ncurses in 2008. In practice, I do not use those, relying on the “no-leaks” configuration for my testing. Either way, others rarely use the suppression files and do not build no-leaks configurations of ncurses (see Testing for Memory Leaks in the ncurses FAQ).

Valgrind comes packaged with default suppression files. For instance, there are 152 items in default.supp on my Debian/testing system in 2016. The one for ncurses has 15.

When I check for leaks in xterm, I have to ignore about 5000 lines of listing. I could generate a suppression file using valgrind (I experimented with this early in 2006, suppressing 285 items). But once suppressed, even some useful cases can be overlooked.

Late in 2006, I reorganized my autoconf macros for memory-leak checks, so that doalloc, dbmalloc, dmalloc and valgrind were all handled by the same set of macros. Purify was also supported by these macros, but the main ones are --with-no-leaks and --with-valgrind. I kept the existing --disable-leaks option in configure scripts such as ncurses which had provided this for some time.

lcov

I started using lcov early in 2014:

LCOV is a graphical front-end for GCC's coverage testing tool gcov. It collects gcov data for multiple source files and creates HTML pages containing the source code annotated with coverage information. It also adds overview pages for easy navigation within the file structure. LCOV supports statement, function and branch coverage measurement.

I mentioned it as a way to organize regression tests while discussing with Tom Shields a large change that he proposed making to byacc (i.e., incorporating the btyacc code).

This may have been the article which I came across while looking for a better test-coverage tool:

Use gcov and lcov to know your test coverage

Since then, I have used it as well for cproto, diffstat, indent, mawk and xterm.

Process

ANSIfication

Early on, in 1995-1996, my goal was to convert all of the K&R code which I was maintaining to ANSI C, so that I could use function prototypes, const and the standard variable-argument syntax. I took care to avoid changing the programs unnecessarily, relying on comparison of object files to alert me to unintended changes.

Working with Paul Fox and others on vile, I had some concern about being able to build the program on machines where ANSI C was not supplied by the vendor (HPUX and SunOS). I used unproto on those platforms, and for a while compatibility with unproto determined which features of ANSI C I would use. In particular, it could not handle string concatenation in the C preprocessor.

Not all of this went smoothly. I became the maintainer for vile rather suddenly after one of my changes made a variable const which should not have been.

Code Cleanup

Making a variable “const” asks the compiler and linker to make it read-only. Developers do this to ensure that a function parameter does not change within that function.

One unexpected issue with const is that it is not possible to retrofit it onto an application such as lynx or ncurses without using casts. That is because some of the longstanding interfaces use non-const values, introducing an inconsistency. But casts hide type mismatches. While there is a compiler warning to tell me about this (and I use it), it is possible to introduce errors in isolated cases where a “const” variable is modified.
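A small example of the problem, using hypothetical function names: once a cast is required to satisfy a legacy interface, the compiler can no longer see a mismatch through it:

/* longstanding interface: takes a modifiable string, although it
 * never modifies it */
extern void legacy_puts(char *s);

void
show(const char *msg)
{
    /* the cast is unavoidable here -- but it would just as silently
     * hide a genuine type mismatch */
    legacy_puts((char *) msg);
}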

ncurses...

I added a configure option --enable-const to ncurses which changes several of these inconsistent interfaces to use const, noting that this is a difference versus X/Open Curses. In NetBSD curses, the developers changed interfaces to use const (saying that they had gotten agreement from someone at The Open Group), but did not make it configurable. This is not reflected in X/Open Curses Issue 7 (in 2013). Finally with ncurses6 in 2015, I changed the default for this option, to use const.

Besides making C programs (a little) more type-safe, using const also reduces the time needed to load the program. With Linux, you can see the loader statistics by setting the environment variable LD_DEBUG=statistics. Besides compiler warnings, some of my work with const has been aimed at improving the loader statistics.

Constant/read-only variables are not the only aspect to clean up when doing a conversion to ANSI C. There are variable-length argument lists to consider. Using gcc, you can declare that a given function is “printflike” or “scanflike” — making the compiler check calls to the function as if they were printf or scanf. Both vile and lynx had interesting quirks to work around.

vile...

When I first started working on vile, its internal messages were formatted with lsprintf (a function that was much like sprintf, but different):

Because of the nonstandard formatting features provided by lsprintf it was not possible to use the gcc compiler warnings to look for parameter mismatches. When I realized this, I started by making a patch for gcc which would implement the extra features. But I decided it was a bad idea, not because the features were non-standard (gcc assumes the GNU C library, which has a few non-standard features), but because it meant I would have to patch the compiler — repeatedly.

Instead, I modified lsprintf to use standard formatting codes and revised some calls to lsprintf to avoid those features which had no counterpart in standard C. Most of the changes for lsprintf dealt with the lower-level dofmt function. I did that rapidly in 1999:

        + modify dofmt() to make its repertoire closer to printf's: remove 'D'
          format code, and change 'p' to 'Q', 'u' to 'f'.

but the process of revising calls took some time, e.g., in 2004

        + modify dofmt() to handle "%*.*s" and "%.*s" formats, removed "%*S"
          format.
and 2005:
        + fixes to build clean(er) with Intel compiler:
          + adjust ifdef's for fallback prototype for open() in case it is
            really open64().
          + add a "%u" format type to dofmt(), use it to display file-sizes.
          + define DIRENT to struct dirent64 for the case where
            _FILE_OFFSET_BITS is 64 and _LP64 is not defined.  Use a configure
            script check to ensure the type exists.
          + add/use function bpadc() to replace "%P" and "%Q" formats in
            dofmt().
          + add/use function format_int() to replace "%r" format in dofmt().

Of course, GNU C is not the only runtime with printf or scanf extensions (vendor Unix systems have some longstanding quirks in this area which are not in standard C). But in practice the only extensions which I use are those which provide better diagnostics without interfering with portability.

lynx...

My work on lynx in 1998 included changes to reduce the possibility of buffer-overflows. At that point, lynx used its own functions for allocating and copying memory for strings, but used sprintf for formatting strings.

The latter was a problem because not all of the buffer-sizes were checked, as reported by Jeffrey Honig in May 1998. The suggested “Fix” was not a solution, because snprintf requires the same information for its correct use as the (presumably correct) checks already in place.

Rather than imitate snprintf with its emphasis on repeating the programmer's assumptions about fixed buffer sizes, a more appropriate solution should take into account that many of the formatting operations in lynx used string copying and concatenation to avoid using sprintf. If there were a portable function which could format into a dynamically-allocated string, that would solve the problem as well as making lynx easier to maintain.

There were (and are, as of 2016) no suitable standard functions for this purpose; standardization lags. Originally written by Chris Torek in the early 1990s, snprintf was incorporated in NetBSD, OpenBSD and FreeBSD, and adapted by Linux and Solaris developers. It was finally standardized in POSIX (issue 5 was published in 1997). At the time, that meant that newer platforms would have the function, but older ones (such as SunOS) would not. See for example

As you can see by referring to the page by Martinec, if snprintf had been a good-enough solution for lynx, and I had chosen not to write one specially for porting applications, I could have waited a year or two and gotten someone else to write it.

While not suitable for use by a portable program, the asprintf function in the GNU C library was interesting, because it showed that a version of sprintf was possible with a dynamically allocated output buffer. Solaris developers were slower here (asprintf finally appeared in Solaris 11, twelve years later):

A number of new routines are included in the Oracle Solaris C library to improve familiarity with Linux and BSD operating systems and help reduce the time and cost associated with porting applications to Oracle Solaris 11 Express. Examples of new routines include asprintf(), vsprintf(), getline(), strdupa() and strndup().

It has not been standardized (as of 2016).

Again, asprintf did not do what I needed in lynx:

Taking all of that into account, I added two functions HTSprintf and HTSprintf0 later that year. The latter creates a new string, the former appends to a string. Lynx uses both:

The two are closely related: HTSprintf0 is a special case of HTSprintf, since it sets the destination to an empty buffer and calls the latter function. From the counts, you can see that lynx uses the more general form most of the time.
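A minimal sketch of the idea, assuming C99's vsnprintf (this is not lynx's actual implementation, which returns the string, uses vasprintf where available, and must cope with the NLS formats mentioned below):

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* append printf-style output to a dynamically-allocated string */
static void
append_fmt(char **dst, const char *fmt, va_list ap)
{
    va_list ap2;
    size_t have = (*dst != NULL) ? strlen(*dst) : 0;
    int need;

    va_copy(ap2, ap);
    need = vsnprintf(NULL, 0, fmt, ap2);        /* measure first */
    va_end(ap2);

    if (need >= 0) {
        char *next = realloc(*dst, have + (size_t) need + 1);
        if (next != NULL) {
            vsnprintf(next + have, (size_t) need + 1, fmt, ap);
            *dst = next;
        }
    }
}

void
HTSprintf(char **dst, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    append_fmt(dst, fmt, ap);
    va_end(ap);
}

void
HTSprintf0(char **dst, const char *fmt, ...)
{
    va_list ap;
    if (*dst != NULL)
        (*dst)[0] = '\0';       /* special case: start from empty */
    va_start(ap, fmt);
    append_fmt(dst, fmt, ap);
    va_end(ap);
}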

In my initial version, I made this “portable” (not relying on any non-standard function). If lynx is built on a machine which has vasprintf, it will use that function. One reason is that the NLS support (message files) may use an obscure feature for plurals:

    /*
     * If vasprintf() is not available, this works - but does not implement
     * the POSIX '$' formatting character which may be used in some of the
     * ".po" files.
     */

Rather than double the amount of work, I chose to use vasprintf, which is available with Linux and the BSDs. On other platforms, lynx uses the easily ported code:

Compiler warnings help show when the format cannot handle a given data type (and the resulting printout would be incorrect). They come into play in lynx with a couple of troublesome data types:

off_t

This data type is associated with file size, because the lseek function uses an offset of this type when positioning a file descriptor within a file. The original lynx developers decided that it was the same as long, and passed the filesize around — and printed it — with a format for long. Times changed, and machines got bigger: off_t can have more bits than long. C has no predefined format for printing off_t, but does for long. A cast is needed, as well as definitions for the printing formats.

One reason why it is so complicated is that even when off_t and long happen to be the same size, gcc has a preference (made known via compiler warnings) for the “right” type to use. The configure script can easily determine the size of these types, but getting the compiler to tell which are the preferred names for a given size is harder.

/*
 * Printing/scanning-formats for "off_t", as well as cast needed to fit.
 */
#if defined(HAVE_LONG_LONG) && defined(HAVE_INTTYPES_H) && defined(SIZEOF_OFF_T)
#if (SIZEOF_OFF_T == 8) && defined(PRId64)
 
#define PRI_off_t       PRId64
#define SCN_off_t       SCNd64
#define CAST_off_t(n)   (int64_t)(n)
 
#elif (SIZEOF_OFF_T == 4) && defined(PRId32)
 
#define PRI_off_t       PRId32
#define SCN_off_t       SCNd32
 
#if (SIZEOF_INT == 4)
#define CAST_off_t(n)   (int)(n)
#elif (SIZEOF_LONG == 4)
#define CAST_off_t(n)   (long)(n)
#else
#define CAST_off_t(n)   (int32_t)(n)
#endif
 
#endif
#endif
 
#ifndef PRI_off_t
#if defined(HAVE_LONG_LONG) && (SIZEOF_OFF_T > SIZEOF_LONG)
#define PRI_off_t       "lld"
#define SCN_off_t       "lld"
#define CAST_off_t(n)   (long long)(n)
#else
#define PRI_off_t       "ld"
#define SCN_off_t       "ld"
#define CAST_off_t(n)   (long)(n)
#endif
#endif
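Given those definitions, a call looks like this (a usage sketch, assuming the macros above are in scope):

#include <stdio.h>
#include <sys/types.h>

static void
report_size(off_t size)
{
    /* string concatenation splices the right conversion into the format */
    printf("file size: %" PRI_off_t " bytes\n", CAST_off_t(size));
}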

time_t

Since it is used far more often, one would suppose that time_t would be less of a problem. Again, I solved this with a thicket of ifdef's. But on some platforms, gcc decides that time_t is really an int and warns when I use a %ld (long) format:

/*
 * Printing-format for "time_t", as well as cast needed to fit.
 */
#if defined(HAVE_LONG_LONG) && defined(HAVE_INTTYPES_H) && defined(SIZEOF_TIME_T)
#if (SIZEOF_TIME_T == 8) && defined(PRId64)
 
#define PRI_time_t      PRId64
#define SCN_time_t      SCNd64
#define CAST_time_t(n)  (int64_t)(n)
 
#elif (SIZEOF_TIME_T == 4) && defined(PRId32)
 
#define PRI_time_t      PRId32
#define SCN_time_t      SCNd32
 
#if (SIZEOF_INT == 4)
#define CAST_time_t(n)  (int)(n)
#elif (SIZEOF_LONG == 4)
#define CAST_time_t(n)  (long)(n)
#else
#define CAST_time_t(n)  (int32_t)(n)
#endif
 
#endif
#endif
 
#ifndef PRI_time_t
#if defined(HAVE_LONG_LONG) && (SIZEOF_TIME_T > SIZEOF_LONG)
#define PRI_time_t      "lld"
#define SCN_time_t      "lld"
#define CAST_time_t(n)  (long long)(n)
#else
#define PRI_time_t      "ld"
#define SCN_time_t      "ld"
#define CAST_time_t(n)  (long)(n)
#endif
#endif

lex/yacc...

Generated code should compile with as few warnings as (or fewer than) normal source-code. For both byacc and reflex I have done this. That is not true of bison and “new” flex.

I use lex/flex for most of the syntax filters in vile. Occasionally someone wants those to work with “new” flex. In a recent (July 2016) episode, I made some build-fixes to make that work. But as I reported in Debian #832973, I preferred not to use that tool, because it added more than 25,000 lines of warnings to my build-logs.

Routine Builds

I started writing build-scripts when I started working on programs that took more than a minute or two to compile. For example, in 1996 I wrote this build-x script:

#!/bin/sh
WD=`pwd`
LEAF=`basename $WD`
if [ $LEAF = xc ];then
        head -1 programs/Xserver/hw/xfree86/CHANGELOG |sed -e 's/^/** /' >make.out
        cat >>make.out <<EOF
** tree: 
`pwd`
** host: 
`partition`
EOF

        run nice make-out World
else
        echo '** You must be in the xc-directory'
fi

Later, I found that it helped to construct build-scripts which knew about the specific compilers available on different machines — so that I could verify that my programs built correctly with each compiler. I collected logs from these builds, starting in 1997 (both ncurses and lynx). However, these collections were not systematic; I did not at first store the logs in a source repository to allow comparison, but settled for a record of the “last good build” to use in trouble-shooting the configure scripts.

There were a few exceptions: as part of my release process for lynx I kept the logfile from a test-build on the server at ISC which hosted lynx. I started that in December 1997.

But for the other programs: my versioned archives for ncurses build-logs start in July 2006. Other programs followed, as well as more elaborate (and systematic) build-scripts. For an overview of those, see my discussion of sample build-scripts. As of September 2016, I have 218 scripts for building programs, in various configurations. I collect the logs from these, and compare against the previous build — and look for new problems such as compiler warnings.

Packaging

I track build-logs to avoid introducing problems for others who build my programs (either packagers or individual developers). None of my machines run continuously, and a build-server would make little sense (because I have to test on many platforms), so a set of scripts provides a workable solution.

Packagers give the most immediate feedback when there is a problem building ncurses or xterm. They typically use a particular set of machines, with build-servers. Packaging and systematic builds go together.

In a few cases, others contributed scripts for building my programs and creating packages:

but this was not done systematically. Also (until around the time that I got involved in packaging), Linux packagers did not as a rule provide source repositories for their packaging scripts from which one could get useful information about the reasons for package changes. The BSD ports on the other hand provide historical information but are strongly dependent on the structure within which a port is built.

I began packaging all of my programs in 2010 when I started using virtual machines for the bulk of my development. Most of these packages use either "deb" (Debian and derived distributions) or "rpm" (Red Hat, OpenSUSE and others). I wrote (and am still writing) a set of scripts to manage this. Rather than grafting version information onto an existing upstream source, my packaging scripts work within the existing structure and use my existing versioning scheme.

During this process, I finally got around to dropping support for K&R compilers in the configure script checks for the C compiler. The unproto program was of course long unused.

For each program XXX:

While developing a new set of changes, I “release” several updates for packaging and make test-builds, comparing the build-logs to the previous release to eliminate new compiler warnings. The build scripts actually invoke several scripts depending on the availability of packaging tools and cross-compilers. Each produces a log file.

As of September 2016, I have 77 release scripts, including 22 which generate documentation for my website. I have not written scripts for everything: I have 14 programs in my to-do list for scripting.

Problems

at work...

Paid work, of course.

When I first used lint in 1986, I was working with another developer. I pointed out that including <string.h> in a program would give the correct return-type for strcpy, making it unnecessary to use a cast:

char *p = (char *)strcpy(target, source);

He refused to make the change, giving as his reason:

My supervisor on another job told me to do it this way.

At the time (1985-1987), we were developing networking applications for M68K-based computers. The C compiler for those machines used one set of registers for data, and another set for addressing. Without a suitable declaration:

char *strcpy();

the compiler would assume (cast or no cast) that strcpy returned an integer. The cast would cause the resulting data register to be used as an address.
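
To spell out the situation (a minimal sketch of the pre-ANSI code, with hypothetical names):

/* without this declaration, the compiler assumes strcpy() returns int,
   delivered in a data register; on a compiler which returns pointers in
   an address register, the caller then reads the wrong register, and a
   cast cannot repair that, because it merely converts the wrong value */
char *strcpy();

char target[80];

char *
copy_name(source)
char *source;
{
    return strcpy(target, source);      /* no cast needed once declared */
}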

I found lint useful anyway. It would tell me about cases where I made a typo, using fprintf where I meant printf, e.g., this error:

fprintf("Hello World!\n");

The alternative (without lint) was a core dump.

At the time I wrote the gcc-normal and gcc-strict scripts, I found a use for gcc solely for its warnings. This helped me to improve the code quality of a fairly large system which had been written in K&R C, retrofitted to some POSIX features, and ported to many platforms. I used gcc warnings to flag places where the code needed attention.

I kept track of my progress by keeping the build-logs in the source repository, and measuring the daily rate of change using c_count and diffstat in a cron job. Because gcc came with no warranty, we did not use it for the end product.

As part of the process, I made the cron job send everyone on the development team a daily report of the diffstat. That was ... not popular. I stopped the email. But I kept on with the compiler warnings, converting the programs to ANSI C.

Seeing that a few developers made a majority of the changes, I discussed the compiler warnings with those people. Some liked the idea. One developer, however, got up immediately and left the office when I sat down next to him. When I caught up with him, he had not calmed down, saying:

I know what you're trying to do, and I think it's good.
But I just can't stand it.

So not everyone liked fixing compiler warnings. This developer was fairly productive, but liked to work alone in the evenings when others were not around. One morning I was chatting with another developer when I happened to notice a hole in the wall, perhaps 8-10 inches in diameter. I remarked that I hadn't seen that before. The person I was talking to remarked that (the other developer) had done it. "How…", I began. He replied that (the other) wore boots. Enough said.

gcc

During the last year or so that I was at the Software Productivity Consortium, I spent some time reviewing and suggesting improvements to programs that I found on the Internet. One of those was gcc. In its bootstrap, it compiled a program named “enquire” which it used to determine the sizes of various datatypes. That in particular had a lot of compiler warnings (because the code ignored the difference between signed and unsigned values), but other parts of gcc needed work as well.

I sent a patch for gcc (which fixed about 5,000 warnings) to gcc's developers (probably early 1993). Richard Stallman responded, a little oddly I thought: he asked what I wanted them to do with the patch. I replied that I wanted them to use it to improve the program. I heard no more. If they did incorporate any of the fixes, there was no record of that in later versions.

I kept that in mind, and sent no more fixes to gcc's developers.

Expanding a little on my remarks here: quite a while ago, Richard Stallman sent a message to a mailing list explaining that others found warnings useful, but that he did not. Likely that had some influence on the default compiler warnings, as well as the choice of options which comprise -Wall:

Date: Thu, 2 Sep 1999 23:19:27 -0400
Message-Id: <gnusenet199909030319.XAA08701@psilocin.gnu.org>
From: Richard Stallman <rms@gnu.org>
To: gnu-prog@gnu.org
Subject: On using -Wall in GCC
Reply-to: rms@gnu.org
Resent-From: info-gnu-prog-request@gnu.org

I'd like to remind all GNU developers that the GNU Project
does not urge or recommend using the GCC -Wall option.

When I implemented the -Wall option, I implemented every warning that
anyone asked for (if it was possible).  I implemented warnings that
seemed useful, and warnings that seemed silly, deliberately without
judging them, to produce a feature which is at the upper limit of
strictness.

If you want such strict criteria for your programs, then -Wall is for
you.  But changing code to avoid them is a lot of work.  If you don't
feel inclined to do that work, please don't let anyone else pressure
you into using -Wall.  If people say they would like to use it, you
don't have to listen.  They're asking you to do a lot of work.
If you don't feel it is useful, you don't have to do it.

I never use -Wall myself.

In gcc-2.7.2.3's ChangeLog.4 file, the -Wstrict-prototypes option (though present in gcc 1.42 in January 1992) is first mentioned as being distinct from -Wall:

Thu Nov 21 15:34:27 1991  Michael Meissner  (meissner at osf.org)

        * gcc.texinfo (warning options): Make the documentation agree with
        the code, -Wstrict-prototypes and -Wmissing-prototypes are not
        turned on via -Wall; -Wnoparenthesis is now spelled
        -Wno-parenthesis.
        (option header): Mention that -W options take the no- prefix as well
        as -f options.

Also, in documentation (gcc.info-3), it appeared in the section begun by this paragraph:

   The remaining `-W...' options are not implied by `-Wall' because
they warn about constructions that we consider reasonable to use, on
occasion, in clean programs.

The option itself was documented like this:

`-Wstrict-prototypes'
     Warn if a function is declared or defined without specifying the
     argument types.  (An old-style function definition is permitted
     without a warning if preceded by a declaration which specifies the
     argument types.)

The reason for the option being separate is easy to understand, given the context: this was only a few years after C had been standardized, and few programs had been converted to ANSI C. gcc had other options to help with this, e.g., -Wtraditional.
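
Concretely, -Wstrict-prototypes flags declarations and definitions like these (a minimal sketch; the names are hypothetical):

int foo();                      /* declared without argument types */

int add(a, b)                   /* old-style (K&R) definition */
int a;
int b;
{
    return a + b;
}

/* the ANSI prototype forms are accepted silently: */
int foo2(void);
int add2(int a, int b);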

Developers of complex tools have to keep compatibility in mind. Moving options between categories is guaranteed to break some people's build scripts. For instance, gcc also has

`-Werror'
     Make all warnings into errors.

which some people use regularly. Needlessly turning on warnings which developers had earlier chosen not to use (and stopping the compile as a result) is not a way to maintain compatibility.

For more context on -Wall versus -Wstrict-prototypes, it helps to read the entire section rather than selectively picking out text. The last paragraph in the current documentation for -Wall, for instance, points out that -Wall is not comprehensive, and that ultimately the reason for inclusion was a matter of judgement (as in the original documentation):

Note that some warning flags are not implied by -Wall. Some of them warn about constructions that users generally do not consider questionable, but which occasionally you might wish to check for; others warn about constructions that are necessary or hard to avoid in some cases, and there is no simple way to modify the code to suppress the warning. Some of them are enabled by -Wextra but many of them must be enabled individually.

As for whose judgement that was – it would be the original developers of gcc around 1990.
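
To illustrate the distinction, here is a fragment which (for C) is quiet under plain -Wall, while -Wextra enables -Wsign-compare and flags the comparison (a minimal sketch):

#include <stdio.h>

int
main(void)
{
    int i = -1;
    unsigned u = 1;

    if (i < u)
        printf("what one might expect\n");
    else
        printf("surprise: -1 was converted to a huge unsigned value\n");
    return 0;
}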

mawk

Around the same time (1993/1994), I sent Mike Brennan suggested changes for mawk. Some of those were prompted by lint warnings. Those he rejected as unnecessary. For example, one of the diffs would have begun like this:

--- execute.c.orig      1996-02-01 00:05:42.000000000 -0500
+++ execute.c   2016-08-11 18:30:56.726609195 -0400
@@ -219,7 +219,7 @@
    }
 
    while (1)
-      switch (cdp++->op)
+      switch ((cdp++)->op)
       {
 
 /* HALT only used by the disassemble now ; this remains
@@ -234,13 +234,13 @@
 
         case _PUSHC:
            inc_sp() ;
-           cellcpy(sp, cdp++->ptr) ;
+           cellcpy(sp, (cdp++)->ptr) ;
            break ;
 
         case _PUSHD:
            inc_sp() ;
            sp->type = C_DOUBLE ;
-           sp->dval = *(double *) cdp++->ptr ;
+           sp->dval = *(double *) (cdp++)->ptr ;
            break ;
 
         case _PUSHS:

Interestingly enough, gcc has nothing to say about that. Compiling execute.c with gcc-normal gives me 40 warnings, and with gcc-strict 84 warnings. So there is something to be said, even without lint.
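
For reference, the parentheses change nothing in the compiled code: the postfix operators have equal precedence and group left-to-right, so both spellings parse the same way. A minimal sketch, with a hypothetical structure:

struct cell { int op; };

int
step(struct cell *cdp)
{
    int a = cdp++->op;          /* parses as (cdp++)->op */
    int b = (cdp++)->op;        /* the parentheses are for the reader */
    return a + b;
}

Presumably lint preferred the explicit form; as noted, gcc accepts either without comment.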

After Brennan released mawk 1.3.3 in November 1996, there was no maintainer except for packagers until I adopted it in September 2009. I noticed this because one of the packagers made an inappropriate change involving byacc and mawk. Debian had accumulated 8 patches; I incorporated those and set about making the normal sort of improvements: fixing compiler warnings, improving portability, and addressing bug-reports.

One of the bug-reports dealt with gsub (Debian #158481). Brennan had written gsub to recurse each time it made a substitution. I made an initial fix to avoid the recursion in December 2010, but it was slow. Returning to this after some time, I was in the middle of making a plan to deal with it in August 2014 when I received mail from Brennan.

It was always my intention to return to mawk and fix some mistakes.
Never intended to wait 15+ years, but now I am going to do it.

I hope you want to cooperate with me.   If so, let's figure out how we both
work on mawk.  If not, let's figure out how to separate.

I recalled his attitude toward lint and compiler warnings, but agreed, saying:

ok.  I have some changes past the last release/snapshot (working off/on, since
there are other programs...).  At the moment I was mulling over how to measure
performance with/without -Wi for

        https://code.google.com/p/original-mawk/issues/detail?id=12

so... I'll put that aside for the moment, and see about adding your patch,
resolving any rejects and then making a snapshot available for discussion.

(I'm not currently working in the area you mentioned - was working on some
simple stuff while thinking how to revise gsub - intending to make a public
release once _that_ is done).

That lasted 5 weeks, ending because we were not in agreement regarding compiler warnings. Here is one of my replies:

| What requires changes? Compiles -Wall without a peep. Are you adding

gcc -Wall is sub-minimal, actually.  To see minimal warnings, use the configure
script's "--enable-warnings" option.  For development work (as opposed to
warnings which some packager might consider using), I use the gcc-normal and
gcc-stricter scripts here:

http://invisible-island.net/scripts/readme.html#build_scripts

Also, Mike regarded the no-leaks code as unnecessary, proposing a change to remove it. His parting message in September was:

I've decided to work on something else.
Please discard the code I sent you on 20140908.
Your gsub3 works just as well as my gsub, so use you own code not mine. It
will be easier to maintain code you wrote yourself.
Also, remove me from  2014 in the version display.

In August 2016, Mike Brennan posted to comp.lang.awk his announcement of a beta for mawk 2.0, stating in the README file:

In my absence, there have been other developers that produced mawk 1.3.4-xxx.
I started from 1.3.3 and there is no code from the 1.3.4 developers in
this mawk, because their work either did not address my concerns or
inadequately addressed my concerns or, in some cases,
was wrong.  I did look at the
bug reports and fixed those that applied to 1.3.3.
I did switch to the FNV-1a hash function as suggested in a bug report.

From the discussion above, the reader can see that the "in some cases, was wrong" refers to compiler warnings and checking for memory leaks. Because Brennan expressed no other concerns during those five weeks, likely the entire sentence is focused on that one issue.

purify

The Usenet thread X11, Xt, Xm and purify on comp.windows.x in late 1998 illustrates the differences of opinion between developers and vendors.

snprintf

Not all warnings are beneficial. When they are overdone, they are detrimental. Consider snprintf and its followups strlcpy, strlcat.

In my first encounter with these in 2000, it was to rename the strlcpy function which I had written in 1990, avoiding a conflict with the latter (one of several cases where BSD header files included non-standard functions without any ifdef's to avoid namespace pollution).

My function lowercases its argument. The OpenBSD function attempts to remedy the ills of the world by offering a better version of strncpy. Conveniently enough, there is an strncat variant. There are a few drawbacks to using the standard strncpy and strncat: strncpy does not NUL-terminate the result when it truncates (and zero-pads the entire destination when it does not), while the size parameter of strncat counts the space remaining in the buffer rather than its total size, which is easy to get wrong.
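
A minimal, self-contained sketch of those pitfalls:

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char buffer[8];

    /* strncpy does not NUL-terminate when it truncates, so the
       caller must remember to terminate the result by hand: */
    strncpy(buffer, "a string too long", sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';

    /* strncat's size counts the space remaining (less one for the
       NUL), not the total buffer size; the arithmetic is easy to
       get wrong: */
    strncat(buffer, " plus", sizeof(buffer) - strlen(buffer) - 1);

    printf("%s\n", buffer);
    return 0;
}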

Those drawbacks are mentioned in the paper strlcpy and strlcat - consistent, safe, string copy and concatenation by Miller and de Raadt (USENIX 99). But none of that was an original observation by the OpenBSD developers. Just from my own experience: while strcpy, etc., are used in the source code (usually correctly), dynamically allocated strings are the way to go.

According to the CVS history, those functions were added in 1998, with substantial changes going into 2001. Later (July 2003), after relicensing the replacements, the standard functions were modified to force the linker to warn about their use.

The same was done later when incorporating these functions (though on OpenBSD the wide-character functions are little used).

On the other hand, there are no analogously improved versions of scanf or sscanf, etc., and there is no warning about their use. The developers have been inconsistent.

I left ncurses out of the list. The paper mentioned ncurses, so some discussion is needed.

Todd Miller provided several fixes for ncurses, but none of those dealt with a buffer overflow. There was one report of a buffer overflow — on the bug-ncurses mailing list in October 2000, and on freebsd-security a week later (after I had made fixes on 2000/10/07). I addressed that by doing what I had done with my directory editor (see above). Incidentally, there was a different report earlier that year (mentioned in the FreeBSD discussion by the person who had reported it), against 1.8.6, which was then about 5 years out of date, having been released in October 1994.

The changes in OpenBSD in 2003 pointed out that ncurses still used strcpy and strcat, e.g., to copy to a newly allocated chunk of memory with the correct size.
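
That is a legitimate use which the linker warning lumps in with genuine misuse. A minimal sketch (the function name is hypothetical):

#include <stdlib.h>
#include <string.h>

char *
my_strdup(const char *source)
{
    char *result = malloc(strlen(source) + 1);  /* exact size, counting the NUL */
    if (result != 0)
        strcpy(result, source);                 /* cannot overflow */
    return result;
}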

For example, linking the ncurses C++ demo produces these messages:

linking demo
../lib/libncurses++.a(cursslk.o)(.text+0xde): In function `Soft_Label_Key_Set::Soft_Label_Key::operator=(char*)':
../c++/cursslk.cc:46: warning: strcpy() is almost always misused, please use strlcpy()
../lib/libncurses.so.6.0: warning: strcat() is almost always misused, please use strlcat()
../obj_s/demo.o(.text+0xbf): In function `TestApplication::init_labels(Soft_Label_Key_Set&) const':
./cursesf.h:64: warning: sprintf() is often misused, please use snprintf()

Like lynx, ncurses was then — and still is — part of OpenBSD base. There, Todd Miller globally substituted strlcpy, etc., in 2003 (in preparation for the change to warn about strcpy and friends). As of September 2016, OpenBSD CVS has ncurses 5.7 (released 2008/11/02).

Early in 2012 (after ncurses 5.9) I added a configure option to make it simpler and less error-prone for OpenBSD to update ncurses in the future. It is called “--enable-string-hacks”, and uses macros to switch between strcpy and strlcpy, etc. I did this to demonstrate that the checks either way were equivalent.
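
The idea can be sketched in a few lines (the macro name here is hypothetical; ncurses uses its own prefix):

#include <string.h>     /* strlcpy is declared here on OpenBSD */

#ifdef USE_STRING_HACKS
#define SAFE_STRCPY(dst, src, limit) strlcpy(dst, src, limit)
#else
#define SAFE_STRCPY(dst, src, limit) strcpy(dst, src)
#endif

When the hacks are enabled, strlcpy enforces the limit; otherwise plain strcpy is used. Either way, the limit parameter keeps the buffer size visible at each call site, which is what makes the two configurations comparable.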

The strlcpy and related functions are only as good as the buffer-size which is passed to them. In real programs, the buffer-size may be passed through several levels of function calls, or it may simply be assumed, via a symbol definition. Those cases would require a static analyzer to verify the calls.

On the other hand, there are cases which a simple checker should be able to catch. Consider this test program:

#include <stdio.h>
#include <string.h>
#include <bsd/string.h>
 
static int
copyit(char *argument)
{
    char buffer[20];
    strlcpy(buffer, argument, sizeof(buffer) + 40);    /* deliberate bug: overstates the buffer size */
    return (int) strlen(buffer);
}
 
int
main(int argc, char *argv[])
{
    int n;
    for (n = 0; n < argc; ++n) {
        copyit(argv[n]);
    }
    return 0;
}

Running the program with an argument of 20 characters or more will make it overflow the buffer. I checked for warnings about this program using these tools in October 2016.

Coverity of course is remote; the other two were on my Debian/testing system.

Oddly enough, none of them reported a problem with the test program. Just to check, I ran it with valgrind, and got the expected result: a core dump and a nice log:

==26109== Jump to the invalid address stated on the next line
==26109==    at 0x787878: ???
==26109==    by 0x3FFFF: ???
==26109==    by 0xFFF0005E7: ???
==26109==    by 0x20592F147: ???
==26109==    by 0x40061F: ??? (in /tmp/foo)
==26109==  Address 0x787878 is not stack'd, malloc'd or (recently) free'd
==26109== 
==26109== 
==26109== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==26109==  Bad permissions for mapped region at address 0x787878
==26109==    at 0x787878: ???
==26109==    by 0x3FFFF: ???
==26109==    by 0xFFF0005E7: ???
==26109==    by 0x20592F147: ???
==26109==    by 0x40061F: ??? (in /tmp/foo)

That is, the behavior of these functions is reasonably well-known, but none of the tool developers thought it important to verify the parameters for them.