Copyright © 1996-2013,2014 by Thomas E. Dickey
diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.
I originally wrote this in 1992, along with an associated utility rcshist, to trace the change history of collections of files. Since then, I've found it most useful for summarizing source patches.
See the changelog for details:
Initially, I used diff and diffstat in a script named diff-patch. In 1994, I started using makepatch, which gave more consistent results.
It was not until early 1996 that the tool drew much attention from others. At that point, developers on both the XFree86 and ncurses mailing lists started using it.
One of those developers (Tony Nugent) pointed it out to Linus Torvalds in July 1996, on linux.dev.kernel. Much later (in 2002), it was documented as part of the process for submitting Linux kernel patches for BitKeeper (BK) in Linux 2.4.20. Linus commented on the process:
Ok, pulled. But _please_ do this the regular way next time. There's even a script to help you do it in linux/Documentation/BK-usage/bk-mak-sum, which does it all for you for BK patches.
(many people end up doing their own thing, you don't have to use that particular script, of course. But the important thing I want is that the _email_ should contain enough information to make a good first pass judgement on what the patch does, and in particular it is important for me to see what a "bk pull" will actually change.)
That's why the "diffstat" is important to me if I do a BK pull – and why I want to see the patches as plaintext if I apply stuff to generic files..
Later, in 2005, Linus wrote git, which has the ability to generate a diffstat. It adds some enhancements (git is able to track moves and renames of files).
introduced a misfeature. Briefly, it checks if a COLUMNS environment variable is set, and uses whatever atoi decodes from it to override the default of 80 columns for the report width. My advice was overruled (the bug report offers a disingenuous reason; see this for the context in which the remarks were made).
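The behavior described above can be illustrated with a minimal C sketch. This is not the patch itself: the function names and the DEFAULT_WIDTH macro are hypothetical, chosen only to contrast an unchecked atoi lookup with one that validates the value before using it.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_WIDTH 80    /* report width when nothing overrides it */

/* Unchecked lookup in the style described above (hypothetical name).
 * atoi() reports no errors: "" and "abc" both decode to 0. */
int width_unchecked(void)
{
    const char *s = getenv("COLUMNS");

    return (s != NULL) ? atoi(s) : DEFAULT_WIDTH;
}

/* Checked lookup (hypothetical name): keep the default unless the whole
 * string is a positive decimal number that fits in an int. */
int width_checked(void)
{
    const char *s = getenv("COLUMNS");
    char *end = NULL;
    long value;

    if (s == NULL || *s == '\0')
        return DEFAULT_WIDTH;

    errno = 0;
    value = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
        return DEFAULT_WIDTH;

    return (int) value;
}
```

With COLUMNS set to an empty string, the unchecked version yields a width of 0, while the checked version keeps the default of 80.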
There is more than one reason why that is not a suitable change:
The change modifies existing behavior—silently.
The change is redundant (the "-w" option already provides the desired functionality). Assuming that COLUMNS were set reliably to a useful value, one could use diffstat -w$COLUMNS. I noted this in my initial response on the topic.
The change does no error-checking. If that variable happens to be set (even to an empty string) then it will use that value.
The patch hardcodes STDOUT as a variable in the main function, rather than using
environment variables are set by only a few applications (resize being one, and bash being another). A third does not come to mind (certainly not another shell).
The resize program's environment variables are generally discarded (not applied to the shell); it is useful for making system calls to tell the computer the actual size of the terminal window. On the other hand, bash does set the variables.
In some configurations (Debian), bash sets a shell variable (which is not exported to subprocesses). In others (apparently the case with OpenSuSE), bash exports environment variables. This is without the complication of scripts which do an export of the shell variables.
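Whether a subprocess sees the value therefore depends on how the shell was configured. The C sketch below shows the two sources of information a program might consult: its own environment, and the terminal driver. Both helper names are hypothetical, and the TIOCGWINSZ request is assumed to be available (it is on Linux and the BSDs, but it is not universal).

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <termios.h>

/* Hypothetical helper: returns 1 if COLUMNS is present in this process's
 * environment (that is, some parent exported it), otherwise 0. A shell
 * variable that was never exported is invisible here. */
int columns_is_exported(void)
{
    return getenv("COLUMNS") != NULL;
}

/* Hypothetical helper: ask the terminal driver for the width of the
 * terminal open on fd. Returns the column count, or -1 if fd is not a
 * terminal or the driver does not know the size. */
int terminal_columns(int fd)
{
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) == -1 || ws.ws_col == 0)
        return -1;

    return (int) ws.ws_col;
}
```

A program that calls terminal_columns on its standard output gets the same answer whether or not the shell exported COLUMNS, and gets -1 when its output is redirected to a file or pipe.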
Some programs use these variables. In ncurses for instance, this is a standard legacy feature, useful for cases where the operating system cannot provide the required information (see use_env, compare with use_tioctl). Otherwise it is a nuisance because it interferes with programs that are able to obtain the screensize without this crutch (xterm sets the variable only on a few very old platforms for this reason). A few other programs which do not use ncurses (such as ps as noted in Novell #793536) can be overridden by
This behavior of bash's has been seen as a nuisance, e.g., Novell #828877, along with these threads from bug-bash in 2013-01, 2013-07. After reading several reports, e.g., ArchLinux #32821, Debian #628638 (and blogs), a common thread emerges: bash has tied two behaviors together:
As long as the two behaviors (making bash work properly, and telling applications to use that information) are tied together, the feature is going to be a nuisance.
Because there is no relevant standard (for the behavior of shell programs), users with scripts which happen to set the variable would be impacted. Debian for instance switched from bash to dash years ago. The latter does nothing with COLUMNS, so that scripts which work properly with dash would behave differently on a machine where bash
The change was applied to the Debian package two years later, without discussion, immediately after a change of package maintainers (see Debian #588876). A user pointed out part of the problem with the change in Debian #697696, but made no headway with yet another maintainer.