Copyright © 1996-2014,2015 by Thomas E. Dickey
diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.
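As a rough illustration of the idea of a per-file histogram, the following sketch tallies inserted and deleted lines in unified-diff text and prints one row. It is not diffstat's own code: the names `tally`, `count_changes` and `print_row` are hypothetical, only the unified format is handled (so there is no "modified" category), and every `+++`/`---` line is assumed to be a file header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: counts for one file. */
struct tally { int inserted; int deleted; };

/* Tally added and removed lines in unified-diff text.
 * Simplification: lines starting with "+++" or "---" are always treated
 * as file headers, so a changed line whose own content starts with "++"
 * or "--" would be skipped by this sketch. */
struct tally count_changes(const char *diff_text)
{
    struct tally t = { 0, 0 };
    const char *line = diff_text;

    assert(diff_text != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= 3 && (strncmp(line, "+++", 3) == 0 ||
                         strncmp(line, "---", 3) == 0)) {
            /* file header, not a change */
        } else if (line[0] == '+') {
            t.inserted++;
        } else if (line[0] == '-') {
            t.deleted++;
        }
        line = end ? end + 1 : line + len;
    }
    return t;
}

/* Print one row: name, total, then one '+' or '-' per changed line. */
void print_row(const char *name, struct tally t)
{
    int i;

    printf(" %s | %d ", name, t.inserted + t.deleted);
    for (i = 0; i < t.inserted; i++)
        putchar('+');
    for (i = 0; i < t.deleted; i++)
        putchar('-');
    putchar('\n');
}
```

A caller would pass the text of one file's diff to `count_changes` and hand the result to `print_row`; a real tool also has to scale the bars to the report width and cope with context diffs.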
I originally wrote this in 1992, along with an associated utility rcshist, to trace the change history of collections of files. Since then, I've found it most useful for summarizing source patches.
See the changelog for details:
Initially, I used diff and
diffstat in a script named
In 1994, I started using makepatch, which gave more consistent results.
It was not until early 1996 that others paid much attention to the tool. At that point, developers on both the XFree86 and ncurses mailing lists started using it.
One of those developers (Tony Nugent) pointed it out to Linus Torvalds in July 1996, on linux.dev.kernel. Much later (in 2002), it was documented as part of the process for submitting Linux kernel patches for BitKeeper (BK) in Linux 2.4.20. Linus commented on the process:
Ok, pulled. But _please_ do this the regular way next time. There's even a script to help you do it in linux/Documentation/BK-usage/bk-mak-sum, which does it all for you for BK patches.
(many people end up doing their own thing, you don't have to use that particular script, of course. But the important thing I want is that the _email_ should contain enough information to make a good first pass judgement on what the patch does, and in particular it is important for me to see what a "bk pull" will actually change.)
That's why the "diffstat" is important to me if I do a BK pull – and why I want to see the patches as plaintext if I apply stuff to generic files..
Later, in 2005, Linus wrote git, which has the ability to generate a diffstat. There are some enhancements (git is able to track moves and renames of files).
Of course, I did not write diffstat as an isolated program. Rather, it provides a useful summary of the output of diff. That same output is typically processed by the patch program to apply changes to programs. Early on, this was the predominant method for distributing changes to programs. That was for two reasons:
For both of these reasons, I still provide diff's for the larger programs (in addition to complete sources):
Besides being used in the usenet sources groups, Larry Wall's program was distributed as part of X11. The file sizes and dates indicate that there were ongoing improvements (data gleaned from the X distribution tarballs):
|X11R1||1987/09/12||199||312||28||1778||patch 1.3, copyright 1984 by Larry Wall|
|X11R2||1987/12/31||4169||1328||224||453||patch kit 2.0 (patch level 9), copyright 1986 by Larry Wall|
|X11R3||1988/08/31||1278||203||520||3961||patch kit 2.0 (patch level 12), copyright 1988 by Larry Wall|
|X11R6||1993/05/28||1396||39||361||4736||Wayne Davison added support for unified diff 1990/05/01|
|X11R6.1||1994/09/14||226||9||25||5924||Stephen Gildea added ifdef's for WIN32|
|X11R6.5.1||2000/08/21||0||0||20||6155||changed CVS identifier|
I distinguish contributors from authors based on a 20% threshold. By this rule, patch had two authors: Larry Wall and Wayne Davison.
introduced a misfeature. Briefly, it checks if a COLUMNS environment variable is set, and uses whatever atoi decodes to override the default of 80 columns for the report width. My advice was overruled (the bug report offers a disingenuous reason; see this for the context in which the remarks were made).
There is more than one reason why that is not a suitable change:
The change modifies existing behavior—silently.
The change is redundant (the "-w" option already provides the desired functionality). Assuming that COLUMNS were set reliably to a useful value, one could use diffstat -w$COLUMNS. I noted this in my initial response on the topic.
The change does no error-checking. If that variable happens to be set (even to an empty string) then it will use that value.
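To make the error-checking point concrete, here is a minimal sketch contrasting an unchecked atoi conversion with a validated one. Neither function is taken from the patch or from diffstat; `width_unchecked` and `width_checked` are hypothetical names, and falling back to the default on any invalid input is an assumption about what a stricter version might do.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Unchecked: any setting replaces the fallback, including "" (which
 * atoi decodes as 0) and "12abc" (which it decodes as 12). */
int width_unchecked(const char *value, int fallback)
{
    assert(fallback > 0);
    return (value != NULL) ? atoi(value) : fallback;
}

/* Checked (hypothetical alternative): the whole string must parse as a
 * positive number that fits in an int; otherwise keep the fallback. */
int width_checked(const char *value, int fallback)
{
    char *end;
    long n;

    assert(fallback > 0);
    if (value == NULL || *value == '\0')
        return fallback;
    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
        return fallback;
    return (int) n;
}
```

A caller would write something like `width_checked(getenv("COLUMNS"), 80)`. Validation only addresses this one objection; it does nothing about the silent change in behavior or the redundancy with the existing option.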
The patch hardcodes
STDOUT as a variable in
the main function, rather than using
environment variables are set by only a few applications: resize is one, and bash is another. A third does not come to mind (certainly not another shell).
The resize program's environment variables are generally discarded (not applied to the shell); it is useful for making system calls to tell the computer the actual size of the terminal window. On the other hand, bash does set the variables.
In some configurations (Debian), bash
sets a shell variable (which is not exported to
subprocesses). In others (apparently the case with OpenSuSE),
bash exports environment variables. This is
without the complication of scripts which do an
export of the shell variables.
Some programs use these variables. In ncurses for
instance, this is a standard legacy feature, useful for cases
where the operating system cannot provide the required
information (see use_env, compare with
use_tioctl). Otherwise it is a nuisance
because it interferes with programs that are able to obtain
the screensize without this crutch (xterm sets the variable only on a few
very old platforms for this reason). A few other programs
which do not use ncurses (such as ps as
noted in Novell #793536)
can be overridden by
This behavior of bash's has been seen as a nuisance, e.g., Novell #828877, along with these threads from bug-bash in 2013-01, 2013-07. After reading several reports, e.g., ArchLinux #32821, Debian #628638 (and blogs), a common thread emerges: bash has tied two behaviors together:
As long as the two behaviors (making bash work properly, and telling applications to use that information) are tied together, the feature is going to be a nuisance.
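For comparison, a program that can ask the terminal driver directly has no need of the environment variable. The sketch below assumes a system that provides the TIOCGWINSZ ioctl (common on Linux and the BSDs, but not part of POSIX); `terminal_columns` is a hypothetical helper, not a function from ncurses, xterm or diffstat.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Ask the terminal driver for the width of the terminal attached to fd.
 * Returns fallback if fd is not a terminal, the ioctl fails, or the
 * driver reports a width of zero. */
int terminal_columns(int fd, int fallback)
{
    struct winsize ws;

    assert(fallback > 0);
    if (isatty(fd) && ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0)
        return (int) ws.ws_col;
    return fallback;
}
```

Called as `terminal_columns(STDOUT_FILENO, 80)`, this returns the fallback whenever output is redirected to a file or pipe, which is the usual case when a report is being saved or mailed.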
Because there is no relevant standard (for the behavior of
shell programs), users with scripts which happen to set the
variable would be impacted. Debian for instance switched from
bash to dash years ago. The
latter does nothing with
COLUMNS, so that
scripts which work properly with dash would
behave differently on a machine where bash
The change was applied to the Debian package two years later, without discussion, immediately after a change of package maintainers (see Debian #588876). A user pointed out part of the problem with the change in Debian #697696, but made no headway with yet another maintainer.
I changed the copyright notice of diffstat to use MIT-X11 licensing at the beginning of 1998 (version 1.26). Before that, I had used the same wording as I did in other works distributed from 1994 onward, e.g., the resizeterm patch. This change was likely prompted by my work to relicense ncurses, but it also took into account an old (October 1996) discussion with Joey Hess.
The license is (of course) given in full as a comment at the top of the files which comprise the program. Notwithstanding this, some packagers find it inconvenient to cite the license properly. Here are a few examples:
A bug report for Haiku in 2010 commented that the packager had trouble finding the license, and referred to it as “DEC”, apparently unfamiliar with MIT-X11. A followup patch for the package script referred to it as the “diffstat” license.
Next, (going down the scale), there are instances where
the packager labels it “gpl-like” in the license
That is analogous to a pet-shop owner who puts a sign saying “dog-like” in front of a feline. Some people might object.
Possibly related, Mageia did this with diffstat 1.57 as of 2014. I notified them in February 2014, and received no response. Finally, in July 2015, a different developer reported that the problem was addressed with diffstat 1.60 (on updating this page July 26, 2015, the package page still shows 1.59 with “GPL-like”).
Version control systems which have implemented diffstat's include
Some are slower:
A few tools extend one or more of the version control systems, enabling their diffstat features to be used via the tool:
Besides imitating diffstat, there are embedded uses of the original tool: