April 22, 2021 at 4:12 PM by Dr. Drang
This morning’s post by John D. Cook brought up an interesting problem: what’s a good way to check for differences between files when the files consist of just a few (or maybe just one) very long lines?
The problem is that diff, the standard Unix tool for finding differences, tells you which lines are different. This is great for source code files, which are written by human beings and in which the lines tend to be short, but not so much for machine-generated files like JSON, XML, and some HTML files, in which line breaks may be few and far between.
Cook’s solution for comparing long-lined files is to pass them through fold before sending them to diff:
diff <(fold -s -w 20 temp1.txt) <(fold -s -w 20 temp2.txt)
The -s option tells fold to break at space characters instead of in the middle of a word. The -w 20 option tells it to make the new lines no more than 20 characters long. Breaking the text into lines of only 20 characters is overkill, but it certainly is easy to see differences between lines when you have only 20 characters to scan through.
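To see what those two options do on their own, here is a minimal sketch that folds one long line (a made-up sample sentence, not Cook’s test file) and checks two things: no output line is wider than 20 characters, and, because fold -s breaks after a space rather than deleting it, stripping the inserted newlines should give back the original text.

```shell
# A sample long line (made up for this sketch) with no embedded newlines.
text='This sample line is deliberately long so that fold has several places where it can break it at a space.'

# -s: break after the last space that fits; -w 20: lines at most 20 columns wide.
folded=$(printf '%s\n' "$text" | fold -s -w 20)
printf '%s\n' "$folded"

# Check that every folded line fits within 20 characters.
printf '%s\n' "$folded" | awk 'length($0) > 20 { bad = 1 } END { exit bad + 0 }' &&
  echo 'every line is 20 characters or fewer'

# fold only inserts newlines (the spaces stay at the ends of the lines),
# so deleting the newlines gives back the original text.
[ "$(printf '%s' "$folded" | tr -d '\n')" = "$text" ] && echo 'round trip OK'
```

That round-trip property is what makes the folded versions safe to diff: nothing in the text is lost, only rearranged onto more lines.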
The <() thing is a bit of clever shell scripting known as process substitution. It lets a command’s output stand in for a filename, which is handy when a command like diff needs more than one input and a single pipe can supply only one.
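Process substitution is a feature of bash, zsh, and ksh, not of plain POSIX sh. Each <(command) is replaced by a filename (often something like /dev/fd/63) from which the command’s output can be read. Here is a rough sketch of the same comparison for a shell without it, using temporary files instead; the sample contents are made up, and mktemp is assumed to be available (it isn’t in POSIX, but Linux and macOS both have it).

```shell
# Two one-line sample files that differ in a single word (text made up for this sketch).
printf '%s\n' 'whenever my hypos get such an upper hand of me that I want to knock hats off in the street' > temp1.txt
printf '%s\n' 'whenever my typoes get such an upper hand of me that I want to knock hats off in the street' > temp2.txt

# POSIX sh has no <(...), so write each folded version to a temporary file
# and remove both files when the script exits.
f1=$(mktemp) || exit 1
f2=$(mktemp) || exit 1
trap 'rm -f "$f1" "$f2"' EXIT

fold -s -w 20 temp1.txt > "$f1"
fold -s -w 20 temp2.txt > "$f2"

# diff exits 0 if the files match, 1 if they differ, and >1 on trouble.
status=0
out=$(diff "$f1" "$f2") || status=$?
printf '%s\n' "$out"
```

With these inputs, only the first folded line differs, so diff should report just that short line rather than the whole sentence.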
I was unfamiliar with fold until today. Whenever I’ve needed to reformat a text file to a given line length, I’ve used fmt. What I like about fmt is that it defaults to breaking lines at spaces; there’s no need for the equivalent of the -s option. So I’d do
diff <(fmt temp1.txt) <(fmt temp2.txt)
if I were OK with fmt’s default line length of 75 characters, or
diff <(fmt -20 temp1.txt) <(fmt -20 temp2.txt)
if I wanted to use Cook’s extremely short lines.
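One difference between the two tools that may matter for some files: fold only splits lines, while fmt refills paragraphs, so it also joins short adjacent lines. Here is a small sketch showing that, followed by a hypothetical bash wrapper around the fmt version of the comparison; the name longdiff and its optional width argument are made up for illustration, not an existing command.

```shell
# fold only splits lines; fmt also joins short lines within a paragraph.
printf 'one\ntwo\n' | fold -s -w 20   # prints two lines: "one" and "two"
printf 'one\ntwo\n' | fmt -w 20       # prints one line: "one two"

# Hypothetical helper (the name longdiff is made up): compare two
# long-lined files after refilling them with fmt. Needs bash or zsh
# for the <(...) process substitution.
# Usage: longdiff file1 file2 [width]   (width defaults to 20, as in Cook's example)
longdiff() {
  diff <(fmt -w "${3:-20}" "$1") <(fmt -w "${3:-20}" "$2")
}
```

Called as longdiff temp1.txt temp2.txt, it runs the same comparison as the fmt -20 command above, and it passes diff’s exit status straight through.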
But I’d be even more likely to open both files in BBEdit and use its built-in file-comparison command. That would give me this nice two-pane output:
In this example, there’s only one line, but it’s broken into easily digestible pieces by BBEdit’s soft wrapping, and the exact spot at which the difference occurs is highlighted by the darker purple background.[1]
There is a diffing app called Kaleidoscope that I’ve heard good things about, but I’ve never felt hampered by BBEdit.
[1] In his example text, Cook punned by changing Melville’s “hypos” to “typoes.” Not to be outpunned, I changed it to “typees.”