Sort of handy
May 19, 2020 at 5:38 PM by Dr. Drang
This afternoon, I wanted to see how much disk space I had left on the server that runs this blog. As is often the case with Unix/Linux commands, after getting the needed information, I started thinking about other ways to do things. And, as is also often the case, I learned something new. New to me, anyway.
First, I logged into the server and ran the df command:
df -h .
The output, which summarizes the disk usage of the file system containing the given file (in this case the current directory, ., which was my home directory), was
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-label/DOROOT 25G 9.5G 14G 41% /
This shows I’m using 41% of the 25 GB I’m paying for. The -h option told df to use a “human” format for the output. Instead of showing the usage in “1-K blocks,” it shows it in kilobytes, megabytes, and gigabytes. Lots of GNU utilities have an -h option that works this way.
This disk usage includes everything on my virtual server: all the executables, libraries, and support files in addition to the files specific to the blog. I wanted to refine this to see just what the blog was using. That called for the du command:
du -hd 1 .
Once again, the -h option meant “human formatted values.” The -d 1 option told du to go only one directory level deep. The output was
8.0K ./.gnupg
68K ./pagelogs
114M ./all-this
9.0M ./.local
116K ./php-markdown
1.5M ./.cache
68K ./.ipython
20K ./.pip
8.0K ./.ssh
522M ./tmp
16K ./bin
8.0K ./.conda
1.1G ./public_html
4.0K ./.nano
3.4G ./anaconda3
5.1G .
The line for public_html was what I was looking for: 1.1 GB. So relatively little of the space on the server is being used for the blog.
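If the total for one directory is the only number of interest, du can report it directly with -s, which prints a single summary line instead of listing subdirectories. A minimal sketch follows; dir_total is a hypothetical helper name, not something from the commands above.

```shell
# dir_total is a hypothetical helper: print only the human-readable
# total size of one directory (defaults to the current directory).
dir_total() {
  # -s: one summary line for the argument; -h: human-readable units.
  # du separates size and path with a tab, so cut keeps just the size.
  du -sh "${1:-.}" | cut -f 1
}
```

Called as dir_total public_html from the home directory, it should report the same figure as the public_html line in the listing.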
As I looked at the du output, I thought it would be more useful to have it in numerical order. I thought about adding an -s option to du, but that doesn’t work. The du man page shows no option for sorting the output.
The standard Unix way of doing things would suggest piping the output to sort, but I was sure that wouldn’t work here. Although sort has an -n option for sorting numerically, the numbers in du’s human output weren’t what needed to be sorted. It’s the quantities I wanted sorted, and that means the suffixes had to be accounted for. A test with
du -hd 1 . | sort -n
gave me
1.1G ./public_html
1.5M ./.cache
3.4G ./anaconda3
4.0K ./.nano
5.1G .
8.0K ./.conda
8.0K ./.gnupg
8.0K ./.ssh
9.0M ./.local
16K ./bin
20K ./.pip
68K ./.ipython
68K ./pagelogs
114M ./all-this
116K ./php-markdown
522M ./tmp
which confirmed my suspicions. Perfect for sorting the numbers but useless for sorting the quantities.
I could use a different output switch for du:
du -kd 1 . | sort -n
The -k tells du to output the sizes in kilobytes. The sorted output is
4 ./.nano
8 ./.conda
8 ./.gnupg
8 ./.ssh
16 ./bin
20 ./.pip
68 ./.ipython
68 ./pagelogs
116 ./php-markdown
1448 ./.cache
9184 ./.local
115896 ./all-this
534104 ./tmp
1098316 ./public_html
3470796 ./anaconda3
5333936 .
which is great until the numbers get up past five or six digits and you lose track of the order of magnitude.
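One way to keep the numeric sort but get readable sizes back is to convert the numbers after sorting. The sketch below assumes a GNU coreutils recent enough to include numfmt; kb_to_human is a hypothetical helper name.

```shell
# kb_to_human is a hypothetical helper: rewrite the first field of each
# input line from a count of 1024-byte blocks into IEC units (K, M, G).
kb_to_human() {
  # --field=1       convert only the leading size column
  # --from-unit=1024 treat each input number as that many 1024-byte units
  # --to=iec        print with K/M/G suffixes in powers of 1024
  numfmt --field=1 --from-unit=1024 --to=iec
}
```

Used as du -kd 1 . | sort -n | kb_to_human, this should keep the ascending order shown above while printing sizes such as 116K and 1.1G in the first column.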
But here comes the part where I learn something. It turns out the GNU folks recognized the need to read human-formatted values as well as write them, and they added an -h option to sort. So
du -hd 1 . | sort -hr
gives
5.1G .
3.4G ./anaconda3
1.1G ./public_html
522M ./tmp
114M ./all-this
9.0M ./.local
1.5M ./.cache
116K ./php-markdown
68K ./pagelogs
68K ./.ipython
20K ./.pip
16K ./bin
8.0K ./.ssh
8.0K ./.gnupg
8.0K ./.conda
4.0K ./.nano
which is exactly what I wanted: easy to read and properly sorted. (The -r switch tells sort to reverse so the biggest directories come first.)
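For repeated use, the pipeline can be wrapped in a small shell function. This is only a sketch: dusort is a made-up name, and it assumes a du that accepts -d and a sort that accepts -h, as the GNU versions do.

```shell
# dusort is a hypothetical helper: show the size of a directory and its
# immediate subdirectories, largest first.
dusort() {
  # Default to the current directory when no argument is given.
  # du -hd 1 : human-readable sizes, one level deep
  # sort -hr : compare the human-readable sizes, reversed
  du -hd 1 "${1:-.}" | sort -hr
}
```

With no argument, dusort runs the same pipeline as above on the current directory; dusort public_html would break down the blog's directory the same way.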
The -h option was added to GNU sort in 2009, four or five years after I moved back to the Mac and stopped being as intense a command line user as I had been. I don’t feel too bad about not knowing of it.