More automated plots

Three years ago I wrote a post describing how I used Gnuplot to generate monthly Iraq (and now Afghanistan) casualty charts. With a newer version of Gnuplot and my recent eps2png script, the system is now more streamlined, and I thought it was worth a brief description. I’m not doing this because I think you’re interested in replicating the plots I do, but because the system can be generalized to any set of graphs that need to be created repetitively.

The goal is to generate plots that look like this:

[Plot: Afghanistan, May]

Actually, four plots are made: a full-sized (1000×700) and thumbnail (500×350) plot for both Afghanistan and Iraq. In my posts, as above, the thumbnail acts as a link to the larger plot.

I start with text files of the data. Here’s the tail end of acasualties.txt, a file with Afghanistan casualties, which I update every month with information from the icasualties.org site:

2010-11    53    58
2010-12    33    41
2011-01    25    32
2011-02    20    38
2011-03    31    39
2011-04    46    51
2011-05    35    59

The first item in each line is the date in YYYY-MM format. The second and third items are the US and coalition military deaths for that month. There’s a similar file, icasualties.txt, for Iraq.
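Since everything downstream trusts this three-column layout, a quick format check can catch a mistyped row before it reaches Gnuplot. What follows is only a sketch: the function name check_casualties and the temporary sample file are made up for illustration, and the rows are copied from the listing above.

```shell
# Hypothetical sanity check; sample rows are copied from the listing above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '2011-03    31    39' '2011-04    46    51' '2011-05    35    59' > "$sample"

# check_casualties FILE: print the number of lines that are not
# "YYYY-MM <integer> <integer>" (0 means the file looks clean).
check_casualties() {
  awk 'NF != 3 ||
       $1 !~ /^[0-9][0-9][0-9][0-9]-(0[1-9]|1[0-2])$/ ||
       $2 !~ /^[0-9]+$/ ||
       $3 !~ /^[0-9]+$/ { bad++ }
       END { print bad + 0 }' "$1"
}

bad_lines=$(check_casualties "$sample")
echo "malformed lines: $bad_lines"
```

Running it against the real acasualties.txt instead of the sample would be a one-line change.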

The Gnuplot script file that reads this data is called acasualties.gp.

 1:  # input format for dates
 2:  set timefmt "%Y-%m"
 3:  
 4:  # horizontal (time) axis layout
 5:  set xdata time
 6:  set format x "%b '%y"
 7:  set xtics "2001-10", 60*60*24*365.2425/4, "2011-12"\
 8:      rotate by 90 offset 0,-3.25                          # quarterly
 9:  set mxtics 3                                             # monthly
10:  
11:  # left vertical axis layout
12:  set ylabel "Military Deaths" offset 2,0
13:  set yrange [0:120]
14:  set ytics 20
15:  set mytics 4
16:  
17:  # overall layout
18:  set title "Afghanistan War Timeline" offset 0,-.5
19:  set grid lt 1 lc 9 lw .25
20:  set key at "2002-03",93 left width -3 height 1\
21:    samplen 1.5 spacing 1.25 box lw .5 lc 9
22:  set border lw .25 lc -1
23:  
24:  # Make labels for the totals
25:  ustot = `perl -e '$s=0;while(<>){($m,$a,$b)=split;$s+=$a}print$s;' acasualties.txt`
26:  tot = `perl -e '$s=0;while(<>){($m,$a,$b)=split;$s+=$b}print$s;' acasualties.txt`
27:  
28:  # Make label for data source
29:  set label 3 "Data source: http://icasualties.org" at '2002-02',30 left
30:  
31:  # choose output and plot it
32:  set terminal postscript eps color solid font "Helvetica,12"
33:  set output
34:  plot "acasualties.txt" using 1:2 title sprintf("US only (%d)",ustot)\
35:    with linespoints pt 5 lt 3 lw 4,\
36:    "acasualties.txt" using 1:3 title sprintf("Coalition (%d)",tot)\
37:    with points pt 7 lt 1

There’s a lot in this script. The trickiest parts, I think, are Lines 4-9, where the horizontal axis is laid out, and Lines 24-26, where the totals are calculated.

Line 5 tells Gnuplot that the x axis will be used for dates, and Line 6 specifies how the dates are to be printed. Gnuplot uses strftime-style format codes for dates (which you also see in Line 2 for reading the input), so Line 6 says the dates are to be displayed with a three-letter abbreviation for the month and two digits (preceded by an apostrophe) for the year. Hence: Apr '11. Lines 7-8 use the <start>, <increment>, <end> form of xtics to set the major tic marks (the ones that get labeled) three months apart [1], and Line 9 puts the monthly minor tic marks between them.
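The increment in the xtics line is just a number of seconds, because Gnuplot measures time axes in seconds. As a small worked check of that arithmetic (using awk purely as a calculator, which is my choice and not part of the original setup):

```shell
# A quarter of a mean Gregorian year (365.2425 days), expressed in seconds.
# This is the same expression that appears as the xtics increment.
quarter=$(awk 'BEGIN { printf "%.0f", 60*60*24*365.2425/4 }')
echo "quarterly increment: $quarter seconds"
```

That works out to 7889238 seconds, or a little over 91 days per major tic.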

Lines 25 and 26 use backticks to shell out to Perl one-liners that sum up the monthly numbers and store them in variables. These variables are later used in the plot command that starts on Line 34 to set the labels in the key.
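To see what those one-liners compute, here is a rough equivalent that swaps awk in for Perl and runs on the seven sample rows shown earlier instead of the full file. The temporary file is an assumption made for the demonstration, and the sums are therefore for the sample only, not the real totals.

```shell
# Sample rows copied from the tail of acasualties.txt shown earlier.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" << 'EOF'
2010-11    53    58
2010-12    33    41
2011-01    25    32
2011-02    20    38
2011-03    31    39
2011-04    46    51
2011-05    35    59
EOF

# Column 2 is US deaths, column 3 is coalition deaths.
ustot=$(awk '{ s += $2 } END { print s + 0 }' "$sample")
tot=$(awk '{ s += $3 } END { print s + 0 }' "$sample")
echo "US only ($ustot), Coalition ($tot)"
```

For these seven rows the sums are 243 and 318, formatted the same way the key labels are built with sprintf.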

Lines 32 and 33 tell Gnuplot to generate Encapsulated PostScript (EPS) output and send it to standard output. I have it set this way because, as I said in this earlier post, I don’t like Gnuplot’s PNG output and I’m going to use my eps2png script to turn the graph into a PNG via a pipeline.

There’s a similar script, called icasualties.gp, that does the same thing for Iraq data.

The script that sits on top and controls everything is cplot:

bash:
 1:  #!/bin/bash
 2:  
 3:  cd ~/Dropbox/war
 4:  
 5:  # Afghanistan plots
 6:  abn=ac-`tail -1 acasualties.txt | awk '{print $1}'`
 7:  afn=$abn.png
 8:  atn=$abn-t.png
 9:  
10:  echo "Making $afn..."
11:  gnuplot acasualties.gp | eps2png -r200 - $afn
12:  echo "Making $atn..."
13:  sips -Z 500 $afn --out $atn &>/dev/null
14:  
15:  # Iraq plots
16:  ibn=ic-`tail -1 icasualties.txt |  awk '{print $1}'`
17:  ifn=$ibn.png
18:  itn=$ibn-t.png
19:  
20:  echo "Making $ifn..."
21:  gnuplot icasualties.gp | eps2png -r200 - $ifn
22:  echo "Making $itn..."
23:  sips -Z 500 $ifn --out $itn &>/dev/null
24:  
25:  # FTP to the server and upload the files
26:  echo "Uploading..."
27:  ftp -V ftp://drdrang:notmypassword@leancrew.com << cmd
28:  cd public_html/all-this/images2011
29:  put $afn
30:  put $atn
31:  put $ifn
32:  put $itn
33:  quit
34:  cmd

Line 3 cds to the directory where I keep the data and Gnuplot script files.

Lines 5-8 set the file names for the Afghanistan plots. Line 6 gets the first item (awk '{print $1}') in the last line (tail -1) of the data file. That, along with an ac- prefix, forms the base of the full-sized and thumbnail file names created in Lines 7 and 8.
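The naming step can be tried in isolation. This sketch uses a throwaway two-line sample file (an assumption for the demo) in place of the real data file, and $(...) where the script uses backticks:

```shell
# Two sample rows stand in for the real acasualties.txt.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '2011-04    46    51' '2011-05    35    59' > "$sample"

# First field of the last line, with an ac- prefix, is the base name.
abn="ac-$(tail -n 1 "$sample" | awk '{ print $1 }')"
afn="$abn.png"      # full-sized plot
atn="$abn-t.png"    # thumbnail
echo "$afn $atn"
```

With May 2011 as the last row, the two names come out as ac-2011-05.png and ac-2011-05-t.png, so each month's run produces a fresh pair of files without any editing of the script.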

Lines 10-13 then make the Afghanistan plots. Line 11 runs the Gnuplot script we covered earlier and pipes the output to eps2png, which makes the PNG (at 200 pixels per inch) and writes it to the current directory under the name in $afn. Line 13 then makes the thumbnail via the built-in sips command.

Why don’t I rerun the gnuplot | eps2png pipeline with a different output resolution to make the thumbnail? Because I prefer the antialiasing sips does. I tried it both ways and this way just looks better to me. Clearly sips and Ghostscript, which is the engine behind eps2png, use different rendering algorithms.

Lines 15-23 do the same thing as Lines 5-13, but with the Iraq data.

Lines 25-34 upload the files to the server via FTP. They create a here document with the interactive commands that put the files into the proper directory on the server.
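The here document is worth a dry run, because its delimiter is unquoted and the shell therefore expands the file name variables before the commands are fed to the client. In this sketch cat stands in for ftp so nothing is actually uploaded. The function name ftp_commands and the hard-coded file names are assumptions for illustration.

```shell
# Example file names, as the naming lines would produce for May 2011.
afn=ac-2011-05.png
atn=ac-2011-05-t.png

# ftp_commands: print the command stream the here document would produce.
# cat replaces ftp here, so this only shows the expanded text.
ftp_commands() {
  cat << cmd
cd public_html/all-this/images2011
put $afn
put $atn
quit
cmd
}

ftp_commands
```

Quoting the delimiter (<< 'cmd') would turn off the expansion and send the literal text $afn, which is not what the upload needs.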

So now I have a single command that generates all four plots and puts them on the server, ready to be included in a post. This is why I use Gnuplot and why I wrote the eps2png and eps2pdf scripts. It’s virtually impossible to get this kind of efficiency with a GUI tool.


  1. I suspect there’s a better way to do this than calculating the number of seconds in a year and dividing it by four, but it works.