Last post on RSS subscriber counting
October 14, 2012 at 5:12 PM by Dr. Drang
I’ve made a few final (I think) changes to the RSS subscriber count script and figured I might as well post them here in case anyone is interested.
The script is based on Marco Arment’s original. The great bulk of the script is his; I made a couple of improvements to the counting pipelines, added the ability to query multiple feeds, and changed the output method from emailing the counts to appending them to a history file.
Here’s the script:
bash:
#!/bin/bash

# A modification of Marco Arment's script at
#
# https://gist.github.com/3783146
#
# It's intended to be run once a day as a cron job. The main
# differences between it and Marco's script are:
#
# 1. It checks two feeds instead of just one.
# 2. It combines the non-Google Reader counts into a single number.
# 3. It doesn't write anything to stdout or send email.
# 4. It adds a line to a history file with the date and counts.
#

# Required variables. Edit these for your server.
FEED_LIST="/all-this/feed/ /all-this/feed/atom/"
LOG_FILE="/path/to/apache/access/log/file"
HISTORY_FILE="subscribers.txt"

# Date expression for yesterday
DATE="-1 day"

# Date format in Apache log
LOG_FDATE=`date -d "$DATE" '+%d/%b/%Y'`

# Human-readable date format (left over from Marco's emailing script; unused here)
HUMAN_FDATE=`date -d "$DATE" '+%F'`

# Date format for history file.
HISTORY_FDATE=`date -d "$DATE" '+%Y-%m-%d'`

# Start the line with yesterday's date.
DAYLINE=$(printf "%s: " $HISTORY_FDATE)

# Loop through the feeds, collecting subscriber counts and adding
# them to the line.
for RSS_URI in $FEED_LIST; do

# Unique IPs requesting RSS, except those reporting "subscribers":
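# (cut pulls the client IP from each matching log line; sort | uniq | wc -l
# counts the distinct addresses.)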
IPSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | egrep -v '[0-9]+ subscribers' | cut -d' ' -f 1 | sort | uniq | wc -l`

# Google Reader subscribers and other user-agents reporting "subscribers"
# and using the "feed-id" parameter for uniqueness:
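# (sort groups the lines by feed-id; tac and uniq -f2 then keep only the
# latest count reported for each feed-id, and awk adds those counts up.)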
GRSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | egrep -o '[0-9]+ subscribers; feed-id=[0-9]+' | sort -t= -k2 -s | tac | uniq -f2 | awk '{s+=$1} END {print s}'`

# Other user-agents reporting "subscribers", for which we'll use the
# entire user-agent string for uniqueness:
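# (tac and the awk filter keep just the latest line for each distinct
# user-agent, using the part of the string before the first parenthesis
# as the key.)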
OTHERSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | fgrep -v 'subscribers; feed-id=' | egrep '[0-9]+ subscribers' | egrep -o '"[^"]+"$' | tac | awk -F\( '!x[$1]++' | egrep -o '[0-9]+ subscribers' | awk '{s+=$1} END {print s}'`

# Add the non-Google Reader subscribers.
NONGRSUBS=$(($IPSUBS + $OTHERSUBS))

DAYLINE=$DAYLINE$(printf "%5d " $GRSUBS; printf "%5d " $NONGRSUBS)
done

# Append yesterday's info to the history file.
echo "$DAYLINE" >> $HISTORY_FILE
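Since the script is meant to run from cron, the crontab entry is nothing more than a daily schedule and the path to the script. Something along these lines would do it; the time and the script name are placeholders, not necessarily what I use:

bash:
# Run the counting script a few minutes after midnight every day.
# The path and filename here are placeholders.
15 0 * * * /path/to/subscriber-count.sh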
The line that’s appended to the history file looks like this:
2012-10-13: 2783 521 27 21
After the date, the subscriber counts are in the order (Google Reader count, non-Google Reader count) for each feed in the list. The list of feed URLs is simply a string with the two feeds separated by a space. The “http://leancrew.com” prefix to the URLs isn’t included because it isn’t present in the Apache log file.
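As an illustration of the layout (this isn't part of the script, just an example), an awk one-liner like this pulls out the date and the first feed's Google Reader count for the most recent days:

bash:
# Print the date and the Google Reader count for the first feed
# (fields 1 and 2 of each line in the history file).
awk '{print $1, $2}' subscribers.txt | tail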
I decided to append the counts to a history file for two reasons:
- I was tired of getting a daily email with the counts.
- When I did want to look at the counts, I didn't want just a single day's snapshot; I wanted to be able to see how they were changing over time.
The history file is kept on the server. If I want it on my local machine, a quick scp will copy it here.
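The copy is something along these lines; the user name, host, and remote path are placeholders rather than the real ones:

bash:
# Pull the history file from the server into the current directory.
# User, host, and remote path are placeholders.
scp user@example.com:/path/to/subscribers.txt .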