Last post on RSS subscriber counting
October 14, 2012 at 5:12 PM by Dr. Drang
I’ve made a few final (I think) changes to the RSS subscriber count script and figured I might as well post them here in case anyone is interested.
The script is based on Marco Arment’s original. The great bulk of the script is his; I made a couple of improvements to the counting pipelines, added the ability to query multiple feeds, and changed the output method from emailing the counts to appending them to a history file.
Here’s the script:
#!/bin/bash

# A modification of Marco Arment's script at
#
# https://gist.github.com/3783146
#
# It's intended to be run once a day as a cron job. The main
# differences between it and Marco's script are:
#
# 1. It checks two feeds instead of just one.
# 2. It combines the non-Google Reader counts into a single number.
# 3. It doesn't write anything to stdout or send email.
# 4. It adds a line to a history file with the date and counts.
#

# Required variables. Edit these for your server.
FEED_LIST="/all-this/feed/ /all-this/feed/atom/"
LOG_FILE="/path/to/apache/access/log/file"
HISTORY_FILE="subscribers.txt"

# Date expression for yesterday
DATE="-1 day"

# Date format in Apache log
LOG_FDATE=`date -d "$DATE" '+%d/%b/%Y'`

# Date format for history file.
HISTORY_FDATE=`date -d "$DATE" '+%Y-%m-%d'`

# Start the line with yesterday's date.
DAYLINE=$(printf "%s: " "$HISTORY_FDATE")

# Loop through the feeds, collecting subscriber counts and adding
# them to the line.
for RSS_URI in $FEED_LIST; do

  # Unique IPs requesting RSS, except those reporting "subscribers":
  IPSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | egrep -v '[0-9]+ subscribers' | cut -d' ' -f 1 | sort | uniq | wc -l`

  # Google Reader subscribers and other user-agents reporting "subscribers"
  # and using the "feed-id" parameter for uniqueness:
  GRSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | egrep -o '[0-9]+ subscribers; feed-id=[0-9]+' | sort -t= -k2 -s | tac | uniq -f2 | awk '{s+=$1} END {print s}'`

  # Other user-agents reporting "subscribers", for which we'll use the
  # entire user-agent string for uniqueness:
  OTHERSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI " | fgrep -v 'subscribers; feed-id=' | egrep '[0-9]+ subscribers' | egrep -o '"[^"]+"$' | tac | awk -F\( '!x[$1]++' | egrep -o '[0-9]+ subscribers' | awk '{s+=$1} END {print s}'`

  # Add the non-Google Reader subscribers, defaulting empty counts
  # to zero in case no matching log lines were found.
  NONGRSUBS=$(( ${IPSUBS:-0} + ${OTHERSUBS:-0} ))

  DAYLINE=$DAYLINE$(printf "%5d " ${GRSUBS:-0}; printf "%5d " $NONGRSUBS)
done

# Append yesterday's info to the history file.
echo "$DAYLINE" >> "$HISTORY_FILE"
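The deduplication pipelines are dense, so here's a small sketch of my own (the log fragments are invented, not from my server) showing two of them on fake data. The feed-id pipeline stable-sorts by feed-id, reverses with tac so each feed's last report of the day comes first, keeps one line per feed-id with uniq -f2 (which skips the first two fields when comparing), and sums the counts with awk:

```shell
# Unique-IP count: three requests from two addresses.
printf '%s\n' \
  '1.2.3.4 - - "GET /all-this/feed/ HTTP/1.1"' \
  '1.2.3.4 - - "GET /all-this/feed/ HTTP/1.1"' \
  '5.6.7.8 - - "GET /all-this/feed/ HTTP/1.1"' |
cut -d' ' -f1 | sort | uniq | wc -l
# prints 2

# Feed-id dedup: feed 111 reported 250 and later 255 subscribers;
# feed 222 reported 37. Only the last report per feed-id survives,
# so the total is 255 + 37.
printf '%s\n' \
  '250 subscribers; feed-id=111' \
  '37 subscribers; feed-id=222' \
  '255 subscribers; feed-id=111' |
sort -t= -k2 -s | tac | uniq -f2 | awk '{s+=$1} END {print s}'
# prints 292
```

The -s (stable) flag on sort matters: it preserves the log's chronological order within each feed-id, which is what lets tac put the day's final report first.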
The line that’s appended to the history file looks like this:
2012-10-13: 2783 521 27 21
After the date, the subscriber counts are in the order (Google Reader count, non-Google Reader count) for each feed in the list. The list of feed URLs is simply a string with the two feeds separated by a space. The “http://leancrew.com” prefix to the URLs isn’t included because it isn’t present in the Apache log file.
I decided to append the counts to a history file for two reasons:
- I was tired of getting a daily email with the counts.
- When I did want to look at the counts, I didn’t want just a snapshot of that day’s subscriber counts. I wanted to be able to see how they were changing.
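Since seeing the change is the whole point, a quick awk one-liner can reduce each history line to a single daily total. This is just a sketch of mine, run on two made-up history lines standing in for the real subscribers.txt:

```shell
# Sum each day's four subscriber counts into one number per line.
printf '%s\n' \
  '2012-10-12: 2775  518   27   20' \
  '2012-10-13: 2783  521   27   21' |
awk '{t=0; for (i=2; i<=NF; i++) t+=$i; print $1, t}'
# prints:
# 2012-10-12: 3340
# 2012-10-13: 3352
```

On the server, the same awk program would read subscribers.txt directly.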
The history file is kept on the server. If I want it on my local machine, a quick scp will copy it here.