Pandas and the Electoral College
August 29, 2024 at 12:15 PM by Dr. Drang
A couple of weeks ago, I used the Pandas groupby function in some data analysis for work, so when I started writing my previous post on the Electoral College, groupby came immediately to mind when I realized I wanted to add this table to the post:
| Electors | States | Pop Pct | EC Pct |
|:--:|:--:|--:|--:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
I got the population and elector information from the Census Bureau and the National Archives, respectively, and put them in a CSV file named states.csv (which you can download). The header and first ten rows are
| State | Abbrev | Population | Electors |
|:--|:--:|--:|--:|
| Alabama | AL | 5108468 | 9 |
| Alaska | AK | 733406 | 3 |
| Arizona | AZ | 7431344 | 11 |
| Arkansas | AR | 3067732 | 6 |
| California | CA | 38965193 | 54 |
| Colorado | CO | 5877610 | 10 |
| Connecticut | CT | 3617176 | 7 |
| Delaware | DE | 1031890 | 3 |
| District of Columbia | DC | 678972 | 3 |
| Florida | FL | 22610726 | 30 |
(Because the District of Columbia has a spot in the Electoral College, I’m lumping it in with the 50 states and referring to all of them as “states” in the remainder of this post. That makes it easier for both you and me.)
Here’s the initial code I used to summarize the data:
python:
1: #!/usr/bin/env python
2:
3: import pandas as pd
4:
5: # Import the raw data
6: df = pd.read_csv('states.csv')
7:
8: # Generate new fields for percentages
9: popSum = df.Population.sum()
10: ecSum = df.Electors.sum()
11: df['PopPct'] = df.Population/popSum
12: df['ECPct'] = df.Electors/ecSum
13:
14: # Collect states with same numbers of electors
15: dfg = df.groupby(by='Electors')
16:
17: # Print out the summary table
18: print('State EC States Population Pop Pct Electors EC Pct')
19: for k, v in dfg:
20: print(f' {k:2d} {v.State.count():2d} {v.Population.sum():11,d}\
21: {v.PopPct.sum():6.2%} {v.Electors.sum():3d} {v.ECPct.sum():6.2%}')
The variable df contains the dataframe of all the states. It’s read in from the CSV in Line 6 and then extended in Lines 9–12. After Line 12, the first five rows of df are
| State | Abbrev | Population | Electors | PopPct | ECPct |
|:--|:--:|--:|--:|--:|--:|
| Alabama | AL | 5108468 | 9 | 0.015253 | 0.016729 |
| Alaska | AK | 733406 | 3 | 0.002190 | 0.005576 |
| Arizona | AZ | 7431344 | 11 | 0.022189 | 0.020446 |
| Arkansas | AR | 3067732 | 6 | 0.009160 | 0.011152 |
| California | CA | 38965193 | 54 | 0.116344 | 0.100372 |
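As a quick aside that isn’t in the original script, each of the new percentage columns should sum to one, since every entry is a state’s share of the same total. A minimal check, using the df built above:

python:
# Both sums should be 1.0, apart from floating point roundoff
print(df.PopPct.sum())
print(df.ECPct.sum())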
The summarizing is done by the groupby call in Line 15, which groups the states according to the number of electors they have. The resulting variable, dfg, is of type pandas.core.groupby.generic.DataFrameGroupBy, which works like a dictionary: the keys are the numbers of electors per state, and the values are dataframes of the subsets of states with the number of electors given by that key. Lines 19–21 loop through the dictionary and print out summary information for each subset of states. The output is this:
State EC States Population Pop Pct Electors EC Pct
3 7 5,379,033 1.61% 21 3.90%
4 7 10,196,485 3.04% 28 5.20%
5 2 4,092,750 1.22% 10 1.86%
6 6 18,766,882 5.60% 36 6.69%
7 2 7,671,000 2.29% 14 2.60%
8 3 13,333,261 3.98% 24 4.46%
9 2 10,482,023 3.13% 18 3.35%
10 5 29,902,889 8.93% 50 9.29%
11 4 28,421,431 8.49% 44 8.18%
12 1 7,812,880 2.33% 12 2.23%
13 1 8,715,698 2.60% 13 2.42%
14 1 9,290,841 2.77% 14 2.60%
15 1 10,037,261 3.00% 15 2.79%
16 2 21,864,718 6.53% 32 5.95%
17 1 11,785,935 3.52% 17 3.16%
19 2 25,511,372 7.62% 38 7.06%
28 1 19,571,216 5.84% 28 5.20%
30 1 22,610,726 6.75% 30 5.58%
40 1 30,503,301 9.11% 40 7.43%
54 1 38,965,193 11.63% 54 10.04%
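Because dfg behaves like a dictionary, you don’t have to loop over it to look at a particular group. As an aside (this isn’t in the original script), the GroupBy methods get_group and size will pull out one subset or count the states in each subset:

python:
# Sub-dataframe of the seven states with 3 electors
print(dfg.get_group(3))

# Series giving the number of states for each elector count
print(dfg.size())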
This is what I was looking at when I wrote the last few paragraphs of the post, but I didn’t want to present the data to you in this way—I wanted a nice table with only the necessary columns. I added a couple of print() statements to the code above for a little whitespace, and then added this code to create a MultiMarkdown table.
python:
25: # Print out the Markdown summary table
26: print('| Electors | States | Pop Pct | EC Pct |')
27: print('|:--:|:--:|--:|--:|')
28: for k, v in dfg:
29: print(f'| {k:2d} | {", ".join(v.Abbrev)} | {v.PopPct.sum():6.2%} \
30: | {v.ECPct.sum():6.2%} |')
This is very much like the previous output code. Apart from adding the pipe characters and the formatting line, there’s the
python:
{", ".join(v.Abbrev)}
piece in the f-string of Line 29. The join
part concatenates all the state abbreviations, putting a comma-space between them. I decided a list of states with the given number of electors would be better than just a count of how many such states there are.
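For example, in the row for the two states with five electors, v.Abbrev holds the abbreviations NE and NM, so the join produces the text that goes in that cell:

python:
", ".join(["NE", "NM"])   # 'NE, NM'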
The output from the Markdown-generating code was
| Electors | States | Pop Pct | EC Pct |
|:--:|:--:|--:|--:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
which I then ran through my table-normalizing filter in BBEdit to produce this nicer looking version:
| Electors | States | Pop Pct | EC Pct |
|:--------:|:--------------------------:|--------:|-------:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
This is the Markdown I added to the post.
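As an aside, the looping and manual pipe-wrangling aren’t strictly necessary. Pandas can do the aggregation in a single agg call, and if you have the tabulate package installed, to_markdown will emit a pipe table directly. Here’s a rough sketch along those lines (the column names are my own, and the percentage formatting comes out slightly differently than with the f-strings above):

python:
# Aggregate each elector group: join the abbreviations, sum the percentages
summary = dfg.agg(
    States=('Abbrev', ', '.join),
    PopPct=('PopPct', 'sum'),
    ECPct=('ECPct', 'sum'),
)

# Convert the fractions to percentages rounded to two places
summary[['PopPct', 'ECPct']] = (summary[['PopPct', 'ECPct']]*100).round(2)

# Requires the tabulate package
print(summary.to_markdown())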
Update 1 Sep 2024 7:11 AM
There’s a better version of the script here.