Pandas and the Electoral College
August 29, 2024 at 12:15 PM by Dr. Drang
A couple of weeks ago, I used the Pandas groupby function in some data analysis for work, so when I started writing my previous post on the Electoral College, groupby came immediately to mind when I realized I wanted to add this table to the post:
| Electors | States | Pop Pct | EC Pct |
|:--:|:--:|--:|--:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
I got the population and elector information from the Census Bureau and the National Archives, respectively, and put them in a CSV file named states.csv (which you can download). The header and first ten rows are
| State | Abbrev | Population | Electors |
|:--|:--:|--:|--:|
| Alabama | AL | 5108468 | 9 |
| Alaska | AK | 733406 | 3 |
| Arizona | AZ | 7431344 | 11 |
| Arkansas | AR | 3067732 | 6 |
| California | CA | 38965193 | 54 |
| Colorado | CO | 5877610 | 10 |
| Connecticut | CT | 3617176 | 7 |
| Delaware | DE | 1031890 | 3 |
| District of Columbia | DC | 678972 | 3 |
| Florida | FL | 22610726 | 30 |
(Because the District of Columbia has a spot in the Electoral College, I’m lumping it in with the 50 states and referring to all of them as “states” in the remainder of this post. That makes it easier for both you and me.)
Here’s the initial code I used to summarize the data:
python:
1: #!/usr/bin/env python
2:
3: import pandas as pd
4:
5: # Import the raw data
6: df = pd.read_csv('states.csv')
7:
8: # Generate new fields for percentages
9: popSum = df.Population.sum()
10: ecSum = df.Electors.sum()
11: df['PopPct'] = df.Population/popSum
12: df['ECPct'] = df.Electors/ecSum
13:
14: # Collect states with same numbers of electors
15: dfg = df.groupby(by='Electors')
16:
17: # Print out the summary table
18: print('State EC States Population Pop Pct Electors EC Pct')
19: for k, v in dfg:
20: print(f' {k:2d} {v.State.count():2d} {v.Population.sum():11,d}\
21: {v.PopPct.sum():6.2%} {v.Electors.sum():3d} {v.ECPct.sum():6.2%}')
The variable df contains the dataframe of all the states. It’s read in from the CSV in Line 6 and then extended in Lines 9–12. After Line 12, the first five rows of df are
| State | Abbrev | Population | Electors | PopPct | ECPct |
|:--|:--:|--:|--:|--:|--:|
| Alabama | AL | 5108468 | 9 | 0.015253 | 0.016729 |
| Alaska | AK | 733406 | 3 | 0.002190 | 0.005576 |
| Arizona | AZ | 7431344 | 11 | 0.022189 | 0.020446 |
| Arkansas | AR | 3067732 | 6 | 0.009160 | 0.011152 |
| California | CA | 38965193 | 54 | 0.116344 | 0.100372 |
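As a quick aside that isn’t in the original script, each of the new percentage columns should sum to one, since every entry is a state’s share of the same total. A minimal check, using the df built above:

python:
# Both sums should be 1.0, apart from floating point roundoff
print(df.PopPct.sum())
print(df.ECPct.sum())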
The summarizing is done by the groupby call in Line 15, which groups the states according to the number of electors they have. The resulting variable, dfg, is of type pandas.core.groupby.generic.DataFrameGroupBy, which works like a dictionary: the keys are the numbers of electors per state, and the values are dataframes of the subsets of states with the number of electors given by that key. Lines 19–21 loop through the dictionary and print out summary information for each subset of states. The output is this:
State EC States Population Pop Pct Electors EC Pct
3 7 5,379,033 1.61% 21 3.90%
4 7 10,196,485 3.04% 28 5.20%
5 2 4,092,750 1.22% 10 1.86%
6 6 18,766,882 5.60% 36 6.69%
7 2 7,671,000 2.29% 14 2.60%
8 3 13,333,261 3.98% 24 4.46%
9 2 10,482,023 3.13% 18 3.35%
10 5 29,902,889 8.93% 50 9.29%
11 4 28,421,431 8.49% 44 8.18%
12 1 7,812,880 2.33% 12 2.23%
13 1 8,715,698 2.60% 13 2.42%
14 1 9,290,841 2.77% 14 2.60%
15 1 10,037,261 3.00% 15 2.79%
16 2 21,864,718 6.53% 32 5.95%
17 1 11,785,935 3.52% 17 3.16%
19 2 25,511,372 7.62% 38 7.06%
28 1 19,571,216 5.84% 28 5.20%
30 1 22,610,726 6.75% 30 5.58%
40 1 30,503,301 9.11% 40 7.43%
54 1 38,965,193 11.63% 54 10.04%
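Because dfg behaves like a dictionary, you don’t have to loop over it to look at a particular group. As an aside (this isn’t in the original script), the GroupBy methods get_group and size will pull out one subset or count the states in each subset:

python:
# Sub-dataframe of the seven states with 3 electors
print(dfg.get_group(3))

# Series giving the number of states for each elector count
print(dfg.size())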
This is what I was looking at when I wrote the last few paragraphs of the post, but I didn’t want to present the data to you in this way—I wanted a nice table with only the necessary columns. I added a couple of print() statements to the code above for a little whitespace, and then added this code to create a MultiMarkdown table.
python:
25: # Print out the Markdown summary table
26: print('| Electors | States | Pop Pct | EC Pct |')
27: print('|:--:|:--:|--:|--:|')
28: for k, v in dfg:
29: print(f'| {k:2d} | {", ".join(v.Abbrev)} | {v.PopPct.sum():6.2%} \
30: | {v.ECPct.sum():6.2%} |')
This is very much like the previous output code. Apart from adding the pipe characters and the formatting line, there’s the
python:
{", ".join(v.Abbrev)}
piece in the f-string of Line 29. The join
part concatenates all the state abbreviations, putting a comma-space between them. I decided a list of states with the given number of electors would be better than just a count of how many such states there are.
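For example, in the row for the two states with five electors, v.Abbrev holds the abbreviations NE and NM, so the join produces the text that goes in that cell:

python:
", ".join(["NE", "NM"])   # 'NE, NM'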
The output from the Markdown-generating code was
| Electors | States | Pop Pct | EC Pct |
|:--:|:--:|--:|--:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
which I then ran through my table-normalizing filter in BBEdit to produce this nicer looking version:
| Electors | States | Pop Pct | EC Pct |
|:--------:|:--------------------------:|--------:|-------:|
| 3 | AK, DE, DC, ND, SD, VT, WY | 1.61% | 3.90% |
| 4 | HI, ID, ME, MT, NH, RI, WV | 3.04% | 5.20% |
| 5 | NE, NM | 1.22% | 1.86% |
| 6 | AR, IA, KS, MS, NV, UT | 5.60% | 6.69% |
| 7 | CT, OK | 2.29% | 2.60% |
| 8 | KY, LA, OR | 3.98% | 4.46% |
| 9 | AL, SC | 3.13% | 3.35% |
| 10 | CO, MD, MN, MO, WI | 8.93% | 9.29% |
| 11 | AZ, IN, MA, TN | 8.49% | 8.18% |
| 12 | WA | 2.33% | 2.23% |
| 13 | VA | 2.60% | 2.42% |
| 14 | NJ | 2.77% | 2.60% |
| 15 | MI | 3.00% | 2.79% |
| 16 | GA, NC | 6.53% | 5.95% |
| 17 | OH | 3.52% | 3.16% |
| 19 | IL, PA | 7.62% | 7.06% |
| 28 | NY | 5.84% | 5.20% |
| 30 | FL | 6.75% | 5.58% |
| 40 | TX | 9.11% | 7.43% |
| 54 | CA | 11.63% | 10.04% |
This is the Markdown I added to the post.
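As an aside, the looping and manual pipe-wrangling aren’t strictly necessary. Pandas can do the aggregation in a single agg call, and if you have the tabulate package installed, to_markdown will emit a pipe table directly. Here’s a rough sketch along those lines (the column names are my own, and the percentage formatting comes out slightly differently than with the f-strings above):

python:
# Aggregate each elector group: join the abbreviations, sum the percentages
summary = dfg.agg(
    States=('Abbrev', ', '.join),
    PopPct=('PopPct', 'sum'),
    ECPct=('ECPct', 'sum'),
)

# Convert the fractions to percentages rounded to two places
summary[['PopPct', 'ECPct']] = (summary[['PopPct', 'ECPct']]*100).round(2)

# Requires the tabulate package
print(summary.to_markdown())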
Update 1 Sep 2024 7:11 AM
There’s a better version of the script here.