New Python analysis book

Through some screwup at O’Reilly, I’m on a list of people who get offered free review copies of their new data analysis books. I can’t remember the last time I took them up on an offer—I’m not a “big data” kind of guy, and few of their books in that area are in my wheelhouse—but last week I jumped at the chance to get Python for Data Analysis by Wes McKinney.


McKinney is the developer of pandas, a Python library that’s intended to work efficiently with data sets, structures that are more heterogeneous than NumPy’s arrays but faster than Python’s standard lists and dictionaries. Its fundamental object is the DataFrame, a concept (and name) it stole from R.
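To make the heterogeneity point concrete, here is a minimal sketch, assuming pandas is installed. The column names and values are made up for illustration and are not taken from the book.

```python
import pandas as pd

# Hypothetical data: each column holds a different kind of value.
frame = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "visits": [3, 5, 8],
    "ratio": [0.25, 0.5, 0.75],
})

# Each column keeps its own dtype (text, integer, float).
print(frame.dtypes)

# Converting the whole frame to a single NumPy array forces one common
# dtype, which for this mix of types is the generic "object".
print(frame.to_numpy().dtype)

# Rows can be filtered by a condition on one column.
busy = frame[frame["visits"] > 4]
print(busy)
```

The per-column types are what make a DataFrame feel like a table rather than a matrix: the numeric columns stay numeric even though they sit next to a column of text.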

When I first heard of pandas, I thought “thanks, but no thanks.” I’m already trying to juggle NumPy, SciPy, and matplotlib; I don’t need another library to learn. But as I read more about it, I began to warm to the idea. I used R a little several years ago, and while I hated its syntax, I loved the flexibility of its data frames. If pandas can bring that flexibility to a language whose syntax I like, it’s worth the investment.

My copy of Python for Data Analysis arrived today, and I hope to have a decent review posted here in the next couple of weeks. Although pandas figures prominently in the book, as you would expect, there’s a lot of coverage of NumPy, SciPy, and matplotlib. A quick skim gave me hope that it may provide the decent matplotlib introduction that matplotlib’s own website lacks. There’s even the possibility that it’ll persuade me to use IPython.


Let me finish this post with a slight detour. I’m having a hard time deciding how to present libraries, commands, functions, directory paths, menus, etc. typographically. Shell commands and programming functions seem like a natural for a monospaced font, but I’m not so sure about the others. In this post, I used italics for pandas because that’s how it’s presented on its own website. But for other libraries, I’ve used a mixture of monospaced and regular fonts, depending on the name of the library and whether I thought the name was obviously being used as a proper noun. Capitalization helps, so I’ve always written Beautiful Soup in a regular font rather than a monospaced one, but math in a monospaced font rather than a regular one. And frankly, I’m sick of matplotlib’s twee refusal to capitalize itself, even when it’s at the beginning of a sentence.

Is there a good style guide for these sorts of things? I once had that Yahoo! style guide checked out from the library, but I don’t remember if it covered this.