SQL placeholders and Python
March 24, 2025 at 11:06 AM by Dr. Drang
A few people wrote to me after yesterday’s post with gentle warnings about the potential danger of building an SQL query by concatenating strings, in particular when one or more of the strings comes from the user. Which is exactly how the shell scripts in that post worked. The “Bobby Tables” comic from XKCD was mentioned.
It’s not that the scripts in the post are dangerous—at least not now. They sit exclusively on my computer and are used only by me. But I mentioned in an update to the post that I intended to put the database on a server and access it from a web page. That’s what led to the warnings.
I didn’t really plan to use the same scripts in the online system, but I figured I might as well convert the scripts to a more secure form now while they’re fresh in my mind. So that’s what I did last night, using Python’s sqlite3
module—which I’ve used before—with placeholders to handle the user input. I don’t understand the behind-the-scenes workings of placeholders or what makes them secure. I assume they use the binding routines from the SQLite C interface, but that only reduces my ignorance one tiny step.
Anyway, here’s the new version of bytitle
:
1 #!/usr/bin/env python3
2
3 import sqlite3
4 import sys
5
6 # Search string from command line
7 params = [f'%{sys.argv[1]}%']
8
9 # Connect to database
10 con = sqlite3.connect('library.db')
11 cur = con.cursor()
12
13 # Query with placeholder
14 q = '''SELECT b.title, GROUP_CONCAT(a.name, '; '), b.loc
15 FROM book_author ba
16 JOIN author a ON a.id = ba.author_id
17 JOIN book b ON b.id = ba.book_id
18 WHERE b.title LIKE ? GROUP BY b.id ORDER BY b.loc;'''
19
20 # Execute the query
21 cur.execute(q, params)
22
23 # Output
24 found = [ '\n'.join(b) for b in cur.fetchall() ]
25 print('\n\n'.join(found))
26
27 # Disconnect
28 con.close()
Here’s the new version of byauthor
:
1 #!/usr/bin/env python3
2
3 import sqlite3
4 import sys
5
6 # Search string from command line
7 params = [f'%{sys.argv[1]}%']
8
9 # Connect to database
10 con = sqlite3.connect('library.db')
11 cur = con.cursor()
12
13 # Query with placeholder
14 q = '''SELECT b.title, GROUP_CONCAT(a.name, '; '), b.loc
15 FROM book_author ba
16 JOIN author a ON a.id = ba.author_id
17 JOIN book b ON b.id = ba.book_id
18 WHERE b.id IN (
19 SELECT ba.book_id
20 FROM book_author ba
21 JOIN author a ON a.id = ba.author_id
22 WHERE a.name LIKE ?
23 )
24 GROUP BY b.id ORDER BY b.loc;'''
25
26 # Execute the query with parameters
27 cur.execute(q, params)
28
29 # Output
30 found = [ '\n'.join(b) for b in cur.fetchall() ]
31 print('\n\n'.join(found))
32
33 # Disconnect
34 con.close()
And, finally, here’s the new version of bytitleauthor
:
1 #!/usr/bin/env python3
2
3 import sqlite3
4 import sys
5
6 # Search strings (title, author) from command line
7 params = [f'%{sys.argv[1]}%', f'%{sys.argv[2]}%']
8
9 # Connect to database
10 con = sqlite3.connect('library.db')
11 cur = con.cursor()
12
13 # Query with placeholder
14 q = '''SELECT b.title, GROUP_CONCAT(a.name, '; '), b.loc
15 FROM book_author ba
16 JOIN author a ON a.id = ba.author_id
17 JOIN book b ON b.id = ba.book_id
18 WHERE b.title LIKE ?
19 AND b.id IN (
20 SELECT ba.book_id
21 FROM book_author ba
22 JOIN author a ON a.id = ba.author_id
23 WHERE a.name LIKE ?
24 )
25 GROUP BY b.id ORDER BY b.loc;'''
26
27 # Execute the query
28 cur.execute(q, params)
29
30 # Output
31 found = [ '\n'.join(b) for b in cur.fetchall() ]
32 print('\n\n'.join(found))
33
34 # Disconnect
35 con.close()
As you can see, most of the code is a bunch of boilerplate that sets up the connection, runs the query, and prints out the results. The main difference between the scripts is the query itself. Most of what’s in the queries was covered yesterday, but these new queries have question marks where the title and name strings are supposed to go. Those are the placeholders that get filled with the search parameters when cur.execute
is run.
One thing that’s nice about doing this in Python is that cur.fetchall()
, which you see in the output section of each script, returns a straightforward Python data structure. It’s a list of lists, where the “outer” list is the books that meet the search criteria, and the “inner” list is the title, concatenated authors, and Library of Congress classification of each of those books. That makes the join
s easy to write to get the same output as yesterday’s shell scripts. So
byauthor gere
returns
Theory of elastic stability
Timoshenko, Stephen; Gere, James M.
QA931 .T54 1961
Mechanics of materials
Gere, James M.; Timoshenko, Stephen
TA405 .G44 1984
Matrix analysis of framed structures
Weaver, William; Gere, James M.
TA645 .W36 1980
just like yesterday’s script. The inner list items are separated by a single newline, and the outer list items are separated by two newlines.
I tested both the old shell scripts and the new Python scripts against some simple SQL injection input. The shell scripts happily ran the bad code, while the Python scripts did essentially nothing. For example, I renamed the old byauthor
shell script to byauthor-insecure
and ran
byauthor-insecure "' or 1=1 --"
The script spit out info on all of the books.
byauthor
Python script returned just a blank line. So that was encouraging.
To be clear, I think that whenever I move this system to a server, I’m going to put it behind some sort of authentication scheme. But there should be a second level of security.
-
No, I didn’t try any SQL injection with
DROP TABLES
. ↩