24th October 2006

Posted in

Blog :: Code Smells II

I previously explained at length what I mean by code smells and gave an example or two of what how I'm thinking. This (and any subsequent posts in the series) will be more limited in scope - a single bad example and a corresponding solution.

I know I've sung the praises of parameterised queries - manually concatenating strings into SQL statements is not only a bad code smell from the aspect of maintainability, its a bad idea from a security perspective as well. So

$sql = "update foo set fname='$fname', lname='$lname', email='$email' where id=$id";
mysql_query($sql);

is bad because you need to call addslashes (or hopefully your db specific escaping function like mysql_real_escape_string()) on each of the variables you are interpolating into the string. A better style is to use a database abstraction library like the PEAR DB or MDB2 packages:

$db->query('update foo set fname=?, lname=?, email=? where id=?', array($fname, $lname, $email, $id));

This is better because the database driver takes care of calling the appropriate escaping function for you and handles quoting as well. Think about this sort of statement, however, from a maintainability viewpoint and you'll quickly see that there are some gotchas. It is readily apparent in this example, for instance, which item in the array goes with which placeholder question mark. But what if you have 20 fields to update?

Suddenly just making sure that you have the same number of question marks in your sql string as values in your array requires counting (not to mention making sure they're in the right order...) Using hand written sql queries of any complexity is a bad code smell.

Obviously the thing to do would be to write a function that generates your sql statements for you - and so I recently started to do so. That's when I noticed (for users of the PEAR DB class at least) the job is already done*. The autoExecute function takes the name of the table, the array of field/value pairs and optionally a constant and "where" clause to automatically build and execute insert and update statements for you. Now, the code from our previous example would be

  $db->autoExecute('foo', array('fname'=>$fname, 'lname'=>$lname, 'email'=>$email), DB_AUTOQUERY_UPDATE, 'id='. (int) $id);

All our variables are autoescaped and now we don't have to worry about what field in our sql corresponds to what positional value in our array - keys are simply matched to values. Removing the last two arguments (the constant and the where clause) would generate and execute an insert statement instead.

I've long had mixed feelings about ORM packages designed to insulate you from the relational databases. Fundamentally understanding relational ideas (and having a solid grasp on SQL) is really a necessary skill as far as I am concerned. Hand coding sql statements that are long but not complex, however, is a false rigor - updates and inserts tend to very simple logically but very repetitive in structure. This is exactly the sort of code that needs a higher level abstraction. While generated queries are not quite at the level of ORM, they do allow me to pursue the ever elusive goal of writing less code.

*It does occur to me that an API that makes you remember the completely unnecessary constant is a bad code smell in its own right. I'd rather have an autoInsert() and autoUpdate() function that take 2 and 3 arguments respectively. Or, why not dispense with the constant, but keep the same function name and decide based on whether the third argument (the where clause) is set whether you are doing an update or insert statement? Oh well, still better than manually creating them...

Posted on October 24th 2006, 10:21 PM