17th November 2006

Posted in

Blog :: Code Smells III

This is the third and final installment of the Code Smells series. Visit the first article in the series to see what I mean by "Code Smells" and check out my recomendations in part II as well. I've been giving primarily examples of how NOT to do things, for more examples of how to do things right check out Paul Bissex's recommendations for quick but not dirty PHP. In fact I would like to expand on one of his recommendations: Make interpolation clean.

Paul refers to the fact that strings in PHP may contain variable references that are evaluated when the string is parsed. This isn't as insane as it may sound coming from more static languages: in PHP all variables are prefixed with a dollar sign, so it's easy to spot where interpolation is occurring.

$foo = "metapundit";
print "$foo is my name";

will very naturally yield "metapundit is my name". There are, of course, some provisos. First, only doubly quoted strings are interpolated.

print '$foo is my

name'; will print exactly that. The backslash character maintains its unix sense of escaping special data so "\$foo" is not interpolated. Additionally it can be harder to spot variable references once arrays and objects are introduced, so braces are used to provide clarification.

$x = "{$obj->foo} and {$array[1]}";

is a string with two interpolations. See, simple!

One common thought in PHP circles is that single quoted strings are preferable to double quoted strings (if not doing interpolation) on performance grounds - after all it must be a lot faster to parse a string if you're not looking for all those brackets and dollar signs. This turns out to be sorta not really kinda true. The arguments about parsing speed, however, miss the point. If you are serious about performance you will use an opcode cache (in which case parse performance is not as important), otherwise you don't care about performance. With that in mind - it's time to identify some bad code smells.

Premature Optimization may not be the root of all evils, but it certainly is responsible for it's share of them: how often have I seen (and even written) code like:

 $row['fname_' . $id] = "Joe"; 
 $row['lname_' . $id] = "Blow"; 
 $row['email_'. $id] = "joe@hotmail.com";

Can we all agree that would look a lot better as

 $row["fname_$id"] = "Joe"; 
 $row["lname_$id"] = "Blow"; 
 $row["email_$id"] = "joe@hotmail.com";

String interpolation exists, so use it! More problematically I have also written the following code:

$table = "<table id=\"navigation\">
     <tr><td><a href=\"javascript: pop('$page1.html');\">Contents</td></tr>
 </table>";

Doesn't that look ugly? Escaped quotes inside quotes and the problem of elements (say javascript) that must contain quoted strings... Fortunately PHP also supports the HEREDOC syntax (stolen from Perl, I believe) that lets you set an arbitrary "quote" delimiter like so:

 $table =<<<HTML <table id="navigation">
   <tr><td><a href="javascript: pop('$page1.html');">Contents</td></tr>
   </table>
 HTML;

this does look a little cleaner: enough so that one of my rules of thumb is that multi-line text fragments should be quoted HEREDOC style. Seeing more than one or two such snippets, however, is a bad code smell!

There are, as far as I'm concerned, only two different ways to handle the mixing of code and HTML. There is embedding code in HTML, and there is embedding HTML in code. Despite what some people say about separating code and html, there are in fact no other alternatives (I'll come back to this point in a moment.) Embedding code in HTML is fine: everybody can quickly see whats going on here

 <html>
 <head>
 <body>
 <title>My PHP Page</title>
 <div id="header"><?php print($header); ?></div>
 <div id="body"><?php print($body); ?></div>
 </body>
 </html>

and this is the style most PHP newbies start out with. Now it's valuable to get out of this one-php-file for every page style of development and all the many reasons for doing so are well known and need not be repeated by me. Unfortunately many PHP developers graduate to embedding HTML in their code: functions are hundreds of lines long with table definitions chunked here or there without ever leaving PHP mode, loops construct tds with colspans set willy nilly, variables are interpolated, strings are concatenated and no where is it actually possible to get a single clean separation of the view (what does the table look like, how many columns does it have) and the code that accumulates the data.

This is a bad code smell and I've got piles of code that stinks with it. Now it is possible to write useful software with this approach (look at this; well formatted, commented, and organized, yet still impossible to make much immediate sense of ) but it always diminishes the elegance and maintainability of your code.

What is the solution? Use a template! This is not a contradiction of my earlier position that there are only two styles possible: the simplest possible template is an include of a PHP file that simply prints out variables it expects to be defined. Even more advanced templates, however, (my favorite is Flexy right now) are simply embedding code of some kind (template elements like being nothing more than an easier to read version of ) in the HTML. The advantage of using templates is that it forces you to separate your view (HTML and the code that constructs it) from the code that processes input and fetches your data. This nearly always makes your view easier to understand and fix - and it always makes your model code more succint and understandable.

We've expanded Paul's point quite a bit. What does it mean to make interpolation clean? For me, at least, it means short textual fragments should use double quotes and variable interpolation with single quoted contained strings where necessary. Multi-line textual fragments should use HEREDOC syntax, but more than one or two substantial fragments of HTML embedded in your code is a bad code smell. A template should be used in this case, even if it is nothing more than a separate PHP-embedded-in-HTML style php file. Refactor some existing code and see: while using a template library introduces a few additional complications, using this style for anything more than trivial problems always turns out to be gain in productivity for me...

Posted on November 17th 2006, 11:01 AM