13th February 2007

Posted in

Blog :: Partial function application in PHP

I should just get this straight right off the bat: you can't really do much functional programing in PHP. Functions are not first class citizens and the equivalent of passing functions around is passing around strings or arrays and relying on convention. No really.

For those of you who don't know what I'm talking about, go read the wikipedia link - briefly, however, languages that treat functions as first class citizens allow a less imperative style of code. Describing all the benefits and drawbacks of this style is quite beyond my capabilities, but even absorbing a snippet or two of functional style yields some nice benefits in terms of reducing boilerplate code.

Loops, for example. Loops are good (as I've written before) except, of course, for when they're not. Consider the common case: I have an array of strings which I'd like to all be htmlentitie'd. (That sentence is deliberately awkward so that I don't give the game away to soon.) How would you propose doing so? The new PHP developer would of course use a foreach loop like so:

foreach($data as $k=>$v)
    $data[$k] = htmlentities($v);

Of course PHP5 allows the slightly cleaner reference syntax

foreach($data as &$v)
    $v = htmlentities($v);

but in any case the proper answer to the question is to loop over the data applying a function to each element. Right? Actually I gave away the answer there - I said earlier that my sentence describing the problem was awkward. How I started to write it was "What if I have an array of values to which I need to apply a function?"

This more general formulation reveals what I really want to be able to do: specify a function and have it applied to each of the elements of a list. This is a task that is common enough that any reasonable language should make it as simple as possible to accomplish.

In PHP the array_* functions is a confusing mess of buried gems. One underused gem is array_map(). This does exactly what we want - it takes a callback as its first argument and then accepts arrays of data that will be passed as arguments to the callback function. Our example problem then is reduced to the single line

$data = array_map("htmlentities", $data);

That's it. $data now contains the new array generated by applying the function htmlentities to each value in the original array. This, by the way, displays a couple of hallmarks of functional programming style: functions are used as data (in the sense that I am passing a variable "pointing" to a function) and functions do not have side effects. $data would be unchanged if I hadn't set it to the result (a new array) returned from htmlentities. This lack of side effects is a key benefit of functional programming: programs are easier to debug (the theory goes) if changes of state are not hidden and the units of the program (the functions) are truly independent of each other.

This allows for easier changes to implementation, easier debugging, and for developing more sophisticated abstractions by composition: combining the pieces you already have. Here, however, is where we hit the wall of PHP's broken design. Functions are not (callback psuedo-types or not) first class citizens. PHP does not support closures and there is no such thing as "anonymous" functions (obviously, since functions are only passed as strings referring to their names.) There is a built in "create_function" that given an argument list and function body returns a function name (hilariously starting with "sigma_") but I am unable to see how it is superior to

eval("function sigma_8987($a,$b){/*some function body here */}");

All this is to say I won't be discussing function composition in PHP any time soon. Sorry, no curry for you! Even the very basic array_map() construct we've already discussed breaks down quickly: what if you want to use htmlentities with the optional "ENT_QUOTES" option that converts both double quotes and single quotes to html entities. You can pass array_map additional parameters that will be used as input to the function, but the additional parameters must be arrays of the same length as the first array or they will be filled with nulls. Alternatively we could define a new function that will hold the constant for us and pass it to array_map. Lets look at what we're trying to do and compare the solutions:

//we want this
foreach($data as &$v)
    $v = htmlentities($v, ENT_QUOTES);

//we could try generating an array of ENT_QUOTES arguments
$arg = array_fill(0, count($data), ENT_QUOTES);
array_map('htmlentities', $data, $arg);

//Or we could just create a new function that holds the ENT_QUOTES constant
$func = create_function('$v','return htmlenties($v,ENT_QUOTES);');
array_map($func, $data);

Frankly, of the 3 solutions, the loop ends up looking the best. Generating the array of ENT_QUOTES constants increases the memory requirements of the script - if you have 10,000 pieces of data you now also have an array with 10,000 constants you're only going to use once. Blech. Using create_function is at least semantically closer to what we're trying to do, but as I noted before, this is just eval in sheep's clothing. Functions created by create_function are just dynamically named functions inserted into the global namespace - a create_function call in a loop or in a function that is called repeatedly is a huge memory leak - loop 100 times and you have 100 functions taking up memory with no way to reclaim it. There are other difficulties: since the body of the function is just a string (eval, remember) compile errors aren't caught till runtime in the generated functions...

Is all lost for anything more than a trivial application of array_map? The answer came to me as part of my Python exploration. Python 2.5 has a new module called functools that defines a concept called partial function application. The basic concept is that you pass a function and a list of parameters to partial and it returns a new function that represents the original function with the passed parameters already filled in on the right side of the parameter list. This sounds confusing I know, but it's simplicity in action:

//assuming some magic call "partial" exists in PHP
array_map(partial("htmlentities", ENC_QUOTES), $data);

That looks good, assuming that partial somehow surmounts the global namespace clobbering and memory leakage that create_function causes. Taking to heart Raganwald's advice to sharpen my saw (but stuck in Blub and not yet being smart enough to understand the Y! anyways) I decided to come up with a way to make my magic code work without create_function() or eval(). The answer is fairly obvious - you can't do partial function application as defined in PHP without serious drawbacks. Functions just aren't first class citizens.

Objects are, however, even in PHP4. And the much derided psuedo-type "callback" in PHP is defined as either a string containing a function name OR an array of the form array($obj, "methodname") (or you can use array("class_name","static_method")). And while using a class would add a single class to the global namespace, the objects it creates can be dynamically created and destroyed. Memory leak solved. The implementation turned out to be fairly simple:

function partial()
{
    if(!class_exists('partial'))
    {
        class partial{
            var $values = array();
            var $func;

            function partial($func, $args)
            {
                $this->values = $args;
                $this->func = $func;
            }

            function method()
            {
                $args = func_get_args();
                return call_user_func_array($this->func, array_merge($args, $this->values));
            }
        }
    }
    //assume $0 is funcname, $1-$x is partial values
    $args = func_get_args();   
    $func = $args[0];
    $p = new partial($func, array_slice($args,1));
    return array($p, 'method');
}

I can't decide if moving the class definition out of the function (and making it non-conditional) would be cleaner - at least this way it isn't created if the function is never called. Note that you do have to have both a function and a class, but the function is there just to wrap the generated object in a "callback" array for you.

So how does this puppy perform. Well, not too well. To test this out I created a benchmark page that applies htmlentities 10000 times to a list of data in 5 different ways. A few runs have no ENT_QUOTES option (merely to compare the performance of a simple array_map call to the traditional foreach method. There's no significant difference in speed.) The numbers of interest, however, indicate that using foreach with the ENT_QUOTES option takes about .17 seconds while using partial() to construct a function and array_map to apply it takes about .7 seconds. Interestingly, using create_function is a little faster at about .55 seconds. Looking at memory usage, however, tells the tale: partial and foreach both have negligible memory consumption at approximately 1/4 Mb, but calling create_function repeatedly makes memory shoot up to about 9Mb. You can see the results and the code that generated them for yourself.

So what's the end result? Is it worth the extra CPU to use partial in your code? For me it is - there are enough places where I've compromised by defining extra functions (or class methods) merely to wrap some built in functionality for array_map application that using partial will be a significant clean up of the existing code.

Techniques like this would be easier (and faster) if PHP could clean up a few of the edge cases around function handling. Some things (like "print") are technically language constructs rather than functions even though they look like functions. Additionally, none of the operators are valid functions for the purpose of callbacks. Now I'm not asking for operator overloading or anything crazy, but it would be nice if PHP6 would do some checking for these "non-function" functions and allow them as special cases for call_user_func (and by extension array_map, array_walk, etc). This would allow for interesting code like

$a = array(5, 45, 23, 17);
$b = array(3, 67, 83, 21);
//$c has array composed of addition of elements from $a and $b
$c = array_map("+", $a, $b);
//$d has the multiple of every element in $a by 2
$d = array_map(partial("*",2), $a);

Once you start thinking in these terms it becomes easy to move towards a more declarative (and less imperative) style of programming. My experience so far is that every little step in this direction helps my code to be both clearer and more succinct.

Posted on February 13th 2007, 05:38 PM