New PHP4 and PHP5 Releases

In the last two days there have been new releases of PHP4 (4.3.9) and PHP5 (5.0.2). When the first non-beta release of PHP5 came out I started trying it out. There are definitely some cool new features that were sorely needed in PHP, especially in the area of OO code. I’m still a bit bitter that they didn’t do multiple inheritance; I would have been happy with something similar to the way Perl does it (left to right, depth first).

After trying out a couple of different applications that we run at work I decided to stick with PHP4 for production work for the time being. As much as I’d like to make use of PHP5 features, I just don’t see a major move to PHP5 for at least another 12 to 18 months. I must admit one surprise here: seeing apps that are already being written exclusively for PHP5. That must make for a relatively small target audience compared to the entire PHP community.

Another use for a blog: making predictions about the future so that the whole world can look back and see how wrong you were.

Programming In PHP

One of the great things about PHP is its flexibility and options. One of the bad things about PHP is its flexibility and options. The result is that programmers can’t be 100% sure what the conditions are (think php.ini options) that their software will be running under. So here are some tips that every PHP programmer should keep in mind, specifically while in development mode:

  1. Magic Quotes: Ideally set this via .htaccess with php_flag (the directive meant for on/off settings), either on or off, whichever way you want to work with your data. My personal preference of late is to turn this off and make sure that my code escapes everything. This avoids trying to guess when you need to escape something and when you don’t. This can be tricky when you have systems (databases usually) that have different formats for escaping quotes. Look at the magic_quotes_gpc and magic_quotes_runtime settings.
  2. Error Level: There are multiple error levels in PHP. When you are in development mode please, please, please set this to E_ALL (or even to E_ALL | E_STRICT if you are using PHP 5, where E_STRICT is not part of E_ALL). This can be done from within your code using the error_reporting() function. Once you are ready to make a release of your software you can turn the error level down if you want.
  3. Display Errors: Along with cranking up the errors, displaying them makes them pretty obvious during the development process. This is also easily turned on using the ini_set() function. This is all it takes: ini_set('display_errors', '1');. This one isn’t quite as big a deal as the others because there is nothing wrong with watching errors in a log file, but I’ve found that if the errors aren’t being displayed in the page they are often forgotten about. And again, this is something that can be turned off when you make a release of your software. But if you’ve fixed all of the bugs that display errors or warnings then you shouldn’t need to :-)

So when an error pops up in your page, don’t just turn off display errors or lower the error level; fix it. The settings I recommended above are the default for PHP on my systems, so if you have warnings and errors in your code they end up all over the page when I run your software. This makes your software look pretty low in the quality department. Keep in mind that as the author of a software package, you don’t know what people will have these options set to on their server, so your software should force them to a known value that you can rely upon.
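As a rough sketch of forcing these settings to known values, something like the following could sit at the top of a script during development. The helper name undo_magic_quotes() is made up for this example, and the function_exists() guard is there on the assumption that the same code may also run on PHP versions that no longer ship get_magic_quotes_gpc().

```php
<?php
// Report everything during development. On PHP 5 releases where E_STRICT
// is not part of E_ALL, use E_ALL | E_STRICT here instead.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: recursively remove the slashes that magic quotes added.
function undo_magic_quotes($value) {
    if (is_array($value)) {
        return array_map('undo_magic_quotes', $value);
    }
    return is_string($value) ? stripslashes($value) : $value;
}

// magic_quotes_gpc cannot be changed with ini_set(), so detect it and undo
// its effect; from here on the code can assume raw, unescaped input.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $_GET    = undo_magic_quotes($_GET);
    $_POST   = undo_magic_quotes($_POST);
    $_COOKIE = undo_magic_quotes($_COOKIE);
}
```

With this in place the rest of the code escapes data itself, using whatever escaping the target database expects.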

I’ve been putting off this little rant for quite some time, but lately the number of software packages that I’ve tried out that ignore one or all of the above suggestions has been really high. In some cases large security holes were opened because of these assumptions (think magic quotes and SQL injection attacks). Following these suggestions is much easier if you implement them at the beginning of a project instead of after version 1.0 has already shipped.

Exporting A CVS Module

Here is another one of those things that I do often enough to know that there is a way to do it, but infrequently enough to forget exactly how to do it. This time it was remembering how to get a cvs module without all of those CVS directories that are used in working copies of the module. The command I was looking for is ‘cvs export’, instead of ‘cvs checkout’. The export command also requires a date or tag to be specified so that the export can be reproduced. If you aren’t worried about that and just want an export as of right now you can use NOW as the date. This would look something like this:

cvs -d /home/cvs export -DNOW mymodule

Answer To Peter Bowyer's Question

Peter Bowyer asked a question on his blog about optional case statements in PHP. After a couple of minutes I came up with two possible solutions. I tried to post the solutions as a comment to the blog entry, but it kept asking for a login in order to post a comment. I don’t have an account at his blog and I couldn’t find any info on how to create one. So I’m using this post to show a solution to his question. Hopefully the TrackBack to his entry works so he’ll be able to find this. Follow the link above to read the original question, below are the two possible solutions I came up with:

Solution 1)

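$foo = 'baz';
// Switch on true so each case is a complete boolean test. With
// switch($foo), a case like 'moo' && !empty($varisset) collapses to a
// boolean and would loosely match any non-empty $foo.
switch(true){
        case $foo == 'bar':
            print 'BAA';
            break;
        case $foo == 'moo' && !empty($varisset):
            print 'ANIMAL!';
            break;
        case $foo == 'brrrrr' && !empty($varisset):
            print 'COLD?';
            break;
} // switch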

Solution 2)

$foo = 'moo';
switch($foo){
        case 'bar':
            print 'BAA';
            break;
        case (!empty($varisset)):
            switch($foo) {
                case 'moo':
                    print 'ANIMAL!';
                    break;
                case 'brrrrr':
                    print 'COLD?';
                    break;
            } //switch
            break;
} // switch

I’ve only done some basic testing to see if these work. I haven’t done any sort of benchmarking to see if one is faster than the other. I’d guess that solution 2 would be faster, but I don’t have any data to back that up.
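For anyone who wants to test the behaviour a bit more, here is a minimal sketch that wraps the optional-case idea in a function so it can be called with different inputs. The function name optional_case() is made up for this example. It switches on true so that every case is a complete boolean expression, which sidesteps PHP's loose comparison between a string and a boolean.

```php
<?php
// Hypothetical helper for experimenting with optional cases.
// The 'moo' and 'brrrrr' cases only apply when $varisset is non-empty.
function optional_case($foo, $varisset = null) {
    switch (true) {
        case $foo === 'bar':
            return 'BAA';
        case $foo === 'moo' && !empty($varisset):
            return 'ANIMAL!';
        case $foo === 'brrrrr' && !empty($varisset):
            return 'COLD?';
        default:
            return '';
    }
}

print optional_case('moo', 1);
```

Calling it with an unrelated value such as 'baz' falls through to the default, whether or not the second argument is set.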

Research Systems Unix Group

I stumbled upon the Research Systems Unix Group for the University of Michigan today. I came across it because I was looking at beepagePHP, which is a modified version of RSUG’s beepage. In the past I’ve used QuickPage (QPage) for paging services, but if beepage turns out to be better then I’ll gladly change. Then I looked at nefu, a network service monitoring package. Nefu looks interesting enough that I may try it out.

Zend's PHP Certification

It was likely to happen eventually: Zend has announced a PHP certification. Despite having just announced the release of PHP 5, the certification will only focus on PHP 4. The thinking behind that seems to be that it will still be a while before PHP 5 becomes widespread. I’m not sure yet how I feel about this idea. In general I have really mixed feelings about certification because some people are able to get certified who don’t really “know” anything about what they got certified in. On the other hand not everyone who has a certification falls into that category.

For some additional info on Zend’s PHP certification you can read comments by Chris Shiflett, John Coggeshall and Marco Tabini.

On a related note, Zend has secured an additional $8 million investment. They now have a new U.S. headquarters in Cupertino, California. I wonder how far from Apple HQ they are?

PHP 5.0 Released

The day has finally come: PHP 5.0 has been released. If for some reason you’ve been off the planet lately, PHP 5 is a huge update to the guts of the PHP language. There is a long list of features and changes in this release. The MySQL client libraries are no longer included in releases (license issues aside this just makes sense, especially given how old the bundled client libraries were) and SQLite is now included. The OO support has been heavily changed, most of which I’m happy with. I’m a bit disappointed that multiple inheritance is not supported, in favor of the Java model of interfaces. I would have been happy with the Perl model of multiple inheritance.

There has been lots of discussion about the new features of PHP 5; a good place to start is the archive of PHP talks, many of which cover PHP 5. Undoubtedly there will be some new books coming out about coding with PHP 5 in mind.

On another PHP front, Roadsend has released a PHP compiler for PHP 4. The benchmarks they list do show some impressive improvements in some areas. I think in the long run this sort of idea will either get swallowed up by the PHP project or only be useful to a smaller group of people.

So far I’ve pretty heavily resisted getting too involved with PHP 5. Now that there is a non-beta release I’m going to swallow my pride and try it out. Initially my biggest concern is whether all of my existing code based on PHP 4 works under PHP 5.

Update (6:47 pm 13 Jul 2004): Turns out there is a PHP 4 to PHP 5 migration manual.

Update (3:09 pm 15 Jul 2004): Because OO support in PHP 5 has so many changes it may be helpful to read through the Classes and Objects documentation specific to PHP 5. There are some sections that haven’t been filled in yet, but it is still a good resource.

What Is A Database Abstraction Layer?

From time to time the concept of ‘Database Abstraction’ comes up, lately a lot in PHP circles because it often runs alongside the ‘Using PHP Templates’ (like Smarty) argument. One of the unfortunate side effects of these discussions is arguments over what is a ‘Database Abstraction Layer’ and what isn’t, and what is a ‘Template System’ and what isn’t. I think most people get these terms confused with other concepts. So let’s tackle just one of these for now, the question of ‘What Is A Database Abstraction Layer?’.

Definition 1a for abstract from Merriam Webster applies most to what we want out of database abstraction: ‘disassociated from any specific instance’. In the context of our discussion we might word this as ‘disassociated from any specific database server’. Meaning that no matter what database back end is being used (presuming that it is supported by your abstraction software), everything runs fine; we are completely ‘disassociated’ from the database server’s abilities. The result of such a thing would be the ability to swap MySQL for Sybase, then move to Oracle, back to MySQL and then to PostgreSQL, all without changing a single line of code (again presuming that MySQL, Sybase, Oracle and PostgreSQL are all supported by your abstraction layer). Certainly sounds impressive. One might wonder if such a thing even exists. There are certainly projects that are trying to do just that.

Getting all of this portability isn’t for free though. For starters it is never going to be as fast as the ‘native’ database functions. That isn’t to say that it won’t scale though. But I’m not going to go down that road right now; I simply want to state up front that adding extra code will add some execution time. One of the biggest jobs of the abstraction layer is dealing with the same feature that is implemented in many different ways across database servers. As examples of this take a look at ADOdb’s page on writing portable SQL in PHP. Even something that seems simple enough, like limiting the result set to the first ten rows, is done five different ways. As a result, instead of writing plain SQL you end up learning ADOdb functions + SQL. To truly abstract away the database differences your queries must be done this way, otherwise you might accidentally write SQL that only works with your current database.

Even that limitation isn’t the biggest hold up with abstraction layers though; it is the idea that your feature set must be constrained to the lowest common feature set among all of the database systems your software will support. One example is a feature that I enjoy in both PostgreSQL and MySQL, regular expression support. If those were the only two databases that I wanted to support, that would be fine, but as far as I know MS SQL and DB2 don’t support it and Oracle only started supporting it in recent versions. Another example is support for sub-selects. Most of the big databases support them, but MySQL doesn’t support them in its current production version (although the beta version 4.1 does), so making use of that feature is out if you want to be able to support MySQL as a current database back end.
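To make the ‘first ten rows’ problem concrete, here is a minimal sketch of the kind of rewriting an abstraction layer has to do behind the scenes. The function name first_rows_sql() and the dialect labels are made up for this example, and this is not ADOdb’s code; it only assumes the commonly documented row-limiting syntax of each database and a simple query that starts with SELECT.

```php
<?php
// Hypothetical helper: rewrite a simple SELECT so it returns only the
// first $n rows, using the syntax each database expects.
function first_rows_sql($dialect, $sql, $n) {
    $n = (int) $n;
    switch ($dialect) {
        case 'mysql':
        case 'pgsql':
            return $sql . ' LIMIT ' . $n;
        case 'mssql':
            // SELECT TOP n ... goes right after the SELECT keyword.
            return preg_replace('/^\s*SELECT\s+/i', 'SELECT TOP ' . $n . ' ', $sql, 1);
        case 'oracle':
            // Wrap the query and filter on the ROWNUM pseudo-column.
            return 'SELECT * FROM (' . $sql . ') WHERE ROWNUM <= ' . $n;
        case 'db2':
            return $sql . ' FETCH FIRST ' . $n . ' ROWS ONLY';
        default:
            throw new InvalidArgumentException('Unsupported dialect: ' . $dialect);
    }
}

print first_rows_sql('mssql', 'SELECT name FROM users ORDER BY name', 10);
```

Once queries have to pass through a function like this, you are no longer writing plain SQL, which is exactly the trade-off described above.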

The summary version: a true database abstraction layer will take a speed hit (ok, livable), require you to not write SQL but a mix of functions plus partial SQL (I don’t really like that, but still livable in some cases) and restrict you to the lowest common feature set among all of the database systems you want to support (this can be the real killer). My feeling is that for many projects (not all) using a true database abstraction layer simply isn’t worth it. You might ask then, under what situations is it worth it? An example of when using an abstraction layer could be the right decision is something like PHPLens, a tool that derives a lot of its usefulness from being usable against several different database servers. On the flip side of that I think that most ‘in house’ projects probably don’t need to use an abstraction layer, mostly because of the limited feature set requirement.

One of the things that many people confuse with an abstraction layer is something that we’ll call a ‘Database Access Layer’. A database access layer is a consistent way to access your database resources, but makes no attempt to abstract away the differences between database systems. This is the approach that I personally have taken on most of my projects (no sense in trying to hide my preferences). This technique usually provides a simple class (or set of functions) that are always the same no matter which database you are connecting to, but leaves it up to you to know how to correctly deal with the features of your particular database. This means that if you switch from Oracle to MySQL you’ll have to change your SQL queries to deal with that, but the functions (or class) that you use will remain the same. The benefit there is only having to learn one set of functions to access your database, even though actually moving from one database to another will require knowledge of each database’s feature set and how to properly use it.
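A minimal sketch of what such an access layer might look like follows. The class name DbAccess and its two methods are made up for this example, and it leans on PHP’s PDO extension as the underlying driver, which is an assumption on my part (it needs a PHP version that ships PDO). The point is only that the calling code always uses the same two methods, while the SQL passed in remains whatever your particular database understands.

```php
<?php
// Hypothetical access layer: same methods for every database, but no
// attempt to hide differences in the SQL itself.
class DbAccess {
    private $pdo;

    public function __construct($dsn, $user = null, $password = null) {
        $this->pdo = new PDO($dsn, $user, $password);
        // Fail loudly instead of returning false on errors.
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    // Run a SELECT and return all rows as associative arrays.
    public function fetchAll($sql, array $params = array()) {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    // Run an INSERT/UPDATE/DELETE/DDL statement; returns affected rows.
    public function execute($sql, array $params = array()) {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->rowCount();
    }
}
```

Usage would be along the lines of creating the object with a DSN (for example 'sqlite::memory:' for a throwaway database) and then calling fetchAll('SELECT name FROM users WHERE id = ?', array(1)). Moving to another database means changing the DSN and reviewing the SQL, but not the calling code.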

Virtually everyone seems to agree that an access layer is a good idea, but they tend to call it an abstraction layer, which then confuses the whole discussion because now the conversation turns to the differences between an access layer and an abstraction layer. Hopefully this post has cleared up some of that.

This whole thing started with a post from Jeremy complaining about abstraction layers which in turn started with a two year old SitePoint post where the author ranted about: how smart he was, how stupid template systems are, how MySQL is a joke (why do people insist on getting on that wagon again and again?) and how database abstraction layers are the only way to go. Jeremy points out what is generally obvious to most people, moving from one database system to another is not easy. I would add to that that it is not easy unless you are willing to strictly abide by the lowest common feature set rule and even then it isn’t a cake walk.

In the end, do what works for you and your project. At a minimum using an access layer just makes plain sense. Deciding to use an abstraction layer will take a lot more research and evaluation to determine if it will actually meet your goals without restricting your feature set beyond what you are comfortable with.

Update (9:06 am 9 Jul 2004): Post over at PHP Everywhere in response to Jeremy’s post. This post is a perfect example of confusing an abstraction layer with an access layer. They are not the same thing. Trying to use the term abstraction to mean all things to all people just causes more confusion.

Update (10:46 am 9 Jul 2004): Lambda’s post about Jeremy’s post. Basically falls into the same trap that PHP Everywhere did, labeling everything that provides any sort of layer to database functions an abstraction. One more time; abstraction layer: hides all of the features and uniqueness of your database, access layer: provides a consistent set of functions to access your database without hiding any features.

Update (12:10 pm 9 Jul 2004): RevJim’s response to Jeremy and PHP Everywhere, looks like he gets the idea of what an abstraction layer is and how it is different from the other approaches.

PHP over XML-RPC

If you’ve followed Tim O’Reilly’s theme for the last few years, the Internet Operating System, then maybe you are also wondering what sort of steps will be needed to really make such things possible. Back in October 2003 he listed a few “basic” features revolving around the idea that all applications should be network aware. It was thinking with these things in mind that I came up with what seemed to be a simple idea: what if you could expose an entire programming language (more or less) as a web service? Let me scale that back a bit: what if you could expose all of the functions for a programming language as a web service? Would doing so even be possible without a lot of hand holding in the code? So just to see where that might go I wanted to try it out.

I’m doing most of my work lately in PHP so I picked that as my target. Right off the bat PHP saved me a lot of headache by providing get_defined_functions() to get a list of PHP’s functions (along with any user defined functions). There are plenty of XML-RPC PHP libraries out there; for this project I went with Simon Willison’s nicely done XML-RPC library for PHP. I did have to make one small change: I removed the method signature enforcement from the IXR_IntrospectionServer class because I had no way of determining what the correct signature should be for each function without writing custom code for each function, which I didn’t want to do. It’s a fairly small change; you can grab my modified version if you want to try this at home. The PHP over XML-RPC server simply goes through all of the PHP functions and adds callbacks for each one. All of the internal PHP functions are under php.* and user defined functions are under php_user.*. Take a quick look at it; it gets the job done in about 30 lines of code.
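The core of that idea can be sketched without any XML-RPC library at all. The snippet below is not the actual server; it only shows how get_defined_functions() can be turned into a map of php.* and php_user.* method names, plus a stand-in dispatcher. The names build_callback_map() and dispatch_call() are made up for this example, and wiring the map into the IXR server class is left out.

```php
<?php
// Hypothetical helper: map RPC-style method names to PHP function names.
function build_callback_map() {
    $functions = get_defined_functions();
    $map = array();
    foreach ($functions['internal'] as $name) {
        $map['php.' . $name] = $name;
    }
    foreach ($functions['user'] as $name) {
        $map['php_user.' . $name] = $name;
    }
    return $map;
}

// Hypothetical dispatcher: look the method up and call it with the
// positional arguments that arrived over the wire.
function dispatch_call(array $map, $method, array $args) {
    if (!isset($map[$method])) {
        throw new InvalidArgumentException('Unknown method: ' . $method);
    }
    return call_user_func_array($map[$method], $args);
}

$map = build_callback_map();
print dispatch_call($map, 'php.strlen', array('teststring'));
```

Because only the return value comes back, anything a function would normally hand back through a by-reference argument is lost with this approach.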

So what can you do with such a beast? An oversimplification is that you can now access all of the internal PHP functions. A more correct analysis would be that you can access all of the PHP functions, but some of them aren’t very useful over XML-RPC, like database functions. Other functions work but have reduced functionality, like preg_match, which can optionally populate an array with matches from the given regular expression. That doesn’t work in our case because we can only return one value. Getting the return value works fine, but to make the match array work would require code to deal with exactly that function, which is something I was trying to avoid. Here is a client example that does work as expected, calling PHP’s strlen:

##
## PHP over XML-RPC client
##
require('./IXR_Library.inc.php');
$client = new IXR_Client('http://localhost/php-rpc/server.php');
if(!$client->query('php.strlen', 'teststring')) {
    print("Error: (" . $client->getErrorCode() . ") " .
        $client->getErrorMessage());
    exit();
}
print("PHP over XML-RPC: php.strlen('teststring'): ");
print($client->getResponse());
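For comparison, the kind of per-function code that preg_match would need is small, but it has to be written by hand for every function that returns data through its arguments. A minimal sketch, assuming a made-up wrapper name rpc_preg_match() that bundles the return value and the matches into one array so both can travel back as a single response:

```php
<?php
// Hypothetical wrapper: return preg_match()'s result and its matches
// together, since an RPC call can only hand back one value.
function rpc_preg_match($pattern, $subject) {
    $matches = array();
    $result = preg_match($pattern, $subject, $matches);
    return array('result' => $result, 'matches' => $matches);
}

$response = rpc_preg_match('/(\d+)-(\d+)/', 'call 555-1234');
print_r($response);
```

A wrapper like this would show up under php_user.* on the server, alongside the untouched php.preg_match.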


The big downside for something like this is speed, using XML-RPC adds considerable overhead. In my simple benchmarks for strlen('teststring'), natively it takes about 0.0000730 seconds, using PHP over XML-RPC it takes about 0.1438220 seconds. I’m sure that this could be made to vary by either using different XML-RPC libraries or moving to something completely different like SOAP. I doubt that either of those options would increase the speed much. The speed hit is simply too much to take, but this was all just an experiment anyway.
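Going by the two numbers above, the XML-RPC call is roughly 2,000 times slower than the native one (0.1438220 / 0.0000730 is about 1,970). A simple way to take this kind of measurement is sketched below; the helper name average_call_time() is made up for this example, and it only times the native side, since the XML-RPC side depends on the server setup.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $callback.
function average_call_time($callback, array $args, $iterations = 1000) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func_array($callback, $args);
    }
    return (microtime(true) - $start) / $iterations;
}

$native = average_call_time('strlen', array('teststring'));
printf("native strlen: %.7f seconds per call\n", $native);
```

Averaging over many iterations smooths out timer noise, which matters when a single native call is this short.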

It might be interesting to see what it would take to create structures that would support database operations. This would increase the complexity of the server but would add some useful possibilities. If you didn’t like the Java approach to your database you could use PHP’s database functions over XML-RPC. The speed hit would still be the killer that keeps this from being realistic, though.

PHP's in_array And Loose Typing Grief

I’ve got a small function that builds an HTML select form given a name, array of options, an array of selected options and some javascript options. While looping over the array of options I have a very simple call to PHP’s in_array function to determine if the option currently being looked at is also one of the selected options. The code looks something like this:


           if(in_array($option, $selected_options)) {
               $out .= " selected";
           }

This worked most of the time, but started giving me problems under certain conditions. This is where having loose typing in PHP can be a bit of a down side. So my first thought was to simply add true as the third argument to in_array, which forces a type comparison as well as a value comparison. This fixed the problem for my previous error conditions, but broke the conditions where it was working before. I thought about adding another argument to my simple function as a hint to know if it should use strict type checking or not. That approach didn’t feel right; I wanted something that would just work. So I threw out the code above completely and replaced it with the following:


            foreach($selected_options as $selected) {
                // Cast both sides and use === so the values really are
                // compared as strings ("0" vs "0.0" stays different).
                if((string)$option === (string)$selected) {
                    $out .= " selected";
                    break;
                }
            }

In concept this does the same thing as in_array, but the comparison is always done in string context. The cast plus === matters here: a plain == between two numeric strings is still carried out as a numeric comparison in PHP. So far this works under all of the conditions that I’m using it. One of the problem combinations is when 0 (zero) is a valid option. Another problem is when an option could be a number or a letter. With in_array I had to decide which way I wanted it to break; with a simple foreach using string context, both conditions work.

I doubt that this will be the best solution in all cases, but it seems to be working in mine where zero values and type issues were causing me grief. I’ve only been testing this for about 15 minutes, so we’ll see if this stands the test of time as it gets more use.
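The same idea can be packaged as a small drop-in helper. This is only a sketch, and the name in_array_as_strings() is made up for this example: it converts the needle and every haystack value to a string and then lets in_array do a strict comparison, so zero and mixed number/letter options behave the same way.

```php
<?php
// Hypothetical helper: in_array() with every value compared as a string.
function in_array_as_strings($needle, array $haystack) {
    return in_array((string) $needle, array_map('strval', $haystack), true);
}

// Integer 0 against string options: found only when '0' is really there.
var_dump(in_array_as_strings(0, array('0', 'a')));
var_dump(in_array_as_strings(0, array('a', 'b')));

// Strict in_array() alone breaks the case where types differ.
var_dump(in_array(0, array('0', 'a'), true));
```

On PHP versions before 8, a loose in_array(0, array('a', 'b')) returns true because the string is converted to a number for the comparison, which is the sort of surprise that started all of this.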